You mean we don’t need to use llama-cpp-python anymore to serve this at an OpenAI-compatible endpoint?
A bit related. I think all the tools mentioned here are for using an existing UI.
But what if you want to easily roll your own, preferably in Python? I know of some options:
- Gradio: https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks
- Panel: https://www.anaconda.com/blog/how-to-build-your-own-panel-ai-chatbots
- Reflex (formerly Pynecone): https://github.com/reflex-dev/reflex-chat https://news.ycombinator.com/item?id=35136827
- Solara: https://news.ycombinator.com/item?id=38196008 https://github.com/widgetti/wanderlust
I like Streamlit (simple but not very versatile), and Reflex seems to have a richer feature set.
My questions: Which of these do people like to use the most? And are the tools mentioned by OP also good for rolling your own UI on top of your own software? To make the question concrete, a minimal Gradio sketch is below.
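This is roughly what "rolling your own" looks like with Gradio's ChatInterface pointed at an OpenAI-compatible endpoint (just a minimal sketch; the base URL and model name are placeholders, and it assumes `pip install gradio openai`):

```python
# Minimal "roll your own" chat UI: a Gradio front-end talking to any
# OpenAI-compatible endpoint (e.g. a locally served model).
# Assumes `pip install gradio openai`; base_url and model name are placeholders.
import gradio as gr
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def chat_fn(message, history):
    # Gradio passes history as (user, assistant) pairs in its default format
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    resp = client.chat.completions.create(
        model="local-model",  # placeholder: whatever your server exposes
        messages=messages,
    )
    return resp.choices[0].message.content

gr.ChatInterface(chat_fn).launch()
```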
Langroid has a DocChatAgent; you can see an example script here:
https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py
Every generated answer is accompanied by a Source (doc link or local path) and an Extract (the first few and last few words of the reference; I avoid quoting the whole sentence to save on token costs).
There are other variants of RAG scripts in that same folder, like multi-agent RAG (doc-chat-2.py), where a master agent delegates smaller questions to a retrieval agent and asks in different ways if it can’t get an answer, etc. There’s also doc-chat-multi-llm.py, where the master agent is powered by GPT-4 and the RAG agent by a local LLM (after all, it only needs to do extraction and summarization).
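The delegation pattern in those scripts is roughly the following. This is a generic sketch of the idea using the plain openai client, not Langroid’s actual code; retrieve_chunks(), the endpoints, and the model names are placeholders:

```python
# Generic sketch of the master/RAG-agent split: a stronger "master" LLM breaks a
# question into sub-questions, a cheaper local LLM answers each from retrieved
# chunks, and the master composes the final answer.
# retrieve_chunks(), endpoints, and model names are placeholders.
from openai import OpenAI

master = OpenAI()  # e.g. GPT-4; needs OPENAI_API_KEY
local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def retrieve_chunks(query: str) -> list[str]:
    """Placeholder for vector-store retrieval of the top-k chunks."""
    return []

def ask(client: OpenAI, model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def rag_answer(sub_question: str) -> str:
    # the RAG agent only does extraction/summarization over retrieved chunks
    chunks = "\n\n".join(retrieve_chunks(sub_question))
    prompt = f"Answer ONLY from these passages:\n{chunks}\n\nQuestion: {sub_question}"
    return ask(local, "local-model", prompt)

def answer(question: str) -> str:
    subs = ask(
        master, "gpt-4",
        f"Break this into 2-3 simpler sub-questions, one per line:\n{question}",
    )
    findings = "\n\n".join(
        f"Q: {s}\nA: {rag_answer(s)}" for s in subs.splitlines() if s.strip()
    )
    return ask(
        master, "gpt-4",
        f"Using these findings:\n\n{findings}\n\nAnswer the original question: {question}",
    )
```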
> intuitively it seems like you might be able to avoid calling a model at all b/c shouldn’t the relevant sentences just be closer to the search
Not really, as I mention in my reply to u/jsfour above: embeddings give you similarity to the query, whereas an LLM can identify relevance to answering the query. Specifically, embeddings won’t be able to resolve cross-references (e.g. "Giraffes are tall. They eat mostly leaves.") and won’t be able to zoom in on answers, e.g. the President Biden question I mention there.
Here is the comparison for that specific example.
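If you want to poke at the embedding side yourself, here is a rough sketch with sentence-transformers (the model name is just an example). The pronoun in "They eat mostly leaves." has no surface link to giraffes, so pure query similarity can miss the sentence that actually answers the question, whereas an LLM reading both sentences together resolves the reference:

```python
# "Similar to the query" vs "relevant to answering the query": embed the query
# and each sentence, then compare cosine similarities.
# Assumes `pip install sentence-transformers`; the model name is just an example.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "What do giraffes eat?"
sentences = ["Giraffes are tall.", "They eat mostly leaves."]

q_emb = model.encode(query, convert_to_tensor=True)
s_emb = model.encode(sentences, convert_to_tensor=True)

for sent, score in zip(sentences, util.cos_sim(q_emb, s_emb)[0]):
    print(f"{score.item():.3f}  {sent}")
```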
That was exactly my thought! In Langroid (the agent-oriented LLM framework from ex-CMU/UW-Madison researchers), we call it Relevance Extraction: given a passage and a query, use the LLM to extract only the portions relevant to the query. In a RAG pipeline where you optimistically retrieve the top k chunks (to improve recall), the chunks can be large and hence contain irrelevant or distracting text. We do relevance extraction on these k chunks concurrently: https://github.com/langroid/langroid/blob/main/langroid/agent/special/doc_chat_agent.py#L801
One thing often missed here is the unnecessary cost (latency and tokens) of having the LLM parrot back verbatim text from the context. In Langroid we use a numbering trick to mitigate this: pre-annotate the passage’s sentences with numbers, and ask the LLM to simply return the numbers of the relevant sentences. We have an elegant implementation of this in our RelevanceExtractorAgent, using tools/function-calling.
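The gist of it, as a rough standalone sketch (simplified, not the actual RelevanceExtractorAgent code; the prompt wording, sentence splitting, and model name here are illustrative):

```python
# Sketch of the sentence-numbering trick: number the sentences, ask the LLM for
# the relevant numbers only, and map the numbers back to text locally, so the
# LLM never has to parrot verbatim text. Simplified stand-in for Langroid's
# RelevanceExtractorAgent (which returns the numbers via a tool/function-call);
# the prompt wording, sentence splitting, and model name are illustrative.
import re
from openai import OpenAI

client = OpenAI()  # needs OPENAI_API_KEY

def extract_relevant(passage: str, query: str, model: str = "gpt-4") -> str:
    # naive sentence split; the real implementation is more careful
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    numbered = "\n".join(f"({i + 1}) {s}" for i, s in enumerate(sentences))
    prompt = (
        "Below is a numbered passage and a query. Reply with ONLY the numbers of "
        "the sentences relevant to answering the query, comma-separated, or NONE "
        "if no sentence is relevant.\n\n"
        f"PASSAGE:\n{numbered}\n\nQUERY: {query}"
    )
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
    nums = [int(n) for n in re.findall(r"\d+", reply)]
    return " ".join(sentences[i - 1] for i in nums if 1 <= i <= len(sentences))
```

The completion is then just a handful of digits, so the extraction step costs a few output tokens instead of re-generating the relevant text verbatim.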
Here’s a post I wrote comparing Langroid’s method with LangChain’s naive equivalent of relevance extraction, `LLMChainExtractor.compress`; no surprise, Langroid’s method is far faster and cheaper:
https://www.reddit.com/r/LocalLLaMA/comments/17k39es/relevance_extraction_in_rag_pipelines/
If I had the time, the next steps would have been to (1) give it a fancy name and (2) post it on arXiv with a bunch of experiments, but I’d rather get on with building 😄