Chunking and storing structured data and vectors for RAG

Smerfj@alien.top · 11 months ago

Chunking and storing structured data and vectors for RAG

AdamDhahabi@alien.top · 11 months ago

Yesterday I tried GPT4All, and it references context by outputting 3 passages from my local documents. I could click on each of them and read the passage. But their implementation is only using some algorithm at the moment. Embedding based on semantic search is still on their roadmap.

https://preview.redd.it/dnoqmk4olazb1.png?width=1807&format=png&auto=webp&s=cdd1f17a2ea20100504c275094e52b61a6e054f7

SatoshiNotMe@alien.top · 11 months ago

Langroid has a DocChatAgent, you can see an example script here -

https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py

Every generated answer is accompanied by Source (doc link or local path), and Extract (the first few and last few words of the reference — I avoid quoting the whole sentence to save on token costs).

There are other variants of RAG scripts in that same folder, like multi-agent RAG (doc-chat-2.py) where you have one master agent delegating smaller questions to a retrieval agent and asking it in different ways if it can’t answer etc. There’s also a doc-chat-multi-llm.py where you can have the master agent powered by GPT4 and the RAG agent powered by a local LLM (because after all it only needs to do extraction and summarization).

Hey_You_Asked@alien.top · 11 months ago

RemindMe! 1 week

grumpy_autist@alien.top · 10 months ago

@smerfj - I’m currently researching same problem. You can find some information in LlamaIndex project docs. What you probably need is so called composite index with both vector database and knowledge graph that links particular knowledge bits or text paragraphs together. Alternatively you can try restricting vector search to chunks computed from one particular document.

I suspect that knowlege graphs are “the shit” because you can keep and query really small but highly relevant pieces of data without overflowing LLM context and slowing it down.

Smerfj@alien.top · 10 months ago

Thanks for the pointers. Since my aims are using local models eventually, I’ll take any efficiency I can squeeze out.