I’m exploring techniques to improve memory handling in LLMs without resorting to vector databases like Pinecone. In an ongoing conversation spanning days or weeks, earlier messages roll off the context window. The idea would be for a conversation manager (could be the LLM prompting itself as space fills up) to reserve a pre-set fraction of the context window for storing memories.
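A minimal sketch of the budget split, assuming the tiktoken library for token counting; the window size, the 0.25 ratio, and the helper names are my own placeholders, not an established API:

```python
# Sketch: reserve a fixed fraction of the context window for memories.
# Window size and ratio are arbitrary assumptions for illustration.
import tiktoken

CONTEXT_WINDOW = 8192   # total tokens the model accepts (assumed)
MEMORY_RATIO = 0.25     # pre-set fraction reserved for memories (assumed)

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def split_budget(window: int = CONTEXT_WINDOW, ratio: float = MEMORY_RATIO):
    """Return (memory_budget, live_chat_budget) in tokens."""
    memory_budget = int(window * ratio)
    return memory_budget, window - memory_budget

memory_budget, chat_budget = split_budget()
```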
Two techniques I’ve thought about:
- Memory hierarchization based on keywords, timestamps, or subjective importance scores (rough sketch after this list)
- Text compression via various techniques such as syntactic/semantic shrinking, tokenization, substitution, etc.
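For the first technique, here is roughly what importance-based hierarchization plus budget-driven eviction could look like. The scoring weights, field names, and decay function are illustrative assumptions, not taken from any particular framework:

```python
# Sketch: rank memories by a blended score and keep only the
# highest-ranked ones that fit inside the reserved memory budget.
# Weights and fields are illustrative, not a standard scheme.
from dataclasses import dataclass, field
import time

@dataclass
class Memory:
    text: str
    keywords: set[str] = field(default_factory=set)
    created_at: float = field(default_factory=time.time)
    importance: float = 0.5   # subjective score in [0, 1]

def score(mem: Memory, query_keywords: set[str], now: float | None = None) -> float:
    """Blend keyword overlap, recency, and subjective importance."""
    now = now or time.time()
    overlap = len(mem.keywords & query_keywords) / max(len(query_keywords), 1)
    age_hours = (now - mem.created_at) / 3600
    recency = 1.0 / (1.0 + age_hours)   # decays toward 0 as the memory ages
    return 0.4 * overlap + 0.3 * recency + 0.3 * mem.importance

def fit_to_budget(memories: list[Memory], query_keywords: set[str],
                  budget_tokens: int, count_tokens) -> list[Memory]:
    """Keep the highest-scoring memories that fit inside the token budget."""
    kept, used = [], 0
    for mem in sorted(memories, key=lambda m: score(m, query_keywords), reverse=True):
        cost = count_tokens(mem.text)
        if used + cost <= budget_tokens:
            kept.append(mem)
            used += cost
    return kept
```

Compression could then be layered on top, e.g. summarizing or substituting low-scoring memories before dropping them entirely.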
Certainly this has been achieved before. Any experience with it?
All you need is a 32K-context LLM. Everything beyond that needs a tool invocation that pulls in the archived text. You’ll have to make your orchestrator smart enough to know that there is content beyond the live window and that the tool needs to be invoked.
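One way to picture that orchestrator side, as a sketch only: the archive store, the naive keyword search, and the "does the live window mention this?" heuristic are all assumptions on my part, not a specific framework's API:

```python
# Sketch: an orchestrator that decides whether to pull archived chat
# back into the prompt via a retrieval "tool" before calling the LLM.
# The archive format and the keyword heuristic are placeholders.

def retrieve_from_archive(archive: list[str], query: str, max_items: int = 3) -> list[str]:
    """Naive keyword lookup over archived chat chunks (stand-in for real search)."""
    terms = set(query.lower().split())
    ranked = sorted(archive,
                    key=lambda chunk: len(terms & set(chunk.lower().split())),
                    reverse=True)
    return ranked[:max_items]

def build_prompt(live_history: list[str], archive: list[str], user_msg: str) -> str:
    # Heuristic: if nothing in the user's message appears in the live window,
    # assume the answer lives in the archive and invoke the retrieval tool.
    live_text = " ".join(live_history).lower()
    needs_archive = not any(word in live_text for word in user_msg.lower().split())
    archived_context = retrieve_from_archive(archive, user_msg) if needs_archive else []
    return "\n".join(archived_context + live_history + [user_msg])
```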