I’m currently working on RAG-based tooling for a few non-profits and am having difficulty with the following. What are people using for these?
- Tracking model performance across experiments and productized pipelines
  - Changes in test or finetuning data sets
  - Changes in chunking strategy
  - Changes in RAG tooling (e.g. RAG Fusion or RA-DIT)
  - Changes in underlying models and/or finetuning strategies
- Tracking pipeline performance (e.g. speed, throughput, latency) as we change the items listed above
What products do you use and how do you choose them?
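To make the ask concrete, here is roughly what I'd want captured for every run. This is only a stdlib-only sketch, not something I have in production: `RunRecord`, `fingerprint`, `measure`, `log_run`, and the `runs.jsonl` path are all names I made up for illustration, and `pipeline` is assumed to be any callable that takes a query string.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


def fingerprint(items):
    """Short, stable hash of a data set so any change to it shows up in the log."""
    digest = hashlib.sha256()
    for item in items:
        # sort_keys makes the hash independent of dict key order
        digest.update(json.dumps(item, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]


@dataclass
class RunRecord:
    """One experiment or pipeline run (hypothetical schema)."""
    dataset_hash: str   # from fingerprint() on the test or finetuning set
    chunking: dict      # e.g. {"strategy": "fixed", "size": 512, "overlap": 64}
    rag_variant: str    # e.g. "baseline" or "rag-fusion"
    model: str          # model name plus any finetune tag
    metrics: dict = field(default_factory=dict)


def measure(pipeline, queries):
    """Call pipeline(query) for each query and return simple speed metrics."""
    if not queries:
        raise ValueError("need at least one query")
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        pipeline(query)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "n_queries": len(queries),
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
        "throughput_qps": len(queries) / total if total > 0 else float("inf"),
    }


def log_run(record, path="runs.jsonl"):
    """Append one run as a JSON line so runs can be diffed or loaded later."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), sort_keys=True) + "\n")
```

The idea would be to build a `RunRecord` per run, set `record.metrics = measure(my_pipeline, queries)`, and call `log_run(record)`. A flat file like this gets unwieldy quickly, which is why I'm asking what products people use instead.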
I had so much success with text embeddings and retrieval that I didn’t end up needing to deploy an LLM at work. I do, however, have a secret Mistral-Trismegistus-7B@Q4_K hosted on a retired cheapo Dell OptiPlex, with a tarot card reader system prompt, that I share with my teammates 😁