• 0 Posts
  • 1 Comment
Joined 10 months ago
Cake day: November 14th, 2023

  • It might help to think of RAG as multiple steps (Retrieve, Augment, Generate), each of which you can debug and inspect on its own to see where it might be failing.

    What I would do is look first at the retrieval stage. This is where you are executing a vector (or hybrid or whatever) search against your vector store and retrieving a set of documents that match your query. Keep in mind that in Retrieve, what you send to the search is most likely just the question the user is asking, not the full vectorized prompt. Take a look at what is coming back and make sure the documents seem relevant to the question. If not, there is probably something wrong here to look at. BTW, I personally prefer to start with chunks of about 500 tokens with around 50 tokens of overlap between chunks, but that can vary greatly depending on the model, content, etc.

    If that works, I would then look at the “Augment” part, which is where you inject the results from the Retrieval stage into your prompt. Does the assembled prompt look correct? I doubt this is where the issue is, but it is worth a look.

    Finally, take a look at what comes out of the “Generate” stage when you pass in this augmented prompt. Does it look different from what you saw previously?
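
To make the stage-by-stage check concrete, here is a rough sketch of what I mean, not a real pipeline. Everything in it is made up for illustration: `chunk_tokens`, `embed`, `retrieve` and `augment` are hypothetical helpers, `embed` is a toy bag-of-words stand-in for whatever embedding model you actually use, and the in-memory list stands in for your vector store. The point is only that each stage gives you something you can print and eyeball.

```python
import math
import re
from collections import Counter


def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


def embed(text):
    """Toy stand-in for an embedding model (assumption): word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Retrieve: score every chunk against the user's question, keep the top k."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def augment(question, hits):
    """Augment: inject the retrieved chunks into the prompt template."""
    context = "\n".join(f"- {chunk}" for _, chunk in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Illustrative data only.
docs = [
    "The cat sat on the mat",
    "Vector stores index embeddings for similarity search",
    "Paris is the capital of France",
]
question = "What is the capital of France?"

# Stage 1 (Retrieve): do the returned chunks look relevant?
hits = retrieve(question, docs)
for score, chunk in hits:
    print(f"{score:.2f}  {chunk}")

# Stage 2 (Augment): does the final prompt look the way you expect?
prompt = augment(question, hits)
print(prompt)

# Stage 3 (Generate): pass `prompt` to your model and compare the answer
# with what you get when you send the bare question.
```

With the defaults above, `chunk_tokens` advances 450 tokens per chunk, so each chunk repeats the last 50 tokens of the previous one. If the scores printed in stage 1 look off, there is no point debugging the later stages yet.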