Hi all, I originally posted this to the LangChain sub but haven't gotten any responses yet. Could anyone give me some pointers? Thanks.

Basic workflow for questioning data locally?

Hi all,

I'm using LangChain.js, and most examples I find use OpenAI, but I'm using Llama. I managed to get a simple text file embedded and can ask basic questions, but most of the time the model just spits the prompt back out.

I'm only using the CPU at the moment, so it's very slow, but that's OK. I'm experimenting with loading txt files, CSV files, etc., but it's clearly not going well: I can ask some very simple questions, but most of the time it fails.

My understanding of the workflow is (rough sketch of my code after the list):

  1. Load model
  2. Load data and chunk it (a CSV file, for example; I usually chunk with a chunk size of around 200, splitting on \n separators)
  3. Load embeddings (I'm supposed to load a Llama GGUF model here, right? The same one as in step 1, passed as a parameter to LlamaCppEmbeddings?)
  4. Vector store in memory
  5. Create chain and ask question
  6. Console log answer
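
Roughly what I have, stripped to a minimal sketch (file names, the model path, and the question are placeholders, and the exact import paths and constructor options depend on which langchain / @langchain/community version you're on):

```ts
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RetrievalQAChain } from "langchain/chains";
import { LlamaCpp } from "@langchain/community/llms/llama_cpp";
import { LlamaCppEmbeddings } from "@langchain/community/embeddings/llama_cpp";

const MODEL_PATH = "./models/llama-2-7b.Q4_K_M.gguf"; // placeholder path

// 1. Load the model (llama.cpp via node-llama-cpp)
const model = new LlamaCpp({ modelPath: MODEL_PATH });

// 2. Load the data and chunk it (size ~200, split on newlines)
const docs = await new TextLoader("./data.txt").load(); // placeholder file
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 200,
  chunkOverlap: 0,
  separators: ["\n"],
});
const chunks = await splitter.splitDocuments(docs);

// 3. Embeddings (same GGUF model path passed to LlamaCppEmbeddings)
const embeddings = new LlamaCppEmbeddings({ modelPath: MODEL_PATH });

// 4. In-memory vector store
const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);

// 5. Chain + question
const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever());
const res = await chain.call({ query: "What does the file say about X?" }); // placeholder question

// 6. Log the answer
console.log(res.text);
```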

Is this concept correct, and do you have any tips to help me get better results?

Thank you

  • Cotega@alien.top

    It might help to think of RAG as multiple steps (Retrieve, Augment, Generate), each of which you can debug and inspect separately to see where it might be failing.

    What I would do is look first at the retrieval stage. This is where you execute a vector (or hybrid, or whatever) search against your vector store and retrieve a set of documents that match your query. Keep in mind that in Retrieve you are not searching with the full prompt; more likely you are searching with just the question the user is asking. Take a look at what comes back and make sure the results seem correct. If not, this is probably where the problem is. BTW, I personally prefer to start with 500-token chunks and around 50 tokens of overlap between chunks, but that can vary greatly depending on the model, content, etc.
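
    For example, a quick sketch (assuming you keep a handle to the MemoryVectorStore you built; the question string is a placeholder) that runs only the search and prints what comes back:

    ```ts
    // Run just the Retrieve step: search with the raw user question, not the full prompt.
    const question = "What does the file say about X?"; // placeholder
    const hits = await vectorStore.similaritySearchWithScore(question, 4);

    // If these chunks don't contain the answer, the problem is upstream
    // (chunking, embeddings, or the query itself), not the LLM.
    hits.forEach(([doc, score], i) => {
      console.log(`--- chunk ${i} (score ${score.toFixed(3)}) ---`);
      console.log(doc.pageContent);
    });
    ```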

    If that works, I would then look at the “Augment” part, which is where you inject the results from the Retrieval stage into your prompt. Does the augmented prompt look correct? I doubt this is where the issue is, but it's worth a look.
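
    A sketch of that, if you want to see the augmented prompt yourself (the template text here is just an example, not necessarily what your chain builds internally; `hits` and `question` come from the retrieval snippet above):

    ```ts
    import { PromptTemplate } from "langchain/prompts"; // or "@langchain/core/prompts" on newer versions

    // Stuff the retrieved chunks into a prompt and look at what the model would actually see.
    const prompt = PromptTemplate.fromTemplate(
      "Use the following context to answer the question.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    );
    const context = hits.map(([doc]) => doc.pageContent).join("\n\n");
    const augmentedPrompt = await prompt.format({ context, question });

    console.log(augmentedPrompt); // is the right context actually in there?
    ```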

    Finally, take a look at what comes back in the “Generate” stage when you pass this augmented prompt to the model. Does it look different from what you saw previously?
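
    Again just a sketch, reusing the model instance and the augmented prompt from above, so you can compare the raw completion with what the chain returns:

    ```ts
    // Bypass the chain and send the augmented prompt straight to the model,
    // so you can see exactly what the Generate step produces.
    const rawAnswer = await model.invoke(augmentedPrompt); // older versions: model.call(augmentedPrompt)
    console.log(rawAnswer);
    ```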