• 0 Posts
  • 3 Comments
Joined 1 year ago
Cake day: October 30th, 2023

  • Yes. We are currently planning AI projects and implementations at my company. We handle sensitive data, which requires us to do it locally. And we want to set up a team and build competence and experience in ML/LLMs. For our current use cases we don’t need a “super intelligence” or “the best” LLM on the market! RAG with smaller models is totally fine and sufficient for us.



  • Below is a link to a sample I put together recently to create a QA training dataset from source text with the llamaindex dataset generator.

    I used Oobabooga with the “openai” extension as the inference API (with a Zephyr 7B model).
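Pointing a client at a local OpenAI-compatible server boils down to sending standard chat-completion requests to a local base URL. A minimal stdlib-only sketch of building such a request body (the port, path, and model name are assumptions for illustration; they depend on your Oobabooga setup, and no request is actually sent here):

```python
import json

# Assumed local endpoint of Oobabooga's "openai" extension; adjust to your setup.
API_BASE = "http://127.0.0.1:5000/v1"

def build_chat_request(prompt, model="zephyr-7b", temperature=0.2):
    """Build the JSON body for a POST to {API_BASE}/chat/completions
    against an OpenAI-compatible local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

body = json.dumps(build_chat_request("Write 3 questions about the given text."))
```

Any OpenAI-compatible client library can then be pointed at `API_BASE` instead of the hosted API, so the whole pipeline stays local.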

    It worked quite well for generating a dataset fully locally. Ideally one would use a smaller and a larger model in service_context and service_context_large (which I haven’t done so far).

    Also, you have to change the beginning, where it currently only reads in a single file, “output/Test.txt”. And maybe adjust chunk_size and num_questions_per_chunk.
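To illustrate what those two parameters control, here is a rough sketch of the underlying idea: split the source text into chunks and emit one question-generation prompt per chunk. (This is a simplification — llamaindex splits on token/sentence boundaries rather than raw characters; the function names here are made up for the example.)

```python
def chunk_text(text, chunk_size=512):
    """Split source text into fixed-size character chunks.
    (llamaindex's splitter is smarter about boundaries.)"""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def question_prompts(chunks, num_questions_per_chunk=2):
    """Build one generation prompt per chunk, asking the model
    for num_questions_per_chunk questions about that chunk."""
    return [
        f"Given the context below, write {num_questions_per_chunk} questions "
        f"that this context can answer.\n\nContext:\n{chunk}"
        for chunk in chunks
    ]

prompts = question_prompts(chunk_text("some long source text " * 100))
```

Larger chunk_size means fewer but longer contexts per question; num_questions_per_chunk trades dataset size against redundancy within a chunk.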

    The output JSON consists of “input” and “output” fields (which I did for a Mistral model…). For llama-based models I would maybe change it to “instruction”, “input” (= empty), “output”, “text” (= text chunk).
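That remapping is a straightforward record transformation. A sketch of converting the {"input", "output"} pairs into the Alpaca-style four-field layout described above (the helper name and the way the source chunk is passed in are assumptions, not part of the original script):

```python
def to_alpaca(records, source_chunk=""):
    """Convert {"input", "output"} QA records into Alpaca-style
    {"instruction", "input", "output", "text"} records, where
    "input" is left empty and "text" carries the source text chunk."""
    return [
        {
            "instruction": rec["input"],
            "input": "",
            "output": rec["output"],
            "text": source_chunk,
        }
        for rec in records
    ]

converted = to_alpaca(
    [{"input": "What is RAG?", "output": "Retrieval-augmented generation."}],
    source_chunk="RAG combines retrieval with generation...",
)
```

The same pattern works for any other target schema — only the key mapping in the dict literal changes.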

    Please keep in mind that this is only an ugly early prototype that needs cleanup etc…

    https://pastebin.com/cjF1eawK