Can someone give me an ELI5 version of how I can train Orca 2 with my local data files/folders? Pretty please.
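Not the OP, but here's roughly what that looks like in practice: a minimal LoRA fine-tuning sketch over a folder of local .txt files using Hugging Face transformers/peft/datasets. The folder path, sequence length, and hyperparameters are just assumptions to make it concrete, not an official recipe.

```python
# Minimal sketch: LoRA fine-tuning of Orca 2 on local .txt files.
# Assumes transformers, peft, datasets, and torch are installed, that you have
# enough GPU memory for the 7B model, and that your text lives in ./my_data/*.txt.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/Orca-2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Train small LoRA adapters instead of updating all 7B parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Load every .txt file in the folder and tokenize it for causal LM training.
dataset = load_dataset("text", data_files={"train": "my_data/*.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=1024),
                      remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="orca2-finetuned", num_train_epochs=1,
                           per_device_train_batch_size=1, gradient_accumulation_steps=8),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```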
It’d be interesting to see how an MoE framework of multiple Orca 2s, each trained on a different subset of data, with your prompt routed to the appropriate Orca 2 expert, would fare. I feel like that could come extraordinarily close to GPT-4 in performance metrics, but it would take decent computing power to test the hypothesis. If each Orca 2 expert is 10 billion parameters and you wanted to run a 100-billion-parameter sparse Orca 2 MoE, that’s going to require at least 500 GB of VRAM.
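For what it's worth, here's a toy sketch of that routing idea, just to make it concrete: a crude per-prompt router in front of several hypothetical domain-tuned Orca 2 checkpoints. The expert repo names and keyword rules are made up, and a real MoE gates per token inside the network rather than per prompt.

```python
# Toy sketch of "route the prompt to one of several domain-tuned Orca 2 experts".
from transformers import pipeline

# Hypothetical domain-specialized checkpoints (these repos don't exist).
EXPERTS = {
    "code": "my-org/orca2-7b-code",
    "math": "my-org/orca2-7b-math",
    "general": "microsoft/Orca-2-7b",
}

def route(prompt: str) -> str:
    """Crude keyword router standing in for a learned gating network."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("def ", "function", "compile", "bug")):
        return "code"
    if any(k in lowered for k in ("integral", "prove", "equation", "solve for")):
        return "math"
    return "general"

def answer(prompt: str) -> str:
    expert_name = EXPERTS[route(prompt)]
    # Loading a fresh pipeline per call is wasteful; a real system would keep
    # all experts resident in VRAM, which is where the huge memory bill comes from.
    generator = pipeline("text-generation", model=expert_name, device_map="auto")
    return generator(prompt, max_new_tokens=256)[0]["generated_text"]

print(answer("Solve for x: 2x + 3 = 11"))
```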
Important: research-only, non-commercial license.
IANAL, but theoretically, it’s not possible to copyright model weights (at least in the US). While the licensing of large language models hasn’t been specifically tested in court, people have tried and failed with other machine learning models. The alleged copyright holder may refuse to do business with you in the future, but you’re unlikely to face legal repercussions.
Ugh
Wow! Exciting! Are these uncensored models, or does the training data include refusals? Does anyone know? What was Orca 1?
Do we get the dataset this time?
Given the legal challenges to the use of training data, you’re probably never going to see the public release of training data for a major corporation’s LLM.
There will be leaks from time to time, but no corporation will expose itself to litigation just to help the open-source community.
Tried the models: the 13B is very slow, and the 7B is speedy but a little quirky. It made a plan for how to solve the task but didn’t actually proceed to solve it. It doesn’t have good conversational flair.
We love you TheBloke https://huggingface.co/TheBloke/Orca-2-7B-GGUF
The paper does not explain the question that really interests me: what reasoning strategy and corresponding system instruction was used for each sub-task, and how did they select the strategy for each clustered sub-task, manually or through prompts leveraging the OpenAI API?
If they did that main step by hand, then this paper isn’t insightful or useful at all.
Obvious question (and I’m assuming the answer is “we didn’t try it yet”): how does this model fare in terms of performance/output?
Progressive Learning: We start with LLaMA-2-7B or LLaMA-2-13B checkpoint and finetune it on the train split of FLAN-v2 dataset for one epoch. Note that FLAN-v2 dataset contains both zero-shot and few-shot problems. We then train on 5 million ChatGPT data from Orca 1 for 3 epochs. Then we train on the combination of 1 million GPT-4 data from Orca 1 and Orca 2’s 817K data for 4 epochs.
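For anyone curious what that schedule looks like operationally, here's a rough sketch of the three stages as sequential Hugging Face Trainer runs. The dataset file names are placeholders (the actual mixtures aren't public), and the real recipe's packing, loss masking, and hyperparameters are omitted.

```python
# Sketch of the progressive-learning schedule quoted above, as sequential runs.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # or Llama-2-13b-hf
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# (placeholder data files, epochs) mirroring the three stages in the quote.
stages = [
    (["flan_v2_train.jsonl"], 1),                       # FLAN-v2, 1 epoch
    (["orca1_chatgpt_5m.jsonl"], 3),                    # 5M ChatGPT data, 3 epochs
    (["orca1_gpt4_1m.jsonl", "orca2_817k.jsonl"], 4),   # 1M GPT-4 + 817K Orca 2, 4 epochs
]

for i, (files, epochs) in enumerate(stages):
    data = load_dataset("json", data_files=files)["train"]
    data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=2048),
                    remove_columns=data.column_names)  # assumes a "text" field
    Trainer(
        model=model,  # the same model object carries over, so each stage continues from the last
        args=TrainingArguments(output_dir=f"orca2-stage{i+1}", num_train_epochs=epochs,
                               per_device_train_batch_size=1, gradient_accumulation_steps=16),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()
```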