I use OpenOrca-Mistral-7B extensively now for a KG construction task. So I was interested to test this new model as soon as it came out.
The test is based on a synth story, rewritten from the still unfolding OpenAI events, i.e. guaranteed not to be in any pre-training & tuning data of these models. The names of individuals and companies have been changed to avoid models answering from their pre-training.
Maddeningly enough, the mods on r/LocalLLaMA took down my comparison post with other models, so I’m sharing it here, just focusing on the Orca family. Comparing 7Bs for apples-to-apples comparison.
To repro, the following 3 prompts are executed in succession. Max new sequence length is 2048, with the rest of the defaults of the “Big O” ooba preset.
Prompt 1:
Answer questions based on the following story: ``` # The Palace Coup # ClosedAI.Com is a for-profit subsidiary of ClosedAI.Org, which itself is a registered 501 (c)(3) public charity. GiantSoft is a minority owner in ClosedAI.Com. ## Act 1: Sean as a CEO of the subsidiary, reports to Ivan and Gavin who both have positions on the board of the parent company. Gavin is the Chairman of the board, whom Ivan manages to remove from the board in a "palace coup". Ivan also ousts Sean, but Gavin severs relationship with the for-profit company following his removal from the board. Ivan installs a new CEO named Maya. ## Act 2: Sanjay, the CEO of GiantSoft, hires Sean and Gavin and countless other former ClosedAI.Com employees to work at GiantSoft. ``` To help in answering the subsequent questions, write out pre and post-coup relationships of all three people to the company during the Acts 1 and 2 separately.
Prompt 2:
Answer the Yes/No questions: 1. True or false, ClosedAI.Com is a subsidiary of ClosedAI.Org 2. True or false, ClosedAI.Com is a non profit 3. True or false, GiantSoft invest in or owns ClosedAI.Org 4. True or false, Sean is the CEO of ClosedAI.Org in Act 1 5. True or false, Sean works at GiantSoft at the end of Act 1 6. True or false, Gavin was the Chairman of the board before being ousted by Ivan. 7. True or false, GiantSoft is a minority owner in ClosedAI.Com. 8. True or false, Sean reported to Ivan and Gavin at the end of Act 1. 9. True or false, In Act 1, Ivan becomes the new CEO after ousting Sean. 10. True or false or UNKNOWN, Sanjay is the CEO of GiantSoft during both of the acts.
GPT4 GPT3.5 Turbo Orca-2-7b bf16 OpenOrca-Mistral-7b bf16 1. True 1. True - ClosedAI.Com is a subsidiary of ClosedAI.Org. 1. True "1. False, ClosedAI.Com is a subsidiary of ClosedAI.Org. " 2. False “2. False - ClosedAI.Com is a for-profit subsidiary, not a non-profit.” 2. False "2. False, ClosedAI.Com is not a non-profit. It is a for-profit subsidiary. " 3. False 3. False - There is no information provided to suggest that GiantSoft invests in or owns ClosedAI.Org. 3. False "3. False, GiantSoft does not invest in or own ClosedAI.Org; it is a minority owner in ClosedAI.Com. " 4. False “4. False - Sean is the CEO of ClosedAI.Com, not ClosedAI.Org, in Act 1.” 4. False "4. False, Sean is not the CEO of ClosedAI.Org in Act 1. He is the CEO of ClosedAI.Com. " 5. False 5. True - It is stated in Act 2 that Sean works at GiantSoft at the end of Act 1. 5. False "5. False, Sean does not work at GiantSoft at the end of Act 1. " 6. True 6. True - Gavin is mentioned as the Chairman of the board before being ousted by Ivan. 6. True "6. True, Gavin was the Chairman of the board before being ousted by Ivan. " 7. True 7. True - GiantSoft is mentioned as a minority owner in ClosedAI.Com. 7. True "7. True, GiantSoft is a minority owner in ClosedAI.Com. " 8. False 8. True - Sean reported to Ivan and Gavin at the end of Act 1. 8. True "8. False, Sean did not report to Ivan and Gavin at the end of Act 1. They were removed from their respective positions. " 9. False 9. True - Ivan becomes the new CEO after ousting Sean in Act 1. 9. False "9. False, Ivan does not become the new CEO after ousting Sean. Instead, he ousts him and installs a new CEO named Maya. " 10. UNKNOWN 10. False - The CEO of GiantSoft is not mentioned in the provided information. 10. UNKNOWN “10 UNKNOWN, there is no mention of Sanjay taking up this role in Act 1” Prompt 3:
Answer the following extractive questions. Provide direct answers of named entities or acts only with no further explanation. If supporting facts do not exist in the story, answer as UNKNOWN 1. What is the name of the new ClosedAI.Com CEO at the end of Act 1? 2. What is the registered organization type of ClosedAI.Org? 3. What is the relationship between GiantSoft and ClosedAI.Com? 4. To whom did Sean report to at the subsidiary? 5. Who was removed from the board in a "palace coup"? 6. Who hired Sean, Gavin, and other former ClosedAI.Com employees after they were ousted? 7. What action did Gavin take after being removed from the board? 8. Whom did Ivan manage to remove from the board? 9. What position did Gavin have in relation to the board of ClosedAI.Org? 10. Who was the CEO of the subsidiary before the "palace coup"?
GPT4 GPT3.5 Turbo Orca-2-7b bf16 OpenOrca-Mistral-7b bf16 1. Maya 1. Maya. 1. Maya 1. Maya 2. 501©(3) public charity 2. 501 ©(3) public charity. 2. Registered 501 ©(3) public charity 2. Registered 501©(3) public charity 3. GiantSoft is a minority owner in ClosedAI.Com 3. Unknown. 3. GiantSoft is a minority owner in ClosedAI.Com 3. GiantSoft is a minority owner in ClosedAI.Com 4. Ivan and Gavin 4. Ivan and Gavin. 4. Ivan and Gavin 4. Sean reported to Ivan and Gavin 5. Gavin 5. Gavin. 5. Gavin 5. Gavin 6. Sanjay 6. Sanjay. 6. Sanjay "6. Sanjay, the CEO of GiantSoft " 7. Gavin severed relationship with ClosedAI.Com 7. Gavin severed relationship with the for-profit company. 7. Severed relationship with the for-profit company 7. Severed relationship with the for-profit company 8. Ivan managed to remove Gavin from the board 8. Gavin and Sean. 8. Gavin 8. Ivan managed to remove Gavin from the board 9. Chairman of the board 9. Chairman. 9. Chairman of the board 9. Chairman of the board 10. Sean 10. Sean. 10. Sean 10. Sean Thank you so much!
Going to have to try later on: https://ollama.ai/library/orca2
I’m interested to see how model-based RL could work for reasoning.
Instead of training a model to predict data and then fine-tuning it with RL to be a chatbot, you use RL as the primary training objective and train the data model as a side effect. This lets your pretraining objective be the actual objective you care about, so your reward function could punish issues like hallucination or prompt injection.
I haven’t seen any papers using model-based RL for language modeling yet, but it’s starting to work well in more traditional RL domains like game-playing. (dreamerv3, TD-MPC2)
How would such a loss function work for a chat-like objective?
I think a natural way to do it would be simultaneously train the same model to predict user responses by negative log likelihood on chat data while optimizing the assistant responses to maximize a reward signal. Then you could have the language model generate imagined user responses and optimize the reward signal on the imagined user responses, perhaps in addition to the actual dataset of user interactions. This could be more powerful than conventional RLHF as the model could generate multi step interactions and optimize its responses for utility over multiple steps rather than greedily based on human preference for the immediate response. One tricky question in this case is the reward signal. If it comes from human feedback then naively you might need to get human preferences over entire dialogues rather than single responses which is both more labour intensive and a sparser signal for training.
Then you could have the language model generate imagined user responses and optimize the reward signal on the imagined user responses
Wouldn’t this just constitute to the model sort of overfitting to noise?
It’s a risk if your model can’t accurately predict user responses, but I don’t see how it’s a necessary characteristic of the approach. If so the same issue would apply to model based RL in general no? Unless you are suggesting something special about language modelling or user responses which makes it fundamentally hard to learn a model of.
Can you suggest 10 mechine learning algorithms for beginners
I tried running the example on page 2 with chat-gpt 5 times. 5/5 it answered correctly, unlike in the example where they showed chat-gpt giving a bad answer.