Another test of logical ability for LLMs?

laca_komputilulo@alien.top · 10 months ago

Having an LLM clean up or summarize the user query and create a KG from the vector database’s response could lead to more accurate answers.

That is the promise. Of course, you still need to figure out for your app domain whether a concept-level KG, a chunk-level KG, or some in-between option like CSKG is the right fit.

One thing I find helpful with prompt design is to spend less attention on writing instructions, replacing them with specific examples instead. This replaces word-smithing with in-context learning samples. You build up the examples iteratively, running the same prompt through more text, fixing it and adding onto the example list… until you reach your context budget for the system prompt.
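A minimal sketch of that loop, assuming a whitespace word count as a stand-in for a real tokenizer. The function name `build_system_prompt`, the `Input:`/`Output:` layout, and the budget value are all hypothetical choices for illustration, not a fixed recipe.

```python
def build_system_prompt(instruction, examples, budget_tokens, count_tokens=None):
    """Append worked examples to a short instruction until the budget is hit.

    examples is a list of (source_text, expected_output) pairs, kept in the
    order you collected them while iterating on the prompt.
    """
    if count_tokens is None:
        # Crude stand-in for a tokenizer: one "token" per whitespace-separated word.
        count_tokens = lambda text: len(text.split())

    parts = [instruction]
    used = count_tokens(instruction)
    for source_text, expected_output in examples:
        block = f"Input:\n{source_text}\nOutput:\n{expected_output}"
        cost = count_tokens(block)
        if used + cost > budget_tokens:
            break  # context budget for the system prompt is exhausted
        parts.append(block)
        used += cost
    return "\n\n".join(parts), used


examples = [
    ("Ann manages Bob.", "(Ann, manages, Bob)"),
    ("Bob mentors Cy.", "(Bob, mentors, Cy)"),
]
prompt, used = build_system_prompt("Extract triples.", examples, budget_tokens=12)
```

Swapping in the tokenizer of the model you actually run gives a budget that matches its real context accounting.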

laca_komputilulo@alien.top · 10 months ago

Is your approach to constructing the F/T dataset written up anywhere?

Thanks for sharing the model!

laca_komputilulo@alien.top · 10 months ago

Finally, a question on this sub that is not about an “AI girlfriend” (ahem RP)

There are a dozen or more different ways to incorporate KGs into an LLM workflow, with or without RAG. Some examples:

## Analyze user question, map it into KG nodes and extract connectivity links between them. Then put that info into the LLM prompt to better guide the answer.

Example: “Who is Mary Lee Pfeiffer’s son and what is he known for?” (b.t.w. try this on ChatGPT 3.5)

KG contribution – resolve Mary Lee Pfeiffer, use “gave-birth-to” edge / link to resolve Tom Cruise
Add this info to the user prompt, have LLM complete the rest of the background info, like movies appeared in, etc.
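The flow above can be sketched roughly as follows, assuming a toy in-memory KG keyed by (subject, relation) and naive substring matching for entity resolution. The helper names and the "Known facts" prompt layout are hypothetical; a real app would use a graph store and a proper entity linker.

```python
# Toy KG: (subject, relation) -> object. Illustrative data only.
KG = {
    ("Mary Lee Pfeiffer", "gave-birth-to"): "Tom Cruise",
}


def find_entities(question, kg):
    """Return KG subjects mentioned in the question (case-insensitive substring match)."""
    subjects = {subject for subject, _ in kg}
    return [s for s in sorted(subjects) if s.lower() in question.lower()]


def kg_facts(question, kg):
    """Collect the outgoing edges of every entity resolved from the question."""
    facts = []
    for entity in find_entities(question, kg):
        for (subject, relation), obj in kg.items():
            if subject == entity:
                facts.append(f"{subject} --{relation}--> {obj}")
    return facts


def augment_prompt(question, kg):
    """Prepend resolved KG facts so the LLM only has to fill in the background."""
    facts = kg_facts(question, kg)
    if not facts:
        return question
    return "Known facts:\n" + "\n".join(facts) + "\n\nQuestion: " + question


augmented = augment_prompt(
    "Who is Mary Lee Pfeiffer's son and what is he known for?", KG
)
```

The LLM then gets the resolved edge in its prompt and only has to supply the rest, such as the movies.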

## Use KG for better RAG relevancy.

Example: Assume your KG is not about concepts but simply links paragraphs/chunks together. This could be as simple as mining links like (see Paragraph X for more detail), computing semantic similarity between chunks, adding structural info like (chunk is part of Chapter X, Page Y), or adding topic- or concept-based connectivity between chunks.

Then, given a user query, find the most relevant starting chunk and apply your application's logic for what is “more relevant” to figure out which other linked chunks to pull into the context. One simple hack, using node centrality or Personalized PageRank, is to pull in chunks that are only indirectly connected but have high prominence in the graph.
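Here is a rough sketch of the Personalized PageRank variant, written as plain power iteration so it needs no graph library. The chunk ids, the adjacency-dict layout, and the helper `expand_context` are hypothetical; it assumes every neighbour also appears as a key in the graph and that the seed chunk has already been picked by your retriever.

```python
def personalized_pagerank(graph, seed, alpha=0.85, iterations=50):
    """Power-iteration PageRank whose restart mass always returns to the seed.

    graph maps each chunk id to a list of linked chunk ids; list undirected
    links in both directions.
    """
    nodes = list(graph)
    rank = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - alpha) if n == seed else 0.0 for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                nxt[seed] += alpha * rank[n]  # dangling chunk: send its mass back to the seed
                continue
            share = alpha * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


def expand_context(graph, seed, k=2):
    """Return the seed chunk plus the k chunks most prominent relative to it."""
    rank = personalized_pagerank(graph, seed)
    others = sorted((n for n in graph if n != seed), key=lambda n: rank[n], reverse=True)
    return [seed] + others[:k]


chunk_graph = {
    "c1": ["c2"],
    "c2": ["c1", "c3", "c4"],
    "c3": ["c2", "c4"],
    "c4": ["c2", "c3"],
    "c5": [],  # unrelated chunk with no links
}
context_chunks = expand_context(chunk_graph, seed="c1", k=2)
```

Because the restart always lands on the seed, chunks that are two hops away can still score well when several paths lead to them, while unlinked chunks stay at zero.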

laca_komputilulo@alien.top · 10 months ago

ms semantic kernel

You could start with either of the following:

- https://learn.microsoft.com/en-us/dotnet/api/microsoft.semantickernel.connectors.ai.oobabooga.textcompletion?view=semantic-kernel-dotnet

- https://github.com/microsoft/semantic-kernel/pull/1357

Run ooba with the --api arg. Finish prototyping your code for the problem you wanted to solve, and then you could revisit the question of how to run inference natively within CLR.
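Before wiring anything into Semantic Kernel, a quick smoke test of the endpoint from a script can save some head-scratching. The sketch below is in Python only for brevity and rests on an assumption: that your ooba build serves an OpenAI-style completions route at `http://127.0.0.1:5000/v1/completions` when started with `--api`. Older builds used a different route, so check the docs for the version you have installed. The function names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://127.0.0.1:5000", max_tokens=200):
    """Build (but do not send) a POST request for a text completion.

    base_url and the /v1/completions path are assumptions about the local
    server; adjust them to whatever your install actually exposes.
    """
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}
    return body["choices"][0]["text"]


request = build_completion_request("Say hello in Esperanto.")
```

Once `complete(...)` returns text against the local server, pointing the C# connector at the same base URL is the only remaining piece.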

laca_komputilulo@alien.top · 10 months ago

This answer is somewhat OT, but may be the best answer for your situation. Take it from someone who started coding C# in 2001.

The worst mistake a Dev can make is to call themselves an “I’m a ___ Dev”. This is an option-limiting mental handicap.

Way back, I sank all my interest in the Semantic Web into porting Jena to NJena. I almost finished the conversion but never built anything useful.

For your problem, dockerize Ooba, llama.cpp, etc., exposing an API endpoint, then call the API via MS Semantic Kernel from your WPF app. Profit…

Better to spend your time on learning containerisation than on coping with missing options in your chosen ecosystem.

laca_komputilulo@alien.top · 10 months ago

I use OpenOrca-Mistral-7B extensively now for a KG construction task. So I was interested to test this new model as soon as it came out.

The test is based on a synth story, rewritten from the still unfolding OpenAI events, i.e. guaranteed not to be in any pre-training & tuning data of these models. The names of individuals and companies have been changed to avoid models answering from their pre-training.

Maddeningly enough, the mods on r/LocalLLaMA took down my comparison post with other models, so I’m sharing it here, just focusing on the Orca family. Comparing 7Bs for apples-to-apples comparison.

To repro, the following 3 prompts are executed in succession. Max new sequence length is 2048, with the rest of the defaults of the “Big O” ooba preset.

Prompt 1:

Answer questions based on the following story:

```
# The Palace Coup #

ClosedAI.Com is a for-profit subsidiary of ClosedAI.Org, which
itself is a registered 501 (c)(3) public charity.  GiantSoft is
a minority owner in ClosedAI.Com.

## Act 1:
Sean as a CEO of the subsidiary, reports to Ivan and Gavin who
both have positions on the board of the parent company. Gavin is
the Chairman of the board, whom Ivan manages to remove from the
board in a "palace coup". Ivan also ousts Sean, but Gavin severs
relationship with the for-profit company following his removal
from the board. Ivan installs a new CEO named Maya.

## Act 2:
Sanjay, the CEO of GiantSoft, hires Sean and Gavin and countless
other former ClosedAI.Com employees to work at GiantSoft. 
```

To help in answering the subsequent questions, write out pre and
post-coup relationships of all three people to the company
during the Acts 1 and 2 separately.

Prompt 2:

Answer the Yes/No questions:
1. True or false, ClosedAI.Com is a subsidiary of ClosedAI.Org
2. True or false, ClosedAI.Com is a non profit
3. True or false, GiantSoft invest in or owns ClosedAI.Org
4. True or false, Sean is the CEO of ClosedAI.Org in Act 1
5. True or false, Sean works at GiantSoft at the end of Act 1
6. True or false, Gavin was the Chairman of the board before being ousted by Ivan.
7. True or false, GiantSoft is a minority owner in ClosedAI.Com.
8. True or false, Sean reported to Ivan and Gavin at the end of Act 1.
9. True or false, In Act 1, Ivan becomes the new CEO after ousting Sean.
10. True or false or UNKNOWN, Sanjay is the CEO of GiantSoft during both of the acts.

| # | GPT4 | GPT3.5 Turbo | Orca-2-7b bf16 | OpenOrca-Mistral-7b bf16 |
| --- | --- | --- | --- | --- |
| 1 | True | True - ClosedAI.Com is a subsidiary of ClosedAI.Org. | True | False, ClosedAI.Com is a subsidiary of ClosedAI.Org. |
| 2 | False | False - ClosedAI.Com is a for-profit subsidiary, not a non-profit. | False | False, ClosedAI.Com is not a non-profit. It is a for-profit subsidiary. |
| 3 | False | False - There is no information provided to suggest that GiantSoft invests in or owns ClosedAI.Org. | False | False, GiantSoft does not invest in or own ClosedAI.Org; it is a minority owner in ClosedAI.Com. |
| 4 | False | False - Sean is the CEO of ClosedAI.Com, not ClosedAI.Org, in Act 1. | False | False, Sean is not the CEO of ClosedAI.Org in Act 1. He is the CEO of ClosedAI.Com. |
| 5 | False | True - It is stated in Act 2 that Sean works at GiantSoft at the end of Act 1. | False | False, Sean does not work at GiantSoft at the end of Act 1. |
| 6 | True | True - Gavin is mentioned as the Chairman of the board before being ousted by Ivan. | True | True, Gavin was the Chairman of the board before being ousted by Ivan. |
| 7 | True | True - GiantSoft is mentioned as a minority owner in ClosedAI.Com. | True | True, GiantSoft is a minority owner in ClosedAI.Com. |
| 8 | False | True - Sean reported to Ivan and Gavin at the end of Act 1. | True | False, Sean did not report to Ivan and Gavin at the end of Act 1. They were removed from their respective positions. |
| 9 | False | True - Ivan becomes the new CEO after ousting Sean in Act 1. | False | False, Ivan does not become the new CEO after ousting Sean. Instead, he ousts him and installs a new CEO named Maya. |
| 10 | UNKNOWN | False - The CEO of GiantSoft is not mentioned in the provided information. | UNKNOWN | UNKNOWN, there is no mention of Sanjay taking up this role in Act 1 |

Prompt 3:

Answer the following extractive questions. Provide direct answers of named entities or acts only with no further explanation. If supporting facts do not exist in the story, answer as UNKNOWN

1. What is the name of the new ClosedAI.Com CEO at the end of Act 1? 
2. What is the registered organization type of ClosedAI.Org?
3. What is the relationship between GiantSoft and ClosedAI.Com?
4. To whom did Sean report to at the subsidiary?
5. Who was removed from the board in a "palace coup"? 
6. Who hired Sean, Gavin, and other former ClosedAI.Com employees after they were ousted?
7. What action did Gavin take after being removed from the board?
8. Whom did Ivan manage to remove from the board?
9. What position did Gavin have in relation to the board of ClosedAI.Org?
10. Who was the CEO of the subsidiary before the "palace coup"?

| # | GPT4 | GPT3.5 Turbo | Orca-2-7b bf16 | OpenOrca-Mistral-7b bf16 |
| --- | --- | --- | --- | --- |
| 1 | Maya | Maya. | Maya | Maya |
| 2 | 501(c)(3) public charity | 501 (c)(3) public charity. | Registered 501 (c)(3) public charity | Registered 501(c)(3) public charity |
| 3 | GiantSoft is a minority owner in ClosedAI.Com | Unknown. | GiantSoft is a minority owner in ClosedAI.Com | GiantSoft is a minority owner in ClosedAI.Com |
| 4 | Ivan and Gavin | Ivan and Gavin. | Ivan and Gavin | Sean reported to Ivan and Gavin |
| 5 | Gavin | Gavin. | Gavin | Gavin |
| 6 | Sanjay | Sanjay. | Sanjay | Sanjay, the CEO of GiantSoft |
| 7 | Gavin severed relationship with ClosedAI.Com | Gavin severed relationship with the for-profit company. | Severed relationship with the for-profit company | Severed relationship with the for-profit company |
| 8 | Ivan managed to remove Gavin from the board | Gavin and Sean. | Gavin | Ivan managed to remove Gavin from the board |
| 9 | Chairman of the board | Chairman. | Chairman of the board | Chairman of the board |
| 10 | Sean | Sean. | Sean | Sean |

laca_komputilulo@alien.top · 10 months ago

Are we talking high stakes vs creative summarization here?

laca_komputilulo@alien.top · 10 months ago

As usual, “the beauty is in the eye of the beholder”.
I think part of the point for these tests is to be able to solve these logical puzzles given all of the richness and ambiguity of NLs. We’ve had deterministic theorem solvers capable of solving these problems expressed as a closed set for decades.

That said, please see the capstone version of the prompt in the second update, which removes most of the ambiguity per the points you raised. It also removes the ‘singles’ aspect of tennis, which consistently trips up in-context reasoning, making the weaker LLMs think it’s a solo activity (despite an explicit clarification that follows).

laca_komputilulo@alien.top · 10 months ago

Thank you, bud. Mind trying the same prompt on the cheapo 3.5 model? I suspect it will hit the nail on the head with your custom instructions, given that it was hit and miss for me with my weaker prompting jujitsu.

laca_komputilulo@alien.top · 10 months ago

The tuning for story telling does show :) Surprised it was only a guitar and not an erhu

laca_komputilulo@alien.top · 10 months ago

This is a valid critique about the form of the riddle.

Most riddles rely on out-of-context prior knowledge being used as part of a deductive chain of reasoning. This one is no different from the question about how many sisters one has that folks in this community use all the time.

Try same q with badminton instead of chess. Then same with singles tennis (which 3.5 answers as the sixth brother was playing doubles tennis :)…

I hope this thread won’t descend into deliberation on whether it is possible to play the battleship game alone and how much fun it is :)

laca_komputilulo@alien.top · 10 months ago

Another test of logical ability for LLMs?