Found this in a children’s book of riddles:

Six brothers were spending their time together.

The first brother was reading a book.
The second brother was playing chess.
The third brother was solving a crossword.
The fourth brother was watering the lawn.
The fifth brother was drawing a picture.

Question: what was the sixth brother doing?

I can't get ChatGPT to answer correctly with the usual tricks, even after hinting that it should consider one- and two-person activities and emphasizing the word “together”.

After a bunch of CoT turns we arrive at the conclusion that this is an open-ended question and not a riddle :)

After trying 3 times with fresh prompts, I got a correct response once, but when prompted to provide supporting reasoning the model backtracked and started apologizing.

Can't test GPT-4 right now…
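
For comparison, here is a minimal sketch of the reasoning the riddle seems to want, written as a closed-set check. It assumes chess is the only listed activity that needs two people. The participant counts and the name `infer_missing` are illustrative assumptions, not something taken from the book.

```python
from collections import Counter

# Assumed number of people each activity needs (not stated in the riddle).
PLAYERS_NEEDED = {
    "reading a book": 1,
    "playing chess": 2,
    "solving a crossword": 1,
    "watering the lawn": 1,
    "drawing a picture": 1,
}

# What the riddle tells us about brothers one through five.
OBSERVED = {
    1: "reading a book",
    2: "playing chess",
    3: "solving a crossword",
    4: "watering the lawn",
    5: "drawing a picture",
}


def infer_missing(observed, players_needed):
    """Return the activities that still lack a partner."""
    counts = Counter(observed.values())
    open_slots = []
    for activity, count in counts.items():
        # How many more people are needed to fill out complete groups.
        missing = -count % players_needed[activity]
        open_slots.extend([activity] * missing)
    return open_slots


print(infer_missing(OBSERVED, PLAYERS_NEEDED))  # ['playing chess']
```

With the counts spelled out like this the answer is mechanical: only chess has an open seat, so the sixth brother is the second brother's opponent. The hard part for the model is recovering those counts from plain language.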

  • laca_komputilulo@alien.topOPB
    10 months ago

    As usual, “the beauty is in the eye of the beholder”.
    I think part of the point of these tests is to solve these logical puzzles given all of the richness and ambiguity of natural language. We’ve had deterministic theorem solvers capable of solving these problems, when expressed as a closed set, for decades.

    That said, please see the capstone version of the prompt in the second update, which removes most of the ambiguity per the points you raised. It also removes the ‘singles’ aspect of tennis, which consistently trips up in-context reasoning and makes the weaker LLMs think it’s a solo activity (despite an explicit clarification right after it).