• 0 Posts
  • 3 Comments
Joined 11 months ago
Cake day: October 28th, 2023

  • Most riddles rely on out-of-context prior knowledge being used as part of a deductive chain of reasoning. This one is no different from the question about how many sisters one has that folks in this community use all the time.

    Sure, but they do their best to avoid gaps that make the riddle unsolvable. A riddle like “a girl has as many brothers as sisters, but each brother has half as many brothers as sisters, how many sisters does she have?” has exactly one correct answer.

    But the gap in this one is just big enough that it’s a problem. Like you said, replacing chess with a mandatory two-person activity is much better! (Though it’s still open-ended, because nothing implies they are alone.) The other commenter changed the question to “where are they”, which is also a good improvement!

    I hope this thread won’t descend into deliberation on whether it is possible to play Battleship alone and how much fun it is :)

    Anything to stop the losing streak!
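    The claim that the brothers/sisters riddle above has exactly one correct answer can be checked algebraically: with B boys and G girls, the girl sees B brothers and G − 1 sisters, so B = G − 1; a brother sees B − 1 brothers and G sisters, so B − 1 = G/2. Substituting gives G − 2 = G/2, so G = 4 and B = 3, meaning she has 3 sisters. The same check as a brute-force search is sketched below; the function name `solve` and the search bound `max_size` are illustrative choices, and it assumes the family has at least one boy and one girl.

    ```python
    # Brute-force check of the sibling riddle: a girl has as many brothers as
    # sisters, and each brother has half as many brothers as sisters.
    # boys = number of boys, girls = number of girls (including her).
    # solve() and max_size are hypothetical names chosen for illustration.

    def solve(max_size=50):
        """Return every (boys, girls) pair that satisfies both clues."""
        solutions = []
        for boys in range(1, max_size + 1):
            for girls in range(1, max_size + 1):
                # From the girl's view: brothers = boys, sisters = girls - 1.
                girl_ok = boys == girls - 1
                # From a brother's view: brothers = boys - 1, sisters = girls,
                # and his brothers must number half his sisters.
                brother_ok = 2 * (boys - 1) == girls
                if girl_ok and brother_ok:
                    solutions.append((boys, girls))
        return solutions


    solutions = solve()
    print(solutions)  # [(3, 4)]
    boys, girls = solutions[0]
    print(f"She has {girls - 1} sisters")  # She has 3 sisters
    ```

    Only one pair survives the search, which is what makes the riddle well-posed: there is no gap for an alternative reading to slip through.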


  • Another (less?) open-ended question with the same premise would be “Where are they?” and I expect the answer to be “In a garden”.

    Perhaps there’s a language barrier here, but none of those activities hint at a garden? In my locale, a garden is a small patch used to grow veggies, herbs, and/or flowers. So I would answer this with “their back yard.”

    This is a much better riddle for children IMO, because it’s barely open-ended at all. The original has almost infinite answers without any leaps or tricks, but yours has a very limited domain: a yard/garden. Though if someone were extra clever, the problem space opens back up to nearly infinite answers (e.g. if brother 4 is playing a video game).

    Open-ended questions are the best for evaluating LLMs, because they require common sense/world knowledge/doxa/human-like behavior.

    For personal testing, that’s certainly a valid opinion! But it’s not very productive from an objective standpoint, because the answers can’t be graded and it tests a “gotcha” line of thinking, while we’re still focusing on fundamentals like uniform context attention, consistency over time, etc.