• gebregl@alien.top · 1 year ago

    We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters “just statistics”, as if all they’re doing is linear regression.

    ChatGPT isn’t AGI yet, but it is a huge leap in modeling natural language. The fact that there’s some statistics involved explains neither of those two points.

    • psyyduck@alien.top · 1 year ago

      Let’s ask GPT4!

      You’re probably talking about the “fallacy of composition”. This logical fallacy occurs when it’s assumed that what is true for individual parts will also be true for the whole group or system. It’s a mistaken belief that specific attributes of individual components must necessarily be reflected in the larger structure or collection they are part of.

      Here are some clearly flawed examples illustrating the fallacy of composition.

      • Building Strength: Believing that if a single brick can hold a certain amount of weight, a wall made of these bricks can hold the same amount of weight per brick. This ignores the structural integrity and distribution of weight in a wall.
      • Athletic Team: Assuming that a sports team will be unbeatable because it has a few star athletes. This ignores the importance of teamwork, strategy, and the fact that the performance of a team is not just the sum of its individual players’ skills.

      These examples highlight the danger of oversimplifying complex systems or groups by extrapolating from individual components. They show that the interactions and dynamics within a system play a crucial role in determining the overall outcome, and these interactions can’t be understood by just looking at individual parts in isolation.

      • MohKohn@alien.top · 1 year ago

        How… did it map oversimplification to… holistic thinking??? Saying that it’s “just statistics” is wrong because “just statistics” covers some very complicated models in principle. They weren’t saying that simple subsystems are incapable of generating complex behavior.

        God, why do people think these things are intelligent? I guess people fall for cons all the time…

      • kelkulus@alien.top · 1 year ago

        I dunno. The “fallacy of composition” is just made up of three words, and there’s not a lot that you can explain with only three words.

    • Appropriate_Ant_4629@alien.top · 1 year ago

      We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters “just statistics”

      Well, thanks to quantum mechanics, pretty much all of existence is probably “just statistics”.

      as if all they’re doing is linear regression.

      Well, practically all interesting statistics are NONlinear regressions. Including ML. And your brain. And physics.

      • KoalaNumber3@alien.top · 1 year ago

        What a lot of people don’t understand is that linear regression can still handle non-linear relationships.

        For a statistician, linear regression just means the coefficients are linear, it doesn’t mean the relationship itself is a straight line.

        That’s why linear models are still incredibly powerful and are used so widely across so many fields.
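
        For example, here is a minimal sketch (plain numpy, made-up data) of fitting a parabola with ordinary least squares: the fitted curve is non-linear in x, but the model is still linear in its coefficients.

        ```python
        import numpy as np

        # Made-up data from a quadratic relationship plus noise (illustrative only)
        rng = np.random.default_rng(0)
        x = np.linspace(-3, 3, 200)
        y = 2.0 + 3.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

        # Design matrix [1, x, x^2]: the quadratic term is added by hand,
        # but the model stays linear in the coefficients
        X = np.column_stack([np.ones_like(x), x, x**2])

        # Ordinary least squares fit
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        print(coeffs)  # roughly [2.0, 3.0, -0.5]
        ```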

        • Appropriate_Ant_4629@alien.top · 1 year ago

          Yet still limited compared to even not-very-deep NNs. If you want to fit a parabola with linear regression, you pretty much have to add the quadratic term yourself.

          I think they’re widely used primarily because they’re widely taught in school.

    • venustrapsflies@alien.top · 1 year ago

      It’s not a fallacy at all. It is just statistics, combined with some very useful inductive biases. The fallacy is trying to smuggle some extra magic into the description of what it is.

      Actual AGI would be able to explain something that no human has understood before. We aren’t really close to that at all. Falling back on “___ may not be AGI yet, but…” is a lot like saying “rocket ships may not be FTL yet, but…”

      • TheBlindIdiotGod@alien.top · 1 year ago

        Why would a human level AGI need to be able to explain something that no human has understood before? That sounds more like ASI than AGI.

      • red75prime@alien.top · 1 year ago

        a lot like saying “rocket ships may not be FTL yet, but…”

        And the human brain is FTL then?

      • InterstitialLove@alien.top · 1 year ago

        The fallacy is the part where you imply that humans have magic.

        “An LLM is just doing statistics, therefore an LLM can’t match human intellect unless you add pixie dust somewhere.” Clearly the implication is that human intellect involves pixie dust somehow?

        Or maybe, idk, humans are just the result of random evolutionary processes jamming together neurons into a configuration that happens to behave in a way that lets us build steam engines, and there’s no fundamental reason that jamming together perceptrons can’t accomplish the same thing?

        • Basic-Low-323@alien.top · 1 year ago

          I mean, if your hypothesis is that the human brain is the product of one billion years of evolution ‘searching’ for a configuration of neurons and synapses that is very efficient at sampling the environment, detecting changes, acting accordingly to increase the likelihood of survival, and communicating with other such configurations in order to devise and execute more complicated plans, then that…doesn’t bode very well for current AI architectures, does it? Their training runs are incredibly weak by comparison: they simply learn to predict and interpolate a sparse dataset that some human brains produced.

          If by ‘there’s no fundamental reason we can’t jam together perceptrons this way’ you mean that we can always throw a bunch of them into an ever-changing virtual world, let them mutate and multiply, and after some long time fish out the survivors and have them work for us, then sure, but we’re talking about A LOT of compute here. Our hope is that we can find some sort of shortcut, because if we truly have to do it the way evolution did, it probably won’t happen this side of the millennium.

          • InterstitialLove@alien.top · 1 year ago

            We don’t currently know exactly why gradient descent works to find powerful, generalizing minima

            But, like, it does

            The minima we can reliably find, in practice, don’t just interpolate the training data. I mean, they do that, but they find compressions which seem to actually represent knowledge, in the sense that they can identify true relationships between concepts which reliably hold outside the training distribution.

            I want to stress: “predict the next token” is what the models are trained to do; it is not what they learn to do. They learn deep representations and learn to deploy those representations in arbitrary contexts. They learn to predict tokens the same way a high-school student learns to fill in scantrons: the scantron is designed so that filling it out requires other, more useful skills.

            It’s unclear if gradient descent will continue to work so unreasonably well as we try to push it farther and farther, but so long as the current paradigm holds I don’t see a huge difference between human inference ability and Transformer inference ability. Number of neurons* and amount of training data seem to be the things holding LLMs back. Humans beat LLMs on both counts, but in some ways LLMs seem to outperform biology in terms of what they can learn with a given quantity of neurons/data. As for the “billions of years” issue, that’s why we are using human-generated data, so they can catch up instead of starting from scratch.

            * By “number of neurons” I really mean something like “expressive power in some universally quantified sense.” Obviously you can’t directly compare perceptrons to biological neurons.
            • Basic-Low-323@alien.top · 1 year ago

              I have to say, this is completely the *opposite* of what I’ve gotten by playing around with these models (GPT-4). At no point did I get the impression that I was dealing with something that, had you taught it everything humanity knew in the early 1800s about, say, electricity and magnetism, would have learned ‘deep representations’ of those concepts to a degree that would allow it to synthesize something truly novel, like the prediction of electromagnetic waves.

              I mean, the model has already digested most of what’s written out there. What’s the probability that something with the ability to ‘learn deep representations and learn to deploy those representations in arbitrary contexts’ would have made zero contributions, drawn zero new connections that had escaped humans, in something more serious than ‘write an Avengers movie in the style of Shakespeare’? I’m not talking about something as big as electromagnetism, but…something? Anything? It has ‘grokked’, as you say, pretty much the entirety of Stack Overflow, and yet I know of zero new programming techniques or design patterns or concepts it has come up with.

        • venustrapsflies@alien.top · 1 year ago

          Real brains aren’t perceptrons. They don’t learn by back-propagation or by evaluating performance on a training set. They’re not mathematical models, or even mathematical functions in any reasonable sense. This is a “god of the gaps” scenario, wherein there are a lot of things we don’t understand about how real brains work, and people jump to fill in the gap with something they do understand (e.g. ML models).

          • InterstitialLove@alien.top · 1 year ago

            Brains are absolutely mathematical functions in a very reasonable sense, and anyone who says otherwise is a crazy person.

            You think brains aren’t Turing machines? Like, you really think that? Every physical process ever studied is a Turing machine. All of them. Every one. Saying that brains aren’t Turing machines is no different from saying that humans have souls. You’re positing the existence of extra-special magic outside the realm of science just to justify your belief that humans are too special for science to ever comprehend.

            (By “is a Turing machine” I mean that its behavior can be predicted to arbitrary accuracy by a Turing machine, and so observing its behavior is mathematically equivalent to running a Turing machine.)

        • red75prime@alien.top · 1 year ago

          LLMs might still lack something that the human brain has. An internal monologue, for example, which allows us to allocate more than a fixed amount of compute per output token.

          • InterstitialLove@alien.top · 1 year ago

            You can just give an LLM an internal monologue. It’s called a scratchpad.

            I’m not sure how this applies to the broader discussion, like honestly I can’t tell if we’re off-topic. But once you have LLMs you can implement basically everything humans can do. The only limitations I’m aware of that aren’t trivial from an engineering perspective are

            1. current LLMs mostly aren’t as smart as humans, like literally they have fewer neurons and can’t model systems as complexly
            2. humans have more complex memory, with a mix of short-term and long-term and a fluid process of moving between them
            3. humans can learn on-the-go, this is equivalent to “online training” and is probably related to long-term memory
            4. humans are multimodal, it’s unclear to what extent this is a “limitation” vs just a pedantic nit-pick, I’ll let you decide how to account for it
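
            A rough sketch of the scratchpad idea above, purely illustrative: `generate(prompt)` is a hypothetical stand-in for whatever text-completion call you have, not any specific API.

            ```python
            # Hypothetical scratchpad / internal-monologue wrapper (illustrative only).
            # `generate(prompt)` stands in for any LLM text-completion call.
            def answer_with_scratchpad(question: str, generate) -> str:
                # First pass: let the model spend extra tokens "thinking out loud"
                scratchpad = generate(
                    f"Question: {question}\n"
                    "Work through the problem step by step before answering.\n"
                    "Scratchpad:"
                )
                # Second pass: condition the final answer on the intermediate reasoning
                return generate(
                    f"Question: {question}\n"
                    f"Scratchpad:\n{scratchpad}\n"
                    "Final answer:"
                )
            ```
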
            • red75prime@alien.top · 1 year ago

              It’s called a scratchpad.

              And the network still uses skills that it learned in a fixed-computation-per-token regime.

              Sure, future versions will lift many existing limitations, but I was talking about current LLMs.

    • Toasty_toaster@alien.top · 1 year ago

      ChatGPT predicts the most probable next token, or the next token that yields the highest probability of a thumbs up, depending on whether you’re talking about the self-supervised pretraining stage or the reinforcement-learning stage of training. That is the conceptual underpinning of how the parameter updates are calculated. It only achieves the ability to communicate because it was trained on text that successfully communicates.
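
      For the pretraining stage, the objective really is that bare. A toy numpy sketch with a made-up vocabulary and made-up logits, just to show what “predict the most probable next token” means as a loss:

      ```python
      import numpy as np

      # Toy illustration of the next-token objective (not a real model).
      # The loss is the negative log-probability the model assigns to the
      # token that actually came next in the training text.
      vocab = ["the", "cat", "sat", "mat"]
      logits = np.array([1.2, 0.3, 2.5, -0.7])  # made-up model outputs for one position
      next_token = "sat"                        # the token that actually follows

      probs = np.exp(logits - logits.max())
      probs /= probs.sum()                      # softmax over the vocabulary

      loss = -np.log(probs[vocab.index(next_token)])
      print(loss)  # the gradient of this quantity drives the parameter updates
      ```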

      • gebregl@alien.top · 1 year ago

        The vacuously true part is saying that AI is statistical. It certainly is, but it’s also much more.

        The fallacy part is taking that fact and claiming that, because an AI algorithm is “just statistics”, it therefore cannot exhibit “true” intelligence and is somehow only faking or mimicking intelligence.

    • Dongslinger420@alien.top · 1 year ago

      We need a name for the illness that is people throwing some shoddy homebrew “L”LM at their misshapen prompts - if you can call them that - and then concluding that it’s just a bad imitation of speech because they keep asking their models to produce page after page of Sonic fanfic. Or thinking everything is equally hallucination-prone.

      Really, the moronic takes about all this are out of this world, never mind how people all of a sudden have a very clear idea of what intelligence, among the most ambiguous and ill-defined notions we debate, entails. Except they’re struggling to put this knowledge into words; it’s more about the feel of it all, y’know.

    • samrus@alien.top · 1 year ago

      Embeddings are statistics. They evolved from linear statistical models, but they are now non-linear statistical models. Bengio et al. (2003) explains this.
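
      For reference, the Bengio et al. (2003) model is roughly: look up an embedding for each context word, concatenate, apply a non-linearity, then a softmax over the vocabulary. A toy numpy sketch with made-up sizes and random (untrained) weights:

      ```python
      import numpy as np

      # Toy sketch in the spirit of Bengio et al. (2003): embeddings feeding a
      # non-linear model of the next word. Sizes and weights are made up and
      # untrained; no training loop is shown.
      rng = np.random.default_rng(0)
      vocab_size, embed_dim, context, hidden = 1000, 32, 3, 64

      C = rng.normal(size=(vocab_size, embed_dim))        # embedding table
      H = rng.normal(size=(context * embed_dim, hidden))  # hidden-layer weights
      U = rng.normal(size=(hidden, vocab_size))           # output weights

      def next_word_probs(context_ids):
          x = C[context_ids].reshape(-1)   # look up and concatenate context embeddings
          h = np.tanh(x @ H)               # the non-linearity: where "linear" stops
          logits = h @ U
          e = np.exp(logits - logits.max())
          return e / e.sum()               # softmax over the whole vocabulary

      print(next_word_probs([12, 7, 401]).shape)  # (1000,)
      ```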