• El_Minadero@alien.topB · 10 months ago

    I mean, everyone is just sorta ignoring the fact that no ML technique has been shown to do anything more than just mimic statistical aspects of the training set. Is statistical mimicry AGI? On some performance benchmarks, it appears better statistical mimicry does approach capabilities we associate with AGI.

    I personally am quite skeptical that the best lever to pull is just giving it more parameters. Our own brains have such complicated neural/psychological circuitry for executive function, long- and short-term memory, Type 1 and Type 2 thinking, “internal” dialog and visual models, and, more importantly, the ability to few-shot learn the logical underpinnings of an example set. Without a fundamental change in how we train NNs, or even in our conception of what an effective NN is to begin with, we’re not going to see the paradigm shift everyone’s been waiting for.

    • napolitain_@alien.topB · 10 months ago

      How do you think your brain works? Do you think there’s magic involved, or is it mostly automatism acquired through human learning, so that you are now simply doing inference on your training data (your childhood)?

    • gebregl@alien.topB · 10 months ago

      We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters “just statistics”, as if all they’re doing is linear regression.

      ChatGPT isn’t AGI yet, but it is a huge leap in modeling natural language. The fact that there’s some statistics involved explains neither of those two points.

      • samrus@alien.topB · 10 months ago

        Embeddings are statistics. They evolved from linear statistical models, but they are now non-linear statistical models. Bengio et al. (2003) explains this.
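
        For anyone who hasn’t read it, the Bengio et al. (2003) model is roughly: look up learned embeddings for the context words, push them through a nonlinear hidden layer, and softmax over the vocabulary. Here’s a minimal PyTorch sketch of that shape (the sizes and names are made up for illustration, not the paper’s actual configuration):

        ```python
        import torch
        import torch.nn as nn

        class TinyNPLM(nn.Module):
            """Toy Bengio-2003-style neural probabilistic language model."""
            def __init__(self, vocab_size=1000, context=3, emb_dim=32, hidden=64):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb_dim)        # learned word features
                self.hid = nn.Linear(context * emb_dim, hidden)     # nonlinear mixing of the context
                self.out = nn.Linear(hidden, vocab_size)            # scores over the vocabulary

            def forward(self, context_ids):                 # context_ids: (batch, context) int64
                e = self.emb(context_ids).flatten(1)        # concatenate the context embeddings
                h = torch.tanh(self.hid(e))                 # the nonlinearity: not just linear regression
                return self.out(h)                          # logits; softmax gives P(next word | context)

        model = TinyNPLM()
        logits = model(torch.randint(0, 1000, (4, 3)))      # a fake batch of 4 three-word contexts
        probs = logits.softmax(dim=-1)                      # still a statistical model, just a nonlinear one
        ```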

      • psyyduck@alien.topB · 10 months ago

        Let’s ask GPT4!

        You’re probably talking about the “fallacy of composition”. This logical fallacy occurs when it’s assumed that what is true for individual parts will also be true for the whole group or system. It’s a mistaken belief that specific attributes of individual components must necessarily be reflected in the larger structure or collection they are part of.

        Here are some clearly flawed examples illustrating the fallacy of composition.

        • Building Strength: Believing that if a single brick can hold a certain amount of weight, a wall made of these bricks can hold the same amount of weight per brick. This ignores the structural integrity and distribution of weight in a wall.
        • Athletic Team: Assuming that a sports team will be unbeatable because it has a few star athletes. This ignores the importance of teamwork, strategy, and the fact that the performance of a team is not just the sum of its individual players’ skills.

        These examples highlight the danger of oversimplifying complex systems or groups by extrapolating from individual components. They show that the interactions and dynamics within a system play a crucial role in determining the overall outcome, and these interactions can’t be understood by just looking at individual parts in isolation.

        • kelkulus@alien.topB · 10 months ago

          I dunno. The “fallacy of composition” is just made up of 3 words, and there’s not a lot that you can explain with only three words.

        • MohKohn@alien.topB · 10 months ago

          How… did it map oversimplification to… holistic thinking??? Saying that it’s “just statistics” is wrong because “just statistics” covers some very complicated models in principle. They weren’t saying that simple subsystems are incapable of generating complex behavior.

          God, why do people think these things are intelligent? I guess people fall for cons all the time…

        • gebregl@alien.topB · 10 months ago

          The vacuous truth is that AI is statistical: it certainly is, but it’s also much more.

          The fallacy is taking that fact and claiming that, because an AI algorithm is “just statistics”, it cannot exhibit “true” intelligence and is merely faking or mimicking intelligence.

      • Toasty_toaster@alien.topB · 10 months ago

        ChatGPT predicts the most probable next token, or the next token that maximizes the probability of a thumbs-up, depending on whether you’re talking about the self-supervised pretraining stage or the reinforcement learning stage of training. That is the conceptual underpinning of how the parameter updates are calculated. It only achieves the ability to communicate because it was trained on text that successfully communicates.
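
        To make the first of those two stages concrete, here’s a minimal sketch of how “predict the next token” becomes a parameter update (this is just the standard next-token cross-entropy setup with a stand-in model, not ChatGPT’s actual training code):

        ```python
        import torch
        import torch.nn as nn

        vocab, dim = 100, 16
        model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))  # stand-in for a transformer
        opt = torch.optim.SGD(model.parameters(), lr=0.1)

        tokens = torch.randint(0, vocab, (1, 9))            # a toy "document" of token ids
        inputs, targets = tokens[:, :-1], tokens[:, 1:]     # each position must predict the following token

        logits = model(inputs)                              # shape (1, 8, vocab)
        loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
        loss.backward()                                     # gradient of "how badly did we predict the next token"
        opt.step()                                          # the parameter update described above
        ```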

      • venustrapsflies@alien.topB · 10 months ago

        It’s not a fallacy at all. It is just statistics, combined with some very useful inductive biases. The fallacy is trying to smuggle some extra magic into the description of what it is.

        Actual AGI would be able to explain something that no human has understood before. We aren’t really close to that at all. Falling back on “___ may not be AGI yet, but…” is a lot like saying “rocket ships may not be FTL yet, but…”

        • InterstitialLove@alien.topB · 10 months ago

          The fallacy is the part where you imply that humans have magic.

          “An LLM is just doing statistics, therefore an LLM can’t match human intellect unless you add pixie dust somewhere.” Clearly the implication is that human intellect involves pixie dust somehow?

          Or maybe, idk, humans are just the result of random evolutionary processes jamming together neurons into a configuration that happens to behave in a way that lets us build steam engines, and there’s no fundamental reason that jamming together perceptrons can’t accomplish the same thing?

          • Basic-Low-323@alien.topB · 10 months ago

            I mean, if your hypothesis is that the human brain is the product of a billion years of evolution ‘searching’ for a configuration of neurons and synapses that is very efficient at sampling the environment, detecting changes, acting to increase the likelihood of survival, and communicating with other such configurations in order to devise and execute more complicated plans, then that…doesn’t bode very well for current AI architectures, does it? Their training regimes are incredibly weak by comparison: simply learning to predict and interpolate a sparse dataset that some human brains produced.

            If by ‘there’s no fundamental reason we can’t jam together perceptrons this way’ you mean that we can always throw a bunch of them into an ever-changing virtual world, let them mutate and multiply, and after some long time fish out the survivors and have them work for us, sure, but we’re talking about A LOT of compute here. Our hope is that we can find some sort of shortcut, because if we truly have to do it the way evolution did, it probably won’t happen this side of the millennium.

            • InterstitialLove@alien.topB · 10 months ago

              We don’t currently know exactly why gradient descent works to find powerful, generalizing minima

              But, like, it does

              The minima we can reliably find, in practice, don’t just interpolate the training data. I mean, they do that, but they find compressions which seem to actually represent knowledge, in the sense that they can identify true relationships between concepts which reliably hold outside the training distribution.

              I want to stress, “predict the next token” is what the models are trained to do, it is not what they learn to do. They learn deep representations and learn to deploy those representations in arbitrary contexts. They learn to predict tokens the same way a high-school student learns to fill in scantrons: the scantron is designed so that filling it out requires other more useful skills.

              It’s unclear if gradient descent will continue to work so unreasonably well as we try to push it farther and farther, but so long as the current paradigm holds I don’t see a huge difference between human inference ability and Transformer inference ability. Number of neurons* and amount of training data seem to be the things holding LLMs back. Humans beat LLMs on both counts, but in some ways LLMs seem to outperform biology in terms of what they can learn with a given quantity of neurons/data. As for the “billions of years” issue, that’s why we are using human-generated data, so they can catch up instead of starting from scratch.

              • By “number of neurons” I really mean something like “expressive power in some universally quantified sense.” Obviously you can’t directly compare perceptrons to biological neurons
              • Basic-Low-323@alien.topB · 10 months ago

                I have to say, this is completely the *opposite* of what I have gotten by playing around with these models (GPT-4). At no point did I get the impression that I was dealing with something that, had you taught it everything humanity knew about, say, electricity and magnetism in the early 1800s, would have learned ‘deep representations’ of those concepts to a degree that would allow it to synthesize something truly novel, like the prediction of electromagnetic waves.

                I mean, the model has already digested most of what’s written out there. What’s the probability that something with the ability to ‘learn deep representations and learn to deploy those representations in arbitrary contexts’ would have made zero contributions, drawn zero new connections that had escaped humans, in something more serious than ‘write an Avengers movie in the style of Shakespeare’? I’m not talking about something as big as electromagnetism, but…something? Anything? It has ‘grokked’, as you say, pretty much the entirety of Stack Overflow, and yet I know of zero new programming techniques or design patterns or concepts it has come up with.

          • venustrapsflies@alien.topB · 10 months ago

            Real brains aren’t perceptrons. They don’t learn by back-propagation or by evaluating performance on a training set. They’re not mathematical models, or even mathematical functions in any reasonable sense. This is a “god of the gaps” scenario, wherein there are a lot of things we don’t understand about how real brains work, and people jump to fill in the gap with something they do understand (e.g. ML models).

            • InterstitialLove@alien.topB · 10 months ago

              Brains are absolutely mathematical functions in a very reasonable sense, and anyone who says otherwise is a crazy person

              You think brains aren’t Turing machines? Like, you really think that? Every physical process ever studied, all of them, are Turing machines. Every one. Saying that brains aren’t Turing machines is no different from saying that humans have souls. You’re positing the existence of extra-special magic outside the realm of science just to justify your belief that humans are too special for science to ever comprehend.

              (By “is a Turing machine” I mean that its behavior can be predicted to arbitrary accuracy by a Turing machine, and so observing its behavior is mathematically equivalent to running a Turing machine.)

          • red75prime@alien.topB · 10 months ago

            LLMs might still lack something that the human brain has. An internal monologue, for example, which allows us to allocate more than a fixed amount of compute per output token.

            • InterstitialLove@alien.topB · 10 months ago

              You can just give an LLM an internal monologue. It’s called a scratchpad.

              I’m not sure how this applies to the broader discussion, like honestly I can’t tell if we’re off-topic. But once you have LLMs you can implement basically everything humans can do. The only limitations I’m aware of that aren’t trivial from an engineering perspective are

              1. current LLMs mostly aren’t as smart as humans, like literally they have fewer neurons and can’t model systems as complexly
              2. humans have more complex memory, with a mix of short-term and long-term and a fluid process of moving between them
              3. humans can learn on-the-go, this is equivalent to “online training” and is probably related to long-term memory
              4. humans are multimodal, it’s unclear to what extent this is a “limitation” vs just a pedantic nit-pick, I’ll let you decide how to account for it
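
              To make the scratchpad point concrete, here’s a minimal sketch of the pattern (llm_generate is a placeholder stub so the example runs; in practice it would be whatever model call you have):

              ```python
              def llm_generate(prompt: str) -> str:
                  """Placeholder for a real LLM call; returns canned text so the sketch runs."""
                  return "Step 1: 6 * 7 = 42.\nANSWER: 42"

              def answer_with_scratchpad(question: str) -> str:
                  # Phase 1: let the model "think out loud" into a scratchpad (its internal monologue).
                  scratchpad = llm_generate(f"Question: {question}\nThink step by step before answering:")
                  # Phase 2: condition the final answer on the question plus the scratchpad.
                  # The model can spend arbitrarily many intermediate tokens before committing to an answer.
                  return llm_generate(f"Question: {question}\nScratchpad: {scratchpad}\nFinal answer:")

              print(answer_with_scratchpad("What is 6 times 7?"))
              ```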
              • red75prime@alien.topB · 10 months ago

                It’s called a scratchpad.

                And the network still uses skills that it learned in a fixed-computation-per-token regime.

                Sure, future versions will lift many existing limitations, but I was talking about current LLMs.

        • red75prime@alien.topB · 10 months ago

          a lot like saying “rocket ships may not be FTL yet, but…”

          And the human brain is FTL then?

        • TheBlindIdiotGod@alien.topB · 10 months ago

          Why would a human level AGI need to be able to explain something that no human has understood before? That sounds more like ASI than AGI.

      • Appropriate_Ant_4629@alien.topB · 10 months ago

        We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters “just statistics”

        Well, thanks to quantum mechanics, pretty much all of existence is probably “just statistics”.

        as if all they’re doing is linear regression.

        Well, practically all interesting statistics are NONlinear regressions. Including ML. And your brain. And physics.

        • KoalaNumber3@alien.topB · 10 months ago

          What a lot of people don’t understand is that linear regression can still handle non-linear relationships.

          For a statistician, linear regression just means the coefficients are linear, it doesn’t mean the relationship itself is a straight line.

          That’s why linear models are still incredibly powerful and are used so widely across so many fields.
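
          For example, here’s a minimal numpy sketch: the model is “linear” because it’s linear in the coefficients, yet it fits a curved relationship exactly, just by including x² as a feature (the numbers are made up for illustration):

          ```python
          import numpy as np

          x = np.linspace(-3, 3, 50)
          y = 2.0 - 1.5 * x + 0.7 * x**2                     # a clearly non-linear (parabolic) relationship

          X = np.column_stack([np.ones_like(x), x, x**2])    # design matrix: intercept, x, x^2
          coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares: a linear model

          print(coef)                                        # recovers approximately [2.0, -1.5, 0.7]
          ```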

          • Appropriate_Ant_4629@alien.topB · 10 months ago

            Yet still limited compared to even not-very-deep NNs. If the user wants to fit a parabola with a linear regression, he pretty much has to manually add a quadratic term himself.

            I think they’re widely used primarily because they’re widely taught in school.

      • Dongslinger420@alien.topB · 10 months ago

        We need a name for the illness where people throw some shoddy homebrew “L”LM at their misshapen prompts, if you can call them that, and then conclude it’s just a bad imitation of speech because they keep asking their models to produce page after page of Sonic fanfic. Or assume everything is equally hallucination-prone.

        Really, the moronic takes about all this are out of this world, never mind how people suddenly have a very clear idea of what intelligence, among the most ambiguous and ill-defined notions we debate, entails. Except they’re struggling to put this knowledge into words; it’s more about the feel of it all, y’know.

    • rp20@alien.topB · 10 months ago

      But given the diversity of the training data at the scale of hundreds of trillions of tokens, you can expect the model to cover almost all of the tasks we care to do.

    • visarga@alien.topB · 10 months ago

      no ML technique has been shown to do anything more than just mimic statistical aspects of the training set

      That’s OK when the agent creates its own training set, as AlphaZero does. It learns from feedback rather than from next-token prediction.
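
      AlphaZero itself is self-play plus tree search plus a learned policy/value network, which is a lot of machinery, but the core point that the agent generates its own training data from feedback fits in a toy example (an epsilon-greedy bandit; all numbers are made up):

      ```python
      import random

      true_payoffs = [0.2, 0.5, 0.8]      # unknown to the agent
      estimates = [0.0, 0.0, 0.0]
      counts = [0, 0, 0]

      for step in range(2000):
          # epsilon-greedy: mostly exploit the current estimates, occasionally explore
          if random.random() < 0.1:
              arm = random.randrange(3)
          else:
              arm = max(range(3), key=lambda a: estimates[a])
          reward = 1.0 if random.random() < true_payoffs[arm] else 0.0   # feedback from the environment
          counts[arm] += 1
          estimates[arm] += (reward - estimates[arm]) / counts[arm]      # incremental average of observed rewards

      print(estimates)   # the "training data" came from the agent's own actions, not from a fixed corpus
      ```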

    • nemoknows@alien.topB · 10 months ago

      See, the trouble with the Turing test is that the linguistic capabilities of the most sophisticated models well exceed those of the dumbest humans.

        • Gurrako@alien.topB · 10 months ago

          I don’t think so. I doubt GPT-4 would be able to convince someone who is actively trying to determine whether or not the thing they are talking to is a human.

          • RdtUnahim@alien.topB · 10 months ago

            There’s literally been a website you could go on that opens a chat with either a human or GPT, but you do not know which one, and then you get like 30s to figure it out by chatting with them. Then you need to guess if it was a human or an AI you just talked to. And people get it wrong all the time.

            Edit: link to the research that came from that https://www.ai21.com/blog/human-or-not-results

      • COAGULOPATH@alien.topB · 10 months ago

        I think you have to use a reasonably smart human as a baseline, otherwise literally any computer is AGI. Babbage’s Analytical Engine from the 1830s was more intelligent than a human in a coma.

        • AntDracula@alien.topB · 10 months ago

          Ironically for robots and the like to truly be accepted, they will have to be coded to make mistakes to seem more human.

      • rreighe2@alien.topB · 10 months ago

        I kinda agree. The Turing test should take accuracy and wisdom into account. GPT-4 is, much like GPT-3.5 was, very confidently wrong sometimes. The code or advice it gives you can be technically correct, but very, very stupid to do in practice.

        • nemoknows@alien.topB · 10 months ago

          “Very confidently wrong sometimes” is how I would describe most of humanity.

    • Log_Dogg@alien.topB · 10 months ago

      everyone is just sorta ignoring the fact that no ML technique has been shown to do anything more than just mimic statistical aspects of the training set

      What? I hope you’re talking about LLMs exclusively because otherwise this is just blatantly false. AlphaGo Zero is just one of many such examples.

    • dragosconst@alien.topB · 10 months ago

      no ML technique has been shown to do anything more than just mimic statistical aspects of the training set

      What? Are you familiar with the field of statistical learning? Formal frameworks for proving generalization have existed for some decades at this point. So when you look at anything pre-deep-learning, you can definitely show that many mainstream ML models do more than just “mimic statistical aspects of the training set”. Or, if you want to go on some weird philosophical tangent, you can equivalently say that “mimicking statistical aspects of the training set” is enough to learn distributions, provided you use the right amount of data and the right model.

      And even for DL, which at the moment lacks a satisfying theoretical framework for generalization, it’s obvious that empirically models can generalize.
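
      For a concrete example of the kind of guarantee those frameworks give, here is the standard uniform-convergence bound for a finite hypothesis class under 0-1 loss (see e.g. Shalev-Shwartz & Ben-David, Understanding Machine Learning): with probability at least 1 - δ over an i.i.d. sample S of size m, simultaneously for every h in H,

      ```latex
      \left| L_{\mathcal{D}}(h) - L_{S}(h) \right|
        \le \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}}
      ```

      So with enough samples relative to log|H|, low training error really does transfer to the underlying distribution, which is already more than just parroting the training set.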

      • On_Mt_Vesuvius@alien.topB · 10 months ago

        From statistical learning theory, there is always some adversarial distribution where the model will fail to generalize… (no free lunch). And isn’t generalization about extrapolation beyond the training distribution? So learning the training distribution itself is not generalization.

        • dragosconst@alien.topB · 10 months ago

          The No free lunch theorem in Machine Learning refers to the case in which the hypothesis class contains all possible classifiers in your domain (and your training set is either too small, or the domain set is infinite), and learning becomes impossible to guarantee, i.e. you have no useful bounds on generalization. When you restrict your class to something like linear classifiers, for example, you can reason about things like generalization and so on. For finite domain sets, you can even reason about the “every hypothesis” classifier, but that’s not very useful in practice.

          I’m not sure about your point about the training distribution. In general, you are interested in generalization on your training distribution, as that’s where your train/test/validation data is sampled from. Note that overfitting your training set is not the same thing as learning your training distribution. You can think about stuff like domain adaptation, where you reason about your performance on “similar” distributions and how you might improve on that, but that’s already something very different.

    • currentscurrents@alien.topB · 10 months ago

      no ML technique has been shown to do anything more than just mimic statistical aspects of the training set.

      Reinforcement learning does far more than mimic.

    • No_Advantage_5626@alien.topB · 10 months ago

      Actually, the claim that “all ML models are doing is statistics” dominated the field of AI for a long time, but it has proven to be a fallacy.

      See this video for instance, where Ilya Sutskever (probably the #1 AI researcher in the world currently) explains how GPT is much more than statistics; it is more akin to “compression”, and that can lead to intelligence: https://www.youtube.com/watch?v=GI4Tpi48DlA (4:30 - 7:30)