• Repple (she/her)@lemmy.world
    2 days ago

    I’m talking about models that print out the component letters first instead of just the full word, as in “S - T - R - A - W - B - E - R - R - Y”, and then still get the answer wrong. You’re absolutely right that it reads in words at a time, encoded to vectors. But if it’s holding a relationship from that encoding to the component spelling, which it seems it must be, given that it is outputting the letters individually, then something else is wrong. I’m not saying all models fail this way, and I’m sure many fail in exactly the way you describe, but I have seen this failure mode (which is what I was trying to describe), and in that case an alternate explanation would be necessary.

    • vivendi@programming.dev
      2 days ago

      The model ISN’T outputting the letters individually; binary models (as I mentioned) do, not transformers.

      The model output is more like Strawberry <S-T-R><A-W-B>

      <S-T-R-A-W-B><E-R-R>

      <S-T-R-A-W-B-E-R-R-Y>

      Tokens can be a letter, part of a word, any single lexeme, any word, or even multiple words (“let be”).

      Okay, I did a shit job demonstrating the time axis. The model doesn’t know the underlying letters of the previous tokens, and this process is going forward in time.
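
      A rough sketch of the token point, in case it helps. Everything here is made up for illustration: `VOCAB` is a hypothetical toy vocabulary and `tokenize` is a simple greedy longest-match splitter, not how any real tokenizer is actually trained or implemented. It only shows that the whole word and the spelled-out letters turn into different ID sequences, and that the ID for “r” never appears in the whole-word sequence.

```python
# Hypothetical toy vocabulary: piece -> token ID (invented for this example).
VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def tokenize(text):
    """Greedy longest-match tokenization against the toy VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


# The whole word becomes two opaque IDs.
word_ids = tokenize("strawberry")

# Spelling it out letter by letter gives one ID per letter instead.
spelled_ids = [tokenize(ch)[0] for ch in "strawberry"]

# The ID for "r" is absent from the whole-word sequence,
# but shows up three times once the word is spelled out.
r_in_word = word_ids.count(VOCAB["r"])
r_in_spelled = spelled_ids.count(VOCAB["r"])

print(word_ids, spelled_ids, r_in_word, r_in_spelled)
```

      With this toy vocabulary nothing in the whole-word IDs says which letters are inside; any link between those IDs and the spelling is something a model would have to learn separately.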