• vivendi@programming.dev
    3 days ago

    The model ISN’T outputting the letters individually; binary models (as I mentioned) do, not transformers.

    The model output is more like this, for “Strawberry”:

    <S-T-R><A-W-B>

    <S-T-R-A-W-B><E-R-R>

    <S-T-R-A-W-B-E-R-R-Y>

    Tokens can be a letter, part of a word, any single lexeme, any word, or even multiple words (“let be”)
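    To make that concrete, here’s a minimal sketch of greedy longest-match tokenization with a made-up vocabulary (the token strings and IDs are hypothetical, not from any real tokenizer). The point is that the model only ever receives the token IDs, so the individual letters inside a multi-character token are never represented on their own:

    ```python
    # Hypothetical vocabulary: maps token strings to assumed integer IDs.
    vocab = {"straw": 1001, "berry": 1002, "s": 17, "t": 18}

    def tokenize(text, vocab):
        """Greedy longest-match tokenization over the toy vocab."""
        tokens = []
        i = 0
        while i < len(text):
            # Try the longest possible substring first.
            for j in range(len(text), i, -1):
                piece = text[i:j]
                if piece in vocab:
                    tokens.append((piece, vocab[piece]))
                    i = j
                    break
            else:
                raise ValueError(f"no token covers {text[i]!r}")
        return tokens

    print(tokenize("strawberry", vocab))
    # The model sees the IDs [1001, 1002]; the three R's inside
    # "straw" + "berry" never exist as separate inputs.
    ```

    Real tokenizers (BPE and friends) build the vocabulary from data rather than by hand, but the consequence is the same: counting letters inside a token requires knowledge the model’s input doesn’t directly contain.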

    Okay, I did a shit job demonstrating the time axis. The model doesn’t know the underlying letters of the previous tokens, and this process only moves forward in time.