• visarga@alien.topB
    1 year ago

    for now we might be able to 10x our language data, but the top-quality content has already been used

    beyond that I think synthetic data will rule, but it needs to be validated or filtered somehow; I think we'll need agents and RL to make it high quality
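
To make "validated or filtered somehow" a bit more concrete, here is a rough sketch of the simplest version: generate candidates, score each one with a checker, keep what passes. Everything in it is an assumption for illustration. `filter_synthetic`, `arithmetic_validator`, and the toy arithmetic task are hypothetical names, and a real pipeline would likely swap the checker for something much stronger (unit tests, a reward model, an agent rollout).

```python
from typing import Callable, List


def filter_synthetic(
    samples: List[str],
    validator: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the samples whose validator score meets the threshold."""
    return [s for s in samples if validator(s) >= threshold]


def arithmetic_validator(sample: str) -> float:
    """Hypothetical checker for samples shaped like 'a + b = c'.

    Returns 1.0 when the sum is correct, 0.0 when it is wrong or the
    sample cannot be parsed.
    """
    try:
        lhs, rhs = sample.split("=")
        a, b = lhs.split("+")
        return 1.0 if int(a) + int(b) == int(rhs) else 0.0
    except ValueError:
        # Wrong number of parts, or a part that is not an integer.
        return 0.0


# Stand-in for model-generated data: some correct, some wrong, some junk.
candidates = ["2 + 2 = 4", "3 + 5 = 9", "10 + 1 = 11", "not an equation"]
kept = filter_synthetic(candidates, arithmetic_validator)
print(kept)
```

The filtering itself is trivial; the hard part is having a validator you can trust, which is presumably where agents and RL would come in.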