I didn’t mean to say those papers are completely useless, but even for those with a strong math/ML background, I would advise starting with recent survey papers. Reading “Attention Is All You Need” is kind of like reading Einstein’s General Relativity papers: cool as a historical curiosity, but not the most efficient way to build expertise.
https://arxiv.org/abs/2106.04554
If you’re trying to learn more about language models, don’t bother with anything written before 2020. That’s basically the Stone Age.