[R]eading List for Andrej Karpathy’s “Busy person’s intro to Large Language Models” Video

FallMindless3563@alien.top · 3 years ago

akardashian@alien.top · 3 years ago

thanks for compiling!

um-xpto@alien.top · 3 years ago

Nice! Thank you for your work.

Regarding the video.

Q1) Minute 14:14, Finetuning into an Assistant: when you have multiple tasks/datasets with diverse outputs, how is training performed? Are all datasets combined into a single training run? Or is finetuning done on top of a previous finetuning? Or is the question parsed and sent to a specific model?

Q2) Minute 27:43, Tool Use (Browser, Calculator, etc.): does anyone have links to similar implementations for Llama, and how it is done or what kind of tech/frameworks are used?

Disastrous_Elk_6375@alien.top · 3 years ago

Q2) Minute 27:43, Tool Use (Browser, Calculator, etc.): does anyone have links to similar implementations for Llama, and how it is done or what kind of tech/frameworks are used?

The naive way is to use LangChain, but that’s hit and miss for several reasons, and whatever you build will be held together by duct tape and prayers. Alternative frameworks include Haystack and Griptape.

I’ve found that for local models the best tool-usage you can get is by using an advanced control library. This gives you a lot of flexibility in organising the prompts and “helping” the local models a lot. Guidance and LMQL are two such libraries.
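To make the idea of tool use concrete, here is a minimal sketch in plain Python. It does not use Guidance or LMQL; it only illustrates the general pattern of prompting a model to emit a fixed call format and then dispatching on it. The `CALL[tool](argument)` syntax, the `TOOLS` registry, and the `run_tools` helper are all hypothetical names invented for this example, and the model output is a hard-coded string rather than a real generation.

```python
import ast
import operator
import re

# Hypothetical convention: the model is prompted to write CALL[tool](argument).
# Note: this simple pattern stops at the first ")" and so does not support
# nested parentheses inside the argument.
TOOL_CALL = re.compile(r"CALL\[(\w+)\]\((.*?)\)")

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr: str) -> str:
    """Evaluate a basic arithmetic expression without calling eval()."""

    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")

    return str(ev(ast.parse(expr, mode="eval").body))


# Hypothetical tool registry; a browser or search tool would be added here.
TOOLS = {"calculator": calculator}


def run_tools(model_output: str) -> str:
    """Replace every recognised tool call in the text with the tool's result."""

    def replace(match):
        name, arg = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # leave unknown tool calls untouched
        return tool(arg)

    return TOOL_CALL.sub(replace, model_output)


# Stand-in for text a local model might generate.
result = run_tools("The answer is CALL[calculator](12 * 7).")
```

A control library mainly helps with the first half of this loop: constraining generation so the model reliably produces a call in the expected format, which is where smaller local models tend to struggle.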

um-xpto@alien.top · 3 years ago

Thanks. Guidance seems like a good fit; I’ll start looking for more info.

FallMindless3563@alien.top · 3 years ago

You certainly can combine all the tasks and datasets into a single instruction fine-tuning dataset. Then you would have a separate dataset for the reinforcement learning half, where the model learns human preferences.
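As a rough illustration of combining tasks into one dataset, the sketch below maps two toy task datasets onto a shared prompt/response format and shuffles them together. The field names, prompt templates, and the `build_mixture` helper are assumptions made for this example, not something taken from the video or from any particular training library; the preference record at the end only shows one common shape such data can take.

```python
import random

# Toy per-task datasets with different schemas (illustrative data only).
summarization = [
    {"text": "The cat sat on the mat all day.", "summary": "A cat rested."},
]
question_answering = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What colour is the sky?", "answer": "Blue"},
]


def format_summarization(record):
    """Map a summarization record to the shared prompt/response schema."""
    return {
        "prompt": f"Summarize the following text:\n{record['text']}",
        "response": record["summary"],
    }


def format_qa(record):
    """Map a question-answering record to the shared prompt/response schema."""
    return {
        "prompt": f"Answer the question:\n{record['question']}",
        "response": record["answer"],
    }


def build_mixture(sources, seed=0):
    """Concatenate all formatted tasks and shuffle so batches mix tasks."""
    examples = []
    for records, formatter in sources:
        examples.extend(formatter(r) for r in records)
    random.Random(seed).shuffle(examples)
    return examples


mixed = build_mixture(
    [(summarization, format_summarization), (question_answering, format_qa)]
)

# The preference stage uses a separate dataset, for example pairs of
# responses to the same prompt with one marked as preferred (assumed shape).
preferences = [
    {"prompt": "Answer the question:\nWhat is 2 + 2?", "chosen": "4", "rejected": "5"},
]
```

In practice the relative size of each task in the mixture matters, so a real pipeline would likely weight or subsample the sources rather than concatenate them as-is.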

derpgod123@alien.top · 3 years ago

Only papers to read, no books?

FallMindless3563@alien.top · 3 years ago

The only book he explicitly mentions is “Thinking, Fast and Slow” by Daniel Kahneman, but I think there are a ton of books that would be great resources alongside the papers. I just happened to pull a lot of the papers from the footnotes and concepts he mentioned.

Maykey@alien.top · 3 years ago

I haven’t watched the talk, but I think the reading list should have some love for SSMs (S4, S5, H3): on one hand their variants are very prominent on the Long Range Arena, on the other they are relatively “unknown”.

They are not unknown to researchers, seeing how many variants there are, but there are hundreds more videos and blogs explaining transformers. If you find a course about LLMs, it will likely include Transformers but not SSMs, so I think their success on LRA and absence from learning materials qualifies them for a “dive deeper” list.

coumineol@alien.top · 3 years ago

Thanks but here’s the problem with this list: most of the papers mentioned are on a very high technical level, and people who would be able to understand them are probably people who have already read them. Note that Andrej was careful to keep the material at a certain level because he addresses those who want to go one step further than talking to ChatGPT, without necessarily understanding all the underlying theory.

teryret@alien.top · 3 years ago

Right, that’s why OP prefaced with “to dive deeper into a lot of the topics”. If folks aren’t at a point where diving deeper makes sense, it’s not a list for them. There are plenty of resources for any given level of understanding; obviously no list is going to be appropriate for every member of a diverse community.

coumineol@alien.top · 3 years ago

Not to start an argument here, but I can’t imagine anybody with any level of understanding who should start diving deeper by reading the “Attention is All You Need” paper. Yes, this is a diverse community, but when you try to address everybody’s needs, you usually end up addressing nobody’s needs.

eek04@alien.top · 3 years ago

Since “Attention is All You Need” is fairly high on my reading list for understanding the details of transformer architecture, what do you recommend instead?

coumineol@alien.top · 3 years ago

https://arxiv.org/abs/2106.04554

If you’re trying to learn more about language models don’t bother with anything written before 2020. That’s basically the Stone Age.

eek04@alien.top · 3 years ago

Thank you!

whymauri@alien.top · 3 years ago

Just me, but I think of busy coworkers with great background in math/stats and ‘classic’ ML who would ramp up quickly from a list like this. When I onboarded chemists (PhDs) to my ML team at a drug startup, I would send them a similarly dense reading list. With their strong background in physics, it would take them two weeks flat to understand the necessary theory and jargon to be productive (in our niche field).

coumineol@alien.top · 3 years ago

Didn’t mean to say those papers are completely useless, but even for those with a strong Math/ML background I would advise starting with recent survey papers. Reading “Attention is All You Need” is kind of like reading the General Relativity papers of Einstein - cool as a historical curiosity, but not ideal for optimizing expertise acquisition.

lakolda@alien.top · 3 years ago

Some of the content also seems to allude to what Q* might be…

currentscurrents@alien.top · 3 years ago

It really doesn’t, because no one has any clue what Q* is or if it’s even real.

lakolda@alien.top · 3 years ago

Ever hear the term “might”?