I’ve got a research background in ML but never actually developed any models, as it was all theoretical work. I got lucky during the interview stage for this role as my research impressed them. My project involves fine-tuning a GPT-3 model for a specific task and hosting the model on a website. Does anyone have any tips on how to go about learning what I need to know to do this? Also, what should I consider when curating my custom dataset for fine-tuning the model? I really want this to be a learning experience for me.
Sounds a little like you’re overqualified. Fine tuning is something you could probably pick up from a udacity course if you’d like.
s/over/under/d
Or a huggingface tutorial
He obviously is not overqualified, otherwise he would know how to do it. Sure, calling some random code snippets to fine-tune is easy, but doing this properly takes quite some experience.
How can someone be overqualified when they don’t know how to do their work tasks?
r/learnmachinelearning
r/languagetechnology
Google it yourself: https://www.google.com/search?q=finetuning+gpt3
The first result of the Google query is literally OpenAI explaining how to fine-tune GPT-3, which also requires minimal ML and coding knowledge since they have already done all the heavy lifting. Curating the dataset is the hardest part, but even that is just basic data science.
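To make the "basic data science" part concrete, here is a minimal sketch of the kind of cleanup people usually mean: drop empty rows, drop near-duplicates, and hold out a validation split. The function name `clean_and_split`, the toy sentiment data, and the 20% validation fraction are all made up for illustration; a real task will need its own checks.

```python
import random

def clean_and_split(pairs, val_fraction=0.2, seed=0):
    """Drop empty and duplicate (prompt, completion) pairs, then split.

    Hypothetical helper for illustration, not part of any library.
    """
    seen = set()
    cleaned = []
    for prompt, completion in pairs:
        prompt, completion = prompt.strip(), completion.strip()
        if not prompt or not completion:
            continue  # skip rows with a missing prompt or completion
        key = (prompt.lower(), completion.lower())
        if key in seen:
            continue  # skip case/whitespace-insensitive duplicates
        seen.add(key)
        cleaned.append((prompt, completion))

    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(cleaned)
    n_val = max(1, int(len(cleaned) * val_fraction)) if len(cleaned) > 1 else 0
    return cleaned[n_val:], cleaned[:n_val]

# Toy data (assumed): a tiny sentiment-labelling task.
raw_pairs = [
    ("Great product, works well", "positive"),
    ("great product, works well ", "positive"),  # near-duplicate
    ("", "negative"),                            # empty prompt
    ("Broke after two days", "negative"),
    ("Does what it says", "positive"),
    ("Arrived late and damaged", "negative"),
]
train, val = clean_and_split(raw_pairs)
print(len(train), "training pairs,", len(val), "validation pairs")
```

Keeping a held-out split matters even for small datasets, since otherwise there is no way to tell whether the fine-tuned model generalizes or just memorized the examples.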
Maybe practice on smaller open source models to get a feel for it, then work your way up. Tuning is more of an art than a science.
I don’t have any resources as I’m a SWE, but I do have some advice.
Ask for help from your mentor/other engineers. Seriously, I’m a software engineer (non-ML, but ML teams operate similarly to SWE teams) and we expect interns to know almost nothing, and we understand they’re gonna need quite a bit of hand holding. I know I did. It’s okay! That’s how we all learn, and being able to ask for help when you need it is one of the most vital skills to have in software. The absolute worst thing you could do is struggle the whole internship without getting the help you need.
All you gotta do is say, “Hey, I’m struggling with the fine-tuning of this model for my project. My research and academic experience have all been extremely theoretical, and I never got the chance to do much practical tuning. Do you have some suggestions given where I’m at?”. Obviously provide a lot of extra context for where you’re at/what you’re struggling with, but you get the point. They’re not gonna fire you so don’t worry about that (literally every intern’s worst fear), and they want you to learn! Asking would reflect well on you too since you’re showing 1) you know your shortcomings and 2) you are actively working to overcome them. If you can do both of those things you’re already ahead of most people.
Good luck!
Start with BERT as it’s easy to pick up. Follow some guide. I agree that you are actually overqualified for this, unless you are afraid of coding.
Fine tuning via the OpenAI API is actually easier because you only need to work on preparing a clean dataset and sending it to OpenAI, and you don’t have to worry about any other part of the pipeline.
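As a rough idea of what "preparing a clean dataset" looks like in practice, here is a sketch that serializes examples to JSONL with one `prompt`/`completion` object per line, which is the shape the legacy GPT-3 fine-tuning guide described. The helper name `to_jsonl`, the separator, the stop sequence, and the example pairs are assumptions for illustration; check the current OpenAI docs for the exact format your model expects.

```python
import json

def to_jsonl(pairs, separator="\n\n###\n\n", stop=" END"):
    """Serialize (prompt, completion) pairs to a JSONL string.

    Hypothetical helper. The separator and stop sequence are assumed
    conventions: a fixed marker ending every prompt, a leading space on
    the completion, and a fixed marker ending every completion.
    """
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": prompt + separator,
            "completion": " " + completion + stop,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"

# Toy examples (assumed), not real training data.
examples = [
    ("Broke after two days", "negative"),
    ("Does what it says", "positive"),
]
jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Writing `jsonl_text` to a file gives you the artifact you would upload; whatever separator and stop sequence you pick, use the same ones when you later query the fine-tuned model.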
Yeah, basically what I think has probably happened is that these guys are like everybody else who thinks that fine tuning is somehow going to make it possible to open Pandora’s black box on GPT, and more than likely the folks that tasked you with this have absolutely no clue about how fine tuning works…
You should check out the Low Rank Adaptation (LoRA) repo and try running the example for finetuning GPT-2 small. Once you’ve got that running, you could use the library for finetuning other, larger open source models on the cloud (check out SkyPilot for this).
What if you save the company money on fine-tuning and figure out whether prompting would be better for this task?
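If it helps to see the idea before opening the repo, here is a toy, dependency-free sketch of what LoRA does to a single linear layer: the pretrained weight `W` stays frozen and only a low-rank update `B @ A`, scaled by `alpha / r`, is trained. This is an illustration of the technique under made-up shapes and values, not the repo's actual code or API.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x) for one input vector.

    W (d_out x d_in) is the frozen pretrained weight; A (r x d_in) and
    B (d_out x r) are the only trainable matrices.
    """
    r = len(A)  # the rank is the number of rows of A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy shapes and values (assumed for illustration).
d_out, d_in, r = 4, 3, 2
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]
A = [[0.01 * (i - j) for j in range(d_in)] for i in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero
x = [1.0, 2.0, 3.0]

# With B = 0 the adapted layer reproduces the pretrained layer exactly.
y0 = lora_forward(W, A, B, x, alpha=4)
print(y0 == matvec(W, x))

# Trainable parameters for one 768 x 768 weight at rank 8.
full_params = 768 * 768
lora_params = 8 * (768 + 768)
print(full_params, lora_params)
```

The zero initialization of `B` means training starts from the pretrained model's behavior, and the parameter count shows why this is so much cheaper than updating `W` directly.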
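A cheap way to test this is to build a few-shot prompt from a handful of labelled examples and see how far the base model gets before paying for any fine-tuning. Below is a minimal sketch; the function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the examples are all made up, and sending the prompt to a model is left out on purpose.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query.

    Hypothetical helper; the Input/Output layout is just one common
    convention, not a required format.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model is expected to continue from here
    return "\n".join(parts)

# Toy task and examples (assumed).
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("Does what it says", "positive"),
        ("Broke after two days", "negative"),
    ],
    "Arrived late and damaged",
)
print(prompt)
```

If a prompt like this already scores well on a held-out set, that is a useful baseline to report; if it does not, the same labelled examples can seed the fine-tuning dataset.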
I can help you with this. Would love to share knowledge and network. Let me know if you are interested
I can only hope this wasn’t a highly competitive internship that more qualified (in the sense of having done similar work) students were passed over for. The majority of ML students, regardless of speciality, should have applied experience even if it’s on a smaller scale imo.
TowardsDatascience, Kaggle and Stack… are your go-to sites. Most of these sites tell you how to integrate and use APIs. And of course, you can just Google it!!
Yeah, none of this requires an ML degree. You are doing data plumbing, which means looking at the API and making sure your data goes from your computer to theirs, and then using the stored server state to perform computation.
I recommend using Python and the OpenAI Python bindings to handle the plumbing. I wrote a simple script in about a day. Depending on your level of skill with Python, it could take a bit longer than that.