My rule of thumb has been to use LoRA (r between 4 and 16) until I'm unsatisfied with the results. It of course depends on the data/task, but imo most cases don't require a full fine-tune, and the perf/compute ROI of one is low.
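For concreteness, a minimal sketch of that starting point with Hugging Face PEFT (the base model and target modules here are just placeholders):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder base model; swap in whatever you're actually fine-tuning.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Start with a small rank and only bump it (or go full fine-tune) if results disappoint.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections are a common default
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # trains only a tiny fraction of the full model
```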
I've helped hire for senior MLE roles and am currently doing so right now. My personal preference is the more abstract version, but that being said, I don't think it would play a strong role (either is fine). I know in some cases recruiters will screen based on keywords from the job description, so it's potentially good to hit those.
In interviews, I think I've seen more cases of overly technical resumes not living up to their claimed depth than of abstract resumes turning out not to be technical enough.
I’ve been working on some experimental context window extensions using multimodal models https://github.com/sshh12/multi_token
Similar to the idea of putting text into an image for GPT4V, I'm directly encoding chunks of text into embeddings and injecting them into the model. This gives you a very lossy 128x extension of your context window, which is pretty massive.
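Roughly the idea (a toy sketch, not the actual multi_token implementation): compress each text chunk into an embedding with an encoder, project it into the LLM's embedding space, and splice it into the input sequence in place of the raw tokens.

```python
import torch
import torch.nn as nn

class ChunkEncoder(nn.Module):
    """Toy sketch: compress a chunk of text into a single LLM-space embedding."""
    def __init__(self, text_encoder, text_dim, llm_dim):
        super().__init__()
        self.text_encoder = text_encoder          # e.g. a frozen sentence encoder
        self.proj = nn.Linear(text_dim, llm_dim)  # small trained projector

    def forward(self, chunk_token_ids):
        feats = self.text_encoder(chunk_token_ids)  # (batch, seq, text_dim)
        pooled = feats.mean(dim=1)                  # (batch, text_dim), very lossy on purpose
        return self.proj(pooled)                    # (batch, llm_dim)

def splice(chunk_embeds, prompt_embeds):
    # Prepend the compressed chunk embeddings to the prompt's token embeddings,
    # then feed the result to the LLM via inputs_embeds instead of input_ids.
    return torch.cat([chunk_embeds.unsqueeze(1), prompt_embeds], dim=1)
```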
+1, when in doubt, LLM it out.
You could also ask for explanations so when it gets it wrong, you can work on modifying your prompts/examples to get better performance.
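For example, a hypothetical prompt template along these lines makes failures much easier to debug:

```python
# Hypothetical template: asking for an explanation next to the label means that
# when the model gets a case wrong, you can see *why* and adjust prompts/examples.
PROMPT_TEMPLATE = """You are labeling documents.

Document:
{document}

Return JSON with two fields:
  "label": the category you picked
  "explanation": 1-2 sentences quoting the text that justified it
"""

def build_prompt(document: str) -> str:
    return PROMPT_TEMPLATE.format(document=document)
```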
Potentially you wouldn’t want to do this if:
Definitely one tricky part, as you mentioned, is the dataset. In an ideal world, you'll have a supervised dataset of (document, personality type) pairs and you can train a model on these (just like u/Veggies-are-okay mentioned).
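As a rough sketch of that supervised route (generic Hugging Face classifier, made-up example pairs):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Made-up (document, personality type) pairs; in practice you'd want far more.
pairs = [
    {"text": "I plan every detail of a trip weeks in advance...", "label": 0},
    {"text": "I'd rather improvise and see what happens...", "label": 1},
]
dataset = Dataset.from_list(pairs)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"))

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="personality-clf", num_train_epochs=3),
    train_dataset=dataset,
).train()
```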
Assuming you don’t have this data, a couple options:
Recently had to make a similar decision, but as a new grad deciding whether to go directly into ML industry vs the masters/PhD route. After speaking with a bunch of Machine Learning Engineers and Data Scientists in industry, my main conclusions were:
Ended up going directly into an MLE/DS role and have no regrets so far, one year in. I honestly don't think I would've gotten a better or higher-paid position had I done a PhD.
Huge fan of Modal, have been using them for a couple of serverless LLM and Diffusion models. It can definitely be on the costly side, but I like that the cost scales directly with requests and the setup is trivial.
recent project with modal: https://github.com/sshh12/llm-chat-web-ui/tree/main/modal
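A minimal sketch of the pattern (placeholder app/model names, Modal's API as I understand it in recent versions):

```python
import modal

app = modal.App("llm-demo")  # placeholder app name
image = modal.Image.debian_slim().pip_install("transformers", "torch")

@app.function(image=image, gpu="A10G")
def generate(prompt: str) -> str:
    # A container spins up on demand and you only pay while it runs.
    from transformers import pipeline
    pipe = pipeline("text-generation", model="gpt2")  # placeholder model
    return pipe(prompt, max_new_tokens=64)[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("Hello from Modal"))
```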
Hey! I wrote a blog post recently on how these types of vision LLMs work: https://blog.sshh.io/p/large-multimodal-models-lmms
Specifically focusing on LLaVA, but generally the same high level idea.
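The high-level recipe, as a rough sketch (not the actual LLaVA code): a vision encoder produces patch features, a small projector maps them into the LLM's token embedding space, and they're concatenated with the text embeddings.

```python
import torch
import torch.nn as nn

class LLaVAStyleSketch(nn.Module):
    """Rough sketch of the LLaVA-style recipe, not the real implementation."""
    def __init__(self, vision_encoder, llm, vision_dim, llm_dim):
        super().__init__()
        self.vision_encoder = vision_encoder             # e.g. a frozen CLIP ViT
        self.projector = nn.Linear(vision_dim, llm_dim)  # the small piece you actually train
        self.llm = llm

    def forward(self, pixel_values, text_embeds):
        patch_feats = self.vision_encoder(pixel_values)  # (batch, n_patches, vision_dim)
        image_embeds = self.projector(patch_feats)       # (batch, n_patches, llm_dim)
        inputs = torch.cat([image_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```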
Can't speak for industry standards, but a little while ago I worked on aerial object recognition and we used YOLO + an NVIDIA Jetson. The Jetson seemed like the best GPU-accelerated hardware that was light enough to mount on a smallish drone.
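For reference, the detection side of that stack is pretty small; a hedged sketch with the Ultralytics YOLO package (just an illustration, with the Jetson-specific bits like TensorRT export omitted):

```python
from ultralytics import YOLO

# Placeholder weights; in practice you'd fine-tune on your own aerial dataset.
model = YOLO("yolov8n.pt")

# Run detection on a frame, e.g. pulled from the drone's camera stream.
results = model("frame.jpg", conf=0.25)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)
```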
The best way is still open to some research, but my understanding is that the current open-source SOTA is ShareGPT4V, which uses a high-quality dataset built with GPT4V plus (I believe) a LLaVA-like architecture. This works by essentially encoding the other domain into embeddings that the LLM understands.
If you are interested I have a library for more easily training these on custom modalities: https://github.com/sshh12/multi_token (uses basically the same idea from the LLaVA 1.5 paper)