
  • I've helped hire for senior MLE roles and am currently doing so. My personal preference is the more abstract version, but that said, I don't think it would play a strong role (either is fine). I know that in some cases recruiters will screen based on keywords from the job description, so it's potentially good to hit those.

    I think I've seen more cases of over-technical resumes not living up to their claimed technical depth in interviews than abstract resumes that turned out to be not technical enough.



  • +1, when in doubt, LLM it out.

    You could also ask the model for explanations, so that when it gets something wrong you can work on modifying your prompts/examples to get better performance (a minimal sketch follows the list below).

    Potentially you wouldn’t want to do this if:

    • Your classification problem is very unusual or can't be explained in a prompt
    • You need this to run extremely fast or over a ton of data
    • You want to learn non-LLM deep learning/NLP (in which case I'd suggest some form of fine-tuning BERT)
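
    To make the explanation loop concrete, here's a minimal sketch, assuming the openai Python client; the model name and label set are placeholders I made up, not anything from the thread:

        import json
        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

        LABELS = ["billing", "technical_support", "other"]  # placeholder labels

        def classify_with_explanation(text: str) -> dict:
            """Ask the model for a label plus a short rationale you can inspect."""
            prompt = (
                f"Classify the text into one of {LABELS}. "
                'Reply with JSON: {"label": "...", "explanation": "..."}\n\n'
                f"Text: {text}"
            )
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
                response_format={"type": "json_object"},  # keeps the reply parseable
            )
            return json.loads(resp.choices[0].message.content)

        print(classify_with_explanation("My card was charged twice this month."))

    When an explanation reveals a misreading, you can fold a clarifying instruction or a counter-example back into the prompt and re-run.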

  • Definitely one tricky part, as you mentioned, is the dataset. In an ideal world you'd have a supervised dataset of (document, personality type) pairs and could train a model on them (just like u/Veggies-are-okay mentioned).

    Assuming you don’t have this data, a couple options:

    • Make the data. Some quick Google searches show that many celebrities do have known Big-5 profiles. You could manually curate those profiles and text written by the same celebrities to build these pairs.
    • Use synthetic data. Try asking an LLM (like ChatGPT) to write a text on a random topic as if it were $RANDOM Big-5, then just use these results as your training pairs (sketched after this list).
    • Try clustering. Similar personality types may have similar embeddings. Take a dataset of writings, embed them using something like BERT, label (or best-effort-guess) a few, and then predict personalities based on proximity to a piece of known Big-5 text in the embedding space (also sketched after this list). You could extend this to training a model that asks "do text A and text B display the same Big-5?", which could be an easier problem to get samples for; then run this model against a set of known Big-5 texts and your unknown example.
    • Use a proxy. There might be datasets/models out there that predict heuristics which could be combined to estimate the Big-5; for example, a sentiment score might correlate with agreeableness. You might also be able to create word/phrase banks such that using certain phrases indicates a leaning on a Big-5 trait ("has_neurotic_phrases" then becomes a feature in your model; third sketch after this list).
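
    A minimal sketch of the synthetic-data option, assuming the openai Python client; the high/low trait encoding and the model name are my own illustrative assumptions:

        import random
        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
        TRAITS = ["openness", "conscientiousness", "extraversion",
                  "agreeableness", "neuroticism"]

        def synthetic_pair() -> tuple[str, dict]:
            # Draw a random high/low level for each of the five traits.
            profile = {t: random.choice(["high", "low"]) for t in TRAITS}
            desc = ", ".join(f"{level} {trait}" for trait, level in profile.items())
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{
                    "role": "user",
                    "content": "Write a short paragraph on a random everyday topic, "
                               f"as if the author had this personality: {desc}.",
                }],
            )
            return resp.choices[0].message.content, profile

        pairs = [synthetic_pair() for _ in range(100)]  # your synthetic training set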
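
    For the clustering option, a nearest-neighbor sketch assuming the sentence-transformers library; the encoder choice and the toy labeled examples are assumptions, not real data:

        import numpy as np
        from sentence_transformers import SentenceTransformer

        encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

        # A few texts with known (or best-effort-guessed) labels, plus one unknown.
        known_texts = ["I double-check every detail before submitting anything.",
                       "Let's just wing it and see what happens!"]
        known_labels = ["high conscientiousness", "low conscientiousness"]
        unknown = "I keep a color-coded spreadsheet for my grocery runs."

        known_emb = encoder.encode(known_texts, normalize_embeddings=True)
        unknown_emb = encoder.encode([unknown], normalize_embeddings=True)

        # With normalized vectors, cosine similarity is just a dot product;
        # predict the label of the nearest known example.
        sims = known_emb @ unknown_emb.T
        print(known_labels[int(np.argmax(sims))])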
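
    And the proxy option's phrase-bank feature could be as simple as the following; the phrase list is purely illustrative, not a validated lexicon:

        NEUROTIC_PHRASES = ["can't stop worrying", "what if it all goes wrong",
                            "so anxious"]  # illustrative phrase bank

        def has_neurotic_phrases(text: str) -> int:
            """Binary feature: 1 if any bank phrase appears in the text."""
            lowered = text.lower()
            return int(any(phrase in lowered for phrase in NEUROTIC_PHRASES))

        print(has_neurotic_phrases("What if it all goes wrong tomorrow?"))  # -> 1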

  • Recently had to make a similar decision, but as a new grad deciding whether to go directly into ML industry or take the masters/PhD route. After speaking with a bunch of Machine Learning Engineers and Data Scientists in industry, my main conclusions were:

    1. Most people, whether or not they went the PhD route, were happy with what they did, where they ended up, and how much they are being paid (the exception being people who started a PhD and didn't complete it; that group regretted the time spent on it)
    2. Most MLE/DS roles, especially now in the AI craze, are not research-focused; they are applied in a way that usually doesn't require a PhD

    I ended up going directly into an MLE/DS role, and one year in I have no regrets; I honestly don't think I would've gotten a better or higher-paid position with a PhD.