I’m trying to create an NLP emotion classification model for a research project, but I’m not sure where or how to start. I have a huge dataset of Reddit posts and want to classify each post into about 12 emotion categories.
Is there a way to do this using existing models, e.g. BERT, or can I do it with unsupervised learning?
I have at least 12,000 posts, so I’d like to avoid supervised learning because labeling a training set would take a long time.
What’s the most efficient and accurate way to do this? Any help would be amazing!
You can label a balanced subset and try something like SetFit.
Look up the SetFit library on GitHub. Frame it as a few-shot learning task; you won’t get perfect results, but it seems tractable as a problem.
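For what it's worth, here is a rough sketch of what "label a balanced subset, then few-shot with SetFit" could look like. It assumes setfit 1.0 or later (the `Trainer` / `TrainingArguments` API) plus the `datasets` package. The helper names `balanced_subset` and `train_setfit`, the base model, and the hyperparameters are placeholders I picked for illustration, not anything prescribed by the library.

```python
import random
from collections import defaultdict


def balanced_subset(labeled_posts, per_class, seed=0):
    """Pick up to `per_class` (text, label) pairs for every label.

    `labeled_posts` is an iterable of (text, label) pairs that you have
    already labeled by hand. Hypothetical helper, not part of SetFit.
    """
    by_label = defaultdict(list)
    for text, label in labeled_posts:
        by_label[label].append(text)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        texts = by_label[label]
        k = min(per_class, len(texts))  # rare classes may have fewer examples
        for text in rng.sample(texts, k):
            subset.append((text, label))
    return subset


def train_setfit(subset, base_model="sentence-transformers/paraphrase-mpnet-base-v2"):
    """Fine-tune a SetFit model on the small labeled subset.

    The third-party imports are inside the function so the sampling helper
    above can be used without setfit installed. The base model and the
    training arguments are assumptions; tune them for your data.
    """
    from datasets import Dataset
    from setfit import SetFitModel, Trainer, TrainingArguments

    # SetFit is trained here on integer class ids, so map label names to ids.
    label_names = sorted({label for _, label in subset})
    label_to_id = {name: i for i, name in enumerate(label_names)}

    train_ds = Dataset.from_dict({
        "text": [text for text, _ in subset],
        "label": [label_to_id[label] for _, label in subset],
    })

    model = SetFitModel.from_pretrained(base_model)
    args = TrainingArguments(batch_size=16, num_epochs=1)
    trainer = Trainer(model=model, args=args, train_dataset=train_ds)
    trainer.train()

    # Return the id -> name list so predictions can be decoded later.
    return model, label_names
```

After training, something like `model.predict(list_of_unlabeled_posts)` should give you one class id per post, which you can decode with `label_names`. A small number of examples per emotion is the usual starting point for few-shot, and you can label more for whichever classes come out worst.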
I doubt you’ll find 12 different emotions on Reddit. I think everything can fit into:
- sarcasm
- rage
- polemic rage
- bewilderment
- jocularity
- serious
I might have missed one or two, but I’m sure there aren’t 12.
Try the GoEmotions dataset from Google. It’s built from Reddit comments that are already labeled with emotions.
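If it helps, here is a minimal sketch of pulling GoEmotions through the Hugging Face `datasets` library and collapsing its labels into your own categories. GoEmotions uses 27 emotion labels plus neutral, so you would need some mapping to get down to 12. The dataset id, the "simplified" config, and the field names are my assumptions about how it is published on the Hub, so check the dataset card. `remap_labels` and the mapping in the usage note are made up for illustration.

```python
def load_goemotions_train():
    """Return the training split as (text, [label names]) pairs.

    Assumes the Hub id "google-research-datasets/go_emotions" with the
    "simplified" config, where each row has a "text" string and a
    "labels" list of integer class ids. Needs network access.
    """
    from datasets import load_dataset  # imported lazily; third-party package

    ds = load_dataset("google-research-datasets/go_emotions", "simplified", split="train")
    names = ds.features["labels"].feature.names
    return [(row["text"], [names[i] for i in row["labels"]]) for row in ds]


def remap_labels(fine_labels, mapping, default="other"):
    """Collapse fine-grained labels into your own categories.

    `mapping` is a dict from a GoEmotions label name to one of your
    category names; unmapped labels fall back to `default`.
    Duplicates are dropped and first-seen order is kept.
    """
    coarse = []
    for label in fine_labels:
        target = mapping.get(label, default)
        if target not in coarse:
            coarse.append(target)
    return coarse
```

For example, with a made-up mapping like `{"annoyance": "anger", "anger": "anger"}`, a comment tagged `["annoyance", "anger"]` collapses to `["anger"]`. The remapped data could then serve as training data for a BERT-style classifier, or as the labeled subset for a few-shot approach.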