I want to fine-tune some LLMs on my own dataset, which contains very long examples (slightly over 2048 tokens). VRAM usage jumps by several GB just from increasing the Cutoff Length from 512 to 1024.
Is there a way to feed these long examples into the models without increasing VRAM usage significantly?
You can try switching the attention implementation to something like FlashAttention. It avoids materializing the full attention matrix, so attention memory grows roughly linearly with sequence length instead of quadratically, which makes longer cutoff lengths much cheaper in VRAM.
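As a minimal sketch of what that looks like with Hugging Face Transformers (assuming the `flash-attn` package is installed and your GPU supports it; the model name and dtype below are placeholders, not from this thread):

```python
# Sketch: load a causal LM with the FlashAttention-2 backend enabled.
# Assumes flash-attn is installed and the GPU supports it (Ampere or newer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical example model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # FlashAttention needs fp16/bf16
    attn_implementation="flash_attention_2",  # swap in the memory-efficient kernel
)
```

If you are launching training through a fine-tuning framework's CLI or web UI rather than loading the model yourself, look for its flash-attention option and enable it there instead.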