There was a paper where you'd run a faster model to come up with a sentence, then run one batch through the big model where each prompt is the same sentence cut at a different length, each ending at a different word the small model predicted, to see where the small one first went wrong (the idea behind speculative decoding). That gets you a speedup if the two models are more or less aligned.
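Roughly how I remember the verification step working, as a greedy toy sketch. `small_model` and `big_model` are hypothetical callables that map a token list to its most likely next token; the actual paper does a probabilistic accept/reject against the big model's distribution rather than exact matching:

```python
def speculative_step(small_model, big_model, prompt, k=5):
    # 1) Small model drafts k tokens, one at a time (cheap but sequential).
    draft = list(prompt)
    for _ in range(k):
        draft.append(small_model(draft))

    # 2) One batch for the big model: the same sentence cut at k different
    #    lengths, each prefix ending on a different small-model token.
    batch = [draft[:len(prompt) + i] for i in range(k)]
    big_preds = [big_model(seq) for seq in batch]  # one batched pass in practice

    # 3) Keep draft tokens until the first disagreement; the big model's
    #    own prediction patches the spot where the small model went wrong.
    out = list(prompt)
    for i, pred in enumerate(big_preds):
        out.append(pred)
        if pred != draft[len(prompt) + i]:
            break  # small model diverged here; start drafting again from this point
    return out
```

If the small model is right most of the time, you pay for one big-model batch per k tokens instead of k sequential big-model calls, which is where the speedup comes from.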
Other than that, I could imagine other uses, like batches where one sentence is generated for each actor, one for descriptions, one for actions, etc. Or simply generating multiple options for you to choose from (a sketch below).
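A minimal sketch of what that multi-stream batching could look like; `generate_batch` here is an assumed helper (anything that feeds a list of prompts through the model as one batch), not a real library call:

```python
# Hypothetical: one generation stream per element, batched together so
# the model does one forward pass per step instead of one per stream.
prompts = [
    "Alice says:",             # dialogue, one prompt per actor
    "Bob says:",
    "Describe the scene:",     # descriptions
    "List possible actions:",  # actions
]
completions = generate_batch(prompts, max_new_tokens=40)  # assumed helper
for prompt, completion in zip(prompts, completions):
    print(prompt, completion)
```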