Sure.
I’m using an instruct-style dataset with a system field (in Axolotl I use either the orcamini dataset type or chatml). I collated a bunch of writing that I like (up to 4096 tokens in length) and then reverse-prompted it through an LLM to create instructions. So, for example, one sample might have a system field that is “You are a professional author with a raw, visceral writing style” or “You are an AI built for storytelling.” Then the instruction might be “write a short story about X that touches on themes of Y and Z, write in the style of W.” Or the instruction might be a more detailed template setting out genre, plot, characters, scene description, POV, etc. The response is then the actual piece. My dataset also includes some contemporary non-rhyming poetry, some editing/rephrasing samples, and some literary analysis.
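To make the shape concrete, a single sample ends up looking roughly like this (sketched in YAML for readability; in practice it’s one JSON object per line in a JSONL file, and the exact field names depend on which Axolotl dataset type you pick, so treat them as illustrative):

```yaml
# Illustrative sample only: field names vary by Axolotl dataset type,
# and the instruction below is a made-up example, not a real record.
system: "You are a professional author with a raw, visceral writing style."
instruction: >
  Write a short story about a lighthouse keeper that touches on themes of
  isolation and memory. Write in the style of literary fiction.
output: "<the original piece of writing, up to 4096 tokens>"
```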
I have three datasets: a small one that is purely top-quality writing, structured as above; a middle-sized one that also works in some fiction-focused synthetic GPT-4 data I’ve generated myself and curated from other datasets; and a larger one that additionally incorporates conversational responses derived from an entirely Claude-generated dataset.
I’ve then run full fine-tunes on Mistral 7B with those datasets using Axolotl on RunPod, on either two or three A100s.
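For anyone curious, the Axolotl config for a run like that looks roughly like the sketch below. The hyperparameters and file names are placeholders rather than my exact values, so adjust to taste:

```yaml
# Rough sketch of a full fine-tune config; values are placeholders,
# not the exact settings used for these runs.
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

datasets:
  - path: writing_dataset.jsonl   # hypothetical filename
    type: orcamini                # or a chatml-style type
output_dir: ./outputs

sequence_len: 4096
sample_packing: true

# No lora/qlora adapter section, so all weights are trained (full fine-tune).
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.00001
optimizer: adamw_torch
lr_scheduler: cosine

bf16: true
flash_attention: true
gradient_checkpointing: true

# Multi-GPU (two or three A100s) via one of the DeepSpeed configs
# that ship with Axolotl.
deepspeed: deepspeed_configs/zero2.json
```

Note that sequence_len at 4096 lines up with the longest samples in the dataset.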
I find utilising a system prompt very beneficial; it seems to help the model build associative connections between the persona and the requested style.
Overall, results have been pretty good. The larger-dataset model is a great all-round writer and still generalises well. The smaller-dataset model produces writing that is literary, verbose, and pretty.
I’ve also had some success training on Zephyr as a base model; it helps give the output underlying structure and coherence. The key challenge for me has been balancing pretty, long-form writing with enough underlying reasoning to sustain that coherence.
Doing a full fine-tune on Mistral 7B is the only way I’ve gotten human, literary text out of any of these models. Occasionally the vanilla Llama-2 70B will output something great. Yi-34B-Chat, while not a literary writer by default (it’s got that clunky, purple-prose, GPT-4 feel to it), impressed me with its ability to write in a requested style.
The old Guanaco 33B and 65B models produced nice prose, but unfortunately they’re limited to 2048 tokens of context and they weren’t the best at following instructions.