IMO don’t bother with Frankenstein models unless you plan to seriously train them on a broad dataset. They just tend to get confused, stop following instructions, etc. You’d probably need to run an Orca-style dataset through them, and then some RP data on top.
I think that’s where the real performance will be. Not sure about the VRAM requirements, but it would probably make sense to start with Mistral 11B or Llama-2 20B splices as a proof of concept.
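For anyone who hasn’t made one of these splices before, here’s a rough sketch of what the config looks like with mergekit’s passthrough merge — the base model and layer ranges below are just illustrative (a 7B self-merge into roughly 11B by repeating the middle layers), not a recommendation:

```yaml
# Hypothetical mergekit config: depth up-scale a 7B into ~11B
# by stacking two overlapping layer ranges of the same model.
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

The duplicated layers are exactly why these models come out confused before further training: the repeated blocks were never trained to sit next to each other, which is why a continued-pretraining or instruct pass afterwards matters so much.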