Not sure, but it seems they fine-tuned gpt-3.5-turbo-16k, which is faster than GPT-4, hence the claim of GPT-3.5 speed with a 16K context limit.
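For context, here is roughly what that would look like through OpenAI's fine-tuning API. This is a minimal sketch: the training file name is made up, and the "gpt-3.5-turbo-16k" model ID is an assumption, since officially only the base gpt-3.5-turbo was exposed for fine-tuning at the time.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload chat-formatted training examples (file name is hypothetical)
training_file = client.files.create(
    file=open("phind_style_chats.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job; "gpt-3.5-turbo-16k" is an assumption here,
# as only "gpt-3.5-turbo" was officially available for fine-tuning
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo-16k",
)
print(job.id, job.status)
```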
The “Phind V7” naming is dubious. They’ve also ripped off WizardLM’s code in the past and rebranded it to secure seed funding.
I doubt it’s based on CodeLlama 34B, unless they trained it on a dataset specifically crafted to make the model hallucinate that it’s GPT-3.5 Turbo.
GPT-3.5 Turbo apparently has 20 billion parameters, significantly fewer than the previous best Phind models had. Given how bad GPT-3.5 is, I think it’s more likely they just fine-tuned some other base model on GPT-3.5 outputs.
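If that’s what happened, the recipe would just be ordinary supervised fine-tuning on GPT-3.5 completions, along these lines. A sketch only: the base model choice and dataset path are assumptions for illustration, not anything Phind has confirmed.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "codellama/CodeLlama-34b-hf"  # assumed base model, purely illustrative
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# JSONL of {"prompt": ..., "completion": ...} pairs, where each completion
# was generated by GPT-3.5 (hypothetical file)
dataset = load_dataset("json", data_files="gpt35_outputs.jsonl", split="train")

def tokenize(example):
    # Train on the prompt concatenated with the GPT-3.5 completion
    return tokenizer(
        example["prompt"] + example["completion"],
        truncation=True,
        max_length=2048,
    )

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-model", per_device_train_batch_size=1),
    train_dataset=tokenized,
    # causal LM objective: the labels are the input ids themselves
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```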
Isn’t it 175B?
The recent Microsoft paper on CodeFusion leaked that figure.