1.3B with 68.29% Humaneval lol, don't behead me. Part of my project PIC (partner-in-crime)

ahm_rimer@alien.top · 2 years ago

1.3B with 68.29% Humaneval lol, don't behead me. Part of my project PIC (partner-in-crime)

alchemist1e9@alien.top · 2 years ago

Thank you. Really interesting. I have a question for you. Do you happen know of there are any trained from scratch coding model projects? The reason I ask is I have a very specific idea about how to best teach an LLM to program, but it requires changing some details at the very base encoding level and a change in presentation of the training data. I’ve been programming for over 30 years now and I strongly suspect there is this fairly simple trick to improving coding models, so I’d like to look at something open source that starts from the very beginning. Then I can investigate how hard it would be to implement what I’m thinking. The design I have should result in very small but capable models.

AI_Simp@alien.top · 2 years ago

How much is this costing you if you don’t mind?

BrainSlugs83@alien.top · 2 years ago

Interesting, is Partner in Crime (PIC) like an open source co-pilot type project? I haven’t heard of it before (did you coin this phrase yourself, or is it well known)?

I ask because the tasks you describe (json/md/function calling/empathy) and then the name itself, all basically make it sound like the “open source” models equivalent of a co-pilot model.

A_for_Anonymous@alien.top · 2 years ago

Try to use it for coding, it’ll be as good as offshoring.

metalman123@alien.top · 2 years ago

Do you plan to fine tune the 7b as well?

Dankmemexplorer@alien.top · 2 years ago

“the model can have the test data as a treat”

balianone@alien.top · 2 years ago

try ask to create powerful ransomware script with nodejs

naptastic@alien.top · 2 years ago

Ok, it finally downloaded and I’ve spent a few minutes with it. It keeps getting into endless pathways of jaron (e.g., “fair play make world communal environment tolerant embraces diversity embrace equity promote unity instill resilience proactive leadership” and it just goes on like that–no punctuation, no connecting words–until it reaches the token limit.) What loader and settings work best with this model?

AfterAte@alien.top · 2 years ago

Try using the alpaca template, turn temperature down to 0.1 or 0.2 and repetition penalty to 1. I haven’t tested this yet, but those settings work for Deepseek-coder. If you’re using oobabooga, the StarChat preset works for me.

naptastic@alien.top · 2 years ago

Just selecting StarChat, it instantly became conversational. :+1:

AfterAte@alien.top · 2 years ago

Great! Have fun!

ahm_rimer@alien.top · 2 years ago

Try the chat inference code mentioned in the model card if you’re running it on GPU. The size is good enough to test on free colab as well.

naptastic@alien.top · 2 years ago

That definitely works better. I wouldn’t trust it too far though. It just told me I can remove the first part of a file with one seek() and one truncate() call…

AfterAte@alien.top · 2 years ago

Wow, that’s amazing. On the eval+ leaderboards, Deepseek-coder-1.3B-instruct gets 64.6, so that’s a ~4% increase. It’s about 3% less than Phind-v2’s result, which is amazing.

AfterAte@alien.top · 2 years ago

Btw, does your dataset include coding examples? If so, do you include Rust? I find current models really suck at Rust, but can make a pretty good Snake game in Python 😂

ahm_rimer@alien.top · 2 years ago

Not enough instruct data for Rust, i also am not familiar with Rust so can’t test it out.

I usually test things out with C, C++ and python only on my level.

Though if you know of some good source, I’ll use it for Rust fine tuning.

AfterAte@alien.top · 2 years ago

I don’t know much about Rust, but Easy Rust is a good source for learning: https://github.com/Dhghomon/easy_rust

But in a useful format for fine-tuning… no idea where to get that. And I’m not qualified to make it either. But i don’t want to burden you with extra work so I guess C++ will have to do for now :) Thank you for the model, from me and everyone else with a potato PC m(_ _)m

FullOf_Bad_Ideas@alien.top · 2 years ago

Do you plan to release the dataset? Have you checked for data contamination with benchmarks? I am overall pretty confused by scores of this model on HumanEval, not just your finetune. DeepSeek AI got very weird scaling in benchmarks, since their 6.7B model scores really closely to 33B one, which usually doesn’t work this way. 6.7B instruct scores 78.6% while 33B instruct scores 79.3%. I am now using 33B model daily at work and it’s really good. I have no evidence to support my claim, but I totally wouldn’t be surprised if they were pre-training on contaminated dataset.