IMO don’t bother with Frankenstein models unless you plan to seriously train them on a broad dataset. They just tend to get confused, stop following instructions, etc. You’d probably need to run an Orca-style dataset through them, and then some RP data on top.
I think that’s where the real performance will be. Not sure about the VRAM requirements, but it would probably make sense to start with Mistral 11B or Llama-2 20B splices as a proof of concept.
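For anyone who hasn’t made one of these splices before, here’s a rough sketch of what the config looks like with mergekit’s passthrough merge — the base model and layer ranges below are just illustrative (a 7B self-merge into roughly 11B by repeating the middle layers), not a recommendation:

```yaml
# Hypothetical mergekit config: depth up-scale a 7B into ~11B
# by stacking two overlapping layer ranges of the same model.
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

The duplicated layers are exactly why these models come out confused before further training: the repeated blocks were never trained to sit next to each other, which is why a continued-pretraining or instruct pass afterwards matters so much.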