@Maykey

Maykey@alien.top · 1 year ago

This “only memory saved” amounts to throwing away 2 copies of the entire model. Pretty sweet deal.

Maykey@alien.top · 1 year ago

I haven’t watch the talk, but I think the reading list should have some love for SSM. (S4, S5, H3): on one hand their variants are very prominent on long range arena on other they are relatively “unknown”.

They are not unknown to researchers seeing how many variants there are, but there are hundreds more videos and blogs explaining transformers. If you find a course about LLM, it will likely include Transformers but not SSM, so I think their success in LRA and absence in learning materials qualifies them for “dive in deeper” list.

Maykey@alien.top · 1 year ago

I don’t think a linear transformer has a serious chance to beat a standard transformer with the same number of parameters.

I do. Transformers are not good on long range area.. They perform well only if they are backed by better architectures as in case of MEGA.

Maykey@alien.top · 1 year ago

I like to imagine one research in terms of another. For example I see Luna as a cousin of RMT (core idea of both is to get smaller sequence from a bigger, but methods and goals are very different), but if you squint, you will see the similarities. Helps with breaking down whole paper to smaller parts and see how one research is different from another and how they are similar. And I reward myself with a cookie if I find similarities when papers do not mention each other. I also have a (paper) notebook where I write down notes

Disclaimer, I’m not student/researcher, but a dirty hobbyist

Maykey@alien.top · 1 year ago

Yeah, it just needs more integration of commands with llm(/go east vs east vs map actual exits or /take with take) because now it’s confusing what can be done in game actually and what is hallucination which doesn’t change real game state that much

Maykey@alien.top · 1 year ago

https://chasm.run/worlds

Well, there’s only one port there.

Also it seems LLM is not good at paring /map and look:

> look In addition to the main path leading deeper into the Vastarium, there are exits to the south and west…

> /map
Vastarium Entrance Hall, Dining Area, Bar, Restrooms
Uncanny Valley, to the northwest
Bizarre Botanical Garden, to the southeast
Labyrinthine Library, to the northeast

Also bottom part is confusing:

• • Hhgg | Vastarium | 7 -2 | 47 turns | 10
Nyaran | Uncover the truth behind the ancient artifacts

Not sure who is Hhgg (maybe me and there was character generation but I forgot it already), 7 -2 is map coordinates, 47 turns is history length, 10 is ???, “Uncover the truth…” constantly disappears so I’m not sure what it is, if it’s local area quest, global quest (there is no /quests commands afaict)

Maykey@alien.top · 1 year ago

Documentation is wrong. It says chasm_server = "chasm.run:1234". Program wants server = "tcp://chasm.run:25566"

Then client crashed when I changed key only. I had to add “tcp://” to the server. Then I got to the banner, it said “type /help”, I typed, nothing happened. I dunno if this is another instance “wouldn’t happen in telnet” or server is ovlerloaded.

Maykey@alien.top · 1 year ago

As avid mud player in the past, why do I need a client? What’s wrong with telnet? Installing client (into venv on top of it) is extremely wasteful for sending text messages to the remote server and getting them back.

Maykey@alien.top · 1 year ago

Yes. ExLlama2 is much faster IME. It also supports 8-bit cache to save even more VRAM(I don’t know if llama.cpp has it).

Maykey@alien.top · 1 year ago

My hot take is that local models will become truly feasible on phones(and in general) only once we move past transformers towards something more FLOP and memory efficient(RetNet, S5)

Maykey@alien.top · 1 year ago

At current capabilities it’s faster to query server on the opposite hemisphere than to generate locally.

Maykey@alien.top · 1 year ago

I hope openai becomes more open under new leadership, but I am not holding my breath

Maykey@alien.top · 1 year ago

My own because if I didn’t want to have control I would use ChatGPT and which I tried lack features I want: parameter randomization mid inference; generating several responses in sequence(not at once as kobold); having good editing experience(no undo tree = not for me); manual limiting of what tokens are being sent to the models(I don’t want silent trimming when I have to guess the actual context)

Maykey@alien.top · 1 year ago

No edit or reroll.
Chat only, no notebook format
Shift enter sends message instead of inserting enter
Chat history is not saved

Maykey@alien.top · 1 year ago

phi-CTNL 2

Maykey@alien.top · 1 year ago

I moved from desktop with GTX1070(and laptop with 1050) to laptop with 3080ti specifically so I can run video games when I’m not running LLM.

My only two regrets is downgrade in RAM(64GB->32GB) and storage(4TB hdd -> 2TB M.2 NVME), but it’s not critical.

I thought about upgrading desktop, but it wouldn’t be minor upgrade so after calculations it turned out getting laptop is better. ~Year and half later I still think so.

Maykey@alien.top · 1 year ago

Looks pretty standard llama. Doesn’t have base64 blob like (quantized) chatglm does.

Maykey@alien.top · 1 year ago

20B documents that are deduplicated.

I wonder if we’ll see even slimmer version