Interests: programming, video games, anime, music composition

I used to be on kbin as e0qdk@kbin.social before it broke down.

  • 3 Posts
  • 420 Comments
Joined 3 years ago
Cake day: November 27th, 2023

  • If you just pulled the default version of qwen3.5 from ollama’s repo, you downloaded a mediocre one that only uses ~6GB.

    Check ollama show qwen3.5 and see if you get something like this in the result:

      Model
        architecture        qwen35    
        parameters          9.7B      
        context length      262144    
        embedding length    4096      
        quantization        Q4_K_M 
    

    This is the default version I got when I first tried using ollama without any experience. It worked, but it’s a heavily quantized, lower-parameter version of the model – i.e. it’s pretty dumb – compared to what you can actually run on your hardware.
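
    For a rough sanity check on that ~6GB figure, here is a minimal sketch that parses the listing above and estimates the size of the weights alone. The bits-per-weight table is my own approximation (real values vary a bit by model), the function names are made up for illustration, and the estimate ignores the KV cache and other runtime overhead.

```python
# Approximate average bits per weight for a few GGUF quantization types.
# ASSUMPTION: ballpark figures for illustration, not exact for every model.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "F16": 16.0,
}

# The Model section quoted in the comment above.
SAMPLE = """
  Model
    architecture        qwen35
    parameters          9.7B
    context length      262144
    embedding length    4096
    quantization        Q4_K_M
"""


def parse_show_output(text):
    """Collect 'key  value' rows; the last token on each row is the value."""
    info = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skips blank lines and the bare "Model" header
            info[" ".join(parts[:-1])] = parts[-1]
    return info


def parse_param_count(value):
    """Turn a string like '9.7B' or '600M' into a parameter count."""
    scale = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
    suffix = value[-1].upper()
    if suffix in scale:
        return float(value[:-1]) * scale[suffix]
    return float(value)


def estimate_weights_gb(info):
    """Weights-only size estimate in GB (1 GB = 1e9 bytes)."""
    params = parse_param_count(info["parameters"])
    bits = APPROX_BITS_PER_WEIGHT[info["quantization"]]
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    info = parse_show_output(SAMPLE)
    print(f"{info['parameters']} @ {info['quantization']}: "
          f"~{estimate_weights_gb(info):.1f} GB of weights")
```

    Plugging in a bigger parameter count or a higher quant gives a quick idea of whether a better variant would still fit in your memory.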


  • e0qdk@reddthat.com to Selfhosted@lemmy.world: Do you host your own AI?
    3 days ago

    I started running LLMs a couple months ago on my own hardware. I have a Framework Desktop that I ordered last year, and I also recently picked up a refurbished 24GB AMD RX 7900 XTX, which I’m doing some performance testing against. The dGPU is much better for dense models, and slightly faster for MoE if I’m willing to run them at a lower quant – but it uses more power and has annoying coil whine. The Framework Desktop uses ~100W under load, is quieter, and already runs the MoE models fast enough for most of my needs – so most of my LLM use still happens on that system.

    For software: I’m using ollama on the Framework currently, but I want to replace it with just using llama.cpp directly eventually. I’ve been using llama-cli for testing the dGPU. I wrote my own chat client to interact with ollama as well as a few other programs for specific tasks.

    I’ve been using the LLMs for a mix of research (both personal and professional), entertainment, practical coding tasks (mostly debugging and brainstorming, plus a bit of UI prototyping, automatic generation of sequence diagrams for documentation, and light scripting), as well as automation of tedious tasks.

    As an example of the latter, people often email me requests to prepare data sets but don’t specify the sources they want precisely, so I have to match the name they gave against the real name in our archives. Given a lookup table to compare against, LLMs are great at mapping the imperfect name (with typos, missing prefixes, stray spaces, added or removed hyphens, etc.) to the exact name I actually need to pull the data off disk.
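
    To make the matching problem concrete, here is a small non-LLM baseline using Python’s standard difflib. This is not what I actually run (I hand the lookup table to the LLM); it is only a sketch of the same mapping, and the archive names, function names, and the 0.6 cutoff are all made up for illustration.

```python
import difflib
import re

# ASSUMPTION: hypothetical archive names, not real ones.
ARCHIVE = ["NOAA-GOES16-L2", "NOAA-GOES17-L2", "ESA-Sentinel2-MSI"]


def normalize(name):
    """Lowercase and drop spaces, hyphens, and underscores."""
    return re.sub(r"[\s\-_]+", "", name.lower())


def match_name(requested, archive_names, cutoff=0.6):
    """Return the closest archive name, or None if nothing clears the cutoff."""
    by_key = {normalize(n): n for n in archive_names}
    hits = difflib.get_close_matches(
        normalize(requested), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None


if __name__ == "__main__":
    for req in ["goes 16 l2", "Sentinal-2 MSI", "xyz"]:
        print(repr(req), "->", match_name(req, ARCHIVE))
```

    A string-similarity baseline like this handles dropped prefixes and stray punctuation, but it has no notion of context, which is where the LLM with a lookup table tends to do better.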

    As far as models go, I’m mostly using various Qwen 3.6 and Gemma4 variants. I have multiple versions of each for different purposes. llmfan46’s uncensored Qwen 3.6 35B-A3B @ Q6_K (from Hugging Face) is my default model currently.


  • “Dry humor” maybe, for the style?

    Did some searching and one term that came up was the German word “Schmunzeln” which I hadn’t heard of before.

    Wiktionary describes it as:

    “to smile slightly, smirk (not smugly, but in an amused or contented way; e.g. when witnessing an amusing scene or conversation)”

    Not sure if that’s quite right or not, but might be what you’re looking for if you want a word to describe the reaction? (Maybe someone who actually speaks German can confirm…)



  • Assuming you mean in the computer sense, it’s a device that forwards messages from one network connection to another. In home use, that typically means between Ethernet connections and/or WiFi; in industry, it’s sometimes other kinds of networks.

    Messages on the internet are usually transferred as IP packets (Internet Protocol packets). The router looks at the destination address on each packet that arrives, consults a table (“routing table”) to determine which connection to forward the message out on, and then actually copies the message onto that connection. The basic idea is pretty straightforward, but it can get complicated in real-world situations when you have multiple networks, redundant links, etc.
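
    The table lookup can be sketched in a few lines with Python’s standard ipaddress module. When several entries match, IP routers pick the most specific one (the longest prefix). The routes and interface names below are invented for illustration, and a real router does this in optimized data structures or hardware rather than a list scan.

```python
import ipaddress

# ASSUMPTION: a made-up home routing table of (network, outgoing connection).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),          # default route
    (ipaddress.ip_network("192.168.1.0/24"), "lan0"),     # wired LAN
    (ipaddress.ip_network("192.168.1.128/25"), "wlan0"),  # WiFi clients
]


def pick_interface(destination, routes=ROUTES):
    """Return the connection for the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, iface) for net, iface in routes if addr in net]
    if not candidates:
        return None  # no route: a real router would drop the packet
    _, iface = max(candidates, key=lambda entry: entry[0].prefixlen)
    return iface


if __name__ == "__main__":
    for dest in ["192.168.1.10", "192.168.1.200", "8.8.8.8"]:
        print(dest, "->", pick_interface(dest))
```

    The 0.0.0.0/0 entry matches everything, which is why it only wins when nothing more specific does.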


  • I agree that the hardware being used right now is not well suited. I don’t agree that it’s strictly necessary to use the right hardware – there’s just less tedious waiting involved for the computation to happen if you’ve got better hardware. Real-time interaction is the boundary where you need to have good enough hardware. For everything else you just have to be patient enough – sometimes absurdly so, but you could, in principle, still perform the computation.

    “LLMs are as close as we have right now, and they have miles to go. But they need hundreds of times more power than the brain does. No it won’t be soon and it won’t be with this kind of silicon processors.”

    There are people already baking LLMs into custom hardware – e.g. https://chatjimmy.ai/

    Their demo page isn’t the best LLM I’ve seen (Qwen and Gemma are much more clever and more likely to give decent results) but this is a taste of what’s possible… It gives responses at ~17000 tokens a second today.

    If I could get answers back from the best Qwen model I’ve got at that speed, I could just retry every query three times, feed it through another pass to self-assess the results, and then reply before you can blink. That would get rid of a lot of the “confidently claims knowledge about a made up subject” issue we currently see – we can do the same thing on CPUs/GPUs but you’re stuck waiting so long for the result that most people don’t bother.
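
    The retry-and-assess loop is simple to sketch. Here `generate` and `assess` are placeholder callables I am assuming for illustration (in practice both would be LLM calls), and the majority vote is just a cheap stand-in for a real self-assessment pass.

```python
from collections import Counter


def answer_with_retries(question, generate, assess, attempts=3):
    """Ask the same question several times, then let a second pass judge."""
    candidates = [generate(question) for _ in range(attempts)]
    return assess(question, candidates)


def majority_vote(question, candidates):
    """Stand-in assessor: accept an answer only if most attempts agree."""
    answer, count = Counter(candidates).most_common(1)[0]
    return answer if count > len(candidates) / 2 else None


if __name__ == "__main__":
    # Fake "model" that returns canned replies, so the sketch runs offline.
    replies = iter(["Paris", "Paris", "Lyon"])
    result = answer_with_retries(
        "What is the capital of France?",
        lambda q: next(replies),
        majority_vote,
    )
    print(result)
```

    Returning None when the attempts disagree is the useful part: it gives the program a way to say “I don’t know” instead of confidently picking one of several conflicting answers.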


  • Yes. I don’t know about the timeline for the higher bar definitions of AGI. For the lower bar definitions, we’re basically already at “good enough” today.

    If you’d told me 10 years ago that I’d be able to run a program on my computer which would let me feed in an image along with some CSS and JS files and it would then give me a correction that fixes the bug I indicated visually… I would not have believed you. Here I am in 2026 though, and I have done exactly that several times with local LLMs on my own hardware.

    That same program can also take a natural language description of characters, motivations, and a vague scenario and write a scene. Not an especially well written scene, most of the time, but good enough to get the characters from the initial conditions to ending conditions via complex intermediate steps. I can also define tools it hasn’t seen before and it can combine them in sequence to solve a problem defined in natural language.

    Is it perfectly reliable? Hell no. Is it always coherent? Definitely not. The fact that it can do as much as it can is just bonkers though. If we’re getting this far with what I strongly suspect is not the ideal architecture for general intelligence, god only knows what we’ll see when we do hit on the right architecture.


  • Hmm. I’m not exactly sure how I got there or what would work for other people, but it can be done.

    Maybe try thinking of it like pressing the clutch in a manual-transmission car? The engine might keep spinning, but if you hold down the clutch and ignore it, eventually it’ll run out of gas…

    Or maybe think of it like tuning out someone annoying chattering nearby. They might keep talking for a bit but if you ignore them, eventually they’ll get bored and shut up / leave. Even if they come back, just ignore them again if you don’t want to engage.

    Or, try focusing on sensory details instead of mental chatter. Really notice what you’re seeing/hearing/feeling without actively describing it or planning anything.

    I don’t usually stay in that state all that long, but sometimes it’s nice to just be.


  • I suspect most of them do not have an internal monologue in the same (verbose) sense that humans can have, but the relatively closely related ones (e.g. mammals, probably) likely have similar memory/sensory integration experiences. It’s possible to get your own inner monologue to “shut up” for a bit, and just be and feel and do. You can still remember an experience without talking to yourself about it as well. I suspect that closely related animals’ experience is like that – although differing based on the particular set of senses and drives unique to their species.

    The further away you go from that, the less idea I have of what’s going on (besides “state machine” of some sort). I have only the vaguest notion of what it might be like to be a spider, and even less of an idea of what it’s like to be a starfish.


  • “He writes out the entire code, and it works every time.”

    Well, I’m not sure if they’re entirely human if it actually works the first time every time – but they’re definitely not any of the LLMs I’ve encountered… :-)

    My guess is someone who is obsessive about work (the never-mutes-their-phone type) and is using AI tools. Politely check (preferably in person) to make sure you’re not waking them up in the middle of the night with off-hours requests; some people feel compelled to respond to everything immediately instead of getting back to you the next day.