Covid-Plannedemic_@alien.topB to LocalLLaMA@poweruser.forum · English · 2 years ago

Yi-23B-Llama: Distil version of Yi-34B-Llama

ByteWave/Yi-23B-Llama · Hugging Face
huggingface.co
  • sergeant113@alien.topB · 2 years ago

    Can’t wait!!!

  • mcmoose1900@alien.topB · 2 years ago

    There are in fact 3 different distillations: https://huggingface.co/collections/ByteWave/distil-yi-models-655a5697ec17c88302ce7ea1

    It's not the 200K model, though.

    • a_beautiful_rhind@alien.topB · 2 years ago

      Which is a shame because the same performance + the extra context would have been huge.

  • kristaller486@alien.topB · 2 years ago

    Is there code for the distillation?

    • llama_in_sunglasses@alien.topB · 2 years ago

      I had okayish results blowing up layers from 70b… but messing with the first or last 20% lobotomizes the model, and I didn’t snip more than a couple layers from any one place. By the time I got the model far enough down in size that q2_K could load in 24GB of VRAM it fell apart, so I didn’t consider mergekit all that useful of a distillation/parameter reduction process.
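For anyone curious what the layer-snipping approach above looks like, here is a minimal, hypothetical sketch of the heuristic described: protect roughly the first and last 20% of the decoder stack and only drop a few evenly spaced layers from the middle. The function is made up for illustration; mergekit itself expresses this as a YAML `slices` config rather than code like this.

```python
# Hypothetical sketch of the layer-pruning heuristic described above:
# leave the first and last ~20% of transformer layers untouched and
# drop layers only from the middle band, spread out evenly.

def layers_to_keep(n_layers: int, n_drop: int, protect_frac: float = 0.2) -> list[int]:
    """Return the layer indices to keep after dropping n_drop layers
    from the middle of the stack."""
    protect = int(n_layers * protect_frac)
    middle = list(range(protect, n_layers - protect))
    if n_drop > len(middle):
        raise ValueError("cannot drop that many layers without touching protected ends")
    # Spread the dropped layers evenly across the middle band.
    step = len(middle) / (n_drop + 1)
    dropped = {middle[int(step * (i + 1))] for i in range(n_drop)}
    return [i for i in range(n_layers) if i not in dropped]

# Llama-2-70B has 80 decoder layers; try removing 10 from the middle.
keep = layers_to_keep(80, 10)
```

As the comment notes, the resulting model still needs heavy evaluation: even with the ends protected, pruning this aggressively tends to degrade quality.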

    • mcmoose1900@alien.topB · 2 years ago

      Oh yeah, it be busted.
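On the question above about distillation code: none appears to be published for these models, but the classic logit-distillation objective (soft teacher targets at a temperature T, as opposed to the layer-removal experiments discussed in this subthread) can be sketched as follows. This is purely illustrative and is not ByteWave's actual recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    the Hinton-style distillation objective. Scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

loss_same = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])  # zero: distributions match
loss_diff = distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])  # positive: they disagree
```

In a real training loop this term is usually mixed with the ordinary cross-entropy loss on the hard labels.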

  • roselan@alien.topB · 2 years ago

    and of course TheBloke already prepped everything for our fine consumption.

    • LocoMod@alien.topB · 2 years ago

      Had the same problem last night and I promptly deleted it.

  • mpasila@alien.topB · 2 years ago

    Did anyone manage to get them working? I tried GGUF/GPTQ and running them unquantized with trust-remote-code, and they just produced garbage. (I tried removing BOS tokens as well; same thing.)

    • Jelegend@alien.topB · 2 years ago

      Yeah, exactly the same thing. It produced absolute rubbish whatever I tried, across the 8B, 15B, and 23B.

    • watkykjynaaier@alien.topB · 2 years ago

      I’ve completely fixed gibberish output on Yi-based and other models by setting the RoPE frequency scale to a number less than one (one appears to be the default). I have no idea why that works, but it does.

      What I find even more strange is the models often keep working after setting the frequency scale back to 1.

      • Aaaaaaaaaeeeee@alien.topB · 2 years ago

        What value specifically worked?
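On why a RoPE frequency scale below one can change behavior: the scale multiplies every rotary angle, so at scale 0.5 position 2p produces exactly the rotations that position p produced at scale 1, stretching the effective position range. A small illustrative sketch of the mechanism (llama.cpp exposes the knob as `--rope-freq-scale`; the exact value that fixed the output above isn't stated in the thread):

```python
import math

def rope_angles(pos: int, dim: int, base: float = 10000.0, freq_scale: float = 1.0):
    """Rotation angles used by rotary position embeddings (RoPE) for one
    token position. freq_scale < 1 shrinks every angle uniformly, which
    is the 'linear scaling' trick for stretching context."""
    return [
        freq_scale * pos * base ** (-2 * i / dim)
        for i in range(dim // 2)
    ]

# Halving the frequency scale makes position 2048 rotate like position 1024:
a = rope_angles(2048, 128, freq_scale=0.5)
b = rope_angles(1024, 128, freq_scale=1.0)
```

If a model's config declares a scaled RoPE but the loader ignores it (or vice versa), every position is rotated wrongly, which is one plausible source of the gibberish reported above.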

  • vasileer@alien.topB · 2 years ago

    Did you test the model before advertising it?

    • bearbarebere@alien.topB · 2 years ago

      Lmao

  • ltduff69@alien.topB · 2 years ago

    I haven’t had any issues running these Yi models. I think they are really good personally.

    https://preview.redd.it/xddjserqii1c1.jpeg?width=3024&format=pjpg&auto=webp&s=bd9b3124954ff5d6a7c3452b857949d8363c9e87

    • No_Afternoon_4260@alien.topB · 2 years ago

      You took a picture of nous capybara…

      • ltduff69@alien.topB · 2 years ago

        Yeah I am kinda petty lol.
