Communick News
PookaMacPhellimen@alien.topB to LocalLLaMA@poweruser.forum · English · 2 years ago

Qwen-72B released

Qwen/Qwen-72B · Hugging Face (huggingface.co)
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

39 comments
  • norsurfit@alien.topB · 2 years ago

    In my informal testing, Qwen72b is quite good. I anecdotally rate it stronger than Llama 2 from the few tests that I have conducted.

    • Secret_Joke_2262@alien.topB · 2 years ago

      Which tests did you run it on?

      I’m very interested in storytelling and RP.

  • balianone@alien.topB · 2 years ago

    Can it beat GPT-3.5-turbo?

  • polawiaczperel@alien.topB · 2 years ago

    Would it be possible to merge it with DeepSeek Coder 33B?

  • Postorganic666@alien.topB · 2 years ago

    Is it censored?

  • drooolingidiot@alien.topB · 2 years ago

    This is amazing. Yesterday we got Deepseek, and today we’re getting Qwen. Thank you for releasing this model!

    I’m looking forward to seeing comparisons.

    • lunar2solar@alien.topB · 2 years ago

      Is there any free website where I can test those Chinese models? Thanks.

      • roselan@alien.topB · 2 years ago

        For DeepSeek there is https://chat.deepseek.com/; for Qwen I don’t know.

  • ambient_temp_xeno@alien.topB · 2 years ago

    The first thing I looked for was the number of training tokens. I think Yi-34B benefited a lot from its 3 trillion training tokens, so this model also having 3 trillion bodes well.

  • ASL_Dev@alien.topB · 2 years ago

    Their own quantized checkpoints for running with transformers:
    https://huggingface.co/Qwen/Qwen-72B-Chat-Int8
    https://huggingface.co/Qwen/Qwen-72B-Chat-Int4

  • PookaMacPhellimen@alien.topOPB · 2 years ago

    https://github.com/QwenLM/Qwen

    Also released was a 1.8B model.

    From Binyuan Hui’s Twitter announcement:

    “We are proud to present our sincere open-source works: Qwen-72B and Qwen-1.8B! Including Base, Chat and Quantized versions!

    🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.

    🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.

    We are committed to continuing our dedication to the open-source community and thank you all for your enjoyment and support! 🚀 Finally, Happy 1st birthday ChatGPT. 🎂 “

    • candre23@alien.topB · 2 years ago

      “we have expanded the context window length to 32K”

      Kinda buried the lede here. This is far and away the biggest feature of this model. Here’s hoping it’s actually decent as well!

      • jeffwadsworth@alien.topB · 2 years ago

        Well, it depends on how well it keeps the context resolution. Did you see that comparison sheet on Claude and GPT-4? Astounding.

        • domlincog@alien.topB · 2 years ago

          https://preview.redd.it/c5k1ugynhj3c1.png?width=1100&format=png&auto=webp&s=4024b3e295ab740f341e132b9d9662104fdc09ef

    • rePAN6517@alien.topB · 2 years ago

      My heart skipped a beat because I thought it said Qwen-1.8T.
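
The 32K window discussed above is also a memory problem, because the KV cache grows linearly with context. A back-of-envelope sketch (the layer count and hidden size are assumptions taken from Qwen-72B’s published configuration; fp16 cache, a single sequence, full multi-head attention with no grouped-query sharing):

```python
# Back-of-envelope fp16 KV-cache size for Qwen-72B at long context.
# Assumptions: 80 layers, hidden size 8192; K and V each cache `hidden`
# values per token per layer. Verify against the model's config.json.
LAYERS = 80
HIDDEN = 8192
FP16_BYTES = 2

def kv_cache_gib(seq_len: int) -> float:
    """Approximate KV-cache size in GiB for a single sequence."""
    # Factor of 2 = one K tensor and one V tensor per layer.
    return 2 * LAYERS * HIDDEN * seq_len * FP16_BYTES / 2**30

print(kv_cache_gib(32768))  # full 32K window -> 80.0 (GiB), on top of weights
```

Under these assumptions the cache alone at 32K would dwarf a pair of 3090s; a grouped-query or quantized KV cache, if the model or the serving stack uses one, shrinks that figure proportionally.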

  • Wonderful_Ad_5134@alien.topB · 2 years ago

    If the US keeps going full woke and is too afraid to work as hard as possible on the LLM ecosystem, China won’t wait twice before winning this battle (which is basically the 21st-century battle in terms of technology).

    Feels sad to see the US decline like that…

  • carbocation@alien.topB · 2 years ago

    It would be great to see GGUF versions. (At least, my workflow right now goes via Ollama.) How are people running Qwen-72B locally right now?

  • PookaMacPhellimen@alien.topOPB · 2 years ago

    https://preview.redd.it/sdofti9odg3c1.jpeg?width=1792&format=pjpg&auto=webp&s=d6f56d56c3596924ea61e1e5429018c0222907d2

    Amazing capabilities on some benchmarks if true.

    • Disastrous_Elk_6375@alien.topB · 2 years ago

      big if true

    • a_slay_nub@alien.topB · 2 years ago

      Bit disappointed by the coding performance, but it is a general-purpose model. It’s insane how good GPT-3.5 is for how fast it is.

      • ambient_temp_xeno@alien.topB · 2 years ago

        Apparently the chat version scores about 64 on HumanEval.

    • Secret_Joke_2262@alien.topB · 2 years ago

      What do these benchmarks mean for an LLM? There are many scores, and I see that in most cases Qwen is better than GPT-4; in others it is worse or much worse.

      • rileyphone@alien.topB · 2 years ago

        All the cases where it is better than GPT-4 are benchmarks involving the Chinese language. OpenAI is going to have a hard time getting access to extensive Chinese-language datasets, so it’s not surprising a 72B model can beat GPT-4 there, though it’s still impressive in its own right.

  • Secret_Joke_2262@alien.topB · 2 years ago

    Now everyone is most interested in how much better it is than Llama 2 70B.

  • perlthoughts@alien.topB · 2 years ago

    Very nice.

  • extopico@alien.topB · 2 years ago

    I wonder what the performance degradation is after quantizing. For other models, some users reported that quantizing greatly affected other-language capabilities, and this model seems to be at least 50% Chinese.

    • Art10001@alien.topB · 2 years ago

      I’ve seen that ChatGLM began talking in mixed Chinese/English when asked “What tips do you have for a mountaineering trip?”

  • EnergyUnlucky@alien.topB · 2 years ago

    Just when I’d talked myself out of getting a second 3090…

LocalLLaMA@poweruser.forum

Community to discuss Llama, the family of large language models created by Meta AI.
