• 0 Posts
  • 51 Comments
Joined 1 year ago
Cake day: October 30th, 2023

    1. Grant of Copyright License. Subject to the terms and conditions of this License, DeepSeek hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.

    I really really enjoy seeing perpetual irrevocable licenses.





  • Instead, it uses an Amazon platform known as Bedrock, which connects several A.I. systems together, including Amazon’s own Titan as well as ones developed by Anthropic and META.

    It’s a llama! :D I wonder how they can comply with the Llama license, though; I think they have more than 700M customers, and the Llama 2 license requires anyone with more than 700M monthly active users to get a separate license from Meta.

    Good to see more competitors at least. Enterprise office people are totally in MS hands, so that’s not an area where open-source end-to-end solutions have much chance of competing; the only way to get them there is if a big corp like Amazon adopts them in its infrastructure for a product like this.
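
    For reference, calling different providers' models through Bedrock looks roughly like this; a minimal sketch using boto3, where the region, model ID, and prompt are illustrative assumptions rather than anything from the article:

    ```python
    import json
    import boto3

    # Bedrock puts several providers' models behind one invoke_model API;
    # the model ID here is an example, check what is enabled in your account/region.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def ask_titan(prompt: str) -> str:
        body = json.dumps({"inputText": prompt})
        resp = client.invoke_model(
            modelId="amazon.titan-text-express-v1",
            body=body,
            contentType="application/json",
        )
        return json.loads(resp["body"].read())["results"][0]["outputText"]

    print(ask_titan("Summarize this meeting transcript in three bullet points."))
    ```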








  • Jondurbin made something like this with QLoRA.

    The explanation that GPT-4 is an MoE model doesn’t make sense to me. The GPT-4 API is 30x more expensive than GPT-3.5 Turbo. GPT-3.5 Turbo is 175B parameters, right? So if they had 8 experts of 220B each, it wouldn’t need to cost 30x more; with only one expert active per token, it would be maybe 20-50% more for API use. There was also some speculation that 3.5 Turbo is 22B. In that case it also doesn’t make sense to me that GPT-4 would be 30x as expensive.
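
    A rough back-of-the-envelope check of that argument, assuming API cost scales with active parameters per token and top-1 routing; all parameter counts are the rumored figures from the speculation above, not confirmed numbers:

    ```python
    # If cost per token scales with *active* parameters, an 8x220B MoE with
    # top-1 routing lands nowhere near 30x the cost of a 175B dense model.
    DENSE_GPT35_PARAMS = 175e9      # rumored GPT-3.5 Turbo size
    MOE_EXPERT_PARAMS = 220e9       # rumored size of each GPT-4 expert
    ACTIVE_EXPERTS_PER_TOKEN = 1    # top-1 routing assumption

    active_params = MOE_EXPERT_PARAMS * ACTIVE_EXPERTS_PER_TOKEN
    relative_cost = active_params / DENSE_GPT35_PARAMS

    print(f"Active params per token: {active_params / 1e9:.0f}B")
    print(f"Relative compute vs 175B dense: {relative_cost:.2f}x")  # ~1.26x, not 30x
    ```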


  • I upgraded from a GTX 1080 to an RTX 3090 Ti two weeks ago. I think going with an RTX 3090 / 3090 Ti / 4090 would be a good option for you. I don’t know how big a difference a stronger CPU would make; I think ExLlamaV2 has some CPU bottlenecking going on, but I have no idea what is computed on the CPU and why. There were moments during generation where it seemed to be using only one thread and maxing it out, becoming a bottleneck for the GPU. I don’t think RAM matters a lot unless you train and merge LoRAs and models.


  • I use DeepSeek Coder Instruct at work for writing PowerShell scripts and some help with troubleshooting. I set up a PC that wasn’t used anymore and had a Quadro RTX 4000 with the DeepSeek Coder Instruct 6.7B model and shared it among the team; earlier I was sharing the 33B version from home to my work computer and using it at work only myself. I find it better than Bing Chat Enterprise for my use case: it’s much faster and I don’t have to fight with it just to get it to generate some code. It’s also all local, so I don’t have to worry about how private Bing Chat Enterprise actually is.

    At home I use various models for questions that I would be too embarrassed to ask a real human, or that I would have to pay a human to answer. It’s a really big deal for me to have some private pocket intelligence that I can talk to, one that won’t remember what I talked with it about and won’t log the conversation god knows where.
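
    As a sketch of how teammates can query a shared box like that, assuming the model is exposed through an OpenAI-compatible chat endpoint (which servers like text-generation-webui or llama.cpp’s server can provide); the hostname, port, and model name below are placeholders, not the actual setup:

    ```python
    import requests

    # Hypothetical address of the spare PC serving DeepSeek Coder Instruct 6.7B.
    API_URL = "http://coder-box.internal:5000/v1/chat/completions"

    def ask_coder(prompt: str) -> str:
        payload = {
            "model": "deepseek-coder-6.7b-instruct",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        }
        resp = requests.post(API_URL, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(ask_coder("Write a PowerShell script that lists services set to Automatic but not running."))
    ```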





  • I believe that GPU offloading in llama.cpp can be used to merge your VRAM and RAM. I would suggest trying some Airoboros Llama 2 70B Q3_K_M quant, and Tess-M-1.3 Q5_K_M once TheBloke makes quants. There will be some leftover space in your RAM after loading Tess, but it’s a model with 200K context, so you will need that space for the context. Max out your VRAM, maybe use a batch size of -1 to trade prompt-processing speed for more VRAM space, and try offloading with both cuBLAS and CLBlast. Last time I checked, it seemed like using CLBlast allowed offloading more layers to the GPU in the same memory footprint.
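
    A minimal sketch of that kind of partial offload using the llama-cpp-python bindings; the model filename and layer count are placeholders to be tuned against your actual VRAM:

    ```python
    from llama_cpp import Llama

    # Layers that fit go to VRAM (n_gpu_layers); the remaining layers and the
    # KV cache for the context stay in system RAM.
    llm = Llama(
        model_path="./airoboros-l2-70b.Q3_K_M.gguf",  # placeholder filename
        n_gpu_layers=40,   # raise until VRAM is full, lower if it OOMs
        n_ctx=4096,        # long-context models like Tess need far more, and more RAM
        n_batch=256,       # smaller batch = slower prompt processing, less VRAM used
    )

    out = llm("Explain what GPU offloading does in llama.cpp.", max_tokens=128)
    print(out["choices"][0]["text"])
    ```

    Whether the offload goes through cuBLAS or CLBlast is decided when llama.cpp (or llama-cpp-python) is compiled, not at load time, so comparing the two means building it twice.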