• 0 Posts
  • 9 Comments
Joined 1 year ago
Cake day: November 14th, 2023






  • Another question is about memory and context length. Does a large memory let you increase the context length with smaller models, where the parameters don't fill the memory? I feel a big context would be useful for writing books and the like.

    Of course. A long context also requires VRAM, so any memory the weights don't fill can hold more context. More VRAM is always good for LLMs and other AI work.
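
The trade-off above can be made concrete: the context lives in the KV cache, which grows linearly with context length. A rough sketch (the model shape below is a hypothetical Llama-2-7B-like configuration, not a measured number):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # Per layer, the cache stores key + value tensors:
    # 2 * context_len * n_kv_heads * head_dim elements each.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Assumed 7B-class shape: 32 layers, 32 KV heads, head_dim 128, fp16 cache
print(kv_cache_bytes(32, 32, 128, 4096) / 2**30)    # GiB at 4k context
print(kv_cache_bytes(32, 32, 128, 32768) / 2**30)   # 8x more at 32k context
```

So going from a 4k to a 32k context multiplies the cache cost by 8, which is exactly the memory a small model leaves free on a big-RAM machine.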


  • I’m using an M1 Max Mac Studio with 64GB of memory; up to 48GB of it can be used as VRAM. I don’t know how much memory your M3 Pro has, so I’m speaking for my own setup. 7B models are easy. 13B and 20B models are okay. 30B models are probably okay too. Anything above 30B is tough.
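
A back-of-the-envelope estimate shows why ~30B is roughly the ceiling with 48GB of usable VRAM (the 4.5 bits/weight figure is a typical value for a Q4_K_M-style GGUF quantization, an assumption rather than an exact file size):

```python
def model_vram_gib(n_params_billion, bits_per_weight):
    # Weights only; the KV cache and compute buffers come on top of this.
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

for size in (7, 13, 20, 30, 70):
    print(f"{size}B @ 4.5 bpw: {model_vram_gib(size, 4.5):.1f} GiB")
```

A 30B model's weights alone take roughly 16 GiB at that quantization, leaving room for context; a 70B model's weights approach the 48GB limit once cache and buffers are added.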

    One definite thing: you must use llama.cpp or one of its variants (oobabooga with the llama.cpp loader, or koboldcpp, which is derived from llama.cpp) for Metal acceleration. llama.cpp and GGUF will be your friends. llama.cpp is the only program that properly supports Metal acceleration together with model quantization.

    Using llama.cpp or its variants, I found that prompt evaluation (the BLAS matrix calculations) is very slow, especially compared to cuBLAS from the NVIDIA CUDA Toolkit. The bigger the model's parameter count, the longer prompt evaluation takes.
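
The scaling described above follows from a standard estimate: a forward pass costs roughly 2 FLOPs per parameter per token, and the whole prompt is processed as one large batched matrix multiply (the BLAS part). A sketch with a hypothetical sustained throughput, not a benchmark of any real GPU:

```python
def prompt_eval_seconds(n_params, prompt_tokens, flops_per_sec):
    # ~2 * params FLOPs per token, times the number of prompt tokens.
    return 2 * n_params * prompt_tokens / flops_per_sec

# Assumed 10 TFLOP/s of sustained throughput (hypothetical)
tflops = 10e12
print(prompt_eval_seconds(7e9, 2048, tflops))   # 7B model, 2k-token prompt
print(prompt_eval_seconds(34e9, 2048, tflops))  # ~5x longer for a 34B model
```

Evaluation time grows in direct proportion to the parameter count, which matches the observation that larger models make prompt processing noticeably slower.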

    I heard the M3 GPU design has changed quite a bit, so I’d guess BLAS may be faster, but I’m not sure…



  • My main computer is an M1 Max Mac Studio. It has 64GB of memory, and up to 48GB of it can be used as video memory. However, it is difficult to use because the modules, libraries, and software support are not very good. If you’re a software developer, you will have a tough time getting everything working well.

    I bought a 4060 Ti 16GB two months ago, and I found it very easy to get everything running with the CUDA Toolkit. With Metal on Apple Silicon I had quite a tough time: there were almost always minor problems, and sometimes no solution at all. With NVIDIA’s GPU, things like that never happened. The only problem is the small VRAM.

    I have no experience with the A770, but I’d guess the situation is similar to Metal.