Around 1.5 months ago, I started https://github.com/michaelfeil/infinity. In my view, this topic has become important over the last month with the hype around Retrieval-Augmented Generation, and this repo is the only option of its kind under an open license.

I have now implemented everything from faster attention, ONNX / CTranslate2 / torch inference, caching and better Docker images to better queueing strategies. Now I am pretty much running out of ideas. If you have some, feel free to open an issue; they would be very welcome!

  • SlowSmarts@alien.topB
    1 year ago

    Looks very interesting!

    Will this work on a pre-AVX, CPU-only machine? (I happen to be far away from a computer right now, so I can't test it.)
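The thread doesn't say whether infinity runs without AVX. Whether torch, ONNX Runtime or CTranslate2 work on such a CPU likely depends on how each package was built. As a first step, you can check whether a Linux machine's CPU advertises AVX at all. The minimal sketch below assumes a Linux x86 system that exposes feature flags in `/proc/cpuinfo`. The helper names `cpu_flags` and `has_avx` are hypothetical and are not part of infinity.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux, features appear on lines like "flags : fpu sse2 avx ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """True if the CPU advertises the base AVX instruction set."""
    # Exact set membership, so "avx2" or "avx512f" alone do not count as "avx"
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; other OSes report this differently
    if path.exists():
        flags = cpu_flags(path.read_text())
        print("AVX:", "avx" in flags, "| AVX2:", "avx2" in flags)
    else:
        print("/proc/cpuinfo not found (not a Linux system?)")
```

If `avx` is missing, the prebuilt wheels of the inference backends may still fail. Testing on the actual machine remains the only reliable answer.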