Around 1.5 months ago, I started https://github.com/michaelfeil/infinity. With the hype in Retrieval-Augmented-Generation, this topic got important over the last month in my view. With this Repo being the only option under a open license.
I now implemented everything from faster attention, onnx / ctranslate2 / torch inference, caching, better docker images, better queueing stategies. Now I am pretty much running out of ideas - if you got some, feel free to open an issue, would be very welcome!
Looks very interesting!
Will this work on a pre-AVX CPU only machine? ( I happen to be far away from a computer right now to test)