I have been working on a presentation that sums up the main reasons why vector databases are often an unnecessary optimization, one that is too often promoted by vector database vendors. The slides are available here: https://vec3.ai/
Do you think vector databases are overrated?
In what instances have vector databases proved most useful in your projects, and were they commercial implementations?
Plugins for OpenSearch and Postgres are useful. Dedicated vector DBs are not, IMO.
No, everyone is crazy and not thinking at all right now. Vector databases are a great example of cargo culting, as are many other approaches in AI and ML.
I increasingly work with embedding vectors, but I keep them in memory or in a regular database column. By keeping them in a regular database you can tag ordinary records with locations within embedding spaces, and you gain all kinds of helpful clustering and joining capabilities through embeddings tuned to specific tasks. You just loop over the hydrated records. You get all the same benefits and more.
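A minimal sketch of the "regular database column" approach, under some loud assumptions: the `docs` table, the `category` column, the toy 3-dimensional vectors, and the `search` helper are all hypothetical names invented for illustration, and SQLite stands in for whatever relational database you already run. Embeddings are stored as JSON text, an ordinary WHERE clause does the filtering, and a plain loop does the ranking.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, category TEXT, embedding TEXT)"
)

# Toy data: real embeddings would come from a model, not be typed by hand.
rows = [
    ("intro to postgres", "db", [0.9, 0.1, 0.0]),
    ("tuning sql indexes", "db", [0.8, 0.2, 0.1]),
    ("baking sourdough", "food", [0.0, 0.1, 0.9]),
]
conn.executemany(
    "INSERT INTO docs (title, category, embedding) VALUES (?, ?, ?)",
    [(title, category, json.dumps(emb)) for title, category, emb in rows],
)


def search(conn, query_vec, category=None, k=2):
    """Filter with ordinary SQL, then rank the hydrated rows by brute force."""
    sql = "SELECT title, embedding FROM docs"
    params = ()
    if category is not None:
        sql += " WHERE category = ?"
        params = (category,)
    scored = [
        (cosine(query_vec, json.loads(emb)), title)
        for title, emb in conn.execute(sql, params)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


results = search(conn, [1.0, 0.0, 0.0], k=2)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

The loop is linear in the number of rows that survive the WHERE clause, so how far this scales depends on how selective the SQL filter is; that trade-off is an assumption to measure on your own data, not something this sketch settles.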
I agree on the overhype. I think you can get most of the features you are talking about through metadata tagging in vector dbs. So at that point it becomes a question of which is more affordable/quicker, and I guess we don’t definitively know.
But also to your point, some vector dbs cap the number of top-k similar results, so a db holding more matching records than the cap wouldn’t return all of them the way a SQL WHERE query would.
In terms of semantic search, you are pretty much running the same process unless you are implementing some custom distance metric, which is doable in most vector dbs.
So, you are totally correct on the cargo culting thing but there could be a benefit if it is faster/cheaper or tremendous downside if it is slower/more expensive. I guess we will never know.
But the functionality is the same if you choose the right vector db or a relational db.
** Edit **
If I am wrong, call me an idiot and let me know where I am wrong.
Locality-sensitive hashing (LSH) gets you fast multidimensional retrieval on traditional databases, so vector databases aren’t important unless you need to detect similarity in feature space. I thought vector databases were important until I learned about LSH, because I had assumed high-dimensional retrieval was slow and could not exploit concentration of measure.
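To make the LSH point concrete, here is a hedged sketch of one common variant, random-hyperplane hashing (SimHash-style), with the hash stored in an ordinary indexed integer column. This is an illustration, not necessarily the scheme the poster uses: the `items` table, the `bucket` column, the helper names, and the sizes `DIM = 8` and `N_BITS = 6` are all hypothetical choices.

```python
import random
import sqlite3


def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (as normal vectors) through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    key = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


DIM, N_BITS = 8, 6  # assumed toy sizes
planes = make_planes(DIM, N_BITS)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT, bucket INTEGER)")
conn.execute("CREATE INDEX idx_items_bucket ON items (bucket)")  # plain B-tree index

# Random stand-ins for real embeddings.
rng = random.Random(42)
vectors = {f"item{i}": [rng.uniform(-1, 1) for _ in range(DIM)] for i in range(50)}
conn.executemany(
    "INSERT INTO items (label, bucket) VALUES (?, ?)",
    [(label, lsh_key(vec, planes)) for label, vec in vectors.items()],
)


def candidates(conn, query_vec):
    """Exact-match lookup on the hash bucket; no vector index involved."""
    key = lsh_key(query_vec, planes)
    rows = conn.execute("SELECT label FROM items WHERE bucket = ?", (key,))
    return [label for (label,) in rows]


# A rescaled copy of item7 points in the same direction, so it hashes identically.
query = [2.0 * x for x in vectors["item7"]]
found = candidates(conn, query)
print(found)
```

Vectors separated by a small angle agree on most bits, so they tend to share a bucket, but a single table will miss some near neighbours. Practical setups usually keep several tables with different hyperplanes and re-rank the candidates by exact distance; both of those steps are left out here.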
Isn’t KDB a vector DB? A staple of finance.
Interesting slides. Your last bit about letting LLMs do keyword search makes a lot of sense and feels related to HyDE which might be considered to be “letting LLMs do semantic vector search”. https://github.com/texttron/hyde
Calls back to OpenAI’s experiments with WebGPT - no semantic vectors involved.