I want to know the tools and methods you use for the observability and monitoring of your ML (LLM) performance and responses in production.
I am a data scientist at Fiddler. Fiddler AI (https://www.fiddler.ai) provides a nice set of tools for LLMOps and MLOps observability. It supports pre-production and post-production monitoring for both predictive models and generative AI.
Specifically, Fiddler Auditor (https://github.com/fiddler-labs/fiddler-auditor) is an open-source package that can be used to evaluate LLMs and NLP models. In addition to that, Fiddler provides helpful tools for monitoring and visualization of NLP data (e.g., text embeddings), which can be used for data drift detection, user/model feedback analysis, and evaluation of safety metrics as well as custom metrics. A rough sketch of the embedding-drift idea is below.
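To make the embedding-based drift detection concrete, here is a minimal generic sketch (not Fiddler's internal implementation) using sentence-transformers and scipy. The embedding model, the sample texts, and the 0.3 threshold are all illustrative assumptions; the idea is just to compare the embedding distribution of production responses against a baseline sample.

```python
import numpy as np
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

# Illustrative model choice -- any sentence embedding model works here.
model = SentenceTransformer("all-MiniLM-L6-v2")


def embed(texts):
    """Embed a batch of texts into dense vectors."""
    return model.encode(texts, convert_to_numpy=True)


def centroid_drift(baseline_texts, production_texts):
    """Cosine distance between the mean embeddings of two text samples.

    Values near 0.0 mean production responses look like the baseline;
    larger values suggest the response distribution has shifted.
    """
    baseline_centroid = embed(baseline_texts).mean(axis=0)
    production_centroid = embed(production_texts).mean(axis=0)
    return cosine(baseline_centroid, production_centroid)


if __name__ == "__main__":
    # Hypothetical samples: responses captured at release time vs. in production.
    baseline = [
        "To reset your password, open Settings and choose Security.",
        "You can export your data from the Account page.",
    ]
    production = [
        "I'm sorry, I can't help with that request.",
        "I'm unable to assist with that.",
    ]
    drift = centroid_drift(baseline, production)
    # 0.3 is an arbitrary illustrative threshold, not a Fiddler default.
    if drift > 0.3:
        print(f"Possible drift detected (centroid cosine distance = {drift:.3f})")
    else:
        print(f"No significant drift (centroid cosine distance = {drift:.3f})")
```

In practice you would run a comparison like this per time window and alert on sustained increases, rather than on a single batch.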