Testing LLM Applications with DeepEval

In Russian

This session will examine the practice of testing applications built on large language models (LLMs). The main focus will be on the challenges of evaluating non-deterministic LLM applications using DeepEval, a specialized tool that automates the evaluation of LLM application quality with the LLM-as-a-Judge approach and supports a variety of metrics and test scenarios. The talk will cover the tool's main concepts, and a real-world example will show how to integrate DeepEval into the development process to evaluate response relevance and monitor system quality. The target audience is machine learning engineers, AI product developers, and QA specialists who need to ensure the reliability and predictability of LLM applications.
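
To give a flavor of what the session will demonstrate, below is a minimal sketch of a DeepEval relevance check. The class and function names (LLMTestCase, AnswerRelevancyMetric, evaluate) follow current DeepEval releases and may differ between versions; the question and answer strings are purely illustrative.

# Minimal DeepEval sketch: score one LLM response for answer relevancy.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# A single test case: the user input and the application's actual output.
test_case = LLMTestCase(
    input="What is DeepEval used for?",
    actual_output="DeepEval is a framework for evaluating LLM applications.",
)

# LLM-as-a-Judge metric: a judge model scores how relevant the output
# is to the input; the check passes if the score meets the threshold.
# Requires a judge model to be configured (by default via OPENAI_API_KEY).
metric = AnswerRelevancyMetric(threshold=0.7)

# Run the metric over the test case and report a pass/fail result.
evaluate(test_cases=[test_case], metrics=[metric])

The same test case can also be run inside a pytest suite via DeepEval's assert_test helper, so a relevance regression fails the build in CI.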
