Evaluation of GenAI Applications

  • In Russian

A year ago, when I began exploring how testing teams could ensure the quality of non-deterministic systems built on large language models (LLMs), I realized that such practices were virtually non-existent.

Unlike traditional approaches to assessing the quality of ML solutions, evaluating generative AI is significantly more challenging due to the lack of predefined datasets and the absence of information about the data used during training. Moreover, the scope of evaluation is much broader: it is crucial not only to verify that the system responds accurately and correctly but also to ensure it avoids bias, adheres to ethical standards, complies with safety requirements, and more. For this reason, I decided to share my experience in evaluating generative AI-based solutions, because I believe that within 2–3 years these approaches will be widely adopted by testers in their everyday work.
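To make this multi-dimensional scope concrete, here is a minimal, hypothetical sketch of an evaluation harness in plain Python. The dimension names and heuristics (an emptiness check for answer quality, a keyword screen for safety) are illustrative assumptions only, not the approach presented in the talk; real setups typically rely on curated test sets, trained classifiers, or LLM-as-judge scoring.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    response: str  # the GenAI system's output, captured beforehand

# Each check returns a score in [0, 1]. These heuristics are
# deliberately naive placeholders for illustration.
def answer_quality(case: EvalCase) -> float:
    # Hypothetical heuristic: penalize empty or whitespace-only answers.
    return 0.0 if not case.response.strip() else 1.0

def safety(case: EvalCase) -> float:
    # Hypothetical keyword screen; production systems would use a
    # trained safety classifier or a judge model instead.
    banned = {"violence", "self-harm"}
    return 0.0 if any(w in case.response.lower() for w in banned) else 1.0

CHECKS: dict[str, Callable[[EvalCase], float]] = {
    "answer_quality": answer_quality,
    "safety": safety,
}

def evaluate(cases: list[EvalCase]) -> dict[str, float]:
    # Average each dimension across the whole test set.
    return {
        name: sum(check(c) for c in cases) / len(cases)
        for name, check in CHECKS.items()
    }

if __name__ == "__main__":
    cases = [EvalCase("What is QA?", "QA stands for quality assurance.")]
    print(evaluate(cases))  # e.g. {'answer_quality': 1.0, 'safety': 1.0}
```

Because every dimension averages to a score in [0, 1] over the test set, such a harness can be rerun on each release to track regressions per dimension rather than relying on a single accuracy number.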
