Evaluation Manager

Description

The Evaluation Manager lets users manage datasets and evaluations in one place. Datasets and evaluations appear in a searchable, card-based view, and users can open a dataset to review its structured data. Evaluations are linked to datasets and are used to assess prompt or model outputs, which supports systematic quality assessment. Creating and modifying datasets and evaluations is limited to the Admin and Member roles.

Example

A QA specialist opens a dataset containing sample resumes and runs an evaluation to assess how consistently a prompt extracts candidate details across entries.

Requests A/B Testing

Description

Example
