This document provides a comprehensive overview of all features in PromptMetrics, organized by functional area. These are the core features launched in the private beta (January 2026) that establish the platform’s foundation for EU-first LLM observability and prompt management.
Welcome to PromptMetrics
PromptMetrics is an EU-first LLM observability and prompt management platform designed specifically for European AI teams building compliant, observable, and cost-controlled AI applications. All P0 features are built with data sovereignty and EU AI Act compliance at their core.
Core Features
Authentication, Request Logging, Prompt Management, Analytics, Evaluations, and SDKs
Authentication & Access Control (P0 - MVP)
User Registration and Authentication
New users can create accounts with email/password authentication secured to industry standards. The platform supports both traditional email credentials and social login via Google and GitHub, all verified through EU-hosted infrastructure.
Key Capabilities:
- Email and password registration with secure password requirements (minimum 8 characters, 1 uppercase, 1 lowercase, 1 number)
- Email verification via one-time password (OTP) sent from a dedicated PromptMetrics address via EU-hosted SendGrid
- Password reset functionality with secure email links that expire after 24 hours
- OAuth support for Google and GitHub social login
- JWT tokens signed with keys stored in AWS Secrets Manager (EU region only)
- All authentication data stored exclusively in EU PostgreSQL database (AWS eu-central-1)
Default Workspace Creation
Upon signup, every new user automatically receives a personal workspace, eliminating manual setup and enabling immediate platform access.
Key Capabilities:
- Automatic personal workspace creation with the user’s name
- User assigned as Admin of their personal workspace
- Personal workspace cannot be deleted (only deactivated)
- Workspace data fully isolated from other workspaces
- Free plan users limited to 1 workspace; Pro plan users can create multiple
- Workspace stores data exclusively in EU regions
- Default retention policy: 7 days for Free plan, unlimited for Pro plan
Team Member Invitation
Workspace Admins can invite team members to collaborate on prompts, compliance reviews, and governance.
Key Capabilities:
- “Invite User” button in Settings → Team
- Invitation modal with email field and role dropdown (Viewer, Compliance Officer, Member)
- Admin role cannot be assigned via invitation
- Invitation emails sent with PromptMetrics branding from EU infrastructure
- Invitation links allow users to sign up or log in directly
- User automatically added to workspace after accepting invitation
- Free plan users cannot invite team members (feature disabled with tooltip)
- Pro plan users can invite unlimited team members
Role Management
Workspace Admins have full control over team member roles and permissions with granular access control.
Key Capabilities:
- Settings → Team page displays all team members in table format
- Table columns: Name, Email, Role (dropdown), Status (Active/Pending), Actions (Remove)
- Four role levels available:
- Admin: Full workspace management, user invitation, billing access
- Member: Create/edit prompts, view reports, access playground
- Viewer: Read-only access, export data
- Compliance Officer: Read-only access, export data, compliance report generation
- Viewer and Compliance Officer share the same read-only permissions; the Compliance Officer can additionally generate compliance reports
- Admin cannot change their own role if they are the last Admin
- Workspace must have at least one Admin at all times
- Role changes logged in audit trail
Request Logging & Observability (P0 - MVP)
Automatic Request Logging
The SDK automatically captures all LLM requests with complete context, enabling comprehensive observability without manual instrumentation.
Key Capabilities:
- Automatic logging of all LLM requests via SDK
- Captured data: prompts, responses, parameters, timestamps, cost, latency
- API keys stored encrypted (AES-256) on backend
- LLM requests executed locally by SDK (API keys never sent to PromptMetrics)
- All request data stored exclusively in EU (AWS eu-central-1)
- Streaming responses supported with delta-by-delta logging
- Time-to-first-token (TTFT) tracked separately from total latency
- Logs stored in append-only format (immutable)
- Cryptographic integrity verification (hash chain) applied to logs for EU AI Act compliance
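The append-only, hash-chained log format can be illustrated with a minimal sketch: each entry records the hash of its predecessor, so modifying or reordering any entry invalidates every later hash. The field names and chaining scheme below are illustrative assumptions, not the platform's actual storage format.

```python
import hashlib
import json

def entry_hash(record: dict, prev_hash: str) -> str:
    """Hash a log record together with the previous entry's hash (illustrative scheme)."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def verify_chain(entries: list) -> bool:
    """Recompute the chain; any tampered or reordered entry breaks verification."""
    prev = "0" * 64  # genesis value
    for entry in entries:
        if entry["hash"] != entry_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True

# Build a two-entry chain (record fields are hypothetical)
log, prev = [], "0" * 64
for record in [{"prompt": "Hello", "cost": 0.0004}, {"prompt": "Hi again", "cost": 0.0003}]:
    h = entry_hash(record, prev)
    log.append({"record": record, "hash": h})
    prev = h

print(verify_chain(log))  # True; changing any logged field flips this to False
```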
Request History Search and Filtering
Developers and compliance officers can search and filter request history for debugging, optimization, and compliance review.
Key Capabilities:
- Request history page with searchable interface
- Advanced filter options:
- Date Range (Last 7/30/90 days, Custom range)
- User ID, Request Group, Risk Level, Model, Prompt Template
- Compliance Status (Approved, Pending Review, Flagged)
- Request detail view shows full context (prompt, response, metadata, compliance fields)
- Shareable request URLs with permission controls
- Export to CSV/JSON for audits
- Compliance report generation from filtered requests
Metadata and Tags
Developers can add custom metadata and tags to requests for categorization, filtering, and compliance tracking.
Key Capabilities:
- SDK supports pl_tags parameter for custom tags
- Metadata dictionaries for structured data (user_id, session_id, location)
- Compliance tags available: risk_level, use_case, ai_act_category
- Metadata-based search in request history
- Tags displayed in request detail view
- Tags and metadata included in exported data
- SDK method: client.track.metadata(request_id, metadata={...}) (see the usage sketch after this list)
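A minimal usage sketch of the tagging and metadata capabilities above. Only pl_tags, return_pl_id, and client.track.metadata(request_id, metadata={...}) are named in this document; the package name, client constructor, and OpenAI-style call shape are assumptions based on the drop-in SDK described later.

```python
# Hypothetical package and constructor; pl_tags, return_pl_id, and client.track.metadata
# come from the capability list above.
from promptmetrics import PromptMetrics

client = PromptMetrics(api_key="pm-...", region="eu-central-1")

response, pl_request_id = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarise this invoice."}],
    pl_tags=["billing", "risk_level:limited-risk"],  # custom and compliance tags
    return_pl_id=True,                               # also return the PromptMetrics request ID
)

# Attach structured metadata afterwards for filtering and compliance review
client.track.metadata(
    pl_request_id,
    metadata={"user_id": "u_123", "session_id": "s_456", "location": "DE"},
)
```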
Request ID Tracking
Each LLM call receives a unique request ID for tracking, scoring, and grouping related requests.
Key Capabilities:
- Each request assigned a unique pl_request_id
- SDK parameter return_pl_id returns the request ID
- Request ID used to associate metadata, scores, templates
- Request grouping by conversation ID via pl_tags
- SDK method: client.track.group(request_id, group_id) (see the sketch after this list)
- Grouped requests visible in UI with timeline visualization
- Request ID included in compliance audit trail
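Building on the hypothetical client from the previous sketch, grouping conversation turns uses the documented client.track.group(request_id, group_id) method; the loop and identifiers are illustrative.

```python
# Group several turns of one conversation so they appear together in the timeline view.
conversation_id = "conv_2026_0142"  # hypothetical group/conversation ID

for turn in ["What does Article 12 require?", "And Article 19?"]:
    response, pl_request_id = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": turn}],
        return_pl_id=True,
    )
    client.track.group(pl_request_id, conversation_id)
```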
EU AI Act Compliance Fields (P0 - MVP)
Automated Risk Level Flagging
Compliance officers can leverage automated risk detection to identify high-risk AI systems and ensure EU AI Act compliance.
Key Capabilities:
- Risk level classification: Prohibited, High-risk, Limited-risk, Minimal-risk
- Automated detection based on:
- Personal information
- Biometric data
- Healthcare information
- Financial data
- Critical infrastructure
- Law enforcement
- Confidence score (0-100%) for each automated flag
- Low confidence flags (<70%) highlighted for manual review
- Compliance Officer can review and override automated flags (logged in audit trail)
- Use case categorization per EU AI Act Annex III
- High-risk categories tracked: Biometric Identification, Critical Infrastructure, Education, Employment, Essential Services, Law Enforcement, Migration/Border Control, Justice/Democracy
Prompt Management (P0 - MVP)
Prompt Registry (CMS)
Centralized prompt management system enabling prompt engineers to create, organize, and manage prompts without hardcoding them.
Key Capabilities:
- Visual prompt editor in dashboard
- Create, read, update, delete (CRUD) prompt operations
- Support for multiple message types (system, user, assistant)
- Variable substitution with Jinja2 syntax
- Folder organization for prompts
- Search and filter prompts by name, tags, folder
- Compliance annotations per prompt (risk level, use case)
- Prompt preview before saving
- Prompt templates stored in EU PostgreSQL database
Prompt Versioning
Automatic version creation when prompts are modified, providing complete history and rollback capability.
Key Capabilities:
- New version created automatically when prompt text/content changes
- Version created when system/user/assistant message structure changes
- Version created when model selection or parameters change
- In-place updates (no new version) for metadata edits, folder changes, compliance annotations
- Version history with timestamps and authors
- Version comparison view (side-by-side diff) with text changes highlighted
- Rollback to previous versions
- Production flag toggle per version
- Change reason documentation required for high-risk prompts
Prompt Deployment
Control which prompt version is returned by the API, enabling production updates without code changes.
Key Capabilities:
- Release labels: Production, Staging, Development, Custom
- Only one Production version per prompt
- API call client.templates.get("prompt_name") returns the Production version by default (usage sketch after this list)
- API call client.templates.get("prompt_name", version=2) returns a specific version
- API call client.templates.get("prompt_name", label="staging") returns the staging version
- Changing production version requires confirmation modal
- Production version shows green “Production” badge in UI
- Dynamic prompt updates without code deployment
- Deployment approval logs for audit trail
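The three client.templates.get() calls above can be combined into a short fetch sketch; the prompt name is a placeholder and the client is the same hypothetical setup used in earlier sketches.

```python
# Fetch prompt versions by label or explicit version number.
prod = client.templates.get("invoice_summary")                       # Production label (default)
pinned = client.templates.get("invoice_summary", version=2)          # pin an exact version
staging = client.templates.get("invoice_summary", label="staging")   # pre-production testing
```

Pinning a version is useful for reproducible evaluations, while labels let the dashboard promote a new Production version without a code deployment.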
Template Execution
Execute prompt templates with variable substitution to dynamically generate prompts for LLM requests.
Key Capabilities:
- template.run() method executes template with variables (see the sketch after this list)
- Variable substitution at runtime using Jinja2
- Automatic request logging with compliance fields
- Streaming response support
- Input validation and sanitization
- Error handling for missing variables
- Template execution includes compliance metadata
- Execution results logged immutably in EU database
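A minimal execution sketch using the documented template.run() method with Jinja2-style variables; the keyword names and variable values are illustrative assumptions.

```python
template = client.templates.get("invoice_summary", label="staging")

result = template.run(
    variables={"customer_name": "ACME GmbH", "line_items": 12},  # substituted via Jinja2
    stream=False,  # streaming responses are also supported per the list above
)
print(result)  # the execution is logged immutably with its compliance metadata
```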
Interactive Testing Playground (P0 - MVP)
No-Code Prompt Testing
Prompt engineers can test prompts without writing code, iterating quickly and validating before deployment.
Key Capabilities:
- Default view: Model selector, Prompt text area, Variable inputs, Risk level indicator, Run button
- Advanced settings: Temperature, Max Tokens, Top P, Frequency Penalty, Presence Penalty
- Support for all OpenAI parameters
- Function calling/tools support
- Model selection across providers (OpenAI, Anthropic, Google)
- Risk level auto-detected and displayed
- Real-time streaming output display
- Saved preferences (last-used settings per workspace)
Analytics & Cost Governance (P0 - MVP)
Analytics Dashboard
High-level analytics overview for monitoring usage, costs, and compliance status.
Key Capabilities:
- High-level usage overview (total requests, active users)
- Cost breakdown by model and prompt template
- Template-level analytics (usage frequency, performance)
- Risk category distribution pie chart
- Risk trend line chart (usage over time)
- Request volume charts (daily, weekly, monthly)
- Error rate tracking
- Compliance status overview
Cost Tracking
Automatic cost calculation for LLM requests enabling cost monitoring and optimization (a worked example follows the list below).
Key Capabilities:
- Automatic cost calculation by model pricing (OpenAI, Anthropic, etc.)
- Per-request cost display in request detail view
- Total costs by time period (daily, weekly, monthly)
- Cost by prompt template and model breakdown
- Budget alerts when approaching spending limits
- Cost export to CSV for accounting
- Cost trends visualization in dashboard
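As a worked illustration of per-request cost calculation, the sketch below multiplies input and output token counts by per-model prices. The price values are placeholders for illustration only, not actual provider or PromptMetrics pricing.

```python
# Placeholder per-1K-token prices in EUR; real pricing differs by provider and changes over time.
PRICES = {
    "gpt-4o-mini": {"input": 0.00014, "output": 0.00055},
    "claude-haiku": {"input": 0.00023, "output": 0.00115},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """cost = input_tokens/1000 * input_price + output_tokens/1000 * output_price"""
    price = PRICES[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]

# A request using 1,200 input tokens and 350 output tokens
print(f"{request_cost('gpt-4o-mini', 1200, 350):.6f} EUR")
```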
Latency Monitoring
Track latency metrics for all requests to identify performance issues and optimize prompts.
Key Capabilities:
- Total request duration tracking (start to finish)
- Time to first token (TTFT) measurement for streaming requests
- Streaming performance metrics
- Latency by model, prompt template, and region
- Latency percentiles (P50, P90, P99)
- Latency alerts when exceeding thresholds
- Historical latency trends
Error Tracking
Automatic error logging and grouping for quick identification and resolution of issues.
Key Capabilities:
- Automatic error logging for all failed requests
- Stack traces and error details captured
- Error grouping by type and message
- Error filtering by date, model, prompt
- Error rate trends over time
- Alert integration for critical errors
- Incident reporting for high-risk systems
Compliance Monitoring
Real-time compliance metrics ensuring EU AI Act and GDPR compliance.
Key Capabilities:
- High-risk system performance metrics dashboard
- Human oversight coverage percentage
- Transparency disclosure status tracking
- Log retention compliance verification
- Data residency verification (100% EU storage)
- Audit readiness score
- Compliance alerts for threshold violations
- Compliance report generation (EU AI Act, GDPR)
Token Usage Tracking
Detailed token usage metrics for cost optimization and efficiency monitoring.
Key Capabilities:
- Automatic token counting per request (input & output tokens)
- Token consumption by prompt template
- Token usage trends over time (daily/weekly/monthly)
- Token efficiency metrics (tokens per successful request)
- Model-specific token pricing
- Token budget alerts at workspace-level thresholds
- Export token usage reports for cost allocation
Evaluation & Testing (P0 - MVP)
Base Evaluation System
Batch evaluate prompts against datasets for systematic quality testing before deployment (a programmatic sketch follows the list below).
Key Capabilities:
- Batch evaluation against datasets
- Multiple evaluation types: regression, one-off, backtest
- Side-by-side version comparison
- Evaluation result tracking over time
- Online and programmatic execution via API
- Compliance test suites for high-risk prompts
- Evaluation results stored in EU database
- Pass/fail criteria configuration
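The programmatic sketch below shows how a batch evaluation run might be triggered via the API. None of these method or argument names appear in this document; they are hypothetical placeholders illustrating the flow of evaluation types and pass/fail criteria listed above.

```python
# Hypothetical evaluation API; method and argument names are illustrative only.
run = client.evaluations.run(
    prompt="invoice_summary",
    version=2,
    dataset="invoice_test_cases",
    eval_type="regression",             # regression / one-off / backtest
    pass_criteria={"min_score": 0.8},   # configurable pass/fail threshold
)
print(run.status, run.passed, run.failed)
```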
Dataset Management
Create and manage evaluation datasets for consistent testing across multiple scenarios.
Key Capabilities:
- Create datasets from CSV/JSON upload
- Build datasets from historical requests
- Manual entry support for test cases
- Dataset versioning
- Split management for targeted testing (train/test splits)
- EU-only dataset storage (AWS S3 eu-central-1)
- Dataset search and filtering
- Dataset export to CSV/JSON
Scoring & Ranking
Rate individual request responses to provide feedback for quality improvement (a scoring sketch follows the list below).
Key Capabilities:
- User feedback (thumbs up/down) in request list view
- Five-star rating widget on request detail page
- Optional comment field for qualitative feedback
- Rating by criteria (Pro): Accuracy, Helpfulness, Safety, Compliance
- Manual RLHF through dashboard
- Multiple named scores per request
- Score-based prompt ranking in analytics
- Average rating shown in prompt analytics
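Scores can also be attached programmatically; "score" is listed among the SDK tracking methods later in this document, but the exact signature and score names below are illustrative assumptions.

```python
# Attach named scores to a logged request (hypothetical signature).
client.track.score(pl_request_id, name="accuracy", value=0.9)
client.track.score(pl_request_id, name="user_feedback", value=1)  # e.g. a thumbs-up
```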
Core SDKs (P0 - MVP)
Python SDK
Drop-in replacement for OpenAI SDK enabling minimal code changes for integration.
Key Capabilities:
- Drop-in replacement for OpenAI SDK
- Anthropic support
- Sync and async clients
- Template management methods: client.templates.get()
- Template execution with run() method
- Automatic request logging
- Tracking methods: metadata, score, group
- @traceable decorator for custom functions (see the sketch after this list)
- EU region enforcement (default: eu-central-1)
- Python 3.9+ support; PyPI package available
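A minimal sketch of the documented @traceable decorator wrapping a custom function; the import path and the wrapped logic are assumptions, consistent with the hypothetical client used in earlier sketches.

```python
# Hypothetical import path for the documented decorator and client.
from promptmetrics import PromptMetrics, traceable

client = PromptMetrics(api_key="pm-...", region="eu-central-1")

@traceable  # records the custom function call as part of the request trace
def build_summary(invoice_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarise:\n{invoice_text}"}],
    )
    return response.choices[0].message.content
```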
JavaScript/TypeScript SDK
JavaScript SDK with TypeScript support for Node.js applications.
Key Capabilities:
- Node.js, Bun, Deno support
- OpenAI and Anthropic wrappers
- TypeScript type definitions
- pl_tags and return_pl_id support
- Edge function compatibility
- EU region configuration (default: eu-central-1)
- Async/await support
- NPM package available
LangChain Integration
Native LangChain integration for automatic workflow tracking.
Key Capabilities:
- PromptMetricsCallbackHandler available (usage sketch after this list)
- Support for all LangChain LLMs
- Async request support
- Chains, agents, and memory support
- Tag and metadata support via callback
- Compliance tracking for LangChain workflows
- Distributed tracing for multi-step chains
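A minimal usage sketch, assuming PromptMetricsCallbackHandler can be imported from a LangChain integration module of the SDK; the import path, constructor arguments, and tags are assumptions, while the handler name and tag/metadata support come from the list above.

```python
# Hypothetical integration import; PromptMetricsCallbackHandler is the documented handler name.
from promptmetrics.langchain import PromptMetricsCallbackHandler
from langchain_openai import ChatOpenAI

handler = PromptMetricsCallbackHandler(
    api_key="pm-...",
    pl_tags=["langchain", "risk_level:limited-risk"],  # tag and metadata support via callback
)

llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler])
llm.invoke("Classify this support ticket by urgency.")  # each chain step is logged automatically
```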
Infrastructure & Deployment (P0 - MVP)
Cloud Deployment (EU Regions)
SaaS platform hosted exclusively in EU regions ensuring data residency compliance.
Key Capabilities:
- SaaS offering hosted in AWS eu-central-1 (Frankfurt) primary region
- Backup region: AWS eu-west-1 (Ireland)
- Multi-region support (EU only, no cross-border transfers)
- High availability architecture
- Auto-scaling capabilities
- 99.9% uptime SLA
- Data residency guarantee (100% EU)
- All databases, object storage, compute in EU regions
Security & Compliance Infrastructure
Robust security and compliance measures for sensitive data handling.
Key Capabilities:
- GDPR-compliant data handling
- Encryption at rest (AES-256) for all data
- Encryption in transit (TLS 1.3)
- API key management with encrypted storage
- Audit logging for all actions
- EU AI Act compliance by design
- Regular penetration testing
- Immutable request logs with cryptographic integrity
Pricing & Plans (P0 - MVP)
Free Plan
Entry-level plan for new users to evaluate the platform.
Key Capabilities:
- €0/month pricing
- 5,000 requests/month limit
- Request counter displays: “X / 5,000 requests used this month”
- Soft limit warning at 80% (4,000 requests)
- Hard limit blocks requests at 5,000 with upgrade prompt
- 7 days log retention
- 1 workspace (personal only)
- Cannot invite team members
- EU data residency included
Pro Plan
Professional plan for production applications with advanced features.
Key Capabilities:
- €49/month per user pricing
- 100,000 requests/month included
- Unlimited log retention
- Multiple workspaces
- Invite unlimited team members
- All advanced features: evaluations, A/B testing, advanced analytics
- Priority email support
- Advanced compliance reporting & exports
- API access for automation
Stripe Integration
Seamless payment processing for subscription management.
Key Capabilities:
- Stripe payment processing (EUR currency)
- Subscription management (monthly billing)
- Usage-based overage charges for request capacity
- EU VAT compliant invoicing
- Automatic payment method management
- Invoice generation with company details
- Payment failure notifications
- Subscription upgrade/downgrade support
Compliance Capabilities
All P0 features are designed with EU AI Act compliance as a first-class concern:
- Articles 12 & 19 (Record-Keeping & Automatic Logs): Complete automatic logging with immutable audit trails
- Transparency Disclosure: Risk flagging and categorization for high-risk systems
- Data Residency: All data stored exclusively in EU regions with no cross-border transfers
- Audit Trail: Complete history of all actions with user, timestamp, and change details