Article 10 — Data Governance
Article 10 of the EU AI Act imposes data governance requirements on providers of high-risk AI systems. It covers training, validation, and test data quality — and extends into production through the obligation to monitor data quality in real-world use.
What the Article Requires
Article 10 requires that training, validation, and test datasets:
- Are subject to appropriate data governance and management practices
- Are relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose
- Have appropriate statistical properties for the system’s intended purpose
- Are examined for possible biases that could affect health, safety, or fundamental rights
Article 10(5) additionally requires providers to apply relevant data governance and management practices throughout the entire lifecycle of the system — which means monitoring production data quality post-deployment, not just at training time.
Article 10 is primarily a design and development-time obligation. VeriProof addresses the production monitoring subset: detecting when real-world inputs or outputs diverge from the distribution the system was designed for.
Training and Validation Data (Design Time)
VeriProof does not manage training datasets or validation pipelines. For these obligations, your data governance programme should address:
- Data source documentation (where data came from, when it was collected)
- Bias analysis methodology and results
- Statistical properties of the training distribution
- Documentation of data preparation, cleaning, and augmentation steps
These are typically documented as part of your model card or system card, which forms part of the Article 11 technical documentation package.
Production Data Monitoring (Article 10(5))
Article 10(5) is where VeriProof plays a direct role. The Act’s requirement to ensure data quality throughout the lifecycle means your governance system must detect when production inputs differ materially from the training distribution — a sign that the system is operating outside the conditions it was designed for.
What to Monitor
For LLM-based systems, useful production data quality signals include:
| Signal | What it detects | How to capture |
|---|---|---|
| Input token distribution | Unusual input lengths or vocabulary | Session metadata |
| Input language | Non-intended languages appearing in production | Adapter metadata |
| Topic drift | Inputs on topics not represented in training | Governance scoring dimension |
| Demographic patterns in inputs | Potential sampling bias in real-world usage | Metadata enrichment |
| Refusal rate shifts | Model encountering inputs it wasn’t trained on | Governance score signal |
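The signals in the table above are derived from each session before they are attached as metadata. A minimal sketch of that derivation step is shown below; the function name, metadata keys, and the language heuristic are all illustrative assumptions, not part of any real VeriProof API, and a production system would use a real tokenizer and language detector.

```python
# Illustrative sketch: deriving per-session data-quality signals from a raw
# prompt. All names and the `isascii` language heuristic are placeholders.

def build_data_quality_metadata(prompt: str, refused: bool) -> dict:
    """Derive Article 10 data-quality signals for one session."""
    tokens = prompt.split()  # stand-in for your real tokenizer
    return {
        "input_token_count": len(tokens),
        "input_char_count": len(prompt),
        # Placeholder heuristic; use a proper language detector in practice
        "detected_language": "en" if prompt.isascii() else "unknown",
        "refused": refused,
    }

meta = build_data_quality_metadata("Summarise this contract clause.", refused=False)
```

Whatever fields you choose, compute them consistently across sessions so that period-over-period comparisons remain meaningful.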
Configuring Production Data Monitoring
VeriProof’s current control surface is centered on captured session metadata plus shared governance settings, not a generic per-application drift-rule builder. To monitor Article 10 signals in practice:
- Emit the production data-quality signals you care about as session metadata from your SDK adapter, such as detected language, input length, topic category, refusal reason, or other domain-specific input characteristics.
- Use Settings → Governance Policies to require the declarations and process controls that should always be present for the monitored workflow.
- Use Settings → Governance Thresholds to alert when platform-wide oversight, grounding, or guardrail rates fall below your accepted operating level.
- Use Compliance → Scoring Settings to control how partial, critical, major, and minor findings contribute to framework scores reviewed by compliance teams.
Refer to the SDK adapter guide for instructions on enriching sessions with the metadata fields you want to analyse.
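As a rough illustration of the enrichment step described above, the sketch below attaches the metadata fields named earlier to a session object. The `Session` class and `set_metadata` method are hypothetical stand-ins; the actual adapter interface is defined in the SDK adapter guide.

```python
# Hypothetical sketch of enriching a session with Article 10 metadata.
# `Session` and `set_metadata` are illustrative names only; consult the
# SDK adapter guide for the real adapter interface.

class Session:
    """Minimal stand-in for an SDK session object."""

    def __init__(self):
        self.metadata: dict = {}

    def set_metadata(self, **fields):
        """Merge data-quality fields into the session's metadata."""
        self.metadata.update(fields)

session = Session()
session.set_metadata(
    detected_language="en",
    input_length=412,
    topic_category="contract-review",
    refusal_reason=None,
)
```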
Generating Data Quality Evidence
To produce Article 10 evidence for your annual conformity assessment:
- Go to Compliance → Evidence Exports
- Choose the EU AI Act framework and select Article 10
- Set the date range
- Enable Include blockchain proofs
- Click Download Evidence Pack (PDF)
The package includes:
- Input distribution summary (token counts, detected languages, topic signals)
- Drift detection summary (comparison to baseline distribution established at deployment)
- Sessions flagged for data quality concerns, with full payload for review
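To make the drift detection summary concrete, the sketch below computes a population stability index (PSI) between the baseline input-length distribution captured at deployment and the current reporting period. The bucketing, the example numbers, and the 0.2 threshold convention are assumptions for illustration, not VeriProof's actual drift method.

```python
import math

# Illustrative PSI drift check between a baseline distribution (established
# at deployment) and the current period. Bucket shares must sum to 1 and use
# the same bucket boundaries in both distributions.

def psi(baseline: list[float], current: list[float]) -> float:
    """Population stability index over matched buckets; higher = more drift."""
    eps = 1e-6  # guard against log(0) on empty buckets
    return sum(
        (c - b) * math.log((c + eps) / (b + eps))
        for b, c in zip(baseline, current)
    )

# Example: shares of input token counts in buckets [<50, 50-200, 200-1000, >1000]
baseline = [0.20, 0.50, 0.25, 0.05]
current = [0.10, 0.40, 0.35, 0.15]
drift = psi(baseline, current)  # ~0.235 here; >0.2 is often treated as material
```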
Bias Monitoring in Production
Article 10(2)(f) requires datasets to be examined for possible biases. While bias analysis typically happens at training time, detecting bias in production outputs is increasingly expected as part of the system’s ongoing risk management.
VeriProof supports production bias review when your SDK adapter emits the relevant grouping metadata alongside each session. Capture the user-context or demographic fields appropriate for your use case, then use evidence exports, session review, and your governance workflows to assess whether refusal rates, outcomes, or other monitored signals differ materially across groups. What constitutes a meaningful grouping depends entirely on your system’s use case and the population your model serves.
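A cross-group comparison over exported sessions can be sketched as below. The `group` and `refused` fields stand in for the hypothetical grouping metadata your adapter would emit, and the 10-point disparity threshold is an assumption you would replace with your own accepted level.

```python
from collections import defaultdict

# Illustrative cross-group refusal-rate check over exported session records.
# Field names and the disparity threshold are assumptions for illustration.

def refusal_rates(sessions: list[dict]) -> dict[str, float]:
    """Refusal rate per grouping value."""
    counts = defaultdict(lambda: [0, 0])  # group -> [refusals, total]
    for s in sessions:
        counts[s["group"]][0] += s["refused"]
        counts[s["group"]][1] += 1
    return {g: refusals / total for g, (refusals, total) in counts.items()}

sessions = [
    {"group": "region-a", "refused": False},
    {"group": "region-a", "refused": False},
    {"group": "region-b", "refused": True},
    {"group": "region-b", "refused": False},
]
rates = refusal_rates(sessions)
# Flag for review if the gap between groups exceeds your accepted disparity
flagged = max(rates.values()) - min(rates.values()) > 0.10
```

A flagged disparity is a trigger for human review, not a conclusion in itself; apparent gaps may reflect legitimate differences in the inputs each group submits.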
Documentation for Auditors
Your Article 10 documentation package should include:
- Training data provenance and governance summary (produced by your data team separately)
- VeriProof’s production data monitoring configuration and threshold rationale
- Production data quality report for the period under review
- Any flagged drift or bias incidents and the corrective actions taken
Next Steps
- Article 9 — Risk Management — monitoring thresholds and corrective action
- Article 11 — Technical Documentation — integrating data quality evidence into documentation packages
- Governance Scoring guide — advanced scoring configuration