MEASURE Function
The MEASURE function translates identified AI risks into quantitative and qualitative metrics that can be tracked over time. It answers the question: how much risk is present, and is it getting better or worse?
MEASURE is where VeriProof’s governance scoring and session analytics capabilities are most directly applicable. The metrics you configure define the production risk signal; VeriProof’s infrastructure captures and stores it in a form that’s both queryable and auditable.
Relevant MEASURE Categories
MEASURE 1 — Metrics and Methods
MEASURE 1.1 Approaches to evaluate AI risks are in place.
VeriProof operationalises this practice through three shipped controls that work together:
- Settings → Governance Policies defines the formal rules VeriProof evaluates.
- Settings → Governance Thresholds sets alert thresholds for key monitoring rates.
- Compliance → Scoring Settings controls how partial and severity-weighted findings affect framework scores.
Review all three surfaces together when documenting how you measure AI risk in production.
For each scoring dimension you configure, document:
- The risk it measures (from your MAP risk identification)
- Why the threshold value was chosen (from benchmark data, regulation, or expert judgement)
- The measurement granularity (per-session, rolling average, etc.)
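One lightweight way to keep this documentation next to the configuration it describes is a small typed registry. This is an illustrative sketch only, not a VeriProof structure; the dimension key, risk text, and threshold value below are placeholders for your own configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DimensionDoc:
    dimension: str    # dimension key as configured in Scoring Settings
    risk: str         # the MAP-identified risk this dimension measures
    rationale: str    # why the threshold value was chosen
    granularity: str  # e.g. "per-session" or "rolling-average"
    threshold: float  # alert threshold from Governance Thresholds

# Example entry; all values here are hypothetical.
DIMENSION_DOCS = [
    DimensionDoc(
        dimension="grounding",
        risk="Hallucinated claims in customer-facing answers",
        rationale="Benchmarking showed scores below 0.7 correlate with factual errors",
        granularity="per-session",
        threshold=0.7,
    ),
]
```

Keeping the rationale and granularity in code (or an equivalent config file) means the MEASURE 1.1 evidence trail is versioned alongside the thresholds themselves.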
MEASURE 1.3 Internal experts and affected communities are involved in risk evaluation.
When your governance dimensions include signals from user feedback or escalation paths, VeriProof captures these alongside automated signals. Custom metadata fields let you record human review outcomes: use the SDK's session.add_metadata() method to attach reviewer ID, outcome, and notes to a flagged session after review.
Aggregate human review outcomes as a calibration signal against your automated governance scores.
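A minimal sketch of building the payload for that add_metadata() call. The field names (human_review, reviewer_id, outcome, notes) and the outcome vocabulary are assumptions, not a fixed VeriProof schema; use whatever your deployment has standardised on.

```python
from datetime import datetime, timezone

# Hypothetical outcome vocabulary; restricting it keeps calibration queries clean.
ALLOWED_OUTCOMES = {"confirmed", "overturned", "escalated"}

def build_review_metadata(reviewer_id: str, outcome: str, notes: str) -> dict:
    """Build a metadata payload to attach via session.add_metadata()."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"outcome must be one of {sorted(ALLOWED_OUTCOMES)}")
    return {
        "human_review": {
            "reviewer_id": reviewer_id,
            "outcome": outcome,
            "notes": notes,
            "reviewed_at": datetime.now(timezone.utc).isoformat(),
        }
    }

# Usage after a reviewer closes out a flagged session:
# session.add_metadata(build_review_metadata("rev-042", "confirmed", "PII flag verified"))
```

A constrained outcome field is what makes the calibration described above tractable: you can count confirmed vs. overturned flags per dimension without parsing free text.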
MEASURE 2 — Risk Metrics in Practice
MEASURE 2.1 System performance metrics are captured.
VeriProof captures the following metrics for every session:
| Metric | Field | Notes |
|---|---|---|
| Governance score | governance_score | Composite score from all active dimensions |
| Dimension scores | governance_dimension_scores.* | Per-dimension scores |
| Input token count | metadata.input_token_count | If emitted by adapter |
| Output token count | metadata.output_token_count | If emitted by adapter |
| Latency | metadata.latency_ms | End-to-end processing time |
| Model confidence | metadata.confidence_score | If emitted by model/adapter |
| Safety classification | metadata.safety_score | If emitted by safety classifier |
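Several fields in the table are only present when the adapter or model emits them, so downstream aggregation should treat them as optional rather than assume every session carries them. A sketch of that pattern, assuming sessions are exported as dicts mirroring the field names above (the exact export shape depends on how you pull sessions from VeriProof):

```python
def summarise_sessions(sessions: list[dict]) -> dict:
    """Aggregate per-session metrics, tolerating absent optional fields."""
    scores = [s["governance_score"] for s in sessions]
    latencies = [s["metadata"]["latency_ms"]
                 for s in sessions if "latency_ms" in s.get("metadata", {})]
    confidences = [s["metadata"]["confidence_score"]
                   for s in sessions if "confidence_score" in s.get("metadata", {})]
    return {
        "sessions": len(sessions),
        "mean_governance_score": sum(scores) / len(scores),
        # None (rather than 0) signals that no adapter emitted latency.
        "mean_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        # Fraction of sessions reporting model confidence at all.
        "confidence_coverage": len(confidences) / len(sessions),
    }
```

Tracking coverage of the optional fields (as in confidence_coverage) is itself a useful MEASURE signal: a sudden drop usually means an adapter stopped emitting a metric, not that the metric improved.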
MEASURE 2.5 Privacy risks are evaluated.
VeriProof’s GDPR data subject management directly supports this practice:
- Sessions are linked to data subjects when personal data processing is involved
- Erasure workflow tracks the privacy risk lifecycle from subject creation to erasure
- Legal holds prevent premature erasure when regulatory retention applies
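The interaction between erasure and legal holds can be expressed as a simple precedence check. This is a conceptual sketch, not VeriProof's data model; the field names legal_hold and retention_until are assumptions.

```python
from datetime import date

def erasure_allowed(subject: dict) -> tuple[bool, str]:
    """Decide whether an erasure request can proceed for a data subject.

    Legal holds take precedence over erasure; regulatory retention
    windows block erasure until they lapse.
    """
    if subject.get("legal_hold"):
        return False, "blocked: legal hold active"
    retention_until = subject.get("retention_until")
    if retention_until and retention_until > date.today():
        return False, f"blocked: regulatory retention until {retention_until}"
    return True, "eligible for erasure"
```

The point of the ordering is auditability: an erasure request that is refused should always record which control refused it, since that refusal reason is itself privacy-risk evidence.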
For a privacy risk report, open Compliance → Evidence Exports in the Customer Portal. Select NIST AI RMF as the framework, check the MEASURE function, and click Download Evidence Pack (PDF). The privacy & data rights section of the package includes data subject counts, erasure completion rates, and legal hold inventory.
MEASURE 2.8 AI system outputs are evaluated for trustworthiness.
Blockchain anchoring is the technical mechanism for measuring output trustworthiness. Every session record has a Merkle root anchored on Solana; any tampering with the record after anchoring breaks the cryptographic chain.
To verify a specific session’s integrity, open the session detail view in the Customer Portal and click Verify Blockchain Proof. The portal checks the stored Merkle root against the current on-chain state and returns a pass or fail result. For a trustworthiness summary across all sessions, the MEASURE section of the evidence package (generated via Compliance → Evidence Exports) includes the total anchored session count and the verification pass rate for the period.
MEASURE 2.11 Fairness indicators are tracked.
Define the governance rules that should always apply in Settings → Governance Policies, such as required oversight, grounding, risk declarations, or alert-rule coverage. Then capture a user_group or equivalent metadata field in your adapter so reviewers can assess whether quality, refusal rates, or other monitored outcomes differ materially across groups during evidence review.
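The group comparison reviewers perform can be sketched as a simple per-group rate calculation. This assumes each session dict carries a metadata.user_group field and a boolean metadata.refused flag; both names are illustrative choices for your adapter, not VeriProof-defined fields.

```python
from collections import defaultdict

def refusal_rates_by_group(sessions: list[dict]) -> dict[str, float]:
    """Refusal rate per user_group, from adapter-captured metadata."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # group -> [refusals, total]
    for s in sessions:
        meta = s.get("metadata", {})
        group = meta.get("user_group", "unknown")  # sessions without a group are bucketed
        counts[group][0] += 1 if meta.get("refused") else 0
        counts[group][1] += 1
    return {g: refusals / total for g, (refusals, total) in counts.items()}
```

The same shape works for any monitored outcome (quality score, escalation rate) by swapping the flag; the key design point is that the group label must be captured at session time, since it cannot be reconstructed later.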
MEASURE 3 — Impact Assessment Metrics
MEASURE 3.3 Metrics are available for AI impact assessment.
VeriProof’s periodic evidence export generates the session-level and aggregate metrics used in impact assessments. Open Compliance → Evidence Exports in the Customer Portal, select NIST AI RMF as the framework, check the MEASURE function, set your report period, and click Download Evidence Pack (PDF).
The MEASURE section includes: governance score distribution (mean, p10, p50, p90, p99), fairness dimension summaries, trustworthiness verification rate, alert trigger counts by dimension, and statistical comparison to the prior period.
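For teams that pull raw session scores and want to reproduce the distribution figures locally, the percentile points can be computed with the standard library. A sketch, with no assumptions beyond a flat list of per-session governance scores:

```python
from statistics import quantiles

def score_distribution(scores: list[float]) -> dict[str, float]:
    """Mean and the p10/p50/p90/p99 points reported in the MEASURE section."""
    # Inclusive method interpolates between observed scores; q[k-1] is the
    # k-th percentile of the raw per-session governance scores.
    q = quantiles(scores, n=100, method="inclusive")
    return {
        "mean": sum(scores) / len(scores),
        "p10": q[9], "p50": q[49], "p90": q[89], "p99": q[98],
    }
```

Matching your locally computed figures against the evidence pack is a cheap integrity check on the export pipeline itself.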
Next Steps
- MANAGE function — responding to what MEASURE finds
- GOVERN function — policies that define MEASURE thresholds
- Governance Scoring guide — complete scoring configuration reference