
MEASURE Function

The MEASURE function translates identified AI risks into quantitative and qualitative metrics that can be tracked over time. It answers the question: how much risk is present, and is it getting better or worse?

MEASURE is where VeriProof’s governance scoring and session analytics capabilities are most directly applicable. The metrics you configure define the production risk signal; VeriProof’s infrastructure captures and stores it in a form that’s both queryable and auditable.


Relevant MEASURE Categories

MEASURE 1 — Metrics and Methods

MEASURE 1.1 Approaches to evaluate AI risks are in place.

VeriProof operationalises this practice through three shipped controls that work together:

  • Settings → Governance Policies defines the formal rules VeriProof evaluates.
  • Settings → Governance Thresholds sets alert thresholds for key monitoring rates.
  • Compliance → Scoring Settings controls how partial and severity-weighted findings affect framework scores.

Review all three surfaces together when documenting how you measure AI risk in production.
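The following sketch shows how partial and severity-weighted findings can combine into a single framework score. The weights, the half credit for a partial finding, and the 0-100 scale are hypothetical assumptions for illustration. They are not VeriProof's documented formula; the actual behaviour is whatever you configure in Compliance → Scoring Settings.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; the real values are whatever
# you configure in Compliance -> Scoring Settings.
SEVERITY_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 4.0}
# Hypothetical credit per finding status: a partial finding earns half credit.
STATUS_CREDIT = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


@dataclass
class Finding:
    control_id: str
    severity: str  # key into SEVERITY_WEIGHTS
    status: str    # key into STATUS_CREDIT


def framework_score(findings: list[Finding]) -> float:
    """Severity-weighted share of earned credit, scaled to 0-100."""
    total_weight = sum(SEVERITY_WEIGHTS[f.severity] for f in findings)
    if total_weight == 0:
        return 0.0  # nothing in scope; this sketch reports 0
    earned = sum(
        SEVERITY_WEIGHTS[f.severity] * STATUS_CREDIT[f.status] for f in findings
    )
    return round(100 * earned / total_weight, 1)


findings = [
    Finding("MEASURE-1.1", "high", "pass"),
    Finding("MEASURE-2.5", "medium", "partial"),
    Finding("MEASURE-2.11", "low", "fail"),
]
print(framework_score(findings))  # 71.4
```

With these illustrative weights, a failed high-severity finding costs four times as much as a failed low-severity one, and a partial finding counts for half.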

For each governance dimension, document:

  • The risk it measures (from your MAP risk identification)
  • Why the threshold value was chosen (from benchmark data, regulation, or expert judgement)
  • The measurement granularity (per-session, rolling average, etc.)
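One way to keep this documentation next to the numbers it describes is to record each dimension as structured data and evaluate the threshold at the stated granularity. In this sketch, the `DimensionDoc` fields, the example "grounding" dimension, the 0.7 threshold, and the score series are all hypothetical. The example shows how a per-session check and a rolling average can disagree on the same scores.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass
class DimensionDoc:
    name: str
    risk: str                 # risk from your MAP risk identification
    threshold: float          # alert when the measured value falls below this
    threshold_rationale: str  # benchmark data, regulation, or expert judgement
    granularity: str          # "per_session" or "rolling_average"
    window: int = 1           # number of sessions in the rolling window


def breaches(doc: DimensionDoc, scores: list[float]) -> list[int]:
    """Indices of sessions where the measured value is below the threshold."""
    size = doc.window if doc.granularity == "rolling_average" else 1
    # The oldest score drops out automatically; early sessions average fewer scores.
    recent = deque(maxlen=size)
    hits = []
    for i, score in enumerate(scores):
        recent.append(score)
        value = sum(recent) / len(recent)
        if value < doc.threshold:
            hits.append(i)
    return hits


grounding = DimensionDoc(
    name="grounding",
    risk="Answers not supported by retrieved sources",
    threshold=0.7,
    threshold_rationale="Expert judgement from a pilot review",
    granularity="rolling_average",
    window=3,
)
scores = [0.9, 0.6, 0.95, 0.65, 0.9]
print(breaches(grounding, scores))                                      # []
print(breaches(replace(grounding, granularity="per_session"), scores))  # [1, 3]
```

The same series raises no alert as a three-session rolling average but raises two alerts when evaluated per session. For that reason, the granularity belongs in the documentation alongside the threshold.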

MEASURE 1.3 Internal experts and affected communities are involved in risk evaluation.

When your governance dimensions include signals from user feedback or escalation paths, VeriProof captures these alongside automated signals. Custom metadata fields allow you to capture human review outcomes. Use the SDK’s session.add_metadata() method to attach reviewer ID, outcome, and notes to a flagged session after human review.
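A minimal sketch of preparing a review outcome before attaching it to the flagged session. The key names (`reviewer_id`, `review_outcome`, `review_notes`, `reviewed_at`) and the allowed outcome values are hypothetical conventions. The call to `session.add_metadata()` appears only as a comment because this page does not describe its exact signature.

```python
from datetime import datetime, timezone

# Hypothetical outcome vocabulary; choose values that match your review process.
ALLOWED_OUTCOMES = {"confirmed_issue", "false_positive", "needs_escalation"}


def build_review_metadata(reviewer_id: str, outcome: str, notes: str = "") -> dict:
    """Build a flat metadata payload describing one human review."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown review outcome: {outcome!r}")
    return {
        "reviewer_id": reviewer_id,
        "review_outcome": outcome,
        "review_notes": notes,
        # ISO 8601 timestamp in UTC so reviews sort consistently across time zones
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
    }


payload = build_review_metadata("rev-042", "false_positive", "Citation was valid")
# Attach the payload to the flagged session with the SDK, for example:
# session.add_metadata(payload)  # check the SDK reference for the exact signature
print(payload["review_outcome"])
```

Restricting outcomes to a fixed vocabulary keeps the values consistent, which makes them easy to aggregate later.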

Aggregate human review outcomes and compare them with your automated governance scores. The agreement between the two is a calibration signal for the automated scoring.
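The following sketch illustrates that calibration. It cross-tabulates automated flags against reviewer outcomes and reports how often a flagged session was confirmed by a human. The 0.6 flag cut-off, the `ReviewedSession` record shape, and the outcome labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedSession:
    session_id: str
    governance_score: float  # automated score when the session was reviewed
    review_outcome: str      # hypothetical labels: "confirmed_issue" or "false_positive"


def calibration_summary(sessions: list[ReviewedSession], flag_below: float = 0.6) -> dict:
    """Cross-tabulate automated flags against human review outcomes."""
    counts = {"flag_confirmed": 0, "flag_rejected": 0,
              "missed_by_score": 0, "no_flag_no_issue": 0}
    for s in sessions:
        flagged = s.governance_score < flag_below
        confirmed = s.review_outcome == "confirmed_issue"
        if flagged and confirmed:
            counts["flag_confirmed"] += 1
        elif flagged:
            counts["flag_rejected"] += 1
        elif confirmed:
            counts["missed_by_score"] += 1
        else:
            counts["no_flag_no_issue"] += 1
    flagged_total = counts["flag_confirmed"] + counts["flag_rejected"]
    # Share of automated flags a reviewer confirmed (None if nothing was flagged)
    precision: Optional[float] = (
        counts["flag_confirmed"] / flagged_total if flagged_total else None
    )
    return {**counts, "precision": precision}


sessions = [
    ReviewedSession("s1", 0.42, "confirmed_issue"),
    ReviewedSession("s2", 0.55, "false_positive"),
    ReviewedSession("s3", 0.81, "confirmed_issue"),
    ReviewedSession("s4", 0.35, "confirmed_issue"),
    ReviewedSession("s5", 0.88, "false_positive"),
]
print(calibration_summary(sessions))
```

The `missed_by_score` count is only meaningful if reviewers also examine a sample of unflagged sessions. A rising count there, or falling precision, is a cue to revisit your configured thresholds.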


MEASURE 2 — Risk Metrics in Practice

MEASURE 2.1 System performance metrics are captured.

VeriProof captures the following metrics for every session:

| Metric | Field | Notes |
|---|---|---|
| Governance score | governance_score | Composite score from all active dimensions |
| Dimension scores | governance_dimension_scores.* | Per-dimension scores |
| Input token count | metadata.input_token_count | If emitted by the adapter |
| Output token count | metadata.output_token_count | If emitted by the adapter |
| Latency | metadata.latency_ms | End-to-end processing time, in milliseconds |
| Model confidence | metadata.confidence_score | If emitted by the model or adapter |
| Safety classification | metadata.safety_score | If emitted by a safety classifier |
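Several of these fields depend on what the adapter emits, so code that reads them should tolerate gaps. The following minimal sketch assumes a session record is available as a nested dictionary whose keys mirror the dotted field paths in the table. That layout is an assumption, since the real export or API shape may differ, and the values are made up.

```python
from typing import Any

# Field paths from the metrics table; the nested-dict layout is an assumption.
FIELDS = [
    "governance_score",
    "metadata.input_token_count",
    "metadata.output_token_count",
    "metadata.latency_ms",
    "metadata.confidence_score",
    "metadata.safety_score",
]


def get_field(record: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as 'metadata.latency_ms'; return default if absent."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


session = {
    "governance_score": 0.82,
    "governance_dimension_scores": {"grounding": 0.9, "safety": 0.74},
    "metadata": {"input_token_count": 512, "latency_ms": 1830},
}
row = {path: get_field(session, path) for path in FIELDS}
dims = get_field(session, "governance_dimension_scores", {})  # all per-dimension scores
print(row["metadata.output_token_count"])  # None: this adapter did not emit it
print(dims["safety"])                       # 0.74
```

Returning `None` instead of raising keeps sessions from adapters that omit optional fields in the dataset. You can then report field coverage alongside the values.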

MEASURE 2.5 Privacy risks are evaluated.

VeriProof’s GDPR data subject management directly supports this practice:

  • Sessions are linked to data subjects when personal data processing is involved
  • The erasure workflow tracks the privacy risk lifecycle from subject creation to erasure
  • Legal holds prevent premature erasure when regulatory retention applies
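The interaction between erasure requests and legal holds can be expressed as a simple eligibility check. This is a hypothetical model for illustration; the `DataSubject` and `LegalHold` types and their fields are not VeriProof's schema. In this model, a subject can be erased only when erasure has been requested and no legal hold is active on the date of the check.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class LegalHold:
    reason: str
    starts: date
    ends: Optional[date] = None  # None means the hold is open-ended

    def active_on(self, day: date) -> bool:
        return self.starts <= day and (self.ends is None or day <= self.ends)


@dataclass
class DataSubject:
    subject_id: str
    erasure_requested: bool = False
    holds: list[LegalHold] = field(default_factory=list)


def can_erase(subject: DataSubject, today: date) -> bool:
    """Erasure proceeds only if requested and no legal hold is active today."""
    if not subject.erasure_requested:
        return False
    return not any(hold.active_on(today) for hold in subject.holds)


subject = DataSubject(
    "ds-001",
    erasure_requested=True,
    holds=[LegalHold("litigation", date(2024, 1, 1), date(2024, 12, 31))],
)
print(can_erase(subject, date(2024, 6, 1)))  # False: hold still active
print(can_erase(subject, date(2025, 1, 2)))  # True: hold has ended
```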

To generate a privacy risk report, open the Compliance workspace and use the Evidence Exports tab to download an Auditor Evidence ZIP for the application and period. Include data subject counts and legal hold inventory from the Compliance workspace in your MEASURE evidence package.
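If you assemble the data subject counts and legal hold inventory yourself, a small summary like the following covers both. The record shape (`subject_id`, `status`, `legal_hold`) and the status values are hypothetical; adapt them to whatever the Compliance workspace lets you export.

```python
from collections import Counter


def privacy_summary(subjects: list[dict]) -> dict:
    """Count data subjects by lifecycle status and list those under legal hold."""
    by_status = Counter(s["status"] for s in subjects)
    holds = sorted(
        (s["subject_id"], s["legal_hold"]) for s in subjects if s.get("legal_hold")
    )
    return {
        "total_subjects": len(subjects),
        "by_status": dict(by_status),
        "legal_hold_inventory": holds,
    }


subjects = [
    {"subject_id": "ds-001", "status": "active", "legal_hold": "litigation"},
    {"subject_id": "ds-002", "status": "erased"},
    {"subject_id": "ds-003", "status": "erasure_pending", "legal_hold": None},
]
print(privacy_summary(subjects))
```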


MEASURE 2.8 AI system outputs are evaluated for trustworthiness.

Blockchain anchoring is the technical mechanism VeriProof uses to measure output trustworthiness. Every session record has a Merkle root anchored on Solana. Any change to the record after anchoring produces a root that no longer matches the anchored value, breaking the cryptographic chain and making the tampering detectable.
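A minimal Merkle-root computation shows why tampering breaks the chain. The sketch hashes each serialized record field with SHA-256 as a leaf. When a level has an odd number of nodes, it duplicates the last node, following the Bitcoin convention. These choices are assumptions: this page does not specify VeriProof's leaf encoding, hash function, or tree layout, and they may differ. The record contents are made up.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash each leaf, then hash adjacent pairs level by level up to one root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Illustrative record fields; a real session record has its own serialization.
record = [b"input: summarise the Q3 report", b"output: revenue rose 4%",
          b"governance_score: 0.82"]
anchored = merkle_root(record)  # the value that would be anchored on-chain

tampered = list(record)
tampered[2] = b"governance_score: 0.95"
print(merkle_root(record) == anchored)    # True
print(merkle_root(tampered) == anchored)  # False
```

Changing any single field changes its leaf hash, and the change propagates up to a different root. Because only the root needs to be anchored, the record content itself does not have to be stored on-chain.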

To verify a specific session’s integrity, open the session detail view in the Customer Portal and click Verify Blockchain Proof. The portal checks the stored Merkle root against the current on-chain state and returns a pass or fail result. For a trustworthiness summary across all sessions, download the Blockchain Audit Certificate from the Compliance workspace. The certificate provides the MEASURE section of the evidence package and includes the total anchored session count and the verification pass rate for the period.
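Assuming the certificate's verification pass rate is the share of anchored sessions whose stored Merkle root matches the on-chain root, the arithmetic reduces to the following sketch. The session IDs and short hex strings are hypothetical placeholders for real root values.

```python
def verify(stored_root: str, onchain_root: str) -> bool:
    """A session passes when its stored root equals the anchored root (hex, case-insensitive)."""
    return stored_root.lower() == onchain_root.lower()


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of anchored sessions that passed verification."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


results = {
    "sess-1": verify("ab12", "AB12"),
    "sess-2": verify("cd34", "cd34"),
    "sess-3": verify("ef56", "ef57"),  # mismatch: record changed after anchoring
    "sess-4": verify("0a0b", "0a0b"),
}
print(f"{pass_rate(results):.0%}")  # 75%
```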



MEASURE 2.11 Fairness indicators are tracked.

In Settings → Governance Policies, define the governance rules that should always apply, such as required oversight, grounding, risk declarations, or alert-rule coverage. Then capture a user_group (or equivalent) metadata field in your adapter. During evidence review, this field lets reviewers assess whether quality, refusal rates, or other monitored outcomes differ materially across groups.
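During evidence review, a per-group comparison can be as simple as the following sketch. It assumes each exported session carries the user_group metadata field and a boolean `refused` flag; `refused` is a hypothetical field name. The sketch reports each group's refusal rate and the largest gap between groups. What counts as a material difference is a judgement you should document; the code does not decide it.

```python
from collections import defaultdict


def refusal_rates(sessions: list[dict]) -> dict[str, float]:
    """Refusal rate per user_group; sessions without a group are skipped."""
    totals: dict[str, int] = defaultdict(int)
    refusals: dict[str, int] = defaultdict(int)
    for s in sessions:
        group = s.get("metadata", {}).get("user_group")
        if group is None:
            continue
        totals[group] += 1
        refusals[group] += int(bool(s.get("refused")))
    return {g: refusals[g] / totals[g] for g in totals}


def max_gap(rates: dict[str, float]) -> float:
    """Largest difference in refusal rate between any two groups."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0


sessions = [
    {"metadata": {"user_group": "A"}, "refused": False},
    {"metadata": {"user_group": "A"}, "refused": True},
    {"metadata": {"user_group": "A"}, "refused": False},
    {"metadata": {"user_group": "A"}, "refused": False},
    {"metadata": {"user_group": "B"}, "refused": True},
    {"metadata": {"user_group": "B"}, "refused": True},
    {"metadata": {}, "refused": True},  # no group recorded: excluded
]
rates = refusal_rates(sessions)
print(rates)           # {'A': 0.25, 'B': 1.0}
print(max_gap(rates))  # 0.75
```

With groups this small, the gap means little. In practice, report group sizes alongside the rates.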


MEASURE 3 — Impact Assessment Metrics

MEASURE 3.3 Metrics are available for AI impact assessment.

For impact assessment metrics, open the Compliance workspace and use the Evidence Exports tab to download an Auditor Evidence ZIP for the relevant applications and report period. The export includes governance score data and alert trigger counts by dimension for the sessions in scope.
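You may want to recompute the headline numbers from exported rows, for example to cross-check the export or chart trends. The following sketch does this for CSV text. Its column names (`session_id`, `governance_score`, and `triggered_dimensions` holding semicolon-separated values) are hypothetical, and the real layout of files inside the Auditor Evidence ZIP may differ.

```python
import csv
import io
from collections import Counter
from statistics import mean


def impact_metrics(csv_text: str) -> dict:
    """Mean governance score and alert trigger counts by dimension."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    triggers = Counter()
    for row in rows:
        # Empty cells split to [""]; filter(None, ...) drops them.
        for dim in filter(None, row["triggered_dimensions"].split(";")):
            triggers[dim] += 1
    scores = [float(r["governance_score"]) for r in rows]
    return {
        "sessions": len(rows),
        "mean_governance_score": round(mean(scores), 3) if scores else None,
        "alert_triggers_by_dimension": dict(triggers),
    }


sample = """session_id,governance_score,triggered_dimensions
s1,0.91,
s2,0.58,grounding;safety
s3,0.66,grounding
"""
print(impact_metrics(sample))
```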


