MEASURE Function

The MEASURE function translates identified AI risks into quantitative and qualitative metrics that can be tracked over time. It answers the question: how much risk is present, and is it getting better or worse?

MEASURE is where VeriProof’s governance scoring and session analytics capabilities are most directly applicable. The metrics you configure define the production risk signal; VeriProof’s infrastructure captures and stores it in a form that’s both queryable and auditable.


Relevant MEASURE Categories

MEASURE 1 — Metrics and Methods

MEASURE 1.1 Approaches to evaluate AI risks are in place.

VeriProof operationalises this practice through three shipped controls that work together:

  • Settings → Governance Policies defines the formal rules VeriProof evaluates.
  • Settings → Governance Thresholds sets alert thresholds for key monitoring rates.
  • Compliance → Scoring Settings controls how partial and severity-weighted findings affect framework scores.

Review all three surfaces together when documenting how you measure AI risk in production.

Document each dimension with:

  • The risk it measures (from your MAP risk identification)
  • Why the threshold value was chosen (from benchmark data, regulation, or expert judgement)
  • The measurement granularity (per-session, rolling average, etc.)
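The three documentation points above can be kept as a small structured record per dimension. This is an illustrative sketch only: the field names and the validation helper are assumptions, not a VeriProof schema.

```python
# Hypothetical record documenting one governance dimension, following the
# three points above. Field names are illustrative, not a VeriProof schema.
dimension_doc = {
    "dimension": "grounding",
    "risk_measured": "Hallucinated claims in customer-facing answers (from MAP)",
    "threshold": 0.85,
    "threshold_rationale": "Internal benchmark: reviewers accepted 98% of "
                           "answers scoring >= 0.85 on grounding.",
    "granularity": "per-session, with a 7-day rolling average for alerting",
}

REQUIRED_FIELDS = {"dimension", "risk_measured", "threshold",
                   "threshold_rationale", "granularity"}

def is_fully_documented(doc: dict) -> bool:
    """True when every required documentation field is present and non-empty."""
    return all(doc.get(field) for field in REQUIRED_FIELDS)

print(is_fully_documented(dimension_doc))  # → True
```

A record like this can live alongside the threshold configuration itself, so auditors can trace each configured value back to its rationale.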

MEASURE 1.3 Internal experts and affected communities are involved in risk evaluation.

When your governance dimensions include signals from user feedback or escalation paths, VeriProof captures these alongside automated signals. Custom metadata fields allow you to capture human review outcomes. Use the SDK’s session.add_metadata() method to attach reviewer ID, outcome, and notes to a flagged session after human review.
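A minimal sketch of building the review payload before attaching it. The method name `session.add_metadata()` comes from the docs above; the field names, allowed outcomes, and helper function here are assumptions for illustration.

```python
# Sketch of a human-review payload for session.add_metadata(). The method name
# is documented above; the field names and outcome values are assumptions.
def build_review_metadata(reviewer_id: str, outcome: str, notes: str) -> dict:
    """Assemble the reviewer fields to attach to a flagged session."""
    allowed_outcomes = {"confirmed_violation", "false_positive", "needs_escalation"}
    if outcome not in allowed_outcomes:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return {
        "reviewer_id": reviewer_id,
        "review_outcome": outcome,
        "review_notes": notes,
    }

payload = build_review_metadata("rev-042", "false_positive",
                                "Flag was triggered by quoted user input.")
# session.add_metadata(**payload)  # attach after human review completes
```

Validating the outcome value before attaching it keeps the downstream calibration analysis from fragmenting across free-text labels.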

Aggregate human review outcomes as a calibration signal against your automated governance scores.


MEASURE 2 — Risk Metrics in Practice

MEASURE 2.1 System performance metrics are captured.

VeriProof captures the following metrics for every session:

Metric                  Field                            Notes
Governance score        governance_score                 Composite score from all active dimensions
Dimension scores        governance_dimension_scores.*    Per-dimension scores
Input token count       metadata.input_token_count       If emitted by adapter
Output token count      metadata.output_token_count      If emitted by adapter
Latency                 metadata.latency_ms              End-to-end processing time
Model confidence        metadata.confidence_score        If emitted by model/adapter
Safety classification   metadata.safety_score            If emitted by safety classifier
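The captured fields can be aggregated offline once sessions are exported. This sketch uses the field names from the table above, and assumes the export is a list of dicts; note that optional adapter-emitted fields may be absent and should be skipped.

```python
# Minimal sketch aggregating the captured metrics above from exported session
# records. Field names match the table; the list-of-dicts export format is an
# assumption.
sessions = [
    {"governance_score": 0.92, "metadata": {"latency_ms": 410, "input_token_count": 120}},
    {"governance_score": 0.78, "metadata": {"latency_ms": 655, "input_token_count": 300}},
    {"governance_score": 0.88, "metadata": {"latency_ms": 502}},  # adapter omitted tokens
]

def mean_governance_score(records):
    return sum(r["governance_score"] for r in records) / len(records)

def mean_latency_ms(records):
    # Latency is only present if the adapter emitted it, so filter first.
    latencies = [r["metadata"]["latency_ms"]
                 for r in records if "latency_ms" in r["metadata"]]
    return sum(latencies) / len(latencies)

print(round(mean_governance_score(sessions), 3))  # → 0.86
print(round(mean_latency_ms(sessions), 1))        # → 522.3
```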

MEASURE 2.5 Privacy risks are evaluated.

VeriProof’s GDPR data subject management directly supports this practice:

  • Sessions are linked to data subjects when personal data processing is involved
  • Erasure workflow tracks the privacy risk lifecycle from subject creation to erasure
  • Legal holds prevent premature erasure when regulatory retention applies
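The erasure rule the bullets describe reduces to a simple predicate: erasure proceeds only when no legal hold is active and any regulatory retention period has elapsed. The sketch below is a hedged illustration of that rule; the subject field names are assumptions, not VeriProof's data model.

```python
# Illustrative check for the erasure-eligibility rule described above:
# no active legal hold, and any retention period has expired.
# Field names ("legal_hold", "retention_until") are assumptions.
from datetime import date

def erasure_allowed(subject: dict, today: date) -> bool:
    """True when the subject has no legal hold and retention has expired."""
    if subject.get("legal_hold"):
        return False
    retention_until = subject.get("retention_until")
    return retention_until is None or retention_until <= today

subject = {"id": "ds-123", "legal_hold": True, "retention_until": date(2024, 1, 1)}
print(erasure_allowed(subject, date(2025, 6, 1)))  # → False (legal hold active)
```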

For a privacy risk report, open Compliance → Evidence Exports in the Customer Portal. Select NIST AI RMF as the framework, check the MEASURE function, and click Download Evidence Pack (PDF). The privacy & data rights section of the package includes data subject counts, erasure completion rates, and legal hold inventory.


MEASURE 2.8 AI system outputs are evaluated for trustworthiness.

Blockchain anchoring is the technical mechanism for output trustworthiness measurement. Every session record has a Merkle root anchored on Solana. Any tampering with the record after anchoring breaks the cryptographic chain.
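The tamper-detection property can be illustrated with a toy Merkle computation: recompute the root over a record's fields and compare it with the root that was anchored. The hashing scheme, leaf ordering, and odd-node duplication below are generic assumptions, not VeriProof's actual construction.

```python
# Toy illustration of the tamper check behind blockchain anchoring. The
# construction (SHA-256, last-node duplication) is a common convention and an
# assumption here, not VeriProof's documented scheme.
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a binary Merkle root, duplicating the last node on odd levels."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

record = [b"session-001", b"governance_score=0.92", b"latency_ms=410"]
anchored = merkle_root(record)          # value written on-chain at anchor time

tampered = [b"session-001", b"governance_score=0.99", b"latency_ms=410"]
print(merkle_root(tampered) == anchored)  # → False: any edit breaks the chain
print(merkle_root(record) == anchored)    # → True
```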

To verify a specific session’s integrity, open the session detail view in the Customer Portal and click Verify Blockchain Proof. The portal checks the stored Merkle root against the current on-chain state and returns a pass or fail result. For a trustworthiness summary across all sessions, the MEASURE section of the evidence package (generated via Compliance → Evidence Exports) includes the total anchored session count and the verification pass rate for the period.



MEASURE 2.11 Fairness indicators are tracked.

Define the governance rules that should always apply in Settings → Governance Policies, such as required oversight, grounding, risk declarations, or alert-rule coverage. Then capture a user_group or equivalent metadata field in your adapter so reviewers can assess whether quality, refusal rates, or other monitored outcomes differ materially across groups during evidence review.
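The group comparison described above can be sketched as a small aggregation over exported sessions. The `user_group` field comes from the paragraph above; the `refused` flag is an assumed adapter-emitted field used for illustration.

```python
# Sketch of a group-wise outcome comparison over exported sessions. The
# user_group field is described above; the "refused" flag is an assumption.
from collections import defaultdict

def refusal_rates_by_group(sessions):
    totals, refusals = defaultdict(int), defaultdict(int)
    for s in sessions:
        group = s["metadata"].get("user_group", "unknown")
        totals[group] += 1
        refusals[group] += bool(s["metadata"].get("refused"))
    return {g: refusals[g] / totals[g] for g in totals}

sessions = [
    {"metadata": {"user_group": "a", "refused": False}},
    {"metadata": {"user_group": "a", "refused": True}},
    {"metadata": {"user_group": "b", "refused": False}},
    {"metadata": {"user_group": "b", "refused": False}},
]
print(refusal_rates_by_group(sessions))  # → {'a': 0.5, 'b': 0.0}
```

The same pattern applies to any monitored outcome (quality scores, alert rates): group by the metadata field, aggregate, and flag material differences for reviewer attention.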


MEASURE 3 — Impact Assessment Metrics

MEASURE 3.3 Metrics are available for AI impact assessment.

VeriProof’s periodic evidence export generates the session-level and aggregate metrics used in impact assessments. Open Compliance → Evidence Exports in the Customer Portal, select NIST AI RMF as the framework, check the MEASURE function, set your report period, and click Download Evidence Pack (PDF).

The MEASURE section includes: governance score distribution (mean, p10, p50, p90, p99), fairness dimension summaries, trustworthiness verification rate, alert trigger counts by dimension, and statistical comparison to the prior period.
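For readers reproducing the score-distribution summary from raw exports, the statistics above can be computed with nearest-rank percentiles. This is a generic sketch, not VeriProof's reporting implementation, and the sample scores are fabricated placeholders.

```python
# Sketch of the distribution summary above (mean, p10, p50, p90, p99) using
# nearest-rank percentiles. Sample data is illustrative only.
import math

def percentile(sorted_scores, p):
    """Nearest-rank percentile: smallest value covering p% of the sample."""
    rank = max(1, math.ceil(p / 100 * len(sorted_scores)))
    return sorted_scores[rank - 1]

scores = sorted(s / 100 for s in range(60, 100))  # 40 sample scores: 0.60..0.99

summary = {
    "mean": sum(scores) / len(scores),
    **{f"p{p}": percentile(scores, p) for p in (10, 50, 90, 99)},
}
print(summary)
```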

