Governance Engineer Track

Audience: AI policy leads, risk engineers, ML governance specialists
Goal: Author and maintain policy rules that keep AI behavior within your organization’s defined risk tolerance
Estimated time: ~3 hours across 7 modules

Who can use this

Available to

Governance Engineer, Administrator

Not available to

Developer, Compliance Officer, Business Owner, Auditor

This track is written for the Governance Engineer role. Administrators have the same capabilities and share all content in this track.

Track overview

What makes the Governance Engineer role distinct. You are the person responsible for translating risk expectations into enforceable policy logic. You author rules in Rego (Open Policy Agent), evaluate them against real session data, and manage the lifecycle from draft through production activation. You do not need PII access or cost data to do this work — those are intentionally excluded from your role.

Module	Title	Time
1	The VeriProof governance model	25 min
2	Metric rules: threshold-based policy	25 min
3	Rego policy rules: advanced logic	30 min
4	The Playground: live policy validation	20 min
5	Evaluation datasets: regression testing	25 min
6	Governance thresholds and alerts	20 min
7	Production approval workflow and version history	15 min

Module 1 — The VeriProof governance model

Goal: Understand the data model you are writing policy against so your rules produce accurate, consistent results.

Key concepts:

Session — one complete AI interaction (user request → AI response chain)
Step — a sub-event within a session (LLM call, tool call, retrieval, human handoff)
Governance attribute — structured metadata attached to a session: riskLevel, outcomeType, intentLabel, requiresHumanOversight, regulatoryScope, and more
Declared vs. inferred — attributes declared explicitly by the SDK in your application code (GovernanceInferredMask = 0) are stronger evidence than platform-inferred attributes
Policy score — a composite 0–100 score calculated from attribute completeness, schema compliance, and behavioral consistency
Guardrail action — the real-time action taken on a session: allowed, flagged, blocked

Session document shape (what your Rego policies receive as input):


{
  "session_id": "...",
  "application_id": "...",
  "risk_level": "HIGH",
  "outcome_type": "APPROVED",
  "intent_label": "loan_approval",
  "policy_score": 72,
  "requires_human_oversight": true,
  "guardrail_action": "flagged",
  "step_count": 4,
  "total_tokens": 3241,
  "model_id": "gpt-4o",
  "steps": [...]
}
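
Before writing rules against this document, it can help to confirm that a sample you plan to test with actually has the shape shown above. The following is a minimal Python sketch, not part of VeriProof: the field names come from the example document, while the expected types are inferred from its example values and the `check_session_shape` helper is hypothetical. The elided `steps` array is represented as an empty list.

```python
import json

# Field names are taken from the example session document.
# The types are an assumption inferred from the example values.
EXPECTED_TYPES = {
    "session_id": str,
    "application_id": str,
    "risk_level": str,
    "outcome_type": str,
    "intent_label": str,
    "policy_score": (int, float),
    "requires_human_oversight": bool,
    "guardrail_action": str,
    "step_count": int,
    "total_tokens": int,
    "model_id": str,
    "steps": list,
}


def _matches(value, expected) -> bool:
    # bool is a subclass of int in Python, so reject it for numeric fields.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def check_session_shape(doc: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not _matches(doc[field], expected):
            problems.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    score = doc.get("policy_score")
    if _matches(score, (int, float)) and not 0 <= score <= 100:
        problems.append("policy_score outside 0-100")
    return problems


SAMPLE = json.loads("""
{
  "session_id": "...", "application_id": "...",
  "risk_level": "HIGH", "outcome_type": "APPROVED",
  "intent_label": "loan_approval", "policy_score": 72,
  "requires_human_oversight": true, "guardrail_action": "flagged",
  "step_count": 4, "total_tokens": 3241, "model_id": "gpt-4o",
  "steps": []
}
""")

print(check_session_shape(SAMPLE))
```

A check like this is only a local convenience for preparing test inputs; the platform's own schema remains the source of truth.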

Self-assessment:

I can describe the difference between a declared and inferred governance attribute
I understand what policy_score measures and what causes it to drop
I know which fields are available in the Rego input document

Module 2 — Metric rules: threshold-based policy

Goal: Create your first metric rule and understand how threshold rules drive alerts, review routing, and evidence.

Read:

Rules Builder

Creating a metric rule:

Navigate to Rules and click New Rule → Metric rule.
Enter a Name that describes the policy intent (e.g., “Flag high-risk loan decisions lacking human oversight”).
Set the Scope — All Applications or a specific application.
Build your condition:


risk_level is HIGH or CRITICAL
AND requires_human_oversight is false

Set the Action — Alert, Add to review queue, or Block.
Click Save as draft first. Activate only after testing in the Playground.
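
To reason about which sessions the example condition will catch before you test it in the Playground, you can mirror it as a plain predicate. This is an illustrative Python sketch only: `rule_matches` is a hypothetical helper, and it assumes the builder groups "HIGH or CRITICAL" together before applying the AND.

```python
def rule_matches(session: dict) -> bool:
    """Mirror of: risk_level is HIGH or CRITICAL AND requires_human_oversight is false.

    Assumes "HIGH or CRITICAL" is evaluated as one group before the AND.
    """
    return (
        session.get("risk_level") in {"HIGH", "CRITICAL"}
        and session.get("requires_human_oversight") is False
    )


# A few hand-written sessions and whether the rule is expected to fire.
CASES = [
    ({"risk_level": "HIGH", "requires_human_oversight": False}, True),
    ({"risk_level": "CRITICAL", "requires_human_oversight": False}, True),
    ({"risk_level": "HIGH", "requires_human_oversight": True}, False),
    ({"risk_level": "CRITICAL", "requires_human_oversight": True}, False),
]

for session, should_fire in CASES:
    assert rule_matches(session) == should_fire
```

Note that the example session document from Module 1 (HIGH risk with human oversight required) would not trigger this rule.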

Conditions available in metric rules:

Field	Type	Example
`risk_level`	Enum	`is CRITICAL`
`policy_score`	Numeric	`is below 60`
`outcome_type`	Enum	`is DENIED`
`guardrail_action`	Enum	`is blocked`
`requires_human_oversight`	Boolean	`is false`
`step_count`	Numeric	`> 20`
`total_tokens`	Numeric	`> 8000`

Self-assessment:

First metric rule created and saved as draft
Rule tested in the Playground before activation
Action configured (alert, review queue, or block)

Module 3 — Rego policy rules: advanced logic

Goal: Write your first Rego policy rule and understand when Rego is the right tool versus a metric rule.

Read:

Rules Builder — Rego section

When to use Rego instead of a metric rule:

You need to compare multiple steps within a session
You want to encode a specific regulatory clause as executable policy
You need to inspect nested annotation structures
You are reusing policy logic across multiple organizations or contexts
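
The first case, comparing steps within a session, is worth illustrating, because the metric rule conditions listed in Module 2 cover only session-level fields. The sketch below shows the shape of such a check in Python so the logic can be run locally; a production version would be written as a Rego rule. The `steps` array is elided in the example session document, so the `type` key and the `"human_handoff"` value are hypothetical names chosen for illustration.

```python
def oversight_satisfied(session: dict) -> bool:
    """True unless the session requires oversight but has no human handoff step.

    Assumption: each step is a dict with a "type" key, and a human handoff
    step is marked with the value "human_handoff". Neither name is confirmed
    by the session document shown in Module 1.
    """
    if not session.get("requires_human_oversight", False):
        return True
    return any(
        step.get("type") == "human_handoff"
        for step in session.get("steps", [])
    )


with_handoff = {
    "requires_human_oversight": True,
    "steps": [{"type": "llm_call"}, {"type": "human_handoff"}],
}
without_handoff = {
    "requires_human_oversight": True,
    "steps": [{"type": "llm_call"}, {"type": "tool_call"}],
}

assert oversight_satisfied(with_handoff)
assert not oversight_satisfied(without_handoff)
```

The check depends on the contents of every step, which is why it falls outside what a single-field threshold condition can express.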

Your first Rego policy:


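# Policy: High-risk decisions must have human oversight
package veriproof.policy

import future.keywords.if

# Sessions are allowed unless one of the rules below sets allow to false.
default allow = true

# Deny HIGH-risk sessions that do not require human oversight.
allow = false if {
    input.risk_level == "HIGH"
    input.requires_human_oversight == false
}

# Deny every CRITICAL-risk session, whether or not human oversight is required.
allow = false if {
    input.risk_level == "CRITICAL"
}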

Testing your policy in the editor:

Write your policy in the Rego editor.
Click the Test tab.
Paste a sample session document as the input.
Click Evaluate — the panel shows whether allow returns true or false and the full OPA trace.
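
When preparing inputs to paste into the Test tab, it helps to cover every branch of the policy and to know the expected value of allow in advance. The Python sketch below generates such inputs for the example policy above. It is an assumption-laden aid, not a VeriProof tool: `expected_allow` and `build_cases` are hypothetical helpers, and `"LOW"` is an assumed enum value used to exercise the default branch.

```python
import json
from itertools import product

# "HIGH" and "CRITICAL" appear in this track; "LOW" is an assumed value.
RISK_LEVELS = ["LOW", "HIGH", "CRITICAL"]


def expected_allow(doc: dict) -> bool:
    """Python mirror of the example Rego policy, used to predict results."""
    if doc.get("risk_level") == "CRITICAL":
        return False
    # A missing requires_human_oversight field does not count as false here,
    # matching Rego, where an undefined reference makes the rule body fail.
    if doc.get("risk_level") == "HIGH" and doc.get("requires_human_oversight") is False:
        return False
    return True


def build_cases() -> list:
    """One input per combination of risk level and oversight flag."""
    cases = []
    for risk, oversight in product(RISK_LEVELS, [True, False]):
        doc = {"risk_level": risk, "requires_human_oversight": oversight}
        cases.append({"input": doc, "expected_allow": expected_allow(doc)})
    return cases


for case in build_cases():
    print(json.dumps(case["input"]), "->", case["expected_allow"])
```

One behavior worth testing explicitly: a HIGH-risk input that omits `requires_human_oversight` is allowed by the example policy, because the deny rule cannot match an undefined field.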

Performance matters in Rego. Policies run synchronously on every session, so avoid unbounded iteration (for example, some i; i := input.steps[_] without a guard). Before activating a policy, use the Test panel to confirm that evaluation time is under 50 ms.

Self-assessment:

First Rego policy written and evaluated in the Test panel
Policy returns false for at least one failing input case
Evaluation time confirmed acceptable in the Test panel

Module 4 — The Playground: live policy validation

Goal: Use the Playground to validate policy logic against your live application configuration before any draft rule goes to production.

Read:

Playground

Playground workflow:

Navigate to Playground.
Select the application whose policy configuration you want to test.
Enter a prompt and structured context that represents the scenario you want to check.
Click Run.
Read the output: model response, governance evaluation, rule results, intent, risk level, and policy score.
If a draft rule does not fire as expected, return to the Rules editor, adjust the condition, and re-test.

Playground runs do not create production session records or blockchain anchors. They are exploration-only and do not affect your compliance metrics.

Self-assessment:

Playground used to validate at least one metric rule condition
Playground used to validate at least one Rego policy
I understand which session fields appear in the Playground results panel

Module 5 — Evaluation datasets: regression testing

Goal: Build a curated evaluation dataset and use it to catch governance regressions before model, prompt, or rule changes reach production.

Read:

Evaluation Datasets

What an evaluation dataset is: An evaluation dataset is a collection of test cases — each with a known input and a declared expected governance outcome (intent, risk level, outcome type). When you run the dataset against your current endpoint configuration, VeriProof compares actual results to expected results and flags any regressions.

Building a useful dataset:

Import from sessions — find historical sessions where the governance result was manually confirmed correct; import them as test cases
Add adversarial edge cases — include inputs designed to probe the boundaries of your rules (high-risk inputs that should be blocked, valid inputs that should pass cleanly)
Tag your cases — use tags like "regression", "edge-case", "high-risk" to organize large datasets

Running a regression check:

Navigate to Evaluation → Datasets and open your dataset.
Click Run Evaluation.
Select the application and endpoint to test against.
Review the results: pass rate, failed cases, and delta from the last run.
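
The numbers in the results view can be understood with a small model of the comparison. The following Python sketch is an assumption about how such a comparison could work, not VeriProof's implementation: `EvalCase`, `failed_fields`, and `summarize` are hypothetical names, and the compared fields are the three expected outcomes named above (intent, risk level, outcome type).

```python
from dataclasses import dataclass
from typing import Optional

COMPARED_FIELDS = ("intent_label", "risk_level", "outcome_type")


@dataclass
class EvalCase:
    name: str
    expected: dict  # declared expected governance outcome
    actual: dict    # outcome observed in this run


def failed_fields(case: EvalCase) -> list:
    """Fields whose actual value differs from the declared expectation."""
    return [
        f for f in COMPARED_FIELDS
        if f in case.expected and case.expected[f] != case.actual.get(f)
    ]


def summarize(cases: list, previous_pass_rate: Optional[float] = None) -> dict:
    """Pass rate, failed cases, and delta from the previous run (if known)."""
    failures = {c.name: failed_fields(c) for c in cases if failed_fields(c)}
    pass_rate = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    delta = None if previous_pass_rate is None else pass_rate - previous_pass_rate
    return {"pass_rate": pass_rate, "failures": failures, "delta": delta}


ok = {"intent_label": "loan_approval", "risk_level": "HIGH", "outcome_type": "APPROVED"}
cases = [
    EvalCase("case-1", ok, dict(ok)),
    EvalCase("case-2", ok, dict(ok)),
    EvalCase("case-3", ok, dict(ok)),
    EvalCase("case-4", ok, {**ok, "risk_level": "CRITICAL"}),  # regression
]

print(summarize(cases, previous_pass_rate=1.0))
```

In this model a case fails if any one compared field differs, so a single drifting attribute such as risk level is enough to show up as a regression.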

Self-assessment:

First evaluation dataset created with at least 5 test cases
Dataset includes at least one high-risk edge case that should fail policy
Evaluation run completed and pass rate reviewed

Module 6 — Governance thresholds and alerts

Goal: Configure governance-specific alert thresholds so your team is notified when policy performance degrades or drift is detected.

Your threshold configuration responsibilities:

Threshold type	Where to configure	What triggers it
Policy score floor	Settings → Compliance → Policy Thresholds	Alert when rolling 7-day score drops below your target
Annotation coverage	Settings → Compliance → Policy Thresholds	Alert when declared governance attribute coverage drops below threshold
Drift detection	Application workspace → Monitor tab	Alert when session behavior diverges from baseline
Review queue SLA	Review Queues → [Queue] → Edit	Escalate when items age past SLA

Generic alert delivery channels (Slack, webhook, email) are configured by Administrators under Settings → Integrations. Your job is to configure the thresholds that determine when alerts fire — not the delivery channels.

Self-assessment:

Policy score floor configured for each production application
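At least one threshold tested by temporarily adjusting it so that current values breach it (for example, raising a policy score floor above the current score) and confirming the alert fired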
Drift detection baseline confirmed for at least one application

Module 7 — Production approval workflow and version history

Goal: Understand how rule changes move from draft to production and use version history to track and revert changes.

The production approval flow:

You create or modify a rule and save it as Draft.
You test the draft in the Playground and against your evaluation dataset.
When satisfied, you click Request Activation.
An Administrator or Compliance Officer reviews the request and approves or rejects it with a documented rationale.
Approved rules become Active and begin evaluating sessions immediately.

In production environments, you cannot self-activate rules. This is by design. The approval requirement exists to ensure that governance changes are reviewed by someone with compliance or operational authority before they affect live decisions.
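
The approval flow can be pictured as a small state machine. The Python sketch below is a conceptual model only, not VeriProof's implementation: the `Rule` class and `RuleState` enum are hypothetical, "Pending approval" is an assumed name for the state after Request Activation, and returning a rejected rule to Draft is also an assumption.

```python
from enum import Enum


class RuleState(Enum):
    DRAFT = "Draft"
    PENDING = "Pending approval"  # assumed name; the source names only Draft and Active
    ACTIVE = "Active"


# Roles the track says can review an activation request.
APPROVER_ROLES = {"Administrator", "Compliance Officer"}


class Rule:
    def __init__(self, name: str):
        self.name = name
        self.state = RuleState.DRAFT
        self.rationale = None

    def request_activation(self) -> None:
        if self.state is not RuleState.DRAFT:
            raise ValueError("only a draft can be submitted for activation")
        self.state = RuleState.PENDING

    def review(self, reviewer_role: str, approve: bool, rationale: str) -> None:
        if self.state is not RuleState.PENDING:
            raise ValueError("no activation request is pending")
        if reviewer_role not in APPROVER_ROLES:
            raise PermissionError(f"{reviewer_role} cannot approve activations")
        if not rationale:
            raise ValueError("a documented rationale is required")
        self.rationale = rationale
        # Assumption: a rejected rule goes back to Draft for further edits.
        self.state = RuleState.ACTIVE if approve else RuleState.DRAFT


rule = Rule("Flag high-risk loan decisions lacking human oversight")
rule.request_activation()
rule.review("Compliance Officer", approve=True, rationale="Tested against dataset")
print(rule.state.value)  # Active
```

The model makes the separation of duties explicit: the only path from Draft to Active passes through a review by a role other than Governance Engineer.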

Version history: Every rule maintains a full version history with diffs. To view it:

Open the rule in the Rules Builder.
Click History in the top-right corner.
Select any two versions to see a side-by-side diff.

To revert to a previous version, select the target version and click Restore as Draft.
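
If you keep copies of rule text outside the platform, for example in a review ticket, you can produce a comparable diff locally. The sketch below uses Python's standard `difflib` module and prints a unified diff rather than the side-by-side view shown in the History panel; the `version_diff` helper and the two version strings are made up for illustration.

```python
import difflib


def version_diff(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Unified diff between two versions of a rule's text."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)


V1 = 'allow = false if {\n    input.risk_level == "HIGH"\n}'
V2 = 'allow = false if {\n    input.risk_level == "CRITICAL"\n}'

print(version_diff(V1, V2))
```

Lines prefixed with `-` exist only in the older version and lines prefixed with `+` exist only in the newer one, which makes a one-line change in a condition easy to spot before you decide whether to restore.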

Self-assessment:

At least one activation request submitted and tracked to approval
Version history reviewed for at least one rule
I understand how to revert a rule that produces unexpected behavior in production

What’s next?

Rules Builder reference: Full reference for metric rules, Rego rules, and policy templates.

Playground: Interactive prompt testing against live policy configuration.

Evaluation Datasets: Build and run governance regression tests.

Policy & Compliance reference: Framework mappings for EU AI Act, ISO 42001, NIST AI RMF, HIPAA, and more.