Governance Engineer Track

Audience: AI policy leads, risk engineers, ML governance specialists
Goal: Author and maintain policy rules that keep AI behavior within your organization’s defined risk tolerance
Estimated time: ~3 hours across 7 modules

Who can use this
Available to: Governance Engineer, Administrator
Not available to: Developer, Compliance Officer, Business Owner, Auditor

This track is written for the Governance Engineer role. Administrators have the same capabilities and share all content in this track.


Track overview

What makes the Governance Engineer role distinct. You are the person responsible for translating risk expectations into enforceable policy logic. You author rules in Rego (Open Policy Agent), evaluate them against real session data, and manage the lifecycle from draft through production activation. You do not need PII access or cost data to do this work — those are intentionally excluded from your role.

Module | Title | Time
1 | The VeriProof governance model | 25 min
2 | Metric rules: threshold-based policy | 25 min
3 | Rego policy rules: advanced logic | 30 min
4 | The Playground: live policy validation | 20 min
5 | Evaluation datasets: regression testing | 25 min
6 | Governance thresholds and alerts | 20 min
7 | Production approval workflow and version history | 15 min

Module 1 — The VeriProof governance model

Goal: Understand the data model you are writing policy against so your rules produce accurate, consistent results.

Key concepts:

  • Session — one complete AI interaction (user request → AI response chain)
  • Step — a sub-event within a session (LLM call, tool call, retrieval, human handoff)
  • Governance attribute — structured metadata attached to a session: riskLevel, outcomeType, intentLabel, requiresHumanOversight, regulatoryScope, and more
  • Declared vs. inferred — attributes declared explicitly by the SDK in your application code (GovernanceInferredMask = 0) are stronger evidence than platform-inferred attributes
  • Policy score — a composite 0–100 score calculated from attribute completeness, schema compliance, and behavioral consistency
  • Guardrail action — the real-time action taken on a session: allowed, flagged, blocked

Session document shape (what your Rego policies receive as input):

{
  "session_id": "...",
  "application_id": "...",
  "risk_level": "HIGH",
  "outcome_type": "APPROVED",
  "intent_label": "loan_approval",
  "policy_score": 72,
  "requires_human_oversight": true,
  "guardrail_action": "flagged",
  "step_count": 4,
  "total_tokens": 3241,
  "model_id": "gpt-4o",
  "steps": [...]
}
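In Rego, an expression that references a missing field is undefined rather than false, so a rule can silently fail to fire when the input lacks a field it reads, and any `default` value then decides the result. The sketch below checks for the top-level fields shown in the sample document. The package name and the rule names (`expected_fields`, `missing_fields`, `has_field`, `input_complete`) are illustrative assumptions, not part of VeriProof, and the field list is taken only from the sample above; the real document may carry more.

```rego
package veriproof.examples.input_check

import future.keywords.if
import future.keywords.in

# Top-level fields that appear in the sample session document.
# ASSUMPTION: this list is copied from the sample, not from a published schema.
expected_fields := {
    "session_id", "application_id", "risk_level", "outcome_type",
    "intent_label", "policy_score", "requires_human_oversight",
    "guardrail_action", "step_count", "total_tokens", "model_id", "steps",
}

# Set of expected fields that the current input does not define.
missing_fields := {f |
    some f in expected_fields
    not has_field(f)
}

# Defined when input[f] exists, even if its value is false or 0.
has_field(f) if {
    _ = input[f]
}

# True only when every expected field is present.
input_complete if {
    count(missing_fields) == 0
}
```

Evaluating `missing_fields` against a sample session in a test panel or with a local OPA install is one way to confirm which fields a policy can safely read.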

Self-assessment:

  • I can describe the difference between a declared and inferred governance attribute
  • I understand what policy_score measures and what causes it to drop
  • I know which fields are available in the Rego input document

Module 2 — Metric rules: threshold-based policy

Goal: Create your first metric rule and understand how threshold rules drive alerts, review routing, and evidence.

Creating a metric rule:

  1. Navigate to Rules and click New Rule → Metric rule.
  2. Enter a Name that describes the policy intent (e.g., “Flag high-risk loan decisions lacking human oversight”).
  3. Set the Scope — All Applications or a specific application.
  4. Build your condition:
     risk_level is HIGH or CRITICAL AND requires_human_oversight is false
  5. Set the Action: Alert, Add to review queue, or Block.
  6. Click Save as draft first. Activate only after testing in the Playground.
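For readers who prefer to see the logic as code, the example condition in the steps above corresponds roughly to the Rego expression below. This is only a sketch of the logic: it assumes the condition groups as (HIGH or CRITICAL) AND no human oversight, and the package and rule names are hypothetical. Metric rules are built in the Rules UI, not written in Rego.

```rego
package veriproof.examples.metric_condition

import future.keywords.if
import future.keywords.in

# False unless the condition below matches.
default condition_matches := false

# Mirrors: risk_level is HIGH or CRITICAL AND requires_human_oversight is false
# ASSUMPTION: the OR binds tighter than the AND, as written here.
condition_matches if {
    input.risk_level in {"HIGH", "CRITICAL"}
    input.requires_human_oversight == false
}
```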

Conditions available in metric rules:

Field | Type | Example
risk_level | Enum | is CRITICAL
policy_score | Numeric | is below 60
outcome_type | Enum | is DENIED
guardrail_action | Enum | is blocked
requires_human_oversight | Boolean | is false
step_count | Numeric | > 20
total_tokens | Numeric | > 8000

Self-assessment:

  • First metric rule created and saved as draft
  • Rule tested in the Playground before activation
  • Action configured (alert, review queue, or block)

Module 3 — Rego policy rules: advanced logic

Goal: Write your first Rego policy rule and understand when Rego is the right tool versus a metric rule.

When to use Rego instead of a metric rule:

  • You need to compare multiple steps within a session
  • You want to encode a specific regulatory clause as executable policy
  • You need to inspect nested annotation structures
  • You are reusing policy logic across multiple organizations or contexts
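As an illustration of the first case, comparing multiple steps, the sketch below denies a HIGH-risk session when a tool call is not followed by a human handoff later in the same session. The shape of each step object is not documented in this track, so the `type` field and its values (`"tool_call"`, `"human_handoff"`) are assumptions, as are the package and helper names.

```rego
package veriproof.examples.step_order

import future.keywords.if
import future.keywords.in

default allow := true

# Deny when any tool call in a HIGH-risk session has no later human handoff.
allow := false if {
    input.risk_level == "HIGH"
    some i, step in input.steps
    step.type == "tool_call"      # ASSUMPTION: steps carry a `type` field
    not handoff_after(i)
}

# True when a human handoff step appears at a later index than i.
handoff_after(i) if {
    some j, later in input.steps
    j > i
    later.type == "human_handoff" # ASSUMPTION: hypothetical step type value
}
```

A metric rule cannot express this kind of ordering check because it only sees session-level fields such as `step_count`.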

Your first Rego policy:

# Policy: High-risk decisions must have human oversight
package veriproof.policy

import future.keywords.if

default allow = true

allow = false if {
    input.risk_level == "HIGH"
    input.requires_human_oversight == false
}

allow = false if {
    input.risk_level == "CRITICAL"
}

Testing your policy in the editor:

  1. Write your policy in the Rego editor.
  2. Click the Test tab.
  3. Paste a sample session document as the input.
  4. Click Evaluate — the panel shows whether allow returns true or false and the full OPA trace.
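The same checks can also be kept as OPA unit tests, which makes them repeatable outside the editor. The sketch below assumes the policy above is saved locally and that the OPA command-line tool is available; whether the VeriProof Test tab runs `test_` rules is not stated in this track. The test package name and the `"LOW"` risk level are assumptions made for illustration.

```rego
package veriproof.policy_test

import future.keywords.if

import data.veriproof.policy

# HIGH risk without oversight must be denied.
test_high_risk_without_oversight_is_denied if {
    policy.allow == false with input as {"risk_level": "HIGH", "requires_human_oversight": false}
}

# HIGH risk with oversight falls through to the default.
test_high_risk_with_oversight_is_allowed if {
    policy.allow == true with input as {"risk_level": "HIGH", "requires_human_oversight": true}
}

# CRITICAL is denied regardless of oversight.
test_critical_is_always_denied if {
    policy.allow == false with input as {"risk_level": "CRITICAL", "requires_human_oversight": true}
}

# ASSUMPTION: "LOW" stands in for any risk level the policy does not mention.
test_low_risk_is_allowed_by_default if {
    policy.allow == true with input as {"risk_level": "LOW", "requires_human_oversight": false}
}
```

With both files in one directory, `opa test . -v` runs every rule whose name starts with `test_` and reports each result.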

Performance matters in Rego. Policies run synchronously on every session. Avoid unbounded iteration (some i; i := input.steps[_] without a guard). Always use the Test panel to confirm evaluation time is under 50ms before activating.
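One way to guard iteration is to put cheap scalar checks ahead of any walk over `input.steps`, so that most sessions never reach the loop. The sketch below is an illustration under stated assumptions: `max_steps` is a hypothetical cap, the `type` field on steps is assumed, and denying oversized sessions outright is a policy choice you would need to agree with your risk owners.

```rego
package veriproof.examples.bounded_steps

import future.keywords.if
import future.keywords.in

# ASSUMPTION: hypothetical cap on how many steps a policy will scan.
max_steps := 50

default allow := true

# Reject oversized sessions without scanning their steps.
allow := false if {
    input.step_count > max_steps
}

# Scan steps only for bounded sessions that actually require oversight.
allow := false if {
    input.step_count <= max_steps
    input.requires_human_oversight == true
    not has_handoff
}

# True when at least one step is a human handoff.
has_handoff if {
    some step in input.steps
    step.type == "human_handoff" # ASSUMPTION: hypothetical step field and value
}
```

The guard relies on `step_count` matching the length of `steps`, which this track does not confirm, so check that against real sessions before depending on it.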

Self-assessment:

  • First Rego policy written and evaluated in the Test panel
  • Policy returns false for at least one failing input case
  • Evaluation time confirmed acceptable in the Test panel

Module 4 — The Playground: live policy validation

Goal: Use the Playground to validate policy logic against your live application configuration before any draft rule goes to production.

Playground workflow:

  1. Navigate to Playground.
  2. Select the application whose policy configuration you want to test.
  3. Enter a prompt and structured context that represents the scenario you want to check.
  4. Click Run.
  5. Read the output: model response, governance evaluation, rule results, intent, risk level, and policy score.
  6. If a draft rule does not fire as expected, return to the Rules editor, adjust the condition, and re-test.

Playground runs do not create production session records or blockchain anchors. They are exploration-only and do not affect your compliance metrics.

Self-assessment:

  • Playground used to validate at least one metric rule condition
  • Playground used to validate at least one Rego policy
  • I understand which session fields appear in the Playground results panel

Module 5 — Evaluation datasets: regression testing

Goal: Build a curated evaluation dataset and use it to catch governance regressions before model, prompt, or rule changes reach production.

What an evaluation dataset is: An evaluation dataset is a collection of test cases — each with a known input and a declared expected governance outcome (intent, risk level, outcome type). When you run the dataset against your current endpoint configuration, VeriProof compares actual results to expected results and flags any regressions.

Building a useful dataset:

  1. Import from sessions — find historical sessions where the governance result was manually confirmed correct; import them as test cases
  2. Add adversarial edge cases — include inputs designed to probe the boundaries of your rules (high-risk inputs that should be blocked, valid inputs that should pass cleanly)
  3. Tag your cases — use tags like "regression", "edge-case", "high-risk" to organize large datasets
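For rule changes specifically, a small table-driven Rego test can act as a local analogue of a dataset: each case pairs a known input with an expected result, and any mismatch is reported by name. This is a sketch, not the VeriProof dataset feature, which also exercises the model and endpoint. The case contents, the `session` and `expected_allow` keys, and the package name are all hypothetical, and the test assumes the `veriproof.policy` package from Module 3 is loaded.

```rego
package veriproof.policy_regression_test

import future.keywords.if
import future.keywords.in

import data.veriproof.policy

# ASSUMPTION: hand-written cases; real ones would come from reviewed sessions.
cases := [
    {
        "name": "high-risk without oversight",
        "tags": ["regression", "high-risk"],
        "session": {"risk_level": "HIGH", "requires_human_oversight": false},
        "expected_allow": false,
    },
    {
        "name": "high-risk with oversight",
        "tags": ["regression"],
        "session": {"risk_level": "HIGH", "requires_human_oversight": true},
        "expected_allow": true,
    },
    {
        "name": "critical is always denied",
        "tags": ["edge-case", "high-risk"],
        "session": {"risk_level": "CRITICAL", "requires_human_oversight": true},
        "expected_allow": false,
    },
]

# Names of cases whose actual result differs from the expected one.
failures := [c.name |
    some c in cases
    actual := policy.allow with input as c.session
    actual != c.expected_allow
]

# Passes only when no case has regressed.
test_no_regressions if {
    count(failures) == 0
}
```

Querying `failures` directly shows which named cases diverged after a rule edit.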

Running a regression check:

  1. Navigate to Evaluation → Datasets and open your dataset.
  2. Click Run Evaluation.
  3. Select the application and endpoint to test against.
  4. Review the results: pass rate, failed cases, and delta from the last run.

Self-assessment:

  • First evaluation dataset created with at least 5 test cases
  • Dataset includes at least one high-risk edge case that should fail policy
  • Evaluation run completed and pass rate reviewed

Module 6 — Governance thresholds and alerts

Goal: Configure governance-specific alert thresholds so your team is notified when policy performance degrades or drift is detected.

Your threshold configuration responsibilities:

Threshold type | Where to configure | What triggers it
Policy score floor | Settings → Compliance → Policy Thresholds | Alert when rolling 7-day score drops below your target
Annotation coverage | Settings → Compliance → Policy Thresholds | Alert when declared governance attribute coverage drops below threshold
Drift detection | Application workspace → Monitor tab | Alert when session behavior diverges from baseline
Review queue SLA | Review Queues → [Queue] → Edit | Escalate when items age past SLA

Generic alert delivery channels (Slack, webhook, email) are configured by Administrators under Settings → Integrations. Your job is to configure the thresholds that determine when alerts fire — not the delivery channels.

Self-assessment:

  • Policy score floor configured for each production application
  • At least one threshold tested by temporarily adjusting it so that current values cross it, confirming the alert fired, and then restoring the original value
  • Drift detection baseline confirmed for at least one application

Module 7 — Production approval workflow and version history

Goal: Understand how rule changes move from draft to production and use version history to track and revert changes.

The production approval flow:

  1. You create or modify a rule and save it as Draft.
  2. You test the draft in the Playground and against your evaluation dataset.
  3. When satisfied, you click Request Activation.
  4. An Administrator or Compliance Officer reviews the request and approves or rejects it with a documented rationale.
  5. Approved rules become Active and begin evaluating sessions immediately.

In production environments, you cannot self-activate rules. This is by design. The approval requirement exists to ensure that governance changes are reviewed by someone with compliance or operational authority before they affect live decisions.

Version history: Every rule maintains a full version history with diffs. To view it:

  1. Open the rule in the Rules Builder.
  2. Click History in the top-right corner.
  3. Select any two versions to see a side-by-side diff.

To revert to a previous version, select the target version and click Restore as Draft.

Self-assessment:

  • At least one activation request submitted and tracked to approval
  • Version history reviewed for at least one rule
  • I understand how to revert a rule that produces unexpected behavior in production
