Playground

The Playground is a sandboxed environment for experimenting with prompts, model inputs, and draft policy logic. It runs prompts through the same policy evaluation pipeline used for production sessions, including guardrails, intent classification, risk scoring, and annotation, without creating live records or affecting production telemetry or audit metrics.

Use it to explore behavior quickly, validate new ideas, and promote only the changes that perform well against your policy expectations.

Who can use this

Available to

Administrator, Developer, Governance Engineer

Not available to

Compliance Officer, Business Owner, Auditor

Developer

Use the Playground to check that your SDK instrumentation produces the expected intent, risk, and outcome classifications before you push to production. Select the application whose active configuration you want to test.

Governance Engineer

Use the Playground to validate Rego policy logic interactively. Select an application, provide a test prompt, and observe how your draft rules would evaluate the session, without activating or saving anything.

Playground runs do not generate production session records or blockchain anchors. They are for testing and exploration only and do not appear in Sessions or compliance evidence exports.

Running a Prompt

1. Navigate to Playground.
2. Select the application whose policy configuration you want to test. The Playground uses that application’s active guardrails, redaction policy, and rules.
3. Enter your prompt in the input panel.
4. Optionally add context (structured fields such as userId, intent, and agentRole) to simulate a realistic session.
5. Click Run.

The Playground calls your application’s configured model endpoint and processes the result through the policy pipeline. Results appear in the output panel within a few seconds.
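To make the optional context concrete, here is a minimal Python sketch that assembles context fields and serializes them to JSON. The field names userId, intent, and agentRole come from the steps above. The PlaygroundContext class, the example values, and the assumption that context is entered as JSON are hypothetical and are not a documented Playground API.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PlaygroundContext:
    """Hypothetical container for the optional context fields."""

    user_id: Optional[str] = None
    intent: Optional[str] = None
    agent_role: Optional[str] = None
    # Any further structured fields your application may use (assumption).
    extra: Dict[str, str] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, str]:
        """Return only the fields that were actually set."""
        named = {
            "userId": self.user_id,
            "intent": self.intent,
            "agentRole": self.agent_role,
        }
        merged = {**named, **self.extra}
        return {key: value for key, value in merged.items() if value is not None}


# Example values are illustrative only.
context = PlaygroundContext(
    user_id="user-123",
    intent="refund_request",
    agent_role="support_agent",
)
print(json.dumps(context.to_dict(), indent=2))
```

Omitting unset fields keeps the simulated session close to what a real caller would send, rather than padding it with empty values.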

Reading the Results

Model Response

The model’s response as returned by the endpoint, with any content that matches the active redaction policy redacted.

Governance Evaluation

Signal	Description
Intent	The classified intent label
Risk level	The assigned risk level (LOW → CRITICAL)
Policy score	Point-in-time score for this evaluation
Decision	Approved / Denied / Deferred based on active rules
Guardrail result	Which guardrails were evaluated and what action each took (allowed / flagged / blocked)
Content safety	Whether content safety signals were detected
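If you record these signals by hand while testing, a small data model can help keep them consistent. The sketch below is one way to represent the table in Python. The decision values and guardrail actions are taken from the table. The class name, field names, the float type for the policy score, and the example guardrail names are assumptions, not the product's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Values listed in the table above.
DECISIONS = {"Approved", "Denied", "Deferred"}
GUARDRAIL_ACTIONS = {"allowed", "flagged", "blocked"}


@dataclass
class GovernanceEvaluation:
    """Hypothetical record of one Playground evaluation."""

    intent: str
    risk_level: str  # e.g. "LOW" or "CRITICAL"; other levels are not listed here
    policy_score: float  # numeric type is an assumption
    decision: str
    guardrail_results: Dict[str, str] = field(default_factory=dict)
    content_safety_detected: bool = False

    def __post_init__(self) -> None:
        if self.decision not in DECISIONS:
            raise ValueError(f"unexpected decision: {self.decision!r}")
        bad = {n: a for n, a in self.guardrail_results.items()
               if a not in GUARDRAIL_ACTIONS}
        if bad:
            raise ValueError(f"unexpected guardrail actions: {bad}")


def blocking_guardrails(evaluation: GovernanceEvaluation) -> List[str]:
    """Return the names of guardrails whose action was 'blocked'."""
    return sorted(name for name, action in evaluation.guardrail_results.items()
                  if action == "blocked")


# Guardrail names below are made up for illustration.
result = GovernanceEvaluation(
    intent="refund_request",
    risk_level="LOW",
    policy_score=0.92,
    decision="Approved",
    guardrail_results={"pii_filter": "allowed", "toxicity": "flagged"},
)
print(blocking_guardrails(result))
```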

Annotation Preview

A preview of the annotations that would be attached to a production session, including step-level annotations emitted by the SDK.

Testing Guardrail Configuration

Use the Playground to confirm that new or changed rules behave as expected before rollout. To test a guardrail change:

1. Draft your rule change in the Rules Builder and save it as a draft (do not activate it).
2. In the Playground, turn on the Use draft rules toggle to evaluate against draft rules.
3. Run prompts that should trigger the new rule and prompts that should not.
4. Verify that the guardrail results match your expectations.
5. Return to the Rules Builder and activate the rule when you are confident.
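The positive and negative prompts from this procedure can be tracked as a small expectation table. The Python sketch below shows the idea. The fake_evaluate function is a stand-in for reading the guardrail result off a Playground run, and the prompts and the "ssn" rule are invented for illustration. No Playground API is assumed.

```python
from typing import Callable, List, Tuple

# Each case pairs a prompt with the guardrail action you expect to see.
Case = Tuple[str, str]


def check_cases(evaluate: Callable[[str], str], cases: List[Case]) -> List[str]:
    """Return one message per case whose observed action differs."""
    failures = []
    for prompt, expected in cases:
        actual = evaluate(prompt)
        if actual != expected:
            failures.append(f"{prompt!r}: expected {expected}, got {actual}")
    return failures


def fake_evaluate(prompt: str) -> str:
    """Stand-in for a Playground run (hypothetical rule: block SSN requests)."""
    return "blocked" if "ssn" in prompt.lower() else "allowed"


cases = [
    ("What is the SSN on file for this customer?", "blocked"),  # should trigger
    ("What are your opening hours?", "allowed"),                # should not
]

failures = check_cases(fake_evaluate, cases)
print("all cases match" if not failures else "\n".join(failures))
```

Keeping both kinds of prompt in the same list makes it harder to miss a rule that is too broad as well as one that is too narrow.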

Saving Runs as Dataset Cases

Any Playground run can be saved as a test case in an Evaluation Dataset:

1. After a run, click Save as test case.
2. Select an existing dataset or create a new one.
3. Set the expected outcome, intent, and risk level to the values you just confirmed are correct.
4. Click Add to dataset.

This is the fastest way to build evaluation datasets from validated prompt behavior.
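For planning purposes, it can help to think of each saved case as a prompt plus the three expected values set when saving. The Python sketch below models that shape and writes cases as JSON Lines. The DatasetCase class, its field names, and the JSON Lines format are assumptions for illustration and do not describe how Evaluation Datasets are actually stored or exported.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class DatasetCase:
    """Hypothetical shape of one saved test case."""

    prompt: str
    expected_outcome: str     # e.g. Approved / Denied / Deferred
    expected_intent: str
    expected_risk_level: str  # e.g. LOW or CRITICAL


def to_jsonl(cases: Iterable[DatasetCase]) -> str:
    """Serialize cases as one JSON object per line."""
    return "\n".join(json.dumps(asdict(case), sort_keys=True) for case in cases)


# Example values are illustrative only.
case = DatasetCase(
    prompt="What is the SSN on file for this customer?",
    expected_outcome="Denied",
    expected_intent="pii_request",
    expected_risk_level="CRITICAL",
)
print(to_jsonl([case]))
```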

Related Documentation

Evaluation Datasets: Build curated test sets and run A/B evaluations against instrumented endpoints.

Rules Builder: Author and test policy rules before activating them in production.

Sessions: Search production sessions and compare real outcomes to your sandbox runs.
