Idun Engine supports 15 guardrail types that validate agent inputs, outputs, or both. Every guardrail has a config_id and type-specific configuration fields. Most also take a reject_message, which is returned when the guard triggers; the model_armor and custom_llm field tables below do not list one.

Guardrail positions

Guardrails are placed in one of two positions:
  • Input: Applied to user messages before they reach the agent
  • Output: Applied to agent responses before they are returned to the user
Most guardrail types can be placed in both positions. The summary table at the end of this page lists the types that are restricted to input or to output.
config.yaml
guardrails:
  input:
    - config_id: detect_pii
      reject_message: "PII detected in your input"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
      on_fail: exception

  output:
    - config_id: toxic_language
      reject_message: "Response contains inappropriate language"
      threshold: 0.7
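
Placing one guardrail type under both keys can be sketched as follows. This is an illustrative fragment rather than a recommended policy: it uses only the detect_pii fields documented on this page, and the reject messages are placeholder wording.

```yaml
guardrails:
  input:
    # Screen user messages for PII before they reach the agent.
    - config_id: detect_pii
      reject_message: "Please remove personal information from your message"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER"]
      on_fail: exception

  output:
    # Screen agent responses with the same guardrail type.
    - config_id: detect_pii
      reject_message: "Response withheld because it contained personal information"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER"]
      on_fail: exception
```

Each entry carries its own settings, so the two positions can presumably be tuned independently; confirm this against your deployment before relying on it.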

Guardrail types

BAN_LIST

Blocks messages containing specific words or phrases.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | ban_list |
| reject_message | string | Message returned when triggered |
| banned_words | list[string] | Words or phrases to block |

- config_id: ban_list
  reject_message: "Message contains banned content"
  banned_words: ["banned-word", "another phrase"]

DETECT_PII

Detects personally identifiable information in text.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | detect_pii |
| reject_message | string | Message returned when triggered |
| pii_entities | list[string] | PII entity types to detect (e.g., EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, SSN) |
| on_fail | string | Action on detection. Default: exception |

- config_id: detect_pii
  reject_message: "Personal information detected"
  pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
  on_fail: exception

NSFW_TEXT

Detects not-safe-for-work content.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | nsfw_text |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0). Lower values are more sensitive |

- config_id: nsfw_text
  reject_message: "Inappropriate content detected"
  threshold: 0.5
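
To illustrate the threshold direction stated in the field description, a stricter variant of the entry above might look like the following sketch. The value 0.3 is an arbitrary illustration, not a recommended setting.

```yaml
# Assumed illustration: 0.3 is lower than the 0.5 used above, so,
# per the field description, this guard is more sensitive.
- config_id: nsfw_text
  reject_message: "Inappropriate content detected"
  threshold: 0.3
```

A more sensitive guard is likely to reject more borderline text, so it is worth checking any value against representative messages before deploying it.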

COMPETITION_CHECK

Flags mentions of competitor companies or products.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | competition_check |
| reject_message | string | Message returned when triggered |
| competitors | list[string] | Names of competitor companies or products |

- config_id: competition_check
  reject_message: "Competitor reference detected"
  competitors: ["CompetitorA", "CompetitorB"]

BIAS_CHECK

Detects biased language in text.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | bias_check |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

- config_id: bias_check
  reject_message: "Biased language detected"
  threshold: 0.7

CORRECT_LANGUAGE

Validates that text is in one of the expected languages.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | correct_language |
| reject_message | string | Message returned when triggered |
| expected_languages | list[string] | Valid ISO language codes (e.g., en, fr, es) |

- config_id: correct_language
  reject_message: "Please use English or French"
  expected_languages: ["en", "fr"]

GIBBERISH_TEXT

Filters nonsensical or garbled input.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | gibberish_text |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

- config_id: gibberish_text
  reject_message: "Input appears to be nonsensical"
  threshold: 0.8

TOXIC_LANGUAGE

Detects toxic or harmful language.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | toxic_language |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

- config_id: toxic_language
  reject_message: "Toxic language detected"
  threshold: 0.7

RESTRICT_TO_TOPIC

Keeps conversations within a defined set of allowed topics.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | restrict_to_topic |
| reject_message | string | Message returned when triggered |
| topics | list[string] | List of allowed topics |

- config_id: restrict_to_topic
  reject_message: "That topic is outside the scope of this agent"
  topics: ["customer support", "product information", "billing"]

DETECT_JAILBREAK

Detects jailbreak attempts in user input.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | detect_jailbreak |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

- config_id: detect_jailbreak
  reject_message: "Jailbreak attempt detected"
  threshold: 0.5

PROMPT_INJECTION

Detects prompt injection attacks.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | prompt_injection |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

- config_id: prompt_injection
  reject_message: "Prompt injection detected"
  threshold: 0.5

RAG_HALLUCINATION

Detects hallucinations in RAG (Retrieval-Augmented Generation) responses by comparing the response against the retrieved context.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | rag_hallucination |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

- config_id: rag_hallucination
  reject_message: "Response may contain unsupported claims"
  threshold: 0.7

CODE_SCANNER

Scans and validates code in messages, restricting to allowed programming languages.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | code_scanner |
| reject_message | string | Message returned when triggered |
| allowed_languages | list[string] | List of allowed programming languages |

- config_id: code_scanner
  reject_message: "Code in this language is not allowed"
  allowed_languages: ["python", "javascript", "sql"]

MODEL_ARMOR (Google Cloud)

Uses Google Cloud’s Model Armor service for content safety evaluation.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | model_armor |
| name | string | Name of the armor configuration |
| project_id | string | Google Cloud project ID |
| location | string | Google Cloud region (e.g., us-central1) |
| template_id | string | Model Armor template ID |

- config_id: model_armor
  name: production-armor
  project_id: my-gcp-project
  location: us-central1
  template_id: my-armor-template
Model Armor requires a Google Cloud project with the Model Armor API enabled. This guardrail type does not use the Guardrails AI hub.

CUSTOM_LLM

Uses a large language model as a custom guardrail with a prompt you define.
| Field | Type | Description |
| --- | --- | --- |
| config_id | | custom_llm |
| name | string | Name of the custom guardrail |
| model | string | LLM model to use for evaluation |
| prompt | string | System instruction prompt that defines the guardrail logic |

Supported models:
| Model ID | Name |
| --- | --- |
| Gemini 2.5 flash lite | Gemini 2.5 Flash Lite |
| Gemini 2.5 flash | Gemini 2.5 Flash |
| Gemini 2.5 pro | Gemini 2.5 Pro |
| Gemini 3 pro | Gemini 3 Pro |
| OpenAi GPT-5.1 | OpenAI GPT-5.1 |
| OpenAi GPT-5 mini | OpenAI GPT-5 Mini |
| OpenAi GPT-5 nano | OpenAI GPT-5 Nano |

- config_id: custom_llm
  name: compliance-check
  model: "Gemini 2.5 flash"
  prompt: "Evaluate the following text for regulatory compliance violations. Return PASS if compliant, FAIL if not."
Custom LLM guardrails use a separate LLM call for evaluation. This adds latency and cost to each guarded request.
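
Longer prompts can be easier to maintain as a YAML block scalar. The sketch below assumes the configuration file is parsed as standard YAML, where `>-` folds the indented lines into a single string. The guardrail name and prompt text are hypothetical, and the PASS/FAIL wording simply mirrors the example above.

```yaml
- config_id: custom_llm
  name: medical-advice-check   # hypothetical name
  model: "Gemini 2.5 flash"    # Model ID taken from the supported models list
  # Folded block scalar: the line breaks below become single spaces.
  prompt: >-
    Evaluate the following text for medical advice.
    Return PASS if it contains no diagnosis or treatment
    recommendation, FAIL if it does.
```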

Summary table

| Type | Config ID | Position | Key config fields |
| --- | --- | --- | --- |
| Ban list | ban_list | Input/Output | banned_words |
| Detect PII | detect_pii | Input/Output | pii_entities, on_fail |
| NSFW text | nsfw_text | Input/Output | threshold |
| Competition check | competition_check | Input/Output | competitors |
| Bias check | bias_check | Input/Output | threshold |
| Correct language | correct_language | Input/Output | expected_languages |
| Gibberish text | gibberish_text | Input/Output | threshold |
| Toxic language | toxic_language | Input/Output | threshold |
| Restrict to topic | restrict_to_topic | Input/Output | topics |
| Detect jailbreak | detect_jailbreak | Input | threshold |
| Prompt injection | prompt_injection | Input | threshold |
| RAG hallucination | rag_hallucination | Output | threshold |
| Code scanner | code_scanner | Input/Output | allowed_languages |
| Model Armor | model_armor | Input/Output | project_id, location, template_id |
| Custom LLM | custom_llm | Input/Output | model, prompt |
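
Read together with the Position column, the table suggests a layout like the following sketch, with the input-only types under input and the output-only type under output. The thresholds and messages are copied from the examples on this page; combining these particular guards is an illustration, not a recommendation.

```yaml
guardrails:
  input:
    # Listed as Input only in the summary table.
    - config_id: detect_jailbreak
      reject_message: "Jailbreak attempt detected"
      threshold: 0.5
    - config_id: prompt_injection
      reject_message: "Prompt injection detected"
      threshold: 0.5

  output:
    # Listed as Output only in the summary table.
    - config_id: rag_hallucination
      reject_message: "Response may contain unsupported claims"
      threshold: 0.7
```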

Next steps

Guardrails overview

How guardrails fit into the agent request lifecycle.

Observability

Trace guardrail decisions alongside agent runs.

Troubleshooting

Diagnose configuration and provider errors.
Last modified on May 20, 2026