Idun Engine supports 15 guardrail types that validate agent inputs, outputs, or both. Each guardrail has a config_id, a reject message returned when the guard triggers, and type-specific configuration fields.
Guardrail positions
Guardrails are placed in one of two positions:
Input: Applied to user messages before they reach the agent
Output: Applied to agent responses before they are returned to the user
You can place the same guardrail type in both positions.
    guardrails:
      input:
        - config_id: detect_pii
          reject_message: "PII detected in your input"
          pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
          on_fail: exception
      output:
        - config_id: toxic_language
          reject_message: "Response contains inappropriate language"
          threshold: 0.7
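To make the layout above concrete, here is a minimal Python sketch that mirrors the example as a plain dictionary and checks that every entry sits under input or output and carries a config_id. The helper name check_positions is hypothetical and is not part of Idun Engine; it only illustrates the structure described in this section.

```python
# Illustrative only: check_positions is a hypothetical helper, not an Idun Engine API.
VALID_POSITIONS = {"input", "output"}


def check_positions(config: dict) -> list[str]:
    """Return a list of problems found in a guardrails config dictionary."""
    problems = []
    for position, guards in config.get("guardrails", {}).items():
        if position not in VALID_POSITIONS:
            problems.append(f"unknown position: {position}")
            continue
        for index, guard in enumerate(guards):
            if "config_id" not in guard:
                problems.append(f"{position}[{index}] is missing config_id")
    return problems


# The same configuration as the YAML example, written as a Python dict.
example = {
    "guardrails": {
        "input": [
            {
                "config_id": "detect_pii",
                "reject_message": "PII detected in your input",
                "pii_entities": ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
                "on_fail": "exception",
            }
        ],
        "output": [
            {
                "config_id": "toxic_language",
                "reject_message": "Response contains inappropriate language",
                "threshold": 0.7,
            }
        ],
    }
}

print(check_positions(example))  # an empty list means no problems were found
```

A check like this could run before deployment to catch a misspelled position key early. How the engine itself reports such errors is not covered here.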
Guardrail types
BAN_LIST
Blocks messages containing specific words or phrases.
| Field | Type | Description |
| --- | --- | --- |
| config_id | ban_list | |
| reject_message | string | Message returned when triggered |
| banned_words | list[string] | Words or phrases to block |

    - config_id: ban_list
      reject_message: "Message contains banned content"
      banned_words: ["banned-word", "another phrase"]
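As a rough mental model of a ban list, the Python sketch below flags any configured entry that appears in a message. The matching rule used here (case-insensitive substring match) is an assumption made for illustration; the source does not state how Idun Engine matches banned words, and find_banned is a hypothetical name.

```python
# Illustrative only: the matching rule (case-insensitive substring match)
# is an assumption, not documented Idun Engine behavior.
def find_banned(text: str, banned_words: list[str]) -> list[str]:
    """Return the banned entries that occur in text, ignoring case."""
    lowered = text.lower()
    return [word for word in banned_words if word.lower() in lowered]


banned_words = ["banned-word", "another phrase"]
hits = find_banned("This has Another Phrase in it", banned_words)
if hits:
    # Mirrors the reject_message from the example configuration.
    print("Message contains banned content")
```

If the actual matcher works on whole words or applies fuzzy matching, results for partial matches would differ from this sketch.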
DETECT_PII
Detects personally identifiable information in text.
| Field | Type | Description |
| --- | --- | --- |
| config_id | detect_pii | |
| reject_message | string | Message returned when triggered |
| pii_entities | list[string] | PII entity types to detect (e.g., EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, SSN) |
| on_fail | string | Action on detection. Default: exception |

    - config_id: detect_pii
      reject_message: "Personal information detected"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
      on_fail: exception
NSFW_TEXT
Detects not-safe-for-work content.
| Field | Type | Description |
| --- | --- | --- |
| config_id | nsfw_text | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0). Lower values are more sensitive |

    - config_id: nsfw_text
      reject_message: "Inappropriate content detected"
      threshold: 0.5
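The statement that lower thresholds are more sensitive can be pictured with a short Python sketch. It assumes the detector produces a score between 0.0 and 1.0 and that the guard triggers when the score reaches the threshold. Both the scoring model and the is_triggered helper are assumptions for illustration, not a description of the engine's internals.

```python
# Illustrative only: assumes the detector yields a score in [0.0, 1.0] and that
# the guard triggers when the score reaches the configured threshold.
def is_triggered(score: float, threshold: float) -> bool:
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return score >= threshold


score = 0.6  # hypothetical detector score for one message
print(is_triggered(score, threshold=0.5))  # lower threshold: more messages trigger
print(is_triggered(score, threshold=0.8))  # higher threshold: fewer messages trigger
```

Under these assumptions, the same message is rejected at 0.5 but allowed at 0.8, which is why lowering the value makes the guard stricter.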
COMPETITION_CHECK
Flags mentions of competitor companies or products.
| Field | Type | Description |
| --- | --- | --- |
| config_id | competition_check | |
| reject_message | string | Message returned when triggered |
| competitors | list[string] | Names of competitor companies or products |

    - config_id: competition_check
      reject_message: "Competitor reference detected"
      competitors: ["CompetitorA", "CompetitorB"]
BIAS_CHECK
Detects biased language in text.
| Field | Type | Description |
| --- | --- | --- |
| config_id | bias_check | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

    - config_id: bias_check
      reject_message: "Biased language detected"
      threshold: 0.7
CORRECT_LANGUAGE
Validates that text is in one of the expected languages.
| Field | Type | Description |
| --- | --- | --- |
| config_id | correct_language | |
| reject_message | string | Message returned when triggered |
| expected_languages | list[string] | Valid ISO language codes (e.g., en, fr, es) |

    - config_id: correct_language
      reject_message: "Please use English or French"
      expected_languages: ["en", "fr"]
GIBBERISH_TEXT
Filters nonsensical or garbled input.
| Field | Type | Description |
| --- | --- | --- |
| config_id | gibberish_text | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

    - config_id: gibberish_text
      reject_message: "Input appears to be nonsensical"
      threshold: 0.8
TOXIC_LANGUAGE
Detects toxic or harmful language.
| Field | Type | Description |
| --- | --- | --- |
| config_id | toxic_language | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

    - config_id: toxic_language
      reject_message: "Toxic language detected"
      threshold: 0.7
RESTRICT_TO_TOPIC
Keeps conversations within a defined set of allowed topics.
| Field | Type | Description |
| --- | --- | --- |
| config_id | restrict_to_topic | |
| reject_message | string | Message returned when triggered |
| topics | list[string] | List of allowed topics |

    - config_id: restrict_to_topic
      reject_message: "That topic is outside the scope of this agent"
      topics: ["customer support", "product information", "billing"]
DETECT_JAILBREAK
Detects jailbreak attempts in user input.
| Field | Type | Description |
| --- | --- | --- |
| config_id | detect_jailbreak | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

    - config_id: detect_jailbreak
      reject_message: "Jailbreak attempt detected"
      threshold: 0.5
PROMPT_INJECTION
Detects prompt injection attacks.
| Field | Type | Description |
| --- | --- | --- |
| config_id | prompt_injection | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

    - config_id: prompt_injection
      reject_message: "Prompt injection detected"
      threshold: 0.5
RAG_HALLUCINATION
Detects hallucinations in RAG (Retrieval-Augmented Generation) responses by comparing the response against the retrieved context.
| Field | Type | Description |
| --- | --- | --- |
| config_id | rag_hallucination | |
| reject_message | string | Message returned when triggered |
| threshold | float | Sensitivity level (0.0 to 1.0) |

    - config_id: rag_hallucination
      reject_message: "Response may contain unsupported claims"
      threshold: 0.7
CODE_SCANNER
Scans and validates code in messages, restricting to allowed programming languages.
| Field | Type | Description |
| --- | --- | --- |
| config_id | code_scanner | |
| reject_message | string | Message returned when triggered |
| allowed_languages | list[string] | List of allowed programming languages |

    - config_id: code_scanner
      reject_message: "Code in this language is not allowed"
      allowed_languages: ["python", "javascript", "sql"]
MODEL_ARMOR (Google Cloud)
Uses Google Cloud’s Model Armor service for content safety evaluation.
| Field | Type | Description |
| --- | --- | --- |
| config_id | model_armor | |
| name | string | Name of the armor configuration |
| project_id | string | Google Cloud project ID |
| location | string | Google Cloud region (e.g., us-central1) |
| template_id | string | Model Armor template ID |

    - config_id: model_armor
      name: production-armor
      project_id: my-gcp-project
      location: us-central1
      template_id: my-armor-template
Model Armor requires a Google Cloud project with the Model Armor API enabled. This guardrail type does not use the Guardrails AI hub.
CUSTOM_LLM
Uses a large language model as a custom guardrail with a prompt you define.
| Field | Type | Description |
| --- | --- | --- |
| config_id | custom_llm | |
| name | string | Name of the custom guardrail |
| model | string | LLM model to use for evaluation |
| prompt | string | System instruction prompt that defines the guardrail logic |

Supported models:

| Model ID | Name |
| --- | --- |
| Gemini 2.5 flash lite | Gemini 2.5 Flash Lite |
| Gemini 2.5 flash | Gemini 2.5 Flash |
| Gemini 2.5 pro | Gemini 2.5 Pro |
| Gemini 3 pro | Gemini 3 Pro |
| OpenAi GPT-5.1 | OpenAI GPT-5.1 |
| OpenAi GPT-5 mini | OpenAI GPT-5 Mini |
| OpenAi GPT-5 nano | OpenAI GPT-5 Nano |

    - config_id: custom_llm
      name: compliance-check
      model: "Gemini 2.5 flash"
      prompt: "Evaluate the following text for regulatory compliance violations. Return PASS if compliant, FAIL if not."
Custom LLM guardrails use a separate LLM call for evaluation. This adds latency and cost to each guarded request.
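Because the model field accepts only the IDs listed under Supported models, a small pre-flight check can catch typos before any LLM call is made. The Python sketch below is a hypothetical helper, not part of Idun Engine. It assumes model IDs are compared exactly as written in the table, and that the evaluator's reply begins with PASS or FAIL, as the example prompt requests.

```python
# Illustrative only: these helpers are hypothetical and not part of Idun Engine.
# Model IDs are copied from the "Supported models" list and compared exactly,
# which is an assumption about how the engine matches them.
SUPPORTED_MODELS = {
    "Gemini 2.5 flash lite",
    "Gemini 2.5 flash",
    "Gemini 2.5 pro",
    "Gemini 3 pro",
    "OpenAi GPT-5.1",
    "OpenAi GPT-5 mini",
    "OpenAi GPT-5 nano",
}

REQUIRED_FIELDS = ("name", "model", "prompt")


def check_custom_llm(guard: dict) -> list[str]:
    """Return a list of problems found in one custom_llm guardrail entry."""
    problems = [
        f"missing field: {field}" for field in REQUIRED_FIELDS if field not in guard
    ]
    model = guard.get("model")
    if model is not None and model not in SUPPORTED_MODELS:
        problems.append(f"unsupported model: {model}")
    return problems


def verdict_passes(reply: str) -> bool:
    """Assumed convention: the evaluator's reply starts with PASS or FAIL."""
    return reply.strip().upper().startswith("PASS")


guard = {
    "config_id": "custom_llm",
    "name": "compliance-check",
    "model": "Gemini 2.5 flash",
    "prompt": "Evaluate the following text for regulatory compliance violations. "
              "Return PASS if compliant, FAIL if not.",
}
print(check_custom_llm(guard))
```

Keeping the verdict format strict (a leading PASS or FAIL) makes the evaluator's output easy to parse. How the engine actually interprets the model's reply is not specified in this page.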
Summary table
| Type | Config ID | Position | Key config fields |
| --- | --- | --- | --- |
| Ban list | ban_list | Input/Output | banned_words |
| Detect PII | detect_pii | Input/Output | pii_entities, on_fail |
| NSFW text | nsfw_text | Input/Output | threshold |
| Competition check | competition_check | Input/Output | competitors |
| Bias check | bias_check | Input/Output | threshold |
| Correct language | correct_language | Input/Output | expected_languages |
| Gibberish text | gibberish_text | Input/Output | threshold |
| Toxic language | toxic_language | Input/Output | threshold |
| Restrict to topic | restrict_to_topic | Input/Output | topics |
| Detect jailbreak | detect_jailbreak | Input | threshold |
| Prompt injection | prompt_injection | Input | threshold |
| RAG hallucination | rag_hallucination | Output | threshold |
| Code scanner | code_scanner | Input/Output | allowed_languages |
| Model Armor | model_armor | Input/Output | project_id, location, template_id |
| Custom LLM | custom_llm | Input/Output | model, prompt |
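The Position column of the summary table can be turned into a simple lookup. The Python sketch below transcribes that column and flags guardrails placed where the table says they do not apply, for example detect_jailbreak under output. The check_placement helper is hypothetical. Whether Idun Engine rejects such a configuration or silently ignores it is not stated in this page.

```python
# Illustrative only: positions are transcribed from the summary table;
# check_placement is a hypothetical helper, not an Idun Engine API.
BOTH = {"input", "output"}

ALLOWED_POSITIONS = {
    "ban_list": BOTH,
    "detect_pii": BOTH,
    "nsfw_text": BOTH,
    "competition_check": BOTH,
    "bias_check": BOTH,
    "correct_language": BOTH,
    "gibberish_text": BOTH,
    "toxic_language": BOTH,
    "restrict_to_topic": BOTH,
    "detect_jailbreak": {"input"},
    "prompt_injection": {"input"},
    "rag_hallucination": {"output"},
    "code_scanner": BOTH,
    "model_armor": BOTH,
    "custom_llm": BOTH,
}


def check_placement(config: dict) -> list[str]:
    """Return a list of guardrails placed in a position the table does not list."""
    problems = []
    for position, guards in config.get("guardrails", {}).items():
        for guard in guards:
            config_id = guard.get("config_id")
            allowed = ALLOWED_POSITIONS.get(config_id)
            if allowed is None:
                problems.append(f"unknown config_id: {config_id}")
            elif position not in allowed:
                problems.append(f"{config_id} is not valid in the {position} position")
    return problems


misplaced = {"guardrails": {"output": [{"config_id": "detect_jailbreak"}]}}
print(check_placement(misplaced))
```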
Next steps
Guardrails overview: How guardrails fit into the agent request lifecycle.
Observability: Trace guardrail decisions alongside agent runs.
Troubleshooting: Diagnose configuration and provider errors.