Guardrails Export
Guardrails are security rules extracted from your test results. They capture the attack patterns and boundary violations discovered during testing and translate them into actionable rules for runtime defense.
Guardrails are the bridge between testing and protection โ they carry the knowledge gained from adversarial testing into the firewall's evaluation logic.
How It Works
hb test โ findings (what attacks succeeded)
โ
hb guardrails โ rules (what to block)
โ
humanbound-firewall โ runtime protection (blocking attacks)
Each guardrail rule includes:
- Threat class โ which OWASP category it addresses
- Pattern โ description of the attack technique
- Severity โ how critical the vulnerability was
- Action โ block (default)
Export Guardrails
From Local Test Results
# After running a test
hb test --endpoint ./config.json --scope ./scope.json --wait
# Export guardrails (reads from latest local results)
hb guardrails -o rules.json
hb guardrails --format yaml -o rules.yaml
From Platform Data (Logged In)
Platform guardrails are enriched by data from continuous monitoring โ more test cycles produce more diverse attack patterns and therefore more comprehensive rules.
Output Formats
# JSON (default)
hb guardrails -o guardrails.json
# YAML
hb guardrails --format yaml -o guardrails.yaml
# OpenAI moderation format
hb guardrails --vendor openai -o openai_rules.json
Using with humanbound-firewall
Guardrails configure the firewall's Tier 3 LLM judge โ they define what the agent is allowed and restricted from doing:
The agent.yaml scope (permitted/restricted intents) acts as the guardrail configuration. Exported rules can supplement or override the base configuration.
See Firewall for full integration details.
Training Firewall Classifiers
Beyond rule-based guardrails, test results can train ML classifiers for the firewall's Tier 2:
# Train from local test data
hb firewall train
# Train from external red teaming results
hb firewall train --import pyrit_results.json
hb firewall train --import results.json:promptfoo
# Train from platform data (richer, requires login)
hb firewall train --source platform
See Firewall โ Tier 2 for details on classifier training.