Telemetry (Whitebox Testing)
Telemetry enables whitebox agentic testing. When configured, Humanbound sees inside your agent's reasoning -- tool calls, memory operations, retrieval steps, and resource usage -- giving the judge far richer context than blackbox request/response testing alone.
Without telemetry, the judge evaluates only the conversation (what the agent said). With telemetry, the judge also sees what the agent did -- which tools it called, what parameters it passed, what data it accessed, and how many resources it consumed.
How It Works
- Your agent runs with an observability platform (LangFuse, LangSmith, etc.)
- You add a `telemetry` block to your agent config JSON
- After each conversation, Humanbound fetches the trace from your observability platform (in `per_turn` mode, it reads telemetry from each agent response instead)
- The judge receives both the conversation and the telemetry data (tool executions, memory ops, resource usage)
- Findings include whitebox evidence (e.g., "agent called `transfer_funds` with unauthorized parameters")
Configuration
The telemetry object sits inside your agent config JSON, alongside chat_completion, thread_init, etc. If the telemetry block is present, it is enabled -- no separate flag needed.
```json
{
  "streaming": false,
  "thread_init": { "..." },
  "chat_completion": { "..." },
  "telemetry": {
    "endpoint": "https://your-observability-platform/api/sessions/$session_id",
    "headers": { "Authorization": "Bearer ..." },
    "format": "langfuse",
    "mode": "end_of_conversation"
  }
}
```
Configuration Reference
| Field | Required | Default | Description |
|---|---|---|---|
| `endpoint` | Yes (for `end_of_conversation`) | -- | URL to fetch telemetry data from. Supports `$session_id` and other meta-variables from the `thread_init` response. |
| `headers` | No | `{}` | HTTP headers for the telemetry request (API keys, auth tokens). |
| `payload` | No | `{}` | Request body for the telemetry request. |
| `format` | No | `custom` | Observability platform: `langfuse`, `langsmith`, `openai_assistants`, `wandb`, `helicone`, `agentops`, `custom`. |
| `mode` | No | `end_of_conversation` | When to collect: `end_of_conversation` (fetch after all turns) or `per_turn` (extract from each response). |
| `telemetry_auth` | No | -- | Separate auth endpoint for the telemetry API. Same shape as `thread_auth`. |
| `extraction_map` | For `per_turn` / `custom` | -- | JSONPath-like paths for extracting telemetry fields. |
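The rules in this table can be expressed as a small validation step. The sketch below is a hypothetical helper, not Humanbound's implementation. `resolve_telemetry` and the constant names are invented for illustration. It encodes only the defaults and required-field rules listed above, plus the rule that a missing `telemetry` block means telemetry is disabled.

```python
from __future__ import annotations

from typing import Any

# Defaults copied from the Configuration Reference table.
DEFAULTS: dict[str, Any] = {
    "headers": {},
    "payload": {},
    "format": "custom",
    "mode": "end_of_conversation",
}
VALID_FORMATS = {"langfuse", "langsmith", "openai_assistants", "wandb",
                 "helicone", "agentops", "custom"}
VALID_MODES = {"end_of_conversation", "per_turn"}


def resolve_telemetry(agent_config: dict[str, Any]) -> dict[str, Any] | None:
    """Return the telemetry block with defaults applied, or None if absent.

    Hypothetical helper that mirrors the documented rules only.
    """
    block = agent_config.get("telemetry")
    if block is None:
        return None  # no telemetry block: telemetry is disabled
    # Copy dict defaults so callers never share the same mutable object.
    resolved = {k: (dict(v) if isinstance(v, dict) else v) for k, v in DEFAULTS.items()}
    resolved.update(block)

    if resolved["format"] not in VALID_FORMATS:
        raise ValueError(f"unknown format: {resolved['format']!r}")
    if resolved["mode"] not in VALID_MODES:
        raise ValueError(f"unknown mode: {resolved['mode']!r}")
    if resolved["mode"] == "end_of_conversation" and "endpoint" not in resolved:
        raise ValueError("endpoint is required for end_of_conversation")
    needs_map = resolved["mode"] == "per_turn" or resolved["format"] == "custom"
    if needs_map and "extraction_map" not in resolved:
        raise ValueError("extraction_map is required for per_turn or custom")
    return resolved
```

Because `format` defaults to `custom`, a block that omits `format` also needs an `extraction_map`.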
Meta-Variables
The `endpoint` field supports placeholder replacement using values from the `thread_init` response, plus a few values Humanbound supplies itself:
- `$session_id` -- replaced with the session/thread ID returned by your agent
- `$HUMANBOUND_EID` -- replaced with the Humanbound experiment ID
- `$TOTAL_TURNS` -- replaced with the number of conversation turns
- Any key returned by `thread_init` can be referenced as `$key_name`
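The sketch below shows one way the placeholder replacement could work. It assumes placeholders are `$` followed by an identifier, and that unknown placeholders are left untouched. Humanbound's actual matching and fallback behavior may differ. The `substitute` function and the sample values are hypothetical.

```python
from __future__ import annotations

import re

# Matches $name where name is an identifier, e.g. $session_id or $HUMANBOUND_EID.
_PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(template: str, variables: dict[str, object]) -> str:
    """Replace $name placeholders with values; leave unknown ones untouched."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return _PLACEHOLDER.sub(repl, template)


# Hypothetical values: keys from a thread_init response plus the
# Humanbound-supplied variables described above.
thread_init_response = {"session_id": "sess-123", "user": "tester"}
variables = {**thread_init_response, "HUMANBOUND_EID": "exp-42", "TOTAL_TURNS": 5}
url = substitute("https://host/api/sessions/$session_id?eid=$HUMANBOUND_EID", variables)
```

Leaving unresolved placeholders in place makes a misconfigured endpoint easy to spot, rather than silently producing a different URL.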
Modes
end_of_conversation (default)
After all turns in a conversation complete, Humanbound fetches telemetry from your observability platform's API. Best for platforms that expose trace data via REST (LangFuse, LangSmith, OpenAI Assistants).
Humanbound waits for the platform to ingest traces (with automatic retry), then fetches and parses the data.
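The wait-then-retry behavior can be sketched as follows. The delay values are illustrative. They were chosen to match the figures in Troubleshooting (about 10 seconds up front, up to 3 retries, about 25 seconds in total); the real schedule is not documented here. `fetch_with_retry` is a hypothetical name, and `fetch` stands in for whatever call retrieves the trace.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Sequence


def fetch_with_retry(
    fetch: Callable[[], Optional[dict]],
    initial_wait: float = 10.0,
    retry_delays: Sequence[float] = (3.0, 5.0, 7.0),  # illustrative, not official
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Wait for ingestion, then poll until `fetch` returns non-empty data."""
    sleep(initial_wait)            # give the platform time to ingest traces
    data = fetch()
    for delay in retry_delays:     # progressively longer pauses between attempts
        if data:
            return data
        sleep(delay)
        data = fetch()
    return data or None            # None signals "traces still empty"
```

Injecting `sleep` keeps the function testable: passing a list's `append` records the delays instead of actually waiting.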
per_turn
Extracts metadata from each chat response using extraction_map. No separate API call needed -- telemetry is pulled from the agent's response payload. Best for agents that return tool call metadata inline.
Supported Platforms
LangFuse
LangFuse is an open-source observability platform for LLM applications. Humanbound fetches session traces and parses tool executions, memory operations, and token usage from LangFuse observations.
Setup:
- Enable LangFuse tracing in your agent (e.g., `langfuse.langchain.CallbackHandler`)
- Create a Basic auth token from your LangFuse public and secret keys: `base64(public_key:secret_key)`
Session ID alignment is critical
Humanbound uses $session_id in the telemetry endpoint URL to fetch traces from LangFuse. This variable is replaced with the value of session_id returned by your agent's thread_init response.
Your agent must do two things:
- Return the session ID as `session_id` in the `thread_init` response. When Humanbound calls your agent's session creation endpoint, the response must include a field called `session_id`. This is the value that `$session_id` is replaced with. If your agent returns the session ID under a different name (e.g., `thread_id`, `conversation_id`), use that name instead: `$thread_id`, `$conversation_id`, etc.
- Register traces in LangFuse under the same session ID. Your agent must pass the exact same session ID to LangFuse when logging traces. For LangGraph/LangChain agents, this means using `propagate_attributes(session_id=session_id)` or passing `session_id` to the `CallbackHandler`. If LangFuse traces are registered under a different ID than the one your agent returns to Humanbound, the telemetry fetch will return empty data.
In short: the session ID your agent gives Humanbound and the session ID your agent gives LangFuse must be the same value.
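You can check one half of this requirement locally before a run: whether every placeholder in the telemetry endpoint is actually provided by your `thread_init` response. The helper below is a hypothetical sketch. It treats `$HUMANBOUND_EID` and `$TOTAL_TURNS` as Humanbound-supplied and assumes all other placeholders must be keys of the response. It cannot verify the other half, which is whether LangFuse received the same ID.

```python
from __future__ import annotations

import re

PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
# Variables Humanbound fills in itself (see Meta-Variables).
BUILTIN_VARIABLES = {"HUMANBOUND_EID", "TOTAL_TURNS"}


def unresolved_placeholders(endpoint: str, thread_init_response: dict) -> list[str]:
    """List placeholders in `endpoint` that thread_init does not return."""
    return [
        name
        for name in PLACEHOLDER.findall(endpoint)
        if name not in thread_init_response and name not in BUILTIN_VARIABLES
    ]
```

For example, an agent that returns `{"thread_id": "t1"}` while the endpoint uses `$session_id` yields `["session_id"]`. This is exactly the mismatch that produces empty telemetry.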
Configuration:
```json
{
  "telemetry": {
    "format": "langfuse",
    "mode": "end_of_conversation",
    "endpoint": "https://cloud.langfuse.com/api/public/sessions/$session_id",
    "headers": {
      "Authorization": "Basic <base64(public_key:secret_key)>"
    }
  }
}
```
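If your agent returns the session under a different field name, adjust the meta-variable accordingly. For example, if `thread_init` returns `thread_id`, set the endpoint to `https://cloud.langfuse.com/api/public/sessions/$thread_id`.
Generate the Basic auth token by base64-encoding the string `public_key:secret_key`, then use the result in the `Authorization: Basic ...` header.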
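Here is a minimal Python sketch for building the header value. `langfuse_basic_auth` is a hypothetical helper name. Python's `base64.b64encode` never inserts line breaks. GNU coreutils `base64`, by contrast, wraps output at 76 characters by default, which breaks a long token generated with a shell one-liner unless you pass `-w 0`.

```python
import base64


def langfuse_basic_auth(public_key: str, secret_key: str) -> str:
    """Build the Authorization header value for LangFuse's public API."""
    raw = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")  # single line, no wrapping
    return f"Basic {token}"


# Usage: plug the value into the telemetry headers (keys are placeholders).
headers = {"Authorization": langfuse_basic_auth("<public_key>", "<secret_key>")}
```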
What Humanbound extracts:
- Tool executions (from `TOOL` and `GENERATION` observations)
- Memory operations (from `SPAN` observations with memory-related names)
- Token usage and cost (from `GENERATION` observation usage data)
- External API calls (from `EVENT` observations)
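The mapping above can be approximated as a classifier over observation types. This is a simplified sketch, not Humanbound's parser. It assumes each observation is a dict with `type`, `name`, `input`, `output`, and a `usage` object that carries a `total` token count. It also uses a hypothetical name heuristic (`MEMORY_HINTS`) to identify memory-related spans. For brevity, it treats only `TOOL` observations as tool executions (Humanbound also reads them from `GENERATION` observations) and omits cost.

```python
from __future__ import annotations

# Hypothetical substrings that mark a SPAN as a memory operation.
MEMORY_HINTS = ("memory", "remember", "recall")


def classify_observations(observations: list[dict]) -> dict:
    """Bucket LangFuse-style observations into the standardized categories."""
    out = {"tool_executions": [], "memory_operations": [],
           "external_calls": [], "resource_usage": {"tokens_used": 0}}
    for obs in observations:
        kind = obs.get("type")
        name = (obs.get("name") or "").lower()
        if kind == "TOOL":
            out["tool_executions"].append({"tool_name": obs.get("name"),
                                           "parameters": obs.get("input"),
                                           "result": obs.get("output")})
        elif kind == "SPAN" and any(hint in name for hint in MEMORY_HINTS):
            out["memory_operations"].append({"operation_type": name,
                                             "content": obs.get("output")})
        elif kind == "GENERATION":
            usage = obs.get("usage") or {}
            out["resource_usage"]["tokens_used"] += usage.get("total") or 0
        elif kind == "EVENT":
            out["external_calls"].append({"name": obs.get("name"),
                                          "data": obs.get("output")})
    return out
```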
LangSmith [PREVIEW]
LangSmith is LangChain's tracing and evaluation platform.
Preview
This integration has not been validated end-to-end. Turn-level granularity may be limited. Please report issues.
Configuration:
```json
{
  "telemetry": {
    "format": "langsmith",
    "mode": "end_of_conversation",
    "endpoint": "https://api.smith.langchain.com/runs",
    "headers": {
      "x-api-key": "ls-your-api-key"
    }
  }
}
```
What Humanbound extracts:
- Tool executions (from run steps with tool call data)
- Token usage (from run metadata)
- Run hierarchy (parent/child relationships)
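You can reconstruct the run hierarchy by grouping runs on their parent ID. This sketch assumes each run dict carries `id`, `name`, `run_type`, and `parent_run_id`. `build_run_tree` is a hypothetical helper, and the nested output shape is illustrative only.

```python
from __future__ import annotations

from collections import defaultdict


def build_run_tree(runs: list[dict]) -> list[dict]:
    """Nest runs under their parents using `id` / `parent_run_id`."""
    children: dict = defaultdict(list)
    for run in runs:
        children[run.get("parent_run_id")].append(run)

    def attach(run: dict) -> dict:
        # Recursively attach child runs, preserving input order.
        return {"name": run["name"], "run_type": run.get("run_type"),
                "children": [attach(c) for c in children.get(run["id"], [])]}

    # Roots are runs whose parent is missing (None or outside this batch).
    ids = {run["id"] for run in runs}
    return [attach(run) for run in runs if run.get("parent_run_id") not in ids]
```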
OpenAI Assistants [PREVIEW]
For agents built on the OpenAI Assistants API. Fetches run steps including tool calls, code interpreter, and retrieval.
Preview
This integration has not been validated end-to-end. Please report issues.
Configuration:
```json
{
  "telemetry": {
    "format": "openai_assistants",
    "mode": "end_of_conversation",
    "endpoint": "https://api.openai.com/v1/threads/$thread_id/runs/$run_id/steps",
    "headers": {
      "Authorization": "Bearer sk-...",
      "OpenAI-Beta": "assistants=v2"
    }
  }
}
```
What Humanbound extracts:
- Tool calls (function calling, code interpreter, file search)
- Token usage per step
- Step execution order and timing
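The following sketch turns a run-steps list response into tool calls and token counts. It assumes the Assistants API list shape: a `data` array of steps, each with `created_at`, `usage.total_tokens`, and `step_details` of type `tool_calls`. The list may come back newest first, so the sketch sorts by `created_at` to recover execution order. `parse_run_steps` is hypothetical. Only function calls get their name, arguments, and output extracted; other tool types (code interpreter, file search) are recorded by type only.

```python
from __future__ import annotations


def parse_run_steps(response: dict) -> dict:
    """Extract tool calls and token usage from a run-steps list response."""
    steps = sorted(response.get("data", []), key=lambda s: s.get("created_at", 0))
    tool_calls, tokens = [], 0
    for order, step in enumerate(steps, start=1):
        tokens += (step.get("usage") or {}).get("total_tokens", 0)
        details = step.get("step_details") or {}
        if details.get("type") != "tool_calls":
            continue  # e.g. message_creation steps carry no tool data
        for call in details.get("tool_calls", []):
            entry = {"step": order, "type": call.get("type")}
            if call.get("type") == "function":
                fn = call.get("function") or {}
                entry.update(tool_name=fn.get("name"),
                             parameters=fn.get("arguments"),
                             result=fn.get("output"))
            tool_calls.append(entry)
    return {"tool_executions": tool_calls, "resource_usage": {"tokens_used": tokens}}
```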
Weights & Biases (W&B) [PREVIEW]
Weights & Biases provides tracing for LLM applications.
Preview
This integration has not been validated end-to-end. Please report issues.
Configuration:
```json
{
  "telemetry": {
    "format": "wandb",
    "mode": "end_of_conversation",
    "endpoint": "https://api.wandb.ai/runs/$session_id",
    "headers": {
      "Authorization": "Bearer wandb-api-key"
    }
  }
}
```
Helicone [PREVIEW]
Helicone is an observability layer for LLM APIs.
Preview
This integration has not been validated end-to-end. Please report issues.
Configuration:
```json
{
  "telemetry": {
    "format": "helicone",
    "mode": "end_of_conversation",
    "endpoint": "https://api.helicone.ai/v1/requests",
    "headers": {
      "Authorization": "Bearer helicone-api-key"
    }
  }
}
```
AgentOps [PREVIEW]
AgentOps provides session-level observability for AI agents.
Preview
This integration has not been validated end-to-end. Turn-level granularity may be limited. Please report issues.
Configuration:
```json
{
  "telemetry": {
    "format": "agentops",
    "mode": "end_of_conversation",
    "endpoint": "https://api.agentops.ai/sessions/$session_id",
    "headers": {
      "Authorization": "Bearer agentops-api-key"
    }
  }
}
```
Custom Format
For observability platforms not listed above, or for agents that return telemetry data in a custom format. Uses extraction_map with JSONPath-like paths to locate telemetry fields.
End-of-Conversation Custom
Fetch from a custom API and extract fields using JSONPath:
```json
{
  "telemetry": {
    "format": "custom",
    "mode": "end_of_conversation",
    "endpoint": "https://your-platform/api/traces/$session_id",
    "headers": { "Authorization": "Bearer ..." },
    "extraction_map": {
      "tool_executions": "$.steps[*]",
      "tool_executions.tool_name": "name",
      "tool_executions.parameters": "input",
      "tool_executions.result": "output",
      "memory_operations": "$.memory_events[*]",
      "memory_operations.operation_type": "type",
      "memory_operations.content": "data"
    }
  }
}
```
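To make the semantics of this map concrete, here is a sketch of an evaluator for a tiny JSONPath subset (`$.key.key[*]`, with no filters or indices). The function names are hypothetical, and the interpretation is an assumption. Keys without a dot select an array, and `category.field` entries map output fields to source fields within each selected item. Scalar paths such as `resource_usage.tokens_used` are not handled here.

```python
from __future__ import annotations

from typing import Any


def select(doc: Any, path: str) -> list[Any]:
    """Resolve `$.a.b[*]`-style paths; `[*]` flattens a list into items."""
    body = path[2:] if path.startswith("$.") else path
    nodes = [doc]
    for part in body.split("."):
        wildcard = part.endswith("[*]")
        key = part[:-3] if wildcard else part
        nxt = []
        for node in nodes:
            value = node.get(key) if isinstance(node, dict) else None
            if value is None:
                continue  # missing keys simply yield nothing
            if wildcard and isinstance(value, list):
                nxt.extend(value)
            else:
                nxt.append(value)
        nodes = nxt
    return nodes


def extract(trace: dict, extraction_map: dict[str, str]) -> dict[str, list[dict]]:
    """Apply an end_of_conversation extraction_map to a fetched trace."""
    result = {}
    for category, array_path in extraction_map.items():
        if "." in category:
            continue  # field mappings are handled with their parent category
        prefix = category + "."
        fields = {k[len(prefix):]: v for k, v in extraction_map.items() if k.startswith(prefix)}
        result[category] = [
            {out_name: item.get(src) for out_name, src in fields.items()}
            for item in select(trace, array_path)
            if isinstance(item, dict)
        ]
    return result
```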
Per-Turn Custom
Extract telemetry from each agent response (no separate API call):
```json
{
  "telemetry": {
    "format": "custom",
    "mode": "per_turn",
    "extraction_map": {
      "metadata_path": "data.response.metadata",
      "tool_executions": "tool_calls",
      "tool_executions.tool_name": "name",
      "tool_executions.parameters": "arguments",
      "tool_executions.result": "output",
      "resource_usage.tokens_used": "usage.total_tokens"
    }
  }
}
```
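Here is a sketch of how the per-turn map might be applied to one response. It assumes `metadata_path` is resolved from the response root and that the remaining paths are resolved relative to that metadata object. `get_path`, `extract_turn`, and the sample response are hypothetical.

```python
from __future__ import annotations

from typing import Any


def get_path(obj: Any, dotted: str) -> Any:
    """Follow a dot-notation path like 'data.response.metadata'; None if missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_turn(response: dict, emap: dict[str, str], turn: int) -> dict:
    """Pull one turn's telemetry out of a chat response using extraction_map."""
    metadata = get_path(response, emap["metadata_path"]) or {}
    calls = get_path(metadata, emap["tool_executions"]) or []
    tools = [
        {"turn": turn,
         "tool_name": c.get(emap["tool_executions.tool_name"]),
         "parameters": c.get(emap["tool_executions.parameters"]),
         "result": c.get(emap["tool_executions.result"])}
        for c in calls if isinstance(c, dict)
    ]
    tokens = get_path(metadata, emap["resource_usage.tokens_used"]) or 0
    return {"tool_executions": tools, "resource_usage": {"tokens_used": tokens}}
```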
Extraction Map Reference
| Path | Description |
|---|---|
| `metadata_path` | (`per_turn` only) Dot-notation path to the metadata object in each response |
| `tool_executions` | JSONPath to the array of tool execution objects |
| `tool_executions.tool_name` | Field name within each tool execution for the tool name |
| `tool_executions.parameters` | Field name for tool input parameters |
| `tool_executions.result` | Field name for tool output/result |
| `tool_executions.turn` | Field name for turn number (optional) |
| `memory_operations` | JSONPath to memory operation objects |
| `memory_operations.operation_type` | Field name for operation type (e.g., read/write/delete) |
| `memory_operations.content` | Field name for operation content |
| `resource_usage.tokens_used` | JSONPath to total token count |
Standardized Output
Regardless of the source platform, Humanbound normalizes all telemetry into a standard schema before passing it to the judge:
```json
{
  "tool_executions": [
    { "turn": 1, "tool_name": "get_balance", "parameters": {"account_id": "ACC-001"}, "result": "{...}" }
  ],
  "memory_operations": [
    { "turn": 2, "operation_type": "store", "content": "user preference saved" }
  ],
  "external_calls": [
    { "turn": 1, "url": "https://api.bank.com/accounts", "method": "GET", "status": "200" }
  ],
  "resource_usage": {
    "tokens_used": 3500,
    "api_calls_count": 2,
    "total_cost_usd": 0.015
  },
  "authorization_events": [],
  "agent_delegation": []
}
```
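If you write code that consumes this schema, a typed mirror can help. The dataclasses below are an illustrative sketch of the JSON shape above, not a published Humanbound SDK. `merge_turns` is a hypothetical helper that shows how per-turn records could roll up into conversation totals.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ToolExecution:
    turn: int
    tool_name: str
    parameters: dict[str, Any]
    result: str


@dataclass
class ResourceUsage:
    tokens_used: int = 0
    api_calls_count: int = 0
    total_cost_usd: float = 0.0


@dataclass
class Telemetry:
    tool_executions: list[ToolExecution] = field(default_factory=list)
    memory_operations: list[dict[str, Any]] = field(default_factory=list)
    external_calls: list[dict[str, Any]] = field(default_factory=list)
    resource_usage: ResourceUsage = field(default_factory=ResourceUsage)
    authorization_events: list[dict[str, Any]] = field(default_factory=list)
    agent_delegation: list[dict[str, Any]] = field(default_factory=list)


def merge_turns(turns: list[Telemetry]) -> Telemetry:
    """Combine per-turn records into one conversation-level record."""
    total = Telemetry()
    for t in turns:
        total.tool_executions.extend(t.tool_executions)
        total.memory_operations.extend(t.memory_operations)
        total.external_calls.extend(t.external_calls)
        total.authorization_events.extend(t.authorization_events)
        total.agent_delegation.extend(t.agent_delegation)
        # Resource counters are summed across turns.
        total.resource_usage.tokens_used += t.resource_usage.tokens_used
        total.resource_usage.api_calls_count += t.resource_usage.api_calls_count
        total.resource_usage.total_cost_usd += t.resource_usage.total_cost_usd
    return total
```

`dataclasses.asdict(merge_turns(...))` produces a plain dict with the same keys as the JSON example.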
Troubleshooting
Telemetry returns empty data:
Observability platforms need time to ingest traces. Humanbound waits ~10 seconds after each conversation, then retries up to 3 times with progressive delays (total max ~25 seconds). If your platform has higher latency, traces may still be empty.
Session not found:
Ensure your agent passes the same session ID to both Humanbound (via thread_init response) and your observability platform. For LangFuse with LangGraph, use propagate_attributes(session_id=session_id).
Wrong credentials:
For LangFuse, the Basic auth header uses public_key:secret_key base64-encoded. Ensure the keys belong to the same LangFuse project your agent logs to.