In modern agent development, standard system metrics like latency, token count, and cost are insufficient for understanding complex agent behavior. Inspecting individual traces provides deep insight, but manual review cannot scale to the millions of traces generated in a live environment. Signals bridge this gap: they provide high-level, automated behavioral scoring for agents in production.

Signals rely on a robust backend infrastructure to provide real-time performance insights:
  • Automated scoring: Every incoming production trace is automatically processed and scored based on predefined metrics.
  • Infrastructure: Processing is powered by CoreWeave compute and GPUs to ensure scalability across millions of traces.
  • Custom metrics: Developers can create specific metrics, such as response length or faithfulness to source material, to help understand exactly how an agent is behaving.
By using signals within production, you can:
  • Gain behavioral insight: Move beyond simple system metrics to understand if your agent is hallucinating, failing to follow conversation patterns, or losing grounding in its evidence.
  • Automate alerts: Set up automated triggers that notify your team through tools like Slack when an agent’s performance drops below a certain threshold.
  • Accelerate the research loop: Use the scores and failure analyses generated by signals to identify specific weaknesses, which can then be used to kick off the research loop for offline model improvement, data annotation, or reinforcement learning.
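To make the alerting idea above concrete, the following is a minimal sketch of threshold-based alert logic, assuming a rolling average over per-trace scores. The `SignalAlert` class, window size, and the decision of when to fire are illustrative assumptions, not part of the Signals product; in practice you would wire the `True` result to a notifier such as a Slack webhook.

```python
# Hypothetical sketch: fire an alert when the rolling mean of a signal
# score drops below a threshold. Not a Weave API; wiring to Slack or
# another notifier is left out.
from collections import deque


class SignalAlert:
    """Track recent scores and flag when their rolling mean falls below a threshold."""

    def __init__(self, threshold: float, window: int = 20):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def observe(self, score: float) -> bool:
        """Record a new score; return True if an alert should fire."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge a trend
        return sum(self.scores) / len(self.scores) < self.threshold


alert = SignalAlert(threshold=0.8, window=3)
fired = [alert.observe(s) for s in [0.9, 0.9, 0.9, 0.5, 0.5]]
# The final observations pull the rolling mean below 0.8, so the last
# two entries of `fired` are True.
```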

Use built-in signals

Signals are preset monitors that automatically evaluate production traces for common quality issues and errors. Each signal uses a benchmarked LLM prompt to classify traces and saves the results as comma-delimited tags representing the detected issues. To start classifying traces immediately, enable signals from the Monitors page. Signals require no prompt engineering or scorer configuration, and because they use a W&B Inference model to score traces, no external API keys are needed.

Available signals

W&B Weave provides 13 preset signals organized into two groups.

Quality signals

Quality signals evaluate successful root-level traces for output quality and safety issues.
  • Hallucination: Fabricated facts or claims that contradict the provided input context
  • Low quality: Responses with poor format, insufficient effort, or incomplete content
  • User frustration: Signs of user frustration such as repeated questions, negative sentiment, or complaints
  • Jailbreaking: Prompt injection and jailbreak attempts that try to bypass safety guidelines
  • NSFW: Explicit, violent, or otherwise inappropriate content in inputs or outputs
  • Lazy: Low-effort responses such as excessive brevity, refusals to help, or deferred work
  • Forgetful: Failure to use context from earlier in the conversation, ignoring previously stated facts or instructions

Error signals

Error signals categorize failed traces by root cause to help you identify and resolve infrastructure and application issues.
  • Network Error: DNS failures, timeouts, connection resets, and other connectivity issues
  • Ratelimited: HTTP 429 responses, quota exhaustion, and throttling from upstream APIs
  • Request Too Large: Requests exceeding size or token limits, such as context window exceeded
  • Bad Request: Client-side errors where the server rejected the request (4xx except 429)
  • Bad Response: Invalid, unexpected, or unusable responses from remote services (5xx)
  • Bug: Flaws in application code such as KeyError, TypeError, or logic errors

Enable signals from the Monitors page

To enable signals:
  1. Navigate to wandb.ai and then open your Weave project.
  2. In the Weave project sidebar, select Monitors.
  3. At the top of the Monitors page, a row of suggested signal cards appears. Each card shows the signal name, a description, and a + Add signal button.
  4. To enable a single signal, select the + Add signal button on the signal card. The signal begins scoring new traces immediately.
  5. To enable multiple signals at once, select the + [X] more signals button. This opens a drawer that lists all available signals grouped by category. Select the signals you want to turn on, then select Add signals.
After enabling signals, Weave scores incoming traces and stores the results as feedback on each Call object. To view signal results, open the Traces tab, select a trace, and review the feedback panel. For each signal group, open the classifiers Call that the signal generates, then under Output review classifier_meta for the reasoning. For example, the following screenshot shows high confidence (0.95) that Hallucination didn't occur, along with the reason for that rating.

Screenshot: Weave Traces view with a quality-classifier trace selected. The details panel shows Inputs, Outputs, and Hallucination classifier metadata, including a confidence score and reasoning that explains the classification.
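Because results land as comma-delimited tags in each Call's feedback, a downstream script can filter traces by detected issue. The sketch below assumes a simplified record shape (`{"call_id": ..., "tags": ...}`) purely for illustration; in practice the records would come from the Weave client's calls and feedback APIs rather than hand-built dicts.

```python
# Hypothetical sketch: filter traces whose signal feedback includes a tag.
# The record shape is an assumption for illustration, not the real
# Call feedback object returned by the Weave client.

def traces_with_tag(feedback_records, tag):
    """Return call IDs whose comma-delimited signal tags include `tag`."""
    matches = []
    for record in feedback_records:
        tags = [t.strip() for t in record["tags"].split(",") if t.strip()]
        if tag in tags:
            matches.append(record["call_id"])
    return matches


records = [
    {"call_id": "c1", "tags": "Low-quality, Forgetful"},
    {"call_id": "c2", "tags": ""},
    {"call_id": "c3", "tags": "Hallucination"},
]
print(traces_with_tag(records, "Hallucination"))  # → ['c3']
```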

Manage active signals

To view or remove active signals:
  1. From the Monitors page, select the Manage signals button (gear icon). This opens a drawer listing all currently active signals grouped by category.
  2. Hover over a signal and select the Remove signal button (trash icon) to disable the signal.
Removing a signal stops scoring new traces. Existing scores from the signal are preserved.

How signals work

Each signal uses an LLM-as-a-judge approach to classify traces:
  1. Trace selection: Quality signals evaluate successful root-level traces. Error signals evaluate failed traces. Child spans and intermediate calls are not scored.
  2. Prompt construction: Weave constructs a prompt that includes the trace metadata, inputs, outputs, exception details (if any), and the operation’s source code. The signal’s classifier prompt is appended with instructions for the specific issue to detect.
  3. LLM scoring: A W&B Inference model evaluates the trace and returns the names of detected issues as comma-delimited string tags (for example, "Low-quality, User-frustration, Forgetful").
  4. Result storage: Results are stored as feedback on the Call object and are queryable from the Traces tab.
When multiple signals from the same group (Quality or Error) are active, Weave batches the signals into a single LLM call for efficiency. The model evaluates all active classifiers in one pass and returns results for each. For specific monitoring beyond what is provided by the built-in signals, see Set up custom monitors.
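The comma-delimited tag string from step 3 can be turned into a per-signal verdict for each batched classifier. This is a minimal sketch under stated assumptions: the `parse_signal_tags` helper and the `ACTIVE_SIGNALS` set are illustrative names, not Weave internals.

```python
# Hypothetical sketch: map the comma-delimited tag string returned by a
# batched classifier call to a True/False verdict per active signal.
# ACTIVE_SIGNALS is an illustrative assumption, not a Weave constant.

ACTIVE_SIGNALS = {"Hallucination", "Low-quality", "User-frustration", "Forgetful"}


def parse_signal_tags(raw: str) -> dict:
    """Return {signal: detected?} for every active signal."""
    detected = {t.strip() for t in raw.split(",") if t.strip()}
    return {signal: signal in detected for signal in sorted(ACTIVE_SIGNALS)}


result = parse_signal_tags("Low-quality, User-frustration, Forgetful")
# The three returned tags map to True; Hallucination maps to False.
```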