
What Is a Hook in AI? Lifecycle, PyTorch, and Webhook Patterns

Last updated: May 27, 2026
Published May 25, 2026 · Updated May 27, 2026 · 36 min read

The term “hook” in the context of artificial intelligence will elicit different responses depending on the audience. The agent-framework engineer typically refers to a shell command that fires before Claude Code runs a tool. The deep-learning researcher has in mind a Python callback registered on a neural network layer to capture activations. The MLOps engineer envisions an HTTP POST that lands in Slack the moment a training run finishes. The same term covers three distinct mechanisms, three distinct audiences, and three distinct sets of debugging considerations.

This overloading is not accidental: all three variants share the same underlying idea, namely a callback that fires at a defined point in another system’s execution. Treating them as interchangeable, however, is a frequent source of confusion. Advice to “use a hook” carries little practical value without specifying which variant is intended. The present guide therefore draws the boundaries explicitly and then accompanies each variant with working code.

Summary

What this post covers: The word “hook” in AI refers to at least three distinct mechanisms — agent lifecycle hooks (Claude Code and similar frameworks), model introspection hooks (PyTorch forward and backward callbacks), and MLOps event hooks (webhooks fired by training jobs and model registries). This post defines each, shows working code, and gives you a decision framework for picking the right one.

Key insights:

  • Claude Code exposes 12 lifecycle events and a small number of handler types, with exit code 2 reserved as the “block this action” signal that feeds stderr back to Claude as an error message.
  • PyTorch hooks come in three core flavors — register_forward_pre_hook, register_forward_hook, and register_full_backward_hook — each with a fixed signature and a RemovableHandle you must call .remove() on to avoid leaks.
  • MLOps webhooks are just HTTP POSTs with HMAC signatures, but they amplify failures: a slow receiver can block a model registry, and a missing signature check turns your training pipeline into an open RCE surface.
  • The three flavors are not interchangeable — picking the wrong one (a PyTorch hook to enforce safety, a webhook for activation extraction) leads to brittle systems that fight their own runtime.
  • Hooks are powerful precisely because they don’t require modifying the host system, but the same property makes them invisible — discoverability and audit logging matter as much as the hook code itself.

Main topics: Three different things people mean by “hook” in AI; Lifecycle hooks: the agent-lifecycle flavor; A working Claude Code hooks example; Model introspection hooks: the PyTorch variant; A working PyTorch hooks example; Event hooks: the MLOps webhook variant; When to use which kind of hook; Common pitfalls.

Three different things people mean by “hook” in AI

Vocabulary first, then code. The three variants of “hook” in AI share the same skeletal definition—a user-supplied callback that fires at a defined point in another system’s execution—but they differ in every operationally important respect: where the callback runs, which process owns it, whether it can block the host, and what data it observes.
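The shared skeleton can be sketched in a few lines of Python. The `HookRegistry` class and the `before_save` event name are hypothetical, invented purely for illustration; none of the three systems discussed here exposes this exact API.

```python
from collections import defaultdict
from typing import Any, Callable

Hook = Callable[[dict[str, Any]], None]


class HookRegistry:
    """Hypothetical host system that lets callers attach callbacks to named events."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def register(self, event: str, hook: Hook) -> Callable[[], None]:
        """Attach a hook and return a function that detaches it again."""
        self._hooks[event].append(hook)
        return lambda: self._hooks[event].remove(hook)

    def fire(self, event: str, payload: dict[str, Any]) -> None:
        """Called by the host at the defined point in its own execution."""
        for hook in list(self._hooks[event]):
            hook(payload)


registry = HookRegistry()
seen: list[str] = []
remove = registry.register("before_save", lambda p: seen.append(p["path"]))

registry.fire("before_save", {"path": "model.pt"})  # hook runs
remove()
registry.fire("before_save", {"path": "other.pt"})  # hook no longer runs
print(seen)  # ['model.pt']
```

Everything that distinguishes the three variants (process boundary, blocking ability, authentication) is layered on top of this register-then-fire pattern.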

A lifecycle hook fires at a specific moment in an agent’s session loop. The canonical example is Claude Code’s PreToolUse event, which fires after the model has decided to invoke a tool but before the tool actually executes. The hook is a separate process—a shell command, an HTTP endpoint, or an MCP server—that the agent invokes with structured JSON describing the intended action. The hook may approve, modify, or block the action through its exit code or response. Lifecycle hooks exist because agent runtimes require extensibility points that do not necessitate forking the agent itself.

A model introspection hook is an in-process Python callback registered on a neural network module. PyTorch’s register_forward_hook is the canonical case: a function is supplied, and PyTorch calls that function every time the module’s forward() runs, passing the module, its input, and its output. The hook lives in the same process as the model, runs synchronously within the autograd graph (the system that tracks tensor operations for gradient computation), and may read or even modify tensors on the fly. Such hooks exist because researchers need to inspect a model without rewriting its source code.

An event hook, usually called a webhook in MLOps contexts, is an HTTP POST issued by one service to another when a defined event occurs—a training run completes, a model is promoted to production, or a drift detector exceeds a threshold. The hook receiver lives in an entirely different process (often on a different host or behind a load balancer), authenticates via a shared secret with HMAC (a cryptographic signature method that proves the message was not tampered with), and runs asynchronously with respect to the event source. Webhooks exist because MLOps stacks are heterogeneous and require a low-friction mechanism for distributing events across systems.

Three observations render this taxonomy useful rather than pedantic. First, the audiences scarcely overlap: the researcher confronting a vanishing gradient and the platform engineer integrating a model registry both rely on “hooks,” but their tooling, vocabulary, and failure modes have little in common. Second, the level of trust required differs sharply: a PyTorch hook runs inside the process and is implicitly trusted; a Claude Code hook executes shell commands and is trusted but auditable; a webhook crosses a network boundary and must therefore authenticate. Third, the cost of misclassification scales accordingly: an errant PyTorch hook leaks memory, an errant Claude Code hook may erase a file, and an errant webhook handler may broadcast secrets. Selecting the right variant is not merely a stylistic choice; it defines the security boundary of the entire feature.

The figure below summarises the taxonomy:

Figure: Three meanings of “hook” in AI. Same skeletal idea (a callback at a defined point), three operational realities.

  • Lifecycle hook (Claude Code, agent frameworks). Fires at: agent session events. Runs in: a separate process (shell or HTTP). Can block: yes (exit code 2). Typical user: agent builders, safety teams. Example: block rm -rf, auto-format after Edit.
  • Introspection hook (PyTorch, TensorFlow, JAX). Fires at: forward or backward pass. Runs in: the same process, synchronously, inside the graph. Can block: no, but can modify tensors. Typical user: researchers, model debuggers. Example: capture activations, log gradient norms.
  • Event hook, or webhook (MLflow, W&B, model registry). Fires at: infrastructure or business events. Runs in: a remote service, asynchronously, over HTTP. Can block: indirectly (timeouts, retries). Typical user: MLOps and platform teams. Example: Slack alert on training failure.

All three are “callbacks at a defined point”, but they share nothing else. Pick by problem type, not by name.

Key Takeaway: Readers interested in only one variant may proceed directly to the relevant section. Agent builders should consult the sections on lifecycle hooks and the Claude Code example. Deep-learning practitioners should refer to the PyTorch sections. MLOps engineers should focus on the webhook section. The decision-framework section at the end is intended for all readers.

Lifecycle hooks: the agent-lifecycle flavor

Lifecycle hooks are the most recent of the three variants to enter the AI lexicon, largely because agent frameworks themselves are recent. The mechanism is straightforward: an agent runtime defines a small set of events that mark notable moments in its operation, and handlers are registered to fire when those events occur.

Claude Code, the CLI agent developed by Anthropic, exposes twelve such events in its current hooks system (per the official documentation at code.claude.com/docs/en/hooks, as of 2026-05-25). The events span the full session arc, from SessionStart when the agent boots, through UserPromptSubmit when the user submits input, to PreToolUse and PostToolUse that wrap every tool call, and finally to Stop and SessionEnd. Each event passes structured JSON to the handler describing the current operation, and the handler may respond with text (returned to Claude as additional context), a block decision, or simply an exit code.

The significance of this mechanism is as follows: without hooks, customising an agent’s behaviour requires either writing a custom tool (a heavy approach) or relying on a CLAUDE.md instruction (an unreliable one). Hooks provide a third option—deterministic, code-enforced policy that fires regardless of the model’s decisions. If a hook returns exit code 2 on a PreToolUse for any Bash call matching /rm -rf \//, the tool will not run. The model is not merely asked not to run it; the tool will not run. This distinction constitutes the entire value proposition.

Figure: Claude Code session lifecycle and hook insertion points. Each event lets you register a handler that fires at that exact moment.

Sequence: SessionStart (agent boots) → UserPromptSubmit (you hit enter) → PreToolUse (can block) → tool runs → PostToolUse (log, format, scan) → Notification (tool permissions, etc.) → Stop (can block) → SessionEnd (cleanup).

Other events in the 12:

  • SubagentStop: fires when a spawned sub-agent finishes.
  • PreCompact: fires before context is compacted (your chance to save state).
  • PreRespond: fires before Claude streams its reply (modify or annotate output).
  • Additional events for slash commands, file edits, and session restoration.

Check the official docs for the current authoritative list.

The twelve events may be categorised by responsibility as follows:

| Event | When it fires | Can block? | Typical use case |
| --- | --- | --- | --- |
| SessionStart | Agent boots up | No | Inject project context, set env vars |
| UserPromptSubmit | After you hit enter | Yes | Validate prompt, expand templates |
| PreToolUse | Before any tool runs | Yes | Safety check, dry-run preview |
| PostToolUse | After tool returns | No | Auto-format, log, scan output |
| Notification | Permission prompts, etc. | No | Forward to phone, log audit trail |
| Stop | Claude finishes its turn | Yes | Force continuation, run tests |
| SubagentStop | A sub-agent finishes | Yes | Collect sub-agent artifacts |
| SessionEnd | Session terminates | No | Final cleanup, session summary |
| PreCompact | Before context compaction | No | Persist scratchpad to disk |
| PreRespond | Before reply streams | Yes | Redact, annotate, classify |
| Edit/file events | On file modifications | No | Format, lint, version control |
| Slash command events | On /command invocation | Varies | Custom command preprocessing |

The names matter because matching is partially name-based. A hook configuration in .claude/settings.json specifies an event name and an optional matcher (a regular expression tested against the tool name for tool-related events), followed by a list of handlers. The handler contains the code that executes.

Handler Types and Where They Run

Claude Code’s hooks system currently supports four handler types per the official documentation (as of 2026-05-25; readers should consult the latest reference for the authoritative list, as this area continues to evolve). The three most commonly encountered are described below:

Figure: Claude Code hook handler types. Input flow (event JSON) → handler → output flow (exit code plus stdout/stderr, or an HTTP response).

  • Command (shell command on local disk). Input: JSON on stdin. Output: stdout/stderr plus exit code. Exit semantics: 0 = ok; non-zero other than 2 = warn; 2 = block (stderr goes to Claude). Pros: simple, no server needed. Cons: shell-injection risk if written naively; cold-start cost per call.
  • HTTP (POST to a web endpoint). Input: JSON in the request body. Output: JSON in the HTTP response. Response semantics: 200 with {action: "allow"}; 200 with {action: "block"}; 5xx or timeout = error. Pros: central policy, multi-user. Cons: network latency in the hot path; availability dependency.
  • MCP (Model Context Protocol server). Input: MCP request message. Output: MCP response message. Semantics: structured tool-like reply; streaming supported; capability-negotiated. Pros: reuses MCP tooling and infrastructure. Cons: more setup than Command; harder to debug ad hoc.

The handler type should be chosen on the basis of the desired operational profile rather than syntactic preference:

| Handler type | Best for | Security posture | When to pick |
| --- | --- | --- | --- |
| Command | Local, per-developer policies | Runs as the local user; care required with untrusted arguments | Default for solo or single-machine use |
| HTTP | Team-wide central policy | Use TLS and auth header; isolate the receiver | When a single policy must be enforced across many developers |
| MCP | Integration with existing MCP servers | Inherits the MCP server security model | When MCP infrastructure is already in operation and consistency is required |

Readers new to MCP may find the Model Context Protocol primer a useful companion. Hooks and MCP servers represent two of the principal extensibility surfaces in modern agent runtimes, and they frequently operate in concert.

Exit Code Semantics for Command Handlers

The Command handler’s contract is small but precise. According to the official hooks documentation (as of 2026-05-25):

  • Exit 0: success. Stdout is captured but treated as informational, and Claude proceeds normally.
  • Exit 2: blocking error. Stderr is returned to Claude as an error message. For PreToolUse this blocks the tool call entirely; for Stop it forces continuation. This is the appropriate code for deterministic prevention.
  • Other non-zero: warning. The event is logged but not blocked, which is useful for soft policy (“not recommended, but permitted”).

Figure: PreToolUse hook flow, showing how Claude Code decides whether a hook lets a tool run.

  • Claude decides to call a tool (e.g. Bash: “rm -rf /tmp/x”).
  • The matcher is checked: PreToolUse with matcher=“Bash” selects the handler.
  • The handler is invoked with the JSON event payload on stdin, reads it, and decides.
  • exit 0: the tool runs as planned; stdout goes to context as an optional addendum.
  • exit 1 (or any other non-zero code except 2): the tool runs anyway; a warning is logged and stderr is captured.
  • exit 2: the tool is blocked; stderr goes to Claude as an error message.

Caution: Exit 1 should not be conflated with exit 2. Many shell scripts exit with code 1 on any error condition. When the intent is to block, the script must specifically exit with code 2. A hook that uses set -e and then crashes will exit non-zero but probably not with code 2, so the tool will run anyway and only a warning will be logged. Blocking paths should be tested explicitly.
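One way to avoid the accidental non-2 exit is to make the exit code an explicit return value. The sketch below is a Python command handler under stated assumptions: the payload carries the command at `tool_input.command` (as in the Bash script later in this post), the `BLOCKED` patterns are illustrative only, and the choice to fail closed on unexpected errors is a design decision rather than anything the hooks system requires.

```python
import io
import json
import re
import sys

# Illustrative deny-list (an assumption); a real policy would be broader.
BLOCKED = [re.compile(r"\brm\s+-rf?\s+/(\s|$)"), re.compile(r"\bmkfs\.")]


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one event payload."""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED:
        if pattern.search(command):
            return 2, f"Blocked: command matches {pattern.pattern!r}"
    return 0, ""


def main(stream=sys.stdin) -> int:
    """Read one JSON payload and return the exit code the hook should use."""
    try:
        code, message = decide(json.load(stream))
    except Exception as exc:
        # Fail closed: an unexpected crash becomes exit 2, never a stray exit 1.
        code, message = 2, f"Blocked: hook failed ({exc})"
    if message:
        print(message, file=sys.stderr)
    return code


# Demonstration with in-memory payloads instead of real stdin.
print(main(io.StringIO('{"tool_input": {"command": "ls -la"}}')))    # 0
print(main(io.StringIO('{"tool_input": {"command": "rm -rf /"}}')))  # 2
print(main(io.StringIO("not json")))                                 # 2
```

In an actual hook script the last lines would be replaced by `sys.exit(main())`. Failing closed means a broken hook blocks every matching tool call, which is noisy but safe; a team that prefers availability could return 1 from the `except` branch instead.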

A Working Claude Code Hooks Example

Concrete code follows. The .claude/settings.json file below configures three hooks: a PreToolUse safety check, a PostToolUse auto-formatter, and a SessionStart context injector.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "handlers": [
          {
            "type": "command",
            "command": ".claude/hooks/safety-check.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "handlers": [
          {
            "type": "command",
            "command": ".claude/hooks/auto-format.sh"
          },
          {
            "type": "http",
            "url": "https://hooks.internal.example.com/claude-edit",
            "headers": {
              "Authorization": "Bearer ${CLAUDE_HOOK_TOKEN}"
            }
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "handlers": [
          {
            "type": "command",
            "command": ".claude/hooks/inject-context.sh"
          }
        ]
      }
    ]
  }
}

Note the two-handler array on PostToolUse: hooks compose. Both execute, and their outputs are aggregated. The matcher is a regular expression matched against the tool name; Edit|Write means the hook fires for either the Edit tool or the Write tool.
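The selection logic can be pictured with a short Python sketch. It is a mental model only, not Claude Code's implementation: the function name `handlers_for` is hypothetical, and two behaviours are assumed rather than confirmed here, namely that a missing matcher matches every tool and that the regular expression must match the whole tool name.

```python
import re


def handlers_for(event_config: list[dict], tool_name: str) -> list[dict]:
    """Return the handler entries whose matcher accepts tool_name.

    Assumptions (check the official docs): no matcher means "match all",
    and the regex is applied to the full tool name.
    """
    selected: list[dict] = []
    for group in event_config:
        matcher = group.get("matcher")
        if matcher is None or re.fullmatch(matcher, tool_name):
            selected.extend(group["handlers"])
    return selected


# The PostToolUse entry from the settings.json above, as a Python literal.
post_tool_use = [
    {
        "matcher": "Edit|Write",
        "handlers": [
            {"type": "command", "command": ".claude/hooks/auto-format.sh"},
            {"type": "http", "url": "https://hooks.internal.example.com/claude-edit"},
        ],
    }
]

print(len(handlers_for(post_tool_use, "Write")))  # 2
print(len(handlers_for(post_tool_use, "Bash")))   # 0
```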

PreToolUse Safety Hook in Bash

The shell script below blocks dangerous rm patterns and writes an audit log of every Bash invocation. It reads the event JSON from stdin (using jq for parsing) and exits with code 2 and an explanatory stderr message when a risky pattern is observed.

#!/usr/bin/env bash
# .claude/hooks/safety-check.sh
# Blocks dangerous rm patterns; audits all Bash invocations.
set -uo pipefail

PAYLOAD=$(cat)
CMD=$(echo "$PAYLOAD" | jq -r '.tool_input.command // empty')

# Audit log first — we want every attempt recorded.
mkdir -p .claude/audit
echo "$(date -u +%FT%TZ)  $CMD" >> .claude/audit/bash.log

# Block obvious destructive patterns.
DANGEROUS_PATTERNS=(
  'rm[[:space:]]+-rf?[[:space:]]+/($|[[:space:]])'
  'rm[[:space:]]+-rf?[[:space:]]+/\*'
  'rm[[:space:]]+-rf?[[:space:]]+~'
  ':\(\)\{[[:space:]]*:\|:&[[:space:]]*\};:'  # fork bomb
  'mkfs\.'
  'dd[[:space:]]+if=/dev/(zero|random|urandom)[[:space:]]+of=/dev/sd'
)

for pat in "${DANGEROUS_PATTERNS[@]}"; do
  if [[ "$CMD" =~ $pat ]]; then
    echo "Blocked: command matches dangerous pattern '$pat'" >&2
    echo "If you really need to run this, do it manually outside Claude." >&2
    exit 2
  fi
done

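# Heuristic: block redirects into /etc or /usr, and rm/mv/cp/tee touching them.
# Keeping the regex in a variable avoids quoting surprises with '>' and '|'.
SYSTEM_WRITE_RE='((^|[[:space:]])(rm|mv|cp|tee)[[:space:]].*|>>?[[:space:]]*)(/etc/|/usr/)'
if [[ "$CMD" =~ $SYSTEM_WRITE_RE ]]; then
  echo "Blocked: write to system path detected." >&2
  exit 2
fi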

exit 0

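The pattern list is intentionally short, because long pattern lists provide a false sense of security. The real defence is the audit log: even when a command is not blocked, a timestamped record of Claude’s attempted actions remains available. A plain local file is not tamper-evident on its own, so the log should be shipped to append-only storage if that property matters.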
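Because only exit code 2 blocks, the blocking path deserves an explicit test. The harness below is a sketch: `run_hook` is a hypothetical helper, and a small stand-in hook is used so the example runs anywhere. Replacing `STUB` with `["bash", ".claude/hooks/safety-check.sh"]` would exercise the real script, assuming bash and jq are installed.

```python
import json
import subprocess
import sys


def run_hook(argv: list[str], command: str) -> subprocess.CompletedProcess:
    """Pipe a PreToolUse-style payload to a hook and capture its result."""
    payload = json.dumps({"tool_input": {"command": command}})
    return subprocess.run(
        argv, input=payload, capture_output=True, text=True, timeout=10
    )


# Stand-in hook (an assumption for portability), written as inline Python.
STUB = [
    sys.executable,
    "-c",
    "import json, sys\n"
    "cmd = json.load(sys.stdin)['tool_input']['command']\n"
    "if cmd.startswith('rm -rf /'):\n"
    "    print('Blocked', file=sys.stderr)\n"
    "    sys.exit(2)\n",
]

blocked = run_hook(STUB, "rm -rf /")
allowed = run_hook(STUB, "ls -la")

# The blocking path must produce exactly 2, not merely "non-zero".
assert blocked.returncode == 2 and "Blocked" in blocked.stderr
assert allowed.returncode == 0
print("blocking path verified")
```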

PostToolUse Auto-Formatter

#!/usr/bin/env bash
# .claude/hooks/auto-format.sh
# Runs ruff format / Prettier / gofmt on any file Claude just edited.
set -euo pipefail

PAYLOAD=$(cat)
FILE=$(echo "$PAYLOAD" | jq -r '.tool_input.file_path // .tool_input.path // empty')

if [[ -z "$FILE" ]] || [[ ! -f "$FILE" ]]; then
  exit 0
fi

case "$FILE" in
  *.py)        ruff format "$FILE" 2>/dev/null || true ;;
  *.ts|*.tsx)  npx prettier --write "$FILE" 2>/dev/null || true ;;
  *.js|*.jsx)  npx prettier --write "$FILE" 2>/dev/null || true ;;
  *.json)      npx prettier --write "$FILE" 2>/dev/null || true ;;
  *.go)        gofmt -w "$FILE" 2>/dev/null || true ;;
esac

# PostToolUse is not blocking — exit 0 even on format failure.
exit 0

Note the || true: a missing formatter should not cause the hook to fail. Exit code 2 from a PostToolUse hook cannot undo the edit, because the tool has already run; at most it reports stderr back to Claude. Exit code 1 likewise only adds noise to the agent’s view.

HTTP PostToolUse Hook (FastAPI Receiver)

For team-wide policy or central observability, an HTTP hook is preferable to a per-machine command. A minimal FastAPI receiver is shown below:

"""Webhook receiver for Claude Code PostToolUse events.

Run: uvicorn receiver:app --host 0.0.0.0 --port 8080
"""
import hashlib
import hmac
import json
import logging
import os
from datetime import datetime, timezone

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
log = logging.getLogger("claude_hook")
logging.basicConfig(level=logging.INFO)

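SECRET = os.environ["CLAUDE_HOOK_SECRET"].encode("utf-8")
# Bearer token matching the Authorization header set in settings.json.
TOKEN = os.environ["CLAUDE_HOOK_TOKEN"]


def verify_signature(body: bytes, signature: str) -> bool:
    """HMAC-SHA256 signature check: prevents spoofed events."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())


def verify_bearer(authorization: str | None) -> bool:
    """Constant-time check of the Authorization header against TOKEN."""
    expected = f"Bearer {TOKEN}".encode()
    return hmac.compare_digest(expected, (authorization or "").encode())


@app.post("/claude-edit")
async def claude_edit(
    request: Request,
    authorization: str | None = Header(default=None),
    x_signature: str | None = Header(default=None),
):
    body = await request.body()

    # Previously the Authorization header was accepted but never checked.
    if not verify_bearer(authorization):
        raise HTTPException(status_code=401, detail="bad token")
    if not verify_signature(body, x_signature or ""):
        raise HTTPException(status_code=401, detail="bad signature")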

    event = json.loads(body)
    log.info(
        "edit by %s on %s at %s",
        event.get("session_id", "?"),
        event.get("tool_input", {}).get("file_path", "?"),
        datetime.now(timezone.utc).isoformat(),
    )

    # Return JSON the agent can use. An empty body is fine for fire-and-forget.
    return {"status": "logged"}

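The signature check is important. Without it, any party able to reach the endpoint can fabricate “Claude edited /etc/passwd” events. The receiver reads the shared secret from CLAUDE_HOOK_SECRET; the sender must hold the same secret and send the hex-encoded HMAC-SHA256 of the raw request body in the X-Signature header. The settings.json shown earlier sends only a bearer token, so the signing step has to be added on the sending side.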
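For reference, the sender-side computation that such a receiver expects can be sketched as follows. The secret value and payload are placeholders, and the `X-Signature` header name simply mirrors the receiver above; whether a particular client can attach this header natively is not established here, so a signing proxy or wrapper script may be needed.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison, mirroring the receiver's check."""
    return hmac.compare_digest(sign(secret, body), signature)


secret = b"example-secret"  # placeholder; load from the environment in practice
body = json.dumps(
    {"session_id": "s1", "tool_input": {"file_path": "app.py"}}
).encode("utf-8")
headers = {"X-Signature": sign(secret, body), "Content-Type": "application/json"}

print(verify(secret, body, headers["X-Signature"]))         # True
print(verify(secret, body + b" ", headers["X-Signature"]))  # False
```

The signature covers the exact bytes sent, which is why the receiver hashes `await request.body()` rather than a re-serialised JSON object.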

SessionStart Context Injector

#!/usr/bin/env bash
# .claude/hooks/inject-context.sh
# Adds current git status, branch, and any TODO.md to Claude's session context.
set -euo pipefail

cat <<EOF
Session starting at $(date -u +%FT%TZ).
Current branch: $(git branch --show-current 2>/dev/null || echo 'not a git repo')
Modified files:
$(git status --short 2>/dev/null || echo 'none')

TODOs in repo:
$(test -f TODO.md && head -20 TODO.md || echo 'no TODO.md')
EOF

exit 0

Whatever the hook prints on stdout becomes part of the session’s context: the model receives it before the first user prompt. This is the most underused hook event, since it provides Claude with project-specific situational awareness without enlarging CLAUDE.md.

For further information on customising Claude Code’s behaviour beyond hooks, refer to the custom commands guide and the skills primer. Hooks fire automatically, whereas commands and skills are user-invoked. Together, these three mechanisms cover most extension scenarios.

Model Introspection Hooks: the PyTorch Variant

The context now changes. Setting agents aside, consider a Python process holding a PyTorch nn.Module in which the behaviour of tensors flowing through the module must be observed. Typical use cases include capturing activations for a probing experiment, logging gradient magnitudes to debug a training run, and clipping gradients per layer for an ablation study.

PyTorch’s nn.Module class exposes a small set of hook registration methods that address these requirements without modifying the module’s forward code. The three most commonly used methods are described below:

| API | Signature | Fires when | Typical use case |
| --- | --- | --- | --- |
| register_forward_pre_hook | hook(module, input) | Before module.forward() runs | Modify or inspect inputs |
| register_forward_hook | hook(module, input, output) | After module.forward() returns | Capture activations, inspect outputs |
| register_full_backward_hook | hook(module, grad_input, grad_output) | After gradients computed for module | Log/clip gradients, debug training |

All three methods return a RemovableHandle. This handle should be retained, and handle.remove() should be called when the hook is no longer required. Failure to remove the handle leaves the hook firing on every forward pass indefinitely, until the module is garbage-collected. In a long-running training job, this constitutes a memory and performance leak.
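The forward pre-hook is the only one of the three not demonstrated later in this post, so a minimal sketch is given here. The layer sizes and the 0.5 scaling factor are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)


def scale_input(module: nn.Module, args: tuple[torch.Tensor, ...]):
    """Forward pre-hook: returning a tuple replaces the positional inputs."""
    (x,) = args
    return (x * 0.5,)


x = torch.ones(1, 4)
baseline = layer(x)

handle = layer.register_forward_pre_hook(scale_input)
try:
    hooked = layer(x)      # forward() now receives x * 0.5
finally:
    handle.remove()        # detach so later calls are unaffected

expected = layer(x * 0.5)  # same computation without the hook
restored = layer(x)

print(torch.allclose(hooked, expected))  # True
print(torch.equal(restored, baseline))   # True
```

Returning None from a pre-hook leaves the inputs untouched, which is the usual choice when the hook only inspects or logs.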

Figure: PyTorch forward hook. Synchronous, in-process, and given the (module, input, output) tuple.

Flow: input tensor x of shape (B, C, H, W) from upstream → Module.forward(x) (e.g. nn.Conv2d, nn.Linear, a ResNet block, a transformer layer) computes y = f(x) → the forward hook fires as hook(module, input, output), where your code can read tensors, save copies, log statistics, or modify the output → output y continues downstream.

Key properties:

  • The hook runs synchronously inside the autograd graph (the gradient-tracking system), so its overhead is real.
  • Returning a non-None value from the hook replaces the output (advanced use; easy to break things).
  • Detach tensors before storing them (output.detach().clone()) to avoid keeping the graph alive and inflating memory.

The backward hook operates similarly but in the reverse direction. After loss.backward() propagates gradients back through the graph, the backward hook fires for each module that has one registered, receiving the gradients flowing into and out of that module:

Figure: PyTorch backward hook. Fires during loss.backward(), in reverse order through the graph.

Flow: loss.backward() starts at the scalar loss and walks the graph in reverse → for each module, backprop computes ∂L/∂x from ∂L/∂y (grad_input from grad_output, via the chain rule) → the backward hook fires as hook(module, grad_input, grad_output), where your code can read gradient norms, detect exploding or vanishing gradients, or return clipped replacements.

Differences from the forward hook:

  • grad_input and grad_output are tuples (one entry per tensor argument), so index carefully.
  • Use register_full_backward_hook, not the deprecated register_backward_hook.
  • The hook should not modify its arguments in place; returning a new grad_input tuple replaces what flows further upstream.

The distinction between register_backward_hook (deprecated) and register_full_backward_hook (current) is a small but consequential point that wastes considerable time when overlooked. The deprecated version exhibited ordering issues with in-place operations and produced incorrect gradients for modules with non-trivial structure. The full_ variant should always be preferred.

For readers approaching this material from outside deep learning, brief definitions are provided. The forward pass is the computation that transforms inputs into outputs—for example, running an image through ResNet to obtain class scores. The backward pass is the reverse computation that determines the contribution of each parameter to the loss, using the chain rule of calculus. Autograd is PyTorch’s gradient-tracking machinery, which records every operation performed on a tensor so that those operations can be replayed in reverse when loss.backward() is called. A gradient is the vector of partial derivatives of the loss with respect to each parameter; it is the signal that informs the optimiser of the direction in which to adjust each weight. Hooks permit observation and modification of any of these quantities at module boundaries without altering the module’s source code.
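These definitions can be made concrete with a three-element tensor. The example is a minimal sketch using only core PyTorch; the loss function (a sum of squares) is chosen because its gradient, 2x, is easy to check by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

loss = (x ** 2).sum()  # forward pass: 1 + 4 + 9 = 14
loss.backward()        # backward pass: autograd replays the ops in reverse

print(loss.item())     # 14.0
print(x.grad)          # tensor([2., 4., 6.])  i.e. d(loss)/dx = 2x
```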

A Working PyTorch Hooks Example

Three concrete tasks are demonstrated below: capturing activations from a ResNet block, logging gradient norms per layer to detect training instability, and clipping gradients per layer to study the effect on a small training run.

Activation Extraction for Probing or Visualisation

Consider a scenario in which a pretrained ResNet-50 is available and the feature map following layer4 for an input image is required—perhaps to feed into a linear probe, perhaps to visualise the network’s response. Modifying the ResNet source code is undesirable, and a forward hook is the appropriate tool.

Figure: Activation extraction with a forward hook, capturing intermediate features from a frozen pretrained model.

  • Step 1: register a forward hook on model.layer4 that stores o.detach() in a captures dict.
  • Forward pass: image input (1, 3, 224, 224) → conv1 → bn1 → relu → layer1 → layer2 → layer3 → layer4 (hook attached here) → final logits (1, 1000), which are discarded.
  • Step 2: after the forward pass, captures["layer4"] holds the activation, with shape (1, 2048, 7, 7) for a 224×224 input. It is detached from the autograd graph (via .detach(), to avoid keeping forward state alive) and is usable for a linear probe, CAM visualisation, feature similarity search, or dataset embedding.
  • Step 3: call handle.remove() when done; forgetting this leaks the hook.

import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

# Pretrained ResNet-50, eval mode.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

# Where we will stash the activation.
captures: dict[str, torch.Tensor] = {}

def grab_layer4(module: torch.nn.Module,
                inp: tuple[torch.Tensor, ...],
                out: torch.Tensor) -> None:
    """Forward hook — copy the output, detach, store."""
    captures["layer4"] = out.detach().clone()

# Register on the layer4 stack (a Sequential of three Bottleneck blocks).
handle = model.layer4.register_forward_hook(grab_layer4)

try:
    # Standard ImageNet preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    img = Image.open("dog.jpg").convert("RGB")
    x = preprocess(img).unsqueeze(0)

    with torch.no_grad():
        _ = model(x)   # we discard logits; we want the captured activation

    act = captures["layer4"]
    print(f"layer4 activation shape: {tuple(act.shape)}")
    # → layer4 activation shape: (1, 2048, 7, 7)

    # Now use `act` for whatever downstream analysis you want.
finally:
    # ALWAYS remove the hook when done.
    handle.remove()
Tip: The try/finally pattern is important. If downstream code raises an exception, a dangling hook will quietly increase memory pressure on the next inference. Registrations should be wrapped in a context manager if this pattern is used frequently.
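A context manager along those lines might look like the sketch below. The name `capture_output` is hypothetical, and a small nn.Sequential stands in for the pretrained model so the example runs without downloading weights.

```python
from contextlib import contextmanager
from typing import Iterator

import torch
import torch.nn as nn


@contextmanager
def capture_output(module: nn.Module) -> Iterator[dict[str, torch.Tensor]]:
    """Capture the module's output for the duration of the with-block."""
    store: dict[str, torch.Tensor] = {}

    def hook(mod, inp, out):
        store["output"] = out.detach().clone()

    handle = module.register_forward_hook(hook)
    try:
        yield store
    finally:
        handle.remove()  # runs even if the body raises


net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

with capture_output(net[0]) as captured, torch.no_grad():
    net(torch.randn(2, 8))

print(tuple(captured["output"].shape))  # (2, 16)
```

Once the with-block exits, the hook is gone: further forward passes leave `captured` unchanged.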

Logging Gradient Norms with a Backward Hook

Gradient explosions are easier to diagnose when norms can be observed per layer. A few lines of backward hook code reduce this to a single-line printout per step:

import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

model = SmallNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Track grad norms by layer name.
grad_norms: dict[str, float] = {}
handles = []

def make_hook(name: str):
    def hook(module, grad_input, grad_output):
        # grad_output is a tuple of grads w.r.t. each output tensor.
        # We log the L2 norm of the first one as a simple health metric.
        if grad_output[0] is not None:
            grad_norms[name] = grad_output[0].norm().item()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        handles.append(module.register_full_backward_hook(make_hook(name)))

# Fake training step.
x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))

for step in range(3):
    optimizer.zero_grad()
    logits = model(x)
    loss = torch.nn.functional.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    print(f"step {step}: " + ", ".join(f"{k}={v:.4f}" for k, v in grad_norms.items()))

# Cleanup.
for h in handles:
    h.remove()

A typical output line takes the form step 0: fc3=0.1382, fc2=0.0573, fc1=0.0421; the layers appear in reverse order because backward hooks fire as gradients flow from the output back toward the input. If norms expand by orders of magnitude between steps, or fall to zero for a layer that should be learning, the source of the problem is readily identifiable. This pattern is also common in transformer training: instrumenting attention and MLP blocks separately follows the same approach, simply across more modules. For further discussion of training-stack instrumentation, see the LLM training guide.
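Per-layer clipping with a backward hook can be sketched in the same style. Two caveats apply. First, a full backward hook should not modify its arguments in place, so the sketch returns new tensors instead. Second, the returned grad_input is the gradient with respect to the module's inputs: it limits what flows to upstream layers, not the module's own parameter gradients (for those, torch.nn.utils.clip_grad_norm_ is the usual tool). The names `make_clip_hook` and `MAX_NORM` are illustrative.

```python
import torch
import torch.nn as nn

MAX_NORM = 1e-3  # deliberately tiny so the clip is visible


def make_clip_hook(max_norm: float):
    def hook(module, grad_input, grad_output):
        # Build replacement tensors rather than editing grad_input in place.
        clipped = []
        for g in grad_input:
            if g is None:
                clipped.append(None)
            else:
                scale = torch.clamp(max_norm / (g.norm() + 1e-12), max=1.0)
                clipped.append(g * scale)
        return tuple(clipped)
    return hook


torch.manual_seed(0)
fc1, fc2 = nn.Linear(8, 8), nn.Linear(8, 4)
handle = fc2.register_full_backward_hook(make_clip_hook(MAX_NORM))

h = fc1(torch.randn(16, 8))
h.retain_grad()            # keep the gradient that reaches fc1's output
fc2(h).sum().backward()
handle.remove()

upstream_norm = h.grad.norm().item()
print(f"gradient norm reaching fc1: {upstream_norm:.2e}")
```

With the hook attached, the gradient arriving at fc1's output is bounded by MAX_NORM regardless of how large fc2's unclipped gradient was.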

Event Hooks: the MLOps Webhook Variant

The third variant operates at an entirely different level. Webhooks are not located within an agent or a model; they connect services. When a training job finishes, that fact must reach a dashboard, a notification service, a downstream pipeline, and a model registry. Webhooks are the mechanism by which this distribution occurs without each service polling the others.

The pattern is consistent across MLflow, Weights & Biases, HuggingFace, AWS SageMaker, and most model registries: when a defined event occurs, the source service sends an HTTP POST to a configured URL, with a JSON body describing the event and an HMAC signature in a header. The receiver verifies the signature, processes the event, and returns a 2xx status code (or signals failure and waits for a retry).

Figure: MLOps webhook flow. Event source → HTTP POST with HMAC signature → receiver fans out.

  • Event source: a training job (epoch 50 complete, val_acc=0.912, run_id=abc123) causes MLflow or W&B to fire the registered webhook.
  • Request: POST with a JSON body and an X-Signature header (HMAC).
  • Receiver: FastAPI, Cloud Run, Lambda, or anything that speaks HTTP; it verifies the HMAC and returns 200.
  • Fan-out: Slack (“run abc123 hit 0.912”, human notification); PagerDuty (failure events only, paged escalation); internal dashboard (append to time series, trigger eval pipeline).

Readers familiar with GitHub webhooks will recognise the design of MLOps webhooks: it is essentially the same. The name of the signature header varies from service to service, but the structure does not: a JSON body, a signature computed over that body, and a 2xx acknowledgement from the receiver.

Common events that vendors expose as webhooks include run.started, run.finished, and run.failed from training trackers; model.version.created, model.version.staged, and model.version.promoted from model registries; dataset.uploaded or dataset.versioned from data platforms; drift.detected or alert.fired from monitoring systems; and increasingly evaluation.completed from automated evaluation services. Each event is accompanied by a stable JSON schema, fixed per major version, and a payload-signing scheme that almost invariably follows the GitHub pattern: a SHA-256 HMAC of the raw body, hex-encoded, in a single header.
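To make the GitHub-style signing scheme concrete, the sketch below computes and checks a hex-encoded SHA-256 HMAC over the raw request body. The `sha256=` prefix, the secret, and the function names are assumptions for illustration; the exact header name and format depend on the sender.

```python
import hashlib
import hmac
import json


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Sender side: hex-encoded SHA-256 HMAC of the exact bytes on the wire."""
    return "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_body(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign_body(secret, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


secret = b"demo-secret"  # hypothetical shared secret
raw = json.dumps({"event": "run.finished", "run": {"run_id": "abc123"}}).encode("utf-8")
header = sign_body(secret, raw)

# Re-serialising the JSON changes the bytes, so the signature no longer matches.
reserialised = json.dumps(json.loads(raw), indent=2).encode("utf-8")

print(verify_body(secret, raw, header))
print(verify_body(secret, reserialised, header))
```

The second check fails even though the JSON is semantically identical, which is why receivers should verify against the raw body before parsing it.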

One small but consequential decision concerns the location of the receiver. A long-running FastAPI application on a virtual machine places operational responsibility on the team when it fails outside business hours. A serverless function (Lambda, Cloud Run, Vercel) delegates availability to the platform and is charged per call, which is generally cheaper for low-volume webhook traffic. Most MLOps teams adopt serverless solutions for fan-out webhooks and reserve dedicated services for high-throughput hot paths such as real-time inference logging. The pattern is identical in either case; what differs is the operational profile.

A webhook receiver for an MLflow “run completed” event, with strict HMAC checking, is shown below:

"""Receiver for MLflow run-completed webhooks.

POST body shape (illustrative — check your MLflow version):
{
  "event": "run.finished",
  "run": {
    "run_id": "abc123",
    "experiment_id": "42",
    "status": "FINISHED",
    "metrics": {"val_accuracy": 0.912, "val_loss": 0.231}
  },
  "timestamp": "2026-05-25T14:23:11Z"
}
"""
import hashlib
import hmac
import json
import os
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
MLFLOW_SECRET = os.environ["MLFLOW_WEBHOOK_SECRET"].encode("utf-8")
SLACK_WEBHOOK = os.environ.get("SLACK_WEBHOOK_URL")


def verify(body: bytes, signature_header: str) -> bool:
    """GitHub-style 'sha256=<hex digest>' value (assumed here; confirm the header name and format your sender uses)."""
    if not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(MLFLOW_SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header[len("sha256="):])


@app.post("/mlflow/run-finished")
async def run_finished(
    request: Request,
    x_mlflow_signature: str = Header(default=""),
):
    body = await request.body()
    if not verify(body, x_mlflow_signature):
        # Constant-time compare above; reject fast here.
        raise HTTPException(status_code=401, detail="bad signature")

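    try:
        event = json.loads(body)
        run = event["run"]
    except (ValueError, KeyError, TypeError):
        # Signed but malformed: a retry will not help, so answer 400 rather than 500.
        raise HTTPException(status_code=400, detail="malformed payload")
    metrics = run.get("metrics", {})
    val_acc = metrics.get("val_accuracy")

    # Fan out: alert humans on Slack only above a threshold so we don't spam.
    if SLACK_WEBHOOK and val_acc is not None and val_acc > 0.90:
        import httpx
        msg = f"Run {run['run_id']} finished with val_accuracy={val_acc:.3f}"
        try:
            async with httpx.AsyncClient(timeout=5.0) as client:
                await client.post(SLACK_WEBHOOK, json={"text": msg})
        except httpx.HTTPError:
            # A slow or failing Slack call must not become a 500 and a sender retry.
            pass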

    # Acknowledge — keep response tiny; the sender may impose a timeout.
    return {"ok": True}

Three points warrant attention. The signature check uses hmac.compare_digest rather than ==; the latter leaks timing information that allows an attacker to recover the signature byte by byte. The Slack call uses a short timeout, because a slow Slack response should not hold the MLflow connection open and trigger MLflow’s own timeout-and-retry behaviour. Finally, the receiver returns quickly: for any work heavier than a Slack notification, the operation should be pushed onto a queue and acknowledged immediately.

Webhook-adjacent patterns also appear in workflow orchestrators. Airflow’s on_success_callback and on_failure_callback are conceptually similar: they are in-process Python callbacks rather than HTTP POSTs, but they serve the same purpose of reacting to a task’s outcome. The Airflow orchestration guide describes how those callbacks compose with cross-system webhooks.

Selecting the Appropriate Hook: A Decision Framework

The three variants should by this point be clearly distinct. The remaining question is operational: given a problem, which variant should be selected? The matrix below provides guidance:

Decision Matrix: Which Hook for Which Problem. In the original figure, green marked the best fit, yellow a workable but suboptimal option, and red the wrong tool; the cell text below carries the same verdicts.

| Problem | Lifecycle (Claude Code) | PyTorch (introspection) | Webhook |
| --- | --- | --- | --- |
| Safety enforcement (agents) | YES | wrong layer | via HTTP hook |
| Activation extraction | wrong scope | YES | wrong process |
| Training-complete alert | wrong scope | possible but odd | YES |
| Gradient debugging | wrong layer | YES | no visibility |
| Auto-format after edit | YES (PostToolUse) | wrong layer | wrong process |
| Model registry promotion | wrong scope | wrong scope | YES |

A simple rule covers most cases: select the hook variant that operates at the same level as the entity to be observed or modified. Tensor flow occurs inside the model process and calls for PyTorch hooks. Agent decisions occur inside the agent runtime and call for lifecycle hooks. The training-job lifecycle spans services and calls for webhooks. Mixing levels works occasionally but usually creates more integration work than it eliminates.

A side-by-side reference for rapid selection follows:

| Property | Lifecycle (Claude Code) | PyTorch introspection | Webhook (MLOps) |
| --- | --- | --- | --- |
| Fires when | Agent session events | Per forward/backward call | Infra/business events |
| Runs where | Shell, HTTP, or MCP | Same Python process, sync | Remote HTTP service, async |
| Blocks execution? | Yes (exit 2) | No, but can modify tensors | Indirectly (timeout/retry) |
| Language | Any (shell, Python, Go) | Python only | Any (it’s just HTTP) |
| Typical user | Agent builders, safety teams | Researchers, model debuggers | MLOps, platform teams |
| Auth model | Filesystem perms (Command) or bearer (HTTP) | In-process trust | HMAC signature |

Common Pitfalls

Each variant has its own failure modes. Awareness of these failure modes in advance saves considerable debugging time.

Claude Code Hook Pitfalls

Shell injection through tool arguments. A PreToolUse hook receives JSON containing whatever Claude intends to execute. Interpolating fields from that JSON into a command string that the shell re-evaluates (for example through eval, sh -c, or an unquoted variable expansion) opens a path to arbitrary command execution from prompt-injection-style attacks. The JSON should always be parsed with jq or another proper parser, never with string slicing, and the extracted values should be treated strictly as data.
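One way to avoid the interpolation problem entirely is to write the hook in a language with a real JSON parser and never hand the payload to a shell. The sketch below assumes the event JSON carries `tool_name` and `tool_input.command` fields (check the event schema of the Claude Code version in use); the deny pattern and function name are hypothetical.

```python
import json
import re

# Hypothetical policy: block a recursive delete of the filesystem root.
DENY = re.compile(r"\brm\s+-rf\s+/(\s|$)")


def decide(stdin_text: str) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one hook invocation.

    The command is only ever inspected as a string; it is never executed
    or interpolated into another command.
    """
    try:
        payload = json.loads(stdin_text)
    except json.JSONDecodeError:
        return 2, "hook: unparseable payload, blocking"
    if not isinstance(payload, dict):
        return 2, "hook: unexpected payload shape, blocking"

    tool_input = payload.get("tool_input")
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    if payload.get("tool_name") == "Bash" and isinstance(command, str) and DENY.search(command):
        return 2, "hook: blocked destructive command"
    return 0, ""


blocked = decide(json.dumps({"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}))
allowed = decide(json.dumps({"tool_name": "Bash", "tool_input": {"command": "ls; $(echo hi)"}}))
print(blocked, allowed)
```

In a real hook script the entry point would read `sys.stdin`, write the message to stderr, and call `sys.exit` with the returned code. The second example contains shell metacharacters, but because nothing is executed they are just characters in a string.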

Infinite hook loops. A PostToolUse hook that itself uses Claude to summarise the output, where Claude then invokes tools to summarise, and those tools trigger the PostToolUse hook again, produces a stack that is typically discovered at an inconvenient hour. Hooks should be terminal: they observe but do not re-invoke the agent.

Exit-code confusion. Bash’s set -e exits non-zero on any failure but not necessarily with code 2. If a hook’s safety-check command crashes for an unrelated reason, the tool will run anyway because the exit code is not the blocking value. When blocking matters, the script should exit with code 2 explicitly.
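The same fail-closed idea can be expressed as a small wrapper: any crash inside the check is mapped to the blocking exit code instead of whatever code the failure happens to produce. This is a minimal sketch; `run_fail_closed` and the `check` callable are hypothetical names, and the value 2 is the blocking code described above.

```python
import sys

BLOCK = 2  # the exit code that blocks the tool call
ALLOW = 0


def run_fail_closed(check, payload) -> int:
    """Run check(payload) and return the exit code the hook should use.

    A crash in the check is treated as a denial, so an unrelated bug
    cannot silently let the tool call through.
    """
    try:
        permitted = check(payload)
    except Exception as exc:
        print(f"hook crashed, blocking: {exc!r}", file=sys.stderr)
        return BLOCK
    if not permitted:
        print("hook: policy denied this action", file=sys.stderr)
        return BLOCK
    return ALLOW


code_ok = run_fail_closed(lambda p: True, {})
code_denied = run_fail_closed(lambda p: False, {})
code_crashed = run_fail_closed(lambda p: p["missing_key"], {})
print(code_ok, code_denied, code_crashed)
```

Whether to fail closed is a policy decision: for a safety hook it is usually the right default, while for a purely cosmetic hook (formatting, logging) failing open may be preferable.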

The hook is not versioned with the agent. Hook semantics evolve. A handler that worked under one Claude Code version may break under another (renamed fields in the event JSON, new required fields, and so on). Hook scripts should be pinned to the agent version against which they were tested, and re-tested after upgrades.

PyTorch Hook Pitfalls

Failing to call handle.remove(). This is the most common bug. A leaked forward hook is difficult to detect: the model continues to function, but more slowly, and memory usage drifts upwards. handle.remove() should be treated like close(): paired with the registration in a try/finally block, or wrapped in a context manager, so that removal happens even when the code in between raises.
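A context manager is one way to make removal automatic. The sketch below uses only `register_forward_hook` and the handle's `remove()` method; the helper name `forward_hooks` and the toy model are assumptions for illustration.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def forward_hooks(modules, hook):
    """Register `hook` on each module and always remove it on exit."""
    handles = []
    try:
        for module in modules:
            handles.append(module.register_forward_hook(hook))
        yield handles
    finally:
        for handle in handles:
            handle.remove()


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
captured = []


def capture(module, inputs, output):
    # Detach and clone so the stored tensor does not keep the graph alive.
    captured.append(output.detach().clone())


with forward_hooks([model[0], model[2]], capture):
    model(torch.randn(3, 4))
count_inside = len(captured)

model(torch.randn(3, 4))  # hooks are gone, so nothing new is captured
count_after = len(captured)
print(count_inside, count_after)
```

Because the removal sits in a `finally` clause, the hooks are also cleaned up when the forward pass raises.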

Storing tensors with the graph attached. Storing output rather than output.detach() retains the entire computation graph leading to that output. On a fifty-layer model this keeps every intermediate activation alive for as long as the stored tensor exists, so memory climbs with each captured step. Tensors should always be detached, and usually cloned, before storage.
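The difference is visible on the stored tensors themselves. In this small sketch (a single hypothetical layer with two hooks), the tensor stored as-is still carries a `grad_fn` linking it to the graph, while the detached clone does not.

```python
import torch
from torch import nn

layer = nn.Linear(8, 8)
leaky, safe = [], []


def leaky_hook(module, inputs, output):
    leaky.append(output)  # keeps the autograd graph reachable


def safe_hook(module, inputs, output):
    safe.append(output.detach().clone())  # graph-free copy


h1 = layer.register_forward_hook(leaky_hook)
h2 = layer.register_forward_hook(safe_hook)
layer(torch.randn(2, 8))
h1.remove()
h2.remove()

print(leaky[0].grad_fn is not None, safe[0].grad_fn is None)
```

The clone matters when a later in-place operation might modify the original output; detaching alone shares storage with it.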

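Hooks added in __init__ versus registered post hoc. Copying or rebuilding a model (a common pattern in distributed training) is where hooks registered from outside tend to go wrong. copy.deepcopy copies a module’s hook table along with the rest of its state, but a hook written as an external closure still points at whatever it captured from the original, so the copy records into the wrong storage; a model that is re-instantiated in a worker process loses externally registered hooks altogether. A hook that is a method of the module and is registered in its own __init__ travels with the module and is registered again whenever the module is constructed. If the training launcher uses copy.deepcopy or torch.nn.parallel.replicate, registration inside the module is the safer choice, and the behaviour should be verified on the copy.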

Overhead in tight loops. Every hook adds Python-level overhead per call. This is acceptable for offline analysis but problematic in a training loop with tens of thousands of iterations per epoch. Hooks should be registered only on the modules of interest, only for the steps of interest, and removed immediately afterwards.

For training-loop instrumentation that extends beyond gradient logging, the self-supervised learning guide presents similar patterns applied to representation extraction during pretraining.

Webhook Pitfalls

Timeout amplification. The sender (MLflow, Weights & Biases, or the model registry in question) typically imposes a short timeout, often five or ten seconds. If the receiver performs any slow operation inline—a database write, a slow Slack call, or ML inference—events will be missed and retries triggered. The recommended pattern is to receive quickly, queue the work, and return a 2xx status code.
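The receive-queue-acknowledge pattern can be sketched with nothing but the standard library. The example below uses an in-memory `asyncio.Queue` purely for illustration; the function names are hypothetical, and a production receiver would more likely hand off to a durable queue so that events survive a restart.

```python
import asyncio


async def handle_delivery(queue: asyncio.Queue, event: dict) -> dict:
    """Receiver path: enqueue and acknowledge without doing the slow work."""
    queue.put_nowait(event)  # raises asyncio.QueueFull if the backlog is too large
    return {"ok": True}


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Background path: the slow work happens here, off the request path."""
    while True:
        event = await queue.get()
        try:
            await asyncio.sleep(0.05)  # stand-in for a database write or inference
            processed.append(event["run_id"])
        finally:
            queue.task_done()


async def demo():
    queue = asyncio.Queue(maxsize=1000)
    processed = []
    task = asyncio.create_task(worker(queue, processed))

    ack = await handle_delivery(queue, {"run_id": "abc123"})
    acked_before_processing = processed == []

    await queue.join()  # wait for the backlog to drain (demo only)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ack, acked_before_processing, processed


result = asyncio.run(demo())
print(result)
```

The acknowledgement is returned before the worker has touched the event, which is the property that keeps the sender's timeout from firing.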

Missing signature verification. An unverified webhook endpoint lets anyone who discovers the URL trigger the handler, which becomes a serious risk as soon as the handler performs any privileged operation with the payload. The HMAC should be verified on every request using hmac.compare_digest; the source IP alone should not be relied upon.

At-least-once semantics. Almost every webhook sender retries on failure, so the receiver will observe the same event more than once. The handler must be idempotent: the same event delivered twice should not double-count, double-notify, or double-promote.
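Idempotency usually comes down to remembering which deliveries have already been handled. The sketch below assumes each event carries a unique `delivery_id` field (a hypothetical name; senders differ in what they provide) and keeps the seen set in memory, where a real deployment would use a shared store with an expiry.

```python
class SeenEvents:
    """Tracks which delivery IDs have already been processed."""

    def __init__(self):
        self._seen = set()

    def first_delivery(self, delivery_id: str) -> bool:
        """Return True exactly once per delivery ID."""
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        return True


def handle(event: dict, seen: SeenEvents, notifications: list) -> str:
    if not seen.first_delivery(event["delivery_id"]):
        return "duplicate"  # still acknowledge with 2xx so the sender stops retrying
    notifications.append(f"run {event['run_id']} finished")
    return "processed"


seen = SeenEvents()
notifications = []
event = {"delivery_id": "d-001", "run_id": "abc123"}
outcomes = [handle(event, seen, notifications) for _ in range(3)]
print(outcomes, notifications)
```

Three deliveries of the same event produce one notification, and the duplicates are acknowledged rather than rejected.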

Replay attacks. Even with HMAC, a captured request can be replayed. Where the sender supports it, a timestamp should be included in the signed payload (Stripe and Slack, for example, sign a timestamp alongside the body), and events older than a small window, typically a few minutes, should be rejected.
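A timestamped signature check might look like the following sketch. The `timestamp.body` message layout and the five-minute window are assumptions chosen for illustration; the layout must match whatever the sender actually signs.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 300  # assumed tolerance window


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC over '<timestamp>.<body>' so the timestamp cannot be altered."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def accept(secret: bytes, timestamp: int, body: bytes, signature: str, now=None) -> bool:
    """Reject stale deliveries first, then verify the signature."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_AGE_SECONDS:
        return False
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


secret = b"demo-secret"  # hypothetical shared secret
body = b'{"event": "run.finished"}'
sent_at = 1_000_000
signature = sign(secret, sent_at, body)

fresh = accept(secret, sent_at, body, signature, now=sent_at + 10)
replayed_late = accept(secret, sent_at, body, signature, now=sent_at + 3600)
forged_timestamp = accept(secret, sent_at + 3600, body, signature, now=sent_at + 3600)
print(fresh, replayed_late, forged_timestamp)
```

Because the timestamp is part of the signed message, an attacker cannot refresh an old capture by editing the timestamp: the signature stops matching.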

Caution: Across all three variants, the most common silent failure is the same: the hook is in place but is not actually executing. A misconfigured matcher, a leftover handle, or a webhook endpoint that senders no longer reach can all produce this outcome. Observability should be added through audit logs and gauge metrics on hook invocation counts so that a non-firing hook is detected.
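A lightweight way to notice a hook that has stopped firing is to count invocations and compare them with the set of hooks that are expected to be active. The decorator below is a sketch using in-process counters; the names are hypothetical, and a real deployment would export the counts to its metrics system.

```python
import functools
import time
from collections import Counter

HOOK_CALLS = Counter()
LAST_FIRED = {}


def counted(name: str):
    """Decorator that records every invocation of a hook under `name`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            HOOK_CALLS[name] += 1
            LAST_FIRED[name] = time.monotonic()
            return fn(*args, **kwargs)
        return wrapper
    return decorate


def silent_hooks(expected_names):
    """Hooks that should be active but have never fired."""
    return [name for name in expected_names if HOOK_CALLS[name] == 0]


@counted("grad_norm_fc1")
def grad_hook(module=None, grad_input=None, grad_output=None):
    return None  # read-only hook body


grad_hook()
grad_hook()
missing = silent_hooks(["grad_norm_fc1", "slack_alert"])
print(dict(HOOK_CALLS), missing)
```

An alert on a non-empty `silent_hooks` result after a known amount of activity catches the misconfigured-matcher and unreachable-endpoint cases alike.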

Frequently Asked Questions

Are Claude Code hooks the same as MCP servers?

No. MCP servers extend what an agent can do by exposing new tools, resources, and prompts that the agent can call. Hooks extend the agent’s lifecycle by inserting policy at predefined moments. Both can be used simultaneously; a common pattern is an MCP server that provides project context paired with a PreToolUse hook that enforces safety on the agent’s tool calls. The two systems are complementary rather than redundant.

Does register_forward_hook affect gradients?

It can. If the hook returns a tensor in place of None, that tensor replaces the module’s output for the remainder of the forward pass, and gradients flow through the replacement during backpropagation. If the hook only reads tensors and returns None, gradients are unaffected. The same applies to backward hooks: returning a modified grad_input tuple replaces what propagates further back. For read-only inspection, the hook should return nothing.
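The two cases can be compared directly. In this sketch a read-only hook leaves the weight gradient untouched, while a hook that returns a replacement tensor (here the output scaled by zero, chosen only to make the effect obvious) changes what backpropagation sees.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(5, 3)


def read_only(module, inputs, output):
    return None  # output is passed through unchanged


def zero_out(module, inputs, output):
    return output * 0.0  # this tensor replaces the module's output


handle = layer.register_forward_hook(read_only)
layer(x).sum().backward()
grad_read_only = layer.weight.grad.clone()
handle.remove()

layer.zero_grad()
handle = layer.register_forward_hook(zero_out)
layer(x).sum().backward()
grad_replaced = layer.weight.grad.clone()
handle.remove()

print(grad_read_only.abs().sum().item() > 0, bool(torch.all(grad_replaced == 0)))
```

With the replacement in place the loss no longer depends on the layer's weights, so their gradient is zero.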

Can webhooks block a training job?

Indirectly. If a model-registry promotion event has a configured webhook receiver that times out, some registries pause the promotion pending retries while others fail the promotion entirely. In either case, the system being hooked into determines whether a slow receiver can stall the workflow. The documentation for the specific service should be consulted. As a general rule, webhooks should be treated as fire-and-forget signals rather than synchronous gates.

What is the relationship between hooks and callbacks?

The terms are largely synonymous, with a difference of connotation. “Callback” implies a function registered by the user, often for a single defined moment. “Hook” implies a registered extension point exposed by the host system, often one of many. PyTorch documentation uses “hook”; asyncio documentation uses “callback”; the underlying concept is the same. In MLOps, Airflow uses “callback” (on_success_callback) while GitHub uses “webhook”—the same pattern, expressed in different vocabulary.

Are there security risks specific to lifecycle hooks?

Yes, three principal risks. First, hooks run with the agent’s privileges, which usually corresponds to the user’s shell, so a bug in a hook script can cause real damage on the machine. Second, hook payloads contain whatever the model intends to do, including potentially adversarial content arising from prompt injection; naive shell interpolation is dangerous. Third, hooks are invisible: a colleague inspecting an agent session will not see the hook fire unless it is logged. Audit logging and code review for hook scripts are as important as for production code. The harness engineering guide covers the broader threat model.

Conclusion

“Hook” in AI is a small term performing three distinct functions. Lifecycle hooks allow deterministic policy to be inserted into an agent’s session without forking the agent. Model introspection hooks allow tensor flow to be read or modified without forking the model. Event hooks allow services to communicate about significant moments without polling. The mechanisms share a name and a skeletal definition—a callback at a defined point—but they differ in process, language, blocking semantics, security model, and audience.

The practical guidance reduces to three rules. First, select the variant that matches the layer at which the problem resides; agent safety should not be enforced with a PyTorch hook, nor should activations be extracted via a webhook. Second, treat hook code as production code: review it, audit it, log it, and version it alongside the system it extends. Third, recall that hooks are powerful precisely because they are invisible to the host; that invisibility is also their principal failure mode, so observability should be built in to detect when a hook ceases to fire.

One habit worth taking from this guide is the following: whenever the advice to “use a hook” appears in documentation or in a blog post, the appropriate first question is which variant. The answer almost always determines the correct design.
