The term “hook” in the context of artificial intelligence will elicit different responses depending on the audience. The agent-framework engineer typically refers to a shell command that fires before Claude Code runs a tool. The deep-learning researcher has in mind a Python callback registered on a neural network layer to capture activations. The MLOps engineer envisions an HTTP POST that lands in Slack the moment a training run finishes. The same term covers three distinct mechanisms, three distinct audiences, and three distinct sets of debugging considerations.
This overloading is not accidental: all three variants share the same underlying idea, namely a callback that fires at a defined point in another system’s execution. Treating them as interchangeable, however, is a frequent source of confusion. Advice to “use a hook” carries little practical value without specifying which variant is intended. The present guide therefore draws the boundaries explicitly and then accompanies each variant with working code.
Summary
What this post covers: The word “hook” in AI refers to at least three distinct mechanisms — agent lifecycle hooks (Claude Code and similar frameworks), model introspection hooks (PyTorch forward and backward callbacks), and MLOps event hooks (webhooks fired by training jobs and model registries). This post defines each, shows working code, and gives you a decision framework for picking the right one.
Key insights:
- Claude Code exposes 12 lifecycle events and a small number of handler types, with exit code 2 reserved as the “block this action” signal that feeds stderr back to Claude as an error message.
- PyTorch hooks come in three core flavors (register_forward_pre_hook, register_forward_hook, and register_full_backward_hook), each with a fixed signature and a RemovableHandle you must call .remove() on to avoid leaks.
- MLOps webhooks are just HTTP POSTs with HMAC signatures, but they amplify failures: a slow receiver can block a model registry, and a missing signature check turns your training pipeline into an open RCE surface.
- The three flavors are not interchangeable — picking the wrong one (a PyTorch hook to enforce safety, a webhook for activation extraction) leads to brittle systems that fight their own runtime.
- Hooks are powerful precisely because they don’t require modifying the host system, but the same property makes them invisible — discoverability and audit logging matter as much as the hook code itself.
Main topics: Three different things people mean by “hook” in AI; Lifecycle hooks: the agent-lifecycle flavor; A working Claude Code hooks example; Model introspection hooks: the PyTorch flavor; A working PyTorch hooks example; Event hooks: the MLOps webhook flavor; When to use which kind of hook; Common pitfalls.
Three different things people mean by “hook” in AI
Vocabulary first, then code. The three variants of “hook” in AI share the same skeletal definition—a user-supplied callback that fires at a defined point in another system’s execution—but they differ in every operationally important respect: where the callback runs, which process owns it, whether it can block the host, and what data it observes.
A lifecycle hook fires at a specific moment in an agent’s session loop. The canonical example is Claude Code’s PreToolUse event, which fires after the model has decided to invoke a tool but before the tool actually executes. The hook is a separate process—a shell command, an HTTP endpoint, or an MCP server—that the agent invokes with structured JSON describing the intended action. The hook may approve, modify, or block the action through its exit code or response. Lifecycle hooks exist because agent runtimes require extensibility points that do not necessitate forking the agent itself.
A model introspection hook is an in-process Python callback registered on a neural network module. PyTorch’s register_forward_hook is the canonical case: a function is supplied, and PyTorch calls that function every time the module’s forward() runs, passing the module, its input, and its output. The hook lives in the same process as the model, runs synchronously within the autograd graph (the system that tracks tensor operations for gradient computation), and may read or even modify tensors on the fly. Such hooks exist because researchers need to inspect a model without rewriting its source code.
An event hook, usually called a webhook in MLOps contexts, is an HTTP POST issued by one service to another when a defined event occurs—a training run completes, a model is promoted to production, or a drift detector exceeds a threshold. The hook receiver lives in an entirely different process (often on a different host or behind a load balancer), authenticates via a shared secret with HMAC (a cryptographic signature method that proves the message was not tampered with), and runs asynchronously with respect to the event source. Webhooks exist because MLOps stacks are heterogeneous and require a low-friction mechanism for distributing events across systems.
Three observations render this taxonomy useful rather than pedantic. First, the audiences scarcely overlap: the researcher confronting a vanishing gradient and the platform engineer integrating a model registry both rely on “hooks,” but their tooling, vocabulary, and failure modes have little in common. Second, the level of trust required differs sharply: a PyTorch hook runs inside the process and is implicitly trusted; a Claude Code hook executes shell commands and is trusted but auditable; a webhook crosses a network boundary and must therefore authenticate. Third, the cost of misclassification scales accordingly: an errant PyTorch hook leaks memory, an errant Claude Code hook may erase a file, and an errant webhook handler may broadcast secrets. Selecting the right variant is not merely a stylistic choice; it defines the security boundary of the entire feature.
The figure below summarises the taxonomy:
Lifecycle hooks: the agent-lifecycle flavor
Lifecycle hooks are the most recent of the three variants to enter the AI lexicon, largely because agent frameworks themselves are recent. The mechanism is straightforward: an agent runtime defines a small set of events that mark notable moments in its operation, and handlers are registered to fire when those events occur.
Claude Code, the CLI agent developed by Anthropic, exposes twelve such events in its current hooks system (per the official documentation at code.claude.com/docs/en/hooks, as of 2026-05-25). The events span the full session arc, from SessionStart when the agent boots, through UserPromptSubmit when the user submits input, to PreToolUse and PostToolUse that wrap every tool call, and finally to Stop and SessionEnd. Each event passes structured JSON to the handler describing the current operation, and the handler may respond with text (returned to Claude as additional context), a block decision, or simply an exit code.
The significance of this mechanism is as follows: without hooks, customising an agent’s behaviour requires either writing a custom tool (a heavy approach) or relying on a CLAUDE.md instruction (an unreliable one). Hooks provide a third option—deterministic, code-enforced policy that fires regardless of the model’s decisions. If a hook returns exit code 2 on a PreToolUse for any Bash call matching /rm -rf \//, the tool will not run. The model is not merely asked not to run it; the tool will not run. This distinction constitutes the entire value proposition.
The twelve events may be categorised by responsibility as follows:
| Event | When it fires | Can block? | Typical use case |
|---|---|---|---|
| SessionStart | Agent boots up | No | Inject project context, set env vars |
| UserPromptSubmit | After you hit enter | Yes | Validate prompt, expand templates |
| PreToolUse | Before any tool runs | Yes | Safety check, dry-run preview |
| PostToolUse | After tool returns | No | Auto-format, log, scan output |
| Notification | Permission prompts, etc. | No | Forward to phone, log audit trail |
| Stop | Claude finishes its turn | Yes | Force continuation, run tests |
| SubagentStop | A sub-agent finishes | Yes | Collect sub-agent artifacts |
| SessionEnd | Session terminates | No | Final cleanup, session summary |
| PreCompact | Before context compaction | No | Persist scratchpad to disk |
| PreRespond | Before reply streams | Yes | Redact, annotate, classify |
| Edit/file events | On file modifications | No | Format, lint, version control |
| Slash command events | On /command invocation | Varies | Custom command preprocessing |
The names matter because matching is partially name-based. A hook configuration in .claude/settings.json specifies an event name and an optional matcher (a regular expression tested against the tool name for tool-related events), followed by a list of handlers. The handler contains the code that executes.
Handler Types and Where They Run
Claude Code’s hooks system currently supports four handler types per the official documentation (as of 2026-05-25; readers should consult the latest reference for the authoritative list, as this area continues to evolve). The three most commonly encountered are Command, HTTP, and MCP handlers.
The handler type should be chosen on the basis of the desired operational profile rather than syntactic preference:
| Handler type | Best for | Security posture | When to pick |
|---|---|---|---|
| Command | Local, per-developer policies | Runs as the local user; care required with untrusted arguments | Default for solo or single-machine use |
| HTTP | Team-wide central policy | Use TLS and auth header; isolate the receiver | When a single policy must be enforced across many developers |
| MCP | Integration with existing MCP servers | Inherits the MCP server security model | When MCP infrastructure is already in operation and consistency is required |
Readers new to MCP may find the Model Context Protocol primer a useful companion. Hooks and MCP servers represent two of the principal extensibility surfaces in modern agent runtimes, and they frequently operate in concert.
Exit Code Semantics for Command Handlers
The Command handler’s contract is small but precise. According to the official hooks documentation (as of 2026-05-25):
- Exit 0: success. Stdout is captured but treated as informational, and Claude proceeds normally.
- Exit 2: blocking error. Stderr is returned to Claude as an error message. For PreToolUse this blocks the tool call entirely; for Stop it forces continuation. This is the appropriate code for deterministic prevention.
- Other non-zero: warning. The event is logged but not blocked, which is useful for soft policy (“not recommended, but permitted”).
A script that uses set -e and then crashes will exit non-zero but probably not with code 2, so the tool will run anyway and only a warning will be logged. Blocking paths should be tested explicitly.
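The exit-code contract above can be sketched as a Python command handler. This is a minimal illustration, not an official template: the tool_input.command field mirrors the Bash examples in this post, while the decide function and the BLOCKED_SUBSTRINGS list are hypothetical names chosen for the example. An unreadable payload is mapped to exit code 2 so that the hook fails closed instead of letting the tool run.

```python
"""Sketch of a command handler that honours the exit-code contract."""
import json
import sys

# Hypothetical policy list, for illustration only.
BLOCKED_SUBSTRINGS = ("rm -rf /", "mkfs.")

EXIT_OK = 0      # proceed normally
EXIT_BLOCK = 2   # block the action; stderr is returned to the model


def decide(raw_payload: str) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one event payload."""
    try:
        event = json.loads(raw_payload)
        command = str(event.get("tool_input", {}).get("command", ""))
    except (json.JSONDecodeError, AttributeError) as exc:
        # Fail closed: a payload that cannot be parsed is treated as a block.
        return EXIT_BLOCK, f"unreadable payload: {exc}"
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            return EXIT_BLOCK, f"blocked: command contains {needle!r}"
    return EXIT_OK, ""


def main() -> int:
    """Entry point for real use: read stdin, write stderr, return the code."""
    code, message = decide(sys.stdin.read())
    if message:
        print(message, file=sys.stderr)
    return code


# Demonstration on in-memory payloads; a deployed hook would instead
# end with sys.exit(main()).
print(decide(json.dumps({"tool_input": {"command": "ls -la"}}))[0])
print(decide(json.dumps({"tool_input": {"command": "rm -rf / --no-preserve-root"}}))[0])
print(decide("not json")[0])
```

Keeping the decision in a pure function makes the blocking path testable without spawning the agent, which is the explicit testing the contract calls for.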
A Working Claude Code Hooks Example
Concrete code follows. The .claude/settings.json file below configures three hooks: a PreToolUse safety check, a PostToolUse auto-formatter, and a SessionStart context injector.
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "handlers": [
          {
            "type": "command",
            "command": ".claude/hooks/safety-check.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "handlers": [
          {
            "type": "command",
            "command": ".claude/hooks/auto-format.sh"
          },
          {
            "type": "http",
            "url": "https://hooks.internal.example.com/claude-edit",
            "headers": {
              "Authorization": "Bearer ${CLAUDE_HOOK_TOKEN}"
            }
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "handlers": [
          {
            "type": "command",
            "command": ".claude/hooks/inject-context.sh"
          }
        ]
      }
    ]
  }
}
```
Note the two-handler array on PostToolUse: hooks compose. Both execute, and their outputs are aggregated. The matcher is a regular expression matched against the tool name; Edit|Write means the hook fires for either tool.
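How matchers select handlers can be modelled in a few lines of Python. The sketch below reuses the shape of the settings file shown above, but the resolution rules are assumptions made for illustration: it treats a missing matcher as matching everything and requires a present matcher to match the whole tool name. The handlers_for function and the log-bash.sh entry are hypothetical, and the runtime's actual behaviour should be checked against the official reference.

```python
import re

# Reduced copy of the settings structure, limited to the fields used here.
SETTINGS = {
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "Edit|Write",
                "handlers": [
                    {"type": "command", "command": ".claude/hooks/auto-format.sh"},
                    {"type": "http", "url": "https://hooks.internal.example.com/claude-edit"},
                ],
            },
            {
                "matcher": "Bash",
                # Hypothetical extra entry, present only to show a non-match.
                "handlers": [{"type": "command", "command": "log-bash.sh"}],
            },
        ]
    }
}


def handlers_for(settings: dict, event: str, tool_name: str) -> list[dict]:
    """Collect, in configuration order, every handler whose matcher accepts tool_name."""
    selected: list[dict] = []
    for entry in settings.get("hooks", {}).get(event, []):
        matcher = entry.get("matcher")
        # ASSUMPTION: no matcher means "always fire"; otherwise full match.
        if matcher is None or re.fullmatch(matcher, tool_name):
            selected.extend(entry.get("handlers", []))
    return selected


print([h["type"] for h in handlers_for(SETTINGS, "PostToolUse", "Edit")])
# ['command', 'http']
print([h["type"] for h in handlers_for(SETTINGS, "PostToolUse", "Read")])
# []
```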
PreToolUse Safety Hook in Bash
The shell script below blocks dangerous rm patterns and writes an audit log of every Bash invocation. It reads the event JSON from stdin (using jq for parsing) and exits with code 2 and an explanatory stderr message when a risky pattern is observed.
```bash
#!/usr/bin/env bash
# .claude/hooks/safety-check.sh
# Blocks dangerous rm patterns; audits all Bash invocations.
set -uo pipefail

# Fail closed if the JSON parser is unavailable.
command -v jq >/dev/null 2>&1 || { echo "Blocked: jq is required by safety-check.sh" >&2; exit 2; }

PAYLOAD=$(cat)
CMD=$(printf '%s' "$PAYLOAD" | jq -r '.tool_input.command // empty')

# Audit log first: we want every attempt recorded.
mkdir -p .claude/audit
printf '%s %s\n' "$(date -u +%FT%TZ)" "$CMD" >> .claude/audit/bash.log

# Block obvious destructive patterns.
DANGEROUS_PATTERNS=(
  'rm[[:space:]]+-rf?[[:space:]]+/($|[[:space:]])'
  'rm[[:space:]]+-rf?[[:space:]]+/\*'
  'rm[[:space:]]+-rf?[[:space:]]+~'
  ':\(\)\{[[:space:]]*:\|:&[[:space:]]*\};:'  # fork bomb
  'mkfs\.'
  'dd[[:space:]]+if=/dev/(zero|random|urandom)[[:space:]]+of=/dev/sd'
)

for pat in "${DANGEROUS_PATTERNS[@]}"; do
  if [[ "$CMD" =~ $pat ]]; then
    echo "Blocked: command matches dangerous pattern '$pat'" >&2
    echo "If you really need to run this, do it manually outside Claude." >&2
    exit 2
  fi
done

# Also block writes to anything under /etc or /usr without sudo prompting.
# Keeping the regex in a variable avoids shell-parsing surprises inside [[ ]].
SYSTEM_WRITE_RE='(^|[[:space:]])(rm|mv|cp|tee|>)[[:space:]].*(/etc/|/usr/)'
if [[ "$CMD" =~ $SYSTEM_WRITE_RE ]]; then
  echo "Blocked: write to system path detected." >&2
  exit 2
fi

exit 0
```
The pattern list is intentionally short, because long pattern lists provide a false sense of security. The real defence is the audit log: even when a command is not blocked, a timestamped record of Claude’s attempted actions remains available. The log is an ordinary file, so it should be shipped elsewhere if it must withstand tampering.
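The limits of pattern matching are easy to demonstrate. The sketch below ports the first two shell patterns to Python's re module (replacing [[:space:]] with \s) and runs them against a handful of commands; the is_blocked helper and the test commands are illustrative choices, not part of the hook itself. Two trivially reordered variants of the same destructive command slip through.

```python
import re

# Python ports of the first two shell patterns ([[:space:]] becomes \s).
PATTERNS = [
    re.compile(r"rm\s+-rf?\s+/($|\s)"),
    re.compile(r"rm\s+-rf?\s+/\*"),
]


def is_blocked(command: str) -> bool:
    """True if any pattern is found anywhere in the command (search semantics)."""
    return any(p.search(command) for p in PATTERNS)


cases = [
    "rm -rf /",           # caught
    "rm -rf /*",          # caught
    "rm -rf /tmp/build",  # allowed on purpose: not the filesystem root
    "rm -fr /",           # same effect as the first case, flags reordered
    "rm -r -f /",         # same effect again, flags split
]
for command in cases:
    print(f"{command!r:22} blocked={is_blocked(command)}")
```

The last two cases are not blocked, which is the reason the audit log matters more than the pattern list.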
PostToolUse Auto-Formatter
```bash
#!/usr/bin/env bash
# .claude/hooks/auto-format.sh
# Runs the matching formatter (ruff, Prettier, or gofmt) on any file Claude just edited.
set -euo pipefail

PAYLOAD=$(cat)
FILE=$(printf '%s' "$PAYLOAD" | jq -r '.tool_input.file_path // .tool_input.path // empty')

if [[ -z "$FILE" ]] || [[ ! -f "$FILE" ]]; then
  exit 0
fi

case "$FILE" in
  *.py)        ruff format "$FILE" 2>/dev/null || true ;;
  *.ts|*.tsx)  npx prettier --write "$FILE" 2>/dev/null || true ;;
  *.js|*.jsx)  npx prettier --write "$FILE" 2>/dev/null || true ;;
  *.json)      npx prettier --write "$FILE" 2>/dev/null || true ;;
  *.go)        gofmt -w "$FILE" 2>/dev/null || true ;;
esac

# PostToolUse is not blocking: exit 0 even on format failure.
exit 0
```
Note the || true: a missing formatter should not cause the hook to fail. Exiting a PostToolUse hook with code 2 cannot block anything (the tool has already run), and exit code 1 still produces noise in the agent’s view.
HTTP PostToolUse Hook (FastAPI Receiver)
For team-wide policy or central observability, an HTTP hook is preferable to a per-machine command. A minimal FastAPI receiver is shown below:

```python
"""Webhook receiver for Claude Code PostToolUse events.

Run: uvicorn receiver:app --host 0.0.0.0 --port 8080
"""
import hashlib
import hmac
import json
import logging
import os
from datetime import datetime, timezone

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
log = logging.getLogger("claude_hook")
logging.basicConfig(level=logging.INFO)

SECRET = os.environ["CLAUDE_HOOK_SECRET"].encode("utf-8")


def verify_signature(body: bytes, signature: str) -> bool:
    """HMAC-SHA256 signature check; prevents spoofed events."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature or "")


@app.post("/claude-edit")
async def claude_edit(
    request: Request,
    authorization: str | None = Header(default=None),
    x_signature: str | None = Header(default=None),
):
    body = await request.body()
    if not verify_signature(body, x_signature or ""):
        raise HTTPException(status_code=401, detail="bad signature")

    event = json.loads(body)
    log.info(
        "edit by %s on %s at %s",
        event.get("session_id", "?"),
        event.get("tool_input", {}).get("file_path", "?"),
        datetime.now(timezone.utc).isoformat(),
    )
    # Return JSON the agent can use. An empty body is fine for fire-and-forget.
    return {"status": "logged"}
```
The signature check is important. Without it, any party able to reach the endpoint can fabricate “Claude edited /etc/passwd” events. The shared secret resides in CLAUDE_HOOK_SECRET on both the Claude Code client and the receiver.
SessionStart Context Injector
```bash
#!/usr/bin/env bash
# .claude/hooks/inject-context.sh
# Adds current git status, branch, and any TODO.md to Claude's session context.
set -euo pipefail

cat <<EOF
Session starting at $(date -u +%FT%TZ).
Current branch: $(git branch --show-current 2>/dev/null || echo 'not a git repo')
Modified files:
$(git status --short 2>/dev/null || echo 'none')
TODOs in repo:
$(test -f TODO.md && head -20 TODO.md || echo 'no TODO.md')
EOF

exit 0
```
Whatever the hook prints on stdout becomes part of the session’s context: the model receives it before the first user prompt. This is the most underused hook event, since it provides Claude with project-specific situational awareness without enlarging CLAUDE.md.
For further information on customising Claude Code’s behaviour beyond hooks, refer to the custom commands guide and the skills primer. Hooks fire automatically, whereas commands and skills are user-invoked. Together, these three mechanisms cover most extension scenarios.
Model Introspection Hooks: the PyTorch Variant
The context now changes. Setting agents aside, consider a Python process holding a PyTorch nn.Module in which the behaviour of tensors flowing through the module must be observed. Typical use cases include capturing activations for a probing experiment, logging gradient magnitudes to debug a training run, and clipping gradients per layer for an ablation study.
PyTorch’s nn.Module class exposes a small set of hook registration methods that address these requirements without modifying the module’s forward code. The three most commonly used methods are described below:
| API | Signature | Fires when | Typical use case |
|---|---|---|---|
| register_forward_pre_hook | hook(module, input) | Before module.forward() runs | Modify or inspect inputs |
| register_forward_hook | hook(module, input, output) | After module.forward() returns | Capture activations, inspect outputs |
| register_full_backward_hook | hook(module, grad_input, grad_output) | After gradients computed for module | Log/clip gradients, debug training |
All three methods return a RemovableHandle. This handle should be retained, and handle.remove() should be called when the hook is no longer required. Failure to remove the handle leaves the hook firing on every forward pass indefinitely, until the module is garbage-collected. In a long-running training job, this constitutes a memory and performance leak.
The backward hook operates similarly but in the reverse direction. After loss.backward() propagates gradients back through the graph, the backward hook fires for each module that has one registered, receiving the gradients with respect to that module’s inputs (grad_input) and outputs (grad_output).
The distinction between register_backward_hook (deprecated) and register_full_backward_hook (current) is a small but consequential point that wastes considerable time when overlooked. The deprecated version exhibited ordering issues with in-place operations and produced incorrect gradients for modules with non-trivial structure. The full_ variant should always be preferred.
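The firing order of the three registration methods in the table can be observed directly on a single layer. The following is a minimal sketch: the events list and the three callback names are illustrative, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

events: list[str] = []
layer = nn.Linear(4, 2)


def pre_hook(module, inputs):
    events.append("forward_pre")


def fwd_hook(module, inputs, output):
    events.append("forward")


def bwd_hook(module, grad_input, grad_output):
    events.append("backward")


handles = [
    layer.register_forward_pre_hook(pre_hook),
    layer.register_forward_hook(fwd_hook),
    layer.register_full_backward_hook(bwd_hook),
]
try:
    x = torch.randn(3, 4, requires_grad=True)
    layer(x).sum().backward()
finally:
    # Each registration returned a RemovableHandle; release all of them.
    for handle in handles:
        handle.remove()

print(events)  # ['forward_pre', 'forward', 'backward']
```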
For readers approaching this material from outside deep learning, brief definitions are provided. The forward pass is the computation that transforms inputs into outputs—for example, running an image through ResNet to obtain class scores. The backward pass is the reverse computation that determines the contribution of each parameter to the loss, using the chain rule of calculus. Autograd is PyTorch’s gradient-tracking machinery, which records every operation performed on a tensor so that those operations can be replayed in reverse when loss.backward() is called. A gradient is the vector of partial derivatives of the loss with respect to each parameter; it is the signal that informs the optimiser of the direction in which to adjust each weight. Hooks permit observation and modification of any of these quantities at module boundaries without altering the module’s source code.
A Working PyTorch Hooks Example
Three concrete tasks are demonstrated below: capturing activations from a ResNet block, logging gradient norms per layer to detect training instability, and clipping gradients in place to study the effect on a small training run.
Activation Extraction for Probing or Visualisation
Consider a scenario in which a pretrained ResNet-50 is available and the feature map following layer4 for an input image is required—perhaps to feed into a linear probe, perhaps to visualise the network’s response. Modifying the ResNet source code is undesirable, and a forward hook is the appropriate tool.
```python
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

# Pretrained ResNet-50, eval mode.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

# Where we will stash the activation.
captures: dict[str, torch.Tensor] = {}


def grab_layer4(module: torch.nn.Module,
                inp: tuple[torch.Tensor, ...],
                out: torch.Tensor) -> None:
    """Forward hook: copy the output, detach, store."""
    captures["layer4"] = out.detach().clone()


# Register on the layer4 stack (a Sequential of three Bottleneck blocks).
handle = model.layer4.register_forward_hook(grab_layer4)
try:
    # Standard ImageNet preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    img = Image.open("dog.jpg").convert("RGB")
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        _ = model(x)  # we discard logits; we want the captured activation
    act = captures["layer4"]
    print(f"layer4 activation shape: {tuple(act.shape)}")
    # → layer4 activation shape: (1, 2048, 7, 7)
    # Now use `act` for whatever downstream analysis you want.
finally:
    # ALWAYS remove the hook when done.
    handle.remove()
```
The try/finally pattern is important. If downstream code raises an exception, a dangling hook will quietly increase memory pressure on the next inference. Registrations should be wrapped in a context manager if this pattern is used frequently.
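One way to package that try/finally discipline is a small context manager. The sketch below is one possible shape; the forward_hook helper is a hypothetical name, not a PyTorch API, and it wraps only register_forward_hook.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def forward_hook(module: nn.Module, fn):
    """Register fn as a forward hook and remove it on exit, even on error."""
    handle = module.register_forward_hook(fn)
    try:
        yield handle
    finally:
        handle.remove()


layer = nn.Linear(4, 2)
seen: list[tuple[int, ...]] = []

with forward_hook(layer, lambda m, inp, out: seen.append(tuple(out.shape))):
    layer(torch.randn(3, 4))   # hook fires here

layer(torch.randn(5, 4))       # hook already removed; nothing is recorded
print(seen)  # [(3, 2)]
```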
Logging Gradient Norms with a Backward Hook
Gradient explosions are easier to diagnose when norms can be observed per layer. A few lines of backward hook code reduce this to a single-line printout per step:
```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)


model = SmallNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Track grad norms by layer name.
grad_norms: dict[str, float] = {}
handles = []


def make_hook(name: str):
    def hook(module, grad_input, grad_output):
        # grad_output is a tuple of grads w.r.t. each output tensor.
        # We log the L2 norm of the first one as a simple health metric.
        if grad_output[0] is not None:
            grad_norms[name] = grad_output[0].norm().item()
    return hook


for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        handles.append(module.register_full_backward_hook(make_hook(name)))

# Fake training step.
x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))
for step in range(3):
    optimizer.zero_grad()
    logits = model(x)
    loss = torch.nn.functional.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    print(f"step {step}: " + ", ".join(f"{k}={v:.4f}" for k, v in grad_norms.items()))

# Cleanup.
for h in handles:
    h.remove()
```
A typical output line takes the form step 0: fc3=0.1382, fc2=0.0573, fc1=0.0421 (values illustrative). The last layer appears first because backward hooks fire in reverse layer order and the dictionary preserves insertion order. If norms expand by orders of magnitude between steps, or fall to zero for a layer that should be learning, the source of the problem is readily identifiable. This pattern is also common in transformer training: instrumenting attention and MLP blocks separately follows the same approach, simply across more modules. For further discussion of training-stack instrumentation, see the LLM training guide.
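The third task, clipping gradients at a module boundary, can be sketched with the same API. A full backward hook should not modify its arguments, but it may return a replacement for grad_input, and that replacement is what flows to earlier layers. The example below is an illustrative sketch under those rules: MAX_NORM, clip_grad_input, and record_grad_output are hypothetical names, and the loss is scaled up on purpose so that clipping takes effect. Note that this clips the gradient passed upstream, not the parameter gradients of the hooked module itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
MAX_NORM = 1.0  # illustrative threshold


def clip_grad_input(module, grad_input, grad_output):
    """Return a rescaled copy of grad_input whose L2 norm is at most MAX_NORM."""
    clipped = []
    for g in grad_input:
        if g is None:
            clipped.append(None)
            continue
        scale = torch.clamp(MAX_NORM / (g.norm() + 1e-12), max=1.0)
        clipped.append(g * scale)
    return tuple(clipped)


observed: dict[str, float] = {}


def record_grad_output(module, grad_input, grad_output):
    """Record the norm of the gradient arriving at the first layer's output."""
    observed["first"] = grad_output[0].norm().item()


model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 1))
handles = [
    model[1].register_full_backward_hook(clip_grad_input),
    model[0].register_full_backward_hook(record_grad_output),
]
try:
    x = torch.randn(4, 8, requires_grad=True)
    loss = 1000.0 * model(x).sum()  # deliberately large gradient
    loss.backward()
finally:
    for handle in handles:
        handle.remove()

print(f"gradient norm reaching the first layer: {observed['first']:.4f}")
```

For clipping parameter gradients across a whole model, torch.nn.utils.clip_grad_norm_ is the conventional tool; the hook-based form is mainly useful for per-layer ablations.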
Event Hooks: the MLOps Webhook Variant
The third variant operates at an entirely different level. Webhooks are not located within an agent or a model; they connect services. When a training job finishes, that fact must reach a dashboard, a notification service, a downstream pipeline, and a model registry. Webhooks are the mechanism by which this distribution occurs without each service polling the others.
The pattern is consistent across MLflow, Weights & Biases, HuggingFace, AWS SageMaker, and most model registries: when a defined event occurs, the source service sends an HTTP POST to a configured URL, with a JSON body describing the event and an HMAC signature in a header. The receiver verifies the signature, processes the event, and returns a 2xx status code (or signals failure and waits for a retry).
Readers familiar with GitHub webhooks will recognise the design of MLOps webhooks: it is essentially the same. The header names vary by service (MLflow uses one name, Weights & Biases another, and so on), but the structure is invariant.
Common events that vendors expose as webhooks include run.started, run.finished, and run.failed from training trackers; model.version.created, model.version.staged, and model.version.promoted from model registries; dataset.uploaded or dataset.versioned from data platforms; drift.detected or alert.fired from monitoring systems; and increasingly evaluation.completed from automated evaluation services. Each event is accompanied by a stable JSON schema, fixed per major version, and a payload-signing scheme that almost invariably follows the GitHub pattern: a SHA-256 HMAC of the raw body, hex-encoded, in a single header.
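The GitHub-style signing scheme described above fits in a few lines of standard-library Python. The sketch below is generic and not tied to any vendor: the sha256= prefix, the SECRET placeholder, and the sign and verify helpers are assumptions for illustration, and real header names and formats should be taken from the sender's documentation.

```python
import hashlib
import hmac
import json

# Placeholder secret for the example; load the real one from the environment.
SECRET = b"example-shared-secret"
PREFIX = "sha256="


def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Sender side: hex-encoded HMAC-SHA256 of the raw body, with a scheme prefix."""
    return PREFIX + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, header: str, secret: bytes = SECRET) -> bool:
    """Receiver side: recompute the digest and compare in constant time."""
    if not header.startswith(PREFIX):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = header[len(PREFIX):]
    # Compare as bytes so that non-ASCII input cannot raise a TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


body = json.dumps({"event": "run.finished", "run": {"run_id": "abc123"}}).encode("utf-8")
header = sign(body)

print(verify(body, header))            # True
print(verify(body + b" ", header))     # False: body was altered
print(verify(body, header[7:]))        # False: prefix missing
```

The signature must be computed over the raw request bytes. Re-serialising parsed JSON before verifying can change whitespace or key order and break the check.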
One small but consequential decision concerns the location of the receiver. A long-running FastAPI application on a virtual machine places operational responsibility on the team when it fails outside business hours. A serverless function (Lambda, Cloud Run, Vercel) delegates availability to the platform and is charged per call, which is generally cheaper for low-volume webhook traffic. Most MLOps teams adopt serverless solutions for fan-out webhooks and reserve dedicated services for high-throughput hot paths such as real-time inference logging. The pattern is identical in either case; what differs is the operational profile.
A webhook receiver for an MLflow “run completed” event, with strict HMAC checking, is shown below:

```python
"""Receiver for MLflow run-completed webhooks.

POST body shape (illustrative; check your MLflow version):
{
  "event": "run.finished",
  "run": {
    "run_id": "abc123",
    "experiment_id": "42",
    "status": "FINISHED",
    "metrics": {"val_accuracy": 0.912, "val_loss": 0.231}
  },
  "timestamp": "2026-05-25T14:23:11Z"
}
"""
import hashlib
import hmac
import json
import os

import httpx
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()

MLFLOW_SECRET = os.environ["MLFLOW_WEBHOOK_SECRET"].encode("utf-8")
SLACK_WEBHOOK = os.environ.get("SLACK_WEBHOOK_URL")


def verify(body: bytes, signature_header: str) -> bool:
    """MLflow-style: 'sha256=<hex digest>'."""
    if not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(MLFLOW_SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header[len("sha256="):])


@app.post("/mlflow/run-finished")
async def run_finished(
    request: Request,
    x_mlflow_signature: str = Header(default=""),
):
    body = await request.body()
    if not verify(body, x_mlflow_signature):
        # Constant-time compare above; reject fast here.
        raise HTTPException(status_code=401, detail="bad signature")

    event = json.loads(body)
    run = event["run"]
    metrics = run.get("metrics", {})
    val_acc = metrics.get("val_accuracy")

    # Fan out: alert humans on Slack only above a threshold so we don't spam.
    if SLACK_WEBHOOK and val_acc is not None and val_acc > 0.90:
        msg = f"Run {run['run_id']} finished with val_accuracy={val_acc:.3f}"
        async with httpx.AsyncClient(timeout=5.0) as client:
            await client.post(SLACK_WEBHOOK, json={"text": msg})

    # Acknowledge quickly and keep the response tiny; the sender may impose a timeout.
    return {"ok": True}
```
Three points warrant attention. The signature check uses hmac.compare_digest rather than ==; the latter leaks timing information that allows an attacker to recover the signature byte by byte. The Slack call uses a short timeout, because a slow Slack response should not hold the MLflow connection open and trigger MLflow’s own timeout-and-retry behaviour. Finally, the receiver returns quickly: for any work heavier than a Slack notification, the operation should be pushed onto a queue and acknowledged immediately.
Webhook-adjacent patterns also appear in workflow orchestrators. Airflow’s on_success_callback and on_failure_callback are conceptually identical: they are in-process Python callbacks rather than HTTP POSTs, but they serve the same purpose. The Airflow orchestration guide describes how those callbacks compose with cross-system webhooks.
Selecting the Appropriate Hook: A Decision Framework
The three variants should by this point be clearly distinct. The remaining question is operational: given a problem, which variant should be selected? The matrix below provides guidance:
A simple rule covers most cases: select the hook variant that operates at the same level as the entity to be observed or modified. Tensor flow occurs inside the model process and calls for PyTorch hooks. Agent decisions occur inside the agent runtime and call for lifecycle hooks. The training-job lifecycle spans services and calls for webhooks. Mixing levels works occasionally but usually creates more integration work than it eliminates.
A side-by-side reference for rapid selection follows:
| Property | Lifecycle (Claude Code) | PyTorch introspection | Webhook (MLOps) |
|---|---|---|---|
| Fires when | Agent session events | Per forward/backward call | Infra/business events |
| Runs where | Shell, HTTP, or MCP | Same Python process, sync | Remote HTTP service, async |
| Blocks execution? | Yes (exit 2) | No, but can modify tensors | Indirectly (timeout/retry) |
| Language | Any (shell, Python, Go) | Python only | Any (it’s just HTTP) |
| Typical user | Agent builders, safety teams | Researchers, model debuggers | MLOps, platform teams |
| Auth model | Filesystem perms (Command) or bearer (HTTP) | In-process trust | HMAC signature |
Common Pitfalls
Each variant has its own failure modes. Awareness of these failure modes in advance saves considerable debugging time.
Claude Code Hook Pitfalls
Shell injection through tool arguments. A PreToolUse hook receives JSON containing whatever Claude intends to execute. Interpolating those fields into a command string that a shell then re-parses (for example via eval or sh -c) opens a path to arbitrary command execution through prompt injection. JSON should always be parsed with jq or another proper parser, never with string slicing, and extracted values should be passed on as quoted arguments rather than spliced into code.
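The safe pattern can be illustrated in Python: parse the payload with a JSON parser, then hand the extracted value to any child process as a single argument-list element so that no shell ever interprets it. In this sketch the child process is a stand-in for an external checker, and the payload contents are made up for the example.

```python
import json
import subprocess
import sys

# A payload whose command field contains shell metacharacters.
payload = json.dumps({"tool_input": {"command": "ls; echo INJECTED"}})
command = json.loads(payload)["tool_input"]["command"]

# Stand-in for an external checker: it simply echoes its first argument.
child_code = "import sys; print(sys.argv[1])"

# The value travels as one argv element. No shell is involved, so the
# semicolon is plain data and the embedded echo never runs as a command.
result = subprocess.run(
    [sys.executable, "-c", child_code, command],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The contrast case, building a string and passing it with shell=True, is exactly what the pitfall above warns against.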
Infinite hook loops. A PostToolUse hook that itself uses Claude to summarise the output, where Claude then invokes tools to summarise, and those tools trigger the PostToolUse hook again, produces a stack that is typically discovered at an inconvenient hour. Hooks should be terminal: they observe but do not re-invoke the agent.
Exit-code confusion. Bash’s set -e exits non-zero on any failure but not necessarily with code 2. If a hook’s safety-check command crashes for an unrelated reason, the tool will run anyway because the exit code is not the blocking value. When blocking matters, the script should exit with code 2 explicitly.
The hook is not versioned with the agent. Hook semantics evolve. A handler that worked under one Claude Code version may break under another (renamed fields in the event JSON, new required fields, and so on). Hook scripts should be pinned to the agent version against which they were tested, and re-tested after upgrades.
PyTorch Hook Pitfalls
Failing to call handle.remove(). This is the most common bug. A leaked forward hook is difficult to detect: the model continues to function, but more slowly, and memory usage drifts upwards. handle.remove() should be treated like close() and written on the same line as the registration where possible, or wrapped in a context manager.
Storing tensors with the graph attached. Storing output rather than output.detach() keeps alive the entire computation graph leading to that output, including the intermediate tensors saved for the backward pass. On a fifty-layer model, repeated over many steps, this can exhaust memory quickly. Tensors should always be detached, and usually cloned, before storage.
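The two pitfalls above, leaked handles and stored graphs, can be addressed together with a small context manager. This is a minimal sketch that assumes the hooked module returns a single tensor; capture_activations is a hypothetical helper, not a PyTorch API.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def capture_activations(module: nn.Module):
    """Collect detached copies of a module's outputs while the block is active."""
    captured = []

    def hook(mod, inputs, output):
        # detach() drops the autograd graph; clone() guards against later
        # in-place edits of the original tensor.
        captured.append(output.detach().clone())
        # Returning None leaves the module's output unchanged.

    handle = module.register_forward_hook(hook)
    try:
        yield captured
    finally:
        handle.remove()  # runs even if the forward pass raises


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
with capture_activations(model[1]) as acts:
    model(torch.randn(3, 4))
```

After the with block exits, the hook is gone and acts holds one graph-free tensor per forward call made inside the block.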
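Hooks added in __init__ versus registered post hoc. Hooks are stored on the module object, not in its state_dict. A model rebuilt from a checkpoint's state_dict therefore carries only the hooks that the module's own __init__ installs during construction; hooks registered from outside must be registered again. copy.deepcopy behaves differently: it copies the hook dictionaries along with the module, so the copy keeps firing the original hook functions, while the RemovableHandle returned at registration remains tied to the original module and cannot remove them from the copy. If the training launcher uses copy.deepcopy or torch.nn.parallel.replicate, the copies should be inspected for hooks rather than assumed to be clean, and hooks that must exist on every instance are best registered inside the module.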
Overhead in tight loops. Every hook adds Python-level overhead per call. This is acceptable for offline analysis but problematic in a training loop with tens of thousands of iterations per epoch. Hooks should be registered only on the modules of interest, only for the steps of interest, and removed immediately afterwards.
For training-loop instrumentation that extends beyond gradient logging, the self-supervised learning guide presents similar patterns applied to representation extraction during pretraining.
Webhook Pitfalls
Timeout amplification. The sender (MLflow, Weights & Biases, or the model registry in question) typically imposes a short timeout, often five or ten seconds. If the receiver performs any slow operation inline—a database write, a slow Slack call, or ML inference—events will be missed and retries triggered. The recommended pattern is to receive quickly, queue the work, and return a 2xx status code.
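The receive-quickly, queue-the-work pattern can be sketched with the standard library alone. This is an illustration under stated assumptions: receive stands in for whatever web framework handler is in use, slow_work is a placeholder, and an in-process queue loses events if the process dies, so a durable queue would normally replace it in production.

```python
import queue
import sys
import threading

work_queue: "queue.Queue[dict]" = queue.Queue()
processed: list[dict] = []


def receive(event: dict) -> int:
    """Request-path handler: enqueue and acknowledge immediately."""
    work_queue.put(event)
    return 202  # Accepted; the slow work happens off the request path


def slow_work(event: dict) -> None:
    """Placeholder for the database write, Slack call, or inference."""
    processed.append(event)


def worker() -> None:
    """Drain the queue in the background, one event at a time."""
    while True:
        event = work_queue.get()
        try:
            slow_work(event)
        except Exception as exc:
            # Log and continue so one bad event does not kill the worker.
            print(f"event failed: {exc!r}", file=sys.stderr)
        finally:
            work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The sender sees a 2xx response as soon as the event is enqueued, regardless of how long slow_work takes.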
Missing signature verification. An unverified webhook endpoint is a public remote-code-execution risk if the handler performs any privileged operation with the payload. HMAC should be verified on every request, compared with hmac.compare_digest, and the source IP should not be relied upon.
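A minimal sketch of the verification step, using only the standard library. The signing scheme shown (hex-encoded HMAC-SHA256 over the raw request body) is an assumption; each sender documents its own scheme, header name, and any prefix such as "sha256=", and those details should be taken from the sender's documentation.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Return True only if the received signature matches the body."""
    expected = sign(secret, body)
    # compare_digest runs in constant time with respect to the contents,
    # which avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The check must run against the raw bytes of the request body, before any JSON parsing or re-serialisation, because re-encoding can change the bytes and invalidate the signature.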
At-least-once semantics. Almost every webhook sender retries on failure, so the receiver will observe the same event more than once. The handler must be idempotent: the same event delivered twice should not double-count, double-notify, or double-promote.
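Idempotency is usually achieved by deduplicating on an event identifier. The sketch below keeps seen identifiers in memory purely for illustration; the event_id and model field names are hypothetical, and a real receiver would typically use a durable store with a uniqueness constraint.

```python
class IdempotentHandler:
    """Processes each event at most once, keyed by its identifier."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.promotions: list[str] = []

    def handle(self, event: dict) -> bool:
        """Process an event once; return False for a duplicate delivery."""
        event_id = event["event_id"]  # hypothetical field name
        if event_id in self._seen:
            return False
        # Side effect first, then mark as seen: if the side effect raises,
        # the sender's retry gets another chance to process the event.
        self.promotions.append(event["model"])
        self._seen.add(event_id)
        return True
```

A duplicate delivery should still be answered with a 2xx status, since the event has already been handled and a failure response would only trigger further retries.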
Replay attacks. Even with HMAC, a captured request can be replayed. A timestamp should be included in the signature payload (most senders do this already), and events older than a small window should be rejected.
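The timestamp check can be sketched as follows. The message layout (timestamp, a dot, then the body) and the five-minute window are assumptions made for illustration; the actual layout and tolerance come from the sender's documentation.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # hypothetical tolerance window


def sign_with_timestamp(secret: bytes, timestamp: int, body: bytes) -> str:
    """Sign the timestamp together with the body (assumed layout)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_fresh(secret: bytes, timestamp: int, body: bytes,
                 signature: str, now: Optional[float] = None) -> bool:
    """Reject stale requests first, then check the signature."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_AGE_SECONDS:
        return False  # too old (or too far in the future) to accept
    expected = sign_with_timestamp(secret, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Because the timestamp is part of the signed message, an attacker cannot refresh it on a captured request without invalidating the signature.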
A hook can also stop firing without raising any error: a removed handle or a webhook endpoint that senders no longer reach both produce this outcome. Observability should be added through audit logs and gauge metrics on hook invocation counts so that a non-firing hook is detected.
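The invocation-count observability mentioned above could be sketched as a small in-process monitor. HookMonitor and its method names are hypothetical; in practice the counts and timestamps would be exported to whatever metrics system is already in place.

```python
import time
from typing import Optional


class HookMonitor:
    """Tracks invocation counts and last-fired times per hook name."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.last_fired: dict[str, float] = {}

    def record(self, name: str, now: Optional[float] = None) -> None:
        """Call from inside the hook each time it fires."""
        now = time.time() if now is None else now
        self.counts[name] = self.counts.get(name, 0) + 1
        self.last_fired[name] = now

    def stale(self, expected: list[str], max_silence: float,
              now: Optional[float] = None) -> list[str]:
        """Return expected hooks that have not fired within max_silence seconds."""
        now = time.time() if now is None else now
        return [
            name for name in expected
            # A hook that never fired is treated as infinitely stale.
            if now - self.last_fired.get(name, float("-inf")) > max_silence
        ]
```

An alert on a non-empty stale list catches the case that error logs cannot: a hook that has simply stopped being called.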
Frequently Asked Questions
Are Claude Code hooks the same as MCP servers?
No. MCP servers extend what an agent can do by exposing new tools, resources, and prompts that the agent can call. Hooks extend the agent’s lifecycle by inserting policy at predefined moments. Both can be used simultaneously; a common pattern is an MCP server that provides project context paired with a PreToolUse hook that enforces safety on the agent’s tool calls. The two systems are complementary rather than redundant.
Does register_forward_hook affect gradients?
It can. If the hook returns a tensor in place of None, that tensor replaces the module’s output for the remainder of the forward pass, and gradients flow through the replacement during backpropagation. If the hook only reads tensors and returns None, gradients are unaffected. The same applies to backward hooks: returning a modified grad_input tuple replaces what propagates further back. For read-only inspection, the hook should return nothing.
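The difference between a read-only hook and one that returns a replacement can be seen on a one-weight layer. This is a small sketch for illustration; grad_of_weight, read_only, and doubling are hypothetical helper names.

```python
import torch
from torch import nn


def grad_of_weight(layer: nn.Linear, hook=None) -> float:
    """Run one forward/backward pass on input 1.0 and return d(out)/d(weight)."""
    layer.zero_grad()
    handle = layer.register_forward_hook(hook) if hook is not None else None
    try:
        layer(torch.ones(1, 1)).sum().backward()
    finally:
        if handle is not None:
            handle.remove()
    return layer.weight.grad.item()


def read_only(mod, inputs, output):
    return None  # inspect only; the module's output is left untouched


def doubling(mod, inputs, output):
    return output * 2  # the returned tensor replaces the module's output


layer = nn.Linear(1, 1, bias=False)
baseline = grad_of_weight(layer)             # no hook
observed = grad_of_weight(layer, read_only)  # same gradient as baseline
modified = grad_of_weight(layer, doubling)   # gradient is scaled by 2
```

With an input of 1.0, the gradient of the output with respect to the weight is 1.0 without a hook and with the read-only hook, and 2.0 once the hook doubles the output.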
Can webhooks block a training job?
Indirectly. If a model-registry promotion event has a configured webhook receiver that times out, some registries pause the promotion pending retries while others fail the promotion entirely. In either case, the system being hooked into determines whether a slow receiver can stall the workflow. The documentation for the specific service should be consulted. As a general rule, webhooks should be treated as fire-and-forget signals rather than synchronous gates.
What is the relationship between hooks and callbacks?
The terms are largely synonymous, with a difference of connotation. “Callback” implies a function registered by the user, often for a single defined moment. “Hook” implies a registered extension point exposed by the host system, often one of many. PyTorch documentation uses “hook”; asyncio documentation uses “callback”; the underlying concept is the same. In MLOps, Airflow uses “callback” (on_success_callback) while GitHub uses “webhook”—the same pattern, expressed in different vocabulary.
Are there security risks specific to lifecycle hooks?
Yes, three principal risks. First, hooks run with the agent’s privileges, which usually corresponds to the user’s shell, so a bug in a hook script can cause real damage on the machine. Second, hook payloads contain whatever the model intends to do, including potentially adversarial content arising from prompt injection; naive shell interpolation is dangerous. Third, hooks are invisible: a colleague inspecting an agent session will not see the hook fire unless it is logged. Audit logging and code review for hook scripts are as important as for production code. The harness engineering guide covers the broader threat model.
Related reading
- Harness engineering for Claude Code and AI agents — the broader runtime architecture that hooks plug into
- Model Context Protocol explained — MCP, the sibling extension primitive that hooks often work alongside
- Tool calling and function calling explained — the mechanism that PreToolUse and PostToolUse hooks intercept
- Claude Code custom commands — the user-invocable counterpart to automatically-fired hooks
- Claude Code skills guide — another extension surface in the same family
- How to train an open-source LLM — training-loop context where PyTorch hooks earn their keep
- Airflow data pipeline orchestration — webhook-adjacent callback patterns in workflow orchestrators
References
- Claude Code Hooks — official documentation (Anthropic, accessed 2026-05-25)
- torch.nn.Module.register_forward_hook — PyTorch documentation
- Forward and backward hooks in PyTorch — Nandita Bhaskhar, Stanford
- claude-code-hooks-mastery — community examples repository
Conclusion
“Hook” in AI is a small term performing three distinct functions. Lifecycle hooks allow deterministic policy to be inserted into an agent’s session without forking the agent. Model introspection hooks allow tensor flow to be read or modified without forking the model. Event hooks allow services to communicate about significant moments without polling. The mechanisms share a name and a skeletal definition—a callback at a defined point—but they differ in process, language, blocking semantics, security model, and audience.
The practical guidance reduces to three rules. First, select the variant that matches the layer at which the problem resides; agent safety should not be enforced with a PyTorch hook, nor should activations be extracted via a webhook. Second, treat hook code as production code: review it, audit it, log it, and version it alongside the system it extends. Third, recall that hooks are powerful precisely because they are invisible to the host; that invisibility is also their principal failure mode, so observability should be built in to detect when a hook ceases to fire.
One habit worth taking from this guide is the following: whenever the advice to “use a hook” appears in documentation or in a blog post, the appropriate first question is which variant. The answer almost always determines the correct design.