OpenTelemetry in Next.js: traces that survive the edge/server boundary without losing context
Why is observability in Next.js still a half-solved problem in 2025? We've had OpenTelemetry as the de facto backend standard for years — well documented in Spring Boot, plain Node, Go — and yet the App Router has a boundary that silently breaks trace context without raising a single visible error. I've wondered for a while if the community underestimates this because the problem doesn't throw exceptions: it just disappears.
My take is blunt: OpenTelemetry in Next.js works, but it requires explicit propagator configuration. The default silently breaks the trace at the edge/node boundary. If you're coming from Spring Boot where context propagates almost automatically, this is going to catch you off guard.
The real problem: what actually happens at the edge/node boundary
The Next.js App Router runs in two distinct environments that share very little:
- Edge Runtime: Middleware, some Route Handlers. A trimmed-down environment based on Web APIs, no full Node.js support. Runs in V8 isolates.
- Node.js Runtime: Server Components, Server Actions, API Routes. Regular Node, with filesystem access, process, all of it.
When a request enters through Middleware (edge) and then hits a Server Component (node), there's an environment transition. If the OpenTelemetry propagator isn't explicitly configured to read and write traceparent and tracestate headers on both sides, the trace gets cut right there. The Middleware span closes with no children. The Server Component starts a brand new trace with no parent. In Jaeger or any collector, you see two orphaned traces where there should be one single chain.
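To make "the trace gets cut" concrete, it helps to see what the traceparent header actually carries. This is a minimal sketch, not the SDK's parser: parseTraceparent and continueTrace are hypothetical helper names, and it only handles the four-field layout of the W3C format (version, trace ID, parent span ID, flags). Continuing a trace means keeping the trace ID and swapping in the current span's ID as the new parent.

```typescript
// Shape of a W3C traceparent header: version-traceId-parentId-flags, e.g.
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
interface TraceParent {
  version: string
  traceId: string  // 32 hex chars, shared by every span in the trace
  parentId: string // 16 hex chars, the span that made the call
  flags: string    // 2 hex chars, bit 0 means "sampled"
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Hypothetical helper: returns null for anything malformed.
// Simplification: only the four-field layout is accepted.
function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim())
  if (!match) return null
  const [, version, traceId, parentId, flags] = match
  // The spec forbids version "ff" and all-zero IDs.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null
  return { version, traceId, parentId, flags }
}

// Hypothetical helper: same trace ID, new parent ID (the current span).
function continueTrace(incoming: string, currentSpanId: string): string | null {
  const parsed = parseTraceparent(incoming)
  if (!parsed) return null
  return `00-${parsed.traceId}-${currentSpanId}-${parsed.flags}`
}
```

When the boundary drops the header, the Node side has nothing to parse, so it mints a fresh trace ID. That is the whole failure: no exception, just a different 32-character string.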
What makes this hard to diagnose: there's no error. No warning. The code runs perfectly. You only notice something's wrong when you look at the collector and the trace IDs don't match.
How to configure OpenTelemetry in Next.js App Router: the instrumentation hook
Next.js exposes a specific entry point for this, documented in the official guide: the instrumentation.ts file at the project root (or inside src/ if that's your structure). This hook runs exactly once when the server starts.
// instrumentation.ts: register() runs once when the server starts
export async function register() {
  // We only initialize in the Node.js runtime.
  // The edge runtime doesn't support the full Node SDK.
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    // Dynamic imports keep the Node-only packages out of the edge bundle.
    const { NodeSDK } = await import('@opentelemetry/sdk-node')
    const { OTLPTraceExporter } = await import('@opentelemetry/exporter-trace-otlp-http')
    const { CompositePropagator, W3CBaggagePropagator, W3CTraceContextPropagator } =
      await import('@opentelemetry/core')
    // Note: on @opentelemetry/resources 2.x, use resourceFromAttributes() instead of new Resource().
    const { Resource } = await import('@opentelemetry/resources')
    const { SEMRESATTRS_SERVICE_NAME } = await import('@opentelemetry/semantic-conventions')

    const sdk = new NodeSDK({
      resource: new Resource({
        [SEMRESATTRS_SERVICE_NAME]: 'my-nextjs-app',
      }),
      traceExporter: new OTLPTraceExporter({
        // Point this at your local collector, Railway, Fly, whatever.
        url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
      }),
      // CRITICAL: the W3C TraceContext propagator is what reads/writes
      // the traceparent and tracestate headers between edge and node.
      // Without this, each environment starts a new trace with no parent.
      textMapPropagator: new CompositePropagator({
        propagators: [
          new W3CTraceContextPropagator(),
          new W3CBaggagePropagator(),
        ],
      }),
    })
    sdk.start()
  }
}

The process.env.NEXT_RUNTIME === 'nodejs' conditional is not optional. If you try to initialize the full NodeSDK in the edge runtime, the build breaks because that environment doesn't have access to the Node APIs the SDK needs. The official docs mention this, but they bury it a bit.
The edge side: propagating context without the full SDK
The edge runtime can't run the NodeSDK. What it can do is read and write headers using the W3C TraceContext primitives. If you're using Middleware for auth or routing, this is the pattern for propagating context to the server:
// middleware.ts: edge runtime, header propagation only
import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

export function middleware(request: NextRequest) {
  // Headers set on the response go back to the browser. To reach the
  // Node.js runtime, trace headers have to travel on the forwarded request.
  const requestHeaders = new Headers(request.headers)

  // If there's an incoming traceparent (e.g., from an API gateway),
  // we forward it as-is to the server.
  // If there's none, the Node server will start a new trace, which is correct.
  const traceparent = request.headers.get('traceparent')
  if (traceparent) {
    requestHeaders.set('traceparent', traceparent)
  }
  const tracestate = request.headers.get('tracestate')
  if (tracestate) {
    requestHeaders.set('tracestate', tracestate)
  }
  return NextResponse.next({ request: { headers: requestHeaders } })
}

export const config = {
  // Apply only to routes that need propagation
  matcher: ['/api/:path*', '/((?!_next/static|_next/image|favicon.ico).*)'],
}

This doesn't generate spans on the edge (you'd need the full SDK for that, which isn't available), but it keeps the context chain alive so the Node.js Runtime can continue the trace from the same trace ID.
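One refinement worth considering is to forward only headers that are well-formed, so a garbage traceparent from a misbehaving client doesn't travel any further. This is a sketch under my own assumptions: pickTraceHeaders and HeaderGetter are hypothetical names, the check is a plain regex and not the SDK's validation, and tracestate is dropped whenever traceparent is unusable because it means nothing on its own.

```typescript
// Minimal header reader. In Middleware this would be
// (name) => request.headers.get(name).
type HeaderGetter = (name: string) => string | null

const VALID_TRACEPARENT = /^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/

// Hypothetical helper: returns only the trace headers worth forwarding.
function pickTraceHeaders(get: HeaderGetter): Record<string, string> {
  const forwarded: Record<string, string> = {}
  const traceparent = get('traceparent')?.trim()
  // No usable traceparent: forward nothing and let the server start a new trace.
  if (!traceparent || !VALID_TRACEPARENT.test(traceparent)) return forwarded
  forwarded['traceparent'] = traceparent
  // tracestate only makes sense alongside a valid traceparent.
  const tracestate = get('tracestate')?.trim()
  if (tracestate) forwarded['tracestate'] = tracestate
  return forwarded
}
```

The function takes a getter and returns a plain object on purpose: it has no dependency on Next.js types, so you can unit test it without spinning up the edge runtime.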
The gotchas nobody documents properly
1. instrumentation.ts needs experimental.instrumentationHook enabled in versions before Next.js 15.
In Next.js 15+ it's on by default. If you're on 14, you need this in next.config.mjs (next.config.ts is only supported from Next.js 15 onward):

// next.config.mjs
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    instrumentationHook: true, // required in Next.js 14 and earlier
  },
}

export default nextConfig

Without this, the instrumentation.ts file sits on disk and never runs. The server starts with no telemetry and no warning about it.
2. Server Actions don't propagate context automatically.
From OpenTelemetry's perspective, a Server Action is a new HTTP request. If you don't explicitly instrument the Action with a manual span, it'll show up as a separate trace in the collector.
// app/actions/create-resource.ts
'use server'

import { SpanStatusCode, trace } from '@opentelemetry/api'
// saveToDb stands in for your own persistence function; import it from wherever it lives.

const tracer = trace.getTracer('my-nextjs-app')

export async function createResource(formData: FormData) {
  // We create an explicit child span for the Server Action
  return await tracer.startActiveSpan('server-action.createResource', async (span) => {
    try {
      const name = formData.get('name') as string
      span.setAttribute('resource.name', name)
      // your logic here
      const result = await saveToDb(name)
      // Use the enum, not raw numbers: OK is 1, and 0 is UNSET.
      span.setStatus({ code: SpanStatusCode.OK })
      return result
    } catch (error) {
      span.recordException(error as Error)
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message })
      throw error
    } finally {
      span.end()
    }
  })
}

3. The exporter name matters for the collector.
OTLPTraceExporter over HTTP uses port 4318. If you're using gRPC (direct to Jaeger), you need @opentelemetry/exporter-trace-otlp-grpc and port 4317. Mixing exporters and ports is a classic source of "everything's configured and nothing's arriving at the collector."
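A small guard can catch this mismatch before you lose an afternoon to it. The sketch below is illustrative only: otlpTraceEndpoint and portMatchesProtocol are hypothetical helpers of mine, not part of any OpenTelemetry package, and they assume the default ports (4318 for HTTP, 4317 for gRPC) with no TLS.

```typescript
type OtlpProtocol = 'http' | 'grpc'

// Default OTLP ports: 4318 for HTTP, 4317 for gRPC.
const OTLP_PORTS: Record<OtlpProtocol, number> = { http: 4318, grpc: 4317 }

// Hypothetical helper: builds the default endpoint for a protocol.
function otlpTraceEndpoint(protocol: OtlpProtocol, host = 'localhost'): string {
  const base = `http://${host}:${OTLP_PORTS[protocol]}`
  // The HTTP exporter posts to a signal-specific path; gRPC takes the bare origin.
  return protocol === 'http' ? `${base}/v1/traces` : base
}

// Hypothetical helper: false when the URL's port belongs to the other protocol
// (or when the URL has no explicit port at all).
function portMatchesProtocol(url: string, protocol: OtlpProtocol): boolean {
  const match = /:(\d+)(?:\/|$)/.exec(url)
  return match !== null && Number(match[1]) === OTLP_PORTS[protocol]
}
```

Calling portMatchesProtocol on your configured URL at startup and logging a warning when it returns false is cheap, and it turns a silent failure into a visible one.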
4. sdk.start() doesn't wait for collector confirmation.
If the collector isn't available at startup, the SDK doesn't fail with an error — it just silently drops spans. There's an SDK shutdown you can hook into to flush before the process exits:
// Inside register(), after sdk.start()
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0))
})

Decision checklist: before you instrument your Next.js App Router
Before you start, these are the questions that determine how much work you're actually in for:
| Question | If the answer is... | Implication |
|---|---|---|
| Do you have Middleware running on the edge? | Yes | You need manual header propagation in middleware.ts |
| Are you using Server Actions with business logic? | Yes | You need manual spans in each relevant Action |
| Are you on Next.js 14 or earlier? | Yes | You need to explicitly enable instrumentationHook |
| Are you using a collector on Railway/Fly/Docker? | Yes | Check the port: 4318 for HTTP, 4317 for gRPC |
| Do you need full end-to-end traces? | Yes | W3CTraceContextPropagator is mandatory, not optional |
| Do you only need server-side Node traces? | Yes | A basic instrumentation.ts is enough, no manual spans |
The OpenTelemetry JavaScript SDK documents all available exporters and propagators. The thing the official Next.js docs don't emphasize enough is that you shouldn't take the active propagator for granted: it depends on the SDK version and on the OTEL_PROPAGATORS environment variable. If W3C TraceContext isn't the one in effect, that's exactly what breaks the chain at the edge/node boundary.
What you can't conclude without your own production data
Being honest about the limits of this guide:
- Latency overhead: There are no verifiable numbers on how much the instrumentation adds to cold starts on Vercel or edge functions. It could be zero, it could be meaningful. You need to measure it in the environment where you deploy.
- Span volume: On a high-traffic system, the number of spans the NodeSDK automatically generates (Next.js instruments its own internal operations) can be larger than you expect. Sampling is a whole separate topic.
- Vercel Edge Network compatibility: Header propagation works in theory on any environment that respects the HTTP protocol. Whether it works exactly the same on Vercel Edge, Cloudflare Workers, or your own Node.js server on Railway depends on how each platform handles internal headers.
These are real limits. A guide that doesn't name them isn't being straight with you.
FAQ: OpenTelemetry in Next.js App Router
Can I use OpenTelemetry in the Next.js edge runtime?
Partially. The full NodeSDK doesn't run in the edge runtime because it depends on Node.js APIs that aren't available there. What you can do is manually propagate the traceparent and tracestate headers in Middleware so the Node.js Runtime can continue the trace with the same trace ID. To generate real spans on the edge, you'd need a telemetry library designed specifically for Node-less environments — something that, as of this writing, still doesn't have a mature, official solution in the OTel JS ecosystem.
What propagator do I need so traces survive between Middleware and Server Components?
W3CTraceContextPropagator. This is the propagator that reads and writes the traceparent and tracestate headers from the W3C TraceContext standard, which is the format the modern ecosystem uses to propagate context between services. Without explicitly configuring it in the NodeSDK, the SDK doesn't know how to read the context coming from Middleware and starts a new trace with no parent.
Is OpenTelemetry in Next.js compatible with Jaeger, Grafana Tempo, and similar tools?
Yes, as long as you use the right exporter and point to the right port on the collector. The HTTP OTLPTraceExporter from @opentelemetry/exporter-trace-otlp-http (port 4318) or the gRPC one from @opentelemetry/exporter-trace-otlp-grpc (port 4317) works with any backend that accepts the OTLP protocol: Jaeger, Grafana Tempo, Honeycomb, DataDog, etc. Zipkin needs its own specific exporter. The exporter configuration is independent of which backend you use.
Do Server Actions generate spans automatically?
No. Next.js automatically instruments some internal operations (fetch, page rendering, some cache operations), but Server Actions are application code. If you want to trace the logic inside a Server Action, you need to create spans manually using tracer.startActiveSpan() from the OpenTelemetry API.
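If you have more than a couple of Actions, repeating the try/catch/finally in each one gets old fast. Here is a sketch of a reusable wrapper. To keep it runnable without installing anything, it redeclares a small structural subset of the span and tracer types from @opentelemetry/api (SpanLike and TracerLike are my names, not the library's), so plugging in a real tracer may need a cast or a thin adapter. The numeric status codes mirror SpanStatusCode.OK (1) and SpanStatusCode.ERROR (2).

```typescript
// Structural subset of the span/tracer API, redeclared as an assumption
// so this sketch runs without @opentelemetry/api installed.
interface SpanLike {
  setStatus(status: { code: number; message?: string }): unknown
  recordException(error: Error): unknown
  end(): void
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T
}

const STATUS_OK = 1    // SpanStatusCode.OK
const STATUS_ERROR = 2 // SpanStatusCode.ERROR

// Hypothetical helper: wraps an async function so every call runs in its own span.
function traced<A extends unknown[], R>(
  tracer: TracerLike,
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return (...args: A) =>
    tracer.startActiveSpan(name, async (span) => {
      try {
        const result = await fn(...args)
        span.setStatus({ code: STATUS_OK })
        return result
      } catch (error) {
        const err = error instanceof Error ? error : new Error(String(error))
        span.recordException(err)
        span.setStatus({ code: STATUS_ERROR, message: err.message })
        throw error // rethrow so callers still see the failure
      } finally {
        span.end() // always close the span, success or failure
      }
    })
}
```

Inside an Action, you'd call the wrapped function from the exported async function, so the span boundaries line up with the business operation and not with the framework plumbing.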
How do I know if trace context is propagating correctly?
The most direct way is to look at the collector. If you see two separate traces for a request that should be a single chain (for example, a request that goes through Middleware and hits a Server Component), context is breaking. In Jaeger you can look a trace up by its trace ID, or just verify that the trace ID is the same across all spans for a given request.
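If you'd rather script that check than eyeball it, the logic is small. This is a sketch over a made-up SpanRecord shape (the field names are illustrative and not tied to any exporter's JSON format): the spans of one request form a single chain when they share one trace ID and exactly one of them has no parent inside the set.

```typescript
// Minimal view of an exported span; field names are illustrative.
interface SpanRecord {
  traceId: string
  spanId: string
  parentSpanId?: string
  name: string
}

// Distinct trace IDs seen for what should be a single request.
function distinctTraceIds(spans: SpanRecord[]): string[] {
  return Array.from(new Set(spans.map((span) => span.traceId)))
}

// True when all spans share one trace ID and exactly one span is a local root.
// A local root either has no parent or points at a parent outside the set
// (for example, an upstream gateway whose spans were not collected here).
function isSingleChain(spans: SpanRecord[]): boolean {
  if (spans.length === 0) return false
  if (distinctTraceIds(spans).length !== 1) return false
  const knownIds = new Set(spans.map((span) => span.spanId))
  const roots = spans.filter(
    (span) => span.parentSpanId === undefined || !knownIds.has(span.parentSpanId),
  )
  return roots.length === 1
}
```

Two trace IDs for one request is the signature of the broken boundary. One trace ID with two local roots usually means a span was created without the active context.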
Do I need instrumentation.ts if my app doesn't use the edge runtime?
If everything runs on the Node.js runtime (no Middleware, no edge routes), instrumentation.ts is enough and the complexity drops significantly. The propagation problem is specific to the edge/node boundary. If you never cross that boundary, Next.js's automatic instrumentation together with the NodeSDK configured with W3CTraceContextPropagator should give you functional traces with minimal extra effort.
My position: instrument early, not when something breaks
When I worked on observability from the backend side in Java with Spring Boot, the advantage was that the framework gave you a lot for free. Next.js is different: the App Router architecture with its edge/node boundary creates a discontinuity that doesn't exist in a classic server, and OpenTelemetry doesn't resolve it automatically.
The uncomfortable part is that the system's silence when context breaks means a lot of people assume their instrumentation is working — until they need to debug something serious in production and discover the traces are fragmented.
My concrete recommendation: if you're using App Router with Middleware, configure W3CTraceContextPropagator from the start and verify in the collector that spans from a single request form a coherent chain before you actually need it. That verification is much cheaper to do in development than to reconstruct it under pressure.
The practical next step: spin up a local Jaeger with Docker (docker run -d --name jaeger -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest), configure the OTLPTraceExporter pointing to http://localhost:4318/v1/traces, and manually verify that a request passing through Middleware and hitting a Server Component shows up as a single trace with chained spans. If you see two traces, the propagator is misconfigured. If you see one, you're good.
That visual check is the proof of concept no guide can replace.
Original sources:
- Next.js OpenTelemetry documentation: https://nextjs.org/docs/app/building-your-application/optimizing/open-telemetry
- OpenTelemetry JS SDK: https://opentelemetry.io/docs/languages/js/