HikariCP: the p95 that lies to you and how to read the real pool signals
There was a version of this analysis that started wrong. I was looking at the p95 for the tiny scenario with a 500ms delay and seeing 260.78ms. Compared to the default scenario showing 2418.16ms, it looked more than nine times faster. That's a classic trap, and I almost fell for it.
The tiny scenario had a 97.05% error rate. Out of 8139 attempts, 7899 failed. That 260ms was, overwhelmingly, the time to be rejected, not the time for a useful response. It wasn't fast — it was failing fast. And that difference matters enormously when you're trying to understand whether your HikariCP configuration is working or not.
That led me to build hikaricp-pool-experiment: a reproducible lab with Java 21, Spring Boot 3.4.5, PostgreSQL 16, HikariCP, Docker Compose, and k6 0.51.0. The goal wasn't to simulate production or document a real incident. It was to build an environment where pool signals would be visible and measurable, so I could reason about them with actual numbers.
The experiment design
The app exposes two endpoints:
- GET /api/query?delayMs=500: executes a real query against PostgreSQL and holds the connection using pg_sleep for the specified duration.
- GET /api/pool: returns the pool state in real time — active, idle, total, threadsAwaitingConnection, and the effective configuration.
The delayMs is the central mechanism of the experiment. An instant query can hide contention even at high concurrency because connections get released before the next request needs them. With pg_sleep(0.5), each connection stays occupied for half a second. With 50 virtual users hitting in parallel, pressure on the pool becomes visible quickly.
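The handler behind /api/query can be very small. A minimal sketch of what such an endpoint looks like, assuming Spring MVC and a JdbcTemplate (the repo's actual controller may differ in the details; the class and field names here are illustrative):

```java
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Sketch of a /api/query handler: pg_sleep keeps the pooled connection checked out for delayMs.
@RestController
class QueryController {
    private final JdbcTemplate jdbcTemplate;

    QueryController(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @GetMapping("/api/query")
    Map<String, Object> query(@RequestParam(name = "delayMs", defaultValue = "0") long delayMs) {
        // pg_sleep takes seconds; the connection stays busy for the whole call
        jdbcTemplate.queryForObject("SELECT 1 FROM pg_sleep(?)", Integer.class, delayMs / 1000.0);
        return Map.of("delayMs", delayMs, "status", "ok");
    }
}
```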
The k6 script does something that the original draft didn't have cleanly separated: it records query_duration for all attempts and query_success_duration only for those that return HTTP 200. Without that distinction, the p95 aggregates fast rejections with slow successful queries and the resulting number doesn't represent any useful reality.
```javascript
// load/hikari-pool.js — critical separation between all attempts and successful ones
import { check } from 'k6';
import { Trend, Rate } from 'k6/metrics';
// Custom metrics declared at module level (names assumed to match the ones in the results)
const queryDuration = new Trend('query_duration');
const querySuccessDuration = new Trend('query_success_duration');
const queryErrors = new Rate('query_errors');

// Inside the VU iteration, after the request to /api/query returns queryResponse:
const ok = check(queryResponse, {
  'query status is 200': (response) => response.status === 200,
});
queryDuration.add(queryResponse.timings.duration);          // every attempt, success or rejection
if (ok) {
  querySuccessDuration.add(queryResponse.timings.duration); // successful queries only
}
queryErrors.add(!ok);
```

The scenarios defined in application.yml are:
| Scenario | maximumPoolSize | connectionTimeout |
|---|---|---|
| default | 10 (Spring Boot default) | 30000ms (HikariCP default) |
| tiny | 2 | 250ms |
| pool4 | 4 | 1500ms |
| pool8 | 8 | 1500ms |
| pool16 | 16 | 1500ms |
| pool32 | 32 | 1500ms |
The matrix was run with two delays — 50ms and 500ms — because the contrast matters: a query that releases its connection quickly and a query that holds it for half a second don't stress the pool the same way.
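The scenarios live in application.yml as profiles; expressed directly against HikariCP, the two knobs being varied look roughly like this (values from the tiny profile, shown only to make them concrete; the JDBC URL and credentials are placeholders):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Programmatic equivalent of the "tiny" profile; the experiment itself sets these
// through spring.datasource.hikari.* properties in application.yml.
class TinyPoolConfig {
    static HikariDataSource tinyPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/pooldb"); // placeholder URL
        config.setUsername("pool");                                   // placeholder credentials
        config.setPassword("pool");
        config.setMaximumPoolSize(2);       // tiny scenario: two connections total
        config.setConnectionTimeout(250);   // tiny scenario: give up after 250ms of waiting
        return new HikariDataSource(config);
    }
}
```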
To reproduce it from scratch:
```powershell
.\scripts\run-matrix.ps1 -Vus 50 -Duration 60s
```

Or scenario by scenario:
```powershell
docker compose down -v
.\scripts\run-scenario.ps1 -Scenario tiny -Vus 50 -Duration 60s -DelayMs 500
.\scripts\run-scenario.ps1 -Scenario pool16 -Vus 50 -Duration 60s -DelayMs 500
```

Important limitation: all of this is a single local run from 2026-05-14 on Windows with Docker Desktop/WSL2. The numbers are useful for comparing scenarios within the same machine. They are not a universal benchmark and don't reflect behavior in any cloud environment, Railway or otherwise. And pg_sleep holds connections artificially to make the pressure visible — it doesn't represent a real production workload.
The full results — and what to read in them
This is the table generated by summarize-results.ps1 from the k6 JSON output:
| Scenario | Delay | Attempts | Successful | Failed | Error rate | Successful/s | p95 all | p95 successful | Max active | Max waiting |
|---|---|---|---|---|---|---|---|---|---|---|
| default | 50ms | 11772 | 11772 | 0 | 0% | 195.38 | 165.7ms | 165.7ms | 10 | 30 |
| default | 500ms | 1240 | 1240 | 0 | 0% | 19.85 | 2418.16ms | 2418.16ms | 10 | 39 |
| tiny | 50ms | 8289 | 2325 | 5964 | 71.95% | 38.53 | 298.81ms | 304.84ms | 2 | 47 |
| tiny | 500ms | 8139 | 240 | 7899 | 97.05% | 3.97 | 260.78ms | 752.51ms | 2 | 47 |
| pool4 | 50ms | 4712 | 4712 | 0 | 0% | 77.75 | 557.55ms | 557.55ms | 4 | 43 |
| pool4 | 500ms | 1779 | 492 | 1287 | 72.34% | 7.95 | 1962.83ms | 1990.52ms | 4 | 45 |
| pool8 | 50ms | 9253 | 9253 | 0 | 0% | 153.4 | 365.15ms | 365.15ms | 8 | 41 |
| pool8 | 500ms | 1653 | 984 | 669 | 40.47% | 15.87 | 1996.36ms | 1998.83ms | 8 | 41 |
| pool16 | 50ms | 18155 | 18155 | 0 | 0% | 301.83 | 82.92ms | 82.92ms | 16 | 40 |
| pool16 | 500ms | 1948 | 1947 | 1 | 0.05% | 31.62 | 1492.44ms | 1492.44ms | 16 | 31 |
| pool32 | 50ms | 18892 | 18892 | 0 | 0% | 314.16 | 70.33ms | 70.33ms | 32 | 32 |
| pool32 | 500ms | 3830 | 3830 | 0 | 0% | 63.00 | 784.9ms | 784.9ms | 32 | 24 |
There are several things worth reading together, not in isolation.
The trap of a low p95 with a high error rate
The tiny scenario with a 500ms delay is the most instructive in the experiment. The p95 for all attempts is 260.78ms. If you only look at that number, it looks like the pool responds very quickly. But the 97.05% error rate tells you that almost no query ever executed — HikariCP was rejecting requests after connectionTimeout: 250ms because there were no free connections.
The separation between query_duration and query_success_duration makes visible what the aggregated number was hiding: the p95 for successful queries is 752.51ms — almost three times higher. Those few queries that did get a connection took three-quarters of a second at the p95, probably because they had to wait for one of the two pool connections to be released.
When active is pinned at the pool maximum (2/2) and waiting reaches 47, the system isn't processing load — it's rejecting it. The 260ms is the time to fail, not to succeed.
Signal that matters: if p95 all attempts ≪ p95 successful and the error rate is high, the pool is in exhaustion. You're not seeing query latency — you're seeing rejection latency.
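As a crude check, assuming you already collect those three numbers per window, the signal can be expressed as a predicate (the thresholds are illustrative, not derived from the experiment):

```java
// Illustrative heuristic, not from the repo: flags the "fast because it's rejecting" pattern.
class PoolSignals {
    static boolean looksLikeExhaustion(double errorRate, double p95AllMs, double p95SuccessMs) {
        boolean highErrorRate = errorRate > 0.20;                    // threshold picked for illustration
        boolean rejectionsDominate = p95AllMs < 0.5 * p95SuccessMs;  // all-attempts p95 far below success p95
        return highErrorRate && rejectionsDominate;
    }
}
```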
How to read the four signals together
The experiment confirmed that no single metric is enough. The signals that make sense to cross-reference are:
1. Error rate + successful queries/s
These two together are the first filter. A 0% error rate with 19.85 successful/s (default, 500ms delay) is very different from a 97% error rate with 3.97 successful/s (tiny, 500ms delay). Successful throughput tells you how much useful work the system is doing; error rate tells you how much work it's throwing away.
With pool4 at 500ms delay: 72.34% error rate with only 7.95 successful/s. Four connections with 500ms queries give a theoretical ceiling of 8 successful/s (4 connections × 2 per second). The numbers match — the pool is at its limit and rejects the rest.
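That back-of-the-envelope calculation generalizes. A sketch, assuming every query holds its connection for the full delay:

```java
// Theoretical ceiling on successful queries/s: each connection can serve at most
// 1000 / holdMs queries per second, so the pool as a whole caps out at poolSize times that.
class ThroughputCeiling {
    static double maxSuccessfulPerSecond(int poolSize, double holdMs) {
        return poolSize * (1000.0 / holdMs);
    }

    public static void main(String[] args) {
        System.out.println(maxSuccessfulPerSecond(4, 500)); // 8.0  (pool4 measured 7.95/s)
        System.out.println(maxSuccessfulPerSecond(2, 500)); // 4.0  (tiny measured 3.97/s)
        System.out.println(maxSuccessfulPerSecond(8, 500)); // 16.0 (pool8 measured 15.87/s)
    }
}
```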
2. active = maximumPoolSize sustained + waiting > 0
This combination is the most direct operational signal that the pool is under pressure. When maxActiveConnections hits the configured ceiling and maxThreadsAwaitingConnection is greater than zero for a sustained period, application threads are waiting for a connection that isn't available.
From the experiment:
- tiny, 500ms delay: max active 2/2, max waiting 47. Pool exhausted from the start.
- pool8, 500ms delay: max active 8/8, max waiting 41, error rate 40.47%. High pressure, but not total exhaustion.
- pool32, 500ms delay: max active 32/32, max waiting 24, error rate 0%. The pool hits the ceiling but absorbs the load without rejecting requests.
In pool32 with 500ms delay, waiting = 24 with 0% error rate means threads are waiting but the connectionTimeout: 1500ms is enough — queries queue up and eventually get a connection. That's a system under pressure that still works, not one in crisis.
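Outside a lab there is no k6 polling /api/pool for you, but the same combination can be watched from inside the app. A sketch over HikariPoolMXBean, assuming the HikariDataSource is injectable and scheduling is enabled (the bean wiring is an assumption; only the MXBean reads are HikariCP API):

```java
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch of a watchdog that logs when active is pinned at the maximum AND threads are waiting.
@Component
class PoolPressureWatchdog {
    private static final Logger log = LoggerFactory.getLogger(PoolPressureWatchdog.class);
    private final HikariDataSource dataSource;

    PoolPressureWatchdog(HikariDataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Scheduled(fixedRate = 1000) // requires @EnableScheduling somewhere in the application
    void check() {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        int active = pool.getActiveConnections();
        int waiting = pool.getThreadsAwaitingConnection();
        if (active >= dataSource.getMaximumPoolSize() && waiting > 0) {
            log.warn("Pool under pressure: active={} (at maximum), threadsAwaitingConnection={}", active, waiting);
        }
    }
}
```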
3. Attempt latency vs. successful latency
I already covered the tiny case. But it's worth generalizing: when there's significant error rate, the p95 of all attempts stops being an application performance metric and becomes a rejection speed metric. The real operational latency is that of successful queries.
With pool4 at 500ms delay: p95 all attempts 1962.83ms, p95 successful 1990.52ms. Here the numbers are similar because queries that do get through also wait a lot — the pool has 4 connections with 500ms queries, so almost all the time is spent waiting for one to free up.
4. The jump from 50ms to 500ms as a pressure revealer
With a 50ms delay, pool8 has zero errors and processes 153.4 successful/s. With a 500ms delay, it drops to 40.47% error rate and 15.87 successful/s. The pool didn't change — the connection hold time changed. If each connection takes ten times longer to release, a pool that was previously sufficient now isn't.
This is the variable most frequently ignored when calibrating a pool: it's not just how many connections exist, but how long each query holds them. A pool of 16 connections with 50ms queries is very different from a pool of 16 connections with 500ms queries.
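The inverse calculation is the one worth doing when calibrating: how many connections a given mix of arrival rate and hold time needs just to keep up. A Little's-law-style sketch, ignoring burstiness and queueing headroom:

```java
// Sizing floor: connections needed ≈ target throughput × hold time.
// Treat it as a lower bound, not a target; it ignores spikes and queueing headroom.
class PoolFloor {
    static int minConnections(double targetQueriesPerSecond, double holdMs) {
        return (int) Math.ceil(targetQueriesPerSecond * (holdMs / 1000.0));
    }

    public static void main(String[] args) {
        System.out.println(minConnections(300, 50));  // ~15 connections for 300 q/s holding 50ms each
        System.out.println(minConnections(60, 500));  // ~30 connections for 60 q/s holding 500ms each
    }
}
```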
The diminishing returns of going from pool16 to pool32 with a short delay
There's an observation from the experiment that I think is important to avoid the easy conclusion of "more connections = better".
With 50ms delay:
- pool16: 301.83 successful/s, p95 82.92ms
- pool32: 314.16 successful/s, p95 70.33ms
Doubling the pool size gave only about 4% more throughput. The jump from pool8 to pool16 was much larger (153.4 → 301.83, nearly double). Beyond a certain point, the bottleneck is no longer the pool — it becomes something else. In this case, probably the Docker Desktop CPU or PostgreSQL itself under load from 50 VUs.
This is consistent with the pool-sizing guidance Brett Wooldridge gives for HikariCP: the optimal pool for database throughput is not simply "as large as possible". Beyond a certain threshold, adding connections creates overhead without real benefit, and in an environment with max_connections limits on PostgreSQL, you can run out of slots before throughput improves.
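For reference, the starting-point formula from that guidance is roughly the following (it comes from PostgreSQL benchmarking; the experiment doesn't validate it directly, and it says nothing about hold time):

```java
// Starting-point heuristic from HikariCP's pool-sizing guidance:
//   connections = (core_count * 2) + effective_spindle_count
// A baseline to tune from, not a target; it ignores how long queries hold their connections.
class PoolSizingHeuristic {
    static int suggestedPoolSize(int coreCount, int effectiveSpindleCount) {
        return (coreCount * 2) + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(suggestedPoolSize(cores, 1)); // e.g. 8 cores and one SSD -> 17
    }
}
```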
The practical conclusion from the experiment isn't that 32 is the right number. It's that pool16 with a 500ms delay has a 0.05% error rate and pool32 has 0%, with roughly twice the throughput. Depending on your actual query times and your PostgreSQL limits, the trade-off is different in each case.
The metrics the experiment exposes via Actuator
The app has Actuator enabled with health, info, metrics, and prometheus. During a run you can query pool state directly:
```bash
# Pool state via custom endpoint
curl http://localhost:8080/api/pool

# Micrometer metrics via Actuator
curl http://localhost:8080/actuator/metrics/hikaricp.connections.active
curl http://localhost:8080/actuator/metrics/hikaricp.connections.pending
curl http://localhost:8080/actuator/metrics/hikaricp.connections.timeout
```

The /api/pool endpoint uses HikariPoolMXBean directly and returns active, idle, total, threadsAwaitingConnection, and the effective configuration. That's what k6 queries in parallel to record the hikari_pool_active, hikari_pool_idle, hikari_pool_total, and hikari_pool_threads_awaiting_connection metrics.
The hikaricp.connections.timeout metric from Actuator is the one I care most about in any real environment: it counts the number of times a thread waited for a connection and the connectionTimeout expired. If that counter is greater than zero, users are being affected — that's not a warning, it's a fact.
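If Micrometer is already wired up (the experiment exposes it via Actuator), the same counter can be read in code, not just over HTTP. A sketch, assuming the standard hikaricp.connections.timeout meter is registered for the pool:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// Sketch: read the connection-timeout counter that HikariCP's Micrometer tracker registers.
// Any value above zero means a thread waited out connectionTimeout and got an exception.
class ConnectionTimeoutProbe {
    private final MeterRegistry registry;

    ConnectionTimeoutProbe(MeterRegistry registry) {
        this.registry = registry;
    }

    double timeoutsSinceStartup() {
        Counter timeouts = registry.find("hikaricp.connections.timeout").counter();
        return timeouts != null ? timeouts.count() : 0.0;
    }
}
```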
The experiment configuration vs. configuration for a real environment
The experiment uses values designed to make pool pressure visible in a lab, not values to copy into any system. The tiny profile has connectionTimeout: 250ms because 250ms makes the pool reject requests quickly and errors become immediately visible. In a real system, 250ms is probably too aggressive — you'll generate false positives during any brief spike.
What does translate are the reading principles:
On connectionTimeout: the value defines the speed of failure, not the speed of success. A short timeout generates errors faster and makes symptoms visible sooner. A long timeout accumulates blocked threads that consume memory and can saturate the web server's thread pool before the error becomes obvious. Which one you want depends on whether you have circuit breakers and retry logic, and on how long a user can wait before the experience breaks.
On maximumPoolSize: the right number depends on the average query hold time, expected concurrency, and your PostgreSQL's max_connections limits. There's no universal formula. What the experiment shows is that with 500ms queries and 50 VUs, you need at least 16 connections to get close to zero error rate — and that doubling to 32 gives diminishing returns on throughput.
On managed cloud databases: if you use Railway, Supabase, RDS, or another service where you don't directly control the server, there's an additional parameter that matters and that this experiment doesn't cover: maxLifetime. The server may close idle connections before HikariCP's 30-minute default, and a connection that the pool thinks is alive but the server has already closed will generate PSQLException: This connection has been closed on the next use. Setting maxLifetime below the server's timeout is a necessary adjustment in those environments — but it's not something this local Docker lab can measure.
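The adjustment itself is a single property. A hedged example (the 10-minute value is a placeholder; set it below whatever idle or connection-age limit your provider actually enforces):

```java
import com.zaxxer.hikari.HikariConfig;
import java.util.concurrent.TimeUnit;

// Keep maxLifetime below the server-side timeout so HikariCP retires connections
// before the other side closes them silently. Ten minutes here is only a placeholder.
class ManagedDbPoolConfig {
    static HikariConfig withSafeMaxLifetime(HikariConfig config) {
        config.setMaxLifetime(TimeUnit.MINUTES.toMillis(10)); // HikariCP default is 30 minutes
        return config;
    }
}
```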
My take after the experiment
The most valuable thing from this exercise wasn't picking a connection count. It was understanding that you can't tune HikariCP by looking at a single metric.
If you only look at the p95 of all attempts, you might conclude that a pool in crisis is "fast". If you only look at error rate, you can't tell whether the system is absorbing load or rejecting it. If you only look at active, you don't know whether the pool has headroom or is at its limit. You need to cross them all: error rate, successful queries/s, active vs. the configured maximum, waiting threads, and successful-query latency.
The other takeaway that stuck with me: there are two ways a pool can fail under load. One is the long timeout — threads waiting 30 seconds and eventually blowing up the heap. The other is the short timeout — fast rejections that generate a high error rate but create the illusion of low latency. The lab made both visible with real numbers.
I don't buy the idea that there's a universally correct maximumPoolSize. What there is is a correct size for your combination of query hold time, expected concurrency, and database capacity. And that number only makes sense read alongside connection hold time and error rate — not in isolation.
The repo has everything needed to run it again in your environment and compare:
```powershell
.\scripts\run-matrix.ps1 -Vus 50 -Duration 60s
```

If you change the delay, the concurrency, or the maximumPoolSize, the signals change. That's exactly the point.
→ github.com/JuanTorchia/hikaricp-pool-experiment
Reference:
- HikariCP GitHub — Configuration: https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby