Last updated: 2026-04-23
Build: v0.4.0 (release profile, commit db4acba)

All measurements were taken on a Raspberry Pi 5 (2GB RAM, USB storage) with ~23K ROMs across 41 systems.

Single Request Latency (c=1, warm cache)

| Page | P50 | Req/s |
|---|---|---|
| Home (cache hit) | 5ms | 176 |
| Search “mario” | 39ms | 26 |
| Search “sonic” | 40ms | 25 |
| Search “street fighter” | 30ms | 33 |
| Search “a” (broad, 23K matches) | 180ms | 5.5 |
| System page | 1ms | 666–748 |
| Game detail | 1ms | 873 |

Concurrent Load (50 requests per test)

Homepage

| Concurrency | Req/s | P50 (ms) | P95 (ms) |
|---|---|---|---|
| 1 | 176 | 5 | 7 |
| 5 | 269 | 18 | 21 |
| 10 | 265 | 37 | 44 |
| 20 | 256 | 70 | 98 |
| 30 | 249 | 97 | 144 |

Search “mario”

| Concurrency | Req/s | P50 (ms) | P95 (ms) |
|---|---|---|---|
| 1 | 26 | 39 | 40 |
| 5 | 28 | 180 | 184 |
| 10 | 28 | 360 | 363 |
| 20 | 28 | 703 | 728 |
| 30 | 27 | 870 | 1,099 |
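
The search figures can be cross-checked with Little's law (requests in flight ≈ throughput × latency). The sketch below is illustrative only: the helper name is made up here, and P50 is used as a stand-in for mean latency, so the result is approximate.

```python
def in_flight(req_per_s: float, p50_ms: float) -> float:
    """Little's law: average requests in flight = throughput x latency."""
    return req_per_s * (p50_ms / 1000.0)

# (concurrency, req/s, P50 ms) rows copied from the Search "mario" table above
search_mario = [(1, 26, 39), (5, 28, 180), (10, 28, 360), (20, 28, 703), (30, 27, 870)]

for c, rps, p50 in search_mario:
    # The implied value should land close to the configured concurrency.
    print(f"c={c:>2}: implied in-flight = {in_flight(rps, p50):.1f}")
```

Throughput stays near 28 req/s while P50 grows roughly in proportion to concurrency, which is consistent with the search path already being saturated at c=5. At c=30 the implied value falls short of 30 because P50 sits below the mean once the tail stretches (P95 is 1,099ms).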

System pages (SNES, Mega Drive)

| Concurrency | Req/s | P50 (ms) | P95 (ms) |
|---|---|---|---|
| 1 | 666–748 | 1 | 2 |
| 5 | 1,227–1,671 | 3–4 | 4–6 |
| 10 | 1,637–1,793 | 5 | 8 |
| 20 | 1,769–1,844 | 10 | 14–16 |
| 30 | 1,755–1,924 | 12 | 20 |

Game detail

| Concurrency | Req/s | P50 (ms) | P95 (ms) |
|---|---|---|---|
| 1 | 873 | 1 | 1 |
| 5 | 2,029 | 2 | 4 |
| 10 | 1,949 | 5 | 7 |
| 20 | 2,212 | 8 | 13 |
| 30 | 2,182 | 11 | 17 |

Mixed Concurrent Test

4 endpoints simultaneously at c=5 each (20 total concurrent connections):

| Endpoint | Req/s | P50 (ms) | P95 (ms) |
|---|---|---|---|
| Homepage | 16.2 | 281 | 498 |
| Search “mario” | 8.8 | 544 | 834 |
| Search “sonic” | 8.7 | 542 | 895 |
| Search “street fighter” | 8.6 | 541 | 949 |

Asset Sizes (v0.4.0)

| Asset | Raw | Gzip |
|---|---|---|
| WASM bundle | 4,000 KB | 848 KB |
| CSS | 88 KB | 14 KB |
| Home HTML | 58 KB | |
| System page HTML | 20 KB | |

WASM is served gzip-compressed by the server.

v0.3.0 → v0.4.0 Comparison

Single request (c=1)

| Endpoint | v0.3.0 | v0.4.0 | Change |
|---|---|---|---|
| Home | 14ms, 70 req/s | 5ms, 176 req/s | -64% latency / +151% throughput |
| Search “mario” | 47ms, 21 req/s | 39ms, 26 req/s | -17% / +24% |
| Search “sonic” | 54ms, 18 req/s | 40ms, 25 req/s | -26% / +39% |
| Search “street fighter” | 41ms, 24 req/s | 30ms, 33 req/s | -27% / +38% |
| Search “a” (broad) | 194ms, 5.2 req/s | 180ms, 5.5 req/s | -7% / +6% |
| System page | 1ms, 918 req/s | 1ms, 666–748 req/s | -19% throughput |
| Game detail | <1ms, 1,036 req/s | 1ms, 873 req/s | -16% throughput |

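Major gains on home (2.5× throughput, 2.8× lower P50 latency) and on the targeted searches (+24–39% throughput); the broad “a” search improves by only ~6%. Small throughput regressions on the already-fast system and game-detail pages (16% for game detail, 19–27% for system pages depending on which end of the 666–748 range is used). P50 stays at or below 1ms, so the difference is not noticeable in the UI.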
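
The Change column can be reproduced from the two measurements on each row. A minimal sketch, assuming the percentages are plain relative changes rounded to a whole percent (the helper name is illustrative):

```python
def pct_change(old: float, new: float) -> int:
    """Relative change from old to new, rounded to a whole percent."""
    return round((new - old) / old * 100)

# Home, v0.3.0 -> v0.4.0
print(pct_change(14, 5))      # P50 latency in ms
print(pct_change(70, 176))    # throughput in req/s

# System page: v0.4.0 is reported as a range, so both ends are checked
print(pct_change(918, 748), pct_change(918, 666))
```

For the system page, the -19% in the table matches the upper end of the v0.4.0 range (748 req/s); the lower end (666 req/s) works out to -27%.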

Concurrent (c=10)

| Endpoint | v0.3.0 req/s | v0.4.0 req/s | Change |
|---|---|---|---|
| Homepage | 113 | 265 | +135% |
| Search “mario” | 22 | 28 | +27% |
| System pages | 1,637 | 1,637–1,793 | flat |
| Game detail | 2,210 | 1,949 | -12% |

Mixed concurrent (c=5 × 4 endpoints)

| Endpoint | v0.3.0 req/s | v0.4.0 req/s | Change |
|---|---|---|---|
| Homepage | 11.8 | 16.2 | +37% |
| Search “mario” | 6.8 | 8.8 | +29% |
| Search “sonic” | 7.5 | 8.7 | +16% |

Assets

| Asset | v0.3.0 gzip | v0.4.0 gzip | Change |
|---|---|---|---|
| WASM bundle | 995 KB | 848 KB | -15% |
| CSS | 14 KB | 14 KB | |

Key changes since v0.3.0:

  • PHF → runtime SQLite catalog (the v0.3.0→v0.4.0 headline change): cuts incremental build time from ~90s to ~10s.
  • Async catalog pool with deadpool-sqlite + prepare_cached + batch APIs eliminates the single-mutex bottleneck on concurrent lookups.
  • Core split (replay-control-core / replay-control-core-server): 89 #[cfg(target_arch = "wasm32")] attributes eliminated, 17 wire-type mirrors in app/src/types.rs deleted. Build-time wins, no runtime impact expected.
  • Subprocess async migration: df, ip, journalctl, tail, systemctl, pgrep all use tokio::process::Command instead of blocking the reactor.

v0.2.0 → v0.3.0 Comparison

Single request (c=1)

| Endpoint | v0.2.0 | v0.3.0 | Change |
|---|---|---|---|
| Home | 19ms, 51 req/s | 14ms, 70 req/s | +37% throughput |
| Search “mario” | 63ms, 16 req/s | 47ms, 21 req/s | +33% |
| Search “sonic” | 82ms, 12 req/s | 54ms, 18 req/s | +50% |
| Search “street fighter” | 59ms, 17 req/s | 41ms, 24 req/s | +41% |
| Search “a” (broad) | 232ms, 4.3 req/s | 194ms, 5.2 req/s | +21% |
| System page | 1ms, 910 req/s | 1ms, 918 req/s | |
| Game detail | <1ms, 1,107 req/s | <1ms, 1,036 req/s | |

Concurrent (c=10)

| Endpoint | v0.2.0 req/s | v0.3.0 req/s | Change |
|---|---|---|---|
| Homepage | 74 | 113 | +53% |
| Search “mario” | 16 | 22 | +38% |
| System pages | 1,897 | 1,637 | -14% |
| Game detail | 2,162 | 2,210 | |

Mixed concurrent (c=5 × 4 endpoints)

| Endpoint | v0.2.0 req/s | v0.3.0 req/s | Change |
|---|---|---|---|
| Homepage | 8.3 | 11.8 | +42% |
| Search “mario” | 4.8 | 6.8 | +42% |

Assets

| Asset | v0.2.0 gzip | v0.3.0 gzip | Change |
|---|---|---|---|
| WASM bundle | 1,778 KB | 995 KB | -44% |
| CSS | 13 KB | 14 KB | +1 KB |

Key improvements: GameInfo refactor (detail page reads from DB instead of re-deriving), curl → reqwest migration (shared async client, connection pooling), and release-profile WASM optimizations.

Memory (jemalloc allocator)

Measured via /proc/<PID>/status on the Pi using tools/pi-memory.sh. VmRSS is resident set size (physical memory actually in use); VmHWM is the peak RSS since process start.

| State | VmRSS | RssAnon | VmHWM (peak) |
|---|---|---|---|
| Idle (warm, after a few page hits post-restart) | 47 MB | 19 MB | 47 MB |
| Right after full load test (c=30 across all endpoints) | 68 MB | 39 MB | 189 MB |
| 60s post-load-test | 62 MB | 33 MB | 189 MB |

The Pi 5 2GB host has ~1,720 MB available after OS + buff/cache.

jemalloc returns memory to the OS well. VmHWM hit 189 MB during the broad-search burst (/search?q=a at c=30, 50 requests, ~3,400ms per response), where the heap inflates. RSS is already down to 68 MB right after the test and settles at 62 MB within 60 seconds, about 127 MB below the peak. Under glibc malloc the retained portion would not be returned (v0.2.0 pre-jemalloc: 324 MB steady-state for the same workload).
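
For reference, the three fields in the table can be pulled out of /proc/<PID>/status with a few lines. This is only a sketch of the idea, not the contents of tools/pi-memory.sh; the function names are made up for illustration, and RssAnon is only present on kernels that report it.

```python
import re

FIELDS = ("VmRSS", "RssAnon", "VmHWM")

def parse_status(text: str) -> dict[str, float]:
    """Return the selected /proc/<PID>/status fields in MB (the kernel reports kB)."""
    out = {}
    for line in text.splitlines():
        # Lines look like "VmRSS:      63488 kB"
        m = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if m and m.group(1) in FIELDS:
            out[m.group(1)] = int(m.group(2)) / 1024
    return out

def read_status(pid: int) -> dict[str, float]:
    """Read and parse the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())

# Usage (hypothetical PID): read_status(1234)
```

Sampling this right after the load test and again 60 seconds later gives the RSS drop described above, while VmHWM keeps the peak until the process restarts.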

Historical Comparison

| Metric | Pre-optimization | v0.2.0 | v0.3.0 | v0.4.0 |
|---|---|---|---|---|
| Home page (warm, c=1) | 940ms | 19ms | 14ms | 5ms |
| Home page (c=10) | | 74 req/s | 113 req/s | 265 req/s |
| Search “mario” (c=1) | 348ms | 63ms | 47ms | 39ms |
| Steady-state memory | 324 MB (glibc) | 67 MB (jemalloc) | 67 MB | 62 MB |
| Mixed load: homepage req/s | 0.60 | 8.3 | 11.8 | 16.2 |
| WASM gzip | | 1,778 KB | 995 KB | 848 KB |
| Incremental build time | ~90s | ~90s | ~90s | ~10s |

Test Methodology

  • Tool: Apache Bench (ab) via tools/bench.sh and tools/load-test.sh
  • 50 requests per test with warmup pass
  • Raw results in tools/bench-results/
  • Memory read from /proc/<PID>/status after the full load-test suite completes
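
A single measurement of this kind can be approximated as follows. This is a hedged sketch, not the contents of tools/bench.sh or tools/load-test.sh: it assumes ab's standard text report, uses only the -n (total requests) and -c (concurrency) flags, and the URL in the usage comment is hypothetical.

```python
import re
import subprocess

def run_ab(url: str, requests: int = 50, concurrency: int = 1) -> str:
    """Run Apache Bench and return its text report."""
    cmd = ["ab", "-n", str(requests), "-c", str(concurrency), url]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def parse_ab(report: str) -> dict[str, float]:
    """Extract mean req/s and the 50th/95th percentile latencies (ms) from an ab report."""
    rps = re.search(r"Requests per second:\s+([\d.]+)", report)
    # Percentile lines look like "  50%      5"
    pcts = dict(re.findall(r"^[ \t]*(\d+)%[ \t]+(\d+)", report, flags=re.M))
    return {
        "req_s": float(rps.group(1)),
        "p50_ms": float(pcts["50"]),
        "p95_ms": float(pcts["95"]),
    }

# Usage (hypothetical URL; run once as a warmup pass and discard, then measure):
# run_ab("http://localhost:8080/", concurrency=10)
# print(parse_ab(run_ab("http://localhost:8080/", concurrency=10)))
```

With only 50 requests per test, P95 is determined by the slowest two or three responses, so small differences in P95 between runs should be read with caution.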