Real-User Monitoring for Web Fonts: Field Data, Beaconing & Percentiles

Lab tools tell you how fonts load on your machine, on your network, in your browser. They cannot tell you what a visitor on a throttled 4G connection in another region actually experienced. Real-user monitoring (RUM) closes that gap by instrumenting the page, capturing per-visit font timing and stability metrics from production traffic, and reporting them at percentiles that reflect your real audience. This guide is part of the Font Performance Monitoring & Auditing blueprint, and it picks up where the lab-oriented measuring font loading performance guide leaves off.

The core problem RUM solves: your synthetic Lighthouse run reports a 90ms font transfer and zero layout shift, yet field data from the Chrome User Experience Report shows your Cumulative Layout Shift (CLS) at the 75th percentile (p75) sitting above the 0.1 "good" threshold. The discrepancy is almost always slow networks, cold caches, and late font swaps that never appear in a fast lab run. You diagnose this by collecting the same metrics from the field. A good diagnostic starting point: open Chrome DevTools, throttle to "Slow 4G" with cache disabled, and watch how far the font transfer and the resulting swap drift from your unthrottled numbers. That delta is what RUM measures at scale.

The distinction that makes RUM non-optional is variance. A lab measurement is a single point; field performance is a distribution with a long, heavy tail. Two sites with an identical median font-load time can have wildly different p95s — one because it serves a 14KB Latin subset, the other because it ships a 180KB full Cyrillic-plus-Latin file that only matters to a minority of visitors whose experience the median quietly buries. RUM surfaces that tail, attributes it to specific subsets and specific connection classes, and turns "fonts feel slow sometimes" into "the 200KB display weight adds 410ms of p75 transfer on 3G, and that cohort is 8% of traffic." Only then can you make a defensible optimization decision instead of guessing from a fast laptop on office wifi.

RUM pipeline: capture in the browser, sample and batch, beacon to a collector, aggregate at percentiles.

What to capture

RUM for fonts is worth doing only if you capture the metrics that actually correlate with user-perceived quality. Four signals matter most.

Font transfer time is how long the byte transfer took — distinct from the request's total duration, which includes DNS, connection, and queueing. You read this from a PerformanceResourceTiming entry: responseEnd - responseStart is the transfer window. Capture it per font URL so you can attribute slow loads to a specific subset.

Swap delay is how long the page rendered fallback glyphs before the web font painted — the visible lifetime of a flash of unstyled text (FOUT). You approximate it as the gap between first contentful paint and the moment document.fonts.ready resolves (or, more precisely, when the specific FontFace you care about reaches loaded). The longer this window, the more reflow risk and the more jarring the swap.
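To time a specific face rather than the whole set, one option is to wait on that face's own loaded promise. The sketch below assumes a hypothetical helper name (fontLoadedAt) and an injectable clock so it can be exercised outside a browser; in a page you would pass document.fonts. It takes the first face whose family matches, which is a simplification if you ship several weights.

```javascript
// Hypothetical helper: resolve with a timestamp once the first face of `family` has loaded.
// `fontFaceSet` is any iterable of FontFace-like objects (document.fonts in a browser).
async function fontLoadedAt(fontFaceSet, family, now = () => performance.now()) {
  const wanted = family.toLowerCase();
  for (const face of fontFaceSet) {
    // Some browsers keep the quotes from the CSS declaration in face.family.
    const name = face.family.replace(/^["']|["']$/g, '').toLowerCase();
    if (name === wanted) {
      await face.loaded; // rejects if the load fails; callers may want a catch
      return now();
    }
  }
  return null; // no matching face was declared
}

// Browser usage (assumes a family named "Inter" is declared in your CSS):
// fontLoadedAt(document.fonts, 'Inter').then((t) => { rum.interLoadedAt = t; });
```

Subtracting the First Contentful Paint time from the returned timestamp gives a per-face swap delay instead of a page-wide one.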

CLS attributed to fonts is the share of layout shift caused by the metric mismatch between fallback and web font. A PerformanceObserver watching layout-shift entries gives you each shift's value and its sources; you sum shifts that occur in the swap window or that touch text nodes. Reducing this is the job of fallback font metric matching, and RUM is how you prove the override actually helped real users. The deep-dive on isolating it lives in debugging font-related layout shift.
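As a rough illustration of summing the shifts that fall in the swap window, the sketch below filters recorded layout-shift entries by time. The name fontAttributedCls and the graceMs padding are assumptions made for this example, not part of any API; the fields it reads (startTime, value, hadRecentInput) are those of a layout-shift entry.

```javascript
// Hypothetical helper: sum layout shifts that start inside the font swap window.
// `shifts` are objects shaped like layout-shift entries: { startTime, value, hadRecentInput }.
// `graceMs` is an assumed padding, since a shift can land shortly after fonts report ready.
function fontAttributedCls(shifts, swapStart, swapEnd, graceMs = 100) {
  let total = 0;
  for (const s of shifts) {
    if (s.hadRecentInput) continue; // input-driven shifts do not count toward CLS
    if (s.startTime >= swapStart && s.startTime <= swapEnd + graceMs) {
      total += s.value;
    }
  }
  return Math.round(total * 1000) / 1000; // keep three decimals
}

const example = fontAttributedCls(
  [
    { startTime: 400, value: 0.02, hadRecentInput: false },  // before the window
    { startTime: 1250, value: 0.05, hadRecentInput: false }, // inside the window
    { startTime: 1330, value: 0.03, hadRecentInput: false }, // inside the grace period
    { startTime: 1300, value: 0.2, hadRecentInput: true },   // input-driven, skipped
    { startTime: 4000, value: 0.1, hadRecentInput: false },  // long after the swap
  ],
  900,  // swapStart, for example First Contentful Paint
  1280, // swapEnd, for example when document.fonts.ready resolved
);
console.log(example); // 0.08
```

A time window alone can pick up unrelated shifts (a late ad slot, for instance), so treat the result as an upper bound unless you also check each shift's sources for text-bearing elements.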

Cache hit ratio is the fraction of font requests served from cache. A PerformanceResourceTiming entry with transferSize === 0 and a non-zero decodedBodySize indicates a cache hit (the bytes came from disk/memory, not the network). Tracking this across visits tells you whether your Cache-Control: immutable headers are doing their job; it also exposes the cost of HTTP cache partitioning, where each top-level site pays the full download on first visit. A field cache-hit ratio that stays stubbornly low despite year-long TTLs usually means your filenames are not content-hashed, your CDN is stripping the header, or your audience is dominated by single-page sessions that never get a second request to hit warm.
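A minimal sketch of that classification, assuming entries shaped like PerformanceResourceTiming. The helper name cacheHitRatio and the three-way split are illustrative choices: entries whose size fields are all zero are counted as unknown rather than as misses, and a revalidated response that transfers headers only is counted as a miss here.

```javascript
// Hypothetical helper: classify font resource entries and compute a cache-hit ratio.
// Each entry needs only { transferSize, decodedBodySize }.
function cacheHitRatio(entries) {
  let hits = 0;
  let misses = 0;
  let unknown = 0;
  for (const e of entries) {
    if (e.transferSize === 0 && e.decodedBodySize > 0) hits++; // body came from cache
    else if (e.transferSize > 0) misses++;                     // bytes crossed the network
    else unknown++;                                            // sizes hidden (all zero)
  }
  const known = hits + misses;
  return { hits, misses, unknown, ratio: known ? hits / known : null };
}

const summary = cacheHitRatio([
  { transferSize: 0, decodedBodySize: 14000 },
  { transferSize: 0, decodedBodySize: 52000 },
  { transferSize: 0, decodedBodySize: 9000 },
  { transferSize: 14300, decodedBodySize: 14000 },
  { transferSize: 0, decodedBodySize: 0 },
]);
console.log(summary.ratio); // 0.75
```

Reporting the unknown count next to the ratio makes it obvious when a large share of font requests cannot be classified at all.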

Beyond these four, two contextual dimensions belong on every record. The connection class from navigator.connection.effectiveType is what lets you segment all of the above by 4g / 3g / slow-2g, and the saveData flag tells you when a visitor opted into reduced data — a population that may have skipped font-display: optional fonts entirely. Stamping these onto each beacon costs nothing and is the difference between a single blended number and an actionable breakdown.

Metric	Source API	Field computed as
Font transfer time	`PerformanceResourceTiming`	`responseEnd - responseStart`
Swap delay (FOUT window)	`document.fonts.ready` + paint timing	`fontsReady - firstContentfulPaint`
Font CLS	`PerformanceObserver('layout-shift')`	sum of shifts in swap window
Cache hit	`PerformanceResourceTiming`	`transferSize === 0 && decodedBodySize > 0`
Connection class	`navigator.connection`	`effectiveType` (`4g`, `3g`, …)

Baseline configuration

The minimum correct setup is a small inline script that runs as early as possible, registers a buffered PerformanceObserver, and prepares a payload. Register the observer with buffered: true so you also catch entries that fired before your script ran.

Minimal RUM bootstrap

// Run inline in <head>, before fonts paint, so no entries are missed.
const rum = { fonts: [], cls: 0, conn: navigator.connection?.effectiveType || 'unknown' };

// Buffered observer captures resource entries emitted before this ran.
new PerformanceObserver((list) => {
  for (const e of list.getEntries()) {
    if (e.initiatorType === 'css' && /\.woff2?(\?|$)/.test(e.name)) {
      // Cross-origin entries without Timing-Allow-Origin report responseStart
      // and the size fields as 0, so record those values as unknown (null).
      const timed = e.responseStart > 0;
      rum.fonts.push({
        url: new URL(e.name).pathname,
        transfer: timed ? Math.round(e.responseEnd - e.responseStart) : null,
        ttfb: timed ? Math.round(e.responseStart - e.requestStart) : null,
        cached: timed ? e.transferSize === 0 && e.decodedBodySize > 0 : null,
      });
    }
  }
}).observe({ type: 'resource', buffered: true });

// Sum of every layout shift not caused by recent input. This is page-wide:
// it is not yet font-attributed and applies no session windowing.
new PerformanceObserver((list) => {
  for (const e of list.getEntries()) {
    if (!e.hadRecentInput) rum.cls += e.value;
  }
}).observe({ type: 'layout-shift', buffered: true });

Note the initiatorType === 'css' filter: fonts pulled in by an @font-face rule report css as their initiator, whereas a <link rel="preload" as="font"> reports link. Capture both if you preload. The byte-level breakdown — turning a single PerformanceResourceTiming into DNS, connect, TTFB, and transfer phases — is covered in detail in capturing font timing with the Resource Timing API.
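One way to accept both initiators is a small predicate in place of the inline condition. The name isFontEntry is an assumption for this sketch; the URL pattern is the same one the bootstrap uses. Because link is also the initiator for stylesheets and other preloads, the file-extension test is what keeps non-font entries out.

```javascript
// Hypothetical predicate: accept font files requested by CSS (@font-face) or by a <link>.
const FONT_URL = /\.woff2?(\?|$)/;

function isFontEntry(entry) {
  const viaCssOrLink = entry.initiatorType === 'css' || entry.initiatorType === 'link';
  return viaCssOrLink && FONT_URL.test(entry.name);
}

// Inside the resource observer callback:
// if (isFontEntry(e)) { /* push onto rum.fonts as before */ }
```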

Keep this bootstrap small and inline rather than bundled with your application JavaScript. If the observer registration is deferred behind a framework bundle, it can miss the very fast cache hits and the very early font requests that fired during initial parse — precisely the events that distinguish a warm repeat visit from a cold one. An inline script of a few hundred bytes in the <head> runs before any of that and, with buffered: true, retroactively captures what already happened. The payload it accumulates is then beaconed once at end of session, so the only runtime cost on the page is two idle observers.

Step-by-step setup

Step 1 — Decide what fraction of sessions to sample

Beaconing every session wastes bandwidth and money, and you do not need every session for stable percentiles. Sample a fixed fraction — 10% is a sensible default for medium-traffic sites, 1% for very high traffic. Decide once per session, not per metric, so a sampled session reports all its metrics.

Session-level sampling gate

const SAMPLE_RATE = 0.1; // 10% of sessions
const sampled = Math.random() < SAMPLE_RATE;

Verification: load the page 20 times with the console open and a console.log(sampled). On average 2 of 20 log true, but a run this small varies widely, so anything from 0 to 5 is unremarkable. If you need exact ratios per cohort, hash a stable session id instead of Math.random().
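A deterministic gate can be sketched by hashing the session id into the range [0, 1) and comparing it with the rate. The 32-bit FNV-1a hash used here is one reasonable choice, not a requirement, and the function names are assumptions for this example. Where the session id comes from (a first-party cookie, sessionStorage, a server-issued token) is left to your setup.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

// Hypothetical gate: the same session id always gets the same decision.
function isSampled(sessionId, rate) {
  return fnv1a(sessionId) / 0x100000000 < rate;
}

// const sampled = isSampled(mySessionId, 0.1); // mySessionId is a placeholder
```

Because the decision depends only on the id and the rate, the collector can recompute it later, and raising the rate keeps every previously sampled session in the sample.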

Step 2 — Capture swap delay against a paint mark

Measure the FOUT window as the distance from First Contentful Paint to fonts-ready. Read FCP from the paint timeline rather than guessing.

Swap delay measurement

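document.fonts.ready.then(() => {
  const fontsReady = performance.now();
  // Look FCP up here: at inline-script time the paint entry does not exist yet.
  const fcp = performance.getEntriesByName('first-contentful-paint')[0]?.startTime;
  // No FCP entry yet: fonts were ready before first paint (or paint timing is
  // unsupported), so there is no measurable fallback window.
  rum.swapDelay = fcp === undefined ? null : Math.round(fontsReady - fcp);
});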

Verification: throttle DevTools to "Slow 4G" with cache disabled and reload. rum.swapDelay should be visibly larger than on an unthrottled run. If throttling does not move it, either the fonts are inlined, or document.fonts.ready settled before any font request had started, or the font-display value in use never swaps; re-check the font-display value first.

Step 3 — Assemble the payload and stamp connection context

A flat JSON object keeps the collector simple. Stamp the connection class and a coarse device hint so you can segment later.

Payload assembly

function buildPayload() {
  return JSON.stringify({
    page: location.pathname,
    conn: rum.conn,                       // '4g' | '3g' | 'slow-2g' | ...
    saveData: navigator.connection?.saveData || false,
    dpr: window.devicePixelRatio,
    fonts: rum.fonts,                     // [{url, transfer, ttfb, cached}]
    swapDelay: rum.swapDelay ?? null,
    cls: Math.round(rum.cls * 1000) / 1000,
    ts: Date.now(),
  });
}

Verification: call buildPayload() in the console and confirm fonts is non-empty and conn is a real effectiveType string, not unknown, on a Chromium browser.

Step 4 — Beacon on the visibility transition

Send the payload with navigator.sendBeacon(), which queues the request and survives page unload — a fetch() started during unload is frequently killed. Fire it when the page is hidden (visibilitychange → hidden), which is the most reliable end-of-session signal across desktop and mobile; pagehide is a reasonable secondary trigger and unload should be avoided because it breaks the back/forward cache.

Reliable beacon dispatch

function flush() {
  if (!sampled || rum.sent) return;
  rum.sent = true;
  const ok = navigator.sendBeacon('/rum/fonts', buildPayload());
  if (!ok) {
    // Beacon queue full — fall back to a keepalive fetch.
    fetch('/rum/fonts', { method: 'POST', body: buildPayload(), keepalive: true });
  }
}
// visibilitychange is dispatched on document; pagehide on window.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flush();
});
addEventListener('pagehide', flush);

Verification: in DevTools open the Network panel, filter to rum, then switch tabs. You should see one fonts request with type ping (that is the Beacon API) and a 204/200 response. Switching back and forth must not produce duplicates — the rum.sent guard enforces single-shot delivery.

Step 5 — Aggregate at p75, not at the mean

On the collector, never report the mean. Averages hide the tail where slow networks live. Report the 75th percentile to align with how Core Web Vitals are graded, and keep p95 for transfer time and CLS to watch the worst real experiences.

Percentile aggregation (collector side)

function percentile(values, p) {
  if (!values.length) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}
// const p75Transfer = percentile(rows.map(r => r.transfer), 75);

Verification: feed the function [10, 20, 30, 40, 100] with p = 75; it should return 40. Confirm your dashboard's p75 font transfer tracks the CrUX LCP/CLS trend rather than diverging from it.
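The commented rows.map(r => r.transfer) line presumes one row per font, whereas each beacon carries a fonts array. A possible flattening step on the collector is sketched below; the helper name toFontRows and the exact row shape are assumptions, and it reads only the fields the payload above defines.

```javascript
// Hypothetical collector helper: turn one beacon body (a JSON string) into flat per-font rows.
function toFontRows(rawBody) {
  let p;
  try {
    p = JSON.parse(rawBody);
  } catch {
    return []; // drop malformed beacons instead of failing the request
  }
  if (!p || !Array.isArray(p.fonts)) return [];
  return p.fonts
    .filter((f) => f && typeof f.url === 'string')
    .map((f) => ({
      page: String(p.page ?? ''),
      conn: String(p.conn ?? 'unknown'),
      url: f.url,
      transfer: Number.isFinite(f.transfer) ? f.transfer : null,
      cached: f.cached === true,
      swapDelay: Number.isFinite(p.swapDelay) ? p.swapDelay : null,
      cls: Number.isFinite(p.cls) ? p.cls : null,
    }));
}
```

Rows with a null transfer should be filtered out before calling percentile, so unknown timings do not sort as zero.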

Step 6 — Correlate with connection type

Segment every metric by effectiveType. The same site can show a 60ms p75 transfer on 4g and 600ms on 3g; a single blended number hides which audience is suffering. This is where RUM earns its keep — it tells you whether to invest in smaller subsets (see unicode-range subset loading) or in metric overrides to cut CLS for the slow cohort.

Verification: group rows by conn, compute p75 per group, and confirm slower classes report higher swap delay and transfer. If they do not, your sampling is biased toward fast sessions — common when scripts load late and miss bouncing slow-network users.
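The grouping step might look like the following sketch. It assumes flat rows that each carry a conn string and a numeric field such as transfer; the function name p75ByConn is illustrative, and the percentile uses the same nearest-rank rule as Step 5.

```javascript
// Hypothetical helper: nearest-rank p75 of `field`, computed separately per connection class.
function p75ByConn(rows, field) {
  const groups = new Map();
  for (const r of rows) {
    if (!Number.isFinite(r[field])) continue; // skip unknown or missing values
    if (!groups.has(r.conn)) groups.set(r.conn, []);
    groups.get(r.conn).push(r[field]);
  }
  const out = {};
  for (const [conn, values] of groups) {
    values.sort((a, b) => a - b);
    const idx = Math.max(0, Math.ceil(0.75 * values.length) - 1);
    out[conn] = { n: values.length, p75: values[idx] };
  }
  return out;
}

const byConn = p75ByConn(
  [
    { conn: '4g', transfer: 40 }, { conn: '4g', transfer: 50 },
    { conn: '4g', transfer: 60 }, { conn: '4g', transfer: 70 },
    { conn: '3g', transfer: 400 }, { conn: '3g', transfer: 500 },
    { conn: '3g', transfer: 600 }, { conn: '3g', transfer: 700 },
    { conn: '3g', transfer: null },
  ],
  'transfer',
);
console.log(byConn); // { '4g': { n: 4, p75: 60 }, '3g': { n: 4, p75: 600 } }
```

Returning n next to each p75 lets a dashboard grey out cohorts with too few sessions to trust.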

Browser compatibility & fallback matrix

Capability	Chrome / Edge	Firefox	Safari	Fallback if absent
`PerformanceResourceTiming`	Yes	Yes	Yes	None needed
`PerformanceObserver('resource')`	Yes	Yes	Yes (11+)	Poll `getEntriesByType('resource')`
`PerformanceObserver('layout-shift')`	Yes	No	No	Skip CLS field; CWV CLS unavailable
`first-contentful-paint` paint entry	Yes	Yes	Yes (14.1+)	Use `domContentLoadedEventEnd` proxy
`navigator.sendBeacon()`	Yes	Yes	Yes	`fetch(..., { keepalive: true })`
`navigator.connection.effectiveType`	Yes	No	No	Tag `conn` as `unknown`; skip segmentation
`transferSize` (cache detection)	Yes	Yes	Yes (cross-origin needs TAO)	Treat cache state as unknown

The two practical gaps: layout-shift is Chromium-only, so your field CLS comes from Chrome traffic — combine it with the Chrome User Experience Report for population-level grading. And navigator.connection is Chromium-only, so connection segmentation covers your Chromium audience only; for Safari and Firefox, tag the cohort unknown and still record the timing.
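Rather than sniffing the browser, the gaps in the matrix can be detected at runtime. The sketch below relies on PerformanceObserver.supportedEntryTypes; the helper name detectCapabilities and the shape of its result are assumptions, and the two parameters exist only so the function can be exercised with stand-in objects.

```javascript
// Hypothetical helper: report which parts of the pipeline this browser supports.
function detectCapabilities(PO = globalThis.PerformanceObserver, nav = globalThis.navigator) {
  const types = (PO && PO.supportedEntryTypes) || [];
  return {
    resource: types.includes('resource'),
    layoutShift: types.includes('layout-shift'),
    paint: types.includes('paint'),
    connection: Boolean(nav && nav.connection && nav.connection.effectiveType),
    beacon: Boolean(nav && typeof nav.sendBeacon === 'function'),
  };
}

// const caps = detectCapabilities();
// if (!caps.layoutShift) { /* omit the cls field instead of reporting 0 */ }
```

Sending the capability flags with each beacon lets the collector tell "no layout shift happened" apart from "layout shift could not be measured".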

Common pitfalls

Reporting the mean. A 90ms average transfer can hide a 700ms p95. Always aggregate at p75/p95; the mean is structurally blind to the slow tail that Core Web Vitals penalises.
Beaconing with fetch() on unload. Requests started during unload/beforeunload are routinely dropped and they break the back/forward cache. Use sendBeacon() on visibilitychange → hidden.
No sampling. Sending every session multiplies collector cost and can itself become a performance liability under load. Gate at the session level with a fixed rate.
Unbuffered observers. Without buffered: true, an observer registered after the font already loaded sees nothing, silently under-counting fast cache hits and inflating apparent transfer times.
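Trusting cross-origin transfer numbers without TAO. Cross-origin font responses report most timing phases (including requestStart and responseStart) and all size fields as 0 unless the server sends Timing-Allow-Origin. responseEnd is still populated, so responseEnd - responseStart yields a page-relative timestamp rather than a duration and reads as misleadingly large, and the transferSize cache check can never pass. Treat entries with responseStart === 0 as unknown.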
Ignoring saveData. Visitors with Data Saver on may have skipped font-display: optional fonts entirely. Counting their (absent) swap as zero CLS is correct, but blending them into transfer percentiles understates real cost.

Frequently Asked Questions

Why report p75 instead of the average for font metrics?

Because Core Web Vitals are themselves graded at the 75th percentile of real visits, so aligning your RUM to p75 lets you predict the score Google assigns. The mean is dominated by the bulk of fast sessions and structurally cannot represent the slow tail where layout shift and long swaps actually hurt users. Keep p95 alongside p75 to watch the worst real experiences.

Do I need a third-party RUM vendor, or can I build this myself?

You can build a capable font-RUM pipeline yourself with PerformanceObserver, navigator.sendBeacon(), and a small endpoint that appends rows and computes percentiles — that is the entire pattern above. Vendors add convenience: session stitching, alerting, and the web-vitals library's attribution, which maps a CLS value back to the offending element. Start with the homegrown version to learn what matters, then adopt a vendor if the operational load outweighs the build.

How does the web-vitals library fit into font RUM?

Google's web-vitals library wraps the same underlying APIs (PerformanceObserver for layout-shift, paint timing) and exposes onCLS, onLCP, and onINP callbacks that already handle buffering, the correct session windowing for CLS, and attribution. You can subscribe to onCLS and pull attribution.largestShiftSource to confirm a shift came from a text node during the font swap, then beacon that alongside your own transfer and swap-delay numbers. It is the recommended way to get a CWV-accurate CLS into your payload without re-implementing the session-gap logic.

What sample rate should I use?

Pick the lowest rate that still yields stable percentiles for your smallest segment. For most sites 10% is fine; high-traffic properties drop to 1% or lower. The constraint is your slowest cohort — if 3g traffic is 2% of visits and you sample 1%, you will see too few 3g sessions to trust their p75. Sample per session (not per metric) so each sampled visit reports a complete record.
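The arithmetic behind that constraint is simple enough to put in code. Both function names below are illustrative, and the target of 500 sessions is an arbitrary placeholder, not a statistical rule; choose a target that gives your own percentiles acceptable stability.

```javascript
// Expected number of sampled sessions landing in one cohort over a period.
function expectedCohortSessions(dailySessions, sampleRate, cohortShare, days = 1) {
  return dailySessions * sampleRate * cohortShare * days;
}

// Smallest sample rate that is expected to reach `targetSessions` for that cohort.
function minSampleRate(dailySessions, cohortShare, targetSessions, days = 1) {
  return Math.min(1, targetSessions / (dailySessions * cohortShare * days));
}

// 100,000 sessions a day, 1% sampling, 3g at 2% of visits: about 20 sessions per day.
const perDay = expectedCohortSessions(100000, 0.01, 0.02);

// Rate needed to expect 500 sampled 3g sessions per day (placeholder target): about 0.25.
const needed = minSampleRate(100000, 0.02, 500);
```

Widening the aggregation window (the days argument) is often cheaper than raising the rate when the slow cohort is small.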

How do I avoid double-counting CLS across a single-page-app navigation?

In a single-page app the layout-shift observer keeps accumulating across client-side route changes, so a long session can report an inflated cumulative value that no real user perceived as one shift. The Core Web Vitals CLS definition uses a session-window of shifts (capped at 5s, with a 1s gap), and the web-vitals library implements exactly that windowing for you. If you roll your own, reset your font-CLS accumulator on each route transition and report per-view, or adopt onCLS so your field number matches how the metric is officially scored. For font-attributed CLS specifically, the shifts you care about cluster in the swap window right after navigation, so per-view accounting also keeps the attribution clean.
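If you do roll your own, the windowing rule can be sketched as below: a window closes when the next shift arrives 1 s or more after the previous one, or 5 s or more after the window's first shift, and the reported value is the largest window. The function name sessionWindowCls is an assumption, and this is a simplified illustration rather than a replacement for the web-vitals implementation.

```javascript
// Sketch of CLS session windowing over entries shaped like { startTime, value, hadRecentInput }.
function sessionWindowCls(shifts) {
  let max = 0;
  let windowValue = 0;
  let windowStart = 0;
  let last = 0;
  let open = false;
  const ordered = [...shifts].sort((a, b) => a.startTime - b.startTime);
  for (const s of ordered) {
    if (s.hadRecentInput) continue;
    if (open && s.startTime - last < 1000 && s.startTime - windowStart < 5000) {
      windowValue += s.value; // still inside the current window
    } else {
      windowValue = s.value;  // start a new window
      windowStart = s.startTime;
      open = true;
    }
    last = s.startTime;
    max = Math.max(max, windowValue);
  }
  return max;
}

const windowed = sessionWindowCls([
  { startTime: 100, value: 0.05, hadRecentInput: false },
  { startTime: 600, value: 0.05, hadRecentInput: false },
  { startTime: 3000, value: 0.2, hadRecentInput: false }, // more than 1 s later: new window
  { startTime: 3500, value: 0.01, hadRecentInput: false },
]);
// A plain running sum would report about 0.31; the largest window is about 0.21.
```

For per-view accounting in a single-page app, call it with only the shifts recorded since the last route change.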

Font Performance Monitoring & Auditing — the parent blueprint covering lab and field measurement.
Measuring Font Loading Performance — lab-side instrumentation with PerformanceObserver.
Debugging Font-Related Layout Shift — isolating the font that moves your layout.
Capturing Font Timing with the Resource Timing API — the per-request timing breakdown that feeds this pipeline.