Measuring FOUT Duration in the Field

This guide is one technique inside Measuring Font Loading Performance, part of the Font Performance Monitoring & Auditing blueprint. It answers a single question with a single number: for real users, how many milliseconds pass between the first paint of fallback text and the moment the web font actually swaps in?

Problem Statement

A Flash of Unstyled Text (FOUT) is the interval where the user reads your fallback font before the web font arrives and replaces it — the expected behavior of font-display: swap, and the thing FOUT/FOIT mitigation work tries to shorten. Lab tools cannot tell you its real duration because that depends on each user's network and cache state. You need a field measurement: a timestamp for when fallback text first paints, a timestamp for when the swap happens, and the difference between them shipped to a Real User Monitoring (RUM) backend.

Why bother quantifying it rather than just minimizing it blindly? Because FOUT duration is the metric that tells you whether your mitigation work paid off for real users, and whether it is even worth doing. A site whose p75 FOUT is 80ms has a swap most users will barely perceive; a site whose p75 is 900ms is showing a visibly different typeface for nearly a second on every cold load, which is jarring and, if fallback metrics are mismatched, a source of layout shift. Those two sites need completely different responses — the first should leave well enough alone, the second should preload the critical weight and match its fallback metrics — and the only way to tell them apart is to measure. A single field number turns "our fonts flash a bit" into a prioritizable, trackable figure you can put a target on.

Prerequisites

A web font declared with font-display: swap (or fallback). With optional the browser may never swap, so there is often no FOUT to measure.
The CSS Font Loading API available ('fonts' in document), supported in Chrome 35+, Firefox 41+, Safari 10+.
A First Contentful Paint (FCP) signal. The Paint Timing API (getEntriesByType('paint')) provides it in Chrome 60+, Firefox 84+, and Safari 14.1+; older Safari versions lack FCP, so the code falls back to a custom mark when no FCP entry is found.

Implementation

The cleanest approach treats the swap as a two-mark measurement: a baseline mark when fallback text is first visible, and a swap mark when document.fonts.ready resolves and you flip a fonts-loaded class (the act that triggers the reflow you are timing).

Primary: baseline-to-swap FOUT duration, reported to RUM

function measureFoutDuration() {
  if (!('fonts' in document)) return;

  // Fallback baseline: script start, used only when no FCP entry is available.
  const scriptStart = performance.now();
  performance.mark('fout-baseline');

  document.fonts.ready.then(() => {
    // The class swap is the reflow we are timing, so mark immediately after it.
    document.documentElement.classList.add('fonts-loaded');
    performance.mark('fout-swap');
    const swapTime = performance.getEntriesByName('fout-swap')[0].startTime;

    // Look up First Contentful Paint now rather than at script start:
    // a script in <head> runs before the first paint has been recorded.
    const fcp = performance
      .getEntriesByType('paint')
      .find((p) => p.name === 'first-contentful-paint');
    const baseline = fcp ? fcp.startTime : scriptStart;
    const foutMs = Math.max(0, swapTime - baseline);

    const payload = {
      foutMs: +foutMs.toFixed(1),
      swapAtMs: +swapTime.toFixed(1),
      hadFcp: Boolean(fcp),
      effectiveType: navigator.connection?.effectiveType ?? null,
    };

    navigator.sendBeacon?.('/rum/fout', JSON.stringify(payload));
  });
}
measureFoutDuration();

Annotated explanation

baseline is the moment the user first sees any text, which with swap is fallback text. First Contentful Paint is the right anchor because FOUT begins at that paint, not at navigation start. Where no FCP entry is available (Safari before 14.1, for example), we drop a fout-baseline mark at the earliest script execution as a close proxy.
document.fonts.ready resolves once every face needed by the current layout has finished loading or failed to load. It is the signal that the swap is imminent.
The class flip happens before the mark. Adding fonts-loaded is what unblocks any CSS gated on the real font, so it is the cause of the reflow. Marking right after it timestamps the swap as closely as the main thread allows.
foutMs = swapTime - baseline, clamped at 0 because on a warm cache the font can be ready before FCP, yielding a (meaningless) negative — those sessions had effectively no FOUT.
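effectiveType from the Network Information API (available in Chromium-based browsers; null elsewhere) tags each sample with the connection class, so in RUM you can see FOUT duration rise from 4g to 3g to 2g, which is exactly the segmentation that justifies preload or subset work.
navigator.sendBeacon queues the sample without blocking unload, so a user who navigates away right after the swap still contributes data. A session that ends before document.fonts.ready resolves sends nothing, so the very slowest loads are under-represented.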
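The subtraction and clamp described in the list above can be isolated in a pure helper, which makes the edge cases easy to unit test outside the browser. This is a minimal sketch rather than part of the snippet above: the name foutDuration and the readyBeforeBaseline flag are hypothetical additions for illustration.

```javascript
// Hypothetical helper (not part of the snippet above): pure duration math.
function foutDuration(baselineMs, swapMs) {
  const rawMs = swapMs - baselineMs;
  return {
    // Clamp at 0 so a font that was ready before the baseline reads as "no FOUT".
    foutMs: +Math.max(0, rawMs).toFixed(1),
    // Record that clamping happened instead of silently discarding it.
    readyBeforeBaseline: rawMs < 0,
  };
}

// Cold load: fallback painted at 400 ms, swap at 1250.5 ms.
console.log(foutDuration(400, 1250.5)); // { foutMs: 850.5, readyBeforeBaseline: false }
// Warm cache: font ready at 95 ms, before the 140 ms baseline.
console.log(foutDuration(140, 95)); // { foutMs: 0, readyBeforeBaseline: true }
```

Keeping the flag alongside the clamped value lets the backend count how many sessions had no FOUT at all, separately from sessions with a genuinely short one.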

The choice of the class swap as the timing anchor deserves emphasis. Many implementations measure FOUT as simply "FCP to fonts.ready," but that misses a real-world detail: nothing visually changes until your CSS actually applies the loaded font, and with the two-stage rendering pattern that usually means a fonts-loaded class on the root element gating the real font-family. By marking immediately after adding that class, you time the cause of the reflow rather than an abstract promise resolution. If your site instead lets font-display: swap apply the font automatically (no class gating), the swap happens at about the time fonts.ready resolves and the two approaches converge. Anchoring on the class is therefore safe in both cases, which is why the primary snippet flips the class even on auto-swap sites.

There is one more reason to capture effectiveType and report per session rather than pre-aggregating in the client: FOUT duration is strongly bimodal. Warm-cache repeat visits cluster near zero, while cold first views on constrained networks form a long right tail. If you average those together in the browser you produce a meaningless middle number. Send each raw sample and let the RUM backend compute p75 and p95 across the real distribution, sliced by connection class and ideally by first-view versus repeat-view. That is the segmentation that points you at the specific population a preload or subset change would actually help.
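To make the backend side concrete, here is a minimal sketch of per-segment aggregation over raw samples. It assumes the payload shape used by the snippets in this guide (foutMs, effectiveType) and uses the nearest-rank percentile definition; the function names percentile and summarizeFout are hypothetical, and a real RUM backend would more likely do this in its query layer.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

// Group raw beacon payloads by a key, then compute p75/p95 of foutMs per group.
function summarizeFout(samples, keyOf = (s) => s.effectiveType ?? 'unknown') {
  const groups = new Map();
  for (const sample of samples) {
    const key = keyOf(sample);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(sample.foutMs);
  }
  const summary = {};
  for (const [key, values] of groups) {
    values.sort((a, b) => a - b);
    summary[key] = {
      count: values.length,
      p75: percentile(values, 75),
      p95: percentile(values, 95),
    };
  }
  return summary;
}

// Illustrative data only: warm 4g visits near zero, cold 3g visits in the tail.
const samples = [
  { foutMs: 0, effectiveType: '4g' },
  { foutMs: 0, effectiveType: '4g' },
  { foutMs: 10, effectiveType: '4g' },
  { foutMs: 20, effectiveType: '4g' },
  { foutMs: 80, effectiveType: '4g' },
  { foutMs: 300, effectiveType: '3g' },
  { foutMs: 900, effectiveType: '3g' },
  { foutMs: 1200, effectiveType: '3g' },
  { foutMs: 400, effectiveType: '3g' },
];
console.log(summarizeFout(samples));
```

With this made-up data the 4g group has a p75 of 20 ms and the 3g group a p75 of 900 ms, a gap that a single pooled average would hide. Passing a different keyOf function (for example one that reads a first-view flag, if your payload carries one) slices the same samples another way.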

The measured window is the gap between fallback paint and the swap mark.

Defensive Variant

document.fonts.ready resolving is not a hard guarantee the swap painted, and on stalled networks the promise can hang far longer than the user cares about. This variant adds a feature guard, a timeout that records a capped sample, and protection against double-firing.

Defensive: timeout cap and single-fire guard

function measureFoutSafe({ capMs = 5000 } = {}) {
  if (!('fonts' in document) || !document.fonts.ready) return;

  const fcp = performance.getEntriesByType?.('paint')
    .find((p) => p.name === 'first-contentful-paint');
  const baseline = fcp ? fcp.startTime : performance.now();

  let reported = false;
  const send = (timedOut) => {
    if (reported) return;
    reported = true;
    const foutMs = Math.max(0, performance.now() - baseline);
    const payload = {
      foutMs: +Math.min(foutMs, capMs).toFixed(1),
      timedOut,
      effectiveType: navigator.connection?.effectiveType ?? null,
    };
    navigator.sendBeacon?.('/rum/fout', JSON.stringify(payload));
  };

  const timer = setTimeout(() => send(true), capMs);

  document.fonts.ready
    .then(() => {
      clearTimeout(timer);
      document.documentElement.classList.add('fonts-loaded');
      send(false);
    })
    .catch(() => { clearTimeout(timer); send(true); });
}
measureFoutSafe();

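The capMs timeout records a session as timedOut: true rather than losing it, which is useful because the very slowest swaps are the ones worth knowing about. For those sessions foutMs is capped at capMs, so treat it as a lower bound rather than a measurement. The reported guard ensures the timeout and the promise cannot both send a sample. The .catch() is a defensive backstop: the ready promise is expected to fulfill even when individual faces fail to load, but if it ever rejects, the session is still recorded.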

Verification

In DevTools, open the Network panel, check Disable cache, and throttle to Slow 3G to force a visible FOUT.
Reload. Watch the heading or body text render in the fallback, then visibly snap to the web font — that snap is the swap you are timing.
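In the Network panel, confirm a single beacon to /rum/fout and inspect its payload. Chrome lists sendBeacon requests with type ping, which typically appears under the Other filter rather than Fetch/XHR. foutMs should be in the hundreds of milliseconds or more on Slow 3G, and in Chromium-based browsers effectiveType should typically report a degraded class such as 3g or 2g.
Re-run with cache enabled and no throttling: foutMs should collapse to 0 or within a few milliseconds of it (the font is ready at or before first paint), confirming that warm-cache visits report effectively no FOUT.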
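The manual payload inspection above can be backed by a small sanity check on the receiving side. The sketch below is an assumption about how you might validate incoming beacons, not part of the measurement code: checkFoutPayload is a hypothetical name, and the accepted effectiveType values are the four connection classes defined by the Network Information API.

```javascript
// Connection classes defined by the Network Information API.
const EFFECTIVE_TYPES = new Set(['slow-2g', '2g', '3g', '4g']);

// Hypothetical checker: returns a list of problems, empty when the payload looks sane.
function checkFoutPayload(payload, { capMs = 5000 } = {}) {
  const problems = [];
  if (typeof payload.foutMs !== 'number' || !Number.isFinite(payload.foutMs)) {
    problems.push('foutMs is not a finite number');
  } else if (payload.foutMs < 0) {
    problems.push('foutMs is negative (clamp missing?)');
  } else if ('timedOut' in payload && payload.foutMs > capMs) {
    // Only the defensive variant sends timedOut, and it caps foutMs at capMs.
    problems.push('foutMs exceeds the cap');
  }
  if (payload.effectiveType !== null && !EFFECTIVE_TYPES.has(payload.effectiveType)) {
    problems.push('unexpected effectiveType');
  }
  return problems;
}

// A payload shaped like the primary snippet's output passes cleanly.
console.log(checkFoutPayload({ foutMs: 812.4, swapAtMs: 1290.1, hadFcp: true, effectiveType: '3g' }));
// A negative duration is flagged.
console.log(checkFoutPayload({ foutMs: -3, effectiveType: null }));
```

On the server you would call it on the parsed beacon body, for example checkFoutPayload(JSON.parse(body)), and log or drop samples that return a non-empty list.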

Common Pitfalls

Anchoring the baseline at navigation start. FOUT begins when fallback text paints, not when the navigation begins; anchoring too early inflates every sample. Use FCP.
Marking the swap before flipping the class. If you timestamp before adding fonts-loaded, you record the promise resolution, not the reflow. Flip the class, then mark.
Letting a hung promise drop the sample. On stalled networks document.fonts.ready can take many seconds; without a timeout cap you silently lose your worst cases. Cap and flag them.
Ignoring negative durations. Warm-cache loads can have the font ready before FCP, producing negative deltas. Clamp at 0 so they read as "no FOUT," not garbage.
Measuring FOUT under font-display: optional. With optional the browser may skip the swap entirely on slow networks, so there is no FOUT to measure. document.fonts.ready can still resolve once the font finishes downloading in the background, so your reporter will record a misleading duration for sessions that never visibly swapped.
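One way to guard against the last pitfall is to check the font-display value of the registered faces before measuring. The sketch below relies on FontFace objects exposing a display property and on document.fonts being iterable. The helper name hasSwappableFace is hypothetical, and treating only swap and fallback as measurable follows this guide's prerequisites rather than any rule in the spec.

```javascript
// Hypothetical guard: true if at least one face uses a font-display value
// that can swap in after fallback text has painted.
function hasSwappableFace(faces) {
  return Array.from(faces).some(
    (face) => face.display === 'swap' || face.display === 'fallback'
  );
}

// In the browser, gate the measurement on it:
//   if (hasSwappableFace(document.fonts)) measureFoutDuration();

// Outside the browser, plain objects are enough to exercise the logic.
const optionalOnly = [{ family: 'Inter', display: 'optional' }];
const mixed = [
  { family: 'Inter', display: 'optional' },
  { family: 'Lora', display: 'swap' },
];
console.log(hasSwappableFace(optionalOnly)); // false
console.log(hasSwappableFace(mixed)); // true
```

This check is page-wide, so on a page that mixes optional and swap faces the measurement still runs and reflects whichever faces finish last.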

FAQ

Why use document.fonts.ready instead of a per-font FontFace.load() promise?

ready resolves only after every face in the current layout settles, which matches the user-visible "the page is done swapping" moment. A single FontFace.load() resolves when one weight arrives, so on a page using several weights it would report the swap too early.

Does this work in Safari versions that have no First Contentful Paint?

Yes. Safari added paint timing in 14.1; on older versions the code falls back to a performance.now() baseline (or a fout-baseline mark) taken at first script execution. It is slightly less precise than true FCP but close enough to track FOUT trends. Note that effectiveType is null in Safari, so those samples cannot be segmented by connection type.

Is the measured value the exact pixel-paint of the swap?

Not to the pixel. document.fonts.ready plus a mark captures the moment the swap is unblocked on the main thread; the actual rasterization typically follows in the next rendered frame. It is accurate enough for field aggregation, where you care about p75/p95 trends rather than per-session millisecond exactness.