Measuring Font Loading Performance

Most font optimization work fails for a simple reason: it is never measured. Teams ship a font-display: swap change, eyeball the page, and declare victory without ever quantifying when the font was requested, when its bytes arrived, or when the glyphs actually painted. Those are three distinct timestamps, and the gaps between them are where Cumulative Layout Shift (CLS), a delayed Largest Contentful Paint (LCP), and visible flashes of unstyled text live. This guide is part of the broader Font Performance Monitoring & Auditing blueprint, and it covers how to capture those timestamps precisely, in both a controlled lab and on real users' devices.

The audience here is frontend and performance engineers who already serve WOFF2 with reasonable font-display values and now need to prove their loading strategy works. We will instrument the full request lifecycle, mark a deterministic "fonts ready" moment, separate lab signals from field signals, and walk an ordered workflow where every step ends in a verification check.

The Problem: Three Timestamps, Not One

"Font load time" is ambiguous because a font passes through several stages, each of which you can timestamp independently:

  • Request start — the browser discovers the @font-face src (or a <link rel="preload">) and opens a connection.
  • Response end — the last byte of the WOFF2 file arrives.
  • Face ready — the browser has parsed the font and document.fonts reports it loaded.
  • Paint — the layout reflows and the new glyphs are rasterized to the screen.

A diagnostic starting point: open Chrome DevTools, go to the Network panel, filter by Font, and hover a WOFF2 row. The timing tooltip already splits Queueing, Stalled, Request sent, Waiting (TTFB), and Content Download. If "Content Download" dominates, your file is too large, so subset it. If "Queueing" or "Stalled" dominates, the request was held behind other work, such as higher-priority requests or the per-origin connection limit. If the row simply starts late in the waterfall, the font was discovered late and needs a resource hint. Reading that tooltip is the manual version of everything we automate below.

The reason these stages matter is that each one responds to a different fix. A long download window is a payload problem, solved by subsetting and switching to WOFF2 (roughly 30% smaller than WOFF). A long discovery gap is a scheduling problem, solved by preload or by moving the font declaration earlier in the critical path. A long parse-and-reflow tail is usually a font-display or fallback-metric problem, solved by tuning the swap window or matching fallback metrics with size-adjust. If you only measure a single conflated "load time," you cannot tell which of these levers to pull, and you end up guessing. The whole point of instrumenting separate timestamps is to turn a vague "fonts feel slow" complaint into a specific, fixable interval with a number attached.
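The mapping from interval to fix can be sketched as a small helper. This is a minimal illustration, not a definitive classifier: the function name dominantFontPhase and the three phase labels are hypothetical, and it assumes the entry exposes real timestamps (a cross-origin font without Timing-Allow-Origin reports zeros and would be misclassified).

```javascript
// Sketch: name the slowest phase of one font request.
// `entry` is a PerformanceResourceTiming-like object; the labels are illustrative.
function dominantFontPhase(entry) {
  const phases = {
    discovery: entry.fetchStart,                       // time origin -> request start
    wait: entry.responseStart - entry.fetchStart,      // connection setup + time to first byte
    download: entry.responseEnd - entry.responseStart, // first byte -> last byte
  };
  // Sort phases by duration, longest first, and return the label of the longest.
  return Object.entries(phases).sort((a, b) => b[1] - a[1])[0][0];
}

// Browser usage (not executed here):
// performance.getEntriesByType('resource')
//   .filter((e) => /\.woff2?($|\?)/.test(e.name))
//   .forEach((e) => console.log(e.name, dominantFontPhase(e)));
```

A "download" result points at payload size, "discovery" at scheduling, and "wait" at connection or server latency.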

[Figure: Font request lifecycle timeline. Timestamps in order: fetchStart, responseStart, responseEnd, fonts.ready, paint. Intervals between them: network wait, download, parse + face ready, reflow. PerformanceResourceTiming covers fetchStart → responseEnd; document.fonts.ready covers the rest.]
The intervals you can measure separately — and the API that owns each one.

Baseline: The APIs You Will Use

Three browser APIs cover the whole lifecycle. You need all three because none of them sees the full picture alone.

PerformanceResourceTiming gives you network-level timestamps for every fetched font file. Each entry exposes fetchStart, domainLookupStart/End, connectStart/End, requestStart, responseStart, responseEnd, plus transferSize, encodedBodySize, and decodedBodySize. All times are DOMHighResTimeStamp values relative to the navigation start, so they are directly comparable.

The minimal resource-timing read

// Match font files by URL. initiatorType is not a reliable font filter:
// 'css' also covers images fetched from stylesheets, and preloaded fonts report 'link'.
const fontEntries = performance
  .getEntriesByType('resource')
  .filter((e) => /\.woff2?($|\?)/.test(e.name));

for (const e of fontEntries) {
  const transfer = e.responseEnd - e.responseStart; // download time
  const total = e.responseEnd - e.fetchStart;        // request → last byte
  console.log(e.name, { transferMs: transfer.toFixed(1), totalMs: total.toFixed(1) });
}

document.fonts.ready is a Promise on the FontFaceSet that resolves once every font face used in the current layout has finished loading (or failed). It is the cleanest signal for "the page's typography is settled."

Awaiting the FontFaceSet

await document.fonts.ready;
performance.mark('fonts-loaded');

performance.mark() and performance.measure() let you place named timestamps on the same timeline as everything else and then compute named intervals. A performance.mark('fonts-loaded') becomes visible in the DevTools Performance panel and is reportable to a Real User Monitoring (RUM) backend.

Why three APIs instead of one? Each owns a different segment of the lifecycle and is blind to the others. PerformanceResourceTiming knows exactly when bytes moved on the wire but has no concept of whether the font was ever applied to the layout — a preloaded font that no selector uses still produces a perfect-looking resource entry. document.fonts.ready knows the FontFaceSet finished but reports a single boolean-ish moment, not per-file network detail, and it resolves slightly before the reflow paints. performance.mark() knows nothing on its own; it is the glue that lets you stamp application-specific moments — "fallback painted," "fonts swapped," "hydration done" — onto the same DOMHighResTimeStamp clock so you can subtract them. Combine all three and you can attribute every millisecond between request and paint to a named, fixable cause. Use any one alone and you will mis-attribute the delay.

One subtlety worth internalizing: every timestamp these APIs return is a DOMHighResTimeStamp measured from the same time origin (navigation start for the page's document). That shared origin is what makes cross-API subtraction valid. You can take responseEnd from a resource entry, startTime from a paint entry, and startTime from your own mark, and compute differences between them with confidence, because they are all ticks on the same monotonic clock. This is also why you should never mix in Date.now() — wall-clock time can jump backward (NTP correction) and is on a different origin, which corrupts any interval you compute against it.

A note on transferSize semantics, because it trips people up: transferSize is the bytes that crossed the network including response headers, encodedBodySize is the compressed body, and decodedBodySize is the body after decompression. For a font served from cache, transferSize is 0 while decodedBodySize is the full font size — that asymmetry is how you detect a cache hit programmatically. For a cross-origin font without a Timing-Allow-Origin header, all three sizes and the granular timestamps read 0, so you must feature-detect before reporting them as real numbers.
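The size semantics above can be turned into a small classifier. This is a sketch under stated assumptions: the function name classifyFontTransfer and its four labels are hypothetical, and the "opaque" label is only a likely explanation (no Timing-Allow-Origin, or a browser that does not populate the size fields), not a certainty.

```javascript
// Sketch: classify how a font entry was delivered, based on its size fields.
// `entry` is a PerformanceResourceTiming-like object.
function classifyFontTransfer(entry) {
  // Browsers without the size fields leave them undefined.
  if (typeof entry.transferSize !== 'number') return 'unsupported';
  // Nothing crossed the network, but a body was decoded: served from cache.
  if (entry.transferSize === 0 && entry.decodedBodySize > 0) return 'cache-hit';
  // Everything zeroed: most likely cross-origin without Timing-Allow-Origin.
  if (entry.transferSize === 0) return 'opaque';
  return 'network';
}
```

Reporting the label alongside the timing lets a RUM backend exclude "opaque" and "unsupported" entries from size statistics instead of counting them as zero-byte fonts.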

Step-by-Step Measurement Workflow

Work through these in order. Each step ends with a concrete check so you never advance on an assumption. The flow is deliberately diagnosis-first: you capture raw timestamps before you compute anything, you compute a single headline number before you report it, and you only ship to the field once the lab numbers make sense. Skipping ahead — for example wiring up a RUM beacon before you have validated the math against a DevTools waterfall — is the fastest way to fill a dashboard with numbers nobody trusts.

Step 1 — Capture the resource entry for your critical font

Run the resource-timing snippet above in the console on a cold load (disable cache in DevTools first). Identify the WOFF2 file that backs your above-the-fold text.

Verification: you should see one entry per font weight you actually use. If a weight you did not expect appears, an unused @font-face is being requested — remove it. If a weight is missing, the browser never needed it (good) or your selector never matched (investigate).

Step 2 — Quantify download vs. discovery latency

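For the critical entry, compute three numbers: fetchStart itself (how long after navigation start the browser discovered and requested the font), responseStart - fetchStart (connection setup plus time to first byte), and responseEnd - responseStart (download).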

Verification: if fetchStart is much later than the navigation's responseEnd for the HTML, the font was discovered late. Add a preload and re-measure — fetchStart should drop toward the top of the waterfall. The dedicated guide on preloading critical fonts without blocking LCP covers the priority trade-offs.
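The late-discovery check in this step can be sketched as follows. The helper names discoveryGapMs and isLateDiscovery are hypothetical, and the 100 ms threshold is an arbitrary assumption to tune for your own pages, not a recommended value.

```javascript
// Sketch: how long after the HTML finished arriving was the font requested?
// navEntry: PerformanceNavigationTiming-like; fontEntry: PerformanceResourceTiming-like.
function discoveryGapMs(navEntry, fontEntry) {
  return fontEntry.fetchStart - navEntry.responseEnd;
}

// Hypothetical threshold: treat a gap beyond 100 ms as "late".
function isLateDiscovery(navEntry, fontEntry, thresholdMs = 100) {
  return discoveryGapMs(navEntry, fontEntry) > thresholdMs;
}

// Browser usage (not executed here):
// const nav = performance.getEntriesByType('navigation')[0];
// const font = performance.getEntriesByType('resource')
//   .find((e) => /\.woff2?($|\?)/.test(e.name));
// console.log(discoveryGapMs(nav, font), isLateDiscovery(nav, font));
```

A negative gap means the font request started before the HTML finished downloading, which is what a working preload tends to produce.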

Step 3 — Mark the fonts-loaded moment

Add the await document.fonts.ready; performance.mark('fonts-loaded') snippet to your application bootstrap, then performance.measure('font-block', 'navigationStart', 'fonts-loaded') — or measure from your first paint mark.

Verification: open the DevTools Performance panel, record a load, and confirm a fonts-loaded marker appears in the Timings track. Its position relative to First Contentful Paint (FCP) tells you how long users saw fallback glyphs.
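A minimal sketch of the mark-and-measure step, written so the performance object is passed in. The function name recordFontsLoaded is hypothetical. Omitting the start mark in measure() measures from the time origin, which is an alternative to naming 'navigationStart' explicitly.

```javascript
// Sketch: stamp the fonts-loaded moment and return the time from the
// time origin to that moment, in milliseconds.
function recordFontsLoaded(perf) {
  perf.mark('fonts-loaded');
  // With no start mark, the measure begins at the time origin (0).
  perf.measure('font-block', undefined, 'fonts-loaded');
  const measures = perf.getEntriesByName('font-block');
  return measures[measures.length - 1].duration;
}

// Browser usage (not executed here):
// document.fonts.ready.then(() => {
//   console.log('font-block ms:', recordFontsLoaded(performance));
// });
```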

Step 4 — Correlate with the swap, not just the load

document.fonts.ready resolving is necessary but not sufficient for the visual swap — the browser still has to reflow. To approximate the paint, add a class on ready and timestamp it; readers measuring the visible flash specifically should follow measuring FOUT duration in the field.

Verification: in the Rendering drawer enable Paint flashing; reloading should flash the text region exactly once when the swap occurs, confirming a single reflow rather than a cascade.
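One way to approximate the paint is sketched below, assuming a double requestAnimationFrame as a stand-in for "the frame after the class change has been painted". That is a common heuristic, not an exact paint timestamp. The function name markFontSwap and the mark names are hypothetical; the fonts-loaded class name follows the one used later in this guide.

```javascript
// Sketch: add the class that triggers the swap, then mark the approximate paint.
// root: an element with classList; perf: performance-like; raf: schedules a frame callback.
function markFontSwap(root, perf, raf) {
  root.classList.add('fonts-loaded'); // triggers the restyle and reflow
  perf.mark('fonts-class-added');
  // The first callback runs before the next frame is painted; the second runs
  // one frame later, so the class change has been through a paint by then.
  raf(() => raf(() => perf.mark('fonts-swap-painted')));
}

// Browser usage (not executed here):
// document.fonts.ready.then(() =>
//   markFontSwap(document.documentElement, performance,
//     (cb) => requestAnimationFrame(cb)));
```

The gap between the two marks is a rough estimate of the reflow-to-paint cost on that device.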

Step 5 — Compute a single, reportable number

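Derive swapDelay = fontsLoadedMark - firstContentfulPaint. This is the headline metric: how many milliseconds the user spent reading fallback type. A zero or negative result means the fonts were ready by first paint, so the user saw no fallback text at all.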

Verification: log the value and sanity-check it against the Network waterfall — swapDelay should never be smaller than responseEnd - FCP for the critical font.
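The derivation and its sanity check can be sketched together. The function name computeSwapDelay and the returned field names are hypothetical, and clamping negative values to zero is an assumption about how you want to report fonts that were ready before first paint.

```javascript
// Sketch: compute swapDelay and check it against the waterfall floor.
// fcpTime and fontsLoadedTime are startTime values on the performance clock;
// criticalFont is the PerformanceResourceTiming-like entry for the critical font.
function computeSwapDelay(fcpTime, fontsLoadedTime, criticalFont) {
  const raw = fontsLoadedTime - fcpTime;
  const floor = criticalFont.responseEnd - fcpTime; // fonts cannot be ready before the last byte
  return {
    swapDelayMs: Math.max(0, raw), // assumption: report "ready before FCP" as 0
    plausible: raw >= floor,
  };
}
```

A result with plausible set to false means the mark fired before the critical font finished downloading, which usually indicates the wrong file was picked as "critical" or the mark was placed too early.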

Step 6 — Ship the measurement to the field

Wrap steps 1, 3, and 5 in a small reporter that posts to your analytics endpoint via navigator.sendBeacon(). Use PerformanceObserver to track font load time with buffered: true so you do not miss entries that fired before your script ran.

Verification: trigger the beacon, then confirm the payload arrives in your RUM dashboard with a non-null swapDelay from at least one real session.
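A sketch of the observer half of the reporter follows. The names FONT_URL, toFontRecord, and observeFonts are hypothetical, and the onRecord callback stands in for whatever batching and sendBeacon() call your reporter uses; no endpoint is assumed here.

```javascript
const FONT_URL = /\.woff2?($|\?)/;

// Reduce a resource entry to the fields worth sending to the field.
function toFontRecord(entry) {
  return {
    name: new URL(entry.name).pathname.split('/').pop(), // file name without query string
    totalMs: +(entry.responseEnd - entry.fetchStart).toFixed(1),
  };
}

// Observe font resources, replaying entries that fired before this script ran.
function observeFonts(onRecord) {
  if (typeof PerformanceObserver === 'undefined') return null;
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (FONT_URL.test(entry.name)) onRecord(toFontRecord(entry));
    }
  });
  observer.observe({ type: 'resource', buffered: true });
  return observer;
}

// Browser usage (not executed here):
// observeFonts((record) => console.log(record));
```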

Lab vs. Field: Two Different Truths

A measurement taken on your laptop over office wifi (lab data) and a measurement aggregated from real users (field data) answer different questions, and conflating them is the most common analysis mistake.

| Dimension   | Lab (synthetic)                          | Field (RUM)                                  |
|-------------|------------------------------------------|----------------------------------------------|
| Tooling     | DevTools, Lighthouse, WebPageTest        | PerformanceObserver + beacon to RUM          |
| Network     | Throttled profile you choose             | Real device + connection diversity           |
| Cache state | Controllable (cold/warm)                 | Mostly warm; partitioned per top-level site  |
| Sample size | One run (or a few)                       | Thousands of sessions; use p75/p95           |
| Best for    | Debugging a regression, reproducing      | Knowing the true user-experienced delay      |
| Blind spot  | Misses slow real networks & old devices  | Can't single-step a root cause               |

The practical rule: diagnose in the lab, decide with the field. A lab run tells you why a font is slow; the field p75 tells you whether it matters to enough users to fix. Note that HTTP cache has been partitioned by top-level site since Chrome 86 / Firefox 85, so field cache-hit rates for fonts on shared CDNs are far lower than a naive lab test suggests — the old assumption that a popular font is "probably already cached from another site" is no longer true, and your field numbers will reflect that even when your warm-cache lab test does not.

A second reason to separate the two: the lab measures a device you chose, while the field measures the devices you actually have. Your development laptop has a fast CPU, so font parsing and reflow cost almost nothing; a mid-tier Android phone may spend tens of milliseconds just decoding and shaping the same WOFF2. That CPU cost lands in the gap between responseEnd and the visible swap — precisely the interval your resource-timing instrumentation cannot see and only the fonts.ready-to-paint measurement captures. If your lab numbers look great but field swapDelay at p75 is stubbornly high, suspect device CPU and reflow cost, not the network.

Finally, treat lab and field as a loop, not a one-time check. When the field p75 for swapDelay regresses, reproduce it in the lab with a matching throttling profile, isolate the offending interval, ship a fix, and then confirm the field metric recovers over the following days as the change propagates to real sessions. Without the field half of that loop you cannot know whether a lab improvement reached anyone; without the lab half you cannot find the cause.
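For the field half of the loop, the percentile aggregation might look like the sketch below. It uses the nearest-rank definition of a percentile, which is one of several common definitions and an assumption here; your RUM backend may interpolate instead. The function name percentile is hypothetical.

```javascript
// Sketch: nearest-rank percentile over a list of swapDelay samples (ms).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

// Example: one slow session dominates the mean but not the p75.
const samples = [120, 80, 2000, 150];
const p75 = percentile(samples, 75); // 150
const p95 = percentile(samples, 95); // 2000
```

With these four samples the mean is 587.5 ms, a value no session actually experienced, while p75 and p95 describe the typical and the slow-tail user separately.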

Connecting Font Timing to Core Web Vitals

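The reason any of this measurement matters is that font timing feeds directly into two of the three Core Web Vitals. When a web font backs your largest text block, the font's arrival can gate Largest Contentful Paint: if the text is held invisible while the font loads (font-display: block, or the browser's default block period), the LCP candidate does not paint until the font arrives, so a late font can push LCP past the 2.5s "good" threshold. With font-display: swap the fallback text paints first and typically counts as the LCP candidate, so the swap itself costs layout stability rather than LCP. In the blocking case, your swapDelay metric is a leading indicator of an LCP problem: shorten the delay and LCP improves with it.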

The second link is to Cumulative Layout Shift. Every swap from fallback to web font that changes the rendered text dimensions produces a layout shift, and if your fallback metrics are unmatched, that shift can be large enough to break the 0.1 CLS budget on its own. This is where measurement and mitigation meet: instrument the swap (as in the field-FOUT workflow), then close the gap with size-adjust, ascent-override, and descent-override on the fallback @font-face so the swap is dimensionally invisible. The diagnostic loop is "measure the swap timing, observe the CLS it causes, match the metrics, re-measure CLS to confirm it dropped." Interaction to Next Paint (INP, target under 200ms) is largely insulated from font timing, but a font-triggered reflow landing during an interaction can still cost you — which is one more reason to know exactly when your reflows happen.
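To observe the shift a swap causes, one rough approach is to sum layout-shift entries that land shortly after the swap mark. This is a heuristic sketch, not the official CLS calculation (which uses session windows): the function name shiftNearSwap and the 500 ms window are assumptions, and the layout-shift entry type is only reported by browsers that implement the Layout Instability API.

```javascript
// Sketch: total layout-shift score within a window after the font swap.
// shiftEntries: layout-shift-like entries ({ startTime, value, hadRecentInput }).
function shiftNearSwap(shiftEntries, swapTime, windowMs = 500) {
  return shiftEntries
    .filter((s) => !s.hadRecentInput) // user-initiated shifts do not count
    .filter((s) => s.startTime >= swapTime && s.startTime <= swapTime + windowMs)
    .reduce((sum, s) => sum + s.value, 0);
}

// Browser usage (not executed here):
// const shifts = [];
// new PerformanceObserver((list) => shifts.push(...list.getEntries()))
//   .observe({ type: 'layout-shift', buffered: true });
// const swap = performance.getEntriesByName('fonts-loaded')[0];
// console.log(shiftNearSwap(shifts, swap.startTime));
```

Run it before and after matching the fallback metrics; the score attributed to the swap window should drop toward zero.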

The practical takeaway: do not measure font timing as an isolated vanity metric. Tie each number to the vital it influences. A swapDelay figure is only actionable when you know whether the affected text is your LCP element (then it is an LCP problem) and whether the swap shifts layout (then it is a CLS problem). Pages where the web font backs neither the LCP element nor any above-the-fold block can tolerate a longer swap delay than your dashboard's red threshold might suggest.

Browser Compatibility Matrix

| API / Feature                  | Chrome | Firefox | Safari | Notes                                                        |
|--------------------------------|--------|---------|--------|--------------------------------------------------------------|
| PerformanceResourceTiming      | 25+    | 35+     | 11+    | Universally available; the workhorse                         |
| transferSize / encodedBodySize | 64+    | 71+     | 15.4+  | Older Safari returns 0; feature-detect                       |
| PerformanceObserver (resource) | 52+    | 57+     | 11+    | Use buffered: true for early entries                         |
| document.fonts.ready           | 35+    | 41+     | 10+    | Resolves on load or failure                                  |
| document.fonts.check()         | 35+    | 41+     | 10+    | Synchronous "is this face usable now"                        |
| performance.mark/measure       | 28+    | 41+     | 11+    | Options-object form of measure() (start/end): Chrome 78+     |
| navigator.sendBeacon           | 39+    | 31+     | 11.1+  | Survives page unload for field reporting                     |

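Always feature-detect before relying on an API or a size field. A common defensive guard for the FontFaceSet is if ('fonts' in document && document.fonts.ready); for sizes, check that transferSize is a number before treating it as real data.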
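The matrix above can be collapsed into a single capability check. This is a sketch: the function name detectFontTimingSupport and the returned keys are hypothetical, and the global object is passed in as a parameter so the check can be exercised outside a browser.

```javascript
// Sketch: report which of the measurement APIs are usable in this environment.
// `win` is the global object (window in a browser).
function detectFontTimingSupport(win) {
  const perf = win.performance;
  return {
    resourceTiming: !!perf && typeof perf.getEntriesByType === 'function',
    userTiming: !!perf && typeof perf.mark === 'function' && typeof perf.measure === 'function',
    fontFaceSet: !!win.document && 'fonts' in win.document,
    observer: typeof win.PerformanceObserver === 'function',
    beacon: !!win.navigator && typeof win.navigator.sendBeacon === 'function',
  };
}

// Browser usage (not executed here):
// const support = detectFontTimingSupport(window);
// if (support.fontFaceSet && support.userTiming) { /* safe to mark fonts-loaded */ }
```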

Code Examples

Full instrumentation: timing + mark + field beacon

function reportFontTiming() {
  // Bail out where Resource Timing or the FontFaceSet is unavailable.
  if (!('performance' in window) || !('getEntriesByType' in performance)) return;
  if (!('fonts' in document)) return;

  document.fonts.ready.then(() => {
    performance.mark('fonts-loaded');

    // FCP may be missing if fonts settled before the first paint was recorded.
    const fcp = performance
      .getEntriesByType('paint')
      .find((p) => p.name === 'first-contentful-paint');

    const loaded = performance.getEntriesByName('fonts-loaded')[0];
    const fonts = performance
      .getEntriesByType('resource')
      .filter((e) => /\.woff2?($|\?)/.test(e.name))
      .map((e) => ({
        name: e.name.split('/').pop(),
        transferMs: +(e.responseEnd - e.responseStart).toFixed(1),
        // null (not 0) when sizes are unavailable, so aggregates can skip it.
        bytes: e.transferSize || e.encodedBodySize || null,
      }));

    const payload = {
      swapDelayMs: fcp && loaded ? +(loaded.startTime - fcp.startTime).toFixed(1) : null,
      fonts,
    };

    navigator.sendBeacon?.('/rum/font-timing', JSON.stringify(payload));
  });
}
reportFontTiming();

Checking a specific face synchronously

// Is the exact face we need already loaded and usable? No promise required.
if (document.fonts.check('700 16px "Inter"')) {
  document.documentElement.classList.add('inter-bold-ready');
}

Common Pitfalls

  • Treating loadEventEnd as font-ready. The window load event can fire before or after fonts settle depending on font-display; it is not a font signal. Use document.fonts.ready.
  • Forgetting buffered: true. A PerformanceObserver registered after the font already loaded sees nothing unless you replay buffered entries — you silently lose your fastest, most-cached sessions.
  • Reading transferSize without feature-detecting. Older Safari and any cross-origin response without Timing-Allow-Origin report 0, skewing your size aggregates toward zero.
  • Measuring on a warm cache. A second reload reads the font from disk in single-digit milliseconds, making any change look like an improvement. Always disable cache for lab diagnosis.
  • Reporting a mean instead of a percentile. Font delay is heavily right-skewed, with a long slow tail; the mean hides the slow-network users you most need to help. Aggregate field data at p75/p95.
  • Confusing fonts.ready with the visual swap. The promise resolves before the reflow paints. For the user-perceived flash, timestamp the class swap instead.
  • Subtracting Date.now() from a DOMHighResTimeStamp. The two are on different time origins and different clocks; mixing them yields nonsense intervals that can even go negative. Stay on the performance clock end to end.
  • Aggregating sizes when some entries are zeroed. A cross-origin font without Timing-Allow-Origin reports transferSize: 0; averaging it in drags your "median font size" toward zero and hides a real payload problem. Filter zeroed entries out of size stats.
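The last pitfall can be avoided with a size aggregate that skips zeroed entries. This is a minimal sketch: the function name medianFontBytes is hypothetical, and falling back from transferSize to encodedBodySize (so cache hits still contribute a payload size) is an assumption about what you want the statistic to mean.

```javascript
// Sketch: median font payload in bytes, ignoring entries whose sizes are zeroed.
function medianFontBytes(entries) {
  const sizes = entries
    .map((e) => e.transferSize || e.encodedBodySize || 0)
    .filter((bytes) => bytes > 0) // drop opaque or unsupported entries
    .sort((a, b) => a - b);
  if (sizes.length === 0) return null;
  const mid = Math.floor(sizes.length / 2);
  // Odd count: middle value. Even count: mean of the two middle values.
  return sizes.length % 2 ? sizes[mid] : (sizes[mid - 1] + sizes[mid]) / 2;
}
```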

Frequently Asked Questions

Why is document.fonts.ready resolving but my text still looks unstyled?

The promise resolves when the FontFaceSet has no loads pending, but if no element on the page currently requests that family (for example, it is set on a not-yet-rendered component), the set can settle without that face ever being applied. Confirm with document.fonts.check() using the exact font shorthand string (weight, size, family, as in '700 16px "Inter"'), and ensure an element actually uses the family.

Should I measure font timing in the lab or with RUM?

Both, for different reasons. Use lab tooling (DevTools, Lighthouse, WebPageTest) to reproduce and root-cause a slow font; use RUM to learn the real distribution across devices and networks. A change that looks great on throttled lab "Fast 3G" can still leave p95 field users with a long swap delay.

Does PerformanceResourceTiming include the font for cross-origin CDNs?

The entry exists, but the detailed timestamps and transferSize are zeroed unless the font response sends a Timing-Allow-Origin header that names your origin (or *). Self-hosting your fonts removes this blind spot entirely and is one reason teams move fonts in-house.

How do I measure the time the user actually sees fallback text?

Compute the gap between First Contentful Paint and your fonts-loaded mark, then refine it by timestamping the moment you add a fonts-loaded class (which triggers the reflow). The dedicated field-measurement guide below walks this end to end.

Related