Frontend UX, Chunking & Progress Tracking: Engineering Guide

Uploading a 4 GB video from a flaky mobile connection is not a single HTTP request — it is a long-running, interruptible process that must survive tab refreshes, dropped Wi-Fi, expired tokens, and the user wandering out of cellular range. This guide treats the browser-side upload as three cooperating subsystems: a chunker that slices the file and feeds bytes through a bounded concurrency window, a progress channel that reconciles client byte counts with server-authoritative acknowledgements, and a retry-and-resume loop that turns transient failures into checkpoints instead of restarts. Get those three right and a transfer that loses its connection at 87% picks up at 87% — not at zero.

The patterns here assume you have already solved the storage contract: clients talk to object storage through S3 presigned URL workflows or a tus-style endpoint, and the backend follows direct-to-cloud upload patterns so that application servers never buffer whole files. Everything in this guide lives on the client and the thin progress API in front of it.

Architecture overview

A resilient upload is a loop, not a pipeline. The chunker emits fixed-size slices; each slice is uploaded under a concurrency limit; every success advances a durable offset; every failure is classified and either retried with backoff or surfaced as fatal. A separate progress channel reads the same offset state and pushes throttled updates to the UI, while a server-push channel reports backend processing progress that the client cannot observe directly.

The upload as a loop: chunker feeds a concurrency window, successes advance a durable offset, failures recirculate through retry/resume, and two channels feed the UI.
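The chunker and concurrency window described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `planChunks`, `runWithLimit`, and `ChunkRange` are hypothetical names, and the worker is assumed to upload one range (for example by calling `file.slice(range.start, range.end)` and sending the result).

```typescript
// Hypothetical names throughout; a sketch under the assumptions stated above.
export interface ChunkRange {
  index: number;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

/** Split [0, fileSize) into fixed-size ranges; the last one may be shorter. */
export function planChunks(fileSize: number, chunkSize: number): ChunkRange[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 0; start < fileSize; start += chunkSize, index += 1) {
    ranges.push({ index, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}

/** Run `worker` over `items` with at most `limit` tasks in flight at once. */
export async function runWithLimit<T>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;
  // Each lane pulls the next unclaimed item until the list is exhausted.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next] as T;
      next += 1;
      await worker(item);
    }
  };
  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i += 1) lanes.push(lane());
  await Promise.all(lanes);
}
```

If one worker rejects, `Promise.all` rejects immediately while the other lanes keep draining, so in practice the worker would wrap its own retry logic and only throw on a fatal error.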

The single most important design decision is where truth lives. Client-side byte counters tell you what you sent, not what the server committed — and the gap between those two numbers is exactly the data you must not retransmit. Every robust upload keeps an authoritative offset (a byte position, a set of completed part numbers, or a tus Upload-Offset) and treats the network as untrusted in between.
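With several chunks in flight, acknowledgements arrive out of order, so a set of completed part numbers and a single byte offset are not the same thing. One way to derive a safe offset from the set, assuming zero-based chunk indexes and a fixed chunk size (`contiguousOffset` is a hypothetical helper):

```typescript
/**
 * Largest byte offset such that every chunk before it has been confirmed.
 * `completed` holds zero-based chunk indexes acknowledged by the server.
 */
export function contiguousOffset(
  completed: ReadonlySet<number>,
  chunkSize: number,
  fileSize: number,
): number {
  let index = 0;
  while (completed.has(index)) index += 1; // stop at the first gap
  return Math.min(index * chunkSize, fileSize);
}
```

Chunks confirmed beyond the first gap are still worth remembering so they are not re-sent, but only the contiguous prefix can be reported as a single resumable offset.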

Cross-cutting concerns

State durability. If the only record of progress lives in JavaScript heap, a refresh or crash destroys it. Persist the upload session — file fingerprint, chunk size, completed offsets, and the server-side upload identifier — to IndexedDB after every confirmed chunk. On reload, you reconcile that record against the server before sending a single byte. This is the foundation of resumable upload state machines.
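As a rough sketch of what that persisted record might look like, the block below defines a session shape and a tiny store contract. Every name is an assumption made for illustration; the in-memory store is only a stand-in for tests, and a real client would back the same interface with IndexedDB.

```typescript
// All names here are hypothetical; the shape is a sketch, not a fixed schema.
export interface UploadSession {
  fingerprint: string;      // identifies the local file
  uploadId: string;         // server-side upload identifier
  chunkSize: number;        // bytes; must not change across a resume
  completedParts: number[]; // zero-based chunk indexes, kept sorted
}

// Minimal async contract. An IndexedDB-backed adapter would implement the same
// three methods; the in-memory version below is only a stand-in for tests.
export interface SessionStore {
  get(key: string): Promise<UploadSession | undefined>;
  put(key: string, session: UploadSession): Promise<void>;
  remove(key: string): Promise<void>;
}

export class MemorySessionStore implements SessionStore {
  private readonly rows = new Map<string, string>();

  async get(key: string): Promise<UploadSession | undefined> {
    const raw = this.rows.get(key);
    return raw === undefined ? undefined : (JSON.parse(raw) as UploadSession);
  }

  async put(key: string, session: UploadSession): Promise<void> {
    this.rows.set(key, JSON.stringify(session)); // store a copy, as a database would
  }

  async remove(key: string): Promise<void> {
    this.rows.delete(key);
  }
}

/** Record one server-confirmed chunk and persist it before returning. */
export async function confirmPart(
  store: SessionStore,
  key: string,
  part: number,
): Promise<UploadSession> {
  const session = await store.get(key);
  if (!session) throw new Error(`no session for ${key}`);
  if (session.completedParts.indexOf(part) === -1) {
    session.completedParts.push(part);
    session.completedParts.sort((a, b) => a - b);
  }
  await store.put(key, session);
  return session;
}
```

Calling `confirmPart` only after the server acknowledges a chunk keeps the persisted record from ever claiming more than the server holds.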

Network resilience. Treat the network as hostile by default. A 503 is retryable; a 400 validation error is not. A dropped socket mid-chunk is recoverable; a 409 Conflict (the status tus returns when the request's Upload-Offset does not match the server's) or a 412 Precondition Failed on a conditional request means your offset assumption is stale and you must re-handshake. Distinguishing these classes is the whole job of upload error recovery patterns, and getting the classification wrong produces either retry storms or silent data loss.
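The three classes in that paragraph (retry as-is, re-handshake, give up) can be made explicit in code. A minimal sketch, assuming the server signals a stale offset with 409 or 412; `FailureClass` and `classifyFailure` are hypothetical names, and the exact status mapping depends on your backend:

```typescript
export type FailureClass = "retry" | "resync" | "fatal";

/** `status` is the HTTP status, or null when no response arrived at all. */
export function classifyFailure(status: number | null): FailureClass {
  if (status === null) return "retry";                    // dropped socket, DNS failure, etc.
  if (status === 409 || status === 412) return "resync";  // assumed: offset is stale
  if (status === 408 || status === 429 || status >= 500) return "retry";
  return "fatal";                                         // remaining 4xx: fix the request
}
```

Keeping "resync" separate from "retry" matters: blindly re-sending the same chunk after an offset mismatch would fail the same way every time.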

Perceived performance. Users judge an upload by the smoothness of the progress bar, not by raw throughput. A bar that jumps from 30% to 100% feels broken even when the transfer was fast, and a bar that freezes at 99% while the server transcodes feels stalled. Separating transfer progress (client-observable) from processing progress (server-push only) and throttling UI writes to animation frames is the domain of real-time upload progress events.

Resumable upload state machines

Modeling an upload as an implicit set of booleans (isUploading, isPaused, hasError) produces impossible states — paused and uploading, errored and complete — and the bugs that follow. An explicit finite state machine makes illegal transitions unrepresentable: an upload is in exactly one of idle, uploading, paused, retrying, completed, or failed, and only declared edges move between them. The machine owns the durable offset and the resume handshake, so the rest of the UI just reads state.

type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";

type UploadEvent =
  | { type: "START" } | { type: "PAUSE" } | { type: "RESUME" }
  | { type: "CHUNK_OK" } | { type: "ERROR"; fatal: boolean }
  | { type: "ALL_DONE" } | { type: "RETRY_NOW" };

const transitions: Record<UploadState, Partial<Record<UploadEvent["type"], UploadState>>> = {
  idle:      { START: "uploading" },
  uploading: { CHUNK_OK: "uploading", PAUSE: "paused", ERROR: "retrying", ALL_DONE: "completed" },
  paused:    { RESUME: "uploading" },
  retrying:  { RETRY_NOW: "uploading", ERROR: "failed", PAUSE: "paused" },
  completed: {},
  failed:    { START: "uploading" }, // allow manual restart
};

export function nextState(current: UploadState, event: UploadEvent): UploadState {
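  // A fatal ERROR lands in `failed` from any active state. `idle` and
  // `completed` have nothing in flight, so they fall through and ignore it.
  const active = current !== "idle" && current !== "completed";
  if (event.type === "ERROR" && event.fatal && active) return "failed";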
  const target = transitions[current][event.type];
  if (!target) {
    console.warn(`Ignored ${event.type} in state ${current}`);
    return current; // no legal edge: stay put, never crash
  }
  return target;
}

Because the machine is data, you can render it, test every edge in isolation, and persist current alongside the offset. The deep-dive on resumable upload state machines covers the resume handshake (HEAD with Range, or the tus Upload-Offset round trip) and the IndexedDB persistence schema that lets a reloaded tab re-enter uploading at the correct byte.
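Once the handshake returns the server's committed offset, the client still has to turn that number into "which chunk do I send next". A small sketch for part-based uploads, where a partially received part must be re-sent whole (an assumption; a tus endpoint could instead continue from the exact byte). `planResume` and `ResumePlan` are hypothetical names:

```typescript
export interface ResumePlan {
  done: boolean;          // true when the server already holds every byte
  nextChunkIndex: number; // zero-based index of the first chunk to send
  startByte: number;      // byte offset where that chunk begins
}

export function planResume(
  serverOffset: number,
  chunkSize: number,
  fileSize: number,
): ResumePlan {
  if (serverOffset < 0 || serverOffset > fileSize) {
    throw new RangeError("server offset is outside the file");
  }
  if (serverOffset === fileSize) {
    return { done: true, nextChunkIndex: Math.ceil(fileSize / chunkSize), startByte: fileSize };
  }
  // Round down to a chunk boundary so a half-received part is re-sent in full.
  const nextChunkIndex = Math.floor(serverOffset / chunkSize);
  return { done: false, nextChunkIndex, startByte: nextChunkIndex * chunkSize };
}
```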

Real-time upload progress events

Transfer progress and processing progress are different signals with different sources. Bytes leaving the browser are observable through XMLHttpRequest’s upload.onprogress (the Fetch API still cannot report request-body upload progress in most browsers, which is why XHR survives here). Server-side work — virus scanning, transcoding, thumbnail generation — is invisible to the client and must be pushed back over Server-Sent Events or a WebSocket. The UI aggregates both into one honest bar.
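One simple way to fold the two signals into a single bar is a fixed weighting. The sketch below assumes both inputs are fractions in [0, 1]; the 80/20 default split and the name `combinedProgress` are arbitrary illustrative choices, not something the guide prescribes.

```typescript
/**
 * Blend transfer and processing progress into one fraction in [0, 1].
 * `transferWeight` is the share of the bar given to the byte transfer.
 */
export function combinedProgress(
  transfer: number,
  processing: number,
  transferWeight = 0.8,
): number {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));
  const weight = clamp(transferWeight);
  return clamp(transfer) * weight + clamp(processing) * (1 - weight);
}
```

With this split the bar cannot reach 100% until the server reports processing as finished, which avoids claiming completion on the last successful chunk.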

interface ChunkProgress { index: number; loaded: number; total: number; }

export function uploadChunkWithProgress(
  url: string,
  blob: Blob,
  index: number,
  onProgress: (p: ChunkProgress) => void,
): Promise<void> {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("PUT", url);
    xhr.upload.onprogress = (e) => {
      if (e.lengthComputable) {
        onProgress({ index, loaded: e.loaded, total: e.total });
      }
    };
    xhr.onload = () =>
      xhr.status >= 200 && xhr.status < 300
        ? resolve()
        : reject(new Error(`HTTP ${xhr.status}`));
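    xhr.onerror = () => reject(new Error("network error"));
    // Without these two handlers an aborted or timed-out request never settles.
    xhr.onabort = () => reject(new Error("aborted"));
    xhr.ontimeout = () => reject(new Error("timeout"));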
    xhr.send(blob);
  });
}

The hard parts are throttling (firing a React state update on every progress event can saturate the main thread, so coalesce to requestAnimationFrame) and aggregation (summing loaded across N concurrent chunks against the file’s true total). Both are worked through in real-time upload progress events, including when to choose Server-Sent Events versus WebSockets for the server-push leg.
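Both problems fit in a short sketch. `ProgressAggregator` and `coalesce` are hypothetical names, and the scheduler is injected so the same code works with `requestAnimationFrame` in the browser or a manual queue in a test.

```typescript
export class ProgressAggregator {
  private readonly loadedByChunk = new Map<number, number>();
  private readonly totalBytes: number;

  constructor(totalBytes: number) {
    this.totalBytes = totalBytes;
  }

  /** Record the latest `loaded` value for one chunk; later events overwrite earlier ones. */
  update(index: number, loaded: number): void {
    this.loadedByChunk.set(index, loaded);
  }

  /** Fraction of the whole file sent so far, in [0, 1]. */
  fraction(): number {
    if (this.totalBytes <= 0) return 1;
    let sum = 0;
    this.loadedByChunk.forEach((loaded) => { sum += loaded; });
    return Math.min(1, sum / this.totalBytes);
  }
}

type Schedule = (callback: () => void) => void;

/** Wrap `write` so that many calls before the next scheduled tick produce one write. */
export function coalesce<T>(
  write: (value: T) => void,
  schedule: Schedule,
): (value: T) => void {
  let latest: { value: T } | null = null;
  let scheduled = false;
  return (value: T): void => {
    latest = { value }; // keep only the newest value
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      scheduled = false;
      if (latest) write(latest.value);
    });
  };
}
```

In a browser the scheduler would typically be `(cb) => requestAnimationFrame(cb)`, and the wrapped function would be called from each chunk's `onProgress` with `aggregator.fraction()`. Because `update` overwrites per chunk, a retried chunk that restarts from zero pulls the total back down instead of double-counting.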

Upload error recovery patterns

Left alone, fetch does no recovery at all: it rejects on a network failure, resolves normally on a 4xx or 5xx response, and never retries either. A resilient client instead classifies the failure, decides whether it is retryable, and, if so, waits a jittered, exponentially growing interval before retrying the same chunk against the same offset. Retries must be idempotent: re-PUTting part 7 to the same byte range produces the same object, so duplicate delivery is harmless. The browser’s online/offline events let you pause the whole machine when the network drops instead of burning the retry budget against a dead link.

const FATAL_STATUS = new Set([400, 401, 403, 404, 422]);

export function isRetryable(status: number | null): boolean {
  if (status === null) return true;          // network error, no response
  if (FATAL_STATUS.has(status)) return false; // client/auth/validation
  return status >= 500 || status === 408 || status === 429;
}

export function backoffDelay(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  const jitter = Math.random() * exp; // full jitter avoids thundering herd
  return Math.floor(jitter);
}

export async function withRetry<T>(
  task: () => Promise<T>,
  classify: (err: unknown) => boolean,
  maxAttempts = 6,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await task();
    } catch (err) {
      attempt += 1;
      if (attempt >= maxAttempts || !classify(err)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}

Full-jitter backoff, online/offline checkpointing, and the 412/409 handshake that re-syncs a stale offset are detailed in upload error recovery patterns. The recovery loop and the state machine share the same durable offset, which is why a recovered upload never re-sends committed bytes.
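The online/offline checkpoint can be modeled as a gate that the retry loop awaits before each attempt. This is a sketch with a hypothetical `Gate` class; wiring it to the browser's `online` and `offline` events is left to the caller.

```typescript
export class Gate {
  private isOpen = true;
  private waiters: Array<() => void> = [];

  /** Block future `wait()` calls, e.g. from an `offline` event handler. */
  close(): void {
    this.isOpen = false;
  }

  /** Release everyone waiting, e.g. from an `online` event handler. */
  open(): void {
    this.isOpen = true;
    const pending = this.waiters;
    this.waiters = [];
    pending.forEach((resume) => resume());
  }

  /** Resolves immediately while open, otherwise when `open()` is next called. */
  wait(): Promise<void> {
    if (this.isOpen) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}
```

In a browser this might be wired as `window.addEventListener("offline", () => gate.close())` and `window.addEventListener("online", () => gate.open())`, with `await gate.wait()` placed at the top of each retry attempt so time spent offline does not consume the attempt budget.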

Decision matrix

Choose the transport and progress strategy by file size, network volatility, and whether the backend does post-upload processing.

Scenario	Transport	Progress channel	Resume strategy	Concurrency
Small file (<5 MB), stable network	Single XHR `PUT`	`upload.onprogress`	Restart on failure	1
Large file, lossy mobile network	Chunked multipart / tus	`upload.onprogress` aggregated	Durable offset + `HEAD`/`Range`	3–4
Long server-side processing	Chunked + control channel	SSE for processing progress	Durable offset	3–4
Bidirectional control (pause/cancel from server)	Chunked + WebSocket	WebSocket frames	Durable offset + tus `Upload-Offset`	4
Background / cross-session resume	tus protocol	Poll `HEAD` then SSE	tus `Upload-Offset` + IndexedDB	2–4

Common failure modes

Stalled progress at 99%. The transfer finished but the bar waits on a server confirmation that never arrives, or the UI conflates transfer with processing. Root cause: the client marks “complete” on the last 200 instead of on a server-authoritative completion event. Fix: only enter completed when the assembly/processing channel reports done; show a distinct “Processing…” phase for the gap.
Duplicate chunks / corrupted assembly. A retried chunk lands twice and the server appends both. Root cause: non-idempotent chunk handling keyed by arrival order rather than byte offset or part number. Fix: key every chunk by (uploadId, partNumber) or absolute byte range so re-delivery overwrites rather than appends; verify the final ETag/checksum.
Lost state on refresh. The user reloads at 60% and the upload restarts from zero. Root cause: session state held only in memory. Fix: persist offset and uploadId to IndexedDB after each confirmed chunk and reconcile via a HEAD handshake on reload.
Retry storms (429/503 cascade). Every chunk retries on a fixed interval simultaneously, re-saturating the failing endpoint. Root cause: synchronized, un-jittered backoff. Fix: full-jitter exponential backoff plus an online gate so the loop sleeps while offline.
Frozen UI under fast networks. A gigabit link fires thousands of progress events and React re-renders on each. Root cause: unthrottled state writes. Fix: coalesce progress into a single requestAnimationFrame write per frame.

FAQ

Should I use the Fetch API or XMLHttpRequest for uploads?

Use fetch for the control plane (creating sessions, completing uploads) and XMLHttpRequest for the bytes when you need a progress bar — fetch still cannot report request-body upload progress reliably across browsers, while xhr.upload.onprogress has worked for a decade. The newer ReadableStream request bodies are not yet broadly usable for this.

How big should each chunk be?

For mobile and lossy networks, 5–8 MiB balances retry cost against request overhead; S3 multipart requires every part except the last to be at least 5 MiB and caps an upload at 10,000 parts, so very large files need proportionally larger chunks. Smaller chunks recover faster from a single failure but multiply requests; larger chunks waste more work when one fails. Make it configurable and persist the chosen size with the session so a resumed upload re-slices identically.
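A chunk-size picker can encode those constraints directly. The sketch below assumes an S3-style backend with a 5 MiB minimum part size and a 10,000-part ceiling; `chooseChunkSize` and the 8 MiB default are illustrative choices.

```typescript
const MiB = 1024 * 1024;
const MIN_PART_BYTES = 5 * MiB; // S3 minimum for every part except the last
const MAX_PARTS = 10_000;       // S3 ceiling on parts per multipart upload

/** Pick a chunk size in bytes, rounded up to a whole MiB. */
export function chooseChunkSize(fileSize: number, preferred = 8 * MiB): number {
  // Smallest size that keeps the part count within the ceiling.
  const floorForPartCount = Math.ceil(fileSize / MAX_PARTS);
  const size = Math.max(preferred, MIN_PART_BYTES, floorForPartCount);
  return Math.ceil(size / MiB) * MiB;
}
```

For a 4 GiB file the default 8 MiB is kept, which yields 512 parts; only files in the tens of gigabytes push the size above the preferred value.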

Do I need both SSE and WebSockets?

Rarely. Use Server-Sent Events when the server only needs to push processing progress to the client — it is simpler, auto-reconnects, and rides plain HTTP. Reach for a WebSocket only when you need the server to send control commands (pause, cancel, throttle) back during the upload, which requires the bidirectional channel.

How do I resume after the user closes the tab entirely?

Persist the file fingerprint, chunk size, completed offset, and server uploadId to IndexedDB. On the next visit, ask the user to re-select the same file, recompute the fingerprint to confirm identity, then issue a HEAD/Range (or tus HEAD for Upload-Offset) handshake to learn the committed offset and re-enter uploading from there.
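A cheap fingerprint can be built from file metadata alone. This sketch assumes name, size, and last-modified time are enough to recognize a re-selected file; it is a weak identity check, and hashing some of the file's content would be stronger. `FileLike`, `fingerprintOf`, and `canResume` are hypothetical names, and `FileLike` lists only the fields used, which the browser's `File` object also provides.

```typescript
export interface FileLike {
  name: string;
  size: number;
  lastModified: number; // milliseconds since the epoch
}

/** Metadata-only fingerprint; JSON encoding keeps the fields unambiguous. */
export function fingerprintOf(file: FileLike): string {
  return JSON.stringify([file.name, file.size, file.lastModified]);
}

/** True when the re-selected file matches the one the saved session was created for. */
export function canResume(saved: { fingerprint: string }, reselected: FileLike): boolean {
  return saved.fingerprint === fingerprintOf(reselected);
}
```

If `canResume` returns false, the safe choice is to discard the saved session and start a fresh upload rather than append bytes from a different file.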
