# Best Practices for Handling 500MB File Uploads
Uploading 500MB+ payloads means working around browser memory constraints and network timeout defaults. A single FormData submission at that scale risks V8 heap exhaustion on the client and gateway timeouts in transit. This guide details a production-ready architecture using File.slice(), resumable chunking, and binary streaming. You should understand Upload Fundamentals & Browser APIs before implementing custom chunking logic.
Key architectural requirements:

- Avoid Base64 encoding to prevent 33% payload bloat and heap crashes.
- Implement chunked binary uploads via fetch() with strict concurrency limits.
- Integrate jittered exponential backoff for transient network failures.
## Client-Side Chunking with File.slice() & Blob Objects
Breaking a 500MB file into 5–10MB slices prevents browser OOM crashes. The File.slice() method creates lightweight Blob references without duplicating the underlying bytes, enabling zero-copy chunking on the main thread.
Size the parallel-request pool from navigator.hardwareConcurrency rather than a hard-coded constant to avoid saturating the network stack on low-end devices. Attach upload_id and chunk_index metadata to every request as headers. Review Handling Large File Size Limits when configuring proxy thresholds.
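A minimal sketch of pool sizing from the reported core count; the clamp bounds of 2–6 and the fallback of 4 are assumptions for illustration, not values from this guide:

```javascript
// Derive an upload concurrency limit from the device's reported core count.
// navigator is browser-only, so fall back to a conservative default elsewhere.
const cores =
  (typeof navigator !== 'undefined' && navigator.hardwareConcurrency) || 4;

// Clamp: at least 2 parallel requests, never more than 6 (assumed bounds).
const poolSize = Math.min(Math.max(2, cores - 1), 6);

console.log(`Uploading with ${poolSize} parallel requests`);
```

Leaving one core free for the main thread is a common heuristic; the exact bounds should be tuned against real-world bandwidth measurements.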
```javascript
// Production-ready chunk generator
async function* generateChunks(file, chunkSize = 8 * 1024 * 1024) {
  const totalChunks = Math.ceil(file.size / chunkSize);
  const uploadId = crypto.randomUUID();
  for (let i = 0; i < totalChunks; i++) {
    const start = i * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    const blob = file.slice(start, end);
    yield {
      blob,
      uploadId,
      chunkIndex: i,
      totalChunks,
      fileName: file.name,
      mimeType: file.type
    };
  }
}
```
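A quick way to sanity-check the boundary math is to run the generator against an in-memory Blob (Node 18+ ships Blob as a global; the 20MB payload and the randomUUID fallback are demo-only assumptions):

```javascript
// Compact copy of the chunk generator so this sketch is self-contained.
async function* generateChunks(file, chunkSize = 8 * 1024 * 1024) {
  const totalChunks = Math.ceil(file.size / chunkSize);
  // Fall back to a fixed id where crypto.randomUUID is unavailable (demo only).
  const uploadId = globalThis.crypto?.randomUUID?.() ?? 'demo-upload-id';
  for (let i = 0; i < totalChunks; i++) {
    const start = i * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    yield { blob: file.slice(start, end), uploadId, chunkIndex: i, totalChunks };
  }
}

async function main() {
  // A plain Blob stands in for the File picked from an <input> element.
  const fakeFile = new Blob([new Uint8Array(20 * 1024 * 1024)]); // 20 MB
  const sizes = [];
  for await (const chunk of generateChunks(fakeFile)) {
    sizes.push(chunk.blob.size);
  }
  return sizes;
}

main().then(sizes => console.log(sizes)); // two full 8 MB slices + a 4 MB tail
```

Note how the final slice is automatically shortened by Math.min, so no padding logic is needed on the client.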
## Binary Streaming via Modern Fetch API
Transmit raw binary chunks directly instead of wrapping them in FormData. Set Content-Type: application/octet-stream to signal an opaque binary body; this eliminates multipart parsing bottlenecks on the gateway.

Use AbortController to cancel in-flight requests during navigation. Note that keepalive: true is not an option here: browsers cap keepalive request bodies at roughly 64KB, far below a multi-megabyte chunk. Likewise, Content-Length is a forbidden header in fetch(); the browser computes it automatically from the Blob body, which also avoids chunked transfer encoding.
```javascript
async function uploadChunk(chunkData, signal) {
  const response = await fetch(`/api/upload/${chunkData.uploadId}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/octet-stream',
      'X-Chunk-Index': chunkData.chunkIndex.toString(),
      'X-Total-Chunks': chunkData.totalChunks.toString(),
      'X-File-Name': encodeURIComponent(chunkData.fileName)
    },
    // Content-Length is a forbidden header name: the browser derives it
    // from blob.size automatically, so it must not be set by hand.
    body: chunkData.blob,
    signal
  });
  if (!response.ok) {
    const errorBody = await response.text().catch(() => 'No details');
    throw new Error(`Chunk ${chunkData.chunkIndex} failed [${response.status}]: ${errorBody}`);
  }
  return response;
}
```
## Timeout Management & Exponential Backoff Retry Logic
Network instability requires a deterministic retry strategy. Apply AbortSignal.timeout(30000) per chunk to prevent hanging connections; it rejects with a TimeoutError, which is distinguishable from a user-initiated AbortError and therefore safe to retry. Implement jittered exponential backoff to avoid thundering-herd scenarios.

Verify a server 201 Created or 202 Accepted before advancing the index. Persist state in sessionStorage to survive tab crashes. Never retry without validating the previous chunk's server receipt.
```javascript
async function uploadWithRetry(chunkData, maxRetries = 3) {
  const baseDelay = 1000;
  const maxDelay = 10000;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // Times out with TimeoutError after 30s; a user-initiated
      // cancellation would surface as AbortError instead.
      await uploadChunk(chunkData, AbortSignal.timeout(30000));
      return true;
    } catch (error) {
      // Never retry a deliberate cancellation, and stop at the retry cap.
      if (error.name === 'AbortError' || attempt === maxRetries) {
        throw error;
      }
      const jitter = Math.random() * 500;
      const delay = Math.min(baseDelay * 2 ** attempt + jitter, maxDelay);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```
## Implementation Patterns: Full-Stack Architecture
Combine the generator, fetch wrapper, and retry logic into a cohesive orchestrator. Emit observability metrics at each stage. Log chunk completion events to your APM.
```javascript
// Frontend Orchestrator
class ResumableUploader {
  constructor(file) {
    this.file = file;
    this.sessionKey = `upload_${file.name}_${file.lastModified}`;
    this.state = JSON.parse(
      sessionStorage.getItem(this.sessionKey) || '{"lastChunk": -1}'
    );
  }

  async start() {
    const inFlight = [];
    const results = [];
    for await (const chunk of generateChunks(this.file)) {
      if (chunk.chunkIndex <= this.state.lastChunk) continue;
      const p = uploadWithRetry(chunk).then(() => {
        // Caveat: with parallel uploads a later chunk can finish first;
        // production code should persist a set of completed indices
        // rather than a single high-water mark.
        this.state.lastChunk = Math.max(this.state.lastChunk, chunk.chunkIndex);
        sessionStorage.setItem(this.sessionKey, JSON.stringify(this.state));
        performance.mark(`chunk_${chunk.chunkIndex}_complete`);
        console.log(`[OBSERVABILITY] Chunk ${chunk.chunkIndex} persisted.`);
      });
      results.push(p);
      inFlight.push(p);
      // Remove the promise once settled so the pool counts only
      // in-flight requests, whichever one finishes first.
      const settle = () => inFlight.splice(inFlight.indexOf(p), 1);
      p.then(settle, settle);
      // Limit concurrency to 4 parallel requests
      if (inFlight.length >= 4) {
        await Promise.race(inFlight);
      }
    }
    await Promise.all(results);
    console.log('Upload finalized. Trigger server assembly.');
  }
}
```
Server-side reassembly requires strict validation. Persist each chunk only after validating its metadata, and assemble the file in index order. Trigger final processing exclusively when every expected chunk has arrived.
```python
# Backend Reassembly (FastAPI/Python)
import os
from pathlib import Path

from fastapi import APIRouter, HTTPException, Request

router = APIRouter()
UPLOAD_DIR = Path("/tmp/uploads")
UPLOAD_DIR.mkdir(exist_ok=True)


@router.post("/api/upload/{upload_id}")
async def receive_chunk(upload_id: str, request: Request):
    # Reject path-traversal attempts in the client-supplied id.
    if not all(c.isalnum() or c == "-" for c in upload_id):
        raise HTTPException(400, "Invalid upload id")
    chunk_index = int(request.headers.get("X-Chunk-Index", -1))
    total_chunks = int(request.headers.get("X-Total-Chunks", 0))
    if chunk_index < 0 or total_chunks <= 0:
        raise HTTPException(400, "Missing chunk metadata")

    # Write each chunk to its own part file so parallel, out-of-order
    # requests cannot interleave writes in a shared stream.
    part_path = UPLOAD_DIR / f"{upload_id}.{chunk_index}.part"
    with open(part_path, "wb") as f:
        async for chunk_bytes in request.stream():
            f.write(chunk_bytes)

    # Assemble only once every expected chunk is present, in index order.
    parts = [UPLOAD_DIR / f"{upload_id}.{i}.part" for i in range(total_chunks)]
    if all(p.exists() for p in parts):
        final_path = UPLOAD_DIR / f"{upload_id}.final"
        with open(final_path, "wb") as out:
            for p in parts:
                out.write(p.read_bytes())
                os.remove(p)
        # Trigger async processing queue here
        return {"status": "assembled", "path": str(final_path)}

    return {"status": "chunk_received", "index": chunk_index}
```
## Common Pitfalls & Diagnostics
| Issue | Root Cause | Diagnostic & Mitigation |
|---|---|---|
| Client OOM from readAsDataURL() | Base64 inflates the payload ~33% and holds both copies in memory, exceeding V8 heap limits. | Diagnose: Monitor performance.memory.usedJSHeapSize. Fix: Stream raw Blob objects. Use readAsArrayBuffer() only for cryptographic hashing. |
| Silent 413/504 Errors | Reverse proxies enforce 1–5MB limits and 60s timeouts by default. | Diagnose: Check Nginx/Cloudflare access logs for 413 or 504. Fix: Set client_max_body_size 512m; and proxy_read_timeout 300s;. Disable request buffering. |
| Parallel Chunk Race Conditions | Out-of-order writes corrupt the final binary stream. | Diagnose: Compare md5sum of assembled file vs. original. Fix: Enforce sequential processing or implement Redis chunk locks. Validate chunk_index before appending. |
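A sketch of the Nginx directives the table references, scoped to the upload route rather than the whole server block; the upstream name upload_backend is a placeholder for your topology:

```nginx
# Raise proxy thresholds for the chunked upload endpoint only.
location /api/upload/ {
    client_max_body_size 512m;   # default 1m triggers silent 413s
    proxy_read_timeout   300s;   # default 60s causes mid-chunk 504s
    proxy_request_buffering off; # stream chunks straight to the upstream
    proxy_pass http://upload_backend;
}
```

Scoping the overrides to one location keeps the tighter defaults in force for the rest of the application.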
## Frequently Asked Questions
### Why does my 500MB upload fail with a 413 error despite correct client code?
Reverse proxies (Nginx, Cloudflare) and load balancers enforce strict client_max_body_size limits. You must explicitly increase these thresholds and disable request buffering for large payloads.
### Should I use Base64 or binary encoding for large file uploads?
Always use binary encoding. Base64 increases payload size by ~33%, doubles memory allocation, and significantly slows down parsing and network transmission.
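The expansion factor is easy to verify; Node's Buffer is used here purely for illustration (a browser would use btoa on a binary string):

```javascript
// Encode 1 MB of binary data and compare encoded vs. raw sizes.
const raw = new Uint8Array(1024 * 1024); // 1 MB of zero bytes
const encoded = Buffer.from(raw).toString('base64');

const ratio = encoded.length / raw.length;
console.log(ratio.toFixed(3)); // ≈ 1.333 — the classic 4/3 Base64 overhead
```

On top of the wire overhead, both the raw buffer and the encoded string must coexist in memory during conversion, which is what pushes large files past the heap limit.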
### How do I resume a 500MB upload after a network drop?
Persist the upload_id and last successful chunk_index in sessionStorage. On reconnect, query the server for completed chunks and resume streaming from the next slice using File.slice().
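A sketch of the resume flow; the /api/upload/{id}/status endpoint and its completedChunks response field are hypothetical conventions, not part of the backend shown above:

```javascript
// Pure helper: find the first chunk index the server has not confirmed.
function nextMissingChunk(completedChunks, totalChunks) {
  const completed = new Set(completedChunks);
  for (let i = 0; i < totalChunks; i++) {
    if (!completed.has(i)) return i;
  }
  return -1; // nothing missing: the upload is already complete
}

// Hypothetical status endpoint returning { completedChunks: number[] }.
async function resumeFrom(uploadId, totalChunks) {
  const res = await fetch(`/api/upload/${uploadId}/status`);
  const { completedChunks } = await res.json();
  return nextMissingChunk(completedChunks, totalChunks);
}

console.log(nextMissingChunk([0, 1, 2, 4], 6)); // 3 — chunk 3 was lost mid-transfer
```

Asking the server which chunks it holds, rather than trusting client-side state alone, protects against the case where a chunk was acknowledged locally but never durably persisted.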