Design a data processing pipeline handling 500 GB datasets on 64 GB RAM machines using mmap, huge pages, NUMA-aware allocation, and zero-copy I/O.
## Problem
Your team needs to build a data processing pipeline that transforms and aggregates large datasets (up to 500 GB) on machines with only 64 GB of RAM. The pipeline reads data from local NVMe storage, applies transformations across multiple processing stages, and writes results to both local disk and a remote object store. The current prototype processes data at 400 MB/s, but the target is 2 GB/s of sustained throughput.
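One way to read the hot path without blowing the 64 GB budget is to map windows of the input rather than copying them into user-space buffers. The sketch below is illustrative only, not a model solution: the file path, window size, and NUMA node are assumptions. It maps a 1 GiB window of the input with mmap, hints transparent huge pages and sequential access, and places the stage's scratch buffer on a chosen NUMA node with libnuma.

```c
/* Minimal sketch: mmap'd input window + huge-page hint + NUMA-local
 * scratch buffer. Path, window size, and node 0 are assumptions.
 * Build with: gcc sketch.c -lnuma */
#define _GNU_SOURCE
#include <fcntl.h>
#include <numa.h>        /* numa_alloc_onnode(); link with -lnuma */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK_BYTES (1UL << 30)   /* scan the 500 GB input in 1 GiB windows */

int main(void)
{
    int fd = open("/data/input.bin", O_RDONLY);   /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    /* Map one window of the input instead of read()-copying it: the page
     * cache backs the mapping directly, so no second copy in user space. */
    void *in = mmap(NULL, CHUNK_BYTES, PROT_READ, MAP_PRIVATE, fd, 0);
    if (in == MAP_FAILED) { perror("mmap"); return 1; }

    /* Hint transparent huge pages (fewer TLB misses on a linear scan)
     * and sequential access (more aggressive read-ahead from NVMe). */
    madvise(in, CHUNK_BYTES, MADV_HUGEPAGE);
    madvise(in, CHUNK_BYTES, MADV_SEQUENTIAL);

    /* Keep this stage's scratch buffer on the worker's NUMA node so the
     * transform never crosses the interconnect (node 0 is an assumption;
     * in practice it would match the CPU the thread is pinned to). */
    void *scratch = NULL;
    if (numa_available() >= 0)
        scratch = numa_alloc_onnode(CHUNK_BYTES, 0);
    if (scratch == NULL) {
        fprintf(stderr, "NUMA allocation unavailable\n");
        return 1;
    }

    /* ... transformation stages would read `in` and write `scratch` ... */

    numa_free(scratch, CHUNK_BYTES);
    munmap(in, CHUNK_BYTES);
    close(fd);
    return 0;
}
```

The window size is a tunable: larger windows amortize syscall and fault overhead, smaller ones keep the resident set well under the 64 GB ceiling while other stages run.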
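For the output side, one candidate for the zero-copy I/O requirement is streaming staged result files to the object-store connection with sendfile(2), so the bytes move from the page cache to the socket without a round trip through user space. A minimal sketch, assuming the result has already been staged to a local file and the connected socket is provided by a hypothetical helper elsewhere:

```c
/* Minimal sketch: upload a staged result file over an already-connected
 * socket using sendfile(2). sock_fd and the staging step are assumptions. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int upload_zero_copy(int sock_fd, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return -1; }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* sendfile transfers up to ~2 GiB per call and advances `offset`;
         * loop until the whole file has been handed to the socket. */
        ssize_t sent = sendfile(sock_fd, fd, &offset, st.st_size - offset);
        if (sent <= 0) { perror("sendfile"); close(fd); return -1; }
    }
    close(fd);
    return 0;
}
```

The same idea applies to the local-disk output path, where copy_file_range(2) or O_DIRECT writes avoid double-buffering results that are already laid out in memory.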