Part 5 – Operating Systems and RAM
Now we're switching gears. Instead of looking at RAM only as little circuits, we're going to see it from the software side: how the operating system (the big boss software like Windows, macOS, Linux, or Android) uses RAM to run programs smoothly.
Let’s Understand Random Access Memory: The Key to How Your Computer Thinks Fast

🔄 Quick Recap

In the last few lessons, we climbed through the memory hierarchy:

  • Registers → Caches (L1, L2, L3) → RAM → Storage.
    We learned how each layer trades off speed vs size.

But until now, we’ve assumed we only had one CPU core talking to RAM.

In reality, modern computers — from your laptop to massive servers — have multiple CPU cores, sometimes dozens, each trying to use RAM at the same time.

This raises a big question:
👉 How do multiple processors share memory without creating traffic jams?

That’s where NUMA (Non-Uniform Memory Access) comes in.

 

🧠 What is NUMA?

NUMA stands for Non-Uniform Memory Access.

To understand it, let’s compare it to the older model: UMA (Uniform Memory Access).

 

UMA (Uniform Memory Access):

  • All CPU cores share the exact same memory equally.

  • Any core can access any part of RAM with the same speed.

👉 Analogy: Imagine a group of chefs all using the same single pantry. No matter which chef goes to it, they all take the same amount of time to grab ingredients.

 

NUMA (Non-Uniform Memory Access):

  • RAM is divided into regions, and each CPU or group of cores has its own local memory.

  • Accessing local memory is fast.

  • Accessing another processor’s memory region is slower.

👉 Analogy: Each chef has their own mini pantry nearby. They can grab things from their own pantry quickly, but if they need something from another chef’s pantry, they have to walk farther.
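To make "local is fast, remote is slower" concrete, here's a tiny Python sketch of a NUMA distance table, modeled on the relative "distance" matrix that the Linux tool `numactl --hardware` reports (10 for local, roughly 20 for remote). The node layout and numbers here are made-up illustrative values, not measurements from real hardware:

```python
# Toy model of NUMA access cost for a 2-node system.
# DISTANCE[a][b] = relative cost for a core on node `a`
# to reach RAM attached to node `b` (10 = local, 21 = remote).
# These are illustrative values, not real hardware numbers.
DISTANCE = [
    [10, 21],  # node 0 -> node 0 (local), node 0 -> node 1 (remote)
    [21, 10],  # node 1 -> node 0 (remote), node 1 -> node 1 (local)
]

def access_cost(cpu_node, mem_node):
    """Relative cost for a CPU on cpu_node to touch memory on mem_node."""
    return DISTANCE[cpu_node][mem_node]

print(access_cost(0, 0))  # local access  -> 10
print(access_cost(0, 1))  # remote access -> 21
```

The key takeaway: the chef's own pantry (the diagonal of the table) is always the cheapest trip.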

 

🏗️ How NUMA Works in Multi-Core Systems

Modern CPUs are built as clusters of cores. For example:

  • An AMD Ryzen or Threadripper might have multiple CCDs (Core Complex Dies), each with its own caches; the cores reach RAM through a shared I/O die, so memory latency can differ depending on which CCD a core sits on.

  • In large servers, multiple CPU sockets each have their own local memory banks.

NUMA organizes memory so that:

  • Each CPU (or socket) talks to its own “closest” RAM first.

  • If the data isn’t there, it can still access other regions — but more slowly.
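That "closest RAM first" idea can be sketched as a tiny allocation policy in Python: try the requesting CPU's own node, and only spill over to the nearest remote node when local memory runs out. The node numbers, distances, and page counts are invented for illustration:

```python
# Sketch of a "local first" allocation policy (illustrative values only).
# DISTANCE[(a, b)] = relative cost for node `a` to reach node `b`'s RAM.
DISTANCE = {
    (0, 0): 10, (0, 1): 21,
    (1, 0): 21, (1, 1): 10,
}

def pick_node(cpu_node, free_pages):
    """Return the node to allocate from: local if possible, else nearest remote."""
    candidates = [n for n, free in free_pages.items() if free > 0]
    if not candidates:
        raise MemoryError("no free pages on any node")
    # Sort candidates by distance from the requesting CPU's node;
    # the local node (distance 10) always wins if it has room.
    return min(candidates, key=lambda n: DISTANCE[(cpu_node, n)])

print(pick_node(0, {0: 100, 1: 100}))  # local node has room -> 0
print(pick_node(0, {0: 0, 1: 100}))    # local exhausted, spill to remote -> 1
```

Real operating systems implement far more sophisticated versions of this policy, but the priority order is the same: local first, remote as a fallback.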

 

📊 UMA vs NUMA Comparison

Feature        | UMA                      | NUMA
---------------|--------------------------|---------------------------------------
Memory access  | Same speed for all cores | Faster local, slower remote
Scalability    | Limited                  | Excellent for many cores/CPUs
Cost           | Simpler                  | More complex, needs a smarter OS
Example        | Old dual-core CPUs       | Modern servers (AMD EPYC, Intel Xeon)

 

🏃‍♂️ Example in Action: Server with 2 CPUs

Imagine a server with 2 CPUs, each with 64 GB of RAM:

  • CPU A has fast access to its 64 GB.

  • CPU B has fast access to its 64 GB.

  • If CPU A tries to fetch data from CPU B’s RAM, the request must travel over a special CPU-to-CPU link (like Intel’s QPI/UPI or AMD’s Infinity Fabric). This takes longer.
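We can do some back-of-the-envelope math on how much those remote trips hurt. Suppose a local access takes 80 ns and a remote access over the CPU-to-CPU link takes 130 ns (round numbers assumed for illustration; real latencies depend on the CPU and interconnect). The average latency is just a weighted mix:

```python
# Weighted-average latency model (assumed round numbers, not measurements).
LOCAL_NS = 80.0    # assumed latency of a local RAM access
REMOTE_NS = 130.0  # assumed latency of a remote access over the CPU link

def avg_latency(remote_fraction):
    """Average access latency given the fraction of accesses that go remote."""
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

print(avg_latency(0.0))   # all local     -> 80.0 ns
print(avg_latency(0.25))  # 1 in 4 remote -> 92.5 ns
print(avg_latency(1.0))   # all remote    -> 130.0 ns
```

Even with only a quarter of accesses going remote, average latency climbs noticeably, which is why keeping data close to the CPU that uses it matters so much.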

 

⚡ Why NUMA Matters

  • In gaming PCs: NUMA usually isn’t a big deal, because consumer CPUs typically expose a single memory region (one NUMA node).

  • In servers and high-performance computing (HPC): NUMA matters a lot, because dozens of cores share memory. If software isn’t written to be NUMA-aware, some cores may spend too much time waiting for “far away” memory.

 

🖥️ Real-World Example: High-Performance Computing

Supercomputers run on NUMA-based designs:

  • Thousands of processors, each with local memory banks.

  • Software must be optimized so that each processor works mostly with its local memory.

  • If not, too much time is wasted fetching “remote” memory, slowing the entire system.

 

🛠️ NUMA and Operating Systems

Operating systems like Linux and Windows Server are NUMA-aware.
They try to:

  • Assign processes to CPU cores that are close to the memory they need.

  • Balance workloads so cores don’t spend all their time accessing remote memory.

This is called NUMA balancing.
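The core idea behind NUMA balancing can be sketched in a few lines of Python: place each process on the node that holds most of its memory pages, so most of its accesses stay local. The process names and page counts below are invented for illustration:

```python
# Sketch of the idea behind NUMA balancing: run each process on the node
# holding most of its pages. All names and counts are made up.

def preferred_node(pages_per_node):
    """Pick the node that holds the most of this process's pages."""
    return max(pages_per_node, key=pages_per_node.get)

processes = {
    "web_server": {0: 900, 1: 100},   # most pages live on node 0
    "database":   {0: 200, 1: 1800},  # most pages live on node 1
}

placement = {name: preferred_node(pages) for name, pages in processes.items()}
print(placement)  # each process lands next to its own memory
```

Real NUMA balancing works in both directions: the OS can move a process toward its memory, or migrate memory pages toward the process, but the goal is the same as in this sketch.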

 

🔮 The Future: NUMA Everywhere

  • Chiplet-based CPUs (like AMD Ryzen and EPYC) can show NUMA-like effects internally.

  • GPUs (graphics cards) are beginning to explore NUMA-like designs for massive parallelism.

  • Cloud data centers optimize workloads with NUMA in mind.

 

📝 Recap

  • NUMA (Non-Uniform Memory Access) organizes memory for multi-core and multi-CPU systems.

  • Local memory = fast, remote memory = slower.

  • UMA gives equal access speed to all cores but doesn’t scale well.

  • NUMA is crucial in servers, data centers, and supercomputers.

  • Operating systems must be NUMA-aware to schedule processes efficiently.
