Collection: Cache / Expansion

Beyond DRAM: The Next Frontier of AI Server Memory

The memory wall is real. As AI models grow larger and inference workloads demand that more model weights, KV caches, and feature vectors live in fast memory simultaneously, the physical limits of DRAM — slot counts, power envelopes, and cost per gigabyte — become genuine infrastructure constraints. The teams building the most capable AI systems today are the ones who have moved beyond thinking about memory as a fixed resource and started treating it as an expandable, tiered architecture.

CXL (Compute Express Link) memory expansion, storage cache accelerators, and persistent memory technologies represent the current frontier of this architectural evolution. They let AI servers reach memory capacities beyond what their DIMM slots can hold with DRAM alone, at latencies and bandwidths practical for real AI workloads, not just synthetic benchmarks. DVUN's Cache / Expansion collection brings these technologies to AI infrastructure teams who are ready to push past the limits of conventional memory architecture.

Technologies in This Collection

  • CXL Memory Expansion Modules: PCIe-attached memory expansion using the CXL 1.1/2.0 protocol. Add 256GB to 2TB of additional memory capacity per module, accessible from the host CPU at 200–300ns: slower than local DRAM, but orders of magnitude faster than NVMe storage. Ideal for large language model inference servers that need to hold multiple model instances in memory simultaneously.
  • Storage Cache Accelerators: NVMe-based caching appliances that sit between your storage network and your compute nodes, caching hot data in fast local flash to reduce storage network traffic and improve effective I/O throughput for training workloads.
  • Persistent Memory (PMEM): Byte-addressable persistent memory that offers larger per-module capacity than DRAM, with access latency far closer to DRAM than to NVMe storage. Useful for checkpoint staging, feature store caching, and workloads that benefit from memory that survives a power cycle.
  • GPU Memory Extension: Software and hardware solutions that extend effective GPU memory capacity by transparently tiering between GPU VRAM, system DRAM, and NVMe storage — enabling larger models to run on existing GPU hardware.
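The core idea behind a storage cache accelerator (keep recently read data on fast local flash and go back to shared storage only on a miss) can be sketched as a read-through LRU cache. This is a minimal illustration under stated assumptions: ReadThroughLRUCache and its backend_read callback are hypothetical names, blocks are held in an in-process dictionary rather than on flash, and least-recently-used eviction is one common policy, not necessarily the one any particular appliance uses.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRUCache:
    """Hypothetical sketch: serve hot blocks from a fast tier, fetch misses from a slow backend."""

    def __init__(self, capacity_blocks: int, backend_read: Callable[[str], bytes]):
        if capacity_blocks <= 0:
            raise ValueError("capacity_blocks must be positive")
        self.capacity = capacity_blocks
        # Stands in for a read over the storage network (assumed interface).
        self.backend_read = backend_read
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._blocks[key]
        # Miss: take the slow path to shared storage, then keep a local copy.
        self.misses += 1
        data = self.backend_read(key)
        self._blocks[key] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data
```

In this toy, each miss corresponds to one trip over the storage network, so the hit ratio, hits / (hits + misses), is the figure that determines how much network traffic the cache removes.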

When Cache / Expansion Changes the Equation

The 70B Model Inference Problem: You need to serve a 70B parameter model in production, but your inference servers only have 80GB of GPU VRAM per card. At FP16 or BF16 precision (2 bytes per parameter), the weights alone take about 140GB, so the model cannot fit on a single card. Loading it requires either model parallelism across multiple GPUs or a memory architecture that can hold model weights in fast system memory and stream them to the GPU on demand. CXL memory expansion makes the latter practical, giving your inference server 512GB to 2TB of fast memory for model weight storage without the complexity of multi-GPU model parallelism. See our Memory Kits for the DRAM foundation this architecture builds on.
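The sizing arithmetic behind this scenario can be checked with a short sketch. It assumes weights only (no KV cache or activations), decimal gigabytes, and a hypothetical split in which 20GB of the 80GB card is held back for KV cache; the function names and tier labels are illustrative, not part of any DVUN tool.

```python
from typing import Dict, List, Tuple

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Hypothetical helper: weight memory only, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def place_weights(total_gb: float, tiers: List[Tuple[str, float]]) -> Dict[str, float]:
    """Hypothetical helper: greedily fill tiers in the order given (fastest first)."""
    placement: Dict[str, float] = {}
    remaining = total_gb
    for name, capacity_gb in tiers:
        if remaining <= 0:
            break
        used = min(remaining, capacity_gb)
        placement[name] = used
        remaining -= used
    if remaining > 0:
        raise ValueError(f"{remaining:.1f} GB does not fit in the given tiers")
    return placement


# 70B parameters at FP16; 60GB of VRAM assumed free for weights, rest in CXL memory.
weights_gb = weight_footprint_gb(70e9, "fp16")  # 140.0
plan = place_weights(weights_gb, [("gpu_vram", 60.0), ("cxl_memory", 512.0)])
# plan == {"gpu_vram": 60.0, "cxl_memory": 80.0}
```

Under these assumptions, 60GB of weights stays in VRAM and 80GB lives in CXL memory. The sketch only answers whether the weights fit; it does not model the cost of moving the off-GPU layers across the host link on each forward pass, which is usually what bounds throughput in this design.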

The Training Checkpoint Bottleneck: Your training job writes checkpoints every 1000 steps, and each checkpoint write takes 3 minutes because it goes straight to network storage. Persistent memory or a local storage cache accelerator can cut the time training is blocked to under 30 seconds by staging checkpoints locally and then replicating them to shared storage asynchronously. Pair with our NVMe Storage for the local storage tier that completes this architecture.
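A minimal sketch of the stage-locally-then-replicate pattern follows. It assumes the checkpoint has already been serialized to bytes (in a real job, the training framework's own save function would write the local file) and that both directories are ordinary mounted paths. StagedCheckpointWriter and its methods are hypothetical names, and a background thread stands in for whatever replication mechanism a given storage stack provides.

```python
import shutil
import threading
from pathlib import Path
from typing import List


class StagedCheckpointWriter:
    """Hypothetical sketch: write to a fast local tier, replicate to shared storage in the background."""

    def __init__(self, local_dir: Path, shared_dir: Path):
        self.local_dir = Path(local_dir)
        self.shared_dir = Path(shared_dir)
        self.local_dir.mkdir(parents=True, exist_ok=True)
        self.shared_dir.mkdir(parents=True, exist_ok=True)
        self._uploads: List[threading.Thread] = []

    def save(self, step: int, payload: bytes) -> Path:
        local_path = self.local_dir / f"ckpt_{step:08d}.bin"
        # The training loop blocks only for this local write.
        local_path.write_bytes(payload)
        # Non-daemon thread, so the interpreter waits for in-flight copies before exiting.
        t = threading.Thread(target=self._replicate, args=(local_path,))
        t.start()
        self._uploads.append(t)
        return local_path

    def _replicate(self, local_path: Path) -> None:
        tmp = self.shared_dir / (local_path.name + ".partial")
        shutil.copyfile(local_path, tmp)
        # Rename into place so the final name only appears once the copy has finished.
        tmp.replace(self.shared_dir / local_path.name)

    def wait(self) -> None:
        """Block until every background replication started so far has completed."""
        for t in self._uploads:
            t.join()
        self._uploads.clear()
```

The training loop pays only for the local write inside save(). Call wait() before deleting local checkpoints, so that no replication is still reading from them.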

Specifications Overview

  • CXL memory: 256GB to 2TB per module, CXL 1.1/2.0 protocol, PCIe Gen5 interface
  • CXL latency: 200–300ns (vs. 80–100ns for local DRAM)
  • Storage cache: 6.4TB to 25.6TB NVMe cache capacity, up to 20GB/s throughput
  • Persistent memory: 128GB to 512GB per module, byte-addressable
  • GPU memory extension: supports models up to 10x GPU VRAM capacity on select configurations
  • Interface: PCIe Gen4/Gen5, CXL 1.1/2.0, DDR5 DIMM slot (PMEM)
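Because CXL memory sits behind local DRAM rather than replacing it, the latency an application actually sees depends on how many accesses land in each tier. A simple weighted average using the figures above makes the trade-off concrete. This is a first-order model that ignores bandwidth, queuing, and CPU cache effects, and effective_latency_ns is an illustrative name.

```python
def effective_latency_ns(dram_fraction: float, dram_ns: float, cxl_ns: float) -> float:
    """Illustrative first-order model: weighted average of DRAM and CXL access latency."""
    if not 0.0 <= dram_fraction <= 1.0:
        raise ValueError("dram_fraction must be between 0 and 1")
    return dram_fraction * dram_ns + (1.0 - dram_fraction) * cxl_ns


# 75% of accesses served from local DRAM, 25% from CXL memory.
best_case = effective_latency_ns(0.75, 80, 200)    # 110.0
worst_case = effective_latency_ns(0.75, 100, 300)  # 150.0
```

With three quarters of accesses hitting local DRAM, the blended latency in this model lands at roughly 110 to 150ns, which is why placing the hottest data (for example, active KV cache) in DRAM and colder data in CXL memory matters.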

Expand What's Possible. Don't Replace What Works.

The best infrastructure investments are the ones that extend the value of what you've already built. DVUN's Cache / Expansion collection lets you push your existing AI servers further — more memory, faster data access, larger models — without a full platform replacement. Request a quote for memory expansion architecture design, or contact our team to discuss which technology is the right fit for your specific workload constraints.
