Attacking the Top 3 Server Memory Challenges

As modern workloads like AI and large databases continue to scale, they put tremendous pressure on data center compute resources. Such workloads are often memory-bound, requiring massive amounts of server memory (DRAM) to run performantly. In recent years, several key challenges have emerged around memory. Let's take a look at the three most pressing memory problems, and how we can begin to solve them.
Challenge 1: The Memory Cliff
In the era of big data-driven applications, memory bottlenecks are becoming a significant challenge. These applications demand ever-larger memory footprints to perform well, but the memory capacity available in many systems is struggling to keep pace. The result is performance degradation, frequent swapping to slower storage, and overall inefficiencies that hinder computational throughput. The industry has attempted to address this "memory cliff" with solutions like Intel Optane, but none has adequately solved the problem.
Unfortunately, the memory cliff problem continues to get worse. While processor complexity and performance have historically followed the exponential growth pattern described by Moore's Law, memory technologies such as DRAM have seen far slower capacity improvements due to fundamental semiconductor scaling challenges. There has never been a stronger industry-wide appetite to solve this problem.1
Challenge 2: Rising Costs
On top of these capacity challenges, memory costs are also on the rise. Meta saw the cost of DRAM as a percentage of their data center spend double from 2012 to 2022, eventually reaching 33%.2 The trend has continued, with average DRAM prices rising 53% in 2024 and projections indicating a further 35% increase over the course of 2025. This surge can be attributed to the scaling problem described above, as well as the increasing adoption of DDR5 and LPDDR5/5X technologies, which, despite price normalization efforts, remain costlier than their predecessors.3 Today, DRAM represents about 50% of the cost of an individual server, and over $100B is spent on it globally each year.4
Challenge 3: Poor Utilization
Despite the substantial costs associated with memory, various studies have revealed significant inefficiencies in how it is utilized. In Google's datacenters, for instance, up to 60% of allocated memory remains idle. Similarly, in Meta's private datacenters, applications tend to use only 30–60% of their allocated memory within a 10-minute window, leaving a considerable portion of memory resources underutilized.5 Further research indicates that only 20% of memory is typically utilized in cloud computing environments. This poor utilization and unnecessary over-provisioning lead to increased infrastructure costs and a higher carbon footprint.6
Attacking the Challenges
At MEXT, we have been laser-focused on tackling these key memory problems. We realized that the industry needs a novel approach: an easy-to-implement solution that drives maximal utilization and radically lowers costs, without compromising on latency or performance.
We considered the traditional memory hierarchy: CPU caches and DRAM in the "memory" zone, and Flash and HDD/SSD storage devices in the "storage" zone. This sparked a question: what if we could bring Flash into the memory tier, making it function as though it were memory? Because Flash is 20x cheaper than DRAM, this approach would solve the cost problem.
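To see why the economics are compelling, consider a quick back-of-the-envelope sketch. The per-GB prices and the hot/cold split below are illustrative assumptions, not quoted figures; only the 20x price ratio comes from the discussion above.

```python
# Back-of-the-envelope memory cost comparison.
# Illustrative assumptions: the $/GB figures are hypothetical; the 20x
# DRAM-to-Flash price ratio is the only number taken from the text.

DRAM_COST_PER_GB = 4.00                     # hypothetical $/GB
FLASH_COST_PER_GB = DRAM_COST_PER_GB / 20   # Flash at ~1/20th the price

footprint_gb = 1024          # a 1 TB application memory footprint
hot_fraction = 0.25          # assume 25% of pages are hot at any moment

# Baseline: the entire footprint lives in DRAM.
all_dram_cost = footprint_gb * DRAM_COST_PER_GB

# Tiered: hot pages stay in DRAM, cold pages are backed by Flash.
tiered_cost = (footprint_gb * hot_fraction * DRAM_COST_PER_GB
               + footprint_gb * (1 - hot_fraction) * FLASH_COST_PER_GB)

print(f"All-DRAM: ${all_dram_cost:,.0f}")
print(f"Tiered:   ${tiered_cost:,.0f} ({tiered_cost / all_dram_cost:.0%} of baseline)")
# -> All-DRAM: $4,096
# -> Tiered:   $1,178 (29% of baseline)
```

Under these assumptions, backing the cold three-quarters of the footprint with Flash cuts the memory bill by roughly 70%.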
The issue with using Flash, however, is latency: reading from Flash can be hundreds of times slower than reading from DRAM. That raised a new question: how could we solve the Flash latency problem? After various rounds of iteration, we realized that AI could be an excellent way to do it. Here's how it works.
MEXT AI-Powered Predictive Memory
MEXT's patented, software-only solution monitors which memory pages in DRAM are hot (actively being utilized) and which have gone cold. It continually offloads the cold pages to Flash, leverages AI to predict which offloaded pages are likely to be requested by the application, and predictively pushes those pages back to DRAM. From the application's perspective, all relevant pages are always found in DRAM, keeping performance intact within a far smaller DRAM footprint and yielding major cost efficiencies.
We call it "AI-powered predictive memory": an unprecedented solution that transparently, quickly, and accurately predicts future memory page requests.
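To make the mechanics concrete, here is a minimal conceptual sketch in Python. This is not MEXT's implementation: the cold threshold, the data structures, and especially the trivial sequential predictor are hypothetical stand-ins for the learned model described above.

```python
import time

# Conceptual sketch of hot/cold page tiering with predictive prefetch.
# All parameters are illustrative; a real system operates on OS pages
# and replaces _predict_next() with a learned access-pattern model.

COLD_AFTER = 60.0  # seconds without access before a page is offloaded

class TieredMemory:
    def __init__(self):
        self.dram = {}     # page_id -> (data, last_access_time)
        self.flash = {}    # page_id -> data (cheaper, slower tier)
        self.history = []  # recent accesses: the predictor's input

    def access(self, page_id):
        """Serve a page; demand-fetch from Flash on a DRAM miss."""
        if page_id in self.dram:
            data, _ = self.dram[page_id]
        else:
            # Miss: pay the Flash latency penalty once, then promote.
            data = self.flash.pop(page_id)
        self.dram[page_id] = (data, time.monotonic())
        self.history.append(page_id)
        self._prefetch()
        return data

    def offload_cold(self):
        """Background task: demote pages untouched for COLD_AFTER seconds."""
        now = time.monotonic()
        cold = [p for p, (_, t) in self.dram.items() if now - t > COLD_AFTER]
        for page_id in cold:
            self.flash[page_id] = self.dram.pop(page_id)[0]

    def _predict_next(self):
        """Placeholder predictor: assume sequential access."""
        return self.history[-1] + 1 if self.history else None

    def _prefetch(self):
        """Pull predicted-hot pages into DRAM before they are requested."""
        predicted = self._predict_next()
        if predicted in self.flash:
            self.dram[predicted] = (self.flash.pop(predicted), time.monotonic())
```

The key design point is the prefetch path: when the predictor is accurate, a page is already back in DRAM before the application asks for it, so Flash latency never appears on the hot path; only mispredictions pay the demand-fetch penalty.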
Delivering on the Promise
As we deploy across various customer environments, both on-premises and in the cloud, we are seeing up to 40% reductions in operational costs. Other customers are less interested in cost-cutting and instead want to give their applications a larger memory pool to drive up performance. For them, MEXT can be deployed to double or quadruple the system's effective memory footprint (combining existing DRAM with Flash-as-memory) while maintaining essentially the same cost profile; the sketch below shows the arithmetic. We think it's rather MEXTraordinary.
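For the capacity-oriented deployments, the arithmetic is simple. The capacities below are hypothetical examples, not customer data, and reuse the illustrative 20x price ratio from earlier:

```python
# Illustrative capacity-mode arithmetic (hypothetical sizes, not
# customer data); reuses the ~20x DRAM:Flash price ratio from above.

DRAM_COST_PER_GB, FLASH_COST_PER_GB = 4.00, 0.20  # hypothetical $/GB

dram_gb = 512    # physical DRAM already in the server
flash_gb = 1536  # Flash presented as memory alongside it

effective_gb = dram_gb + flash_gb
added_cost = flash_gb * FLASH_COST_PER_GB
dram_cost = dram_gb * DRAM_COST_PER_GB

print(f"{effective_gb // dram_gb}x effective memory "
      f"for ~{added_cost / dram_cost:.0%} added memory spend")
# -> 4x effective memory for ~15% added memory spend
```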
Sources:
1. https://people.csail.mit.edu/alizadeh/papers/cliffhanger-nsdi16.pdf
2. Meta presentation at the Memory Summit 2023
3. https://www.digitaltrends.com/computing/dram-nand-flash-prices-will-rise-exponentially/
4. Discussions with Hyperscalers, Memory Manufacturers, OEMs, and Industry Analysts
5. https://dl.acm.org/doi/pdf/10.1145/3578338.3593553
6. https://www.datacenterdynamics.com/en/news/only-13-of-provisioned-cpus-and-20-of-memory-utilized-in-cloud-computing-report/