VAST Improvements: Our High-Performance AI Storage Strategy
At Voltage Park, I'm part of a small team that helps companies, researchers, and developers build robust infrastructure capable of managing large AI workloads.
We operate more than $1 billion in AI hardware across six data centers. Every day, we support customers training large models, launching new products, and running demanding compute jobs.
As Principal Network Engineer, I focus on the foundation that keeps everything running: AI storage management. Whether it’s checkpointing during a long training run or onboarding a customer with thousands of GPUs, the performance and reliability of storage for AI shape the entire experience. When you're working with trillion-parameter models or streaming massive datasets, storage must deliver without compromise.
The Storage Problem in AI at Scale
AI workloads create storage problems legacy systems simply can't handle. Today, those jobs involve:
- Weeks or months of non-stop training that require consistent checkpointing (a pattern sketched in the example after this list)
- Teams with different approaches to moving data: direct S3 access, hierarchical local storage, shared network storage
- Multi-tenant infrastructure that demands data isolation, encryption, and performance guarantees
- Compliance with standards like SOC 2, HIPAA, FINRA, and SEC
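To make the checkpointing item above concrete, here is a minimal sketch of the pattern those long training runs follow: periodically persisting model and optimizer state to a shared storage mount so a failed job can resume instead of starting over. The mount path, interval, and model are illustrative placeholders, not a description of any specific Voltage Park or customer setup.

```python
# Minimal sketch of periodic checkpointing during a long training run.
# Path, interval, and model are illustrative placeholders.
import os
import torch
import torch.nn as nn

CHECKPOINT_DIR = "/mnt/shared-storage/checkpoints"  # hypothetical shared mount
CHECKPOINT_EVERY = 500                              # steps between checkpoints

model = nn.Linear(1024, 1024)  # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def save_checkpoint(step: int) -> None:
    """Write model and optimizer state so the run can resume after a failure."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"step_{step:08d}.pt")
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )

for step in range(1, 2001):
    # ... forward pass, backward pass, and optimizer.step() would go here ...
    if step % CHECKPOINT_EVERY == 0:
        save_checkpoint(step)
```

The value of fast shared storage shows up in exactly this loop: the more often a job can afford to checkpoint, the less work is lost when hardware or software inevitably hiccups over a multi-month run.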
Virtualized systems often introduce performance tradeoffs. Our customers don’t want compromise. They want bare metal. They want predictability. They want speed.
Why We Chose VAST
When new customers started asking, “Do you support VAST?” we listened.
VAST gave us the right combination of speed, reliability, and control. It worked with our bare-metal GPU cloud and met the needs of a multi-tenant setup, giving our customers the confidence to move quickly with large models and complex workflows. We made it the standard across all six of our data centers.
EBox: The Game-Changer
The introduction of VAST’s EBox platform pushed performance even further. In production, it outperformed benchmarks and gave us a storage layer that kept up with growth.
One customer scaled from 64 GPUs to 2,000 GPUs in under six months. Others run mixture-of-experts models that move terabytes of training data across clusters. With EBox in place, we moved the data without delay. Our systems stayed responsive, stable, and fast.
Real Results: What Customers Are Seeing
Customers who left hyperscalers and moved to Voltage Park saw the difference right away. In one case, a team told us the AI storage performance with VAST on our system was better than any major cloud provider.
Other customers run jobs for 3 to 4 months solid with frequent checkpointing and no disruptions. They don’t need to worry about whether their data will be available or whether storage will slow them down. They trust that when they hit “run,” the infrastructure will keep pace.
What Success Looks Like
Our measure of success is unorthodox, but it correlates directly with the problem we're solving.
We ask a simple question: Is VAST keeping anyone awake at night?
We want the answer to always be “no.”
VAST doesn’t slow us down. It doesn’t hold back our customers. It stays in the background like good infrastructure should.
Customers who don’t realize VAST and Voltage Park are part of their tool stack are our gold standard. They’re not experiencing latency spikes, coverage gaps, or outages. Everything simply runs as it should.
AI data storage supports innovation and doesn’t stand in its way.
Stop Searching, Start Building
As demand for fast, scalable storage for AI applications grows, VAST remains a key part of how we deliver speed and stability across our GPU cloud. The workloads will keep growing. The expectations will keep rising. And we’ll keep building with partners that help us move fast.
If your storage is slowing you down, you haven’t tried Voltage Park. Deploy with us today.
-Drew Pletcher, Principal Network Engineer, Voltage Park