This week at the NVIDIA GPU Technology Conference (GTC), flash storage vendor Pure Storage announced an extension to its engineered systems for artificial intelligence. For those not familiar with the term, an engineered system is a turnkey solution that brings together all the technology components required to run a certain workload.
The first example of this was the vBlock system introduced by VCE, a joint venture between VMware, Cisco Systems and EMC. It included all the necessary storage, networking infrastructure, servers and software to stand up a private cloud and took deployment times from weeks or even months to just a few days.
During the past decade, compute platforms have become increasingly disaggregated as companies sought the freedom to pick and choose which storage, network or server vendor to use. Putting the components together for low-performance workloads is fairly easy. Cobbling together the right parts for high-performance ones, such as private cloud and AI, is very difficult, particularly when it comes to tuning the software and hardware to run optimally together. Engineered systems are validated designs that are tested and tuned for a particular application.
Airi Comes in Three Versions
Pure’s platform is known as Airi, which was announced at GTC 2019 and uses NVIDIA DGX servers, Arista network infrastructure and Pure Storage Flashblade systems. There are currently three versions of Airi, ranging from 2 PFlops to 4 PFlops of performance and from 119 TB to 374 TB of flash. All three are single-chassis systems. The new ones announced at GTC are multi-chassis systems in which multiple Airis can be daisy-chained together to create a single, larger logical unit.
Both new versions can accommodate up to 30 blades of 17 TB each. One uses up to nine NVIDIA DGX-1 systems for a total compute capacity of 9 PFlops. The other can be loaded with up to three NVIDIA DGX-2 systems for a total processing capability of 6 PFlops per unit. The new units use 100 Gig low-latency Ethernet from Mellanox (recently acquired by NVIDIA).
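The per-unit figures for the new multi-chassis versions can be checked with a little arithmetic. The following is a minimal Python sketch, not anything supplied by Pure or NVIDIA. The function and constant names are hypothetical, and the per-system PFlops values are simply derived from the totals quoted here (9 PFlops across nine DGX-1 systems, 6 PFlops across three DGX-2 systems).

```python
# Figures quoted in the article for the new multi-chassis Airi versions.
BLADES_PER_UNIT = 30   # maximum number of flash blades per unit
TB_PER_BLADE = 17      # capacity of each blade in TB

# Assumption: per-system performance derived from the article's totals
# (9 PFlops / 9 DGX-1 systems, 6 PFlops / 3 DGX-2 systems).
PFLOPS_PER_SYSTEM = {"DGX-1": 1.0, "DGX-2": 2.0}
MAX_SYSTEMS = {"DGX-1": 9, "DGX-2": 3}


def raw_flash_tb(blades=BLADES_PER_UNIT, tb_per_blade=TB_PER_BLADE):
    """Raw flash capacity of one unit in TB (before any data protection)."""
    return blades * tb_per_blade


def unit_pflops(system, count=None):
    """Aggregate PFlops for `count` systems of one type in a single unit."""
    if count is None:
        count = MAX_SYSTEMS[system]
    if not 0 <= count <= MAX_SYSTEMS[system]:
        raise ValueError(f"{system} count must be 0..{MAX_SYSTEMS[system]}")
    return count * PFLOPS_PER_SYSTEM[system]


if __name__ == "__main__":
    print("Raw flash per unit (TB):", raw_flash_tb())
    for name in MAX_SYSTEMS:
        print(f"Max PFlops with {name}:", unit_pflops(name))
```

The raw flash figure is just blades multiplied by blade size; usable capacity would be lower and depends on details not covered here.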
The use of Mellanox Ethernet may seem strange, because the company is the market leader in Infiniband, which often is used to interconnect servers. However, its low-latency Ethernet has performance characteristics that are close to those of Infiniband, and scaling out is simpler with Ethernet. The new Airi systems can be scaled out to 64 racks with a leaf-spine network for a massive amount of AI capacity.
The leaf-spine network architecture is the best network topology for multi-chassis deployments, because it offers consistent performance, high bandwidth, rapid scale and high availability. Companies can use the new Airi systems to start small with a single chassis and then scale out as required.
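To illustrate why a leaf-spine design gives consistent performance, here is a rough Python sketch of the topology's basic arithmetic. It assumes a generic two-tier fabric in which every leaf switch has one uplink to every spine switch. The function name, the switch and port counts, and the one-leaf-per-rack mapping are all hypothetical and do not describe the actual Airi design.

```python
def leaf_spine(leaves, spines, downlinks_per_leaf, link_gbps=100):
    """Summarize a two-tier leaf-spine fabric.

    Assumption: each leaf has exactly one uplink to each spine, and all
    links (uplinks and server-facing downlinks) run at `link_gbps`.
    """
    if min(leaves, spines, downlinks_per_leaf) < 1:
        raise ValueError("all counts must be at least 1")

    uplink_gbps = spines * link_gbps                 # per-leaf capacity toward spines
    downlink_gbps = downlinks_per_leaf * link_gbps   # per-leaf capacity toward servers

    return {
        # One link per (leaf, spine) pair.
        "fabric_links": leaves * spines,
        # Traffic between any two leaves can cross any spine, so every
        # leaf pair has the same number of equal-length paths.
        "equal_cost_paths": spines,
        # Ratio above 1.0 means servers can offer more traffic than the
        # uplinks can carry.
        "oversubscription": downlink_gbps / uplink_gbps,
    }


if __name__ == "__main__":
    # Hypothetical example: 64 leaves (one per rack), 4 spines,
    # 16 server-facing 100 Gig ports on each leaf.
    print(leaf_spine(leaves=64, spines=4, downlinks_per_leaf=16))
```

Because every leaf-to-leaf path has the same length, adding a rack means adding a leaf, and adding bandwidth means adding a spine, without changing how existing traffic is routed.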
AI-Optimized Version of Engineered-System Flashstack
Also at GTC 2019, Pure Storage announced an AI-optimized version of Flashstack, which is its engineered system using Cisco UCS servers and Nexus data center switches. The new Flashstack for AI uses the Cisco UCS C480 M5 ML AI server, which is optimized for deep learning. The server contains up to eight NVIDIA Tesla V100 GPUs that use the NVLink interconnect to make the eight processors work like a single, massive GPU. Flashstack uses Cisco’s 100 Gig Nexus switches and Pure’s Flashblade system.
The company does have other Flashstack systems, but those were not optimized for AI. Currently the system can’t be set up in a multi-chassis configuration the way Airi can, but that’s likely coming in the not-too-distant future.
Cisco and Pure have a strong partnership and offer a unique way of simplifying the entire data pipeline for AI. Cisco has a wide range of servers for every step in the AI cycle. The UCS C220 is ideal for data collection, and the UCS C240 is optimal for the clean and transform phase. As the graphic shows, a single Flashblade data hub can share the entire data set across the AI lifecycle.
The combination of NVIDIA, Mellanox and Pure Storage, or even Cisco and Pure, has hundreds of possible configuration knobs and levers to tune. While the hardware settings might look complex, they pale in comparison to those of the AI software. As an example, TensorFlow alone has more than 800 configuration parameters. With Airi and Flashstack, all the heavy lifting has been done, and customers can get the product up and running in just a few days. In highly competitive industries, this could make the difference between being a market leader and a laggard.
The AI era has arrived, and IT professionals need to be ready. Even the most technically astute engineers likely don’t have the skills to get an AI-ready system up and running quickly. The majority of organizations looking to embark on AI should look to an engineered system to de-risk the deployment.
Zeus Kerravala is the founder and principal analyst with ZK Research. He spent 10 years at Yankee Group and prior to that held a number of corporate IT positions.