Analysis of AI storage requirements: from data pipelines to model optimization
Overview
With the rapid development of artificial intelligence, AI is being applied ever more widely across industries. However, the performance of an AI system depends not only on raw compute power but also on how well its storage is optimized. AI workloads are multi-stage, highly parallel, and diverse, so their storage requirements differ greatly from those of traditional applications. From data ingestion to model inference, each phase places different demands on the storage system. Understanding these requirements not only helps optimize system performance, but also leads to more efficient storage solutions for enterprise AI applications and, ultimately, greater competitiveness.
Key takeaways
Why is storage for AI workloads so special?
AI storage differs from traditional storage in that AI workloads are typically multi-stage, and each stage has its own requirements and access patterns. The optimization targets also differ, focusing more on GPU utilization and data scientist productivity. In addition, AI tasks often involve highly parallel operations, and different AI tasks have different performance and capacity requirements.
AI data pipelines/workflows
The image depicts the different stages of AI processing.
Cite
In a previous article, Solidigm offered a more down-to-earth storage selection guide built around the I/O characteristics of the data pipeline; it is worth reading alongside this one.
Model building phase
The authors emphasize that, for the AI industry, the core attention and resource investment go into the model-building stage, which mainly covers the first five steps in the figure above.
The slide outlines this phase in terms of the process involved, its target, and accompanying notes.
Model inference applications
Note
At the time of SDC24, DeepSeek had not yet iterated to the current V3/R1 versions, and inference demand had not yet fully taken off. Looking at tomorrow from today, retaining and persisting enterprise production data is an indispensable part of gaining later insight into production patterns. Amid the industry's rush to boost productivity with large models, it is worth thinking more calmly: where does the enterprise's data actually sit, and in what form should it be fed to a large model?
Take data ingestion as an example
The image covers the data ingestion process and suggests how AI affects every aspect of how an enterprise captures, stores, and accesses business data. Companies are already collecting data, but applying AI is likely to change how that data is processed and stored, and with it how efficiently the data is used.
===
Business data is already being captured, but:
Real-world example: data flow before AI adoption
Before adopting AI, companies' data ingestion relied primarily on sequential writes, and much of the generated data was discarded; only a small fraction was read back, largely at random, and saved. This pattern is likely to change once AI is applied, because AI can help make more efficient use of this data.
AI-powered data flows
The image shows how AI technology can be used to extract value from data.
Generated data passes through business logic and AI-enhanced business logic, ultimately producing data insights. The data that is retained enables future business insights, and the diagram also notes the read/write characteristics of each step.
Note
Recently, many requests follow the same pattern: DS is so good, can you use it to check XXX data for me? Compared with full-parameter model inference, small models are actually a huge market for filtering and extracting insight from production data, and the model effectively becomes a throughput machine for logs and process data.
Storage characteristics of data workflows
Data cleansing
The legend on the left summarizes the storage requirements of data cleansing: large capacity, with a writing process that is mainly sequential and a reading process that is mainly random.
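As a concrete illustration of what a cleansing pass can look like in practice, here is a minimal pandas sketch; the file names and column names (raw_events.csv, user_id, amount) are hypothetical, not taken from the talk.

```python
import pandas as pd

# Hypothetical raw file and column names; adjust to the actual dataset.
raw = pd.read_csv("raw_events.csv")

cleaned = (
    raw.drop_duplicates()                          # remove repeated records
       .dropna(subset=["user_id", "amount"])       # drop rows missing key fields
       .assign(amount=lambda df: df["amount"].clip(lower=0))  # clamp negative amounts as an example fix
)

# The cleaned output is written back in large chunks.
cleaned.to_parquet("cleaned_events.parquet", index=False)
```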
Feature engineering
The picture illustrates the main steps of feature engineering. Data scientists play the role of translators in this process, turning raw data into numbers that AI can process. Feature engineering involves multiple steps such as exploring data, extracting features, and transforming data types. This process is typically computationally intensive and highly parallel.
===
The legend on the left summarizes the storage requirements of feature engineering: large capacity, with reads and writes that are mainly random.
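To make the "translator" role of feature engineering concrete, here is a minimal scikit-learn sketch that turns raw columns into purely numeric features; the dataset and column names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical cleaned dataset; the column names are illustrative only.
df = pd.read_parquet("cleaned_events.parquet")

numeric_cols = ["amount", "session_length"]
categorical_cols = ["country", "device_type"]

# "Translate" raw columns into purely numeric features a model can consume.
features = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),                           # scale numeric columns
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols), # encode categories as 0/1 columns
])

X = features.fit_transform(df)
print(X.shape)  # rows x engineered feature columns
```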
Model training phase
Several aspects of GPU and storage planning are discussed, emphasizing the importance of balancing storage and GPU performance, understanding data sources, and optimizing for known workloads. Specific GPU benchmarking tools and training requirements are also mentioned.
===
Model training – general storage planning
– Maximizing GPU utilization optimizes investment
Is the training sample size here a parameter scale or raw data? How is the sample size determined?
The training sample size usually refers to the volume of raw data, not the number of parameters. A "sample" here is a single instance of training data, typically an input (such as an image, text, or audio) together with its corresponding label or target.
Determining the size of the training sample can be considered in the following ways:
How to determine the number of model parameters:
Example:
Suppose a model has 3 layers, each with 1000 neurons, and each neuron is connected to every neuron in the previous layer; each layer then has approximately:
1000 (input neurons) × 1000 (output neurons) + 1000 (bias) = 1,001,000 parameters
If there are 3 layers, the total number of parameters is 1,001,000 × 3 = 3,003,000 parameters.
For more complex models, such as GPT-3 (175 billion parameters), the growth in parameter count comes mainly from adding more layers and more neurons per layer.
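The same back-of-the-envelope count can be reproduced in code. A minimal PyTorch sketch, assuming a plain fully connected network:

```python
import torch.nn as nn

# One fully connected layer with 1000 inputs and 1000 outputs, matching the estimate above.
layer = nn.Linear(1000, 1000)
print(sum(p.numel() for p in layer.parameters()))   # 1000*1000 weights + 1000 biases = 1,001,000

# Three such layers stacked give roughly three times as many parameters.
model = nn.Sequential(nn.Linear(1000, 1000), nn.Linear(1000, 1000), nn.Linear(1000, 1000))
print(sum(p.numel() for p in model.parameters()))   # 3,003,000
```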
Checkpoint mechanism
The problems that can arise during model training are discussed, especially the storage performance implications of the checkpointing mechanism.
Checkpoints save the state of the model, such as weights and biases, so that training can be recovered if an error occurs. Checkpoint files are usually written sequentially, and there may be multiple sequential writes in parallel.
Training pauses hurt throughput, and recovery typically requires high sequential read bandwidth, often in parallel to many GPUs. The performance of the storage system therefore directly affects how efficiently checkpoints are saved and restored, which in turn affects overall training efficiency.
===
Model training – what can happen when something goes wrong
— Save model weights and other states
The legend on the left summarizes the storage requirements of checkpointing: capacity is not the main constraint, but the system must deliver high read and write performance, and the accesses are entirely sequential.
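For reference, a minimal PyTorch sketch of the save/restore pattern described above; the checkpoint path and the fields stored are illustrative, not the speaker's implementation.

```python
import torch

# `model` and `optimizer` are assumed to be an existing torch.nn.Module / optimizer;
# the file path is illustrative only.
def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
    # Saving is one large, mostly sequential write of weights and optimizer state;
    # in distributed training, many ranks may write such files in parallel.
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    # Recovery is a large sequential read, typically fanned out to many GPUs.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]
```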
Model evaluation and tuning
Key aspects of model evaluation and tuning are discussed. Evaluation measures the correctness of the model (accuracy) and how it handles wrong versus correct results (precision/recall).
Other common evaluation measures, such as F1 score and AUC-ROC, are also presented. The tuning section emphasizes adjusting the model's hyperparameters to improve the evaluation results, and mentions that tuning produces a dataset containing the model's parameters. Finally, the number of parameters of a model is fixed and determined by its weights, which is critical in neural networks.
The legend on the left shows that the storage requirements during model evaluation and tuning are similar to those of checkpointing.
===
Cite
Model evaluation and tuning
– Measure how well the model results match expectations
– What is the probability of being correct?
– A rough measure of the frequency of wrong versus correct results
– Other measures such as F1 score and AUC-ROC (area under the receiver operating characteristic curve)
– Adjust hyperparameters to improve evaluation
F1 score and AUC-ROC are two important metrics to evaluate the performance of machine learning models. Here’s a closer look at the two metrics:
The F1 score is the harmonic mean of precision and recall. It is a combined indicator of the model's accuracy and completeness, and is especially useful when the classes are imbalanced.
Features of F1 scores:
The range of F1 scores is [0, 1], with 1 being perfect precision and recall, and 0 being the worst performance.
When precision and recall are unbalanced, F1 scores can provide a more comprehensive evaluation and avoid relying too much on one metric.
Applicable Scenarios:
AUC (Area Under the Curve) is the area under the receiver operating characteristic (ROC) curve and is used to evaluate the performance of binary classification models.
Features of AUC-ROC:
AUC evaluates the model across all classification thresholds.
It is suitable for class-imbalanced cases and assesses the overall performance of the model.
Applicable Scenarios:
Summary:
F1 score: ideal for scenarios that require a balance between precision and recall, especially when the classes are imbalanced.
AUC-ROC: a comprehensive evaluation metric that describes classifier performance across different thresholds, particularly useful under class imbalance.
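For concreteness, a minimal scikit-learn sketch showing how the two metrics are computed from a classifier's outputs; the labels and scores below are toy values.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Toy ground-truth labels and model outputs for a binary classifier.
y_true   = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred   = [0, 1, 1, 1, 0, 0, 1, 0]                   # hard class predictions -> F1
y_scores = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]   # predicted probabilities -> AUC-ROC

print("F1:", f1_score(y_true, y_pred))              # harmonic mean of precision and recall
print("AUC-ROC:", roc_auc_score(y_true, y_scores))  # threshold-independent ranking quality
```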
Characteristics of the inference phase
The slides discuss the concept of inference and its application in business. Inference means using a trained model to process production data and generate business value.
It covers workload types such as retrieval-augmented generation (RAG) with large language models, predictive analytics, computer vision, and anomaly detection (e.g., for malware or fraud). Access patterns vary with the type of inference; RAG in particular generates random, database-like workloads. These points matter for understanding how models are used in production and for optimizing the inference process.
The legend on the left summarizes the storage requirements of model inference: capacity requirements are modest; write performance requirements are low, with mostly random writes; read performance requirements are high (mainly fast retrieval for the model), and reads are entirely random.
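To see why RAG produces random, database-like reads, here is a minimal sketch of the retrieval step using an in-memory NumPy embedding table; in production this would be a vector database, and the sizes here are arbitrary.

```python
import numpy as np

# Hypothetical document-chunk embeddings (one row per chunk) and a query embedding.
doc_embeddings = np.random.rand(100_000, 768).astype(np.float32)
query = np.random.rand(768).astype(np.float32)

# Cosine similarity: normalise rows, then take dot products with the query.
doc_norm = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
scores = doc_norm @ (query / np.linalg.norm(query))

# The top-k matching chunks are then fetched (small random reads) and passed
# to the language model as context.
top_k = np.argsort(scores)[-5:][::-1]
print(top_k)
```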
Data archiving
The importance of archiving in AI was discussed. Although archiving is not typically a core part of AI, it is important for AI storage, especially in some AI applications where archiving data may be required by law or regulation.
Unlike the traditional notion of "archiving", the archived data discussed here may need to be pulled back later for further training or analysis. Archival storage does not need high performance, but it must be "fast enough" to restore data quickly when needed. Archived data is often called "cold storage", and as datasets keep growing, the archive tier keeps expanding. This tier calls for low cost and a low carbon footprint, and may even use zero-power storage media such as DNA storage and optical storage.
The legend on the left shows the storage requirement of the data archiving phase: massive capacity.
Key tools and technologies for AI infrastructure
A standard for measuring computational efficiency – the MLPerf benchmark.
Several commonly used benchmarking tools are highlighted, in particular the MLPerf benchmark provided by MLCommons.
MLPerf has multiple categories covering training, inference, storage, and more. The inference category is further subdivided into scenarios such as mobile devices, tiny (microcontroller-class) devices, data centers, and edge devices. Benchmark results for training algorithms (AlgoPerf) are also mentioned, evaluating the performance of different training algorithms. These benchmarks help assess how different hardware and models perform on specific tasks.
About the MLPerf benchmark
Accelerator – SDXI
SDXI (Smart Data Accelerator Interface) is a standard data-movement interface developed by SNIA (Storage Networking Industry Association) to simplify the movement and processing of data.
Future versions of SDXI will support additional features such as encryption/decryption and compression/decompression, further enhancing its application in the data processing process. With these additional features, SDXI optimizes the efficiency and security of data transfers, helping to improve the overall performance of storage and compute tasks.
Cite
For more technical reports on SDXI, please refer to:
Main content: The article details the design background, key features, and role of SDXI (a memory-to-memory data mobility and acceleration interface) in the storage access path. SDXI is designed to optimize data transfer, reduce CPU load, support tiered storage, and enable efficient data access for high-performance computing and AI applications.
Accelerator – Computational storage
Computational storage is a technology defined by SNIA (Storage Networking Industry Association) and NVM Express (NVMe).
Computational storage provides an open platform that allows computing functions to be integrated directly into the storage device, enabling computational operations to take place close to the data, reducing latency and increasing efficiency in data transfer. Common features include encryption/decryption, compression/decompression, data filtering, and preparation of training data. These features provide more efficient computational support for data processing, especially in applications such as machine learning.
Cite
For a more detailed report on computational storage, please refer to:
This article details the definition, architecture, and performance of computational storage, emphasizing moving compute tasks to where the data resides, which reduces data movement and improves efficiency. The article also discusses the benefits of computational storage, including reduced network bottlenecks and lower energy consumption.
Accelerators – Graphics Processing Units (GPUs).
The advantages of GPUs in parallel computing are discussed, especially for AI computing tasks.
The GPU's ability to efficiently handle many similar computations across a matrix is at the heart of its massively parallel computing capability. Compared with CPUs, GPUs can perform many calculations at the same time, dramatically reducing compute time while improving energy efficiency. In addition, data center GPUs are often equipped with high-bandwidth memory (HBM) to meet the demands of large-scale data processing.
===
Accelerator – GPU
HBM (high-bandwidth memory) is typically found on data center GPUs
Although GPUs are extremely efficient in training, they also present some challenges when using them.
First, GPU programming is more complex than CPU programming and requires deeper technical expertise. Second, GPUs typically consume more power, which raises costs and cooling requirements and adds operational complexity. In addition, GPU hardware is expensive, and transferring data to and from the GPU introduces latency that can hurt compute performance.
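A minimal PyTorch sketch that contrasts the same matrix multiply on CPU and GPU and makes the transfer costs mentioned above explicit (assuming a CUDA-capable machine):

```python
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# On the CPU the multiply runs with far less parallelism.
c_cpu = a @ b

if torch.cuda.is_available():
    # Moving data to the GPU has its own latency and bandwidth cost.
    a_gpu, b_gpu = a.cuda(), b.cuda()
    c_gpu = a_gpu @ b_gpu        # thousands of cores work on the matrix in parallel
    torch.cuda.synchronize()     # GPU kernels are asynchronous; wait for completion
    c_back = c_gpu.cpu()         # reading results back is another transfer
```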
Note
Of the three accelerators introduced here, only GPUs are truly mature today. SDXI and computational storage are, for now, out of the spotlight because of the heavy software and ecosystem investment they require, but as AI scenarios mature, the advantages these two accelerators offer in data transfer efficiency will shine in inference workloads.
Awareness of storage networking
Because storage devices and networks typically add latency when handling data, the speed of the whole system is often limited by these components. In high-performance computing and AI tasks in particular, storage and network bottlenecks constrain data transfer and processing speed. To get the most out of the GPU, it must be kept constantly supplied with data ("feeding").
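One common way to keep the GPU fed is to overlap data loading with computation. A minimal PyTorch sketch, with an illustrative in-memory dataset standing in for real training data on shared storage:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative in-memory dataset; in practice samples come from shared fast storage.
dataset = TensorDataset(torch.randn(1_000, 3, 64, 64),
                        torch.randint(0, 10, (1_000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,      # background processes read/decode data while the GPU computes
    pin_memory=True,    # pinned host memory speeds up host-to-GPU copies
    prefetch_factor=2,  # each worker keeps a couple of batches ready ahead of time
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for images, labels in loader:
    images = images.to(device, non_blocking=True)  # overlap copy with compute
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass would run here ...
    break
```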
Network architecture
The image shows the network design of an accelerated computing data center, with different network layers distinguished by color.
Ultra Ethernet (UEC)
Introducing Ultra Ethernet, an open project under the Linux Foundation that aims to provide a highly scalable, low-latency, and highly reliable network solution capable of supporting network topologies of up to one million nodes.
Ultra Ethernet integrates the latest congestion management technologies and low-latency protocols, and is designed with security in mind. The project combines the knowledge and technology of many experts to advance networking technology, and its specifications will be made public by the end of the year.
UALink
Introducing UALink, a technology for inter-accelerator communication designed for massive scaling. The initial focus will be on implementing memory sharing between accelerators (e.g., GPUs), specifically DDR and HBM memory.
At the same time, UALink provides a low-latency, high-bandwidth network that supports hundreds of accelerators operating in a single node, with simple load/store operations and software coherency guarantees. With support for transfer rates of up to 200Gbps and compatibility with the Infinity Fabric protocol, UALink can support large-scale accelerator network deployments. The technology can be complemented by other scale-out approaches, such as Ultra Ethernet (UEC), to further improve system performance.
Several of the biggest challenges associated with storage are discussed.
First, on performance, the storage system should keep GPUs waiting as little as possible to maximize compute efficiency. Second, as models continue to scale, storage solutions must scale as well to handle the sheer volume of data. Third, reliability is key, especially during training, since losing checkpoint data can mean losing an enormous amount of training work.
Finally, the slide asks the question of how SNIA (Storage Networking Industry Association) can help address these challenges, suggesting that there may be technologies or standards that can provide support.