This guest article from DDN Storage covers five data platform considerations to take into account when exploring the possibilities of deep learning.
With Artificial Intelligence applications and Deep Learning algorithms now maturing, many organizations are spinning up initiatives to figure out how they will extract competitive differentiation from their data. In fact, many companies have been collecting data over the last 5-10 years knowing they would probably need it someday, but without a clear plan for how to use it. We are now on the cusp of widespread adoption of Deep Learning to finally monetize all this data.
Regardless of how the data is acquired, it is the foundation of these nascent programs, so data platforms should be evaluated carefully at the outset to ensure future plans succeed even if they build on existing architectures. This requires forward thinking: gauging how a Deep Learning program will be deployed in production when current processing requirements and data sources may be just a fraction of the size they will reach in production instances. Without making these plans now, organizations risk falling behind the competition right when key breakthroughs are anticipated. Having to re-architect the entire Deep Learning infrastructure at deployment time could put companies well behind competitors that planned for the future.
To ensure ultimate success, businesses and research organizations should consider five key areas when creating and developing their Deep Learning data platform, so that it delivers better answers, faster time to value, and the capability to scale rapidly:
- Saturate your AI Platform
The up-front investment in GPU-enabled Deep Learning compute systems may be taken for granted, but the backing storage systems are central to maximizing answers per day. The correct storage platform will ensure that GPU cycles don't sit idle while applications wait for the storage to respond. The impact on the storage system varies greatly with application behavior: GPU-enabled in-memory databases start up faster when populated more quickly from the data warehousing area; GPU-accelerated analytics demand large thread counts, each with low-latency access to small pieces of data; image-based deep learning for classification, object detection, and segmentation benefits from high streaming bandwidth, random access, and, in most cases, fast memory-mapped calls. In a similar vein, recurrent networks for text and speech analysis also benefit from high-performance random small-file or small-I/O access.
Typical AI compute systems house between four and eight GPUs along with high-end networking, often with multiple InfiniBand ports providing hundreds of Gbps (gigabits per second) of low-latency bandwidth via the RDMA (Remote Direct Memory Access) I/O protocol. This means that any storage system under consideration should also leverage RDMA-capable networks such as InfiniBand, which offload data movement from the CPUs and avoid extra cache copies and context switches, vastly reducing latency, enabling far faster message transfer rates, and eliminating application wait times.
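As a rough illustration of why storage throughput matters here, the short Python sketch below estimates the sustained read bandwidth a single multi-GPU training node might demand. The GPU count, per-GPU image rate, and average image size are illustrative assumptions, not DDN or benchmark figures.

```python
# Back-of-envelope estimate of the storage read bandwidth needed to keep
# a GPU training node saturated. All figures are illustrative assumptions,
# not measured or vendor-published numbers.

GPUS_PER_NODE = 8                 # typical 4-8 GPU AI compute system
IMAGES_PER_SEC_PER_GPU = 2000     # assumed training throughput per GPU
AVG_IMAGE_BYTES = 150 * 1024      # assumed ~150 KB per encoded image

required_bytes_per_sec = GPUS_PER_NODE * IMAGES_PER_SEC_PER_GPU * AVG_IMAGE_BYTES
required_gbit_per_sec = required_bytes_per_sec * 8 / 1e9

print(f"Sustained read rate needed: {required_bytes_per_sec / 1e9:.1f} GB/s "
      f"(~{required_gbit_per_sec:.0f} Gbit/s)")
# Under these assumptions one node needs roughly 2.5 GB/s (~20 Gbit/s) of
# sustained small-file reads just to keep the GPUs busy, before checkpoints,
# shuffling, or additional nodes are taken into account.
```

Even with these modest assumptions, a single node demands gigabytes per second of sustained reads, which is why idle GPU cycles so often trace back to the storage layer.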
- Build massive ingest capability to cope with future scaling of data feeds.
Gathering data into a central repository will be a critical factor in creating a source that the Deep Learning model can run against once it is ready for production. Collecting data into this repository will require the ability to ingest information quickly from a wide variety of sources. For storage systems, ingest means write performance and coping with large concurrent streams from distributed sources at huge scale. Fruitful AI implementations are not only a means to gain insight from data; they also gather ever more data to aid in the continuous refinement of any model. Chosen storage systems must therefore have highly balanced I/O, performing writes just as fast as reads, and must satisfy the data-gathering demands of new sources added to augment and improve acquisition while concurrently serving the machine learning compute platforms.
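To make the write-side requirement more concrete, here is a minimal Python sketch of concurrent ingest, with many simulated data feeds writing into one shared repository at the same time. The stream count, chunk size, and mount point are hypothetical placeholders.

```python
# Minimal sketch of concurrent ingest: many producers writing into one
# shared repository simultaneously. Stream count, chunk size, and the
# target path are illustrative assumptions only.
import os
from concurrent.futures import ThreadPoolExecutor

INGEST_ROOT = "/mnt/shared_repository/ingest"   # hypothetical mount point
NUM_STREAMS = 64                                # e.g. 64 concurrent data feeds
CHUNK = b"x" * (4 * 1024 * 1024)                # 4 MiB per write call
CHUNKS_PER_STREAM = 16                          # ~64 MiB per simulated feed

def ingest_stream(stream_id: int) -> int:
    """Write one simulated data feed; returns bytes written."""
    path = os.path.join(INGEST_ROOT, f"feed_{stream_id:04d}.bin")
    written = 0
    with open(path, "wb") as f:
        for _ in range(CHUNKS_PER_STREAM):
            written += f.write(CHUNK)
    return written

if __name__ == "__main__":
    os.makedirs(INGEST_ROOT, exist_ok=True)
    with ThreadPoolExecutor(max_workers=NUM_STREAMS) as pool:
        total = sum(pool.map(ingest_stream, range(NUM_STREAMS)))
    print(f"Ingested {total / 1e9:.1f} GB across {NUM_STREAMS} concurrent streams")
```

A storage platform with unbalanced I/O will show the bottleneck immediately in a test like this: the concurrent write streams stall while read-heavy training traffic continues unaffected, or vice versa.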
- Flexible and fast access to data
Flexibility covers multiple factors when it comes to AI storage platforms. In the end, ingesting, transforming, splitting, and otherwise manipulating large datasets is just as important to Deep Learning as pushing that data through neural network applications. Flexibility for organizations entering AI also implies good performance regardless of the choice of data formats. Storage platforms under consideration should support both strong memory-mapped file performance and fast small-file access, useful when moving between all kinds of structured and unstructured data.
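As a small illustration of the two access patterns mentioned above, the sketch below contrasts a plain buffered read of a data shard with memory-mapped random access to a single record inside it. The file name and sizes are placeholders.

```python
# Sketch comparing buffered streaming reads with memory-mapped random
# access, the pattern many image-based training pipelines rely on.
# File name and sizes are illustrative placeholders.
import mmap
import os

SAMPLE_FILE = "sample_shard.bin"                # hypothetical training shard
if not os.path.exists(SAMPLE_FILE):
    with open(SAMPLE_FILE, "wb") as f:          # create a 64 MiB dummy shard
        f.write(os.urandom(64 * 1024 * 1024))

# 1) Buffered sequential read: good for streaming whole shards.
with open(SAMPLE_FILE, "rb") as f:
    data = f.read()

# 2) Memory-mapped random access: lets the training loop jump straight to
#    individual records without re-reading the whole file.
with open(SAMPLE_FILE, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    record = mm[4096:8192]                      # fetch one 4 KiB record in place
    print(f"streamed {len(data)} bytes, random-read {len(record)} bytes via mmap")
```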
As an AI-enabled data center moves from initial prototyping and testing towards production and scale, a flexible data platform should provide the means to scale in any one of multiple areas: performance, capacity, ingest capability, Flash-HDD ratio and responsiveness for data scientists. Such flexibility also implies expansion of a namespace without disruption, eliminating data copies and complexity during growth phases.
- Start Small, but Scale Simply and Economically
Scalability is measurable not only in terms of performance, but also manageability and economics. A successful AI program should be designed to start with a few TBs (terabytes) of data yet easily ramp to multiple PBs (petabytes) without re-architecting the environment.
One way to scale economically is to optimize the use of storage media depending on workload. While Flash should always be the media for live AI training data, it can become unfeasible to hold hundreds of TBs or PBs of data all on Flash, and many alternatives simply don't work at scale: hybrid models often suffer limitations around data management and data movement, while loosely coupled architectures that combine all-flash arrays with separate HDD-based data lakes present complicated environments for managing hot data efficiently.
AI platform architects should consider tightly integrated, scale-out hybrid architectures designed specifically for AI. Start small with a flash deployment and then choose your scaling strategy according to demand: either scale with flash only, or combine it with deeply integrated HDD pools. The integration and data-movement techniques are key here; make sure to select solutions that keep them fully transparent to users.
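To show what an age-based flash-to-HDD data-movement policy could look like in principle, here is a minimal Python sketch that demotes files not accessed for a configurable number of days. The mount points and 30-day threshold are assumptions, and a tightly integrated platform would perform this movement internally and transparently rather than through a script like this.

```python
# Minimal sketch of an age-based tiering policy: files in the flash tier
# that have not been accessed for N days are moved to an HDD pool.
# Mount points and the 30-day threshold are assumptions; an integrated
# platform would handle this transparently, not via an external script.
import os
import shutil
import time

FLASH_TIER = "/mnt/flash_tier"    # hypothetical hot (flash) pool
HDD_TIER = "/mnt/hdd_pool"        # hypothetical capacity (HDD) pool
MAX_IDLE_DAYS = 30                # assumed coldness threshold

def demote_cold_files() -> None:
    cutoff = time.time() - MAX_IDLE_DAYS * 86400
    for dirpath, _, filenames in os.walk(FLASH_TIER):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_atime < cutoff:          # not read recently
                rel = os.path.relpath(src, FLASH_TIER)
                dst = os.path.join(HDD_TIER, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)                   # demote to HDD pool

if __name__ == "__main__":
    demote_cold_files()
```

The policy itself is trivial; the hard part, and the reason integration matters, is doing this without breaking the namespace or making applications aware that the data has moved.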
- Partner with a vendor who understands the whole environment, not just storage.
Delivering performance to the AI application is what matters, not how fast the storage can push out data. The chosen storage platform vendor must recognize that integration and support services span the whole environment, beyond just storage, in order to deliver results faster. Given the sheer processing power of AI compute platforms, each system akin to a mini-supercomputer, the vendor must deliver high-performance solutions for the most demanding data-at-scale workflows and partner closely with you as your AI requirements evolve.
This guest article comes from DDN Storage, a provider of high performance, high capacity big data storage systems, processing solutions and services to data-intensive, global organizations.