Supercomputing has come a long way since its beginnings in the 1960s.
Initially, many supercomputers were based on mainframes; however, their cost and complexity were significant barriers to entry for many institutions. The idea of utilising multiple low-cost PCs over a network to provide a cost-effective form of parallel computing led research institutions down the path of high-performance computing (HPC) clusters, starting with “Beowulf” clusters in the 1990s.
More recently, we have witnessed the advancement of HPC from the original CPU-based clusters to systems that do the bulk of their processing on Graphics Processing Units (GPUs), resulting in the growth of GPU-accelerated computing.
Data and Compute – GPU’s role
While HPC was scaling up with more compute resources, data was growing at a far faster pace, presenting big data challenges for storage, processing, and transfer.
GPU-based parallel computing has been a real game changer for AI, because it can process these vast datasets in a fraction of the time. As workloads have grown, so too have GPU parallel computing and AI machine learning. Image analysis is a good example of how the power of GPU computing can support an AI project. With one GPU it would take 72 hours to train an imaging deep learning model, but it takes only 20 minutes to run the same AI model on an HPC cluster with 64 GPUs.
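As a rough illustration, the speedup and parallel efficiency implied by those figures can be checked with a few lines of arithmetic. The 72-hour and 20-minute timings are taken from the example above; the efficiency formula is simply speedup divided by GPU count.

```python
# Scaling arithmetic for the example above:
# 72 hours on 1 GPU vs 20 minutes on 64 GPUs.
single_gpu_minutes = 72 * 60   # 4320 minutes on one GPU
cluster_minutes = 20           # observed time on a 64-GPU cluster
gpus = 64

speedup = single_gpu_minutes / cluster_minutes  # overall speedup factor
efficiency = speedup / gpus                     # speedup per GPU

print(f"Speedup: {speedup:.0f}x, efficiency per GPU: {efficiency:.2f}")
```

Notably, the implied efficiency is greater than 1 (super-linear scaling); in practice this usually reflects effects such as the larger aggregate GPU memory of the cluster avoiding host-device data swapping, rather than raw compute scaling alone.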
How is HPC supporting AI growth?
Storage, networking, and processing are all important to making AI projects work at scale; this is where AI can make use of the large-scale, parallel environments that HPC infrastructure (with GPUs) provides to process workloads quickly. Training an AI model takes far more time than testing one. Coupling AI with HPC significantly speeds up the training stage and boosts the accuracy and reliability of AI models, whilst keeping the training time to a minimum.
As traditional use cases for HPC applications are so well established, changes often happen relatively slowly; updates for many HPC applications are only necessary every six to twelve months. AI development, by contrast, is moving so fast that updates and new applications, tools, and libraries are released almost daily.
If you employed the same update strategies to manage your AI stack as you do for your HPC platforms, you would quickly be left behind.
Giving back – how is AI supporting traditional HPC problems?
AI models can be used to predict the outcome of a simulation without having to run the full, resource-intensive simulation. By using an AI model in this way, input variables or design points of interest can be narrowed down to a candidate list quickly and at much lower cost. These candidates can then be run through the known simulation to verify the AI model’s predictions.
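A minimal sketch of this surrogate-modelling workflow is shown below. The `expensive_simulation` function here is a hypothetical stand-in for a real HPC simulation, and the quadratic fit is just one simple choice of surrogate; real projects typically use neural networks or Gaussian processes.

```python
import numpy as np

def expensive_simulation(x):
    # Hypothetical stand-in for a resource-intensive simulation:
    # here, just a smooth function of one design variable.
    return -(x - 0.7) ** 2 + 1.0

# 1. Run the real simulation at a handful of training points.
train_x = np.linspace(0.0, 1.0, 8)
train_y = np.array([expensive_simulation(x) for x in train_x])

# 2. Fit a cheap surrogate model (a quadratic polynomial here).
surrogate = np.poly1d(np.polyfit(train_x, train_y, deg=2))

# 3. Screen a large set of candidate design points with the surrogate,
#    which costs almost nothing compared with the real simulation.
candidates = np.linspace(0.0, 1.0, 10_000)
predicted = surrogate(candidates)
shortlist = candidates[np.argsort(predicted)[-3:]]  # top 3 by prediction

# 4. Verify only the shortlist with the real simulation.
best = max(shortlist, key=expensive_simulation)
print(f"Best verified design point: {best:.3f}")
```

The key point is step 3: thousands of candidates are evaluated with the cheap surrogate, and only a handful are promoted to the full simulation for verification.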
How can an HPC integrator help with your AI infrastructure?
Start with a few simple questions: How big is my problem? How fast do I need my results back? How much data do I have to process? How many users are sharing the resource?
HPC techniques will help you manage an AI project if the existing dataset is substantial, or if multiple users are causing contention on the infrastructure.
If your organisation is running AI workloads on a large machine, or across multiple machines with GPUs, your AI infrastructure might look more like HPC infrastructure than you realise. There are HPC techniques, software, and management practices that can really help to run that infrastructure. The hardware looks quite similar, but there are some clever ways of installing and managing it that are specifically geared towards AI modelling.
Storage is very often overlooked when organisations build infrastructure for AI workloads, and you may not be getting the full ROI on your AI infrastructure if your compute sits idle waiting for data from storage. It is important to seek the best advice on sizing and deploying the right storage solution for your cluster.
Big data doesn’t necessarily need to be that big; it simply reaches a point where it becomes unmanageable for an organisation. When you can no longer get out of your data what you want, it has become too big for you. HPC can provide the compute power to deal with the large amounts of data in AI workloads.
It is an exciting time for both HPC and AI, as each technology is steadily adopting techniques from the other. The challenges are growing every day, with newer and more distinct problems that need faster solutions: countering cyber-attacks, discovering new vaccines, detecting enemy missiles, and so on.
Both HPC and AI will continue to have an impact on organisations and on each other, and their symbiotic relationship will only grow stronger as traditional HPC users and AI infrastructure modellers realise each other’s full potential.
About the Author
Vibin Vijay is an artificial intelligence (AI) and machine learning (ML) product specialist at OCF. He has been working in the data analytics and AI industry for over 10 years and has experience in healthcare, higher education, manufacturing, retail, and financial services. He began his career in data science, developing skills in big data, distributed computing, and IoT, before more recently designing AI and ML solutions for customers using the latest hardware and software available. As AI system requirements grow bigger day by day, Vibin often looks to HPC to understand the synergy between the two.