At this year's big supercomputing conference, SC19, the top of the list of the fastest machines in the world is unchanged, but a number of new technologies are being discussed that portend the era of exascale computing: machines theoretically capable of a billion billion (i.e., a quintillion) calculations per second.
As it has been since June of last year, the Summit computer at the Department of Energy's Oak Ridge National Laboratory (ORNL) remains on top of the Top500 list, with a sustained performance of 148.6 petaflops on the High Performance Linpack benchmark used to rank the list. This machine, built by IBM, has 4,608 nodes, each equipped with two 22-core IBM Power 9 CPUs and six Nvidia Tesla V100 GPUs, all connected by a Mellanox EDR InfiniBand network. A similar but somewhat smaller system called Sierra at Lawrence Livermore National Laboratory comes in second at 94.6 petaflops. In third place is the Sunway TaihuLight supercomputer at China's National Supercomputing Center in Wuxi. It is powered by Sunway's SW26010 processors and scores 93 petaflops.
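To put those figures in perspective, here is a quick back-of-the-envelope calculation. The 7.8 TFLOPS FP64 peak per V100 is an assumed spec-sheet figure, not something from the list itself, so treat this as a rough sketch:

```python
# Rough arithmetic on Summit's numbers (assumes ~7.8 TFLOPS FP64
# peak per V100 SXM2 at boost clock; sustained clocks run lower).
NODES = 4608
GPUS_PER_NODE = 6
V100_FP64_TFLOPS = 7.8

gpu_peak_pf = NODES * GPUS_PER_NODE * V100_FP64_TFLOPS / 1000  # petaflops
linpack_pf = 148.6    # Summit's Linpack score from the list
exaflop_pf = 1000.0   # one exaflop, expressed in petaflops

print(f"GPU-only peak: ~{gpu_peak_pf:.0f} PF")
print(f"Linpack efficiency vs. that peak: ~{linpack_pf / gpu_peak_pf:.0%}")
print(f"An exascale machine would be ~{exaflop_pf / linpack_pf:.1f}x Summit's score")
```

Even counting only the GPUs, Summit sustains roughly 70 percent of theoretical peak on Linpack, and an exascale machine would still need to be nearly seven times faster.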
In fact, the entire top 10 on the list is unchanged since June. The most powerful new system comes in at number 25: the Advanced Multiprocessing Optimized System (AMOS) at Rensselaer Polytechnic Institute's Center for Computational Innovations (CCI).
Again, this is an IBM system with Power 9 CPUs and Nvidia Tesla V100s, like Summit. It is a smaller, five-rack system with a sustained Linpack maximum of eight petaflops, according to the list.
(As an alum, it's nice to see, and I was particularly tickled that it is named AMOS, after Rensselaer's first senior professor, Amos Eaton. That made me chuckle, as I spent a lot of time as an undergraduate waiting for the mainframe at Amos Eaton Hall. I doubt anyone ever ran Linpack on the old IBM 360/67, but the new machine would be millions of times faster; it has 130,000 cores compared with the single-digit number on the old mainframe.)
Looking over the whole list, China continues to rise and now has 227 of the Top 500 installations, while the US accounts for 118, near its all-time low. The top three system vendors are Lenovo, Sugon, and Inspur, all based in China, followed by Cray and HPE (HPE now owns Cray). 470 systems use Intel CPUs, another 14 use Power processors, and three use AMD. There are now two ARM-based supercomputers on the list: the Astra system deployed at Sandia National Laboratories, which is equipped with Marvell's ThunderX2 processors, and Fujitsu's A64FX prototype system in Japan. Nvidia remains the dominant vendor of accelerators, with its GPUs in 136 of the 145 accelerated systems. Ethernet is still used in more than half of the systems, but the fastest tend to use InfiniBand or proprietary interconnects such as Cray Aries and Intel Omni-Path.
Still, if there isn't much change in the list so far, there is a lot of work being done on new architectures with the goal of producing an exascale machine within the next two years. The US has announced work on two big new supercomputers. The first is the Aurora project at the DOE's Argonne National Laboratory, which will be built by Cray (now part of HPE) and Intel, while the second is Frontier at Oak Ridge, which will run custom AMD Epyc processors and Radeon Instinct GPUs connected over an Infinity Fabric interconnect.
Leading up to SC19, Intel announced more details of the Aurora project, saying it will use nodes that include two 10nm++ Sapphire Rapids Xeon processors and six of the new Ponte Vecchio GPU accelerators, based on the forthcoming Xe graphics architecture, as well as the firm's Optane DC persistent memory. Intel said Aurora will support over 10 petabytes of memory and over 230 petabytes of storage, and will use the Cray Slingshot fabric to connect nodes across more than 200 racks. (It did not, however, give exact numbers for total nodes or performance.)
Intel gave a bit more detail on the Ponte Vecchio processors, saying they will be built around the Xe architecture but optimized for high-performance computing and AI workloads. This version will be manufactured on 7nm technology and will use Intel's Foveros 3D and EMIB packaging to put multiple dies in the package. It will also support high-bandwidth memory and the Compute Express Link (CXL) interconnect. (Intel had previously said to expect a version of the Xe architecture in a consumer GPU sometime in 2020, presumably on Intel's 10nm or 14nm process.)
Intel also gave more details on its oneAPI project, its libraries, and a new language variant called Data Parallel C++, which is designed to help developers write code that can run on CPUs, GPUs, and FPGAs.
Not to be outdone, Nvidia, whose GPUs are the most popular accelerators, announced a reference design for building servers that combine ARM-based processors with Nvidia GPUs. Nvidia worked with Ampere, Fujitsu, and Marvell, all of which are working on ARM-based server processors, as well as with Cray and HPE, which have separately built some of the early ARM-based HPC systems with Nvidia GPU accelerators.
Nvidia also launched Magnum IO, a suite of software that uses a technique called GPUDirect to bypass the CPU when accessing the network, along with a new component called GPUDirect Storage that does the same when accessing storage and data files for simulation, analysis, or visualization. Magnum IO is available now, with the GPUDirect Storage piece planned for the first half of 2020.
AMD said more companies are using its second-generation Epyc processors and Radeon Instinct accelerators, highlighting the company's selection for the Frontier computer, which the firm said is expected to be the highest-performing supercomputer in the world when it ships in 2021. AMD also announced a number of other systems that will be using its parts, including deals with Atos on its BullSequana XH2000 supercomputers for weather forecasting and research in atmospheric, ocean, and climate computing; and with Cray, which is using its Shasta architecture in the forthcoming Archer2 and Vulcan systems in the UK. AMD also discussed ROCm 3.0, a new version of the open-source GPU compute software that the firm supports.
AMD highlighted that Microsoft Azure now offers a preview of an HPC instance based on its second-generation Epyc 7742 processor, while Nvidia announced a new Azure instance type that can scale up to 800 V100 GPUs interconnected over a single Mellanox InfiniBand backend network. Nvidia said it used 64 of these instances on a pre-release version of the cluster to train BERT, a popular conversational AI model, in roughly three hours.
One of the more interesting announcements came from startup Cerebras, which specializes in its Wafer Scale Engine (WSE), a 300mm wafer that contains 1.2 trillion transistors, including 400,000 compute cores and 18 GB of on-chip memory.
At the show, Cerebras launched its CS-1 system and announced it had already delivered the first one to Argonne National Laboratory. The company highlighted that this system, which reports say contains one of these WSEs along with memory and networking, is just 26 inches (15 rack units) tall, much smaller than the racks of GPU-accelerated systems. It's a pretty interesting concept, and one that is very different from the other approaches.
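The WSE's headline totals are easier to grasp on a per-core basis. Dividing the published figures (plain arithmetic, not official per-core specifications from Cerebras) gives:

```python
# Per-core view of the published WSE totals (simple division of the
# headline numbers; not official Cerebras per-core specifications).
transistors = 1.2e12      # 1.2 trillion transistors
cores = 400_000           # compute cores
onchip_mem_gib = 18       # 18 GB on-chip memory, treated as GiB here

mem_per_core_kb = onchip_mem_gib * 2**30 / cores / 1024
transistors_per_core = transistors / cores

print(f"~{mem_per_core_kb:.0f} KB of on-chip memory per core")
print(f"~{transistors_per_core / 1e6:.0f} million transistors per core")
```

Each core gets on the order of tens of kilobytes of memory sitting right next to it, which is much of the wafer-scale pitch: keeping compute and its working data local instead of reaching across a network.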