2805 Bowers Ave, Santa Clara, CA 95051 | 408-730-2275
sales@colfax-intl.com

NEC SX Architecture

Vector processor + x86/Linux architecture - The new SX architecture consists of the Vector Engine (VE) and the Vector Host (VH). The VE executes complete applications, while the VH mainly provides OS functions for the connected VEs. The VE consists of one vector processor with eight vector cores and uses "high bandwidth memory" modules (HBM2) for maximum memory bandwidth. It is the world's first implementation of one CPU LSI with six HBM2 memory modules using "chip-on-wafer-on-substrate" (CoWoS) technology, which yields a world-record memory bandwidth of 1.2 TB/s. The VE is connected to the VH, a standard x86/Linux node, through PCIe. By executing an entire application on the VE and the OS on the VH, the new SX architecture combines the high sustained performance for which vector processors are known with a familiar x86/Linux environment.


Extremely high capability core and processor with extremely high memory bandwidth - The vector core on the VE processor is the most powerful single core in HPC as of today, continuing the design philosophy of the previous SX series. With eight such cores, the vector processor executes applications with extremely high sustained performance. It features 2.45 TF peak performance and 1.2 TB/s memory bandwidth per processor. Unlike standard processors, vector architectures are known to achieve a significant fraction of their peak performance on real applications.
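The two headline figures above can be combined into a rough machine-balance estimate. The sketch below is only arithmetic on the numbers quoted in this section. The function and constant names are illustrative, and it assumes 1 TF = 10^12 floating-point operations per second and 1 TB/s = 10^12 bytes per second.

```python
# Figures quoted in the text above (per Vector Engine processor).
PEAK_TFLOPS = 2.45   # peak performance in TF
MEM_BW_TBS = 1.2     # memory bandwidth in TB/s
NUM_CORES = 8        # vector cores per processor


def bytes_per_flop(mem_bw_tbs: float, peak_tflops: float) -> float:
    """Bytes of memory traffic available per peak floating-point operation."""
    return mem_bw_tbs / peak_tflops


def peak_per_core_tflops(peak_tflops: float, cores: int) -> float:
    """Peak performance of a single core, assuming all cores are identical."""
    return peak_tflops / cores


if __name__ == "__main__":
    print(f"balance:  {bytes_per_flop(MEM_BW_TBS, PEAK_TFLOPS):.2f} bytes/flop")
    print(f"per core: {peak_per_core_tflops(PEAK_TFLOPS, NUM_CORES):.3f} TF")
```

A higher bytes-per-flop ratio generally means memory-bound kernels can run closer to peak, which is consistent with the sustained-performance claim above. The ratio itself is a derived estimate, not a vendor-published figure.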


NEC SX-Aurora TSUBASA Memory

The NEC Vector Engine Processor has a newly developed shared "Last-Level-Cache" (LLC), the first shared vector cache ever. This shared LLC serves all cores simultaneously and uses a "write-back" policy, so coherency between the cores, the LLC and memory is easily ensured. This architecture also lends itself to shared-memory parallelization, either by automatic parallelization or OpenMP, while MPI is used to parallelize across different Vector Engines. The LLC has a line size of 128 bytes, and some additional features are implemented to increase the efficiency of strided stores and scatter operations.
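To see why strided accesses matter for a cache with 128-byte lines, consider the following simplified model. It is a sketch under stated assumptions, not a description of how the NEC LLC behaves. It assumes 8-byte elements, an access pattern starting at address 0, and that a whole line is moved for every line touched. The additional hardware features mentioned above are not modeled, and the function names are hypothetical.

```python
def cache_lines_touched(n_elements, stride_elements, element_bytes=8, line_bytes=128):
    """Count distinct cache lines touched by a strided access starting at address 0.

    Assumes elements are aligned so that none straddles a line boundary.
    """
    lines = set()
    for i in range(n_elements):
        addr = i * stride_elements * element_bytes
        lines.add(addr // line_bytes)
    return len(lines)


def line_utilization(n_elements, stride_elements, element_bytes=8, line_bytes=128):
    """Fraction of the moved bytes that the access actually uses."""
    useful = n_elements * element_bytes
    moved = cache_lines_touched(
        n_elements, stride_elements, element_bytes, line_bytes
    ) * line_bytes
    return useful / moved


if __name__ == "__main__":
    for stride in (1, 2, 16):
        print(stride, cache_lines_touched(16, stride), line_utilization(16, stride))
```

Under these assumptions, 16 contiguous 8-byte elements fit in a single line, while a stride of 16 elements touches 16 separate lines. This is the kind of inefficiency that features for strided stores and scatter operations aim to reduce.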

NEC uses the second generation of the "High Bandwidth Memory" standard (HBM2). An HBM2 memory block is built by stacking four or eight dies and achieves up to 200 GB/s of bandwidth while providing either 4 GB or 8 GB of capacity. Six of these memory blocks and the Vector Engine Processor are mounted on a "silicon interposer", a special die that connects memory and processor. Together they provide a total of 24 GB to 48 GB per Vector Engine and an industry-leading 1.2 TB/s of bandwidth.
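The totals above follow directly from the per-block figures. The short sketch below reproduces that arithmetic. The class and function names are illustrative, and it assumes every block runs at the quoted maximum of 200 GB/s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hbm2Block:
    capacity_gb: int           # 4 or 8 GB per block, per the text
    bandwidth_gbs: int = 200   # "up to 200 GB/s" per block (assumed maximum)


def aggregate(blocks):
    """Return (total capacity in GB, total bandwidth in GB/s) for a set of blocks."""
    total_capacity = sum(b.capacity_gb for b in blocks)
    total_bandwidth = sum(b.bandwidth_gbs for b in blocks)
    return total_capacity, total_bandwidth


if __name__ == "__main__":
    for capacity in (4, 8):
        cap, bw = aggregate([Hbm2Block(capacity)] * 6)
        print(f"6 x {capacity} GB blocks: {cap} GB, {bw / 1000:.1f} TB/s")
```

Six blocks at 200 GB/s each give 1200 GB/s, which is the 1.2 TB/s quoted for the Vector Engine.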


NEC SX-Aurora TSUBASA Interconnect

The NEC Vector Engine can communicate with other Vector Engines or x86 CPUs over shared memory, PCI Express or a high-speed network.


GPGPU and VE




* Other names and brands may be claimed as the property of others.