Node and Core Comparison

Converting usage from one system to another is a complex process when a high degree of precision is desired, because the performance characteristics of an application typically differ from one system to the next. A detailed performance model for an application enables accurate projections, but building such a model may require substantial investment. To help teams create proposals with reasonably accurate projections of computational time on Blue Waters, we offer some general "rule of thumb" guidance. These rules of thumb are based on the characteristics of the Blue Waters system architecture compared with those of other HPC systems.

On the Blue Waters system, the fundamental allocation unit is the node, so allocations are awarded in units of node-hours. This is true for both the dual-CPU XE nodes and the CPU+GPU XK nodes, despite the difference in peak FLOP performance between the two. At regular priority, a project is charged 1 node-hour for using either an XE node or an XK node for 1 hour of wall-clock time, irrespective of how much work is done on the node. As is typical of most HPC systems, compute nodes are exclusive to a job: only that job may access the computing resources provided by the node(s) allocated to it. Because the current charging policy does not differentiate between XE and XK usage, the analysis below considers conversion to XE nodes. Because the XE node-hour is the estimated unit, a team whose application does more useful work in a given period on the XK nodes than on the XE nodes still uses the same amount of its allocation, which is to the team's benefit.
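The charging rule above amounts to simple arithmetic. The following is a minimal sketch, assuming regular priority; the helper name `node_hours_charged` is hypothetical and is not part of any Blue Waters tool.

```python
def node_hours_charged(nodes: int, wall_clock_hours: float) -> float:
    """Node-hours charged at regular priority.

    The node type (XE or XK) does not enter the calculation, because the
    charge described in the text is the same for both.
    """
    if nodes < 0 or wall_clock_hours < 0:
        raise ValueError("nodes and wall_clock_hours must be non-negative")
    return nodes * wall_clock_hours


# Example: a 128-node job that runs for 6 hours of wall-clock time.
print(node_hours_charged(128, 6.0))  # 768.0
```

Because nodes are exclusive to a job, the charge depends only on the number of nodes held and the elapsed time, not on how many cores the application actually keeps busy.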

Each XE node has two AMD 6276 Interlagos processors. The Interlagos processor has a distinctive design with eight (8) 256-bit wide vector floating-point units (one FPU per Bulldozer module), where each FPU is shared by two (2) "integer" cores. Each of the two integer cores is individually able to drive the FPU at close to its peak rate given the proper instruction stream. In total there are 16 FPUs (Bulldozer modules) and 32 "integer" cores per node. On Blue Waters, the Linux operating system (OS), and thus the batch job resource manager, currently sees each "integer" core as a "processor" for legacy reasons. The Interlagos "integer" core is similar to an AMD Istanbul, AMD Magny-Cours or Intel Nehalem core when issuing 128-bit wide vector floating-point instructions to an FPU. The Interlagos FPU is similar to the Intel "Sandy Bridge" core or IBM A2 core when handling 256-bit wide vector instructions.

It may be appropriate for some projections to also account for the use of caching structures on the different processors and the efficiency of data movement when comparing performance between systems.

The XSEDE service unit (SU) is a locally determined quantity based on the relative performance of the High-Performance LINPACK (HPL) benchmark between two XSEDE systems, and it is not pertinent to charging on Blue Waters. For conversions expressed in XSEDE SUs, a Blue Waters XE node should be counted as 16 cores when compared with systems that use newer processors. For conversions from systems like NICS Kraken, a Blue Waters XE node should be counted as 32 cores.
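The core counts above suggest a first-order conversion from core-hours on another system to XE node-hours. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that one core-hour on the source system does roughly the same work as one equivalent core-hour on an XE node. It ignores differences in clock rate, cache and memory bandwidth.

```python
# Equivalent core counts per Blue Waters XE node, as stated in the text.
XE_CORES_VS_NEWER_PROCESSORS = 16  # compare against the 16 FPUs
XE_CORES_VS_KRAKEN_LIKE = 32       # compare against the 32 "integer" cores


def core_hours_to_xe_node_hours(core_hours: float,
                                equivalent_cores_per_node: int) -> float:
    """Rule-of-thumb estimate of XE node-hours from source-system core-hours.

    Assumption (not from the source): one source core-hour maps to one
    equivalent-core-hour on an XE node.
    """
    if equivalent_cores_per_node <= 0:
        raise ValueError("equivalent_cores_per_node must be positive")
    return core_hours / equivalent_cores_per_node


# Example: 1,000,000 core-hours of usage on another system.
print(core_hours_to_xe_node_hours(1_000_000, XE_CORES_VS_KRAKEN_LIKE))       # 31250.0
print(core_hours_to_xe_node_hours(1_000_000, XE_CORES_VS_NEWER_PROCESSORS))  # 62500.0
```

The same usage maps to twice as many node-hours when the source system uses newer processors, because each of its cores is matched against a full FPU and not against one "integer" core.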

Table: Node characteristics. An * indicates a processor with 8 flops per clock period.

Node	Processor type	Nominal Clock Freq. (GHz)	FPU cores	Peak GF/s per node	Peak Memory Bandwidth (GB/s)
Blue Waters Cray XE	AMD 6276 Interlagos	2.45	16*	313	102
NICS Kraken Cray XT	AMD Istanbul	2.6	12	125	25.6
NERSC Hopper XE	AMD 6172 Magny-Cours	2.1	24	202	85.3
ANL IBM BG/P	PowerPC 450	0.85	4	13.6	13.6
ANL IBM BG/Q	IBM A2	1.6	16*	205	42.6
NCAR Yellowstone	Intel E5-2670 Sandy Bridge	2.6	16*	333	102
NICS Darter Cray XC	Intel E5-2600 Sandy Bridge	2.6	16*	333	102
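The peak GF/s column can be reproduced from the other columns as clock frequency times FPU cores times flops per clock period. The sketch below checks this for each row. The value of 8 flops per clock for starred entries comes from the table caption; the value of 4 for unstarred entries is an assumption inferred from the listed peak figures, and the helper name `peak_gflops` is hypothetical.

```python
# (node, nominal clock in GHz, FPU cores, starred in table, listed peak GF/s)
NODES = [
    ("Blue Waters Cray XE", 2.45, 16, True, 313),
    ("NICS Kraken Cray XT", 2.6, 12, False, 125),
    ("NERSC Hopper XE", 2.1, 24, False, 202),
    ("ANL IBM BG/P", 0.85, 4, False, 13.6),
    ("ANL IBM BG/Q", 1.6, 16, True, 205),
    ("NCAR Yellowstone", 2.6, 16, True, 333),
    ("NICS Darter Cray XC", 2.6, 16, True, 333),
]


def peak_gflops(clock_ghz: float, fpu_cores: int, starred: bool) -> float:
    """Peak GF/s per node = clock (GHz) * FPU cores * flops per clock."""
    # 8 flops/clock is from the table caption; 4 is an inferred assumption.
    flops_per_clock = 8 if starred else 4
    return clock_ghz * fpu_cores * flops_per_clock


blue_waters_peak = peak_gflops(2.45, 16, True)
for name, clock, cores, starred, listed in NODES:
    computed = peak_gflops(clock, cores, starred)
    ratio = computed / blue_waters_peak  # peak relative to a Blue Waters XE node
    print(f"{name:22s} computed {computed:6.1f}  listed {listed:>5}  ratio {ratio:.2f}")
```

Peak ratios of this kind are only an upper bound on relative performance; as noted earlier, cache behavior and data movement can matter as much as peak floating-point rate.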
