Node and Core Comparison

Converting or translating usage from one system to another is a complex process when a high degree of precision is desired. The performance characteristics of an application on one system typically differ from those on another. A detailed performance model of an application enables precise projections, but such a model may require substantial investment. To help teams create proposals with reasonably accurate projections of computational time on Blue Waters, we offer some general "rule of thumb" guidance. These rules of thumb are based on the characteristics of the Blue Waters system architecture compared with those of other HPC systems.

On the Blue Waters system, the fundamental allocation unit is the node, so allocations are awarded in units of node-hours. This is true for both the dual x86 CPU XE nodes and the heterogeneous x86 CPU and GPU XK nodes, despite the difference in peak FLOP performance between the two node types. At regular priority, a project is charged 1 node-hour for the use of either an XE node or an XK node for 1 hour of wall-clock time, irrespective of how much work is done on the node. As is typical of most HPC systems, compute nodes are exclusive to a job: only that job may access the compute resources of the node(s) allocated to it. Because the current charging policy does not differentiate between XE and XK usage, the analysis below considers conversion or translation to XE nodes. Because the XE node-hour is the unit of estimation, a team whose application does more useful work in a given time period on XK nodes than on XE nodes still uses the same amount of its allocation, which is to the team's benefit.
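The charging rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, covering regular priority; the function name `node_hours_charged` and its parameters are hypothetical and are not part of any Blue Waters tool.

```python
def node_hours_charged(nodes, wall_clock_hours, node_type="XE"):
    """Node-hours charged at regular priority (hypothetical helper).

    The charge depends only on how many nodes are held and for how long;
    the node type (XE or XK) does not change it.
    """
    if node_type not in ("XE", "XK"):
        raise ValueError("node_type must be 'XE' or 'XK'")
    if nodes < 0 or wall_clock_hours < 0:
        raise ValueError("nodes and wall_clock_hours must be non-negative")
    # 1 node for 1 hour of wall-clock time costs 1 node-hour.
    return nodes * wall_clock_hours


if __name__ == "__main__":
    # A 128-node job that runs for 6 hours is charged the same on XE and XK.
    print(node_hours_charged(128, 6.0, "XE"))
    print(node_hours_charged(128, 6.0, "XK"))
```

Because nodes are exclusive to a job, the charge applies to every allocated node for the full wall-clock duration, whether or not all of its cores are busy.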

Each XE node has two AMD 6276 Interlagos processors. The Interlagos processor has a distinctive design with eight (8) 256-bit vector floating-point units (FPUs, one per Bulldozer module), each shared by two (2) "integer" cores. Each of the two integer cores is individually able to drive the FPU at close to its peak rate, given the proper instruction stream. In total there are 16 FPUs (Bulldozer modules) and 32 "integer" cores per node. For legacy reasons, the Linux operating system (OS) on Blue Waters, and thus the batch job resource manager, currently sees each "integer" core as a "processor". When issuing 128-bit-wide vector floating-point instructions to an FPU, the Interlagos "integer" core is similar to an AMD Istanbul, AMD MagnyCours, or Intel Nehalem core, and a single XE node has 32 of these "integer" cores. When handling 256-bit AVX instructions, the Interlagos FPU is similar to an Intel "Sandy Bridge" core or an IBM A2 core, and a single XE node has 16 of these FPUs.

Some projections may also need to account for differences in cache structure and in the efficiency of data movement between processors when comparing performance across systems.

The XSEDE SU value is a locally determined quantity based on the relative performance of the High-Performance LINPACK (HPL) benchmark between two XSEDE systems, and it is not pertinent to charging on Blue Waters. Within the XSEDE concept of the service unit (SU), a Blue Waters XE node counts as 16 cores when compared with systems built on newer processors. When converting usage from systems like NICS Kraken, a Blue Waters XE node counts as 32 cores.
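As a first-order illustration of the core counts above, core-hours measured on another system can be divided by the number of equivalent cores in an XE node. The sketch below is an assumption about how the rule of thumb might be applied, not an official conversion formula; the names `XE_CORE_EQUIVALENTS` and `core_hours_to_xe_node_hours` are hypothetical, and the estimate ignores differences in clock rate, cache, and memory bandwidth.

```python
# Equivalent cores per XE node, as stated in the text:
#   32 when the source system has cores like those of NICS Kraken,
#   16 when the source system uses newer processors.
# The dictionary keys are labels invented for this sketch.
XE_CORE_EQUIVALENTS = {
    "kraken_like": 32,
    "newer_processor": 16,
}


def core_hours_to_xe_node_hours(core_hours, source_class):
    """Rough XE node-hour estimate from core-hours on another system."""
    if core_hours < 0:
        raise ValueError("core_hours must be non-negative")
    return core_hours / XE_CORE_EQUIVALENTS[source_class]


if __name__ == "__main__":
    # 3,200 core-hours on a Kraken-like system and 1,600 core-hours on a
    # newer-processor system give the same XE node-hour estimate.
    print(core_hours_to_xe_node_hours(3200, "kraken_like"))
    print(core_hours_to_xe_node_hours(1600, "newer_processor"))
```

An estimate of this kind is only a starting point; a measured benchmark of the application on both systems would give a better projection.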

Table 1. Node characteristics. An asterisk (*) indicates processors with 8 flops per clock period.

| Node | Processor type | Nominal clock freq. (GHz) | FPU cores | Peak GF/s per node | Peak memory GB/s |
|---|---|---|---|---|---|
| BW XE | AMD 6276 Interlagos | 2.45 | 16* | 313 | 102 |
| NICS Kraken | AMD Istanbul | 2.6 | 12 | 125 | 25.6 |
| NERSC Hopper | AMD 6172 MagnyCours | 2.1 | 24 | 202 | 85.3 |
| ANL BG/P | PowerPC 450 | 0.85 | 4 | 13.6 | 13.6 |
| ANL BG/Q | IBM A2 | 1.6 | 16* | 205 | 42.6 |
| NCAR Yellowstone | Intel E5-2670 Sandy Bridge | 2.6 | 16* | 333 | 102 |
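The peak figures in Table 1 appear to follow from clock rate × FPU cores × flops per clock, and the per-node ratios give a quick sense of how one XE node compares with a node of another system. The Python sketch below reproduces that arithmetic. The value of 8 flops per clock for starred rows comes from the table caption; the value of 4 for the other rows is an assumption inferred from the tabulated peaks, and the function names are hypothetical.

```python
# Rows from Table 1: (nominal clock in GHz, FPU cores per node,
# flops per clock per core, tabulated peak GF/s, peak memory GB/s).
# 8 flops/clock for starred rows is from the caption; 4 flops/clock for
# the remaining rows is an ASSUMPTION inferred from the tabulated peaks.
TABLE_1 = {
    "BW XE": (2.45, 16, 8, 313.0, 102.0),
    "NICS Kraken": (2.6, 12, 4, 125.0, 25.6),
    "NERSC Hopper": (2.1, 24, 4, 202.0, 85.3),
    "ANL BG/P": (0.85, 4, 4, 13.6, 13.6),
    "ANL BG/Q": (1.6, 16, 8, 205.0, 42.6),
    "NCAR Yellowstone": (2.6, 16, 8, 333.0, 102.0),
}


def computed_peak_gflops(node):
    """Peak GF/s per node as clock (GHz) x FPU cores x flops per clock."""
    clock_ghz, fpu_cores, flops_per_clock, _, _ = TABLE_1[node]
    return clock_ghz * fpu_cores * flops_per_clock


def xe_ratios(node):
    """Return (peak ratio, memory-bandwidth ratio) of one XE node to one `node`."""
    _, _, _, xe_peak, xe_bw = TABLE_1["BW XE"]
    _, _, _, peak, bw = TABLE_1[node]
    return xe_peak / peak, xe_bw / bw


if __name__ == "__main__":
    for name in TABLE_1:
        peak_ratio, bw_ratio = xe_ratios(name)
        print(f"{name:18s} computed peak {computed_peak_gflops(name):6.1f} GF/s, "
              f"XE/node peak {peak_ratio:5.2f}, XE/node bandwidth {bw_ratio:5.2f}")
```

Under these assumptions, one XE node corresponds to roughly 2.5 Kraken nodes by peak floating-point rate (313/125) and roughly 4 Kraken nodes by peak memory bandwidth (102/25.6). Where a given application falls between such bounds likely depends on whether it is limited by computation or by memory traffic.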