System Summary

The Blue Waters system is a Cray XE/XK hybrid machine built from AMD 6276 "Interlagos" processors (nominal clock speed of at least 2.3 GHz) and NVIDIA GK110 (K20X) "Kepler" accelerators, all connected by the Cray Gemini torus interconnect.

System Totals

Total Cabinets 288
Total Peak Performance 13.34 PF
Total System Memory 1.634 PB
   
XE Compute Cabinets 237
XE Peak Performance 7.1 PF
XE Compute Nodes 22,636
XE Bulldozer Cores* 362,240
XE System Memory 1.382 PB
   
XK Compute Cabinets 44
XK Peak Performance (CPU+GPU) 6.24 PF
XK Compute Nodes 4,228
XK Bulldozer Cores* (CPU) 33,792
XK Kepler Accelerators (GPU) 4,228
XK System Memory (CPU) 135 TB
XK Accelerator Memory (GPU) 25 TB
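
As a rough cross-check, the aggregate peak figures above can be compared with the per-node peak figures listed under Compute Node Summary. The sketch below multiplies node counts by per-node peaks, assuming decimal prefixes (1 PF = 1,000,000 GF). The constant and function names are illustrative only, and the sketch makes no claim about how the listed totals were actually derived; small differences from the listed values are to be expected.

```python
# Figures copied from the tables in this summary (GF = gigaflops, PF = petaflops).
XE_NODES = 22_636          # "XE Compute Nodes" in System Totals
XK_NODES = 4_228           # "XK Compute Nodes" in System Totals
XE_NODE_PEAK_GF = 313.6    # "Peak Performance" of one XE node
XK_CPU_PEAK_GF = 156.8     # "Peak CPU Performance" of one XK node
XK_GPU_PEAK_GF = 1310.0    # "Peak GPU Performance (DP)", 1.31 TF

GF_PER_PF = 1_000_000      # assumption: decimal prefixes


def aggregate_peak_pf(nodes: int, per_node_gf: float) -> float:
    """Return the aggregate peak in PF for `nodes` identical nodes."""
    return nodes * per_node_gf / GF_PER_PF


xe_pf = aggregate_peak_pf(XE_NODES, XE_NODE_PEAK_GF)
xk_pf = aggregate_peak_pf(XK_NODES, XK_CPU_PEAK_GF + XK_GPU_PEAK_GF)
listed_sum_pf = round(7.1 + 6.24, 2)   # listed XE peak + listed XK peak

print(f"XE estimate: {xe_pf:.2f} PF (listed 7.1 PF)")
print(f"XK estimate: {xk_pf:.2f} PF (listed 6.24 PF)")
print(f"Listed XE + XK: {listed_sum_pf:.2f} PF (listed total 13.34 PF)")
```

The listed XE and XK peaks add up to the listed system total, while the per-node products land close to, but not exactly on, the listed XE and XK figures.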

 

Interconnect

Architecture 3D Torus
Topology 24x24x24
Compute nodes per Gemini 2
Peak Node Injection Bandwidth 9.6 GB/s
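
To illustrate what a 24x24x24 3D torus implies, the sketch below computes the wraparound neighbors of a router position and the minimum hop count between two positions. The coordinate scheme and function names are hypothetical, and the code does not model Gemini's actual routing or link bandwidths; it only shows the modular arithmetic behind a torus. Whether every position in the torus hosts compute nodes is not stated here, so the node-slot count is an upper bound derived from the two rows above.

```python
DIMS = (24, 24, 24)        # torus dimensions from the Topology row
NODES_PER_GEMINI = 2       # from the "Compute nodes per Gemini" row


def torus_neighbors(coord, dims=DIMS):
    """Return the six wraparound neighbors of `coord` in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (nxt[axis] + step) % dims[axis]  # wrap at the edges
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Return the minimum hop count between two torus coordinates."""
    hops = 0
    for pa, pb, size in zip(a, b, dims):
        delta = abs(pa - pb)
        hops += min(delta, size - delta)  # take the shorter way around
    return hops


gemini_count = DIMS[0] * DIMS[1] * DIMS[2]       # router positions in the torus
node_slots = gemini_count * NODES_PER_GEMINI     # upper bound on attached nodes
max_hops = sum(size // 2 for size in DIMS)       # farthest pair of positions

print(gemini_count, node_slots, max_hops)
print(torus_neighbors((0, 0, 0)))
print(torus_hops((0, 0, 0), (23, 12, 1)))
```

Because each dimension wraps around, position 23 is adjacent to position 0, so no two positions are more than 12 hops apart along any one axis.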

 

Online Storage

Total Usable Storage 26.4 PB
Total Raw Storage 34.0 PB
Aggregate Measured I/O Bandwidth > 1.1 TB/s
File System   Size (PB)   # of OSTs
home          2.2         36 (was 144)*
projects      2.2         36 (was 144)*
scratch       22          360 (was 1440)*

* - The move to Lustre 2.5 GridRAID changes the number of drives per OST. The total amount of physical storage remains unchanged.
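
The storage figures above can be checked against each other with a few lines of arithmetic. The sketch below sums the file system sizes, computes the usable fraction of raw storage, and estimates the average capacity per OST. It assumes decimal units (1 PB = 1000 TB), and the per-OST figure is a simple average rather than a documented value; the variable names are illustrative.

```python
import math

TOTAL_USABLE_PB = 26.4
TOTAL_RAW_PB = 34.0

# name: (size in PB, current OST count, previous OST count)
FILE_SYSTEMS = {
    "home": (2.2, 36, 144),
    "projects": (2.2, 36, 144),
    "scratch": (22.0, 360, 1440),
}

usable_sum_pb = sum(size for size, _, _ in FILE_SYSTEMS.values())
sizes_match_total = math.isclose(usable_sum_pb, TOTAL_USABLE_PB)
usable_fraction = TOTAL_USABLE_PB / TOTAL_RAW_PB

print(f"Sum of file systems: {usable_sum_pb:.1f} PB (matches total: {sizes_match_total})")
print(f"Usable / raw: {usable_fraction:.1%}")

for name, (size_pb, osts, old_osts) in FILE_SYSTEMS.items():
    tb_per_ost = size_pb * 1000 / osts       # assumption: decimal TB
    print(f"{name}: about {tb_per_ost:.1f} TB per OST, "
          f"OST count reduced by a factor of {old_osts // osts}")
```

The three file systems together account for the full 26.4 PB of usable storage, and each one has a quarter as many OSTs as it did before.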

Near-line Storage

Archive Software HPSS
Online disk cache 1.2 PB
Aggregate Bandwidth to tape 58 GB/s
Raw capacity (assuming all slots are filled) 250+ PB

 

Compute Node Summary

There are 22,640 XE compute nodes: 96 have 128 GB of memory and the rest have 64 GB.

There are 4,228 XK compute nodes: 96 have 64 GB of system memory and the rest have 32 GB.

XE Compute Node

AMD 6276 Interlagos Processors 2
Bulldozer Cores* 16
Integer Scheduling Units** 32
Memory / Bulldozer Core 4 GB
Total Node Memory 64 GB
Peak Performance 313.6 GF
Memory Bandwidth 102.4 GB/s

Large Memory XE Compute Node

AMD 6276 Interlagos Processors 2
Bulldozer Cores* 16
Integer Scheduling Units** 32
Memory / Bulldozer Core 8 GB
Total Node Memory 128 GB
Peak Performance 313.6 GF
Memory Bandwidth 102.4 GB/s
 

XK Compute Node

AMD 6276 Interlagos Processors 1
Bulldozer Cores* 8
Integer Scheduling Units** 16
Memory / Bulldozer Core 4 GB
Node System Memory 32 GB
GPU Memory 6 GB
Peak CPU Performance 156.8 GF
CPU Memory Bandwidth 51.2 GB/s
CUDA cores 2688
Peak GPU Performance (DP) 1.31 TF
GPU Memory Bandwidth (ECC off)*** 250 GB/s

Large Memory XK Compute Node

AMD 6276 Interlagos Processors 1
Bulldozer Cores* 8
Integer Scheduling Units** 16
Memory / Bulldozer Core 8 GB
Node System Memory 64 GB
GPU Memory 6 GB
Peak CPU Performance 156.8 GF
CPU Memory Bandwidth 51.2 GB/s
CUDA cores 2688
Peak GPU Performance (DP) 1.31 TF
GPU Memory Bandwidth (ECC off)*** 250 GB/s
 

* - We refer to the Bulldozer Core compute unit as a single compute "core" and consider the Interlagos processors as having 8 (floating point) cores each. On the XE nodes there are 2 Interlagos processors and each processor has 8 cores.

** - The Linux operating system reports each Interlagos processor in /proc/cpuinfo as 16 "processors" per chip. These "processors" are the schedulable integer cores that work with the floating-point unit. The Moab/Torque resource scheduler and Cray ALPS also treat the integer core as the smallest schedulable unit.

*** - For the Kepler GPU, memory bandwidth is reduced by ~6% (1/16th) with ECC enabled, i.e. from 250 GB/s to roughly 234 GB/s.
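
The ** note on how Linux counts "processors" can be checked on a node with a short script. The sketch below counts the `processor` entries in /proc/cpuinfo-style text and converts that count to Bulldozer cores, assuming two integer scheduling units per core as in the node tables above. The function names are illustrative, and the sample text is synthetic rather than captured from a real node.

```python
import re


def count_logical_processors(cpuinfo_text: str) -> int:
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    pattern = r"^processor\s*:\s*\d+\s*$"
    return len(re.findall(pattern, cpuinfo_text, re.MULTILINE))


def bulldozer_cores(logical_processors: int, units_per_core: int = 2) -> int:
    """Convert integer scheduling units to Bulldozer cores (2 units per core)."""
    return logical_processors // units_per_core


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Synthetic stand-in for an XE node, which exposes 32 integer scheduling units.
sample = "".join(
    f"processor\t: {i}\nmodel name\t: example\n\n" for i in range(32)
)
logical = count_logical_processors(sample)
print(logical, bulldozer_cores(logical))
```

On an actual compute node, passing `read_cpuinfo()` instead of `sample` would be expected to report 32 logical processors on an XE node and 16 on an XK node, per the tables above.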