System Summary

The Blue Waters system is a Cray XE/XK hybrid machine composed of AMD 6276 "Interlagos" processors (nominal clock speed of at least 2.3 GHz) and NVIDIA GK110 (K20X) "Kepler" accelerators, all connected by the Cray Gemini torus interconnect.

System Totals

Total Cabinets 288
Total Peak Performance 13.34 PF
Total System Memory 1.476 PB
XE Compute Cabinets 237
XE Peak Performance 7.1 PF
XE Compute Nodes 22,640
XE Bulldozer Cores* 362,240
XE System Memory 1.382 PB
XK Compute Cabinets 44
XK Peak Performance (CPU+GPU) 6.24 PF
XK Compute Nodes 4,228
XK Bulldozer Cores* (CPU) 33,792
XK Kepler Accelerators (GPU) 4,224
XK System Memory (CPU) 135 TB
XK Accelerator Memory (GPU) 25 TB
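
Several of these totals can be cross-checked against each other. The following is a minimal Python sketch that uses only figures copied from the table above; the variable names are illustrative and not part of any Blue Waters tooling.

```python
# Figures copied from the System Totals table.
xe_peak_pf = 7.1
xk_peak_pf = 6.24
xe_nodes, xe_cores = 22_640, 362_240
xk_gpus, xk_cores = 4_224, 33_792

# The two partition peaks should add up to the listed total of 13.34 PF.
total_peak_pf = round(xe_peak_pf + xk_peak_pf, 2)

# Core counts divided by node/accelerator counts give the per-node core counts.
xe_cores_per_node = xe_cores // xe_nodes
xk_cores_per_gpu = xk_cores // xk_gpus

print(total_peak_pf, xe_cores_per_node, xk_cores_per_gpu)
```

The ratios come out to 16 cores per XE node and 8 cores per accelerator, which agrees with the per-node tables further down.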



Interconnect

Architecture 3D Torus
Topology 24x24x24
Compute nodes per Gemini 2
Peak Node Injection Bandwidth 9.6 GB/s
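
To make the torus figures concrete, here is a small Python sketch of a 24x24x24 torus with wraparound links. The (x, y, z) coordinate labeling and the function name `torus_neighbors` are assumptions for illustration only; they do not describe how the Cray software actually numbers Gemini routers.

```python
DIMS = (24, 24, 24)        # topology from the table above
NODES_PER_GEMINI = 2       # compute nodes per Gemini from the table above

def torus_neighbors(coord, dims=DIMS):
    """Return the six nearest neighbors of coord in a 3D torus.

    Each axis wraps around, so position 0 is adjacent to position dims-1.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors

gemini_count = DIMS[0] * DIMS[1] * DIMS[2]
node_slots = gemini_count * NODES_PER_GEMINI

print(gemini_count, node_slots)
print(torus_neighbors((0, 0, 0)))
```

Under this model the torus has 13,824 Gemini positions and 27,648 node positions. That is more than the 26,868 XE and XK compute nodes listed in the totals; this summary does not itemize the remaining positions.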


Online Storage

Total Usable Storage 26.4 PB
Aggregate I/O Bandwidth > 1 TB/s
File System   Size (PB)   # of OSTs
home          2.2         144
projects      2.2         144
scratch       22          1440
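
The per-file-system figures can be summed to check the totals. This Python sketch uses only the numbers in the table above; the conversion of 1 PB to 1000 TB is an assumption, since the table does not say whether decimal or binary units are meant.

```python
# (size in PB, number of OSTs) for each file system, from the table above.
file_systems = {
    "home": (2.2, 144),
    "projects": (2.2, 144),
    "scratch": (22.0, 1440),
}

total_pb = round(sum(size for size, _ in file_systems.values()), 1)
total_osts = sum(osts for _, osts in file_systems.values())

# Average capacity per OST, assuming 1 PB = 1000 TB (an assumption, see above).
tb_per_ost = {
    name: round(size * 1000 / osts, 2)
    for name, (size, osts) in file_systems.items()
}

print(total_pb, total_osts)
print(tb_per_ost)
```

The sizes add up to the listed 26.4 PB, and the average capacity per OST is the same for all three file systems.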


Near-line Storage

Archive Software HPSS
Online disk cache 1.2 PB
Aggregate Bandwidth to tape 58 GB/s
5 year capacity 380 PB


XE Compute Node

AMD 6276 Interlagos Processors 2
Bulldozer Cores* 16
Integer Scheduling Units** 32
Memory / Bulldozer Core 4 GB
Total Node Memory 64 GB
Peak Performance 313.6 GF
Memory Bandwidth 102.4 GB/s
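
The XE node figures follow from the per-processor counts, and the node peak scales up to the XE partition peak. A minimal Python sketch, using values from this table, the System Totals table, and the footnotes (variable names are illustrative):

```python
processors_per_node = 2
cores_per_processor = 8          # Bulldozer cores, see footnote *
int_units_per_processor = 16     # integer scheduling units, see footnote **
mem_per_core_gb = 4
node_peak_gf = 313.6
xe_nodes = 22_640                # from System Totals

cores_per_node = processors_per_node * cores_per_processor
int_units_per_node = processors_per_node * int_units_per_processor
node_mem_gb = cores_per_node * mem_per_core_gb

# Peak per Bulldozer core, and the aggregate XE peak in PF (1 PF = 1e6 GF).
peak_per_core_gf = round(node_peak_gf / cores_per_node, 2)
xe_peak_pf = round(xe_nodes * node_peak_gf / 1e6, 2)

print(cores_per_node, int_units_per_node, node_mem_gb)
print(peak_per_core_gf, xe_peak_pf)
```

This reproduces the 16 cores, 32 integer scheduling units, and 64 GB listed for the node, and the aggregate rounds to the 7.1 PF listed for the XE partition.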


XK Compute Node

AMD 6276 Interlagos Processors 1
Bulldozer Cores* 8
Integer Scheduling Units** 16
Memory / Bulldozer Core 4 GB
Node System Memory 32 GB
GPU Memory 6 GB
Peak CPU Performance 156.8 GF
CPU Memory Bandwidth 51.2 GB/s
CUDA cores 2688
Peak GPU Performance (DP) 1.31 TF
GPU Memory Bandwidth (ECC off)*** 250 GB/s
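
For a sense of how an XK node's peak is split between CPU and GPU, the two peak figures above can be combined. This is a rough Python sketch that assumes 1 TF = 1000 GF and uses the rounded values from the table:

```python
cpu_peak_gf = 156.8
gpu_peak_gf = 1.31 * 1000        # 1.31 TF double precision, converted to GF

node_peak_gf = round(cpu_peak_gf + gpu_peak_gf, 1)
gpu_share = round(gpu_peak_gf / node_peak_gf, 3)

print(node_peak_gf, gpu_share)
```

With these rounded inputs, the GPU accounts for roughly 89% of the node's peak double-precision performance.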


* - We count each Bulldozer compute unit as a single compute "core", so each Interlagos processor has 8 (floating-point) cores. An XE node has 2 Interlagos processors and therefore 16 cores.

** - The Linux operating system reports each Interlagos processor in /proc/cpuinfo as 16 "processors". These "processors" are the schedulable integer cores that work with the floating-point units. The Moab/Torque resource scheduler and Cray ALPS also treat the integer core as the smallest unit.
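
One way to see this count on a Linux node is to count the "processor" entries in /proc/cpuinfo. The Python sketch below is illustrative only; the helper name `count_logical_processors` is hypothetical, and the script simply reports nothing useful on systems without /proc/cpuinfo.

```python
def count_logical_processors(cpuinfo_text):
    """Count 'processor : N' entries in the text of /proc/cpuinfo."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == "processor":
            count += 1
    return count

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(count_logical_processors(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Given 16 "processors" per chip, an XE node with 2 Interlagos processors would be expected to report 32 and an XK node 16.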

*** - On Kepler GPUs, enabling ECC reduces the memory bandwidth by about 6% (1/16).
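
Applying the 1/16 reduction to the listed ECC-off bandwidth gives an estimate of the ECC-on figure. This short Python sketch is plain arithmetic on the numbers above, not a measured value:

```python
ecc_off_gbs = 250.0              # GPU memory bandwidth with ECC off, from the table
ecc_penalty = 1 / 16             # reduction stated in the footnote (about 6%)

ecc_on_gbs = ecc_off_gbs * (1 - ecc_penalty)

print(ecc_on_gbs)  # 234.375
```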