Blue Waters Data Sets
For questions email: help+bw@ncsa.illinois.edu
Overview
The Blue Waters data set is the result of scientific data processing since 2012 at the Petascale Computing Facility, sponsored by the National Science Foundation. Blue Waters data is publicly available for viewing and downloading. The data has been anonymized; that is, any personnel/account names associated with the data have been removed.
General Description of Collected Data
OVIS, an open-source software suite designed for monitoring the performance, health, and efficiency of large-scale computing systems, was used for most of the data collection. OVIS gathers this data through an API and network protocol called the Lightweight Distributed Metric Service (LDMS).
Blue Waters data comprises statistics compiled on various computer hardware and software activities; a few examples are:
- I/O data on various components such as NICs
- statistics on node usage
- memory allocations
- CPU or GPU performance
- reads, writes, caching, and file opens and closes
- file transfers and data calls
- communication link status
For each data set, a detailed description of the actual data set or data elements is provided, or a link is given for obtaining more information.
How to Get Access to the Data
Access to the datasets is provided via https://www.globus.org/. You may login with an existing institutional account or create a new account at that site.
The collection name is Blue Waters System Monitoring Data Set and can be found by searching for that name within Globus.
Data Types Available
Each data type below is described in more detail further down the page.
1. Node metric, compute and service node (time series) data
A. Cray system sampler data
B. Model Specific Registers (MSR) data
2. Syslogs data
3. Resource Manager data (Torque)
4. System Environment Data Collections
5. Darshan data (I/O data)
6. Lustre User Experience Metrics
Explanation of Data Types
1. Node metric, compute and service node (time series) data
1 A. Cray System Sampler Data
The following is a description of the node and time series data contents.
Fields with units of B are raw byte counts at the time of the sample. Most other values are also raw counts at the sample time. The few rate data points are denoted as B/s (bytes per second). A short sketch for deriving per-interval rates from the cumulative counters is given after the table.
To parse the data files, the appropriate header should be used to determine the position of each field within the comma-separated data file. Because the data format has changed slightly over time, there are files named HEADER.<date range> that denote the format for each date range.
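As an illustration of that parsing step, here is a minimal Python sketch; the header and data file names are hypothetical placeholders, and the columns referenced at the end (#Time, CompId, current_freemem) are taken from the table below.

```python
import csv

# Hypothetical file names; use the HEADER.<date range> file whose range
# covers the dates of the data file being read.
header_path = "HEADER.20130101-20131231"
data_path = "node_metrics_20130615.csv"

# The HEADER file holds the comma-separated column names for that date range.
with open(header_path) as f:
    columns = [c.strip() for c in next(csv.reader(f))]

# Pair each sample's values with the column names from the header.
with open(data_path) as f:
    for row in csv.reader(f):
        sample = dict(zip(columns, (v.strip() for v in row)))
        # Example: free memory reported by each node at each sample time.
        print(sample["#Time"], sample["CompId"], sample["current_freemem"])
```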
File Data Name | File Data Definition |
#Time | Time in epoch (GMT) |
Time_usec | partial second time in microseconds to the right of the decimal point |
CompId | Node ID |
Tesla_K20X.gpu_util_rate | Utilization reported by NVIDIA at the time of sample (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_agg_dbl_ecc_total_errors | GPU double-bit ECC errors (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_agg_dbl_ecc_texture_memory | GPU double-bit ECC errors for the texture memory (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_agg_dbl_ecc_register_file | GPU double-bit ECC errors for the register file (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_agg_dbl_ecc_device_memory | GPU double-bit ECC errors for device memory (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_agg_dbl_ecc_l2_cache | GPU double-bit ECC errors for the Level 2 cache (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_agg_dbl_ecc_l1_cache | GPU double-bit ECC errors for the Level 1 cache (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_memory_used | GPU memory in use, in KB (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_temp | GPU temperature in Celsius |
Tesla_K20X.gpu_pstate | Power management state (see attached NVIDIA documentation for more info) |
Tesla_K20X.gpu_power_limit | Power limit (maximum) in milliwatts |
Tesla_K20X.gpu_power_usage | GPU power consumption in milliwatts |
ipogif0_tx_bytes | Bytes transmitted with TCP/IP over the Gemini interface |
ipogif0_rx_bytes | Bytes received with TCP/IP over the Gemini interface |
RDMA_rx_bytes | Remote Direct Memory Access (RDMA) received bytes |
RDMA_nrx | RDMA number of cumulative receives |
RDMA_tx_bytes | RDMA cumulative transmit bytes |
RDMA_ntx | RDMA cumulative number of transfers |
SMSG_rx_bytes | Cumulative bytes received via the Short Message protocol (refer to Cray documentation) |
SMSG_nrx | Cumulative number of Short Message receives (refer to Cray documentation) |
SMSG_tx_bytes | Short Message transmit bytes (refer to Cray documentation) |
SMSG_ntx | Short Message number of transmits (refer to Cray documentation) |
current_freemem | Unallocated memory in KB |
loadavg_total_processes | Unix load of all processes ready to run, average x 100 |
loadavg_running_processes | Unix load of processes in the running state, average x 100 |
loadavg_5min(x100) | Unix load 5-minute average x 100 |
loadavg_latest(x100) | Current Unix load x 100 |
nr_writeback | Number of pages scheduled for writeback but not yet completed |
nr_dirty | Number of pages waiting to be scheduled to the output device |
lockless_write_bytes#stats.snx11001 | Cumulative number of lockless write I/O bytes. This is a special kind of I/O where clients do not get any locks but instead instruct the server to take the locks on the client’s behalf |
lockless_read_bytes#stats.snx11001 | Cumulative number of lockless read I/O bytes. This is a special kind of I/O where clients do not get any locks but instead instruct the server to take the locks on the client’s behalf |
direct_write#stats.snx11001 | Cumulative number of writes to storage |
direct_read#stats.snx11001 | Cumulative number of reads to storage |
inode_permission#stats.snx11001 | Cumulative number of checks for access rights to a given inode |
removexattr#stats.snx11001 | Cumulative number of remove attributes. Command removes the extended attribute identified by name and associated with the given path in the filesystem. |
listxattr#stats.snx11001 | Cumulative number of listattr. Command retrieves the list of extended attribute names associated with the given path in the filesystem. |
getxattr#stats.snx11001 | Cumulative number of times operation has occurred to retrieve the value of the extended attribute identified by name and associated with the given path in the filesystem. |
setxattr#stats.snx11001 | Cumulative number of calls to set extended attributes |
alloc_inode#stats.snx11001 | Cumulative number of Fragmentations. System will allocate another inode as needed. |
statfs#stats.snx11001 | Cumulative number of calls to stat fs |
getattr#stats.snx11001 | Cumulative number of get attribute calls |
flock#stats.snx11001 | Cumulative count of file locks (flock calls). The flock utility manages flock locks from within shell scripts or from the command line. |
lockless_truncate#stats.snx11001 | Cumulative number of file truncates without locking a file. |
truncate#stats.snx11001 | The cumulative number of events to shrink (or extend) the size of a file to the specified size |
setattr#stats.snx11001 | The cumulative number of times setattr was called. This command sets the value of given attribute of an object |
fsync#stats.snx11001 | The cumulative number of fsync calls. fsync transfers ("flushes") all modified in-core data (i.e., modified buffer cache pages) of the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. As well as flushing the file data, fsync() also flushes the metadata information associated with the file. |
seek#stats.snx11001 | Cumulative File seeks |
mmap#stats.snx11001 | Cumulative number of new mappings in the virtual address space of the calling process. The starting address for a new mapping is specified in addr. |
close#stats.snx11001 | Cumulative File Closes |
open#stats.snx11001 | Cumulative File Opens |
ioctl#stats.snx11001 | Cumulative Input/Output control calls |
brw_write#stats.snx11001 | Cumulative bulk (brw) writes to storage |
brw_read#stats.snx11001 | Cumulative bulk (brw) reads to storage |
write_bytes#stats.snx11001 | Cumulative Writes to storage in bytes |
read_bytes#stats.snx11001 | Cumulative Reads to storage in bytes |
writeback_failed_pages#stats.snx11001 | Cumulative number of writeback failed pages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_ok_pages#stats.snx11001 | Cumulative number of writeback success pages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_from_pressure#stats.snx11001 | Cumulative number of writeback from pressure. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_from_writepage#stats.snx11001 | Cumulative number of writeback from writepages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
dirty_pages_misses#stats.snx11001 | Cumulative number of Dirty page misses; Dirty pages are the pages in memory (page cache) that have been updated and therefore have changed from what is currently stored on disk. |
dirty_pages_hits#stats.snx11001 | Cumulative number of Dirty page hits; Dirty pages are the pages in memory (page cache) that have been updated and therefore have changed from what is currently stored on disk. |
lockless_write_bytes#stats.snx11002 | Cumulative number of lockless write I/O bytes. This is a special kind of I/O where clients do not get any locks but instead instruct the server to take the locks on the client’s behalf |
lockless_read_bytes#stats.snx11002 | Cumulative number of lockless read I/O bytes. This is a special kind of I/O where clients do not get any locks but instead instruct the server to take the locks on the client’s behalf |
direct_write#stats.snx11002 | Cumulative number of writes to storage |
direct_read#stats.snx11002 | Cumulative number of reads to storage |
inode_permission#stats.snx11002 | Cumulative number of checks for access rights to a given inode |
removexattr#stats.snx11002 | Cumulative number of remove attributes. Command removes the extended attribute identified by name and associated with the given path in the filesystem. |
listxattr#stats.snx11002 | Cumulative number of listattr. Command retrieves the list of extended attribute names associated with the given path in the filesystem. |
getxattr#stats.snx11002 | Cumulative number of times operation has occurred to retrieve the value of the extended attribute identified by name and associated with the given path in the filesystem. |
setxattr#stats.snx11002 | Cumulative number of calls to set extended attributes |
alloc_inode#stats.snx11002 | Cumulative number of Fragmentations. System will allocate another inode as needed. |
statfs#stats.snx11002 | Cumulative number of calls to stat fs |
getattr#stats.snx11002 | Cumulative number of get attribute calls |
flock#stats.snx11002 | Cumulative count of file locks (flock calls). The flock utility manages flock locks from within shell scripts or from the command line. |
lockless_truncate#stats.snx11002 | Cumulative number of file truncates without locking a file. |
truncate#stats.snx11002 | The cumulative number of events to shrink (or extend) the size of a file to the specified size |
setattr#stats.snx11002 | The cumulative number of times setattr was called. This command sets the value of given attribute of an object |
fsync#stats.snx11002 | The cumulative number of fsync calls. fsync transfers ("flushes") all modified in-core data (i.e., modified buffer cache pages) of the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. As well as flushing the file data, fsync() also flushes the metadata information associated with the file. |
seek#stats.snx11002 | Cumulative File seeks |
mmap#stats.snx11002 | Cumulative number of new mappings in the virtual address space of the calling process. The starting address for a new mapping is specified in addr. |
close#stats.snx11002 | Cumulative File Closes |
open#stats.snx11002 | Cumulative File Opens |
ioctl#stats.snx11002 | Cumulative Input/Output control calls |
brw_write#stats.snx11002 | Cumulative bulk (brw) writes to storage |
brw_read#stats.snx11002 | Cumulative bulk (brw) reads to storage |
write_bytes#stats.snx11002 | Cumulative Writes to storage in bytes |
read_bytes#stats.snx11002 | Cumulative Reads to storage in bytes |
writeback_failed_pages#stats.snx11002 | Cumulative number of writeback failed pages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_ok_pages#stats.snx11002 | Cumulative number of writeback success pages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_from_pressure#stats.snx11002 | Cumulative number of writeback from pressure. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_from_writepage#stats.snx11002 | Cumulative number of writeback from writepages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
dirty_pages_misses#stats.snx11002 | Cumulative number of Dirty page misses; Dirty pages are the pages in memory (page cache) that have been updated and therefore have changed from what is currently stored on disk. |
dirty_pages_hits#stats.snx11002 | Cumulative number of Dirty page hits; Dirty pages are the pages in memory (page cache) that have been updated and therefore have changed from what is currently stored on disk. |
lockless_write_bytes#stats.snx11003 | Cumulative number of lockless write I/O bytes. This is a special kind of I/O where clients do not get any locks but instead instruct the server to take the locks on the client’s behalf |
lockless_read_bytes#stats.snx11003 | Cumulative number of lockless read I/O bytes. This is a special kind of I/O where clients do not get any locks but instead instruct the server to take the locks on the client’s behalf |
direct_write#stats.snx11003 | Cumulative number of writes to storage |
direct_read#stats.snx11003 | Cumulative number of reads to storage |
inode_permission#stats.snx11003 | Cumulative number of checks for access rights to a given inode |
removexattr#stats.snx11003 | Cumulative number of remove attributes. Command removes the extended attribute identified by name and associated with the given path in the filesystem. |
listxattr#stats.snx11003 | Cumulative number of listattr. Command retrieves the list of extended attribute names associated with the given path in the filesystem. |
getxattr#stats.snx11003 | Cumulative number of times operation has occurred to retrieve the value of the extended attribute identified by name and associated with the given path in the filesystem. |
setxattr#stats.snx11003 | Cumulative number of calls to set extended attributes |
alloc_inode#stats.snx11003 | Cumulative number of Fragmentations. System will allocate another inode as needed. |
statfs#stats.snx11003 | Cumulative number of calls to stat fs |
getattr#stats.snx11003 | Cumulative number of get attribute calls |
flock#stats.snx11003 | Cumulative count of file locks (flock calls). The flock utility manages flock locks from within shell scripts or from the command line. |
lockless_truncate#stats.snx11003 | Cumulative number of file truncates without locking a file. |
truncate#stats.snx11003 | The cumulative number of events to shrink (or extend) the size of a file to the specified size |
setattr#stats.snx11003 | The cumulative number of times setattr was called. This command sets the value of given attribute of an object |
fsync#stats.snx11003 | The cumulative number of fsync calls. fsync transfers ("flushes") all modified in-core data (i.e., modified buffer cache pages) of the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. As well as flushing the file data, fsync() also flushes the metadata information associated with the file. |
seek#stats.snx11003 | Cumulative File seeks |
mmap#stats.snx11003 | Cumulative number of new mappings in the virtual address space of the calling process. The starting address for a new mapping is specified in addr. |
close#stats.snx11003 | Cumulative File Closes |
open#stats.snx11003 | Cumulative File Opens |
ioctl#stats.snx11003 | Cumulative Input/Output control calls |
brw_write#stats.snx11003 | Cumulative bulk (brw) writes to storage |
brw_read#stats.snx11003 | Cumulative bulk (brw) reads to storage |
write_bytes#stats.snx11003 | Cumulative Writes to storage in bytes |
read_bytes#stats.snx11003 | Cumulative Reads to storage in bytes |
writeback_failed_pages#stats.snx11003 | Cumulative number of writeback failed pages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_ok_pages#stats.snx11003 | Cumulative number of writeback success pages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_from_pressure#stats.snx11003 | Cumulative number of writeback from pressure. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
writeback_from_writepage#stats.snx11003 | Cumulative number of writeback from writepages. Writeback is the process of asynchronously writing dirty pages from the page cache back to the underlying filesystem |
dirty_pages_misses#stats.snx11003 | Cumulative number of Dirty page misses; Dirty pages are the pages in memory (page cache) that have been updated and therefore have changed from what is currently stored on disk. |
dirty_pages_hits#stats.snx11003 | Cumulative number of Dirty page hits; Dirty pages are the pages in memory (page cache) that have been updated and therefore have changed from what is currently stored on disk. |
Note on the NIC metrics below: the fundamental issue is that some of the performance counters count data that doesn't actually make it onto the HSN. There are overhead flits counted as parts of some Get transactions, and demarcation packets within some transactions that are entirely generated by and consumed by the local Gemini. There isn't enough information available to compensate exactly for them. Option A takes a simplistic approach and ignores the issue; the extra bytes are counted as if they were message payload. Option B is preferred. It makes two assumptions we believe are reasonable: 1. Packets that are part of BTE Puts will mostly be max-sized. 2. The majority of Get requests will be BTE, not FMA. We believe this matches MPI's use: the BTE is used for large transfers, and only the first and last packets of a transfer may be less than max-sized, so as the transfers are large, most packets will not be the first or last packet. Option A may be more accurate if actual use doesn't match these assumptions. The SAMPLE_* variants are rates in bytes per second; the corresponding metrics without the SAMPLE_ prefix are the raw counters.
SAMPLE_totaloutput_optB (B/s) | NIC metric: total output rate, Option B (see note above) |
SAMPLE_bteout_optB (B/s) | NIC metric: BTE output rate, Option B (see note above) |
SAMPLE_bteout_optA (B/s) | NIC metric: BTE output rate, Option A (see note above) |
SAMPLE_fmaout (B/s) | NIC metric: FMA output rate (see note above) |
SAMPLE_totalinput (B/s) | NIC metric: total input rate (see note above) |
SAMPLE_totaloutput_optA (B/s) | NIC metric: total output rate, Option A (see note above) |
totaloutput_optB | NIC metric: total output counter, Option B (see note above) |
bteout_optB | NIC metric: BTE output counter, Option B (see note above) |
bteout_optA | NIC metric: BTE output counter, Option A (see note above) |
fmaout | Cumulative number of Fast Memory Accesses (small transfers) by the node's NIC |
totalinput | Sum of total bytes for the node's NIC |
totaloutput_optA | NIC metric: total output counter, Option A (see note above) |
Z-_SAMPLE_GEMINI_LINK_CREDIT_STALL (% x1e6) | Percentage of time that the Z negative link was in a credit-stalled state (link-aggregated Gemini output stalls) |
Z+_SAMPLE_GEMINI_LINK_CREDIT_STALL (% x1e6) | Link-aggregated Gemini output stalls for the Z positive link |
Y-_SAMPLE_GEMINI_LINK_CREDIT_STALL (% x1e6) | Link-aggregated Gemini output stalls for the Y negative link |
Y+_SAMPLE_GEMINI_LINK_CREDIT_STALL (% x1e6) | Link-aggregated Gemini output stalls for the Y positive link |
X-_SAMPLE_GEMINI_LINK_CREDIT_STALL (% x1e6) | Link-aggregated Gemini output stalls for the X negative link |
X+_SAMPLE_GEMINI_LINK_CREDIT_STALL (% x1e6) | Link-aggregated Gemini output stalls for the X positive link |
Z-_SAMPLE_GEMINI_LINK_INQ_STALL (% x1e6) | % of time spent in Input Queue Stall state for the Z negative link |
Z+_SAMPLE_GEMINI_LINK_INQ_STALL (% x1e6) | % of time spent in Input Queue Stall state for the Z positive link |
Y-_SAMPLE_GEMINI_LINK_INQ_STALL (% x1e6) | % of time spent in Input Queue Stall state for the Y negative link |
Y+_SAMPLE_GEMINI_LINK_INQ_STALL (% x1e6) | % of time spent in Input Queue Stall state for the Y positive link |
X-_SAMPLE_GEMINI_LINK_INQ_STALL (% x1e6) | % of time spent in Input Queue Stall state for the X negative link |
X+_SAMPLE_GEMINI_LINK_INQ_STALL (% x1e6) | % of time spent in Input Queue Stall state for the X positive link |
Z-_SAMPLE_GEMINI_LINK_PACKETSIZE_AVE (B) | Average packet size for the Z negative link |
Z+_SAMPLE_GEMINI_LINK_PACKETSIZE_AVE (B) | Average packet size for the Z positive link |
Y-_SAMPLE_GEMINI_LINK_PACKETSIZE_AVE (B) | Average packet size for the Y negative link |
Y+_SAMPLE_GEMINI_LINK_PACKETSIZE_AVE (B) | Average packet size for the Y positive link |
X-_SAMPLE_GEMINI_LINK_PACKETSIZE_AVE (B) | Average packet size for the X negative link |
X+_SAMPLE_GEMINI_LINK_PACKETSIZE_AVE (B) | Average packet size for the X positive link |
Z-_SAMPLE_GEMINI_LINK_USED_BW (% x1e6) | % of used bandwidth for the Z negative link |
Z+_SAMPLE_GEMINI_LINK_USED_BW (% x1e6) | % of used bandwidth for the Z positive link |
Y-_SAMPLE_GEMINI_LINK_USED_BW (% x1e6) | % of used bandwidth for the Y negative link |
Y+_SAMPLE_GEMINI_LINK_USED_BW (% x1e6) | % of used bandwidth for the Y positive link |
X-_SAMPLE_GEMINI_LINK_USED_BW (% x1e6) | % of used bandwidth for the X negative link |
X+_SAMPLE_GEMINI_LINK_USED_BW (% x1e6) | % of used bandwidth for the X positive link |
Z-_SAMPLE_GEMINI_LINK_BW (B/s) | Z negative total transfer rate |
Z+_SAMPLE_GEMINI_LINK_BW (B/s) | Z positive total transfer rate |
Y-_SAMPLE_GEMINI_LINK_BW (B/s) | Y negative total transfer rate |
Y+_SAMPLE_GEMINI_LINK_BW (B/s) | Y positive total transfer rate |
X-_SAMPLE_GEMINI_LINK_BW (B/s) | X negative total transfer rate |
X+_SAMPLE_GEMINI_LINK_BW (B/s) | X positive total transfer rate |
Z-_recvlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
Z+_recvlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
Y-_recvlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 12 send and receive lanes |
Y+_recvlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 12 send and receive lanes |
X-_recvlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
X+_recvlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
Z-_sendlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
Z+_sendlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
Y-_sendlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 12 send and receive lanes |
Y+_sendlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 12 send and receive lanes |
X-_sendlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
X+_sendlinkstatus (1) | Link status, given as the number of functioning communication lanes; a fully functional Gemini for a complete torus will have 24 send and receive lanes |
Z-_credit_stall (ns) | Wait time between devices when the first device is waiting for a signal from the second device to send more data. |
Z+_credit_stall (ns) | Wait time between devices when the first device is waiting for a signal from the second device to send more data. |
Y-_credit_stall (ns) | Wait time between devices when the first device is waiting for a signal from the second device to send more data. |
Y+_credit_stall (ns) | Wait time between devices when the first device is waiting for a signal from the second device to send more data. |
X-_credit_stall (ns) | Wait time between devices when the first device is waiting for a signal from the second device to send more data. |
X+_credit_stall (ns) | Wait time between devices when the first device is waiting for a signal from the second device to send more data. |
Z-_inq_stall (ns) | Input queue stalled in nanoseconds for Z negative |
Z+_inq_stall (ns) | Input queue stalled in nanoseconds for Z positive |
Y-_inq_stall (ns) | Input queue stalled in nanoseconds for Y negative |
Y+_inq_stall (ns) | Input queue stalled in nanoseconds for Y positive |
X-_inq_stall (ns) | Input queue stalled in nanoseconds for X negative |
X+_inq_stall (ns) | Input queue stalled in nanoseconds for X positive |
Z-_packets (1) | Cumulative number of packets for Z negative |
Z+_packets (1) | Cumulative number of packets for Z positive |
Y-_packets (1) | Cumulative number of packets for Y negative |
Y+_packets (1) | Cumulative number of packets for Y positive |
X-_packets (1) | Cumulative number of packets for X negative |
X+_packets (1) | Cumulative number of packets for X positive |
Z-_traffic (B) | Cumulative traffic in bytes for Z negative |
Z+_traffic (B) | Cumulative traffic in bytes for Z positive |
Y-_traffic (B) | Cumulative traffic in bytes for Y negative |
Y+_traffic (B) | Cumulative traffic in bytes for Y positive |
X-_traffic (B) | Cumulative traffic in bytes for X negative |
X+_traffic (B) | Cumulative traffic in bytes for X positive |
nettopo_mesh_coord_Z | Z position in the torus |
nettopo_mesh_coord_Y | Y position in the torus |
nettopo_mesh_coord_X | X position in the torus |
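Because most of the fields above are cumulative counters, a common first step is to difference consecutive samples for the same node to obtain per-interval activity. The following is a minimal sketch under that assumption; it expects rows already parsed into dictionaries (as in the earlier sketch) and grouped per node in time order, and the field named in the usage comment is just one example.

```python
def counter_deltas(samples, field):
    """Yield (time, delta) pairs for a cumulative counter field.

    `samples` is an iterable of per-sample dicts for a single node, in time
    order. Counter resets (for example after a node reboot) appear as
    negative deltas and are skipped here.
    """
    prev = None
    for s in samples:
        t, v = float(s["#Time"]), int(s[field])
        if prev is not None and v >= prev:
            yield t, v - prev
        prev = v

# Example: bytes written to the snx11001 file system between samples.
# deltas = list(counter_deltas(node_samples, "write_bytes#stats.snx11001"))
```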
1 B. MSR Data File
The MSR data file contains headers followed by the associated data elements; those elements are explained later in this document. The headers for each MSR comma-separated value (CSV) data file are formatted as follows:
#Time, Time_usec, CompId, Ctr0, Ctr0_c00, Ctr0_c08, Ctr0_c16, Ctr0_c24, Ctr1, Ctr1_c00, Ctr1_c08, Ctr1_c16, Ctr1_c24, Ctr2, Ctr2_c00, Ctr2_c08, Ctr2_c16, Ctr2_c24, Ctr3, Ctr3_c00, Ctr3_c08, Ctr3_c16, Ctr3_c24, Ctr4, Ctr4_c00, Ctr4_c01, Ctr4_c02, Ctr4_c03, Ctr4_c04, Ctr4_c05, Ctr4_c06, Ctr4_c07, Ctr4_c08, Ctr4_c09, Ctr4_c10, Ctr4_c11, Ctr4_c12, Ctr4_c13, Ctr4_c14, Ctr4_c15, Ctr4_c16, Ctr4_c17, Ctr4_c18, Ctr4_c19, Ctr4_c20, Ctr4_c21, Ctr4_c22, Ctr4_c23, Ctr4_c24, Ctr4_c25, Ctr4_c26, Ctr4_c27, Ctr4_c28, Ctr4_c29, Ctr4_c30, Ctr4_c31, Ctr5, Ctr5_c00, Ctr5_c01, Ctr5_c02, Ctr5_c03, Ctr5_c04, Ctr5_c05, Ctr5_c06, Ctr5_c07, Ctr5_c08, Ctr5_c09, Ctr5_c10, Ctr5_c11, Ctr5_c12, Ctr5_c13, Ctr5_c14, Ctr5_c15, Ctr5_c16, Ctr5_c17, Ctr5_c18, Ctr5_c19, Ctr5_c20, Ctr5_c21, Ctr5_c22, Ctr5_c23, Ctr5_c24, Ctr5_c25, Ctr5_c26, Ctr5_c27, Ctr5_c28, Ctr5_c29, Ctr5_c30, Ctr5_c31, Ctr6, Ctr6_c00, Ctr6_c01, Ctr6_c02, Ctr6_c03, Ctr6_c04, Ctr6_c05, Ctr6_c06, Ctr6_c07, Ctr6_c08, Ctr6_c09, Ctr6_c10, Ctr6_c11, Ctr6_c12, Ctr6_c13, Ctr6_c14, Ctr6_c15, Ctr6_c16, Ctr6_c17, Ctr6_c18, Ctr6_c19, Ctr6_c20, Ctr6_c21, Ctr6_c22, Ctr6_c23, Ctr6_c24, Ctr6_c25, Ctr6_c26, Ctr6_c27, Ctr6_c28, Ctr6_c29, Ctr6_c30, Ctr6_c31, Ctr7, Ctr7_c00, Ctr7_c01, Ctr7_c02, Ctr7_c03, Ctr7_c04, Ctr7_c05, Ctr7_c06, Ctr7_c07, Ctr7_c08, Ctr7_c09, Ctr7_c10, Ctr7_c11, Ctr7_c12, Ctr7_c13, Ctr7_c14, Ctr7_c15, Ctr7_c16, Ctr7_c17, Ctr7_c18, Ctr7_c19, Ctr7_c20, Ctr7_c21, Ctr7_c22, Ctr7_c23, Ctr7_c24, Ctr7_c25, Ctr7_c26, Ctr7_c27, Ctr7_c28, Ctr7_c29, Ctr7_c30, Ctr7_c31, Ctr8, Ctr8_c00, Ctr8_c01, Ctr8_c02, Ctr8_c03, Ctr8_c04, Ctr8_c05, Ctr8_c06, Ctr8_c07, Ctr8_c08, Ctr8_c09, Ctr8_c10, Ctr8_c11, Ctr8_c12, Ctr8_c13, Ctr8_c14, Ctr8_c15, Ctr8_c16, Ctr8_c17, Ctr8_c18, Ctr8_c19, Ctr8_c20, Ctr8_c21, Ctr8_c22, Ctr8_c23, Ctr8_c24, Ctr8_c25, Ctr8_c26, Ctr8_c27, Ctr8_c28, Ctr8_c29, Ctr8_c30, Ctr8_c31, Ctr9, Ctr9_c00, Ctr9_c01, Ctr9_c02, Ctr9_c03, Ctr9_c04, Ctr9_c05, Ctr9_c06, Ctr9_c07, Ctr9_c08, Ctr9_c09, Ctr9_c10, Ctr9_c11, Ctr9_c12, Ctr9_c13, Ctr9_c14, Ctr9_c15, Ctr9_c16, Ctr9_c17, Ctr9_c18, Ctr9_c19, Ctr9_c20, Ctr9_c21, Ctr9_c22, Ctr9_c23, Ctr9_c24, Ctr9_c25, Ctr9_c26, Ctr9_c27, Ctr9_c28, Ctr9_c29, Ctr9_c30, Ctr9_c31
Future data files may differ, so reference the header file for the relevant date range. Refer to Table 1 for information on the meaning of the counters (Ctr0-9).
Table 1 – Header Details
Counter | MSR Counter Definitions | What is being measured | Validation Number |
Ctr0 | L3_CACHE_MISSES per NUMA domain (4 counters) | Memory Controller Counts | 85903603681 |
Ctr1 | DCT_PREFETCH per NUMA domain (4 counters) | Memory Controller Counts | 73018664176 |
Ctr2 | DCT_RD_TOT per NUMA domain for each controller (4 counters) | Memory Controller Counts | 730186636664 |
Ctr3 | DCT_WRT per NUMA domain (4 counters) | Memory Controller Counts | 73018644976 |
Ctr4 | TOT-CYC per core (32 counters) | Total processor cycles for each core | 4391030 |
Ctr5 | TOT INS per core (32 counters) | Total instructions for each core | 4391104 |
Ctr6 | L1_DCM per core (32 counters) | L1 data cache misses for each core | 4391233 |
Ctr7 | Retired flops per core, all types of flops (32 counters) | Number of retired floating-point operations per core | 4456195 |
Ctr8 | Operation counts per core (32 counters) | Vector unit instructions per core | 4392139 |
Ctr9 | Translation Lookaside Buffer (TLB) data misses per core (32 counters) | TLB data misses per core | 4392774 |
How to Read the MSR Data File
Reference this sample MSR comma-separated value (CSV) data file:
1480996440.004670, 4670, 8672, 85903603681, 1075675482, 957589463, 717738766, 744067220, 73018664176, 412116844, 369559710, 125781222, 119420227, 73018663664, 1703147611, 1424989459, 813186830, 824852929, 73018644976, 941771093, 910328449, 393929432, 383752296, 4391030, 571110602344, 562415965217, 556924102961, 554761201724, 552701273182, 551182100824, 551820818084, 550270895655, 560016645494, 553663646637, 549783500782, 549381437004, 539166673211, 539742737722, 540313267024, 539874939820, 150688025199, 148035712784, 162794766413, 158869157833, 163899518848, 161418031801, 164150890354, 162961337921, 147052930419, 144195801312, 146327145247, 144225245105, 166696559177, 164632544561, 193190727930, 214273693014, 4391104, 503840210303, 640187476527, 597059695706, 596812465922, 594595322971, 595837373660, 591893240209, 592532355039, 596434671942, 592577910304, 593366025588, 591517987927, 590791274450, 592685890076, 591352966301, 591685943383, 157896260541, 155033869123, 159496313138, 155325793581, 158132877662, 156241919544, 156052879923, 156028880128, 161016004873, 160206105921, 157705581605, 154634583769, 157143534884, 156100936718, 158578417504, 163794562360, 4391233, 3350723806, 617055349, 543869780, 471503058, 461166615, 435289712, 465346004, 443590008, 503874718, 438612212, 452681008, 463562281, 416589356, 423156695, 429588594, 421988887, 295675170, 289542436, 313223710, 308085728, 325639270, 315319292, 365086035, 314506675, 287651138, 275625001, 271477379, 271835640, 335907639, 323885751, 341101565, 521244927, 4456195, 30779324559, 30779336977, 30779328042, 30779334057, 30779323889, 30779323032, 30779323435, 30779323120, 30779323916, 30779323364, 30779324117, 30779323774, 30779322949, 30779322839, 30779322762, 30779322922, 30778881320, 30778881309, 30778881499, 30778881574, 30778881320, 30778881385, 30778881216, 30778881349, 30778881632, 30778881297, 30778884573, 30778881471, 30778881316, 30778881237, 30778881391, 30778881787, 4392139, 14761832555, 14582318338, 14589458999, 14583243493, 14582424396, 14581571672, 14582586746, 14581792888, 14603781577, 14582028453, 14583226586, 14581883168, 14582393085, 14581535539, 14581592416, 14581549938, 14514853391, 14513979629, 14514014497, 14513973145, 14513995833, 14513974554, 14514007272, 14513980413, 14524818983, 14513954446, 14514343883, 14513937916, 14514059869, 14514016689, 14514188159, 14514218075, 4392774, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
How to Understand What the Numbers Above Mean
Ten counters are used, with 4 values for counters 0-3 and 32 values for counters 4-9. Each of the 26,868 Blue Waters nodes is represented on a separate line in the CSV file. Each file contains a full day of data for every node at one-minute samples, so each file should have approximately 38.7 million lines. Each data block starts with three leading elements before the Ctr0-9 data.
Therefore, in the sample CSV data file above, the first three comma-separated elements correspond to #Time (epoch time), Time_Usec, and CompID. Each of the 10 counters is then given as a leading value followed by its associated data values. The leading values are known as VALIDATION numbers: the validation number in EACH data block for Ctr0-9 must be equal to the listed validation number in the tables for the associated data to be valid. A parsing sketch that applies this check follows Table 2.
Table 2 below shows the three leading elements for the counter data set, plus the 10 counters. Compare Table 2 to the above sample CSV file in order to see the relationship between the counter headers, validation number, and their associated data.
Table 2 – Counter Examples
MSR Counter Definitions | Validation Number | Counter Values |
#Time | | 1480996440.004670 |
Time_Usec | | 4670 |
CompID | | 8672 |
Ctr0 * 4 L3_CACHE_MISSES per NUMA domain | 85903603681 | 1075675482, 957589463, 717738766, 744067220 |
Ctr1 * 4 DCT_PREFETCH per NUMA domain | 73018664176 | 412116844, 369559710, 125781222, 119420227 |
Ctr2 * 4 DCT_RD_TOT per NUMA domain | 730186636664 | 1703147611, 1424989459, 813186830, 824852929 |
Ctr3 * 4 DCT_WRT per NUMA domain | 73018644976 | 941771093, 910328449, 393929432, 383752296 |
Ctr4 * 32 TOT-CYC per core | 4391030 | 571110602344, 562415965217, 556924102961, 554761201724, 552701273182, 551182100824, 551820818084, 550270895655, 560016645494, 553663646637, 549783500782, 549381437004, 539166673211, 539742737722, 540313267024, 539874939820, 150688025199, 148035712784, 162794766413, 158869157833, 163899518848, 161418031801, 164150890354, 162961337921, 147052930419, 144195801312, 146327145247, 144225245105, 166696559177, 164632544561, 193190727930, 214273693014 |
Ctr5 * 32 TOT INS per core | 4391104 | 32 counters… after 4391104 |
Ctr6 * 32 L1 DCM per core | 4391233 | 32 counters… |
Ctr7 * 32 retired flops per core (all types of flops) | 4456195 | 32 counters… |
Ctr8 * 32 vector instructions per core | 4392139 | 32 counters… |
Ctr9 * 32 TLB DM per core | 4392774 | 32 counters… |
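As an illustration of the validation check described above, here is a minimal Python sketch that splits one MSR CSV line into its counter groups; the counter widths follow the header layout and the validation numbers are transcribed from Table 1.

```python
# Layout from the MSR header: three leading fields, then for each counter a
# validation number followed by its per-NUMA-domain or per-core values.
COUNTER_WIDTHS = [4, 4, 4, 4, 32, 32, 32, 32, 32, 32]   # Ctr0..Ctr9

# Validation numbers as listed in Table 1.
VALIDATION = [85903603681, 73018664176, 730186636664, 73018644976,
              4391030, 4391104, 4391233, 4456195, 4392139, 4392774]

def parse_msr_line(line):
    """Return (time, time_usec, comp_id, counters) for one MSR CSV line.

    `counters` maps "Ctr0".."Ctr9" to a list of values; a counter whose
    leading validation number does not match Table 1 is set to None,
    meaning its data should not be trusted.
    """
    fields = [f.strip() for f in line.split(",")]
    time, time_usec, comp_id = float(fields[0]), int(fields[1]), int(fields[2])
    counters, pos = {}, 3
    for i, width in enumerate(COUNTER_WIDTHS):
        valid = int(fields[pos]) == VALIDATION[i]
        values = [int(v) for v in fields[pos + 1:pos + 1 + width]]
        counters[f"Ctr{i}"] = values if valid else None
        pos += 1 + width
    return time, time_usec, comp_id, counters
```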
2. Syslogs Data
Syslog is a standard for sending and receiving notification messages, in a particular format, from various network devices. The messages include time stamps, event messages, severity, host IP addresses, diagnostics, and more. Syslog was designed to monitor network devices and systems and to send out notification messages if there are any issues with their functioning; it also sends out alerts for pre-notified events and monitors suspicious activity via the change log/event log of participating network devices.
The posted logs have been anonymized to replace usernames and to remove ssh and sudo lines.
3. Resource Manager data (Torque)
Please go to Chapter 10: Accounting Records of the Adaptive Computing (Torque) website for background information on job log and accounting data.
The standard accounting logs are posted with the username and project/group name fields anonymized.
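For illustration only, a minimal parsing sketch follows. It assumes the standard Torque accounting-record layout (a semicolon-delimited timestamp, record type, and job ID, followed by space-separated key=value pairs); the file name and attribute names in the usage comment are hypothetical and may differ in the posted logs.

```python
def parse_torque_record(line):
    """Parse one Torque accounting record into its four parts.

    Assumed layout: timestamp;record_type;job_id;key=value key=value ...
    Common record types are Q (queued), S (started), E (ended), D (deleted).
    """
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = {}
    for token in message.split():
        if "=" in token:
            key, value = token.split("=", 1)
            attrs[key] = value
    return timestamp, record_type, job_id, attrs

# Hypothetical usage: collect the anonymized group name of every ended job.
# with open("accounting_log") as f:
#     ended = [parse_torque_record(l) for l in f if ";E;" in l]
#     groups = [attrs.get("group") for _, _, _, attrs in ended]
```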
4. System Environment Data Collections
Cabinet and chassis data is separated into four types of files:
L1_ENV_DATA
CSV with the following fields:
service_id, datetime, PCB_TEMP, INLET_TEMP, XDP_AIRTEMP, CAB_KILOWATTS, FANSPEED
L1_XT5_STATUS
CSV with the following fields:
service_id,datetime,L1_S_XT5_FWLEVEL,L1_H_XT5_PWRSTATUS,L1_H_XT5_CABHEALTH,L1_S_XT5_FANSPEED,L1_S_XT5_FANMODE,L1_S_XT5_VFD_REG,L1_S_XT5_DOORSTAT,L1_H_XT5_CAGE0VRMSTAT,L1_H_XT5_CAGE1VRMSTAT,L1_H_XT5_CAGE2VRMSTAT,L1_H_XT5_VALERE_SH0_SL0,L1_H_XT5_VALERE_SH0_SL1,L1_H_XT5_VALERE_SH0_SL2,L1_H_XT5_VALERE_SH1_SL0,L1_H_XT5_VALERE_SH1_SL1,L1_H_XT5_VALERE_SH1_SL2,L1_H_XT5_VALERE_SH2_SL0,L1_H_XT5_VALERE_SH2_SL1,L1_H_XT5_VALERE_SH2_SL2,L1_S_XT5_VALERE_SHAREFAULTS,L1_H_XT5_XDPALARM
L1_XT5_TEMPS
CSV with the following fields:
service_id,datetime,L1_T_XT5_PCBTEMP,L1_T_XT5_INLETTEMP,L1_T_XT5_XDPAIRTEMP,L1_T_XT5_XDPSTARTTEMP,L1_T_XT5_VALERE_FET_SH0_SL0,L1_T_XT5_VALERE_FET_SH0_SL1,L1_T_XT5_VALERE_FET_SH0_SL2,L1_T_XT5_VALERE_FET_SH1_SL0,L1_T_XT5_VALERE_FET_SH1_SL1,L1_T_XT5_VALERE_FET_SH1_SL2,L1_T_XT5_VALERE_FET_SH2_SL0,L1_T_XT5_VALERE_FET_SH2_SL1,L1_T_XT5_VALERE_FET_SH2_SL2
L1_XT5_VOLTS
CSV with the following fields:
service_id,datetime,L1_V_XT5_PCB5VA,L1_V_XT5_PCB5VB,L1_V_XT5_PCB3V,L1_V_XT5_PCB2V,L1_V_XT5_VALERE_SH0_SL0,L1_V_XT5_VALERE_SH0_SL1,L1_V_XT5_VALERE_SH0_SL2,L1_V_XT5_VALERE_SH1_SL0,L1_V_XT5_VALERE_SH1_SL1,L1_V_XT5_VALERE_SH1_SL2,L1_V_XT5_VALERE_SH2_SL0,L1_V_XT5_VALERE_SH2_SL1,L1_V_XT5_VALERE_SH2_SL2,L1_I_XT5_VALERE_SH0_SL0,L1_I_XT5_VALERE_SH0_SL1,L1_I_XT5_VALERE_SH0_SL2,L1_I_XT5_VALERE_SH1_SL0,L1_I_XT5_VALERE_SH1_SL1,L1_I_XT5_VALERE_SH1_SL2,L1_I_XT5_VALERE_SH2_SL0,L1_I_XT5_VALERE_SH2_SL1,L1_I_XT5_VALERE_SH2_SL2,L1_P_XT5_CABKILOWATTS
Definitions of all parameters are not available, but this is what we can provide at this time:
Parameter | Definition |
PCB_TEMP | Printed circuit board temperature, measured on the L1 controller PCB |
INLET_TEMP | Incoming air temperature at the bottom of each cabinet, in degrees Celsius. At 32 degrees the cabinet shuts down. |
XDP_AIRTEMP | XDP is the cooling unit that circulates coolant through the cabinets. This temperature is measured above each cabinet. |
CAB_KILOWATTS | Total power being used by the cabinet. Does not include the cooling fan. DC power consumption. |
FANSPEED | A 7.5 HP motor driving the circulating fan. Max speed is 75. |
VALERE | The manufacturer of the power supplies used in the cabinets. Each cabinet has 7 of them; there are 9 slots (3 rows of 3), so 2 are always empty. Most of the VALERE outputs are referencing temperatures. |
FET | The field effect transistors which are the components that actually do the voltage regulation. |
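A minimal sketch for reading an L1_ENV_DATA file with the fields listed above is shown below; the file name is a hypothetical placeholder, and the sketch assumes the files carry no header row (the column names are supplied explicitly).

```python
import csv

# Field names as listed above for L1_ENV_DATA files.
ENV_FIELDS = ["service_id", "datetime", "PCB_TEMP", "INLET_TEMP",
              "XDP_AIRTEMP", "CAB_KILOWATTS", "FANSPEED"]

with open("L1_ENV_DATA.csv") as f:                 # hypothetical file name
    for row in csv.DictReader(f, fieldnames=ENV_FIELDS):
        # Flag cabinets approaching the 32-degree inlet-temperature shutdown limit.
        if float(row["INLET_TEMP"]) >= 30.0:
            print(row["service_id"], row["datetime"], row["INLET_TEMP"])
```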
5. Darshan Data
Darshan is a lightweight, scalable I/O profiling tool. Darshan is able to collect profile information for POSIX, HDF5, NetCDF, and MPI-IO calls. Darshan profile data can be used to investigate and tune the I/O behavior of MPI applications. Darshan can be used only with MPI applications; the application, at minimum, must call MPI_Init and MPI_Finalize.
For information on Darshan Data, go to: https://www.mcs.anl.gov/research/projects/darshan/
The available tar files contain an anonymized version of the Darshan output for each Blue Waters job; not all Blue Waters jobs used Darshan, however. The anonymization step used the same user map as the other data above and obfuscates the username, uid, and path (keeping the base file system path intact). The resulting files should be readable by the normal Darshan analysis tools. The anonymization was performed with a modified version of the darshan-convert utility that leaves more information intact (such as the application name), along with a Python driver.
6. Lustre User Experience Metrics
Active probing of filesystem components
The behavior of a Lustre HPC file system is complex, with activity influenced by multiple users and subsystems, so abnormal behavior can be difficult to identify. To provide better insight into file system activity, the Integrated System Console (ISC) was used; ISC is an active monitoring tool for server storage data. At its core, ISC processes logs and job metadata, stores this data, and provides a mechanism for viewing the collected data through a web interface.
The component data probes actively monitor server storage and metadata by writing to every component of the server storage file system to measure performance. This active probing of the file system differs from the Cray sampler data in that the latter is passive and is essentially a counter of operations and data flows from each compute node.
How data is collected
Data was collected from three file system components: the MOM, login, and import/export nodes. Problems typically associated with these components are:
- a large number of metadata operations
- a very large I/O to a striped file
- a moderate amount of I/O to an unstriped file
Service Nodes - "Service nodes" is a general term for non-compute nodes. The service nodes that launch jobs are more specifically called "MOM" nodes. The server storage hosts within the main computer system that launch jobs (MOM) are used to represent file system interactions, using the same clients as the compute nodes and using LNET routers to access the file system data.
Import/Export (aka DTN) Nodes – Data is collected to measure access via InfiniBand.
Login Nodes - Login nodes are used for administrative tasks like copying, editing, and transferring files. For example, if a user connects via an SSH client, they are connecting to a login node. Login nodes are used when a user compiles code and submits jobs to the batch scheduler. Login nodes are measured to represent users' impact on each other's behavior. The login nodes require collection from each host, as the user interference can be unique to a host.
How to Understand the Data
The data files are comma-separated in the following format:
Collection host, time, operation, filesystem, OST ID, measurement time
Collection host is one of the following:
- H2ologin[1-4]: collections from the login nodes, which have multi-user access via InfiniBand
- Mom[1-64]: machines inside the high-speed network that use LNET routers
- Ie[01-28]: Import/Export nodes without login access, using InfiniBand
Time: in epoch
Operation:
Operation | Test ID |
create | 1 |
write | 2 |
rmdir | 3 |
end | 4 |
single file create | 5 |
single file delete | 6 |
Filesystem:
Filesystem name | Before 2016-02-22 | After 2016-02-22 |
home | snx11001 | snx11002 |
projects | snx11002 | snx11001 |
scratch | snx11003 | snx11003 |
OST ID: Lustre node number for the filesystem server
Measurement time: Time in milliseconds to perform the operation
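Putting the format description above together, here is a minimal Python sketch for reading these records; the file name is a hypothetical placeholder, and because it is not stated whether the operation column stores the operation name or its numeric test ID, the sketch accepts either.

```python
import csv

# Test IDs from the Operation table above.
OPERATIONS = {"1": "create", "2": "write", "3": "rmdir", "4": "end",
              "5": "single file create", "6": "single file delete"}

FIELDS = ["collection_host", "time", "operation", "filesystem",
          "ost_id", "measurement_time_ms"]

with open("lustre_user_experience.csv") as f:      # hypothetical file name
    for row in csv.DictReader(f, fieldnames=FIELDS):
        # Map a numeric test ID to its operation name; pass names through as-is.
        op = OPERATIONS.get(row["operation"], row["operation"])
        # measurement_time_ms: time in milliseconds to perform the operation.
        print(row["collection_host"], row["filesystem"], op,
              row["measurement_time_ms"])
```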
How to Get Access to the Data
Access to the datasets is provided via https://www.globus.org/. You may login with an existing institutional account or create a new account at that site.
The collection name is Blue Waters System Monitoring Data Set and can be found by searching for that name within Globus.