Using AWS (Amazon Web Services) S3 CLI with Boto3 Python Package
This page steps one through the process of setting up an AWS account and using the Boto3 python package to access S3 buckets via the CLI (command line interface).
For those at the University of Illinois, here are a couple useful links:
Log in to Illinois AWS account: https://aws.illinois.edu
Illinois AWS resources (Tech Services page): https://answers.uillinois.edu/illinois/search.php?q=AWS
Important note: you will need access to the Hydro cluster to use Boto3. There are issues with installing it on Blue Waters.
Obtain AWS Account
First, of course, you need access to an AWS account. If you reside at the University of Illinois, instructions for requesting an Illinois AWS account can be found here: https://answers.uillinois.edu/illinois/63359 Otherwise, consult the IT/network team (and/or your supervisor) at your institution for details on obtaining an account.
Create Access Keys
This is done by an account admin through the AWS IAM (Identity and Access Management) console.
Instructions can be found here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey
If you were given an admin account on AWS, follow the steps below to create an IAM user with access keys. Only IAM users can have access keys, so even if you have an admin account, you still need to create an IAM user for yourself. If you don't have an admin account, find out who does at your institution and ask them to create an IAM user with access keys for you.
After logging in (use https://aws.illinois.edu if you're at U of I), go to the IAM Dashboard:
Under "IAM resources," click Users:
Select "Add users":
Choose a user name and set access type to "Programmatic access"; click "Next: Permissions":
On the "Set permissions" screen, select "Attach existing policies directly" and choose "AmazonS3FullAccess"; click "Next: Tags":
On the next screen, there's no need to do anything with tags, so just click "Next: Review."
There's also nothing you need to do on the Review screen, so just click "Create user."
At the end of the user creation process, you are shown an "Access key ID" and "Secret access key"; save both now, since the secret access key cannot be viewed again after you leave this screen:
If you have an admin account, you can also create user instances for other group members who need access.
Store Access Keys on System
To allow CLI access to AWS, the keys need to be stored in a credentials file:
mkdir -p ~/.aws
vim ~/.aws/credentials
The file should have these three lines:
[default]
aws_access_key_id = <YOUR_ACCESS_KEY>
aws_secret_access_key = <YOUR_SECRET_KEY>
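The credentials file uses INI syntax, so Python's standard configparser module can be used to sanity-check its format. A minimal sketch (this only checks the local file format with placeholder values; it does not validate the keys with AWS):

```python
import configparser

# Example contents of ~/.aws/credentials (placeholder values, not real keys)
sample = """\
[default]
aws_access_key_id = <YOUR_ACCESS_KEY>
aws_secret_access_key = <YOUR_SECRET_KEY>
"""

config = configparser.ConfigParser()
config.read_string(sample)

# The [default] profile should define both keys
profile = config['default']
print(profile['aws_access_key_id'])
print(profile['aws_secret_access_key'])
```

To check your actual file, replace read_string(sample) with read on the expanded path ~/.aws/credentials.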
Install Boto3
Instructions can be found here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
The procedure for installing boto3 in a virtual environment on the Hydro cluster (https://bluewaters.ncsa.illinois.edu/hydro) is simple:
# with module Python/3.8.6-GCCcore-10.2.0 loaded
# cd to location where you want to create the virtual environment
mkdir myvirtualenv
cd myvirtualenv
virtualenv --system-site-packages $PWD
source bin/activate
pip install boto3
This should work without issue.
AWS S3 Bucket Interaction Examples
If boto3 is installed in a virtual environment, you need to be in the virtual environment (i.e., it needs to be activated) to use it:
source myvirtualenv/bin/activate
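If you're unsure whether the virtual environment is active, one quick check (a general Python tip, not specific to Hydro) is to print the interpreter path, which should point inside the virtual environment directory when it is activated:

```python
import sys

# sys.executable is the full path of the running interpreter;
# inside an activated virtualenv it points under the venv directory
print(sys.executable)

# sys.prefix also changes to the venv directory when one is active
print(sys.prefix)
```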
Here are some example python scripts for interacting with AWS:
bucket_list.py - list all buckets
Usage: ./bucket_list.py
#!/usr/bin/env python
# from https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html

import boto3

# Let's use Amazon S3
s3 = boto3.resource('s3')

# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)
upload_file.py - upload a file to a bucket
Usage: ./upload_file file_name [bucket_name]
Be sure to set default_bucket in the script to the name of the bucket that you want to be your default.
#!/usr/bin/env python
# Usage: ./upload_file file_name [bucket_name]
# from https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html

import sys
import logging
import boto3
from botocore.exceptions import ClientError

default_bucket = 'uiuc-ncsa-bluewaters-rmokos-test'

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

try:
    bname = sys.argv[2]
except IndexError:
    bname = default_bucket

upload_file(sys.argv[1], bname)
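Note that when object_name is omitted, the script uses file_name, directory components included, as the S3 object key, so uploading "data/results.txt" creates a key named "data/results.txt". If you'd rather store only the file's base name, one small tweak (a sketch, not part of the original script) is to derive the key with os.path.basename:

```python
import os

def default_object_name(file_name):
    """Strip directory components so the S3 key is just the base file name."""
    return os.path.basename(file_name)

print(default_object_name('data/results.txt'))  # results.txt
print(default_object_name('results.txt'))       # results.txt
```

In upload_file, you would then use default_object_name(file_name) instead of file_name when object_name is None.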
download_file.py - download a file from a bucket
Usage: ./download_file file_name [bucket_name]
Be sure to set default_bucket in the script to the name of the bucket that you want to be your default.
#!/usr/bin/env python
# Usage: ./download_file file_name [bucket_name]
# from https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html

import sys
import logging
import boto3
from botocore.exceptions import ClientError

default_bucket = 'uiuc-ncsa-bluewaters-rmokos-test'

def download_file(file_name, bucket, dest_file_name=None):
    """Download a file from an S3 bucket

    :param file_name: File to download (S3 object name)
    :param bucket: Bucket to download from
    :param dest_file_name: Name to use for the downloaded file.
        If not specified, then file_name is used
    :return: True if file was downloaded, else False
    """
    # If dest_file_name was not specified, use file_name
    if dest_file_name is None:
        dest_file_name = file_name

    # Download the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.download_file(bucket, file_name, dest_file_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

try:
    bname = sys.argv[2]
except IndexError:
    bname = default_bucket

download_file(sys.argv[1], bname)
Hydro <=> AWS Transfer Rates
The measured time for uploading a tiny file (a few bytes) using "time -p" on the python script was 0.78 sec, and the same for downloading was 0.86 sec. Considering these to be "overhead" times, they were subtracted from the measured times for 1-MB and 10-GB transfers to get the times and transfer rates below. Note that the 1-MB file was 2^20 bytes, and the 10-GB file was 10*2^30 bytes. More tests were not performed due to cost concerns (Amazon charges based on the amount of data transferred).
Upload from Hydro to AWS
File Size | Time (sec) | Transfer Rate
----------|------------|--------------
1 MB      | 0.17       | 5.9 MB/sec
10 GB     | 51.89      | 197.3 MB/sec
Download from AWS to Hydro
File Size | Time (sec) | Transfer Rate
----------|------------|--------------
1 MB      | 0.12       | 8.3 MB/sec
10 GB     | 34.43      | 297.4 MB/sec
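The rates in the tables above follow directly from the overhead-subtracted times. A short sketch of the calculation (1 MB here means 2^20 bytes, per the note above, so the 10-GB file is 10*1024 MB):

```python
def rate_mb_per_sec(size_mb, net_time_sec):
    """Transfer rate using the overhead-subtracted time from the tables."""
    return round(size_mb / net_time_sec, 1)

# Upload (times already have the 0.78-sec overhead removed)
print(rate_mb_per_sec(1, 0.17))           # 5.9
print(rate_mb_per_sec(10 * 1024, 51.89))  # 197.3

# Download (0.86-sec overhead removed)
print(rate_mb_per_sec(1, 0.12))           # 8.3
print(rate_mb_per_sec(10 * 1024, 34.43))  # 297.4
```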