Python Script to take backup of folder on Amazon S3 – Windows
Date Posted: 17-04-2017
Amazon S3 is a cloud storage service where we can store files and download them again from anywhere. There are many ways to get files onto S3; in this post, we will use a Python script to take a backup of a folder.
Assumptions:
- Windows Machine
- Python
- Boto library
- IAM access key and secret key which have privileges on the bucket “bucket-name”
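Before running the backup, the access key and bucket permissions can be sanity-checked with a few lines of boto. This is a minimal sketch (not part of the original script), using the same placeholder key values and bucket name as the script below:

import boto

# placeholder credentials and bucket - replace with your own
AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxx'
AWS_ACCESS_KEY_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxx'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
# get_bucket raises an S3ResponseError if the key has no access to the bucket
bucket = conn.get_bucket('bucket-name')
print('Connected to bucket:', bucket.name)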
In case Python or boto is not installed on the Windows machine, follow the post on installing Python on a Windows machine.
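If Python is already present but boto is missing, the library can usually be installed with pip from PowerShell (assuming pip is bundled with your Python installation; the interpreter path here matches the one used later in this post):

c:\python3\python.exe -m pip install boto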
Implementation.
The below script will take a backup of the folder “c:\backup” to the S3 bucket “bucket-name”.
What needs to be modified in the below script:
Depending on your requirements, modify the following details:
- AWS_ACCESS_KEY_ID – Your access key. In case you don’t have an access key, generate one from Amazon IAM
- AWS_ACCESS_KEY_SECRET – Your secret key.
- bucket_name – S3 Bucket name
- sourceDir – Directory which needs to be backed up to S3
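For example, with purely illustrative values (these are placeholders, not real credentials), the top of the script would end up looking like this:

AWS_ACCESS_KEY_ID = 'AKIAEXAMPLEKEY'
AWS_ACCESS_KEY_SECRET = 'examplesecretkey1234567890'
bucket_name = 'bucket-name'
# raw string so the backslash is not treated as an escape character
sourceDir = r'c:\backup'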
Create a file called s3backup.py
nano s3backup.py
import boto
import boto.s3
import boto.s3.key
import os.path
import sys
import time
import datetime

# Fill these in - you get them when you sign up for S3
AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxx'
AWS_ACCESS_KEY_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxx'

# Fill in info on data to upload
# destination bucket name
bucket_name = 'bucket-name'
# source directory (raw string so the backslash is not treated as an escape)
sourceDir = r'c:\backup'

# destination directory name (on s3): today's UTC date, e.g. 20170417
utc_timestamp = time.time()
UTC_FORMAT = '%Y%m%d'
utc_time = datetime.datetime.utcfromtimestamp(utc_timestamp)
utc_time = utc_time.strftime(UTC_FORMAT)
print(utc_time)
destDir = ''

# max size in bytes before uploading in parts. between 1 and 5 GB recommended
MAX_SIZE = 20 * 1000 * 1000
# size of parts when uploading in parts
PART_SIZE = 6 * 1000 * 1000

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
bucket = conn.get_bucket(bucket_name)

# walk the source directory and collect every file to upload
uploadFileNames = []
for root, dirs, files in os.walk(sourceDir, topdown=False):
    for name in files:
        fname = os.path.join(root, name)
        print(fname)
        uploadFileNames.append(fname)
print(uploadFileNames)

def percent_cb(complete, total):
    sys.stdout.write('.')
    sys.stdout.flush()

for filenames in uploadFileNames:
    filename = filenames.replace("\\", "/")
    print('filename=' + filename)
    sourcepath = filename
    # S3 key: today's date followed by the full source path of the file
    destpath = utc_time + '/' + filename
    # print('Uploading %s to Amazon S3 bucket %s' % (sourcepath, bucket_name))

    filesize = os.path.getsize(sourcepath)
    if filesize > MAX_SIZE:
        # large files are uploaded in parts
        print("multipart upload")
        mp = bucket.initiate_multipart_upload(destpath)
        fp = open(sourcepath, 'rb')
        fp_num = 0
        while (fp.tell() < filesize):
            fp_num += 1
            print("uploading part %i" % fp_num)
            mp.upload_part_from_file(fp, fp_num, cb=percent_cb, num_cb=10, size=PART_SIZE)
        mp.complete_upload()
    else:
        # small files are uploaded in a single request
        print("singlepart upload")
        k = boto.s3.key.Key(bucket)
        k.key = destpath
        k.set_contents_from_filename(sourcepath, cb=percent_cb, num_cb=10)
Execute the script from PowerShell to take the backup. The backup will be created in the bucket under a prefix with today’s date.
c:\python3\python.exe s3backup.py
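To confirm the backup landed in the bucket, one option (not part of the original post) is to list today’s date prefix with the same boto connection. A minimal sketch, assuming the same placeholder credentials and bucket name as in the script above:

import datetime
import boto

AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxx'
AWS_ACCESS_KEY_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxx'
bucket_name = 'bucket-name'

# same date-based prefix the backup script uses, e.g. '20170417/'
prefix = datetime.datetime.utcnow().strftime('%Y%m%d') + '/'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
bucket = conn.get_bucket(bucket_name)
for key in bucket.list(prefix=prefix):
    print(key.name, key.size)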
We can add the script to Windows Task Scheduler (scheduled tasks) to take the backup daily; a sample task registration is sketched below.
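For example, a daily task could be registered from PowerShell roughly as follows (the task name, start time, and script path are placeholders, not values from the original post):

schtasks /create /tn "S3FolderBackup" /tr "c:\python3\python.exe c:\scripts\s3backup.py" /sc daily /st 02:00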