Python Script to Take a Backup of a Folder on Amazon S3 – Windows

Date Posted: 17-04-2017

Amazon S3 is cloud storage where we can keep files and download them from anywhere once they are uploaded. There are many ways to store files on S3; in this post, we will use a Python script to take a backup of a folder.

Assumptions:
  1. Windows machine
  2. Python
  3. Boto library
  4. IAM access key and secret key with privileges on the bucket "bucket-name"

In case Python or boto is not installed on the Windows machine, follow the post on installing Python on a Windows machine.
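If only the boto library is missing, it can usually be installed with pip (assuming Python is already installed at c:\python3, the path used later in this post):

c:\python3\python.exe -m pip install boto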

Implementation.

The below script will take a backup of the folder "c:\backup" to the S3 bucket "bucket-name".

What needs to be modified in the below script:

Depending on your requirements, modify the following details:

  1. AWS_ACCESS_KEY_ID – your access key. If you don’t have an access key, generate one from Amazon IAM.
  2. AWS_ACCESS_KEY_SECRET – your secret key.
  3. bucket_name – S3 bucket name.
  4. sourceDir – directory which needs to be backed up to S3.

Create a file called s3backup.py

notepad s3backup.py

import boto
import boto.s3
from boto.s3.key import Key

import os.path
import sys
import time
import datetime

# Fill these in - you get them when you sign up for S3
AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxx'
AWS_ACCESS_KEY_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxx'
# Fill in info on data to upload
# destination bucket name
bucket_name = 'bucket-name'
# source directory
sourceDir = r'c:\backup'  # raw string so the backslash is not treated as an escape
# destination "directory" on s3: a prefix based on today's UTC date
utc_timestamp = time.time()
UTC_FORMAT = '%Y%m%d'
utc_time = datetime.datetime.utcfromtimestamp(utc_timestamp)
utc_time = utc_time.strftime(UTC_FORMAT)
print(utc_time)
destDir = ''  # unused; utc_time is used as the key prefix below

# max size in bytes before switching to a multipart upload (20 MB here)
MAX_SIZE = 20 * 1000 * 1000
# size of each part when uploading in parts (S3 requires parts of at least 5 MB)
PART_SIZE = 6 * 1000 * 1000

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
bucket = conn.get_bucket(bucket_name)
uploadFileNames = []
for root, dirs, files in os.walk(sourceDir, topdown=False):
    for name in files:
        fname = os.path.join(root, name)
        print(fname)
        uploadFileNames.append(fname)
print(uploadFileNames)
# progress callback: prints a dot as each chunk is transferred
def percent_cb(complete, total):
    sys.stdout.write('.')
    sys.stdout.flush()
for filenames in uploadFileNames:
    # use forward slashes so the S3 key looks like a normal path
    filename = filenames.replace("\\", "/")
    print('filename=' + filename)
    sourcepath = filename
    # key is prefixed with today's date, e.g. 20170417/c:/backup/file.txt
    destpath = utc_time + '/' + filename
    # print('Uploading %s to Amazon S3 bucket %s' % (sourcepath, bucket_name))

    filesize = os.path.getsize(sourcepath)
    if filesize > MAX_SIZE:
        print("multipart upload")
        mp = bucket.initiate_multipart_upload(destpath)
        fp = open(sourcepath, 'rb')
        fp_num = 0
        while fp.tell() < filesize:
            fp_num += 1
            print("uploading part %i" % fp_num)
            mp.upload_part_from_file(fp, fp_num, cb=percent_cb, num_cb=10, size=PART_SIZE)
        fp.close()
        mp.complete_upload()

    else:
        print("singlepart upload")
        k = Key(bucket)
        k.key = destpath
        k.set_contents_from_filename(sourcepath, cb=percent_cb, num_cb=10)

 

Execute the script from PowerShell to take the backup. The backup will be created in the bucket under a prefix named with today's date.

c:\python3\python.exe s3backup.py
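To confirm the upload, a short sketch like the one below can list the keys created under today's date prefix. It reuses the same access key, secret key, and bucket name as the backup script; adjust them to your values.

import boto
import datetime

AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxx'
AWS_ACCESS_KEY_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxx'
bucket_name = 'bucket-name'

# today's UTC date, matching the prefix used by the backup script
prefix = datetime.datetime.utcnow().strftime('%Y%m%d') + '/'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
bucket = conn.get_bucket(bucket_name)
for key in bucket.list(prefix=prefix):
    print(key.name, key.size)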

We can add the script to Windows Task Scheduler to take the backup daily; an example is shown below.
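For example, a daily task can be created from an elevated command prompt or PowerShell with schtasks. The task name, start time, and script path below are placeholders; adjust them to your setup.

schtasks /create /tn "S3FolderBackup" /tr "c:\python3\python.exe c:\scripts\s3backup.py" /sc daily /st 01:00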
