A few years ago I published a page on an S3 Python-based backup service. I had been using cloud-based data backup for years (mainly CrashPlan) to provide an extra level of protection to my on-prem RAID 1 NAS drive. That got me wondering whether I could write something in Python and S3 that would do a similar job. It has now been working for me for years (and saved me a few times).
One feature it has lacked, though, is the ability to delete backed-up files that are no longer present locally. I work at AWS, and one of our leadership principles is frugality, so it's time to delete some of the old files that we don't need anymore!
This update does the following:
- Creates a new variable that stores the S3 objects for which we found matching local files (see the sketch after this list)
- Looks at what's left in the S3 object list
- If anything is found, does a date check to see whether the S3 object has been updated in the last 6 months
- If not, it is deleted
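For reference, the delete block shown further down relies on two things that are built earlier in the run: self.object_keys, a mapping of S3 key to last-modified timestamp, and found_local_files_in_s3, the set of keys that matched a local file. The full script builds these as part of the backup pass; the following is only a minimal standalone sketch of that idea, with the bucket name, prefix and source path as made-up examples.

import pathlib
import boto3

# Assumptions for this sketch: bucket name, prefix and source path are examples only
bucket_resource = boto3.resource("s3").Bucket("my-backup-bucket")
bucket_folder = "photos"
source = "/data/photos"

# Map every S3 key under the prefix to its last-modified time (a timezone-aware datetime)
object_keys = {}
for obj in bucket_resource.objects.filter(Prefix=f"{bucket_folder}/"):
    object_keys[obj.key] = obj.last_modified

# Record which of those keys still have a matching file on the local drive
found_local_files_in_s3 = set()
for path in pathlib.Path(source).rglob("*"):
    if path.is_file():
        key = f"{bucket_folder}/{path.relative_to(source).as_posix()}"
        if key in object_keys:
            found_local_files_in_s3.add(key)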
The updated Python source code can be downloaded HERE.
The new delete block of code looks like this:
if delete_files == "y":
    file_to_delete_data = set()
    months_to_delete = 6
    # timedelta has no "months" argument, so approximate a month as 30 days
    date_6_months_ago = datetime.datetime.now(pytz.timezone("Australia/Sydney")) - datetime.timedelta(days=months_to_delete * 30)
    self.write_stdout_log(f"Starting expired file check process for {source}, s3 objects older than {date_6_months_ago}")
    for s3_obj in self.object_keys:
        # Check for files that we did not locate locally and that are older than 6 months, so they can be deleted from s3
        if s3_obj.startswith(f"{bucket_folder}/") and s3_obj not in found_local_files_in_s3 and s3_obj != f"{bucket_folder}/":
            s3_last_save = self.object_keys[s3_obj]
            if s3_last_save < date_6_months_ago:
                # Build a set first so we can show a progress counter during deletion
                file_to_delete_data.add(s3_obj)
    if file_to_delete_data:
        self.write_stdout_log(f"Deleting files that are not present locally and more than {months_to_delete} months since last updated")
        s3_to_delete_count = len(file_to_delete_data)
        current_s3_obj = 1
        for s3_obj in file_to_delete_data:
            self.write_stdout_log(f"{current_s3_obj}/{s3_to_delete_count} Deleting: {s3_obj} {self.object_keys[s3_obj]}")
            result = self.bucket_resource.object_versions.filter(Prefix=s3_obj).delete()
            if result[0]['ResponseMetadata']['HTTPStatusCode'] != 200:
                self.write_stdout_log(f"ERROR DELETING FROM S3: {result}")
            else:
                self.write_stdout_log("Success deleting from S3")
                self.s3_objects_deleted += 1
            current_s3_obj += 1
This loop takes advantage of the self.bucket_resource.object_versions.filter(Prefix=s3_obj).delete() call, which makes sure we delete every version of the file that has been saved, not just the latest one. The check only removes objects whose last S3 update is more than 6 months old. Note that this does not take into account when the file was actually deleted locally when deciding whether it should also drop from S3. That would require storing that information in a local database to compare against; then we could make a true deletion decision based on how long ago the file was removed from the local drive. But that's for another day (or someone else can do that 🙂 )
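If you did want to base deletion on when a file actually disappeared locally, one option would be a small SQLite table that records the first run on which each key went missing, and compares against that instead of the S3 last-modified time. A rough sketch of that idea, with a made-up database file name and table layout:

import datetime
import sqlite3

# Assumption: a small local state file kept alongside the backup script
conn = sqlite3.connect("backup_state.db")
conn.execute("CREATE TABLE IF NOT EXISTS missing_keys (s3_key TEXT PRIMARY KEY, first_missing TEXT)")

def record_missing(s3_key):
    # Remember the first time we noticed the key had no local match
    conn.execute(
        "INSERT OR IGNORE INTO missing_keys (s3_key, first_missing) VALUES (?, ?)",
        (s3_key, datetime.datetime.now(datetime.timezone.utc).isoformat()),
    )
    conn.commit()

def missing_since(s3_key):
    # Return when the key was first seen missing, or None if it never has been
    row = conn.execute(
        "SELECT first_missing FROM missing_keys WHERE s3_key = ?", (s3_key,)
    ).fetchone()
    return datetime.datetime.fromisoformat(row[0]) if row else None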
I've also done some small code cleanup and added a new self.write_stdout_log function. It handles output to both stdout and the log file, which reduces repetition in the code (KISS!).
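The real helper lives in the full source; in spirit it is just a print plus an append to the log file, roughly like this (the self.log_file attribute name is an assumption):

def write_stdout_log(self, message):
    # Print the message and append the same line to the log file,
    # so callers only have to emit it once
    print(message)
    with open(self.log_file, "a") as log_fh:
        log_fh.write(f"{message}\n")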
Hope you enjoy your UPDATED S3 backup application! I generally run it weekly, but run it on whatever schedule works best for you.