
    Solved: Cleanup script help

    IT Discussion
    Tags: scripting, backup
    • JaredBusch

      I currently have an FTP server at a site that is the backup target of some bespoke software.

      Currently, it simply deletes files via a daily cron job based on an mtime parameter.

      [root@ftp ~]# crontab -l
      #Delete all files older than 10 days. Check daily beginning at 06:00
      0 6 * * * find /home/tt/h* -mtime +10 -type f -delete
      1 6 * * * find /home/tt/nc* -mtime +10 -type f -delete
      2 6 * * * find /home/tt/nlr* -mtime +10 -type f -delete
      3 6 * * * find /home/tt/s* -mtime +10 -type f -delete
      4 6 * * * find /home/tt/th* -mtime +10 -type f -delete
      

      I now need to improve this into a smarter delete that ensures there are always 4 full backups on hand. Each backup is 4 files, as pictured here:

      [image: directory listing showing the four files that make up one backup set]

      Any recommendations for a starting point?

        • JasGot

          Not my code (though I do use a variation of it), and maybe not your solution, but I hope it gets you going in the direction you need.

          #  A "safe" function for removing backups older than REMOVE_AGE + 1 day(s), always keeping at least the ALWAYS_KEEP youngest
          remove_old_backups() {
              local file_prefix="${backup_file_prefix:-$1}"
              local temp=$(( REMOVE_AGE + 1 ))  # for inverting the mtime argument: it's quirky ;)
              # We consider backups made on the same day to be one (commonly these are temporary backups in manual intervention scenarios)
              local keeping_n=$(/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime -"$temp" -printf '%Td-%Tm-%TY\n' | sort -d | uniq | wc -l)
              local extra_keep=$(( ALWAYS_KEEP - keeping_n ))

              /usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime +"$REMOVE_AGE" -printf '%T@ %p\n' | sort -n | head -n -"$extra_keep" | cut -d ' ' -f2 | xargs -r rm
          }
          

          It takes the file prefix from a backup_file_prefix environment variable, or as the first argument, and expects the environment variables ALWAYS_KEEP (the minimum number of backups to keep) and REMOVE_AGE (the number of days to pass to -mtime). It expects a .gz or .tgz extension. There are a few other assumptions, as you can see in the comments, mostly in the name of safety. A hypothetical invocation is sketched below.
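
          For example, a minimal sketch of calling it, assuming the function is sourced and your backups live in /home/tt (the function works on the current directory, so cd there first; the values are hypothetical):

          # Hypothetical usage: keep at least 4 backups, remove anything older than 10 days.
          export ALWAYS_KEEP=4
          export REMOVE_AGE=10
          cd /home/tt && remove_old_backups "backup-0000001"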

          Credit this post: https://stackoverflow.com/questions/20358865/remove-all-files-older-than-x-days-but-keep-at-least-the-y-youngest/52230709#52230709

          Good luck! And be sure to post your solution.

          • JaredBusch

            This is what I have at the moment.

            Stick it in a directory and run it.

            I think I would be better served by looping over the known dates, but I would have to figure out how to parse them out of ls or find (see the sketch after the script below).

            #!/bin/bash
            
            # Create test files (CAUTION: removes any existing backup-* files in the current directory).
            rm -f backup-*
            touch backup-0000001-20191228-1182-critical-data.tar.gz
            touch backup-0000001-20191228-1182.log
            touch backup-0000001-20191228-1182-mysqldump.sql.gz
            #touch backup-0000001-20191228-1182-toptech-software.tar.gz
            touch backup-0000001-20191229-1183-critical-data.tar.gz
            touch backup-0000001-20191229-1183.log
            touch backup-0000001-20191229-1183-mysqldump.sql.gz
            touch backup-0000001-20191229-1183-toptech-software.tar.gz
            touch backup-0000001-20191230-1184-critical-data.tar.gz
            touch backup-0000001-20191230-1184.log
            touch backup-0000001-20191230-1184-mysqldump.sql.gz
            touch backup-0000001-20191230-1184-toptech-software.tar.gz
            touch backup-0000001-20191231-1185-critical-data.tar.gz
            touch backup-0000001-20191231-1185.log
            touch backup-0000001-20191231-1185-mysqldump.sql.gz
            touch backup-0000001-20191231-1185-toptech-software.tar.gz
            touch backup-0000001-20200101-1186-critical-data.tar.gz
            touch backup-0000001-20200101-1186.log
            touch backup-0000001-20200101-1186-mysqldump.sql.gz
            touch backup-0000001-20200101-1186-toptech-software.tar.gz
            touch backup-0000001-20200102-1187-critical-data.tar.gz
            touch backup-0000001-20200102-1187.log
            touch backup-0000001-20200102-1187-mysqldump.sql.gz
            touch backup-0000001-20200102-1187-toptech-software.tar.gz
            touch backup-0000001-20200103-1188-critical-data.tar.gz
            touch backup-0000001-20200103-1188.log
            #touch backup-0000001-20200103-1188-mysqldump.sql.gz
            touch backup-0000001-20200103-1188-toptech-software.tar.gz
            touch backup-0000001-20200104-1189-critical-data.tar.gz
            touch backup-0000001-20200104-1189.log
            touch backup-0000001-20200104-1189-mysqldump.sql.gz
            touch backup-0000001-20200104-1189-toptech-software.tar.gz
            touch backup-0000001-20200105-1190-critical-data.tar.gz
            touch backup-0000001-20200105-1190.log
            touch backup-0000001-20200105-1190-mysqldump.sql.gz
            touch backup-0000001-20200105-1190-toptech-software.tar.gz
            #touch backup-0000001-20200106-1191-critical-data.tar.gz
            touch backup-0000001-20200106-1191.log
            touch backup-0000001-20200106-1191-mysqldump.sql.gz
            touch backup-0000001-20200106-1191-toptech-software.tar.gz
            touch backup-0000001-20200107-1192-critical-data.tar.gz
            touch backup-0000001-20200107-1192.log
            touch backup-0000001-20200107-1192-mysqldump.sql.gz
            touch backup-0000001-20200107-1192-toptech-software.tar.gz
            
            
            
            keep=4
            found=0
            # Walk backward over the last 14 days and check each date.
            for i in {0..13}
            do
                checkdate=$(date --date="-$i days" +"%Y%m%d")
                # Count the files for this date; find's error for dates with no files is discarded.
                count=$(find backup-0000001-"$checkdate"-* -type f -printf '.' 2>/dev/null | wc -c)
                if [ "$count" -eq 4 ] && [ "$found" -lt "$keep" ]
                then
                    found=$((found+1))
                    echo "Checking $checkdate, we found $count files. We are keeping this date; currently we have $found dates saved."
                elif [ "$count" -gt 0 ] && [ "$count" -ne 4 ]
                then
                    echo "Incorrect number of files ($count) found, removing invalid backup."
                elif [ "$count" -gt 0 ] && [ "$found" -eq "$keep" ]
                then
                    echo "We have already found $keep full sets of backup files. Removing backup files dated $checkdate."
                else
                    echo "The date $checkdate returned $count files."
                fi
            done
            
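            As for parsing the dates out of ls or find, one possible sketch, assuming the fixed backup-0000001-YYYYMMDD-... naming (so the date always starts at character 15):

            # Sketch: pull the YYYYMMDD field out of each filename with bash
            # parameter expansion, then de-duplicate, newest first.
            for f in backup-*
            do
                echo "${f:15:8}"
            done | sort -ru
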
            • DustinB3403 @JaredBusch

              @JaredBusch Could you use the stat command?

              stat -c "%y" /path/*
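
              For instance, to see the candidates sorted newest-first by modification time (a sketch; in GNU stat, %Y is the epoch mtime and %n the file name):

              stat -c '%Y %n' /path/* | sort -rn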

              • JaredBusch @DustinB3403

                @DustinB3403 said in Cleanup script help:

                @JaredBusch Could you use the stat command?

                stat -c "%y" /path/*

                I don't know that I can 100% trust the file date to match the date in the filename.

                • JaredBusch

                  OK, this gets me just the date bit. Now to get it into an array of unique dates only.

                  for f in backup-*
                  do
                      echo ${f:15:8}
                  done
                  
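                  One way to collect those into a unique, newest-first array (a sketch using mapfile, which needs bash 4+):

                  # Sketch: unique dates, newest first, into an array.
                  mapfile -t arrDates < <(for f in backup-*; do echo "${f:15:8}"; done | sort -ru)
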
                  • JaredBusch

                    OK, this is what I came up with.

                    #!/bin/bash
                    # Send everything to the logs and the screen.
                    exec 1> >(logger -s -t "$(basename "$0")") 2>&1
                    
                    # Variables and descriptions of their use.
                    # Array of dates found in the filenames of the backup files.
                    arrDates=()
                    # Number of full backup sets to keep.
                    keep=4
                    # How many full backup sets have been found.
                    found=0
                    # Base path to the backup files, minus the last folder.
                    base="/home/jbusch/"
                    # Full path to the backup files, populated by the script.
                    path=""
                    
                    # This script requires that the final folder name be passed as a parameter.
                    # This is because it is designed to be run independently for each subfolder.
                    # ex: ./file_cleanup.sh Hartford
                    # ex: ./file_cleanup.sh Seymour
                    
                    # Check for the path to be passed.
                    if [ -n "$1" ]
                    then
                        # Create the full path to be checked based on the passed parameter.
                        path=$base$1
                    else
                        exit 127
                    fi
                    
                    printf 'Executing cleanup of backup files located in %s.\n' "$path"
                    
                    # Loop through all of the files in the path and parse an array of dates out of the filenames.
                    # All backups are named `backup-0000001-YYYYMMDD-XXXX*`.
                    cd "$path" || exit 1
                    for f in backup-*
                    do
                        # The date starts at character 15 and runs for 8 characters.
                        arrDates+=("${f:15:8}")
                    done
                    cd ~ || exit
                    
                    # Sort in reverse order and keep only the unique dates.
                    arrDates=($(printf '%s\n' "${arrDates[@]}" | sort -ru))
                    
                    # Loop through the array of dates and check that there are 4 files for each date.
                    for checkdate in "${arrDates[@]}"
                    do
                        count=$(find "$path"/backup-0000001-"$checkdate"-* -type f -printf '.' | wc -c)
                        if [ "$count" -eq 4 ] && [ "$found" -lt "$keep" ]
                        then
                            found=$((found+1))
                            printf 'Checking %s, we found %s files. We are keeping this date; currently we have %s dates saved.\n' "$checkdate" "$count" "$found"
                        elif [ "$count" -gt 0 ] && [ "$count" -ne 4 ]
                        then
                            printf 'Incorrect number of files (%s) found, removing invalid backup dated %s.\n' "$count" "$checkdate"
                            rm "$path"/backup-*-"$checkdate"-*
                        elif [ "$count" -gt 0 ] && [ "$found" -eq "$keep" ]
                        then
                            printf 'We have already found %s full sets of backup files. Removing backup files dated %s.\n' "$keep" "$checkdate"
                            rm "$path"/backup-*-"$checkdate"-*
                        else
                            printf 'The date %s returned %s files. This is an unhandled scenario, doing nothing.\n' "$checkdate" "$count"
                        fi
                    done
                    

                    The output looks like this:

                    [jbusch@dt-jared FTPTest]$ ./file_cleanup.sh FTPTest
                    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200107, we found 4 files. We are keeping this date, currently we have 1 dates saved.
                    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200105, we found 4 files. We are keeping this date, currently we have 2 dates saved.
                    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200104, we found 4 files. We are keeping this date, currently we have 3 dates saved.
                    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200103, we found 4 files. We are keeping this date, currently we have 4 dates saved.
                    <13>Jan  7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20200102.
                    <13>Jan  7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20191230.
                    <13>Jan  7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20191228.
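
                    From here the script can be scheduled much like the original cron entries, one run per site subfolder (hypothetical install path; Hartford and Seymour are the example folder names from the script comments):

                    # Hypothetical crontab entries, one per backup subfolder:
                    0 6 * * * /home/jbusch/file_cleanup.sh Hartford
                    5 6 * * * /home/jbusch/file_cleanup.sh Seymour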
                    