
    Checking multiple Directories to confirm all files are identical

    IT Discussion
    windows comparison file management powershell
    • DustinB3403D
      DustinB3403
      last edited by

      In a Windows environment, if you wanted to check multiple network directories containing millions of files ranging in size from tiny (a few KB) to large (4 GB+), how would you do it?

      Ideally I'd like to compare them all at once, but setting the "golden standard" here may be difficult.

      I know I could use a tool like Create-Synchronicity to force 1 other directory to match the source, but I would prefer to find and list the differences in the directories.

      Maybe PowerShell can help? I may even have something written to do this, but I'm drawing a blank.

      • DustinB3403D
        DustinB3403
        last edited by DustinB3403

        Of course PowerShell can do this, but it essentially acts like a memory leak when trying to process a job this size...

        $fso = Get-ChildItem -Recurse -Path C:\something
        $fsoREMOTE = Get-ChildItem -Recurse -Path \\remote\something
        Compare-Object -ReferenceObject $fso -DifferenceObject $fsoREMOTE
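
        One way to keep memory flatter (a sketch only, not tested at this scale; all paths are placeholders) is to reduce each file to a single short string and stream the listings to disk, then compare the text files instead of holding two arrays of FileInfo objects:

        # Stream relative path + size to a text file instead of keeping objects in RAM.
        Get-ChildItem -Recurse -File -Path C:\something |
            ForEach-Object { "{0}`t{1}" -f $_.FullName.Substring("C:\something".Length), $_.Length } |
            Set-Content C:\temp\local.txt

        Get-ChildItem -Recurse -File -Path \\remote\something |
            ForEach-Object { "{0}`t{1}" -f $_.FullName.Substring("\\remote\something".Length), $_.Length } |
            Set-Content C:\temp\remote.txt

        # Compare plain strings rather than FileInfo objects.
        Compare-Object (Get-Content C:\temp\local.txt) (Get-Content C:\temp\remote.txt)

        The final Compare-Object still loads both listings, but as one-line strings they are far smaller than the original objects.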
        
        • F
          flaxking
          last edited by

          I would think you would be able to use robocopy to do a diff

          • DustinB3403D
            DustinB3403 @flaxking
            last edited by

            @flaxking said in Checking multiple Directories to confirm all files are identical:

            I would think you would be able to use robocopy to do a diff

            Probably, but the issue still comes down to system resources.

            Anything that is storing in memory will quickly consume the available resources.

            Maybe if I pipe the output to a file it won't be so bad..

            • EddieJenningsE
              EddieJennings
              last edited by

              To make sure I'm understanding what you want to do:

              Let's say you have dir1 with files a, b, and c and dir2 with files d, e, and f. You're wanting to do the following check for duplicates:
              Is a a duplicate of b, c, d, e, and f?
              Is b a duplicate of c, d, e, and f?
              and so on, correct?

              • gjacobseG
                gjacobse @flaxking
                last edited by

                @flaxking said in Checking multiple Directories to confirm all files are identical:

                I would think you would be able to use robocopy to do a diff

                RoboCopy has the /MIR switch, but that could be the incorrect flag for this.
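
                For reporting rather than mirroring, robocopy's list-only mode may be closer to what's wanted (a sketch; worth verifying the flags against robocopy /? first, and the paths are placeholders):

                # /L lists what *would* be copied or deleted without touching anything,
                # so /MIR here only drives the comparison. /NJH /NJS suppress the job
                # header/summary, /NDL drops directory lines, /FP logs full paths.
                robocopy D:\dir1 \\srv2\dir2 /MIR /L /NJH /NJS /NDL /FP /LOG:C:\temp\dir1-vs-dir2.txt

                Because the differences go straight to a log file, nothing accumulates in memory regardless of file count.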

                • DustinB3403D
                  DustinB3403 @EddieJennings
                  last edited by

                  @eddiejennings said in Checking multiple Directories to confirm all files are identical:

                  To make sure I'm understanding what you want to do:

                  Let's say you have dir1 with files a,b, and c and dir2 with files d,e, and f. You're wanting to do the following check for duplicates.
                  Is a a duplicate of b,c,d,e, and f?
                  Is b a duplicate of c, d, e, and f?
                  and so on, correct?

                  Yes and no, I want to make sure that dir2 is an exact copy of dir1 (and lastly compare dir3 to dir1 and dir2).

                  • DustinB3403D
                    DustinB3403
                    last edited by

                    Also, all of these directories (dir2 and dir3) are on remote servers, so I'd have to do this over UNC paths.

                    D:\dir1
                    \\srv2\dir2
                    \\srv3\dir3

                    • DustinB3403D
                      DustinB3403
                      last edited by

                      While I'm almost positive the PowerShell above would work, I suspect it would only work on much smaller directories.

                      Each directory that I'm trying to compare is over 10 TB in capacity.

                      • EddieJenningsE
                        EddieJennings @DustinB3403
                        last edited by

                        @dustinb3403 said in Checking multiple Directories to confirm all files are identical:

                        @eddiejennings said in Checking multiple Directories to confirm all files are identical:

                        To make sure I'm understanding what you want to do:

                        Let's say you have dir1 with files a,b, and c and dir2 with files d,e, and f. You're wanting to do the following check for duplicates.
                        Is a a duplicate of b,c,d,e, and f?
                        Is b a duplicate of c, d, e, and f?
                        and so on, correct?

                        Yes and no, I want to make sure that dir2 is an exact copy of dir1 (and lastly compare dir3 to dir1 and dir2).

                        The doing part can be easily done with robocopy /MIR. Of course, it'll take a while given the number of files. The reporting part is the challenge. You might want to look into using Get-FileHash. That's how I typically compare files, but I've never done a comparison at that scale before.
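
                        A hedged sketch of the Get-FileHash idea: emit one CSV row per file as it is hashed, so the pipeline streams to disk rather than buffering the whole tree (the paths are placeholders, and MD5 is chosen over SHA256 purely for speed on 10 TB of data):

                        # Each file becomes one small object that Export-Csv writes out
                        # as it arrives; nothing large is held in memory.
                        Get-ChildItem -Recurse -File -Path D:\dir1 |
                            ForEach-Object {
                                [pscustomobject]@{
                                    RelativePath = $_.FullName.Substring("D:\dir1".Length)
                                    Hash         = (Get-FileHash -Algorithm MD5 -Path $_.FullName).Hash
                                }
                            } |
                            Export-Csv C:\temp\dir1-hashes.csv -NoTypeInformation

                        Repeating this per directory yields one CSV per tree, keyed on relative path, which can then be diffed or joined offline.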

                        • DustinB3403D
                          DustinB3403 @EddieJennings
                          last edited by

                          @eddiejennings Yeah, I was thinking of the same solution as well; my trouble is how I would get the system to not try to store everything in memory first and then write to file...

                          Some of these customer requests are insane...

                          • F
                            flaxking @DustinB3403
                            last edited by

                            @dustinb3403 said in Checking multiple Directories to confirm all files are identical:

                            @flaxking said in Checking multiple Directories to confirm all files are identical:

                            I would think you would be able to use robocopy to do a diff

                            Probably, but the issue still comes down to system resources.

                            Anything that is storing in memory will quickly consume the available resources.

                             Maybe if I pipe the output to a file it won't be so bad..

                             It's bound to be a lot more efficient than your PowerShell.

                            • 1
                              1337 @DustinB3403
                              last edited by 1337

                              @dustinb3403 said in Checking multiple Directories to confirm all files are identical:

                              @eddiejennings Yeah I was thinking of the same solution as well, my trouble is how would I get the system to not try and store everything to memory first and then write to file. . . .

                              Some of these customer requests are insane...

                               If you do the equivalent of md5sum across subdirectories you will get MD5 sums of all files. A diff will then produce the files that differ.
                               File size or directory size will not matter at all for this operation.

                              Get-FileHash seems to output multiple lines per file which is not good for this.

                              If you don't need hash to compare and just wanted to check filenames, file sizes and dates, maybe you should just do a directory listing for each tree and compare them. That would be very fast.

                              You could get dir to provide a one-file-per-line output, with the proper options.
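
                               The directory-listing idea above might look like this sketch (cmd's dir with bare output; file names are placeholders). Note that in PowerShell fc is an alias for Format-Custom, so the external fc.exe has to be named explicitly:

                               # /s = recurse, /b = bare one-path-per-line output, /a-d = files only
                               cmd /c "dir /s /b /a-d D:\dir1 > C:\temp\dir1.txt"
                               cmd /c "dir /s /b /a-d \\srv2\dir2 > C:\temp\dir2.txt"
                               # Compare the two listings line by line on disk
                               fc.exe C:\temp\dir1.txt C:\temp\dir2.txt

                               One caveat: the bare output includes each tree's own root prefix, so the prefixes would need stripping (a quick find/replace on the text files) before the comparison is meaningful.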

                              • DustinB3403D
                                DustinB3403 @1337
                                last edited by DustinB3403

                                @pete-s said in Checking multiple Directories to confirm all files are identical:

                                If you don't need hash to compare and just wanted to check filenames, file sizes and dates, maybe you should just do a directory listing for each tree and compare them. That would be very fast.

                                 While I don't need the hashes of the files, I was hoping to get some automated way of saying these files aren't in dir#.

                                 But I can't for the life of me think of a good way to do that without eating up all of the RAM in the world...

                                • DustinB3403D
                                  DustinB3403 @flaxking
                                  last edited by

                                  @flaxking said in Checking multiple Directories to confirm all files are identical:

                                  @dustinb3403 said in Checking multiple Directories to confirm all files are identical:

                                  @flaxking said in Checking multiple Directories to confirm all files are identical:

                                  I would think you would be able to use robocopy to do a diff

                                  Probably, but the issue still comes down to system resources.

                                  Anything that is storing in memory will quickly consume the available resources.

                                   Maybe if I pipe the output to a file it won't be so bad..

                                   It's bound to be a lot more efficient than your PowerShell.

                                   It's still going to consume more RAM than any host in the environment has to process the job. Just between any two directories there are over 20 million files.

                                  • DanpD
                                    Danp
                                    last edited by

                                    I know WinMerge has a folder comparison feature, but not sure it can handle your file count.

                                    • DustinB3403D
                                      DustinB3403 @Danp
                                      last edited by

                                      @danp said in Checking multiple Directories to confirm all files are identical:

                                      I know WinMerge has a folder comparison feature, but not sure it can handle your file count.

                                      It might be worth a try, I hadn't thought of it.

                                      • dafyreD
                                        dafyre @DustinB3403
                                        last edited by

                                        @dustinb3403 That's what I was thinking.

                                         You'll still face the problem of how to compare two stupidly large files, though.

                                        • DustinB3403D
                                          DustinB3403 @dafyre
                                          last edited by

                                          @dafyre said in Checking multiple Directories to confirm all files are identical:

                                          @dustinb3403 That's what I was thinking.

                                           You'll still face the problem of how to compare two stupidly large files, though.

                                           Yeah, while that is certainly part of the challenge, the larger portion is just checking to see whether the bulk is all aligned and matching...

                                           If any tooling had some way to "skip large files" and just jot down their names, then a simple stare-and-compare might work in that case.

                                          • F
                                            flaxking @DustinB3403
                                            last edited by

                                            @dustinb3403 said in Checking multiple Directories to confirm all files are identical:

                                            @flaxking said in Checking multiple Directories to confirm all files are identical:

                                            @dustinb3403 said in Checking multiple Directories to confirm all files are identical:

                                            @flaxking said in Checking multiple Directories to confirm all files are identical:

                                            I would think you would be able to use robocopy to do a diff

                                            Probably, but the issue still comes down to system resources.

                                            Anything that is storing in memory will quickly consume the available resources.

                                             Maybe if I pipe the output to a file it won't be so bad..

                                             It's bound to be a lot more efficient than your PowerShell.

                                             It's still going to consume more RAM than any host in the environment has to process the job. Just between any two directories there are over 20 million files.

                                             I don't know how it's implemented, so I can't say. Just create a new PowerShell script that doesn't store as much in memory. I think if you pipe to ForEach-Object it actually starts operating before Get-ChildItem has returned all the objects, as long as you don't store those objects in a variable. So garbage collection may start before you are done.
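
                                             A sketch of that pattern: nothing is assigned to a variable like $fso, so each object can be collected as soon as it has flowed through the pipeline (the paths and file name here are placeholders):

                                             # One FileInfo at a time: reduced to a short string and appended
                                             # to the log as it arrives, never accumulated in an array.
                                             Get-ChildItem -Recurse -File -Path \\srv2\dir2 |
                                                 ForEach-Object { "{0}`t{1}" -f $_.FullName, $_.Length } |
                                                 Add-Content C:\temp\dir2-listing.txt

                                             Contrast this with $fso = Get-ChildItem -Recurse, which pins every object in memory for the lifetime of the variable.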
