    OCR documents scanned in to folder

    IT Discussion
    19 Posts 5 Posters 836 Views
    • Mike Davis

      10 or 15 years ago I set up Omnipage to watch a folder: if a .pdf was dropped in, it would OCR it and drop the result (either an OCR'd .pdf or a Word document) into another folder. It's starting to have issues, and I wondered what others are doing.

      Does anyone know of a program that can be set to monitor a folder and OCR the documents that drop in there without user interaction?

      The use case: the company has lots of scanners around the office. Users walk up, pick the scan template that says "OCR pdf", and hit the green button. When they get back to their desk, the output file is sitting on a network share they have mapped.

      • Dashrender

        What's the issue you are having?

        • Mike Davis

          For some reason, the queue seems to get jammed up and stops processing documents. They clear out all the files and restart the server, and then it will sometimes work again. I haven't looked at it myself to troubleshoot it further.
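
One cheap way to see when the queue jams, before changing anything, is to list files that have sat in the watched folder longer than normal processing should take. This is only a sketch under assumptions: the function name `find_stale_files` and the 30-minute threshold are hypothetical, and it assumes the watch folder is reachable as an ordinary file path from wherever the script runs.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def find_stale_files(
    folder: Path,
    max_age_minutes: float = 30.0,  # assumed threshold, tune to the site
    now: Optional[float] = None,
) -> List[Tuple[Path, float]]:
    """Return (path, age in minutes) for files older than the threshold, oldest first."""
    now = time.time() if now is None else now
    stale = []
    for path in folder.iterdir():
        if not path.is_file():
            continue
        # Age is measured from the last modification time of the file.
        age_minutes = (now - path.stat().st_mtime) / 60
        if age_minutes > max_age_minutes:
            stale.append((path, age_minutes))
    # Oldest file first: it is the likeliest point where processing stopped.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Running this on a schedule and noting the oldest file each time the queue stalls could show whether one particular scan (for example, an unusually large one) is what blocks the rest.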

          • Dashrender

            It's been working for 10-15 years. Why is it suddenly not? A conflict with a system update? I'd definitely look into why it's failing before looking to just replace it.

            • Mike Davis

              I have to wonder if they upped the resolution of the scan template or something. That's what I would check if they let me check it out before messing with it.

              • travisdh1

                Network scanning has always been a pain in my neck. Wondering how others have automated this.

                On a linux box I'd just be running a small script that would use tesseract-ocr, and have it run from cron every 15 seconds. Something like that anyway.
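
A rough sketch of the kind of script described above, written in Python so it could run on either Linux or Windows. Everything named here is an assumption for illustration: the `ocrmypdf` command as the OCR step (it uses Tesseract underneath and accepts PDF input), the three folder arguments, and the 10-second settle time used to avoid picking up a file the scanner is still writing. One note on scheduling: cron's smallest interval is one minute, so a 15-second cadence needs a polling loop like the hypothetical `poll` helper rather than a crontab entry alone.

```python
import shutil
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Optional


def ocrmypdf_cli(src: Path, dst: Path) -> None:
    """Assumed OCR step: shell out to the `ocrmypdf` command (input, output)."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=True)


def process_folder(
    inbox: Path,
    outbox: Path,
    done: Path,
    ocr: Callable[[Path, Path], None] = ocrmypdf_cli,
    settle_seconds: float = 10.0,  # assumed grace period for in-progress scans
) -> List[Path]:
    """OCR every settled PDF in `inbox` and return the output files written."""
    outbox.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)
    written = []
    now = time.time()
    for src in sorted(inbox.glob("*.pdf")):
        # Skip files modified very recently; the scanner may still be writing.
        if now - src.stat().st_mtime < settle_seconds:
            continue
        dst = outbox / src.name
        try:
            ocr(src, dst)
        except Exception as exc:
            # Leave the file in place so one bad scan does not stop the others;
            # it will be retried on the next pass.
            print(f"OCR failed for {src.name}: {exc}")
            continue
        # Move the original out of the inbox so it is not processed twice.
        shutil.move(str(src), str(done / src.name))
        written.append(dst)
    return written


def poll(
    inbox: Path,
    outbox: Path,
    done: Path,
    interval_seconds: float = 15.0,
    passes: Optional[int] = None,  # None means run until stopped
    ocr: Callable[[Path, Path], None] = ocrmypdf_cli,
) -> None:
    """Sweep the inbox repeatedly, sleeping between sweeps."""
    completed = 0
    while passes is None or completed < passes:
        process_folder(inbox, outbox, done, ocr=ocr)
        completed += 1
        if passes is None or completed < passes:
            time.sleep(interval_seconds)
```

Calling `poll(inbox, outbox, done)` would keep sweeping until stopped, while `passes=1` does a single sweep, which is the shape a once-a-minute cron job or a Windows scheduled task would want. Passing the OCR step in as a function keeps the folder handling independent of whichever OCR tool ends up being used.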

                • Dashrender @travisdh1

                  @travisdh1 said in OCR documents scanned in to folder:

                  Network scanning has always been a pain in my neck. Wondering how others have automated this.

                  On a linux box I'd just be running a small script that would use tesseract-ocr, and have it run from cron every 15 seconds. Something like that anyway.

                  Well, it's on github, is it even available anymore?

                  • travisdh1 @Dashrender

                    @Dashrender said in OCR documents scanned in to folder:

                    Well, it's on github, is it even available anymore?

                    Eich, hopefully. Should be available via repositories if not directly from github anymore.

                    • Mike Davis

                      For this project I can't really consider linux because I can't really support linux.

                      • scottalanmiller @Dashrender

                        @Dashrender said in OCR documents scanned in to folder:

                        Well, it's on github, is it even available anymore?

                        I think you are confusing GitLab with GitHub unless you know something that I don't.

                        • Dashrender @scottalanmiller

                          @scottalanmiller said in OCR documents scanned in to folder:

                          I think you are confusing GitLab with GitHub unless you know something that I don't.

                          Did you really just ask that?

                          • scottalanmiller @Mike Davis

                            @Mike-Davis said in OCR documents scanned in to folder:

                            For this project I can't really consider linux because I can't really support linux.

                            But isn't it Windows that is unsupportable here? 😉

                            Having to use special software on Windows vs. a basic script on Linux seems like a support nightmare compared to something that should "just work."

                            • scottalanmiller @Dashrender

                              @Dashrender said in OCR documents scanned in to folder:

                              Did you really just ask that?

                              So GitHub is fine as usual?

                              • Mike Davis @scottalanmiller

                                @scottalanmiller said in OCR documents scanned in to folder:

                                Having to use special software on Windows vs. a basic script on Linux seems like a support nightmare compared to something that should "just work."

                                I've never set up a cron job in Linux. Their onsite IT has never logged in to Linux. I don't think it would be a good idea to put something like that in my customer's environment.

                                • scottalanmiller @Mike Davis

                                  @Mike-Davis said in OCR documents scanned in to folder:

                                  I've never set up a cron job in Linux. Their onsite IT has never logged in to Linux. I don't think it would be a good idea to put something like that in my customer's environment.

                                  You've never used whatever new, untested thing you'd use on Windows either. The differences would be:

                                  • One is free, one is costly.
                                  • One is enterprise battle tested, one... who knows.
                                  • One is industry standard and can be supported by anyone, the other... who knows.
                                  • One will keep itself fully updated and patched for a decade or more and can be trivially updated beyond that.

                                  That their onsite IT isn't prepared for simple tasks should not necessarily imply that we don't provide good solutions. It just means that their IT is not prepared to support anything. It is what it is. If having never used Linux is a reason to not consider Linux, then surely that logic applies to getting a new product to support as well.

                                  In many ways, the logic you use to rule out Linux would also rule it in.

                                  • Mike Davis

                                    Can you give me an estimate in number of hours to build a linux box and configure that package?

                                    • scottalanmiller @Mike Davis

                                      @Mike-Davis said in OCR documents scanned in to folder:

                                      Can you give me an estimate in number of hours to build a linux box and configure that package?

                                      I don't know anything about the OCR piece. But time to build a box is normally about five minutes for me. The script, maybe ten to fifteen. The real issues will be time to download the ISO for them and questions about their environment. The Linux and cron pieces are essentially zero effort items. All of the factors that might create effort are the parts we don't know about.

                                      • JaredBusch @scottalanmiller

                                        @scottalanmiller said in OCR documents scanned in to folder:

                                        I don't know anything about the OCR piece. But time to build a box is normally about five minutes for me. The script, maybe ten to fifteen. The real issues will be time to download the ISO for them and questions about their environment. The Linux and cron pieces are essentially zero effort items. All of the factors that might create effort are the parts we don't know about.

                                        Hello, real world calling.

                                        Time to build a box != 5 minutes ever. Time for you to spin up a VM from a template and configure the basics, I would accept.

                                        Even assuming that the latest CentOS 7 release ISO was on his client's infrastructure and ready to attach, it would take more time than that to configure the new VM, boot, install, reboot, update, and configure.

                                        • scottalanmiller @JaredBusch

                                          @JaredBusch said in OCR documents scanned in to folder:

                                          Hello, real world calling.

                                          Time to build a box != 5 minutes ever. Time for you to spin up a VM from a template and configure the basics, I would accept.

                                          Even assuming that the latest CentOS 7 release ISO was on his client's infrastructure and ready to attach, it would take more time than that to configure the new VM, boot, install, reboot, update, and configure.

                                          That's why the environment matters. I can build a VM locally from ready-to-go templates and ship it digitally, all set to run. It just needs the latest updates (two minutes normally), an IP address, and a hostname. The time to transfer the file is not part of the five minutes, but it doesn't take labour time, either.
