    Massive Searchable Document/File Repository

      hubtechagain

      The short of it is this: I have 30ish TB of data. Lots of it is irrelevant, but what I need is essentially to be able to search within the text/Office/PDF files. Anything that is textual I need to be able to search (globally).

      I'm trying Nextcloud, but I'm not intelligent enough to make it work. Does anyone have experience with such an endeavor?

        hubtechagain

        Also, there are lots of PDFs here, so I will have to OCR as well. If we have a Linux guru on here who has some time to help out, I'm sure Nextcloud can do it. I'm just not able.

          hubtechagain

          https://nextcloud.com/industries/legal/
          This is essentially what I'd love to happen 🙂

            hubtechagain

            Is Windows Server File Indexing an option? If so, what about OCR en masse?

              notverypunny

              I can't say for sure, but I think what you're looking for surpasses Nextcloud's feature set.

              With 30 TB you're getting into serious document management (DM) territory. I'm not sure if it'll suit your needs, but Alfresco might be worth a look.

              https://github.com/loftuxab/alfresco-ubuntu-install

              https://orderofthebee.net/

              https://hub.alfresco.com/

                marcinozga @hubtechagain

                @hubtechagain said in Massive Searchable Document/File Repository:

                Is Windows Server File Indexing an option? If so, what about OCR en masse?

                Yes, Windows file indexing will index all the document types you mentioned except PDFs. You need the Adobe PDF iFilter installed (free) to index text PDF files. If your PDF documents are scanned images, then you'd need Adobe Acrobat to OCR them en masse, but after that the PDFs will be indexable.
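
To make "indexable" a bit more concrete: once text has been pulled out of a file (by an iFilter, by OCR, or by anything else), a full-text index is basically a map from each word to the files that contain it. Below is a minimal Python sketch of that idea. It is not how Windows Search or any of the products in this thread work internally; the function names (`tokenize`, `build_index`, `search`) and the sample file paths are made up for illustration, and the text for the "scanned" PDF is assumed to have already come out of an OCR step.

```python
import re
from collections import defaultdict

# Words are runs of lowercase letters and digits; everything else is a separator.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN.findall(text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of document paths containing it.

    `docs` maps a (hypothetical) file path to its already-extracted text.
    """
    index = defaultdict(set)
    for path, text in docs.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def search(index, query):
    """Return the paths that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: path -> extracted text.
docs = {
    "contracts/lease.txt": "Lease agreement between landlord and tenant",
    "scans/invoice_0001.pdf": "Invoice for tenant repairs",  # pretend OCR output
    "photos/site.jpg": "",  # an image with no readable text contributes nothing
}
index = build_index(docs)
```

With this, `search(index, "tenant")` finds both the lease and the invoice, while `search(index, "lease tenant")` narrows it down to the lease only. At 30 TB the extraction step (and OCR in particular) is the expensive part, not the index lookup.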

                  hubtechagain @notverypunny

                  @notverypunny Well, not nearly all of it is actual text. It's a lot of computer images etc. Really I just need the OCR full-text search to work so we can dig through the readable data.

                    hubtechagain @notverypunny

                    @notverypunny what's your experience with Alfresco?

                      notverypunny @hubtechagain

                      @hubtechagain Minimal, to be honest. I'd looked at it a couple of times for replacing file servers, but the combination of inertia and overall lack of buy-in from the stakeholders meant that I never really got past the testing / demo / proof-of-concept phase.

                        marcinozga

                        Actually, Mayan EDMS might be what you're looking for. It does OCR and indexing. I have a running instance, but I haven't used it at all yet.

                          wrx7m @marcinozga

                          @marcinozga said in Massive Searchable Document/File Repository:

                          Actually, Mayan EDMS might be what you're looking for. It does OCR and indexing. I have a running instance, but I haven't used it at all yet.

                          This looks interesting. I wonder how well it can catalog other digital assets (images, video, etc.).
