5 Replies Latest reply on Sep 30, 2016 1:28 PM by LeeTramp

    noindex tags in a PDF

    LeeTramp

      Greetings,

       

      I'm looking for a way to encourage search engines to not index a PDF by placing a 'noindex' or similar tag in the PDF document.

       

      I work with an educational organization that shares copyrighted PDF documents with members who are teachers, some of whom want to share the documents with their students on websites. Only members can view our document repository, so there is no security problem there, but when teachers post these documents to their own websites (against our acceptable use policies), search engines often find them, and students can then search for and find these documents (often including documents with solutions added).

       

      If we can embed a noindex tag in the actual PDF, this should help decrease the number of indexed documents on the web (we're a small organization without the capital to hire someone to search and follow up on violators of our policy).

       

      Does anyone know if this is possible?

       

      Thanks :-)

        • 1. Re: noindex tags in a PDF
          Karl Heinz Kremer Adobe Community Professional

          There is no "noindex" tag in PDF. What you need to do is prevent the search engine from indexing the file. The most straightforward method is to use a robots.txt file on your web server and then hope that the search engine's spider program actually honors the information in that file. In your case, that will not help, because you don't know in advance who is breaking the rules and making the files available. To prevent content extraction, you can assign a permissions (owner) password. To do that, open the PDF file in Acrobat, and then bring up the Document Properties dialog (Ctrl-D or Cmd-D, or via the menu item in the File menu). Then go to the Security tab and select to add password security. Now make sure that "Enable copying of text, images and other content" is not enabled. This should prevent a well-behaved PDF indexer from accessing your content, but if somebody's software is not playing by the rules imposed by the PDF format, there is nothing you can do that would not also severely restrict the usefulness of the PDF documents.
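
          For anyone who does control the server, here is a minimal sketch of how a well-behaved crawler would interpret a robots.txt rule, using Python's standard urllib.robotparser. The /worksheets/ path, the example.org host, and the "ExampleBot" user agent are made up for illustration, and a crawler that ignores robots.txt will simply not perform this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: asks every crawler to stay out of /worksheets/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /worksheets/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def demo() -> None:
    # Illustrative URLs only; nothing is fetched over the network.
    for url in (
        "https://example.org/worksheets/unit1.pdf",
        "https://example.org/index.html",
    ):
        print(url, "allowed:", is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

          Calling demo() prints the decision for each URL. Note that this only describes what a cooperative crawler does; it is not enforcement.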

          1 person found this helpful
          • 2. Re: noindex tags in a PDF
            LeeTramp Level 1

            Thanks. That sounds like a good option!

             

            Maybe someday they'll add a 'noindex' option in PDFs. It seems like something that should be easy to implement in tags or other meta content that search engines can read.

            • 3. Re: noindex tags in a PDF
              Test Screen Name Most Valuable Participant

              The interesting question is who the "they" is who would do that. It wouldn't need any specific changes to PDF to add more metadata, but people would prefer to see something simple in the UI (or a simple tool). But how do you persuade all of the makers of indexing tools that this is a thing they want to do? Each indexing tool would need to invest in it separately. Adobe doesn't control PDF any more; it is maintained by ISO, which can take years to change anything at all. Anyone could invent a tag, but would it help, or would it in fact give a false sense of security?

               

              In fact noindex already exists as an HTTP header (X-Robots-Tag): each PDF is served with HTTP header data, outside the file itself. (HTML can carry it both inside, as a meta tag, and outside.) But most web curators don't have the power to set this. Google invented noindex, so they would be the people to persuade.
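
              To make the HTTP side concrete, here is a rough sketch of a server that attaches an X-Robots-Tag header to every PDF it serves, using only Python's standard http.server. The handler name, the helper function, and the port are my own choices for illustration; a real site would more likely set this header in its web server configuration, and it only helps on servers you control.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def extra_headers(path: str) -> dict:
    """Return extra response headers for a request path (hypothetical helper)."""
    # Ignore any query string before checking the file extension.
    if path.split("?", 1)[0].lower().endswith(".pdf"):
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}


class NoIndexPDFHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, marking PDFs as noindex."""

    def end_headers(self):
        # self.path may be missing if the request line could not be parsed.
        for name, value in extra_headers(getattr(self, "path", "")).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    # Blocks until interrupted; the port number is an arbitrary example.
    ThreadingHTTPServer(("127.0.0.1", port), NoIndexPDFHandler).serve_forever()
```

              Running serve() and requesting a .pdf file should show the extra header in the response, while other files are served unchanged. As with robots.txt, only crawlers that choose to honor the header will act on it.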

              • 4. Re: noindex tags in a PDF
                Bernd Alheit Adobe Community Professional & MVP

                When a search engine ignores the robots.txt file it will also ignore this tag in a PDF file.

                • 5. Re: noindex tags in a PDF
                  LeeTramp Level 1

                  Bernd,

                   

                  Yes, you are correct. That is why I started the post with "encourage search engines" not "prohibit." It seems the only way to truly block a search engine would be adding a password.

                   

                  But I believe most mainstream search engines respect the noindex tag, so if such a tag could be added to PDFs (with ISO approval?), I believe they would also respect it.

                   

                  Perhaps Google should be encouraged to respect a noindex tag in the metadata of a PDF, just as they do in HTML.

                   

                  Thanks for all the great comments/suggestions.