I have a text catalog created with ID CS6. Sections were PDF documents given to me from our engineering department.
Can you search those PDF documents from your engineering department? I.e. before inclusion in the book? Were those PDFs placed into InDesign? How were they generated?
If you can search those PDFs before book inclusion, it sounds like you may have a "refried PDF" encoding problem. That process - taking a PDF and re-PDF-ing it - can generate problems, and if Dov shows up in this thread he'd advise against it, I bet. I would wonder how the fonts were embedded by your engineers, and if those fonts are the same fonts that are being re-encoded in a non-searchable way. I'm guessing that you have some Identity-H in your engineers' files (open PDF in Acrobat -> Properties -> Fonts tab -> Encoding) that is colliding with the re-encoding of an identically-named font in the parts of your book that are not your refried PDFs.
That's just a guess, though. More details about the engineering PDFs will help us figure out what is going on with a bit more certainty.
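If Acrobat isn't handy, a rough way to peek at the same information is to scan the raw file for /Encoding entries. This is only a sketch, and it rests on an assumption: it sees font dictionaries that are stored uncompressed, which is typical for older PDF versions but not guaranteed for newer files that pack objects into compressed streams. The function names are made up for illustration, and Acrobat's Fonts tab remains the more reliable check.

```python
import re

# Matches "/Encoding /SomeName" or "/Encoding 12 0 R" (an indirect reference
# to a separate encoding object) in raw, uncompressed PDF bytes.
ENCODING_RE = re.compile(rb"/Encoding\s*(/[A-Za-z0-9_.\-]+|\d+\s+\d+\s+R)")


def list_encodings(pdf_bytes):
    """Return every /Encoding value visible in the raw bytes, in file order."""
    return [m.group(1).decode("ascii") for m in ENCODING_RE.finditer(pdf_bytes)]


def has_identity_h(pdf_bytes):
    """True if any visible font dictionary uses the Identity-H encoding."""
    return "/Identity-H" in list_encodings(pdf_bytes)
```

To try it, read one of the engineering PDFs as bytes (for example with pathlib's Path.read_bytes()) and pass the result to list_encodings. An empty list does not prove anything; it may just mean the dictionaries are compressed.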
Yes, I can search the PDF documents from the engineering department with no issue prior to inclusion into the book. I did place the PDF pages into ID and then they were generated as an interactive PDF.
In previous years, the PDFs generated by engineering were done via Crystal Reports. In reviewing the current files, they show:
PDF Producer: PrimoPDF
PDF Version: 1.3 (Acrobat 4.x)
The fonts show:
Arial (Embedded Subset), Type: TrueType, Encoding: Custom
Arial, Bold (Embedded Subset), Type: TrueType, Encoding: Custom
The previous year's files show:
PDF Producer: Powered by Crystal
PDF Version: 1.5 (Acrobat 6.x)
The fonts show:
Arial (Embedded Subset), Type: TrueType, Encoding: Built-in
Arial, Bold (Embedded Subset), Type: TrueType, Encoding: Built-in
Huh. Well, it seems like my hunch was incorrect. It might be an encoding collision, but I can't say. I wonder what it is about PrimoPDF output that doesn't work this time around. I'm just guessing here, you may have already tried all of this, but - what happens if you export as a print PDF? Is it then searchable? How about if you just place one PrimoPDF-generated PDF into ID, then just export that? Is that searchable? Because there are a variety of things changing here from your previous functional searchable PDFs:
CS4 -> CS6
Crystal Reports -> PrimoPDF
Print -> Interactive (because Interactive PDF was unavailable in CS4 IIRC)
and if I were in your shoes I'd be playing with those variables trying to figure out which one, or which combination, was creating the problem. Unfortunately, you're dealing with circumstances I've not experienced, so I have no hard-and-fast answers for you.
I've tried many, many combinations and none of them work. I just tried creating a new ID document, placed a page from the PDF (PrimoPDF generated) and exported as both print and interactive. Neither allowed me to search. When I copied the text from the PDF into Notepad, I got small boxes in place of letters.
Performing the same test with a PDF file from Crystal in both print and interactive, the file is searchable and the text can be copied and pasted properly into Notepad.
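The copy-and-paste test can be roughly automated if you have many files to check. The sketch below assumes the "small boxes" are control, private-use, unassigned, or replacement characters, which is a guess about what actually lands on the clipboard; the function names and the 0.3 threshold are arbitrary choices for illustration.

```python
import unicodedata


def suspicious_ratio(text):
    """Fraction of non-whitespace characters that are unlikely to be real text."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = 0
    for c in chars:
        category = unicodedata.category(c)
        # Cc = control, Co = private use, Cn = unassigned;
        # U+FFFD is the Unicode replacement character.
        if category in ("Cc", "Co", "Cn") or c == "\ufffd":
            bad += 1
    return bad / len(chars)


def looks_garbled(text, threshold=0.3):
    """True if enough of the pasted text is unprintable to suspect bad encoding."""
    return suspicious_ratio(text) >= threshold
```

Paste the copied text into a string and call looks_garbled on it. Text from the Crystal files should come back False, while text from the PrimoPDF files would come back True if the assumption about the boxes holds.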
In looking at the Primo PDF manual, the PDFs need to be created using the Prepress profile in order for fonts to be embedded. That might be the issue with the fonts.
As for the searchability, no idea. Might be nice to have one of the offending PDFs. Even a dummy document created with Primo if the information is sensitive.
Did you try printing the PrimoPDF PDFs to PostScript and distill to PDF?
Maybe that will change the game…
You could do that from Acrobat Pro. Or place the PDFs in InDesign and print from there*
*to do that you will need the "ADPDF9.PPD" in a fresh new folder named "PPDs" in the "Presets" folder of your InDesign CS6 application folder (at least on Mac OS X).
First off, I wish to express my gratitude to all for your responses.
We've found the best solution to fix this issue is to purchase Acrobat for the engineers. We tested it and the problem was fixed.
You guys rock!
1. LAUNCH Adobe Acrobat
2. Using the File menu, OPEN the corrupted document. (I don't know what to do if you're not even able to open the file. Sorry!)
3. (VERIFY that Acrobat can't search the document, in case you haven't done so, just to avoid unnecessary work.)
4. EXPORT: Once the document is open, use the File menu again, and choose EXPORT / IMAGE / PNG. Your corrupted pdf will be saved as a series of images with the file extension ".png", one for each page of the pdf document. Don't worry, they will be numbered automatically by Acrobat, and they aren't terribly large. My document was 200 pages long, so I got 200 little image files in .png format. The export may take a couple of minutes. You won't get any further signals from Adobe to tell you it's done--just go look in the directory that contains the original and see if it made png files with names like:
5. COLLECT: Once you have the image files, collect them all by cut and paste into their own directory.
6. OCR: Under the Document menu, choose OCR TEXT RECOGNITION / RECOGNIZE TEXT IN MULTIPLE FILES USING OCR
7. ADD FILES: You will be shown a dialog box with the title "Paper Capture Multiple Files" with the subtitle "Run OCR on a set of images." There is a button that says "Add Files". Click this button, choose ADD FOLDERS, and browse to the folder that contains your png files. Highlight that folder, click OK. The files will then appear within this dialog box. Make sure that the files are in the proper order, or you will be sad. Click OK.
8. CHOOSE OUTPUT OPTIONS: Now you will get a dialog box entitled "Output Options". You have several choices to make here:
TARGET FOLDER: Click "Specific Folder", then Browse to your folder full of images, click "Make New Folder", name the folder (something like "CHEMISTRYBOOKIMAGEFILES" so you can find it easily and know what is in it), click OK.
FILE NAMING: Click "Keep Original File Names". This will preserve Acrobat's automatic numbering of your files--you will need that to get the page ordering right! UNcheck "Overwrite existing files" just to avoid a terrible mishap, unless you are very pressed for disc space or unless this is your 5th time attempting to follow these instructions and you've already got way too many duplicates of the output files. If you have the disc space, just make a new empty folder for your 6th try.
OUTPUT FORMAT: Select "Save File(s) as Adobe PDF". Click OK.
Now wait for Adobe to execute optical character recognition on the image files. Its output will be one little pdf file for each little image file that it OCRs.
9. COLLATE THE FILES INTO ONE: Under the File menu, select COMBINE / MERGE FILES INTO A SINGLE PDF. This step is optional; maybe you wanted a bunch of little files, or maybe you wanted to divide your enormous original document into 2 or 3 more manageable documents. To divide the file, just make a separate directory for the png files you want in each smaller final document, and repeat steps 6 through 9 for each directory. BE CAREFUL WITH NAMING! Make sure you choose a unique name, because if you got something wrong, you will want to be able to go back to your original corrupted pdf and try again. If your original is named "CHEMISTRY.PDF", please remember to name this new file something like "CHEMISTRY-FIXED.PDF".
If you really despise pdf, you can try using different output formats in Step 8. I do hate pdf, but I chose pdf for two reasons: one is that I had more confidence that pdf would retain important features like charts and graphs and labeled photos in my document. The other is that I was so so so so SO tired of doing all this pdf crap instead of the chemistry work that I'd gotten the ebook to help me with that I didn't want to do anything fancy with file formats at this point. Let me know if you try output to rtf or ascii and get good results.
10. TEST: Open the merged document(s) in all of the pdf readers and web browsers you will want to use with it, and try searching with it. Use your file browser and try to search for text in the directory with a word you know the file contains. Searchable? Good job, you're done, cheers!
Not searchable? Oh noes! Check that you opened the correct document (maybe you opened the original by mistake). Try the entire process again. If that doesn't help, try the entire process again, but output to plain text this time. My apologies, but, being a complete newbie myself, I have no further advice on this topic.
NB! My output PDF is of rather low quality. It looks like it was literally scanned from a 10th-iteration paper copy. Don't know how to fix that, after the fact or somewhere in the above process. It's good enough, so I'm just dealing with the shaky blurriness. I seem to remember somewhere that I could choose a high quality output, but, again, I didn't want to do anything fancy with vectors and rostering and layers and other terms that I don't know before I verified that I could do something basic and get back to chemistry asap.
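On the page-ordering worry in steps 7 and 8: if the files get listed alphabetically, page 10 can end up ahead of page 2. Here is a small sketch for checking the order before feeding the folder to OCR. It assumes the page number appears as digits somewhere in each file name; the names in the usage note are hypothetical, since I don't know exactly what pattern Acrobat will use for your document.

```python
import re
from pathlib import Path


def page_key(name):
    """Split a file name into text and number chunks so 2 sorts before 10."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def ordered_pages(folder):
    """Return the .png files in a folder, sorted in page order."""
    return sorted(Path(folder).glob("*.png"), key=lambda p: page_key(p.name))
```

For example, sorting the made-up names "book_Page_10.png", "book_Page_2.png" and "book_Page_1.png" with key=page_key puts them in the order 1, 2, 10, where a plain alphabetical sort would put 10 before 2.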
We have had similar issues to the one you've described and would be interested to hear what your final solution was. In our instance, exporting the InDesign file to PDF resulted in sections of unsearchable text that, when copied, pasted as peculiar box characters. My suspicion was that there's an issue with encoding the kerning of these sections as these were the only parts of the document that had notably variable attributes in the original InDesign file.
That said, what did you find to be the best solution for our mutual problem?