6 Replies Latest reply on Jun 30, 2014 7:40 AM by kkelleherdesign

    Layout in Arabic, Russian and Chinese. Exporting text from a PDF

    kkelleherdesign

      I am laying out long documents in Arabic, Russian and Chinese. The text has been provided as PDFs. When I copy and paste it into InDesign, it comes up as boxes, question marks and other characters that have nothing to do with the text I am trying to lay out. I have set the typeface to Myriad Arabic and the dictionary to Arabic, but I still get nothing resembling Arabic, or any language for that matter. The same happens with Chinese and Russian. Any suggestions on how to get the text in from the PDF in the actual language? I'd appreciate any help with this. Thank you.

        • 1. Re: Layout in Arabic, Russian and Chinese. Exporting text from a PDF
          Ellis home Level 4

          While we wait for the in-house expert on foreign languages (Joel Cherney) to shed some light on this, can you tell us what application was used to create the PDF (File/Properties/Description/Application) and what fonts were used (File/Properties/Fonts)?

          • 2. Re: Layout in Arabic, Russian and Chinese. Exporting text from a PDF
            kkelleherdesign Level 1

            Thank you, any help would be appreciated.

             

            Arabic version:

            application: adobe acrobat 8.1 combine files

            PDF producer: Acrobat distiller 3.0 for windows

            PDF version: 1.6 (Acrobat 7.x)

             

            fonts: WPNaskh13 and 14

            and times roman

             

            Chinese version:

            application: adobe acrobat 8.1 combine files

            PDF producer: ABR Tiff2PDF converter V3

            PDF version: 1.6 (Acrobat 7.x)

            fonts: comes up as none. It looks like it was a TIFF file they converted... not a good sign

             

            Russian version:

            application: PScript5.dll Version 5.2.2

            PDF producer: Acrobat distiller 8.1 for windows

            PDF version: 1.4 (Acrobat 5.x)

             

            fonts: WP-CyrillicA

            WPTypographic symbols

            and courierNewPSMT

             

            These are only the first 3 of 45 in these languages.

            • 3. Re: Layout in Arabic, Russian and Chinese. Exporting text from a PDF
              Joel Cherney Adobe Community Professional & MVP

              Thanks for the callout, Ellis

               

              Soooo, KK: you are in for a world of hurt. The initials "WP" at the beginning of these font names mean that the text came out of WordPerfect. Doing multilingual layouts in WP was annoying, but possible. It was developed in the pre-Unicode world where every single method of complex-script layout was a dirty hack. If you like knowing All of the Nerdy Dirty Details, I can tell you how it worked, but suffice it to say that trying to harvest non-Latin-script text from WP and repurpose it for use in InDesign is just pure pain. The WordPerfect-specific codepages were never really supported anywhere outside of WP.

               

              That being said, I have a script lying around somewhere for conversion of WP-Cyrillic into Unicode. (Actually, I think it does Windows CP 1251, but that works just as well.) But that is only one out of forty-five languages? And the Chinese has been rasterized? And the PDFs were originally generated by Distiller 3? If you have any choice, it's time to walk away. If you don't have any choice, I really hope you are billing hourly. My experience in this area (painfully extensive) is that it will cost three to five times as much to extract the text as it would to have a translation professional rekey the text, and then to have a second translation professional review the rekeyed text looking for typos.
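
              For illustration only, here is a minimal Python sketch of what a conversion like that generally looks like. It is not the script mentioned above, and the lookup table is a made-up placeholder, not the real WP-Cyrillic layout (which would have to be worked out by hand from the font). The names `LEGACY_TO_UNICODE`, `remap_legacy_bytes` and `decode_cp1251` are hypothetical. The only real library behavior relied on is Python's built-in `cp1251` codec.

```python
# Hypothetical glyph-code -> Unicode table for a legacy single-byte font.
# These entries are placeholders for illustration; they are NOT the real
# WP-CyrillicA layout.
LEGACY_TO_UNICODE = {
    0x41: "\u0410",  # placeholder: byte 0x41 -> CYRILLIC CAPITAL LETTER A
    0x42: "\u0411",  # placeholder: byte 0x42 -> CYRILLIC CAPITAL LETTER BE
    0x43: "\u0412",  # placeholder: byte 0x43 -> CYRILLIC CAPITAL LETTER VE
}


def remap_legacy_bytes(data, table, unknown="\ufffd"):
    """Map each legacy byte to a Unicode character via a lookup table.

    Bytes missing from the table become U+FFFD so that gaps stay visible
    for a human reviewer instead of being silently dropped.
    """
    return "".join(table.get(byte, unknown) for byte in data)


def decode_cp1251(data):
    """Decode bytes that are already in Windows CP 1251 into a Unicode string."""
    return data.decode("cp1251")


if __name__ == "__main__":
    print(remap_legacy_bytes(b"ABC", LEGACY_TO_UNICODE))
    print(decode_cp1251(bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])))
```

              Marking unmapped bytes with U+FFFD is a deliberate choice in this sketch: the output still needs review by someone who reads the language, and visible gaps are easier to catch than missing characters.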

               

              Russian OCR is pretty damn good these days, but Chinese OCR is hit-or-miss. I have never seen good Arabic OCR - doesn't mean it's not out there, but I couldn't help you find it. But the chances that all 45 languages have reliable OCR available, and that the result of said OCRing will not need to be reviewed by someone who knows the language, are basically nil.

              • 4. Re: Layout in Arabic, Russian and Chinese. Exporting text from a PDF
                kkelleherdesign Level 1

                Good morning Joel,

                 

                Not the answer I wanted to wake up to, but I appreciate you laying the facts on the table. I will in turn give it to my client straight up and unless they have the original word docs this is just not going to happen. Really appreciate you taking the time to spell it out for me as I was pulling my hair out over here.

                 

                Have a good weekend.

                Karen

                • 5. Re: Layout in Arabic, Russian and Chinese. Exporting text from a PDF
                  Bo LeBeau Level 4

                  You wrote: I will in turn give it to my client straight up and unless they have the original word docs this is just not going to happen.

                  I hope you understand that WordPerfect is unrelated to MS Word.

                  WordPerfect was for a time the preeminent word processing application. Only later did Microsoft Word arrive and knock WordPerfect off its throne.

                   

                  But as Joel Cherney wrote . . . trying to harvest non-Latin-script text from WP and repurpose it for use in InDesign is just pure pain.

                  But Joel was talking about having the actual WordPerfect files, and you apparently only have PDFs. So it seems like you will have pure pain x 2.

                   

                  Joel is certainly the expert here, but I think the message here is his quote:

                  My experience in this area (painfully extensive) is that it will cost three to five times as much to extract the text as it would to have a translation professional rekey the text, and then to have a second translation professional review the rekeyed text looking for typos.

                  • 6. Re: Layout in Arabic, Russian and Chinese. Exporting text from a PDF
                    kkelleherdesign Level 1

                    Thank you for pointing that out to me. It is something I would not have known or thought about. Appreciate the input.

                    Karen