57 Replies. Latest reply: Jun 27, 2012 5:41 AM by MnemosyneD

    Need some help with Identity-H CMAP and Cyrillic text

    andrejusc Community Member

      Hi,

       

      I'm trying to output some text with Cyrillic characters by utilizing AddGlyphsSnip code and using Identity-H as encoding name.

      But that works only partially: the resulting PDF has no ToUnicode table, even though kPDEFontCreateToUnicode is specified in the PDEFontCreateFromSysFontAndEncoding call. Also, the resulting PDF doesn't show the text at all when opened, though the text is inside the PDF in Unicode form.

      What else should I look at?

       

      I've tried to prepare a similar PDF using Acrobat Pro, and there, besides the ToUnicode table, I also see a CIDToGIDMap entry, which is missing from the one I prepare with the Acrobat SDK.

       

      Any help would be highly appreciated.

       

      P.S. Any sample code which uses PDEFontCreateWithParams?

        • 1. Re: Need some help with Identity-H CMAP and Cyrillic text
          andrejusc Community Member

          In my resulting PDF part of content stream looks like this:

           

          1 0 0 0 k

          /C2_2 1 Tf

          7.003 0 Td

          <4300790061006E00>Tj

           

          Here the word "Cyan" got encoded.
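A note on reading that string: as far as the PDF format goes, a font that uses a 2-byte CMap such as Identity-H takes the bytes of a string two at a time, high byte first. The sketch below is plain C++ with no Acrobat SDK calls, and `readTwoByteCodes` is a made-up helper name. It splits the hex digits the same way, so it is easy to see which codes a viewer is actually given.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert one hex digit to its value, or -1 if it is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Split the hex digits of a PDF hex string (without the angle brackets)
// into 2-byte codes, high byte first, the way a 2-byte CMap reads them.
// Returns an empty vector on malformed input.
std::vector<std::uint16_t> readTwoByteCodes(const std::string& hex) {
    std::vector<std::uint16_t> codes;
    if (hex.size() % 4 != 0) return codes;
    for (std::size_t i = 0; i < hex.size(); i += 4) {
        std::uint16_t code = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const int v = hexValue(hex[i + j]);
            if (v < 0) return {};
            code = static_cast<std::uint16_t>((code << 4) | v);
        }
        codes.push_back(code);
    }
    return codes;
}
```

For the string above this yields 0x4300, 0x7900, 0x6100, 0x6E00 rather than 0x0043, 0x0079, 0x0061, 0x006E, i.e. the bytes are in little-endian order. That may be one reason nothing is displayed, though probably not the only one.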

          • 2. Re: Need some help with Identity-H CMAP and Cyrillic text
            lrosenth Adobe Employee

            I would recommend submitting a formal support request to our developer support area, as this is a code level issue.

            • 3. Re: Need some help with Identity-H CMAP and Cyrillic text
              andrejusc Community Member

              Well, you've mentioned that several times already in different places on this forum, but still haven't provided more details on the procedure. How should I do that? Do I need to purchase a special Developer Support case (which is not cheap by any means) just to resolve missing SDK documentation, or become a Solution Partner program participant at Bronze level at least, or what?

              • 4. Re: Need some help with Identity-H CMAP and Cyrillic text
                lrosenth Adobe Employee

                Yes, I believe you need to become a solution partner.

                 

                Unfortunately, no one here is able to assist you with this level of problem as it's related to your code (and how it interacts with our SDK).

                • 5. Re: Need some help with Identity-H CMAP and Cyrillic text
                  andrejusc Community Member

                  Well, let me disagree in regards to your SDK usage.

                   

                  I think that the SDK should provide a very clear code sample showing, for example, how to use the Arial TrueType font to output a Unicode string. This is not as exotic as, let's say, Chinese or Japanese text.

                   

                  If you have such piece and could share - that would be enough.

                   

                  I hate it when a big company (like Adobe in this case) pushes you to pay for something that is actually useless to you instead of preparing high-quality SDK documentation.

                   

                  I've personally been in the IT business for 15 years, and about 12 of them were spent programming with various IBM Lotus APIs. They are consistent and very well documented, and I don't need to become a Solution Partner at all (well, my company is now an IBM partner at Advanced ISV level, but that is another story), because they give me 3 months of very deep technical support for free when I publish my first technical solution in their Global Solutions catalog. And for every new one I get 3 more months.

                   

                  So Adobe in this case just sucks when it comes to thinking about developers who try to use the Acrobat SDK. If Adobe thinks that ISVs can make a lot of money on their plug-in solutions, that is just wrong thinking.

                   

                  It would be great if you could pass this post on to some of your management. I really need a proper reaction to it, not what we've had so far.

                  • 6. Re: Need some help with Identity-H CMAP and Cyrillic text
                    lrosenth Adobe Employee

                    We have an extended example available to our PDFLibrary licensees and I believe we will incorporate it into future versions of the SDK.  Again, you will need to contact support to obtain it.

                     

                    But just as you think your needs are important, there are SO MANY ways in which people use PDF and we can only write so many samples.

                    • 7. Re: Need some help with Identity-H CMAP and Cyrillic text
                      andrejusc Community Member

                      Are you saying that if we need to develop a plugin for Acrobat Pro - we need to license PDFLibrary instead of using freely available Acrobat SDK?

                       

                      In that PDF Library FAQ it's written:

                      To develop plug-ins for Acrobat and Reader, please use the Acrobat SDK.

                       

                      So, your response in regards to PDFLibrary license is not valid here.

                       

                      Again - Acrobat SDK documentation needs to be enhanced ASAP and while of course there are SO MANY ways to use PDFs - we need to use only those ways that are available through public Acrobat SDK.

                       

                      And once more: it's very bad practice to have something in a publicly available SDK that is not properly documented. Yes, I've had a similarly bad experience with public IBM APIs, but there, when I called their method, I got back a string saying "Not implemented", and with Adobe I think we have a somewhat different story.

                       

                      So, please tell me how to utilize Identity-H encoding name for CMAP table and some Unicode encoded text. I think it's not top secret and should not require to purchase separate Developer Support case or anything else.

                      • 8. Re: Need some help with Identity-H CMAP and Cyrillic text
                        lrosenth Adobe Employee

                        Identity-H is chosen automatically when you work with Unicode-based data and a CID-based font (or a font that is converted into a CID-based font).

                        • 9. Re: Need some help with Identity-H CMAP and Cyrillic text
                          andrejusc Community Member

                          That answer still doesn't help.

                           

                          Let's say I'm using the regular Windows XP SP3 Arial.ttf file. For it I have:

                           

                              sysEncoding = PDSysEncodingCreateFromCMapName(ASAtomFromString(myGlyphData.encodingName));
                              pdeFont = PDEFontCreateFromSysFontAndEncoding(sysFont, sysEncoding,
                                      pdeFontAttrs.name, fontCreateFlags);

                          where I set encodingName to "Identity-H". If I set it to "" or "UniCNS-UTF16-H" instead, I get an error with hex code 20030055, i.e. the encoding (CMap) is missing.

                           

                          Maybe I need to use just PDEFontCreateFromSysFont(sysFont, fontCreateFlags); - I don't know, because I'm trying to use Unicode part of sample code for AddGlyphsSnip.cpp

                           

                          Now for fontCreateFlags I use:

                           

                          fontCreateFlags = kPDEFontCreateToUnicode|kPDEFontCreateEmbedded|kPDEFontWillSubset;

                           

                          but looking at the AddGlyphsSnip.cpp sample from Acrobat SDK 9, I'm not sure why it uses this comparison:

                           

                           

                            if (fontCreateFlags == (kPDEFontCreateEmbedded|kPDEFontWillSubset)) {

                           

                          because it actually should be:

                           

                          if (fontCreateFlags == (kPDEFontCreateToUnicode|kPDEFontCreateEmbedded|kPDEFontWillSubset)) {

                           

                          Am I right with all those calls?

                          I'm not sure what you mean by "Unicode-based data". I'm using the same sample CreateGlyphRun function, and each of my characters is encoded in 2 bytes as wchar_t.

                          • 10. Re: Need some help with Identity-H CMAP and Cyrillic text
                            lrosenth Adobe Employee

                            Yes, PDEFontCreateFromSysFont would be a much better idea - since you just need the font and the encoding will be handled for you as needed.  Picking your own encoding means we can't do smart things about it...

                             

                            Yes, your flags are fine - just use them with that API instead of the way you are doing it.

                            • 11. Re: Need some help with Identity-H CMAP and Cyrillic text
                              andrejusc Community Member

                              That doesn't work, because for the "Cyan" text, which I pass in as a string with 2 bytes per character, I get this (here you can see the extra zeros inserted by Acrobat):

                               

                              /TT2 1 Tf

                              7.003 0 Td

                              (C\000y\000a\000n\000)Tj

                               

                              So Acrobat doesn't understand that I'm actually passing it a Unicode-encoded string and not just an 8-bit string.

                               

                              Ok, to clear things up from what I've learned so far I need either of 2 things:

                              1. Either know how to obtain CID for particular Unicode character for my selected font via some Acrobat SDK call - if you have that API please let me know.

                              2. Or use the same AddGlyphsSnip.cpp code with the regular Windows Arial TTF file, but with something other than "UniCNS-UTF16-H" as the encoding name, because when I use it with the regular Arial.ttf I get error 20030055.

                               

                              Could you help?

                              • 12. Re: Need some help with Identity-H CMAP and Cyrillic text
                                andrejusc Community Member

                                After playing with a third-party TTF reader utility, I can see that my Arial.ttf contains these CMAP tables (below is the output of that utility):

                                 

                                cmap

                                            platform id: 0 (Unicode), encoding id: 3 (Unicode 2.0 and onwards semantics), offset: 28

                                            platform id: 1 (Macintosh), encoding id: 0 (Roman), offset: 2880

                                            platform id: 3 (Microsoft), encoding id: 1 (Unicode), offset: 3142

                                 

                                So my additional question is: how can I make Acrobat SDK code always use that last CMAP table, i.e. Microsoft/Unicode? What encoding name, in Acrobat SDK terms, should be used to select that CMAP? That third-party utility uses code like:

                                 

                                glyphIndex = cmapFormat.mapCharCode(i);

                                 

                                to retrieve the Glyph ID corresponding to the Unicode character supplied to it.

                                 

                                What about Acrobat SDK and C/CPP code?

                                 

                                I've already read all the Adobe Technotes related to CID/CMAP, but I still can't understand what should be done in Acrobat SDK code to output a simple Unicode string via PDEText.
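For reference, the lookup that such a utility performs on a Microsoft/Unicode (platform 3, encoding 1) cmap subtable can be sketched without any Acrobat SDK call. The code below assumes the subtable is stored in format 4 (the usual format for BMP-only tables, but the first two bytes should be checked) and that the caller has already copied the subtable bytes out of the font file. `glyphIdFromFormat4` and `readU16` are illustrative names, not SDK functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a big-endian 16-bit value; out-of-range reads yield 0.
static std::uint16_t readU16(const std::vector<std::uint8_t>& d, std::size_t off) {
    if (off + 1 >= d.size()) return 0;
    return static_cast<std::uint16_t>((d[off] << 8) | d[off + 1]);
}

// Map a BMP character to a glyph ID using a TrueType cmap format 4 subtable.
// `sub` holds the subtable bytes starting at its "format" field.
// Returns 0 (the missing glyph) if the character is not covered.
std::uint16_t glyphIdFromFormat4(const std::vector<std::uint8_t>& sub, std::uint16_t ch) {
    if (readU16(sub, 0) != 4) return 0;             // not a format 4 subtable
    const std::size_t segX2 = readU16(sub, 6);      // segCount * 2
    const std::size_t endCodes = 14;
    const std::size_t startCodes = endCodes + segX2 + 2;  // skip reservedPad
    const std::size_t idDeltas = startCodes + segX2;
    const std::size_t idRangeOffsets = idDeltas + segX2;
    for (std::size_t i = 0; i < segX2; i += 2) {
        if (readU16(sub, endCodes + i) < ch) continue;    // segment ends before ch
        const std::uint16_t start = readU16(sub, startCodes + i);
        if (start > ch) return 0;                         // ch falls in a gap
        const std::uint16_t delta = readU16(sub, idDeltas + i);
        const std::uint16_t rangeOffset = readU16(sub, idRangeOffsets + i);
        if (rangeOffset == 0) {
            return static_cast<std::uint16_t>(ch + delta);  // modulo 65536
        }
        // The offset is relative to the position of this idRangeOffset entry.
        const std::size_t pos = idRangeOffsets + i + rangeOffset + 2u * (ch - start);
        const std::uint16_t gid = readU16(sub, pos);
        return gid == 0 ? 0 : static_cast<std::uint16_t>(gid + delta);
    }
    return 0;
}
```

The function returns 0 when the font has no glyph for the character. Whether the resulting glyph IDs can be handed straight to PDETextAddGlyphs is not something this sketch settles.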

                                • 13. Re: Need some help with Identity-H CMAP and Cyrillic text
                                  lrosenth Adobe Employee

                                  Because Cyan doesn't need 2 bytes - use only a single byte for those.   For your Cyrillic, use the Unicode (UTF-16) code point.

                                  • 14. Re: Need some help with Identity-H CMAP and Cyrillic text
                                    lrosenth Adobe Employee

                                    You don't want to OUTPUT a Unicode string - PDF doesn't do (for the most part) Unicode in content streams.

                                     

                                    Instead the data in the PDF content stream (especially in the case of a subset) is just a set of glyph IDs.

                                    • 15. Re: Need some help with Identity-H CMAP and Cyrillic text
                                      andrejusc Community Member

                                      I need a general Unicode solution, and thus I need even the "Cyan" text to be a 2-bytes-per-character string on the C/C++ side.

                                       

                                      Now I don't understand this phrase of yours:

                                      "use the Unicode (UTF-16) code point"

                                       

                                      What does it (code point) mean? Is it some identifier I need to use somewhere or what? How should I use that actually?

                                      • 16. Re: Need some help with Identity-H CMAP and Cyrillic text
                                        andrejusc Community Member

                                        Well, if it could do the Unicode to CID conversion on the fly for me, that would be great, but so far in all my tests I could only see PDF output like this:

                                         

                                        12 0 0 12 42.5197 0.5669 Tm

                                        <3004>Tj

                                         

                                         

                                        Here it has inserted my Unicode character, which is originally U+0430, with its bytes swapped, and not the corresponding CID, which would then give

                                         

                                        12 0 0 12 16 749.25 Tm

                                        <025A>Tj

                                         

                                        So, how should I tell it to place <025A> (i.e. the CID code) instead of the initially provided <0430> (i.e. Unicode)?

                                         

                                        I assume that for my text to be shown properly - Encoding field in resulting PDF should be in any case Identity-H (but then CID logic is involved either via some code or via some special parameters) and not WinAnsiEncoding.
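The <3004> above looks like U+0430 written out in the byte order a 16-bit wchar_t has in memory on a little-endian machine (low byte first). A 2-byte code in a PDF string is written high byte first, so whichever values end up in the string (CIDs, glyph IDs or Unicode values, depending on the encoding) are best serialised explicitly rather than by copying memory. A minimal sketch, with `toHexString` as a hypothetical helper:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Write 16-bit codes as a PDF hex string, high byte first,
// independent of the byte order of the machine running the code.
std::string toHexString(const std::vector<std::uint16_t>& codes) {
    std::string out = "<";
    char buf[5];  // four hex digits plus the terminating NUL
    for (std::uint16_t code : codes) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(code));
        out += buf;
    }
    out += ">";
    return out;
}
```

With this helper the value 0x025A is written as <025A> and 0x0430 as <0430>, never as <5A02> or <3004>.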

                                         

                                        • 17. Re: Need some help with Identity-H CMAP and Cyrillic text
                                          lrosenth Adobe Employee

                                          CID is generated on the fly - you don't worry about that.  You only need to worry about the Unicode code points (aka Unicode/UTF-16 values).   This is what the AddGlyphs sample does, IIRC.

                                          • 18. Re: Need some help with Identity-H CMAP and Cyrillic text
                                            andrejusc Community Member

                                            Hmm, it's becoming like a never ending story. I have Unicode text, which I want to be added into my PDF using Acrobat SDK.

                                             

                                            Looking at AddGlyphs sample I can't find situation, which suits my needs, because all encoding names used in that sample don't work for non-CJK Unicode string situation, which I have.

                                             

                                            Could you tell me exactly what I need to change in AddGlyphs sample to work with my non-CJK Unicode string and have it normally converted into CIDs? I have no idea after all that long discussion how I could achieve that.

                                             

                                            For kASRomanScript text, AddGlyphs just assumes that a font called "CourierStd" supports the "UniCNS-UTF16-H" encoding name. But that assumption just doesn't work for my Arial.ttf font.

                                             

                                            I've found a method called PDGetPDFDocEncoding in the Acrobat SDK documentation, which could probably shed some light on which Glyph ID actually corresponds to my Unicode character, but there is no sample code for it. And if it returns an array, how do I use that array?

                                             

                                            So, I still have this issue unresolved.

                                             

                                            Maybe you could send me some code offline to my email andrejusc@yahoo.com?

                                            • 19. Re: Need some help with Identity-H CMAP and Cyrillic text
                                              andrejusc Community Member

                                              If I do just one simple change for first 2 bytes of gRGlyphs array inside AddGlyphs code and have this:

                                               

                                              unsigned char gRGlyphs[] = {
                                              0x04, 0x20, 0x00, 0x64, 0x00, 0x64, 0x00, 0x20,
                                              0x00, 0x54, 0x00, 0x65, 0x00, 0x78, 0x00, 0x74
                                              };

                                              Then I end up with an error: 40000003, Incorrect parameter value

                                               

                                              What does it mean? I have no idea how I should prepare that Unicode value properly then. Any advice?

                                              • 20. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                lrosenth Adobe Employee

                                                I am trying to get you a working sample....hang in there...

                                                • 21. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                  andrejusc Community Member

                                                  Great! Hope it will not take you forever to prepare it.

                                                  • 22. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                    andrejusc Community Member

                                                    Any update on this issue?

                                                    • 23. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                      andrejusc Community Member

                                                      Still waiting for any news, and while I'm at it, could you tell me if an error like this:

                                                       

                                                      40000003, Incorrect parameter value

                                                       

                                                      could be expanded into more information by some SDK call? It's really hard to debug when the error description is just this.

                                                      • 24. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                        lrosenth Adobe Employee

                                                        Yes, I know...

                                                         

                                                        Where are you getting this error?

                                                        • 25. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                          andrejusc Community Member

                                                          With my changed gRGlyphs array as specified in some of my previous posts - I get that error during PDETextAddGlyphs call

                                                          • 26. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                            andrejusc Community Member

                                                            I'm marking this question as answered, because it was handled via our separate Developer Support case #181861381. In short: we now know how to achieve our goal, but I initially thought that the Acrobat SDK already had some built-in functionality for various Unicode/code page mappings, and it turned out that it doesn't.

                                                            • 27. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                              MnemosyneD

                                                              Would you care to share how it's done?

                                                              • 28. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                Test Screen Name CommunityMVP

                                                                If Identity-H worked at all it would mean "my strings are CIDs". Not "my strings are Unicode and I want you to turn them into CIDs".

                                                                 

                                                                You need to use an encoding which matches the actual encoding of the strings you plan to show. If there isn't a Unicode encoding, then you can't use Unicode.  This is what I would expect, since there is no such thing as Unicode text support from PDF content stream to font. Find an encoding which accommodates your data, use it, map Unicode into it. It is likely that for broad ranges of Unicode you'll need to partition your text into multiple fonts. Even if they are the same real font. Examine existing PDFs, this happens a lot.

                                                                • 29. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                  MnemosyneD Community Member

                                                                  Thanks for your answer.

                                                                   

                                                                  My problem is finding an encoding that matches my string.

                                                                   

                                                                  Since no CMaps exist (that I know of) for languages apart from CJK, it seems that the solution should be to use a simple font.

                                                                   

                                                                  For a simple font, the only encodings are MacRomanEncoding, WinAnsiEncoding, StandardEncoding and MacExpertEncoding. The solution seems to be to use difference arrays to encode the glyphs that I need, by mapping each Unicode value to a glyph name using the Acrobat Glyph Table.

                                                                   

                                                                  This actually works, but not for all TrueType fonts (which is no surprise, since the PDF Reference mentions that using a difference array with TrueType is not the way to do things).
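The difference-array approach described above can be sketched as follows. This is only an illustration under several assumptions: ASCII characters keep their own codes (as they do in WinAnsiEncoding for the printable range), every other character gets the next free code from 128 upward, and glyph names use the `uniXXXX` form that the Adobe Glyph List specification defines for BMP characters. `SimpleEncoding` and `encodeWithDifferences` are made-up names, and the font still has to contain glyphs that a viewer can find by these names, which is not guaranteed for TrueType fonts.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct SimpleEncoding {
    std::string differences;          // body of the /Differences array
    std::vector<std::uint8_t> bytes;  // single-byte codes for the input text
};

// Assign single-byte codes to the characters of `text` (BMP only, no
// surrogate handling). Returns false if more than 128 distinct non-ASCII
// characters are needed, i.e. the text must be split over several fonts.
bool encodeWithDifferences(const std::u16string& text, SimpleEncoding& out) {
    std::map<char16_t, std::uint8_t> assigned;
    std::string names;
    out.bytes.clear();
    for (char16_t ch : text) {
        if (ch < 0x80) {  // ASCII: keep the base encoding's code
            out.bytes.push_back(static_cast<std::uint8_t>(ch));
            continue;
        }
        auto it = assigned.find(ch);
        if (it == assigned.end()) {
            if (assigned.size() >= 128) return false;
            const std::uint8_t code = static_cast<std::uint8_t>(0x80 + assigned.size());
            char name[16];
            std::snprintf(name, sizeof name, " /uni%04X", static_cast<unsigned>(ch));
            names += name;
            it = assigned.emplace(ch, code).first;
        }
        out.bytes.push_back(it->second);
    }
    out.differences = names.empty() ? "[]" : "[128" + names + "]";
    return true;
}
```

For the text "a" followed by U+0430, U+0431, U+0430 this gives the array [128 /uni0430 /uni0431] and the bytes 0x61 0x80 0x81 0x80. The 128-character limit is the practical reason text covering a wide Unicode range ends up partitioned across several font resources.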

                                                                   

                                                                  For TrueType fonts, non-Roman glyphs must be encoded using a symbol encoding. However, I don't understand how to convert Unicode to the character code. Moreover, I don't see how to define a symbol encoding in the Acrobat SDK.

                                                                   

                                                                  What is even more troubling is that, for both symbolic and non-symbolic encodings, the mapping to the GID is done through subtables. If using a difference array doesn't work for a font, I fail to see how it would work with a non-symbolic encoding.

                                                                  • 30. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                    lrosenth Adobe Employee

                                                                    Identity-H and Identity-V are the best way to represent Unicode in PDF.  However, you don’t just use the CMAP, you also need to be sure to incorporate a ToUnicode table and correctly subset the font data itself.
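To make the ToUnicode part concrete: at the PDF level it is a small CMap stream, referenced from the Type 0 font's /ToUnicode entry, that maps each code used in the content stream back to a Unicode value. The sketch below builds such a stream body for 2-byte codes and BMP characters only. It does not use the Acrobat SDK (which may well generate this itself when kPDEFontCreateToUnicode takes effect), and `buildToUnicodeCMap` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Build the text of a ToUnicode CMap from a map of
// content-stream code (here: glyph ID) -> Unicode BMP value.
std::string buildToUnicodeCMap(const std::map<std::uint16_t, std::uint16_t>& gidToUnicode) {
    std::string out =
        "/CIDInit /ProcSet findresource begin\n"
        "12 dict begin\n"
        "begincmap\n"
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n"
        "/CMapName /Adobe-Identity-UCS def\n"
        "/CMapType 2 def\n"
        "1 begincodespacerange\n"
        "<0000> <FFFF>\n"
        "endcodespacerange\n";
    std::size_t remaining = gidToUnicode.size();
    auto it = gidToUnicode.begin();
    while (remaining > 0) {
        // Emit the entries in blocks of at most 100.
        const std::size_t n = remaining < 100 ? remaining : 100;
        out += std::to_string(n) + " beginbfchar\n";
        for (std::size_t i = 0; i < n; ++i, ++it) {
            char line[16];  // "<XXXX> <XXXX>\n" plus NUL fits in 15 bytes
            std::snprintf(line, sizeof line, "<%04X> <%04X>\n",
                          static_cast<unsigned>(it->first),
                          static_cast<unsigned>(it->second));
            out += line;
        }
        out += "endbfchar\n";
        remaining -= n;
    }
    out +=
        "endcmap\n"
        "CMapName currentdict /CMap defineresource pop\n"
        "end\n"
        "end\n";
    return out;
}
```

Each beginbfchar block is kept to at most 100 entries, which is the limit the CMap format places on a single block.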

                                                                    • 31. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                      MnemosyneD Community Member

                                                                      I like the idea of using Identity-H and Identity-V. This however means that I need to use the GID directly, which I don't have. Is there a way to use the Acrobat SDK to get the GID of a Unicode character for a certain font, or do I have to use an external application (e.g. FreeType) to get it?

                                                                      • 32. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                        lrosenth Adobe Employee

                                                                        There are PDFont APIs that should help.

                                                                         

                                                                        Have you looked at the AddGlyphs sample in the SDK?

                                                                        • 33. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                          Test Screen Name CommunityMVP

                                                                          PDFEdit is a thin layer of abstraction over what actually goes in the PDF. So, for example, the list of valid Encodings is the list that appears in the PDF Reference. But, that said, there do seem to be methods which do more for you, and PDETextAddGlyphs does seem to be one of those methods. Never used it though.

                                                                          • 34. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                            MnemosyneD Community Member

                                                                            I use PDETextAddGlyphs to create Unicode text (is there any other way?). To use this function, we need to create a PDEGlyphRun to define the characters to display, their positions and their link to Unicode.

                                                                             

                                                                            Specifically, my question is:

                                                                             

                                                                            PDEGlyphRun glyphRun;

                                                                            glyphRun.glyphs[i].glyphID = VALUE;

                                                                             

                                                                            How can I calculate VALUE for a certain Unicode character and a font with Identity-H?

                                                                            I believe VALUE with Identity-H is the GID of the Unicode character in the font, but I don't see how I can extract this information with the Acrobat SDK.

                                                                            • 35. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                              MnemosyneD Community Member

                                                                              In AddGlyphSnip.cpp VALUE is the Unicode value, since there is an appropriate CMap set for the language encoded.

                                                                              With Identity-H, we must convert the Unicode value ourselves before setting glyphRun.glyphs[i].glyphID.

                                                                              • 36. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                                Test Screen Name CommunityMVP

                                                                                A side question which may shed light on the approach here.... what if you make a PDF using these characters and this font? What gets into the PDF?

                                                                                 

                                                                                CMap or Encoding? What CMap? Or what Encoding? What code points?

                                                                                • 37. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                                  MnemosyneD Community Member

                                                                                  I get an encoded stream (FlateDecode), so I have no idea what the code points are. I'm pretty sure the code points would have been the GIDs, however.

                                                                                  The font encoding is Identity-H.

                                                                                  • 38. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                                    lrosenth Adobe Employee

                                                                                    I’m confused – are you CREATING the content stream & associated Font resources?   Or are you trying to do something like text extraction on a content stream?  Or other?!?!

                                                                                    • 39. Re: Need some help with Identity-H CMAP and Cyrillic text
                                                                                      MnemosyneD Community Member

                                                                                      No, I am not trying to do text extraction. I am trying to create a PDEText containing, for example, Cyrillic glyphs.

                                                                                       

                                                                                      You said that the Identity-H encoding is what should be used for a non-CJK string. I have found that with Identity-H, the value I should set in glyphRun.glyphs[i].glyphID is the GID of the Unicode character in the chosen font. I don't know how to get this value from the Acrobat SDK.

                                                                                      Could you tell me how? Or is there another way to encode the unicode string?
