-
1. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 6, 2010 8:36 AM (in response to andrejusc)
In my resulting PDF, part of the content stream looks like this:
1 0 0 0 k
/C2_2 1 Tf
7.003 0 Td
<4300790061006E00>Tj
Here the word "Cyan" got encoded.
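To see how such a hex string is read under Identity-H, it can help to split it into 2-byte codes, high byte first, since a 2-byte CMap such as Identity-H takes each pair of bytes as one code. The following is only a sketch in plain C++, independent of the Acrobat SDK; the function name `splitIdentityHCodes` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not an Acrobat SDK call): splits the hex digits of a
// PDF hex string such as "4300790061006E00" into 2-byte codes, high byte
// first, which is how a 2-byte CMap like Identity-H consumes the string.
std::vector<uint16_t> splitIdentityHCodes(const std::string& hex) {
    assert(hex.size() % 4 == 0 && "expects whole 2-byte codes");
    std::vector<uint16_t> codes;
    for (std::size_t i = 0; i < hex.size(); i += 4) {
        // Four hex digits make one 16-bit code.
        codes.push_back(static_cast<uint16_t>(
            std::stoul(hex.substr(i, 4), nullptr, 16)));
    }
    return codes;
}
```

For the string above this gives 0x4300, 0x7900, 0x6100, 0x6E00 rather than 0x0043 ('C'), 0x0079 ('y') and so on, which suggests that the two bytes of each UTF-16 unit were written low byte first.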
-
2. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 6, 2010 11:58 AM (in response to andrejusc)
I would recommend submitting a formal support request to our developer support area, as this is a code-level issue.
-
3. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 6, 2010 12:09 PM (in response to lrosenth)
Well, you've mentioned that several times already in different places on this forum, but you still haven't provided any details on the procedure. How should I do that? Do I need to purchase a special Developer Support case (which is not cheap by any means) just to resolve a missing SDK documentation issue, or become at least a Bronze-level participant in the Solution Partner program, or what?
-
4. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 6, 2010 12:15 PM (in response to andrejusc)
Yes, I believe you need to become a solution partner.
Unfortunately, no one here is able to assist you with this level of problem as it's related to your code (and how it interacts with our SDK).
-
5. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 6, 2010 12:31 PM (in response to lrosenth)
Well, let me disagree with regard to SDK usage.
I think the SDK should provide a very clear code sample showing how to use, for example, the Arial TrueType font for Unicode string output. This is not as exotic as, let's say, Chinese or Japanese text.
If you have such a piece and could share it, that would be enough.
I hate it when a big company (like Adobe in this case) pushes you to pay for something that is actually useless to you instead of preparing high-quality SDK documentation.
I've personally been in the IT business for 15 years, and about 12 of them were spent programming with various IBM Lotus APIs. They are consistent and very well documented, and I don't need to become a Solution Partner at all (well, my company is now an IBM partner at Advanced ISV level, but that is another story), because they give me 3 months of very deep technical support for free if I publish my first technical solution in their Global Solutions catalog. And then for every new one I get 3 more months.
So, Adobe in that case just sucks when it comes to thinking about developers who try to use the Acrobat SDK. If Adobe thinks that ISVs can monetize their plugin solutions a lot, that is just wrong thinking.
It would be great if you could pass this post on to some of your management. I really need a proper reaction to it, not what we have had so far.
-
6. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 6, 2010 12:43 PM (in response to andrejusc)
We have an extended example available to our PDFLibrary licensees and I believe we will incorporate it into future versions of the SDK. Again, you will need to contact support to obtain it.
But just as you think your needs are important, there are SO MANY ways in which people use PDF and we can only write so many samples.
-
7. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 6, 2010 1:22 PM (in response to lrosenth)
Are you saying that if we need to develop a plugin for Acrobat Pro, we need to license PDFLibrary instead of using the freely available Acrobat SDK?
The PDF Library FAQ says:
To develop plug-ins for Acrobat and Reader, please use the Acrobat SDK.
So your response regarding the PDFLibrary license does not apply here.
Again, the Acrobat SDK documentation needs to be enhanced ASAP, and while of course there are SO MANY ways to use PDFs, we can only use those ways that are available through the public Acrobat SDK.
And once more: it's very bad practice to leave something in a publicly available SDK without proper documentation. Yes, I've had a similar bad experience with public IBM APIs, but there, calling the method at least returned a string saying "Not implemented"; with Adobe I think the story is a little different.
So, please tell me how to use the Identity-H encoding name for a CMap with Unicode-encoded text. I don't think it's top secret, and it should not require purchasing a separate Developer Support case or anything else.
-
8. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 6, 2010 2:33 PM (in response to andrejusc)
Identity-H is chosen automatically when you work with Unicode-based data and a CID-based font (or a font that is converted into a CID-based font).
-
9. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 6, 2010 2:50 PM (in response to lrosenth)
That answer still doesn't help.
Let's say I'm using the regular Windows XP SP3 Arial.ttf file. For it I have:
sysEncoding = PDSysEncodingCreateFromCMapName(ASAtomFromString(myGlyphData.encodingName));
pdeFont = PDEFontCreateFromSysFontAndEncoding(sysFont, sysEncoding,
    pdeFontAttrs.name, fontCreateFlags);
where encodingName is set to "Identity-H". If I set it to "" or "UniCNS-UTF16-H" instead, I get an error with hex code 20030055, i.e. the encoding (CMap) is missing.
Maybe I just need to use PDEFontCreateFromSysFont(sysFont, fontCreateFlags); I don't know, because I'm trying to use the Unicode part of the sample code in AddGlyphsSnip.cpp.
Now for fontCreateFlags I use:
fontCreateFlags = kPDEFontCreateToUnicode|kPDEFontCreateEmbedded|kPDEFontWillSubset;
but looking at the AddGlyphsSnip.cpp sample from Acrobat SDK 9, I'm not sure why it uses this comparison:
if (fontCreateFlags == (kPDEFontCreateEmbedded|kPDEFontWillSubset)) {
because it actually should be:
if (fontCreateFlags == (kPDEFontCreateToUnicode|kPDEFontCreateEmbedded|kPDEFontWillSubset)) {
Am I right with all those calls?
I'm not sure what you mean by "Unicode-based data". I'm using the same sample CreateGlyphRun function, and each of my characters is encoded in 2 bytes as wchar_t.
-
10. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 6, 2010 4:32 PM (in response to andrejusc)
Yes, PDEFontCreateFromSysFont would be a much better idea - since you just need the font and the encoding will be handled for you as needed. Picking your own encoding means we can't do smart things about it...
Yes, your flags are fine - just use them with that API instead of the way you are doing it.
-
11. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 4:26 AM (in response to lrosenth)
That doesn't work, because for the "Cyan" text, which I pass in as a 2-bytes-per-character string, I get this (here you can see extra zeros inserted by Acrobat):
/TT2 1 Tf
7.003 0 Td
(C\000y\000a\000n\000)Tj
So Acrobat doesn't understand that I'm actually passing it a Unicode-encoded string and not just an 8-bit string.
OK, to clear things up: from what I've learned so far, I need one of two things:
1. Either know how to obtain the CID for a particular Unicode character in my selected font via some Acrobat SDK call - if you have that API, please let me know.
2. Or use the same AddGlyphsSnip.cpp code with the regular Windows Arial TTF file, but with something other than "UniCNS-UTF16-H" as the encoding name, because when I use it with regular Arial.ttf I get error 20030055.
Could you help?
-
12. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 5:36 AM (in response to andrejusc)
After playing with a third-party TTF reader utility, I can see that my Arial.ttf contains these cmap tables (below is the output of that utility):
cmap
platform id: 0 (Unicode), encoding id: 3 (Unicode 2.0 and onwards semantics), offset: 28
platform id: 1 (Macintosh), encoding id: 0 (Roman), offset: 2880
platform id: 3 (Microsoft), encoding id: 1 (Unicode), offset: 3142
So my additional question would be: how can I make my Acrobat SDK code always use that last cmap table, i.e. Microsoft/Unicode? What encoding name, in Acrobat SDK terms, should be used to select that cmap? That third-party utility uses code like:
glyphIndex = cmapFormat.mapCharCode(i);
to retrieve the glyph ID that corresponds to the Unicode character I supply.
What about the Acrobat SDK and C/C++ code?
I've already read all the related Adobe CID/CMap Technotes, but I still can't understand what should be done in Acrobat SDK code to output a simple Unicode string via PDEText.
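For reference, what the third-party utility does with the Microsoft/Unicode table can be sketched without the Acrobat SDK by reading the font's cmap table directly. The sketch below assumes the raw bytes of the cmap table are already in memory, that the (platform 3, encoding 1) subtable is in format 4, and that only BMP characters are needed; the function names are hypothetical and this is not an Acrobat SDK API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// TrueType tables store multi-byte values high byte first.
static uint16_t be16(const std::vector<uint8_t>& t, std::size_t off) {
    return static_cast<uint16_t>((t.at(off) << 8) | t.at(off + 1));
}

static uint32_t be32(const std::vector<uint8_t>& t, std::size_t off) {
    return (static_cast<uint32_t>(be16(t, off)) << 16) | be16(t, off + 2);
}

// Returns the offset of the (platform 3, encoding 1) subtable within the
// cmap table, or 0 if there is none.
std::size_t findMicrosoftUnicodeSubtable(const std::vector<uint8_t>& cmap) {
    const uint16_t numTables = be16(cmap, 2);
    for (uint16_t i = 0; i < numTables; ++i) {
        const std::size_t rec = 4 + 8u * i;  // 8-byte records after a 4-byte header
        if (be16(cmap, rec) == 3 && be16(cmap, rec + 2) == 1) {
            return be32(cmap, rec + 4);
        }
    }
    return 0;
}

// Looks up a BMP code point in a format 4 subtable at offset `sub`.
// Returns glyph 0 (.notdef) when the code point is not mapped.
uint16_t glyphIdFromFormat4(const std::vector<uint8_t>& cmap,
                            std::size_t sub, uint16_t code) {
    if (be16(cmap, sub) != 4) return 0;  // only format 4 is handled here
    const std::size_t segCount = be16(cmap, sub + 6) / 2;
    const std::size_t endCodes = sub + 14;
    const std::size_t startCodes = endCodes + 2 * segCount + 2;  // skips reservedPad
    const std::size_t idDeltas = startCodes + 2 * segCount;
    const std::size_t idRangeOffsets = idDeltas + 2 * segCount;
    for (std::size_t i = 0; i < segCount; ++i) {
        if (be16(cmap, endCodes + 2 * i) < code) continue;  // segment ends before code
        const uint16_t start = be16(cmap, startCodes + 2 * i);
        if (start > code) return 0;  // code falls in a gap between segments
        const uint16_t delta = be16(cmap, idDeltas + 2 * i);
        const uint16_t rangeOffset = be16(cmap, idRangeOffsets + 2 * i);
        if (rangeOffset == 0) return static_cast<uint16_t>(code + delta);
        // Otherwise the glyph is read from glyphIdArray, addressed relative
        // to the position of this segment's idRangeOffset entry.
        const std::size_t slot = idRangeOffsets + 2 * i + rangeOffset + 2u * (code - start);
        const uint16_t glyph = be16(cmap, slot);
        return glyph == 0 ? 0 : static_cast<uint16_t>(glyph + delta);
    }
    return 0;
}
```

A caller would first locate the subtable with `findMicrosoftUnicodeSubtable` and then call `glyphIdFromFormat4` once per character. Extracting the cmap table from the font file itself (via the table directory) is left out of the sketch.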
-
13. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 7, 2010 7:36 AM (in response to andrejusc)
Because Cyan doesn't need 2 bytes - use only a single byte for those. For your Cyrillic, use the Unicode (UTF-16) code point.
-
14. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 7, 2010 7:37 AM (in response to andrejusc)
You don't want to OUTPUT a Unicode string - PDF doesn't do (for the most part) Unicode in content streams.
Instead the data in the PDF content stream (especially in the case of a subset) is just a set of glyph IDs.
-
15. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 7:43 AM (in response to lrosenth)
I need a general Unicode solution, and thus I need even the "Cyan" text to be a 2-bytes-per-character string on the C/C++ side.
Now I don't understand this phrase of yours:
"use the Unicode (UTF-16) code point"
What does it (code point) mean? Is it some identifier I need to use somewhere or what? How should I use that actually?
-
16. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 7:51 AM (in response to lrosenth)
Well, if it could do Unicode-to-CID conversion on the fly for me, that would be great, but so far in all my tests I can only see PDF output like this:
12 0 0 12 42.5197 0.5669 Tm
<3004>Tj
Here it inserted my Unicode character, which is originally U+0430, with its bytes swapped, rather than the corresponding CID, which should then be
12 0 0 12 16 749.25 Tm
<025A>Tj
So, how do I tell it to place <025A> (i.e. the CID) instead of the initially provided <0430> (i.e. Unicode)?
I assume that for my text to be shown properly, the Encoding field in the resulting PDF should in any case be Identity-H (but then CID logic is involved, either via some code or via some special parameters) and not WinAnsiEncoding.
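The difference between <0430> and <025A> can be sketched as a lookup step followed by big-endian hex output. The sketch below assumes a font written with Identity-H whose CIDs map one-to-one to glyph IDs, and it takes the Unicode-to-glyph table as an input; `hexGlyphString` and `unicodeToGid` are hypothetical names, and the table would have to be built from the font's own cmap. It handles BMP characters only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper (not an Acrobat SDK call): converts UTF-16 code units
// into the hex string operand of Tj, writing the glyph ID of each character
// as four hex digits, high byte first.
std::string hexGlyphString(const std::u16string& text,
                           const std::map<char16_t, uint16_t>& unicodeToGid) {
    std::string out = "<";
    for (char16_t unit : text) {
        const auto it = unicodeToGid.find(unit);
        // Characters missing from the table fall back to glyph 0 (.notdef).
        const uint16_t gid = (it != unicodeToGid.end()) ? it->second : 0;
        char buf[5];
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(gid));
        out += buf;
    }
    return out + ">";
}
```

With a table that maps U+0430 to 0x025A (the value quoted above for this particular Arial.ttf), a one-character string containing U+0430 comes out as <025A>.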
-
17. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 7, 2010 9:10 AM (in response to andrejusc)
CID is generated on the fly - you don't worry about that. You only need to worry about the Unicode code points (aka Unicode/UTF-16 values). This is what the AddGlyphs sample does, IIRC.
-
18. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 9:43 AM (in response to lrosenth)
Hmm, it's becoming a never-ending story. I have Unicode text that I want to add to my PDF using the Acrobat SDK.
Looking at the AddGlyphs sample, I can't find a case that suits my needs, because none of the encoding names used in that sample work for the non-CJK Unicode strings I have.
Could you tell me exactly what I need to change in the AddGlyphs sample to make it work with my non-CJK Unicode string and have it properly converted into CIDs? After all this long discussion I have no idea how to achieve that.
For kASRomanScript text, AddGlyphs just assumes that some font called "CourierStd" supports the "UniCNS-UTF16-H" encoding name. But that assumption doesn't hold for my Arial.ttf font.
In the Acrobat SDK documentation I've found a method called PDGetPDFDocEncoding, which could probably shed some light on which glyph ID actually corresponds to my Unicode character, but there is no sample code for it. And if it returns an array, how do I use that array?
So, I still have this issue unresolved.
Maybe you could send me some code offline to my email andrejusc@yahoo.com?
-
19. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 11:47 AM (in response to andrejusc)
If I make just one simple change to the first 2 bytes of the gRGlyphs array inside the AddGlyphs code and have this:
unsigned char gRGlyphs[] = {
0x04, 0x20, 0x00, 0x64, 0x00, 0x64, 0x00, 0x20,
0x00, 0x54, 0x00, 0x65, 0x00, 0x78, 0x00, 0x74
};
then I end up with an error: 40000003, Incorrect parameter value
What does it mean? I have no idea how I should prepare that Unicode value properly. Any advice?
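The array above stores each UTF-16 unit high byte first (0x00, 0x64 for 'd'). As a sketch only, the same layout can be produced from a string instead of by hand; `toBigEndianBytes` is a hypothetical helper, not part of the Acrobat SDK, and it does not by itself explain the 40000003 error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: lays out UTF-16 code units as bytes, high byte first,
// matching the byte order used in the gRGlyphs array.
std::vector<unsigned char> toBigEndianBytes(const std::u16string& text) {
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit >> 8));    // high byte
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));  // low byte
    }
    return bytes;
}
```

On a little-endian machine this differs from copying a wchar_t buffer byte for byte, which would put the low byte of each unit first.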
-
20. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 7, 2010 12:32 PM (in response to andrejusc)
I am trying to get you a working sample....hang in there...
-
21. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 7, 2010 1:25 PM (in response to lrosenth)
Great! Hope it will not take you forever to prepare it.
-
22. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 8, 2010 6:06 AM (in response to andrejusc)
Any update on this issue?
-
23. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 11, 2010 5:39 AM (in response to lrosenth)
Still waiting for any news, and while we're at it, could you tell me whether an error such as:
40000003, Incorrect parameter value
can be expanded into more detailed information by some SDK call? It's really hard to debug when the error description is just this.
-
24. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Oct 11, 2010 6:42 AM (in response to andrejusc)
Yes, I know...
Where are you getting this error?
-
25. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Oct 11, 2010 6:57 AM (in response to lrosenth)
With my changed gRGlyphs array, as specified in one of my previous posts, I get that error during the PDETextAddGlyphs call.
-
26. Re: Need some help with Identity-H CMAP and Cyrillic text
andrejusc Nov 18, 2010 7:01 AM (in response to lrosenth)
I'm marking this question as answered, because it was handled via our separate Developer Support case #181861381. In short: we now know how to achieve our goal, but I initially thought that the Acrobat SDK already had some built-in functionality for the various Unicode/code page mappings, and it turned out that it does not.
-
27. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD May 29, 2012 11:35 AM (in response to andrejusc)
Would you care to share how it's done?
-
28. Re: Need some help with Identity-H CMAP and Cyrillic text
Test Screen Name Jun 20, 2012 7:26 AM (in response to MnemosyneD)
If Identity-H worked at all it would mean "my strings are CIDs". Not "my strings are Unicode and I want you to turn them into CIDs".
You need to use an encoding which matches the actual encoding of the strings you plan to show. If there isn't a Unicode encoding, then you can't use Unicode. This is what I would expect, since there is no such thing as Unicode text support from PDF content stream to font. Find an encoding which accommodates your data, use it, map Unicode into it. It is likely that for broad ranges of Unicode you'll need to partition your text into multiple fonts. Even if they are the same real font. Examine existing PDFs, this happens a lot.
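The partitioning step mentioned above might look roughly like the following. This is a sketch under simplifying assumptions: text is split only by Unicode block (Latin up to U+00FF, the Cyrillic block, everything else), which is cruder than a real per-encoding check, and the `Script`, `TextRun` and `partitionByScript` names are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Script { Latin, Cyrillic, Other };

// One stretch of text that could be shown with a single font/encoding pair.
struct TextRun {
    Script script;
    std::u16string text;
};

// Simplified classification by Unicode block (an assumption of this sketch).
Script classify(char16_t unit) {
    if (unit <= 0x00FF) return Script::Latin;  // Basic Latin + Latin-1 Supplement
    if (unit >= 0x0400 && unit <= 0x04FF) return Script::Cyrillic;
    return Script::Other;
}

// Splits the text into consecutive runs that share one classification.
std::vector<TextRun> partitionByScript(const std::u16string& text) {
    std::vector<TextRun> runs;
    for (char16_t unit : text) {
        const Script s = classify(unit);
        if (runs.empty() || runs.back().script != s) {
            runs.push_back(TextRun{s, std::u16string()});
        }
        runs.back().text.push_back(unit);
    }
    return runs;
}
```

Each run would then get its own font resource with an encoding that covers it, even when all runs come from the same font file.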
-
29. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD Jun 20, 2012 8:15 AM (in response to Test Screen Name)
Thanks for your answer.
My problem is finding an encoding that matches my string.
Since no CMaps exist (that I know of) for languages apart from CJK, it seems that the solution should be to use a simple font.
For a simple font, the only encodings are: MacRomanEncoding, WinAnsiEncoding, StandardEncoding or MacExpertEncoding. The solution seems to be to use Differences arrays to encode the glyphs that I need, by mapping the Unicode value to the glyph name, using the Acrobat Glyph Table.
This actually works, but not for all TrueType fonts (which is no surprise, since the PDF Reference mentions that using a Differences array with TrueType is not the way to do things).
For TrueType font encodings, non-Roman glyphs must be encoded using a symbol encoding. However, I don't understand how to convert Unicode to the character code. Moreover, I don't see how to define a symbol encoding in the Acrobat SDK.
What is even more troubling is that, for both symbol and non-symbol encodings, the mapping to the GID is done through subtables. If using a Differences array doesn't work for a font, I fail to see how it would work with a non-symbolic encoding.
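For illustration, the Differences-array approach described above could be sketched as below. The sketch uses the generic `uniXXXX` glyph-naming convention instead of looking names up in a glyph list, which is an assumption: whether a particular font resolves those names is not guaranteed, and, as noted, this route is unreliable for TrueType fonts. `differencesArray` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the text of a /Differences entry that assigns
// consecutive single-byte codes, starting at firstCode, to the given BMP
// characters, naming each glyph "uniXXXX".
std::string differencesArray(int firstCode, const std::u16string& chars) {
    std::string out = "/Differences [" + std::to_string(firstCode);
    for (char16_t unit : chars) {
        char name[16];
        std::snprintf(name, sizeof name, " /uni%04X", static_cast<unsigned>(unit));
        out += name;
    }
    return out + "]";
}
```

The character at index i of `chars` is then written in the content stream as the single byte firstCode + i, so a simple font can address at most 256 glyphs this way.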
-
30. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Jun 20, 2012 8:18 AM (in response to MnemosyneD)
Identity-H and Identity-V are the best way to represent Unicode in PDF. However, you don’t just use the CMAP, you also need to be sure to incorporate a ToUnicode table and correctly subset the font data itself.
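As a rough illustration of the ToUnicode part, the sketch below prints only the bfchar section that maps 2-byte codes back to Unicode values. It assumes the codes used in the content stream are glyph IDs, handles BMP characters only, and leaves out the rest of the ToUnicode CMap (header, codespace range, trailer) as well as the 100-entry limit per bfchar block; `toUnicodeBfchar` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: emits one bfchar block with a line per glyph,
// "<code> <unicode>", both written as four hex digits.
std::string toUnicodeBfchar(const std::map<uint16_t, char16_t>& gidToUnicode) {
    std::string out = std::to_string(gidToUnicode.size()) + " beginbfchar\n";
    for (const auto& entry : gidToUnicode) {
        char line[32];
        std::snprintf(line, sizeof line, "<%04X> <%04X>\n",
                      static_cast<unsigned>(entry.first),
                      static_cast<unsigned>(entry.second));
        out += line;
    }
    return out + "endbfchar\n";
}
```

With a single entry mapping glyph 0x025A to U+0430, the block contains the line `<025A> <0430>`, which is what lets a viewer search or copy the Cyrillic letter even though the content stream only holds the glyph ID.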
-
31. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD Jun 20, 2012 8:41 AM (in response to lrosenth)
I like the idea of using Identity-H and Identity-V. This, however, means that I need to use the GID directly, which I don't have. Is there a way to use the Acrobat SDK to get the GID of a Unicode character for a certain font, or do I have to use an external application (e.g. FreeType) to get it?
-
32. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Jun 20, 2012 9:15 AM (in response to MnemosyneD)
There are PDFont APIs that should help.
Have you looked at the AddGlyphs sample in the SDK?
-
33. Re: Need some help with Identity-H CMAP and Cyrillic text
Test Screen Name Jun 20, 2012 9:23 AM (in response to MnemosyneD)
PDFEdit is a thin layer of abstraction over what actually goes in the PDF. So, for example, the list of valid Encodings is the list that appears in the PDF Reference. But, that said, there do seem to be methods which do more for you, and PDETextAddGlyphs does seem to be one of those methods. Never used it though.
-
34. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD Jun 20, 2012 10:03 AM (in response to Test Screen Name)
I use PDETextAddGlyphs to create Unicode text (is there any other way?). To use this function, we need to create a PDEGlyphRun to define the characters to display, their positions and their link to Unicode.
Specifically, my question is:
PDEGlyphRun glyphRun;
glyphRun.glyphs[i].glyphID = VALUE;
How can I calculate VALUE for a certain Unicode character and a font with Identity-H?
I believe VALUE with Identity-H is the GID of the Unicode character in the font, but I don't see how I can extract this information with the Acrobat SDK.
-
35. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD Jun 20, 2012 11:29 AM (in response to lrosenth)
In AddGlyphSnip.cpp VALUE is the Unicode value, since there is an appropriate CMap set for the language encoded.
With Identity-H, we must convert the Unicode value ourselves before setting glyphRun.glyphs[i].glyphID.
-
36. Re: Need some help with Identity-H CMAP and Cyrillic text
Test Screen Name Jun 20, 2012 1:20 PM (in response to MnemosyneD)
A side question which may shed light on the approach here.... what if you make a PDF using these characters and this font? What gets into the PDF?
CMap or Encoding? What CMap? Or what Encoding? What code points?
-
37. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD Jun 21, 2012 6:33 AM (in response to Test Screen Name)
I get an encoded stream (FlateDecode), so I have no idea what the code points are. I'm pretty sure the code points would have been the GIDs, however.
The font encoding is Identity-H.
-
38. Re: Need some help with Identity-H CMAP and Cyrillic text
lrosenth Jun 21, 2012 6:56 AM (in response to MnemosyneD)
I’m confused – are you CREATING the content stream & associated Font resources? Or are you trying to do something like text extraction on a content stream? Or other?!?!
-
39. Re: Need some help with Identity-H CMAP and Cyrillic text
MnemosyneD Jun 21, 2012 7:11 AM (in response to lrosenth)
No, I am not trying to do text extraction. I am trying to create a PDEText containing, for example, Cyrillic glyphs.
You said that Identity-H encoding is what should be used for non-CJK strings. I have found that with Identity-H, the value I should set in
glyphRun.glyphs[i].glyphID is the GID of the Unicode character for the chosen font. I don't know how to get this value from the Acrobat SDK.
Could you tell me how? Or is there another way to encode the Unicode string?



