9 Replies Latest reply: May 13, 2014 5:40 AM by PigPigPig

    How to recover incomplete cmap?

    PigPigPig Community Member

      I printed a Microsoft Word file to PDF with Distiller, with the SakkalMajalla TrueType font embedded. I want to extract Unicode text from the PDF file, but I found that the ToUnicode CMap is missing part of the mapping. For example, CID 06B4 doesn't have any mapping. I guess 06B4 should be mapped to U+0644. SakkalMajalla contains some glyph substitutions, so uni0644.medi (U+FEE0) is replaced by liga.0758.medi.alt1 (U+10354). Why can't Distiller deal with this situation? How can I recover the missing mapping from PDF objects other than ToUnicode? Thanks.

       

      P.S. I also asked this question a couple of days ago; please see "Re: Is it a bug of Distiller?". I haven't received any answers. I don't have the privilege to move or delete that discussion. Sorry for asking the same question in two communities.

       

       

      /GS1 gs

      BT

      /TT1 1 Tf

      24 0 0 24 513.84 764.1203 Tm

      0 g

      0 Tc

      0 Tw

      <0284>Tj

      .495 .5925 TD

      <0551>Tj

      -.1675 -.5925 TD

      <06b4>Tj

      .4 .4225 TD

      <0551>Tj

      -.12 -.4225 TD

      <024f>Tj

      /TT2 1 Tf

      12 0 0 12 506.58 764.1203 Tm

      ( )Tj

      ET
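
      To make it clear which codes the page actually uses, here is a minimal Python sketch that pulls the 2-byte codes out of the Tj operators above. It assumes the content stream has already been decompressed to plain text and that every hex string is shown with Tj (not TJ arrays); the helper name shown_codes is made up for illustration.

```python
import re

# TT1 portion of the content stream, copied from the post.
CONTENT = """
/TT1 1 Tf
24 0 0 24 513.84 764.1203 Tm
<0284>Tj
.495 .5925 TD
<0551>Tj
-.1675 -.5925 TD
<06b4>Tj
.4 .4225 TD
<0551>Tj
-.12 -.4225 TD
<024f>Tj
"""

def shown_codes(stream, width=4):
    """Return the codes of every hex string shown with Tj, in stream order.

    width is the number of hex digits per code (4 for 2-byte codes, an
    assumption that matches the excerpt above).
    """
    codes = []
    for hex_string in re.findall(r"<([0-9A-Fa-f\s]*)>\s*Tj", stream):
        digits = re.sub(r"\s+", "", hex_string).lower()
        codes.extend(digits[i:i + width] for i in range(0, len(digits), width))
    return codes

print(shown_codes(CONTENT))
```

      For the excerpt above this lists 0284, 0551, 06b4, 0551 and 024f, which are the codes that need a ToUnicode entry.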

       

      /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<

      /Registry (JJEELB+TT1+0) /Ordering (T42UV) /Supplement 0 >> def

      /CMapName /JJEELB+TT1+0 def

      /CMapType 2 def

      1 begincodespacerange <024f> <0551> endcodespacerange

      3 beginbfchar

      <024f> <0639>

      <0284> <0649>

      <0551> <064E>

      endbfchar

      endcmap CMapName currentdict /CMap defineresource pop end end
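
      The gap can be checked mechanically. The following is a minimal Python sketch, not a full CMap parser: it reads only the beginbfchar/endbfchar entries (no bfrange handling) and reports which codes used on the page have no Unicode value. The function name parse_bfchar and the hard-coded list of used codes are assumptions for illustration.

```python
import re

# ToUnicode CMap body, copied from the post.
CMAP = """
1 begincodespacerange <024f> <0551> endcodespacerange
3 beginbfchar
<024f> <0639>
<0284> <0649>
<0551> <064E>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Map each source code (lowercase hex) to its Unicode string."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values in a ToUnicode CMap are UTF-16BE.
            mapping[src.lower()] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

to_unicode = parse_bfchar(CMAP)

# Codes shown by the Tj operators in the content stream above.
used = ["0284", "0551", "06b4", "0551", "024f"]
missing = sorted({code for code in used if code not in to_unicode})

print(missing)  # ['06b4']
```

      Only 06b4 is reported as unmapped. Note that 06b4 also lies above the upper bound <0551> of the declared codespacerange.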