
Yet another FlateDecode issue

Sep 4, 2011 6:23 AM

Hello, everyone.

 

I have a problem parsing a PDF file. I'm trying to write a PHP parser for PDF files that extracts text AND fonts. Any other data I can extract might also help, but is not absolutely necessary.

 

For the last two weeks I've been searching the net for a fitting solution, but none of the ones I found could extract fonts. So I decided to inflate the streams myself and extract the information I need. Unfortunately, I've only found a few PHP functions, like gzinflate, and for some reason they do not work on MY PDF file. After several attempts I gave up on getting everything at once and tried simply to decode the stream by any means necessary. That failed too: none of the user (developer)-written functions could inflate it. Some of them returned 5-6 64-bit symbols after decoding a stream with a length of over 5000.

 

The most promising solution came from the Adobe forums: "remove first 2 bytes add a header and inflate". That method returned two symbols after inflation.

If I do not add the header, the call to gzinflate fails with an error. I've also tried decoding after removing a different number of bytes from the start, from 0 to 15. All of those attempts failed.
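
For reference, the attempts described above can be sketched roughly as follows. This is only a minimal PHP sketch, assuming the object was read as raw bytes (for example with file_get_contents) and that /Length is a direct integer rather than an indirect reference. The helper name decodeFlateStream is hypothetical. It first tries gzuncompress, which expects the zlib wrapper, and only then falls back to the "remove the first 2 bytes" approach with gzinflate.

```php
<?php
// Hypothetical helper: decode the first FlateDecode stream found in $object.
// Assumes $object holds raw bytes and /Length is a plain integer.
function decodeFlateStream(string $object): ?string
{
    // Read the declared stream length from the dictionary.
    if (!preg_match('~/Length\s+(\d+)~', $object, $len)) {
        return null;
    }
    // Find the "stream" keyword after the dictionary, followed by CRLF or LF.
    if (!preg_match('~>>\s*stream(\r\n|\n)~', $object, $kw, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $start = $kw[0][1] + strlen($kw[0][0]);
    $data  = substr($object, $start, (int) $len[1]);

    // zlib-wrapped data (2-byte header, deflate body, Adler-32 checksum).
    $out = @gzuncompress($data);
    if ($out === false) {
        // Fallback: treat everything after the first 2 bytes as raw deflate.
        $out = @gzinflate(substr($data, 2));
    }
    return $out === false ? null : $out;
}
```

The stream bytes are sliced by /Length instead of being trimmed, so whitespace-like bytes that belong to the compressed data are not removed.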

 

What I know so far:

* my PDF streams use only the FlateDecode filter;

* my PDF file is version 1.4, as stated both in the Adobe Reader info and in the first line of the raw source;

* my data is kept in ANSI format;

* the stream should be trimmed before trying to decode it;

* the gzinflate function decodes raw DEFLATE data (RFC 1951, which includes Huffman coding), so I do not need to add a header, I only have to remove the first 2 bytes.
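
The relation between the 2-byte header and gzinflate can be illustrated with a small PHP sketch. It assumes the streams carry the zlib wrapper (RFC 1950) around the raw DEFLATE body (RFC 1951). The helper looksLikeZlibHeader is hypothetical. If the dumps below were rendered as Windows-1251 (an assumption), their leading characters "xЪ" and "H‰" would correspond to the bytes 0x78 0xDA and 0x48 0x89, and both pairs pass this check.

```php
<?php
// Hypothetical helper: does $data start with a plausible zlib (RFC 1950) header?
function looksLikeZlibHeader(string $data): bool
{
    if (strlen($data) < 2) {
        return false;
    }
    $cmf = ord($data[0]); // compression method and window size
    $flg = ord($data[1]); // flags, including the header check bits
    // Low nibble 8 means "deflate"; CMF*256 + FLG must be a multiple of 31.
    return ($cmf & 0x0F) === 8 && (($cmf << 8) | $flg) % 31 === 0;
}

$plain = str_repeat('FlateDecode ', 20);
$zlib  = gzcompress($plain);      // zlib wrapper: header + deflate + Adler-32
$raw   = substr($zlib, 2, -4);    // deflate body only, header and checksum removed

var_dump(looksLikeZlibHeader($zlib));      // header check on real zlib output
var_dump(gzuncompress($zlib) === $plain);  // decodes the wrapped form directly
var_dump(gzinflate($raw) === $plain);      // decodes the bare deflate body
```

If the first two bytes of a stream fail this check after reading the file in binary mode, the bytes were probably altered before they reached the decoder.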

 

In addition, I came across some mysterious information that FlateDecode is one of two filters that take parameters, but I could not find out what those parameters are or how to call the inflation with them.

 

Finally, I downloaded a sample.pdf file that also uses PDF version 1.4 and FlateDecode alone. By simply removing the first 2 bytes and calling gzinflate, I succeeded in decoding its data stream.

 

So the question is: "wtf?"... My file is NOT damaged, because I'm able to open it with Adobe Reader, and it is not password-protected. Why can't I decode it then?

 

There is also another question. In the first data stream we see:

/I 295/L 255/S 43

after the filter name. What might those parameters be?
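
To look at such keys programmatically, a stream dictionary like this one can be split into name/value pairs. The following PHP sketch uses a hypothetical helper, parseFlatDict, and handles only flat "/Name integer" and "/Name /Name" pairs, not nested dictionaries, arrays or indirect references. As a hedged guess, integer keys such as /S, /I and /L resemble the hint table offsets found in the hint stream of a linearized PDF, but that is an assumption about this file.

```php
<?php
// Hypothetical helper: split a flat PDF dictionary into key => value pairs.
// Only "/Name integer" and "/Name /Name" entries are recognised.
function parseFlatDict(string $dict): array
{
    $out = [];
    preg_match_all('~/([A-Za-z0-9]+)\s*(/[A-Za-z0-9]+|\d+)~', $dict, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $p) {
        // Name values keep their text without the slash; numbers become integers.
        $out[$p[1]] = $p[2][0] === '/' ? substr($p[2], 1) : (int) $p[2];
    }
    return $out;
}

$dict = '<</Length 260/B 271/Filter/FlateDecode/I 295/L 255/S 43>>';
print_r(parseFlatDict($dict));
```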

 

Some info:

 

trailer
<</Size 81/Prev 5999684/Root 11 0 R/Info 9 0 R/ID[<EE951A7F3B62BED917DA9DFF020EE348><4A129B9704454547B9307854F5845526>]>>

 

first data stream:

 

80 0 obj
<</Length 260/B 271/Filter/FlateDecode/I 295/L 255/S 43>>stream
[260 bytes of binary stream data as pasted, beginning with "xЪ"; not reproducible as text]
endstream
endobj

 

 

other sampled stream

 

22 0 obj
<</Length 4568/Filter/FlateDecode>>stream
[4568 bytes of binary stream data as pasted, beginning with "H‰"; not reproducible as text]
endstream
endobj
 