You have encoding backwards.
Encoding happens left to right (or in array index order) while decoding happens in reverse.
Yes. I understand.
Assume "a" is a stream
If I first encode "a" with ASCII85 and then with FLATE and use /Filter [/FlateDecode /ASCII85Decode] it works fine.
But if I first encode "a" with FLATE and then with ASCII85 and use /Filter [/ASCII85Decode /FlateDecode] there is a problem.
It seems like you are not reading ISO 32000 (aka the PDF Standard). It is very clear there:
name or array
(Optional) The name, or an array of zero, one or several names, of filter(s) that shall be applied in processing the stream data found between the keywords stream and endstream. Multiple filters shall be specified in the order in which they are to be applied.
So if you encode A with ASCII85 and then Flate, then the correct value would be /Filter [/FlateDecode /ASCII85Decode] and NOT the other way around!
a = deflate(ascii85_encode(a))
WORKS ONLY WITH /Filter [/FlateDecode /ASCII85Decode] and not the other way around
a = ascii85_encode(deflate(a))
DOES NOT WORK WITH /Filter [/ASCII85Decode /FlateDecode] or /Filter [/FlateDecode /ASCII85Decode]
By "works" I mean that the file opens in Acrobat.
This is what is given in ISO 32000
For example, data encoded using LZW and ASCII base-85 encoding (in that order) shall be decoded using the following entry in the stream dictionary:
EXAMPLE 2    /Filter [ /ASCII85Decode /LZWDecode ]
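To make the ordering rule concrete, here is a minimal Python sketch using only the standard library. The helper names (`encode_for_filters`, `decode_with_filters`, `ascii85_encode`, `ascii85_decode`) are made up for illustration and are not from any PDF library. The one assumption baked in is that the /Filter array lists decoders in the order a reader applies them, so a writer has to apply the matching encoders in reverse.

```python
import base64
import zlib

# White-space bytes as the PDF spec defines them: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def ascii85_encode(data: bytes) -> bytes:
    """ASCII85-encode and append the '~>' EOD marker (no '<~' prefix)."""
    return base64.a85encode(data) + b"~>"

def ascii85_decode(data: bytes) -> bytes:
    """Decode ASCII85 data that ends with '~>', ignoring PDF white space."""
    body = data.strip(PDF_WHITESPACE)
    if not body.endswith(b"~>"):
        raise ValueError("ASCII85 data is missing the '~>' EOD marker")
    return base64.a85decode(body[:-2], ignorechars=PDF_WHITESPACE)

ENCODERS = {"/FlateDecode": zlib.compress, "/ASCII85Decode": ascii85_encode}
DECODERS = {"/FlateDecode": zlib.decompress, "/ASCII85Decode": ascii85_decode}

def encode_for_filters(raw: bytes, filters: list) -> bytes:
    """Writer side: apply the encoders in REVERSE of the /Filter array."""
    data = raw
    for name in reversed(filters):
        data = ENCODERS[name](data)
    return data

def decode_with_filters(data: bytes, filters: list) -> bytes:
    """Reader side: apply the decoders in the order the array lists them."""
    for name in filters:
        data = DECODERS[name](data)
    return data

raw = b"BT /F1 12 Tf (Hello) Tj ET\n" * 20
filters = ["/ASCII85Decode", "/FlateDecode"]   # Flate first, then ASCII85
stream = encode_for_filters(raw, filters)
assert decode_with_filters(stream, filters) == raw
```

With this array the bytes written between stream and endstream are plain ASCII85 text ending in ~>, which is easy to check by eye in a text editor.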
This is exactly the way I have used it too.
Could anything else be wrong ?
Do you have a sample of data that was deflated and then ascii85 encoded?
Also I would be interested to see the result of the ascii85 decoding prior to it being read by the flatedecoder.
Because based on what you've stated here, the only thing I can think of is that the output of the ascii85 decoding is not exactly correct. And I only suggest that because you said you wrote the ascii85 decoder yourself.
The above is the link to the following uploaded sample files. Please let me know if you need further samples.
vc_ups_0.pdf - Object 24 not compressed
Remarks - PDF is ok.
vc_ups_1.pdf - Object 24 uses /Filter [/FlateDecode /ASCII85Decode]
Remarks - PDF is ok.
vc_ups_2.pdf - Object 24 uses /Filter [/ASCII85Decode /FlateDecode]
Remarks - PDF is not ok.
vc_ups_3.pdf - Object 24 uses /Filter [/ASCII85Decode]
Remarks - PDF is ok.
vc_ups_4.pdf - Object 24 uses /Filter [/FlateDecode]
Remarks - PDF is ok
The stream in vc_ups_2.pdf - Object 24 appears to be invalid.
Adobe® Portable Document Format
"The ASCII base-85 encoding uses the characters ! through u and the character z, with the 2-character sequence ~> as its EOD marker. The ASCII85Decode filter ignores all white-space characters (see Section 3.1, “Lexical Conventions”). Any other characters, and any character sequences that represent impossible combinations in the ASCII base-85 encoding, cause an error."
You have a bunch of chars in that stream that violate the spec.
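If you want to find those characters without eyeballing a hex dump, a rough checker along these lines may help. It is only a sketch: `find_invalid_ascii85_bytes` and `has_eod_marker` are hypothetical names, and it only checks the allowed character set from the quoted paragraph. It does not detect the "impossible combinations" the spec also mentions, such as a z in the middle of a group.

```python
# White-space bytes as the PDF spec defines them: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def has_eod_marker(data: bytes) -> bool:
    """True if the '~>' end-of-data marker appears anywhere in the data."""
    return b"~>" in data

def find_invalid_ascii85_bytes(data: bytes) -> list:
    """Return (offset, byte value) pairs that are not allowed before '~>'."""
    end = data.find(b"~>")
    body = data if end == -1 else data[:end]
    bad = []
    for offset, byte in enumerate(body):
        is_digit = 0x21 <= byte <= 0x75        # '!' through 'u'
        is_zero_group = byte == 0x7A           # 'z'
        is_whitespace = byte in PDF_WHITESPACE
        if not (is_digit or is_zero_group or is_whitespace):
            bad.append((offset, byte))
    return bad

# Example: two control bytes slipped into otherwise valid ASCII85 text.
sample = b"87c\x01UR\x1fD~>"
for offset, byte in find_invalid_ascii85_bytes(sample):
    print(f"offset {offset}: 0x{byte:02X}")
```

Run it on the raw bytes of object 24 (exactly what sits between stream and endstream) rather than on text copied out of an editor, since an editor may silently change control characters.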
As you can see vc_ups_1.pdf which is first ASCII85 encoded and then Flate encoded works fine.
vc_ups_3.pdf which is only ASCII85 encoded works fine.
vc_ups_4.pdf which is only Flate encoded also works fine.
So it appears that the ASCII85 encoder is working fine.
But the problem starts when the stream is first Flate encoded and then ASCII85 encoded.
So probably Flate encoding inserts some white space characters in the stream.
Can you please let me know what the white space characters are?
Do I simply remove all these white space characters from the Flate encoded stream before the ASCII85 encoding process ?
The invalid chars are in the picture I added to my last post (see all the values within the blue highlighted area that are less than 0x21).
Also, I have decoded object 24 from vc_ups_3.pdf and vc_ups_4.pdf. They are not the same.
If I had to guess, I would say your code is not properly accounting for non-printable characters.
Also your ASCII85 stream in object 24 does not terminate with the proper character sequence of "~>" as it is stated in the spec.
RE: Do I simply remove all these white space characters from the Flate encoded stream before the ASCII85 encoding process ?
Answer: No. If you did that, once you ASCII85Decode the stream, your next step would be to try to FlateDecode that stream which would fail since you previously removed characters from it.
It is OK to have non-printable chars in a Flate encoded stream.
It is NOT OK to have non-printable chars in an ASCII85 encoded stream.
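A quick way to convince yourself of both points is the following Python sketch, which uses only the standard library. The helper names (`count_nonprintable`, `strip_pdf_whitespace`, `flate_survives_stripping`) are mine, purely for illustration. It counts bytes outside the printable range in each form of the data and checks whether a Flate stream still decompresses after white-space bytes have been removed from it.

```python
import base64
import zlib

# White-space bytes as the PDF spec defines them: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def count_nonprintable(data: bytes) -> int:
    """Count bytes outside the printable range 0x21..0x7E."""
    return sum(1 for b in data if b < 0x21 or b > 0x7E)

def strip_pdf_whitespace(data: bytes) -> bytes:
    """Remove every PDF white-space byte (do NOT do this to Flate data)."""
    return bytes(b for b in data if b not in PDF_WHITESPACE)

def flate_survives_stripping(deflated: bytes) -> bool:
    """True only if the stripped data still decompresses to the same bytes."""
    try:
        return zlib.decompress(strip_pdf_whitespace(deflated)) == zlib.decompress(deflated)
    except zlib.error:
        return False

raw = b"BT /F1 12 Tf (Hello, filters) Tj ET\n" * 50
deflated = zlib.compress(raw)                  # binary, any byte value may occur
encoded = base64.a85encode(deflated) + b"~>"   # printable text only

print("non-printable bytes in Flate data:  ", count_nonprintable(deflated))
print("non-printable bytes in ASCII85 data:", count_nonprintable(encoded))
print("Flate data survives stripping:      ", flate_survives_stripping(deflated))
```

The ASCII85 step is what turns the binary Flate output into printable text, so the Flate bytes have to go into the encoder untouched.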
Your ASCII85 code needs to take into account the correct start and end point of the streams. And I mean the EXACT start and end points. See the PDF spec section about streams ending with 0x0D 0x0A (CR LF) vs just 0x0A (LF) or just 0x0D (CR). It needs to take into account the length of each piece and the overall size to properly account for any required padding.
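On the padding point, here is a small hand-rolled encoder in Python, written only to show how the last, short group is handled. It is a sketch, not the code from this thread, and `ascii85_encode` is just an illustrative name. A final group of n bytes (n from 1 to 3) is padded with zero bytes, encoded to 5 characters, and then cut to n + 1 characters. The z shortcut applies only to a full group of four zero bytes.

```python
import base64

def ascii85_encode(data: bytes) -> bytes:
    """Encode bytes as ASCII85 and terminate with the '~>' EOD marker."""
    out = bytearray()
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        n = len(chunk)                                   # 4, or 1..3 for the last group
        value = int.from_bytes(chunk.ljust(4, b"\x00"), "big")
        if n == 4 and value == 0:
            out += b"z"                                  # shortcut for a full zero group
            continue
        digits = bytearray(5)
        for pos in range(4, -1, -1):                     # fill least significant digit last
            value, rem = divmod(value, 85)
            digits[pos] = rem + 0x21                     # map 0..84 onto '!'..'u'
        out += digits[:n + 1]                            # partial group keeps n + 1 chars
    return bytes(out + b"~>")

# Cross-check against the standard library for every possible tail length.
for size in range(0, 10):
    sample = bytes(range(size))
    assert ascii85_encode(sample) == base64.a85encode(sample) + b"~>"
```

If your encoder's output differs from `base64.a85encode` only in the last few characters, the partial-group handling is the first place to look.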
I guess I can't edit my post so here's an update to it:
Object 24 from vc_ups_3.pdf and vc_ups_4.pdf after decoding ARE in fact the same. Sorry about that.
However the ASCII85 encoded stream in vc_ups_3.pdf is missing the terminating '~>'
Everything else stated in my previous post stands.
I have no ASCII85 encoder myself, so I'm coding one up. I'll keep ya posted with my findings.
Thanks for your efforts. Please let me know when you are done with the encoder.
Well, that was an adventure. I actually needed to do this anyway so it was not an issue for me.
If you run across any issues with the code, feel free to drop me an issue on my github page noted in the code.
Keep in mind though that the rules noted in the specification regarding EOL chars before and after the streams should still be respected. This is only for encoding / decoding.
Ensuring you have the correct start and end points remains up to you.
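For the start and end points, a rough Python sketch of the rule as I read the spec: the stream keyword is followed by CR LF or by LF alone (not CR alone), the data is exactly /Length bytes long, and an EOL may sit between the data and endstream without being counted in /Length. The function name `extract_stream_data` is hypothetical, and the sketch assumes /Length is already known as a plain integer and that the first occurrence of "stream" in the object is the keyword.

```python
def extract_stream_data(obj: bytes, length: int) -> bytes:
    """Return exactly `length` bytes of stream data from one object's bytes."""
    start = obj.index(b"stream") + len(b"stream")
    # The keyword must be followed by CR LF or LF; a lone CR is not allowed.
    if obj[start:start + 2] == b"\r\n":
        start += 2
    elif obj[start:start + 1] == b"\n":
        start += 1
    else:
        raise ValueError("'stream' must be followed by CR LF or LF")
    data = obj[start:start + length]
    if len(data) != length:
        raise ValueError("object is shorter than /Length says")
    # An EOL before 'endstream' is allowed and is not part of the data.
    rest = obj[start + length:].lstrip(b"\r\n")
    if not rest.startswith(b"endstream"):
        raise ValueError("/Length does not end right before 'endstream'")
    return data

payload = b"87cUR~>"
obj = b"<< /Length 7 >>\nstream\r\n" + payload + b"\r\nendstream\nendobj"
assert extract_stream_data(obj, 7) == payload
```

Whatever this returns is what should be handed to the first decoder in the /Filter array, with no bytes added or trimmed.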
When I click on the ASCII85 Code link, I get a file not found error.
Can you please send it again ?