
PDF size increases dramatically with every submit.

Aug 17, 2011 7:41 AM

I have a PDF form designed using Adobe LiveCycle Designer ES2.

 

It has a submit button which submits the form to the server (IIS and ASP.NET) using this JavaScript command:

 

 

event.target.submitForm( {cURL: "http://server/ASPNETWebPage.ASPX", aPackets:["datasets","pdf"], cSubmitAs: "XDP"});

 

On the server, from ASP.NET, I use the following code to extract the submitted "chunk" element and convert it from Base64 to Binary PDF File:

 

            ' mXML is assumed to be the XmlDocument loaded from the submitted XDP,
            ' and mFormFileNameFolder the full path of the PDF file to write.
            Using fs As New System.IO.FileStream(mFormFileNameFolder, IO.FileMode.Create)
                Using bw As New System.IO.BinaryWriter(fs)
                    ' Get the chunk element from the submitted XML
                    Dim srChunk As New StringReader(mXML.GetElementsByTagName("chunk")(0).InnerXml)
                    Do While True
                        Dim theChunkLine As String = srChunk.ReadLine()
                        ' Stop at the end of the data (or at the first empty line)
                        If String.IsNullOrEmpty(theChunkLine) Then Exit Do
                        ' Each line is decoded on its own, so its length must be a multiple of 4
                        Dim buffer() As Byte = Convert.FromBase64String(theChunkLine)
                        bw.Write(buffer)
                    Loop
                End Using ' closes the BinaryWriter
            End Using ' closes the FileStream

 

 

The above code works fine, and the PDF is generated successfully.

 

I have one problem.

 

With every submit, the size of the generated PDF increases dramatically. I reported this to Adobe Support, and they confirmed that this is by design: with every submit, the previous PDF state is saved and the new state is added. That is why I get a huge PDF file.

 

I was told that the only way to solve this problem is to submit the form as PDF ONLY, and after I save the PDF File on a file system, I then must use Adobe Service/Process "exportData" to extract the XML Data from the PDF.

 

I think this is a really big change for me. I was hoping that there is a way to identify the latest PDF state from the chunk element.

 

Any help will be greatly appreciated.

 

Tarek.

 
Replies
    Aug 17, 2011 8:22 AM   in reply to tarekahf

    Are you submitting as XDP because you want the data and PDF separately? If not, why not just submit the data and leave the PDF out of it? You can change the cSubmitAs parameter to XML and then you will get data only. Are there signatures involved in this scenario? Do you have LiveCycle Server at the back end?

     

    Paul

     
    Aug 18, 2011 5:00 AM   in reply to tarekahf

    So it is the signatures themselves that are making the size of the PDF grow... there is nothing I can do about that.

     

    Paul

     
    Aug 18, 2011 6:04 AM   in reply to tarekahf

    Then that makes no sense... the file should grow slightly as more data is added, but the signatures will cause a copy of the PDF to be saved so you can compare it to the pre-signature version, and that is generally what causes the PDF size to grow large. Is the file Reader Extended?

     

    Paul

     
    Aug 18, 2011 6:42 AM   in reply to tarekahf

    What version of Reader Extensions are you using? The last state of the PDF should not be changed if you have not signed it yet!

     

    Paul

     
    Aug 18, 2011 7:10 AM   in reply to tarekahf

    No need ......I am not doubting that it is doubling in size ...just trying to figure out why.

     

    There was an issue at one time where Reader Extensions was affecting the size of the PDF but that was in earlier versions than the one that you have.

     

    Have you submitted just the PDF and if so does the size get affected?

     

    Paul

     
    Aug 18, 2011 9:06 PM   in reply to tarekahf

    Did you catch my comment on your recorded video, assuming that you're the same person? The issue that I saw there is that you were including images in the form data, and that the image was a 1.9MB TIFF file. Each image that you include will become part of the saved PDF. You should use an appropriately sized image.

     

    And yes, Lee, Paul and I have all been talking about this issue.

     
    Aug 19, 2011 5:20 AM   in reply to tarekahf

    Going back to our last test, where you submitted just the PDF and saw an insignificant increase in size: this should be no different from the submission as an XDP (except that the XDP will have all of the data as well, so it should be a few K bigger). Chuck has mentioned the use of images... are you including the images in your data stream? Are you also including the template in your data stream? If you are unsure, can you write the inbound XDP file for a couple of submissions to separate files and send me the results? We can have a look at them here and see where the file size is coming from. You can send the files to LiveCycle8@gmail.com

     

    Paul

     
    Aug 19, 2011 7:59 AM   in reply to tarekahf

    Then let's leave it with support and let them get to the root of the issue.

     

    Paul

     
    Aug 24, 2011 1:43 PM   in reply to tarekahf

    I've been in touch with the support person in Edinburgh and looked at your files and your various attempts to make the size smaller. 

    Simply stated, it really is working as designed, but it is difficult to appreciate that unless you go a bit deeper into the file. 

     

    Or, to say this another way, it grows dramatically in size because you have added dramatic amounts of data.

     

    I used your example of this form, with a 1.9MB TIF image embedded as an inch-square thumbnail five times. I picked up most of this information using publicly available tools, such as the document font list (document properties/ fonts), text editors, and Windjack's Canopener.

     

    I'll give you a few metrics and comments which may help:

     

    1. The Base PDF file size is about 1.4MB.  Much of this is because of your embedded fonts which take over 1.1MB
    2. Your form is a reader-extended dynamic XFA form.  That means that the PDF itself does not contain the real pages as PDF marking operators...  It's generated each time you open it in Reader from the XFA form definition and your data.
    3. The image itself is 1.9MB.  But remember that this image is Base64-encoded, so it takes four bytes of XML for every three of image.  That makes the XML data 2.6MB/image.  And I'll note again that that's an incredibly large image to use in a square inch image.
    4. The file you've given us has the image repeated 5 times.  That explains the 14MB file size (2.6*5+1=14).  You can see a snapshot of the XML data and its size in the canopener view for "big".
    5. I presume that you know that PDF files have a versioned structure, where changes to the file add on in incremental change areas.  The file you sent has two areas...  One about 1MB and one 13MB.  You can see these if you open the file in a good text editor and search for %%EOF.  That marker appears at the end of each incremental change.  In other words, the incremental change is all the XML data and there is only one incremental update area.  See section 7.5.6 of the PDF reference manual if you'd like to know more about the incremental update.
    6. You also observed that if you open this file in Acrobat 9.1 and save it, the file shrinks from 14MB to 4MB.  This is due to a feature that Acrobat added where it will compress parts of the XFA data stream.  You can see this in the canopener view for small: it is the exact same uncompressed size, but is reduced 10MB by the flate_compression. So you can thank Acrobat engineering, but it won't help your form submission issue much.
    7. I'll also note that a basic check that I did on your file was to export the form data (tools/ forms/ more form options/ manage form data/ export in Acrobat 10) and saw the same size XML data stream for both of these.
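
    The arithmetic in items 3 and 4 can be sketched in a few lines of JavaScript. This is only a back-of-the-envelope estimate under stated assumptions, not how Acrobat computes anything: the function names are hypothetical, megabytes are treated as decimal, and 1MB is used for everything in the file other than the images. Pure Base64 comes out at about 2.53MB per image and about 13.7MB in total; line breaks and XML markup presumably account for the gap to the 2.6MB and 14MB figures above.

```javascript
// Base64 emits 4 output characters for every 3 input bytes,
// padding the last group up to a full 4 characters.
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

// Rough size of the submitted PDF: everything else, plus each image stored
// as Base64 text. Ignores line breaks, XML markup, and any compression.
function estimateFileSize(baseBytes, imageBytes, imageCount) {
  return baseBytes + imageCount * base64Length(imageBytes);
}

const MB = 1000000;            // decimal megabytes (an assumption for readability)
const imageBytes = 1900000;    // the 1.9MB TIF from the example
const baseBytes = 1 * MB;      // assumed size of the rest of the file

const perImage = base64Length(imageBytes);
const total = estimateFileSize(baseBytes, imageBytes, 5);

console.log((perImage / MB).toFixed(2) + " MB per image as Base64");
console.log((total / MB).toFixed(1) + " MB estimated file size");
```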

     

    You're basically running up against basic laws of space conservation: put a number of big things in a flexible sack, and the sack grows. I'd suggest that you give strong guidance to folks on the size of the image that they use.

     

    PDF can be a bit mysterious if you can't see what's happening.  That's why tools like Canopener are key to shedding daylight on the dark insides.

     

    Finally, I will note that your filesize WILL increase when you add digital signatures.  The size comes when you sign, not when you add the field.  Simply stated, Acrobat (or Reader) will make a pdf marking set of the pages each time that the form is signed... that's the record part of it and it is a new level of incremental change.  So you can expect it to grow as signatures are added.  Again, this is even more reason to use appropriately sized images.

    Aug 24, 2011 4:40 PM   in reply to tarekahf

    I'd like to see one of the files that has grown so much.  Or, better yet, I'd like to see a sequence of files, base, after submit with one image, after the next submit that adds another image, etc., and we can diagnose from there; also, a step where just some of the form data, not pictures, are changed.  But I'd also suggest that you get the 10 day trial pdf canopener from Windjack to inspect the files yourself for the base data AND that you count the number of %%EOF so that you can see the number of incremental updates (sounds like a good use of GREP).  But let's get some scientific numbers on the problem.
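
    As an alternative to GREP, counting the %%EOF markers can be sketched in JavaScript. This is a minimal sketch, not an Adobe tool: the function names are hypothetical, and it simply scans the raw bytes, so a %%EOF that happens to occur inside stream data would be counted too. It also reports the size of each section, which shows where the growth is.

```javascript
// Find the byte offset just past every ASCII "%%EOF" marker in raw PDF bytes.
// Each incremental update normally ends with one marker, so the number of
// offsets approximates the number of saved revisions (original included).
function findEofOffsets(bytes) {
  const marker = [0x25, 0x25, 0x45, 0x4f, 0x46]; // "%%EOF"
  const offsets = [];
  let i = 0;
  while (i + marker.length <= bytes.length) {
    if (marker.every((b, j) => bytes[i + j] === b)) {
      i += marker.length;
      offsets.push(i);
    } else {
      i += 1;
    }
  }
  return offsets;
}

// Size in bytes of each section, measured from the previous marker
// (or the start of the file) to the end of the next marker.
function sectionSizes(bytes) {
  let previous = 0;
  return findEofOffsets(bytes).map((end) => {
    const size = end - previous;
    previous = end;
    return size;
  });
}

// Tiny fake "PDF" with a base section and one update, just to exercise the code.
const sample = new TextEncoder().encode(
  "%PDF-1.7\nbase\n%%EOF\nupdate one\n%%EOF\n"
);
console.log("markers:", findEofOffsets(sample).length);
console.log("section sizes:", sectionSizes(sample));
```

    Under Node.js the bytes of a real file could be obtained with fs.readFileSync(path) and passed in directly.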

     

    Best would be to send the files to support on the existing case number.

     

    And I'd like to take this to a point of conclusion and then even do a brief blog on this topic.  I can only imagine that other people have these same issues.

     

    As for "getting just the last chunk," it really depends on the software you are running on the server.  "Simple" PDF utilities will just always make an incremental update. Richer software, such as Form Data Integration in LC processes, lets you export the data and then import it into a clean form. And there are also tools in LiveCycle, like Assembler, that will consolidate the incremental updates.

     

    But the overall question is "what software are you using to merge the XML data into the form?"  Is it from Adobe or somewhere else?  Your forum posts don't shed any light on this.

     
    Aug 24, 2011 4:57 PM   in reply to Chuck Myers (Adobe)

    I will try to prepare the files you requested, and I will send them all to support.

     

    I am not using any tool to merge the XML Data with PDF. I have developed a .NET Program to merge XML with PDF using XDP format. The result is rendered to the client browser as XDP MIME Type using VB.NET "Response.write()"

     

    Once the PDF is rendered on the client and the user clicks "Submit" or save, the PDF is sent to the ASPX page on the server. There, the "chunk" element is extracted from "Page.InputStream" and converted from Base64 to a binary array, and the result is saved on the server as a PDF file. All this is done by a .NET program running under IIS on Windows 2003 Server.

     

    I will try to use the LiveCycle Assembler service to consolidate the incremental updates, but I have never done that from ASP.NET.

     

    Tarek.

     
    Aug 26, 2011 1:36 PM   in reply to Tarek AHF

    You sent a file to support that shows the problem well.  The signed file had 7 incremental updates, and each update was about 1.3MB. But I noted that the image size varied significantly. Some were GIFs of 3KB, while the TIFs were 360KB (all measured on the base64 data).  I would venture to say that you won't have a dramatic issue like this if the files are 1/100th the size.

     

    I kicked this around with a key form developer (see his blog) and he had a great idea.  You can check the size of the image that the user has attached and give them an error if they have added an image that is too large: that can give them some idea on how to create a thumbnail.  John's words were: "Just look at the length of the imagefield.rawValue – will tell them the size of the base64 image. If it’s too big, clear the field." That may be the most effective way to make the size increase less dramatic.  And it should not change your workflow.
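
    A minimal sketch of that size check, written as plain JavaScript functions. This is not John's actual script: the function names, the field name in the comment, and the 40KB limit are all hypothetical, and the only thing taken from his suggestion is that the length of rawValue reflects the size of the base64 image.

```javascript
// Approximate decoded size of a Base64 string: 3 bytes for every 4 characters,
// ignoring whitespace and subtracting trailing "=" padding.
function decodedSizeFromBase64(base64Text) {
  const compact = base64Text.replace(/\s+/g, "");
  const padding = (compact.match(/=+$/) || [""])[0].length;
  return Math.floor((compact.length * 3) / 4) - padding;
}

// Returns true when the attached image should be rejected.
// maxBytes is a limit chosen by the form author (hypothetical here).
function isImageTooLarge(rawValue, maxBytes) {
  if (!rawValue) return false; // empty field: nothing to check
  return decodedSizeFromBase64(String(rawValue)) > maxBytes;
}

// In a form event script the idea would be along these lines
// (ImageField1 is a hypothetical field name):
//   if (isImageTooLarge(ImageField1.rawValue, 40 * 1024)) {
//     ImageField1.rawValue = null;  // clear the field
//     xfa.host.messageBox("Please attach a smaller image.");
//   }

console.log(isImageTooLarge("A".repeat(400), 100)); // about 300 bytes against a 100-byte limit
```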

     
    Sep 21, 2011 1:19 PM   in reply to Chuck Myers (Adobe)

    The heart of the problem was that large images were being placed in XFA image fields. Due to the design of PDF and incremental updates, copies of these images were being added to the file for each file save.  I'll write more on this later, most likely on the ADEP product blog.  But for now, the solution is to limit the size of the image in the field.  [As background, the image was used for a 1x1 inch thumbnail of a face, which is well-satisfied by a 72 DPI highly compressed JPG of around 20-40K bytes or less.  The images in the file were on the order of megabytes, which caused massive issues.]

     

    John Brinkman did a blog post on how to check the image size and generate an error if it is too large.  You can see this on John's Formfeed blog, and it is quite elegant.

     
    Sep 21, 2011 4:44 PM   in reply to tarekahf

    Thanks John. That Formfeed blog post is a great answer to this question.

    -Jeff

     
    Jan 22, 2013 4:11 PM   in reply to Chuck Myers (Adobe)

    I have a similar problem with the size of a pdf increasing.

    Using canopener did not reveal much at first, but then I used PDFXplorer and found a difference in results.

     

    Catalog

    --- Acroform

    ------ Fields

    ------ XFA

     

    With canopener the Fields object is empty.

    With PDFXplorer the Fields object repeats (a lot!) and contains an XFA object. I am guessing this is where the huge size comes from.

     

    Can anybody advise me as to what the Fields object actually represents and how/when it is populated?

     

    If I save the pdf using Acrobat, the size is heavily reduced.

    Then if I view in PDFXplorer, the Fields object is empty.

     
    Jan 28, 2013 3:02 PM   in reply to tarekahf

    Hi Tarek

     

    Appreciate your comment.

     

    I do not know which specific version of Reader or Acrobat the users are using when this issue occurs. I have asked 1st level support to gather those details next time if possible.

     

    I have raised this with Adobe Enterprise Support, hopefully they can shed some light.

     

    I did find that Fields object is part of the Interactive Form Dictionary.

    Also, following your feedback, there are only 2 instances of %%EOF in the PDF at fault.

     

    Moris

     