10 Replies Latest reply on Mar 19, 2013 7:39 AM by JaredHess

    Publishing is Always Publishing All

    JaredHess Level 1

      Hi there.

       

      RH 9 (latest patch)

      Windows 7-64-bit

      WebHelp output

       

      We have a couple of very large projects (2000+ topics each), and publishing takes a long time because, even though we have Publish All cleared, RH still uploads all the files.

       

      For example, in one of these large projects I changed two topics and added an index entry, then generated and published the output. The generation time isn't too bad and is what I would expect.

       

      But the publish time for changing only two topics and one index entry is ridiculous. It's uploading thousands of files. Here are my results:

       

      Total Files: 2519

      Files Published: 2179

      Elapsed Time: 18:04

       

      How can we improve this? This is happening to me and my two coworkers on our Doc Team. We publish to an internal web server, which we use for documentation reviews by SMEs. We all share the same publishing location, and we all upload our content the same way: we map a drive to the server and publish using the File System option.

       

      We also share projects using the open-source Mercurial source control system.

       

      Is RH getting confused because multiple authors publish to one location? That is, if I publish there and then someone else publishes there, when I publish again the next day, does RH no longer know that I've published there before, and so think it has to upload all the files again?

       

      Thanks in advance!

        • 1. Re: Publishing is Always Publishing All
          Peter Grainge Adobe Community Professional (Moderator)

          Jared

           

          No issue with what you say about the time, but I don't think Rh is sending all the files; it just gives that impression.

           

          You will see it running through all the filenames, but it is checking that the server and local copies are synchronised and only uploading what has changed. However, that is more than the two topics: the index has changed, and that affects a number of support files that also have to be uploaded.

           

          Think of it this way. If you uploaded just what had changed using FTP, it would be quicker. If you had to stop and manually compare the server and local versions of every file, it would take longer, and that is what Rh is doing.
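          As a rough illustration of "compare, then upload only what differs", here is a minimal Python sketch. It is not RoboHelp's code; the function name sync_changed and the folder layout are hypothetical. It shows why a publish log can list every filename even when only a few files are actually copied: every file has to be checked.

```python
import filecmp
import os
import shutil


def sync_changed(local_dir: str, server_dir: str) -> tuple[int, int]:
    """Hypothetical sync: visit every file under local_dir, compare it with
    the copy under server_dir, and copy only files that are missing or
    different. Returns (files_checked, files_copied)."""
    checked = copied = 0
    for root, _dirs, files in os.walk(local_dir):
        target_root = os.path.join(server_dir, os.path.relpath(root, local_dir))
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            checked += 1  # every file is examined...
            # shallow=False forces a byte-by-byte comparison of contents.
            if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
                continue
            os.makedirs(target_root, exist_ok=True)
            shutil.copy2(src, dst)  # ...but only differing files are copied
            copied += 1
    return checked, copied
```

          Run twice in a row against the same folders, the first call copies everything and the second copies nothing, although both calls check every file.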

           

          I'm not sure how source control works here. To the best of my knowledge, the process takes what is there and then works as if it were on your PC; in other words, it updates your copy. I'm not sure whether having multiple authors publish, rather than one person doing it, is good practice. Maybe ask about the workflow in the source control forum.

           


          See www.grainge.org for RoboHelp and Authoring tips

           

           

          @petergrainge

          • 2. Re: Publishing is Always Publishing All
            JaredHess Level 1

            Hi Peter. Thanks for responding, but I'm watching the target directory as I type, and it's clearly uploading all the .htm files. I didn't change these, and they have no relation to the files I did modify. The Date Modified value for the .htm files in the publishing directory is the current date and time, not an older date, as it would be if RH were only doing a compare.

             

            I'll do some more testing.

            • 3. Re: Publishing is Always Publishing All
              Peter Grainge Adobe Community Professional (Moderator)

              Let us know how that goes.

               



              • 4. Re: Publishing is Always Publishing All
                Amebr Level 4

                Perhaps try doing a Get Latest before generating and publishing, to be sure all the files are exactly the same as at the last check-in? There's a .txt file in the publishing directory (at least there is one in a test project I have) that lists MD5 and SHA1 codes, so perhaps these aren't matching for some reason.

                 

                You could also try having only one person doing the publishing for a few days, just to see if that makes a difference.

                 

                Amber

                • 5. Re: Publishing is Always Publishing All
                  JaredHess Level 1

                  Amber thanks for replying. I'm not sure what you mean by MD5 and SHA1 codes. What are those?

                  • 6. Re: Publishing is Always Publishing All
                    Amebr Level 4

                    They're codes that can be generated from a file and are supposed to be unique to that specific file. If the file is changed in any way, the code will no longer match. I assume RH uses these codes to determine whether a file has changed since the last upload. So if you all do a Get Latest before publishing, I'm theorising that the unchanged files will then match the codes on the server (rather than, perhaps, older versions being published from your local drive). But it's only a guess on my part.
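                    To see what "the code will no longer match" means in practice, this short Python sketch uses the standard hashlib module to compute MD5 and SHA-1 digests. The page contents are made up, and the values RH writes to its .txt file may be encoded differently from the hex strings printed here.

```python
import hashlib


def fingerprints(data: bytes) -> tuple[str, str]:
    """Return the (MD5, SHA-1) hex digests of some file contents."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


original = b"<p>Automating PC-DMIS</p>"
edited = b"<p>Automating PC-DMIS.</p>"  # one extra character

print(fingerprints(original))
print(fingerprints(edited))  # both digests are completely different
# Identical contents always produce identical digests.
print(fingerprints(original) == fingerprints(b"<p>Automating PC-DMIS</p>"))  # True
```

                    Because any change, even a single byte, alters both digests, storing them from the last upload is one plausible way a publisher could tell which files need to go up again.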

                     

                    Amber

                    (Ah, I like how Wikipedia describes MD5: "also commonly used to check data integrity".)

                    • 7. Re: Publishing is Always Publishing All
                      Jeff_Coatsworth Adobe Community Professional & MVP

                      I bet RH is only looking at the date stamps to figure out what's changed. Doing hashes on each file would suck up a lot of time/processing.
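                      If RH really does compare date stamps (that is Jeff's guess, not something confirmed in this thread), the check would be cheap but easy to fool. In this hypothetical Python sketch, changed_since never reads file contents, so anything that rewrites every file's timestamp, for example a build step that regenerates all output files, makes every file look changed.

```python
import os
import tempfile


def changed_since(paths: list[str], last_publish: float) -> list[str]:
    """Hypothetical date-stamp check: a file counts as changed if its
    modification time is later than the last publish. Contents are never
    read, so identical files with newer timestamps still count as changed."""
    return [p for p in paths if os.path.getmtime(p) > last_publish]


with tempfile.TemporaryDirectory() as folder:
    pages = []
    for name in ("a.htm", "b.htm", "c.htm"):
        path = os.path.join(folder, name)
        with open(path, "w") as f:
            f.write(name)
        os.utime(path, (1_000, 1_000))  # pretend all were written long ago
        pages.append(path)

    last_publish = 2_000.0
    print(len(changed_since(pages, last_publish)))  # 0

    # Re-touch every file without editing its content.
    for path in pages:
        os.utime(path, (3_000, 3_000))
    print(len(changed_since(pages, last_publish)))  # 3
```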

                      • 8. Re: Publishing is Always Publishing All
                        JaredHess Level 1

                         

                        Okay. I've seen this before. The bsscftp.txt file. There's one for each folder in my project.

                         

                        "Perhaps these aren't matching," you said. What are they supposed to match? How do I interpret these files? It looks like there are three entries (FILENAME, MD5 and SHA-1) for each file in the project.

                         

                        Here are a few:

                         

                        100

                        FILENAME:Assigning_PC-DMIS_Functions_to_buttons_on_the_SpaceMouse_or_SpaceBall.htm    MD5:8466113706989726975102507288529710379995797109686161    SHA-1:541226911911210852100981196582681001225299481005610067821051051007761   

                        FILENAME:Automating_PC_DMIS.htm    MD5:115102103871041118712253122491027452102116971039710357666161    SHA-1:1208343487111511678837711856857449847288845183556748115437261   

                        FILENAME:Available_PC_DMIS_Functions_for_SpaceMouse_or_SpaceBall.htm    MD5:75717310010089101114848773841117797779911978110120686161    SHA-1:1166612079687482821021071001161161085410149105100681137010384118717861  

                        ...

                         

                        And so on.

                         

                        As for getting the latest, I assume you're referring to source control? I do try to pull the latest changes (we're using Mercurial, not RoboSource Control), but I don't know if that makes a difference.

                        • 9. Re: Publishing is Always Publishing All
                          Amebr Level 4

                          Sorry for the delay, I've been away for a couple of weeks.

                           

                          If RH were checking these codes, the process would be something like this: RH calculates the MD5 for a file and looks up the stored value in the .txt file. If the stored value matches the newly calculated one, it doesn't upload the file; if it's different (because the file has changed), it uploads it.
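                          The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not RH's actual logic: the manifest here is a plain dictionary mapping filename to MD5 (a hypothetical stand-in for the .txt file), and plan_upload is a made-up name.

```python
import hashlib


def plan_upload(local_files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Hypothetical manifest check: return the names whose MD5 is missing
    from, or different to, the stored value, and record the new digests
    in the manifest as if those files had just been uploaded."""
    to_upload = []
    for name, data in sorted(local_files.items()):
        digest = hashlib.md5(data).hexdigest()
        if manifest.get(name) != digest:  # new or changed file
            to_upload.append(name)
            manifest[name] = digest
    return to_upload


manifest: dict[str, str] = {}
files = {"a.htm": b"<p>A</p>", "b.htm": b"<p>B</p>"}
print(plan_upload(files, manifest))  # ['a.htm', 'b.htm']  (first publish)
print(plan_upload(files, manifest))  # []  (nothing changed)
files["b.htm"] = b"<p>B, edited</p>"
print(plan_upload(files, manifest))  # ['b.htm']
```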

                           

                          Yes, I meant getting the latest version from source control.

                           

                          My last suggestion is to designate one person to do all the publishing for a few days, to see if that makes a difference. It might not be workable longer term, but it might at least indicate whether a single file is storing a list of "last changed" topics.
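                          Here is one hypothetical way a shared record could cause the "publish all" behaviour with several authors. Assume (this is not confirmed anywhere in the thread) that each author's generated pages differ slightly byte-for-byte, represented below by a machine name embedded in every page, and that everyone's publishes update one manifest on the server. In this Python simulation, alternating publishers re-upload every page, while the same publisher twice in a row uploads only what changed.

```python
import hashlib


def generate(topics: dict[str, str], machine: str) -> dict[str, bytes]:
    """Hypothetical build step: every output page embeds a per-machine
    detail, so two authors' builds of the same topic are not identical."""
    return {
        name: f"<!-- built on {machine} -->{body}".encode()
        for name, body in topics.items()
    }


def publish(pages: dict[str, bytes], manifest: dict[str, str]) -> int:
    """Upload pages whose MD5 differs from the shared manifest; return count."""
    uploaded = 0
    for name, data in pages.items():
        digest = hashlib.md5(data).hexdigest()
        if manifest.get(name) != digest:
            manifest[name] = digest
            uploaded += 1
    return uploaded


topics = {f"topic{i}.htm": f"<p>Topic {i}</p>" for i in range(2000)}
shared: dict[str, str] = {}  # one manifest on the shared server

print(publish(generate(topics, "jared-pc"), shared))  # 2000 (first publish)
topics["topic7.htm"] = "<p>Edited</p>"
print(publish(generate(topics, "coworker-pc"), shared))  # 2000: every page differs
topics["topic8.htm"] = "<p>Edited again</p>"
print(publish(generate(topics, "coworker-pc"), shared))  # 1: same builder as last time
```

                          The machine names and the embedded comment are invented for the simulation; the point is only that a single shared record plus per-author differences would produce the pattern seen in this thread.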

                           

                          Amber

                          • 10. Re: Publishing is Always Publishing All
                            JaredHess Level 1

                            It does make a difference to have just one person publishing. Lately, I've been the one pulling the other authors' changes and publishing when they need it. In that case, the publish step only uploads the newly changed or added topics (along with a bunch of files it creates on the fly to handle search, the TOC, the index, etc., but that's 'normal').