8 Replies Latest reply on Aug 17, 2009 8:20 AM by Peder280370

    [js/cs4] diff'ing texts

    Peder280370 Level 1

      I would like to use ExtendScript to compare two texts (e.g. from two text frames) and display the differences to the user in a sensible format. I don't care about changes to text formatting, whitespace, and the like.


      My current approach is to:

      1) Fetch the contents from the two text frames.

      2) Compare the contents using a JavaScript diff script I found on the Internet ("jsdiff.js").

      3) Save the result into a local html work file along with a wee bit of html decoration.

      4) Open the file using the execute method, thus displaying the result in a web browser.
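A minimal sketch of step 3, turning a diff result into a small HTML page. The op shape (`type` and `text`) and the function names `escapeHtml` and `renderDiffHtml` are assumptions made for illustration; they are not the output format or API of jsdiff.js.

```javascript
// Hypothetical diff-op shape: { type: "same" | "added" | "removed", text: "..." }.
// This shape is an assumption for the sketch, not what jsdiff.js returns.

// Escape the characters that would otherwise be read as HTML markup.
function escapeHtml(s) {
    return String(s)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Wrap added text in <ins> and removed text in <del>; leave the rest as-is.
function renderDiffHtml(ops) {
    var parts = [];
    for (var i = 0; i < ops.length; i++) {
        var text = escapeHtml(ops[i].text);
        if (ops[i].type === "added") {
            parts.push("<ins>" + text + "</ins>");
        } else if (ops[i].type === "removed") {
            parts.push("<del>" + text + "</del>");
        } else {
            parts.push(text);
        }
    }
    return "<html><body><p>" + parts.join(" ") + "</p></body></html>";
}
```

The returned string is what would be saved to the local work file before opening it with the execute method, as in steps 3 and 4.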


      It works decently, but I would still prefer a more elegant and integrated approach. Could be that someone in this forum has a better approach...

      For instance, are there any InDesign UI or ScriptUI widgets that will render HTML properly (I couldn't spot one)?

      Another approach would have been to generate PDFs with the texts and use BridgeTalk to open them in Acrobat's "Compare documents" window. However, you seem to have very limited control over this function.

      Any suggestions would be appreciated.



      - Peder

        • 1. Re: [js/cs4] diff'ing texts
          Bob Stucky Adobe Employee

          There's nothing in ScriptUI that can do it. You could write an html renderer for InDesign using rules and xml import, so long as your generated html is well formed xml. Handle the styling in "rules" for each html tag. It wouldn't get too ugly unless you tried to support css.


          Likely the easiest way to do it is to put up a UI using Flex. Flex has html rendering features. You'd need to look at PatchPanel in Adobe labs.



          • 2. Re: [js/cs4] diff'ing texts
            [Jongware] Most Valuable Participant

            Wouldn't it be the summum of elegance if you inserted the differences as conditional text in one of the two frames? That way, you could easily hide all differences at once, and on a per-occurrence basis decide whether you want the original or the adjusted part (just remove the condition of the part you want).


            [Looking at the OMV, it seems to be as simple as


            myCondition = app.activeDocument.conditions.add ({name:"Added", indicatorMethod:ConditionIndicatorMethod.USE_HIGHLIGHT, indicatorColor:UIColors.YELLOW});


            at the start of your script -- only once per document -- and


            someTextRange.applyConditions ( [ myCondition ] );


            where you inserted "someTextRange" from one frame to the other.

            Depending on how good the diff script is, you could even use other condition types, such as "Changed", "Deleted".


            Just thinking out loud ]

            • 3. Re: [js/cs4] diff'ing texts
              Bob Stucky Adobe Employee

              Absolutely! Nice thinking!


              If he wants to do HTML, the options are Flex or rendering it in ID. But what you described is, I think, vastly superior.



              • 4. Re: [js/cs4] diff'ing texts
                Peder280370 Level 1

                Thank you both. This is just the kind of input I was looking for.

                - Peder

                • 5. Re: [js/cs4] diff'ing texts

                  I really like the idea of using conditional text for the diff. Something worth testing. I would really appreciate hearing from you when you get it to work (or if not). Cool stuff!





                  • 6. Re: [js/cs4] diff'ing texts
                    Peder280370 Level 1

                    Hi Oscar et al.


                    I got to look at it now, so, I'll fill you in...


                    Firstly, what I actually want to solve is the following problem:

                    A plain text that originates from a database gets inserted somewhere in a Story. After that, two types of changes to the text may occur: The designer may update the placed text (layout changes, formatting, but also actual text changes), or the original plain text may get updated in the database.


                    So, I want to allow the designer to compare the placed text with the current database value, and to be able to manually "merge" changes without losing layout changes, formatting, whitespace, etc.


                    The solution I have implemented (based on Jongware's proposal) is as follows:


                    1) The main script function is called with an InDesign text range (the placed text version) and a string (the DB text version).

                    2) The entire InDesign text range is highlighted with a yellow condition, signalling that it is now in diff mode.

                    3) All diffs added locally (or removed from the DB value) get highlighted with a green condition.

                    4) All diffs removed locally (or added to the DB value) get highlighted with a red condition.

                    5) Whenever the designer right-clicks somewhere in the InDesign text range (yellow condition), two menu items are added to the popup menu:

                       5.1) "End diff mode": Will remove the yellow and green condition mark-up and purge text marked with the red condition.

                       5.2) "Revert diffs": Will remove the yellow and red condition mark-up and purge text marked with the green condition.


                    Between starting diff mode and ending it with one of the two menu calls, the designer can update the marked-up text and, e.g., preserve a red change by removing the red condition.
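The classification in steps 3 and 4 can be sketched as a word-level diff. This is a plain longest-common-subsequence walk written for illustration, not the algorithm used by jsdiff.js or by the posted script; the function name `diffWords` and the op types `"localOnly"` (green) and `"dbOnly"` (red) are hypothetical.

```javascript
// Classify words as "same", "localOnly" (green: present only in the placed
// text) or "dbOnly" (red: present only in the database text).
// Assumption: both inputs are already split into arrays of words.
function diffWords(localWords, dbWords) {
    var n = localWords.length, m = dbWords.length;
    var lcs = [];
    var i, j;

    // lcs[i][j] = length of the longest common subsequence of
    // localWords[i..] and dbWords[j..]; the extra row and column stay 0.
    for (i = 0; i <= n; i++) {
        lcs[i] = [];
        for (j = 0; j <= m; j++) {
            lcs[i][j] = 0;
        }
    }
    for (i = n - 1; i >= 0; i--) {
        for (j = m - 1; j >= 0; j--) {
            if (localWords[i] === dbWords[j]) {
                lcs[i][j] = lcs[i + 1][j + 1] + 1;
            } else {
                lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
    }

    // Walk both sequences from the start, emitting one op per word.
    var ops = [];
    i = 0;
    j = 0;
    while (i < n && j < m) {
        if (localWords[i] === dbWords[j]) {
            ops.push({ type: "same", text: localWords[i] });
            i++;
            j++;
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            ops.push({ type: "localOnly", text: localWords[i] });
            i++;
        } else {
            ops.push({ type: "dbOnly", text: dbWords[j] });
            j++;
        }
    }
    while (i < n) {
        ops.push({ type: "localOnly", text: localWords[i] });
        i++;
    }
    while (j < m) {
        ops.push({ type: "dbOnly", text: dbWords[j] });
        j++;
    }
    return ops;
}
```

The table costs memory proportional to the product of the two word counts, which is fine for short placed texts but would need rethinking for whole stories.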


                    Overall, the solution seems to work fine (it is still at the development stage). I have a few reservations, though:

                    1) To use the diff algorithm, I pretty much rely on Text.words being analogous to Text.contents.split(/\s+/). I'm not sure that this is a universal truth.

                    2) I'm not overly impressed with the speed of some of the operations (e.g. removing conditions from a text range). But alas, that may have more to do with me being pretty new to InDesign scripting...

                    3) The script needs to run in the "session" target engine, because I update the text popup menu. It seems to be the case that you then have to run the script from within InDesign rather than from the ExtendScript Toolkit. (Anyone?)
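On reservation 1, one known difference on the plain-string side is that split(/\s+/) can return empty strings when the text starts or ends with whitespace. The sketch below (the helper name `splitWords` is hypothetical) only normalizes the string side; whether the result then lines up with Text.words in every case is not something it can guarantee.

```javascript
// Split a plain string into words on runs of whitespace, dropping the
// empty entries that split() can produce for leading or trailing whitespace.
function splitWords(contents) {
    var raw = String(contents).split(/\s+/);
    var words = [];
    for (var i = 0; i < raw.length; i++) {
        if (raw[i] !== "") {
            words.push(raw[i]);
        }
    }
    return words;
}
```

Comparing splitWords(contents).length against the length of the words collection before running the diff would be a cheap sanity check that the two views agree for a given text.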


                    Anyway, if you are interested in the gory details, I can post the test script.


                    Best regards,

                    - Peder


                    • 7. Re: [js/cs4] diff'ing texts
                      [Jongware] Most Valuable Participant

                      Hi Peder,


                      You beat me to it :-) I was still trying to figger out how the original js diff actually works. A few preliminary thoughts.


                      1) This runs into problems with multiple spaces and/or tabs. InDesign correctly treats "word space space word" as two words, but a few of my own scripts also barf on that ... Punctuation seems to be no problem.


                      2) ID scripts stall when you do operations on an entire 'collection' of objects. Storing as many objects as possible into variables might help, but for that I'd need to see the entire script.


                      Can you post the script somewhere for other people to look at? If it's more than a couple of hundred lines, give it a .txt extension and attach it to your post -- it'll go through that queue thingy but at least we'll know for sure no lines will be broken at bad places.

                      • 8. Re: [js/cs4] diff'ing texts
                        Peder280370 Level 1

                        Thanks for the input. I have attached the file as .txt as you propose.

                        Just create a new document, add two text boxes labelled "test1" and "test2" and copy some almost-identical text into them. Then run the script (one has to run it from within InDesign for the menus to work properly).


                        Ad 1) I also suspect that there may be different types of whitespace in, say, non-Latin languages, that are treated differently by the two methods. I guess one ought to rewrite the diff algorithm to work directly on InDesign entities, but that is a bit out of the scope of this little pet project.


                        Ad 2) I think you are right, excessive use of property collections seems to slow the procedure down, and I actually try to use variables here and there. For my use, however, the speed is quite acceptable. I had a particular problem with removing a specific condition from a text range whilst preserving other conditions. I see no other options but to scan the text-range character by character...
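One way to cut down the per-character work in Ad 2 is to do the scan once on plain data and group characters into runs that end up with the same remaining conditions, so that conditions are reapplied once per run instead of once per character. The sketch below is an assumption about how that could look, not the attached script: `runsWithoutCondition` is a hypothetical helper, and its input is a plain array holding, for each character, the names of the conditions applied to it.

```javascript
// For each character, drop nameToRemove from its condition names, then merge
// neighbouring characters whose remaining names are identical into one run.
// Returns [{ start, end, conditions }] with inclusive character indexes.
// Assumption: names appear in the same order for every character; otherwise
// sort them before building the key.
function runsWithoutCondition(perCharConditions, nameToRemove) {
    var runs = [];
    var current = null;
    for (var i = 0; i < perCharConditions.length; i++) {
        var kept = [];
        for (var k = 0; k < perCharConditions[i].length; k++) {
            if (perCharConditions[i][k] !== nameToRemove) {
                kept.push(perCharConditions[i][k]);
            }
        }
        // Join with a separator that cannot occur in a condition name.
        var key = kept.join("\u0000");
        if (current !== null && current.key === key) {
            current.end = i;
        } else {
            current = { start: i, end: i, conditions: kept, key: key };
            runs.push(current);
        }
    }
    return runs;
}
```

Each run would then map to a single text range in the document, with its `conditions` list reapplied in one call.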


                        Best regards,

                        - Peder