9 Replies Latest reply on Jul 18, 2010 12:15 PM by mtomkins

    Custom command to strip all except specified tags / parameters using javascript or regex?

    mtomkins Level 1

      Hi all

       

      I'm trying to figure out how to create a custom command that will allow me to strip all tags from a page except for those I specify, and will only allow the parameters that I specify for the remaining tags. It seems to me that it should be achievable either with a custom command using javascript, or by simply recording a command and doing a search and replace with regex.

       

      I'm not familiar with javascript, so I've been trying the latter route, but I'm open to help with either method (or for that matter, any other method that will allow me to perform this with just a couple of mouse clicks, and no need to initiate a long sequence of commands manually every time).

       

      The tags I'd like to retain are:

       

      • p
      • br
      • ul
      • ol
      • li
      • a
      • table
      • tr
      • td

       

      Any tags other than the above would need to be stripped. For the tags above, I'd obviously want to retain opening and closing tags, and I'd also want to allow href and target parameters for the a tag. I would want to strip all other parameters from these tags, though.

       

      I got partway into this, in that I managed to write a regex that finds all the tags apart from the above, but I hadn't yet gotten to stripping the unwanted parameters:

       

      </?\w+(?<!p|br|ul|ol|li|a|table|tr|td)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s* )/?>

       

      This validates on the online version of RegExr (http://gskinner.com/RegExr/), and finds all tags apart from the above correctly as far as I can see. However, it doesn't work in Dreamweaver's Find box with Use Regular Expression checked, reporting "Invalid Quantifier".
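A possible reason for the "Invalid Quantifier" error: the pattern above relies on a lookbehind, `(?<!...)`, which JavaScript-style regex engines of that era did not support, and Dreamweaver's Find box appears to use such an engine. A lookahead placed before the tag name does the same job and is widely supported. The sketch below is plain JavaScript and has not been tested inside Dreamweaver; the names `KEEP`, `unwantedTag` and `stripUnwantedTags` are made up for illustration, and the pattern stops at the first `>` it sees, so a `>` inside a quoted attribute value would confuse it.

```javascript
// Tag names to keep, taken from the list in the post.
var KEEP = ["p", "br", "ul", "ol", "li", "a", "table", "tr", "td"];

// After "<" or "</", a negative lookahead rejects any kept name that is
// NOT followed by another name character (so "p" is kept, "pre" is not).
var unwantedTag = new RegExp(
  "<\\/?(?!(?:" + KEEP.join("|") + ")(?![\\w:-]))[a-zA-Z][^>]*>",
  "g"
);

// Removes every tag that is not in KEEP; text between tags is left alone.
function stripUnwantedTags(html) {
  return html.replace(unwantedTag, "");
}
```

Written out as a single pattern for a Find box, this would be `<\/?(?!(?:p|br|ul|ol|li|a|table|tr|td)(?![\w:-]))[a-zA-Z][^>]*>` with an empty Replace field. It only removes unwanted tags and does not touch attributes on the tags that remain.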

       

      ...and this is as far as I've gotten. If anybody can offer help, it'd be much appreciated!

       

      Thanks in advance to you all...

       

      ---

      Mike

        • 1. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
          mtomkins Level 1

          Figured I'd stop by to say, if there's anything I can do to help test / figure this out, or any further information I need to provide, please do let me know.

           

          I should probably note that I'm using Dreamweaver CS5...

          • 2. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
            Nancy OShea Adobe Community Professional & MVP

            Not sure which tags you're trying to strip, but I often use the tag stripping feature in DW's Find & Replace tool.

             

            Ctrl+F / Cmd+F

             

            Current document | Folder | Selected Files | Entire Local Site

             

            Search: Specific Tag      |   font, whatever...

             

            Action: Strip Tag

             

            Depending on what you're trying to accomplish, you may need to run it a few times.

             

             

            Nancy O.
            Alt-Web Design & Publishing
            Web | Graphics | Print | Media  Specialists
            http://alt-web.com/
            http://twitter.com/altweb
            http://alt-web.blogspot.com

            • 3. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
              mtomkins Level 1

              Hi Nancy

               

              Sorry for my tardy reply... I've been rather busy the last few weeks!

               

              Thanks for the reply. Unfortunately this doesn't really help me, because it's not a handful of specific tags I need to strip, but rather all tags *except* specific ones. I'm given content that I'm expected to publish by a variety of sources, in all sorts of formats. (Various Word versions, PDFs, pre-existing web pages with formatting that doesn't match our own, you name it and I get it.) Hence pretty much any tag could show up, and most attributes could need stripping from the few tags I need to remain.

               

              With Find & Replace, I'd be stuck having to go through the code by hand to see what needed replacing, then running a bunch of separate Find & Replace commands that would vary from document to document. Basically, what I do already.

               

              Hence my need to achieve this using Javascript or Regex, in a manner where I can specify only the tags that should remain (and only the attributes that should remain for those tags).
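As a rough illustration of that requirement, here is a minimal sketch in plain JavaScript of a whitelist that names both the tags to keep and the attributes allowed on each. It is not wired to Dreamweaver's extension API, and `ALLOWED`, `TAG`, `ATTR` and `cleanHtml` are hypothetical names. It assumes reasonably well-formed markup, leaves comments and doctypes untouched, keeps the text inside removed tags (including any script or style content), and drops attributes that have no value.

```javascript
// Hypothetical whitelist: tag name -> attribute names allowed on that tag.
var ALLOWED = {
  p: {}, br: {}, ul: {}, ol: {}, li: {},
  a: { href: true, target: true },
  table: {}, tr: {}, td: {}
};

// One tag. Group 1: "/" if closing; group 2: tag name; group 3: attribute text.
var TAG = /<(\/?)([a-zA-Z][\w:-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;

// One attribute: a name, optionally followed by "=" and a value.
var ATTR = /([\w:-]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?/g;

function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function cleanHtml(html, allowed) {
  return html.replace(TAG, function (whole, slash, name, attrs) {
    var tag = name.toLowerCase();
    if (!has(allowed, tag)) {
      return "";                      // not whitelisted: drop the tag itself
    }
    if (slash) {
      return "</" + tag + ">";        // closing tags never carry attributes
    }
    var kept = "";
    attrs.replace(ATTR, function (match, attrName, value) {
      var attr = attrName.toLowerCase();
      if (value && has(allowed[tag], attr)) {
        var first = value.charAt(0);
        if (first !== '"' && first !== "'") {
          value = '"' + value + '"';  // quote values that were unquoted
        }
        kept += " " + attr + "=" + value;
      }
      return match;                   // attrs itself is not modified
    });
    // Preserve an XHTML-style trailing slash such as <br />.
    var selfClose = /(^|[\s"'])\/\s*$/.test(attrs) ? " /" : "";
    return "<" + tag + kept + selfClose + ">";
  });
}
```

Calling `cleanHtml(source, ALLOWED)` on a pasted fragment returns the cleaned string. Because each tag is handled on its own, nesting depth makes no difference, and the whitelist sits in one object that can be edited without touching the rest of the code.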

               

              I do appreciate your trying though, especially since you're the only reply I've had to date.

               

              Any more suggestions, anybody?

              • 4. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
                BCDoherty Level 3

                Sounds like it might be easier to copy and paste your input into your page templates. Use Paste Special without formatting or with limited formatting. This strips out all tags. Then just go through and insert the proper tags. Alternatively, copy the new text into a plain text editor (like Notepad on the PC). I've found this to be pretty efficient, unless the formatting you are trying to preserve is really complex, and from the tags you're allowing, this doesn't seem to be the case.

                 

                Barry

                • 5. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
                  mtomkins Level 1

                  Thanks, Barry. Paste Special is what I'm currently doing, but as you can imagine, it's mighty tedious cleaning up the formatting by hand on every single document, when the formatting I need is all there to start with in a regular paste -- just along with a bunch of extraneous formatting. With as many as a dozen or more documents to do each day, some with lengthy, nested lists and many bits of formatting that I need to keep, it's a bit soul-destroying to have to do it manually.

                   

                  It must be possible with Javascript, but my problem is I'm not a Javascript coder and would need to learn from scratch. It's likely possible with regex too, but my issue there is that Dreamweaver's version of regex seems to be cut down compared with the version used by all of the regex validators I could find online. I came fairly close to having something workable done entirely with regex when I last looked at this a couple of months back, and it validated just fine. (The only remaining quirk was to do with some attributes I didn't need, but that was probably solvable.) Unfortunately, although it validated in multiple online tools, Dreamweaver refused to accept it as valid.

                   

                  I've since found a (very old) Dreamweaver Command extension that partially works, incidentally, and could perhaps be extended to do what I need -- but again, it would require me to understand code I simply don't, as yet.

                   

                  http://www.andrewwooldridge.com/dreamweaver/commands.html

                   

                  The Remove Tags Except command there mostly works, but it seems to have an issue with nested tags that throws up an error message, and it also doesn't have a way for me to specify which attributes to keep, just which tags. Other than that it's close to ideal though -- the only other way it could be improved would be if I could kludge it to simply run straight away with a predetermined exclusions file, rather than making me manually select and load the exclusions every time.

                  • 6. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
                    BCDoherty Level 3

                    I can understand how that can be a tedious process. DW's regex is a strange beast of uncertain ancestry. Takes a little getting used to (and I haven't done so yet).

                     

                    Javascript is not the answer as that is a client side scripting language that doesn't readily take an input file and create a processed output file. You probably would be better off using PHP to process the file....

                     

                    If I have a chance I will take a look at that extension you found.

                     

                    Barry

                    • 7. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
                      MurraySummers Level 8

                      DW's regex is pretty much standard regex.  Have you found that not to be the case?

                       

                      Murray

                      • 8. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
                        BCDoherty Level 3

                        Actually, yes.

                         

                        Coming from an immersion in Linux regex, I definitely found that I had to do some rewriting of the code. The Adobe documentation was also at odds with regular regex, to coin a phrase.

                         

                        Can't remember the exact issues off the top of my head but may have some documentation.

                        • 9. Re: Custom command to strip all except specified tags / parameters using javascript or regex?
                          mtomkins Level 1

                          Murray: Yes, as noted previously, I have found that regex which validates and works elsewhere does not work in Dreamweaver. I don't have a specific example at just this moment, sorry. Seems Barry's had the same.

                           

                          Barry, just wondering if you'd gotten a chance to look at that Dreamweaver command extension I'd linked to?