11 Replies Latest reply on Dec 23, 2011 1:50 AM by tomtomtom

    How to transform/parse an html table

    tomtomtom

      Hi

       

      I have a string containing the code of an html table...

       

      <table>
      <tr>
      <td>Column 1</td>
      <td>Column 2</td>
      <td>Column 3</td>
      </tr>
      <tr>
      <td>Row1Value1</td>
      <td>Row1Value2</td>
      <td>Row1Value3</td>
      </tr>
      ... etc. ...
      </table>

       

      Ideally I would like to have the content of the table in a query (my preference), or else in an XML structure, a CF structure, an array, or whatever works. I tried to do this with XMLParse but could not get it to work, and I would rather not have to learn about DTDs or whatever else it needs.

       

      Is there a simple way for a rookie like me?

       

      Thank you very much!

        • 1. Re: How to transform/parse an html table
          Dan Bracuk Level 5

          Where are the values (Row1Value1 for example) coming from now?

          • 2. Re: How to transform/parse an html table
            tomtomtom Level 1

            Hello Dan

             

            I'm fetching an external web page with cfhttp and then stripping the code down until only the HTML table discussed above is left. I'm not sure whether this idea will work all the way through, but I thought it would at least be a nice start.
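
            A minimal sketch of that fetch-and-cut step, assuming a hypothetical URL and that the first table on the page is the one of interest (neither is stated in this thread):

            ```cfml
            <!--- Hypothetical URL; replace with the real external page --->
            <cfhttp url="http://www.example.com/stats.html" method="get" result="page">
            <cfset html = page.fileContent>

            <!--- Locate the first opening table tag and the closing tag that follows it --->
            <cfset startPos = FindNoCase("<table", html)>
            <cfset endPos = FindNoCase("</table>", html, startPos + 1)>

            <cfif startPos GT 0 AND endPos GT 0>
              <!--- Keep everything from the opening tag through the closing tag --->
              <cfset tableHtml = Mid(html, startPos, endPos - startPos + Len("</table>"))>
            <cfelse>
              <!--- No table found on the page --->
              <cfset tableHtml = "">
            </cfif>
            ```

            The variable names (page, html, tableHtml) are illustrative only. Nested tables would need more careful handling than this.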

             

            Nice to hear from you! I hope you can help me out (as you already did several times in the past; many thanks for that).

            • 3. Re: How to transform/parse an html table
              Dan Bracuk Level 5

              Once you get this data into a variable type that you like, how do you plan to use it?

              • 4. Re: How to transform/parse an html table
                tomtomtom Level 1

                I would then pull out certain data (e.g. everything in column 2) and write it to a text file.

                The best would be to have the entire table data in a query...

                • 5. Re: How to transform/parse an html table
                  Dan Bracuk Level 5

                  If the programming that creates the external web page could be made available as a web service or something, that would be nice.  If not, you could do something like this.

                   

                  Replace all the <tr> tags with a delimiter.

                  Replace all the <td> tags with another delimiter.

                  Replace the <html> tag and all the closing tags with empty strings.

                   

                  You should now have nested lists which should be sufficient for the task at hand.
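
                  The steps above could be sketched roughly as follows. This is only one possible variant: it turns the closing tags (rather than the opening ones) into delimiters, and the choice of ";" and "|" as delimiters is an assumption that only holds if neither character occurs in the data.

                  ```cfml
                  <!--- Sample input; in practice this is the table HTML cut out of the fetched page --->
                  <cfsavecontent variable="tableHtml">
                  <table>
                  <tr><td>Column 1</td><td>Column 2</td><td>Column 3</td></tr>
                  <tr><td>Row1Value1</td><td>Row1Value2</td><td>Row1Value3</td></tr>
                  </table>
                  </cfsavecontent>

                  <!--- End of the last cell in a row becomes the outer (row) delimiter --->
                  <cfset rows = REReplaceNoCase(tableHtml, "</td>\s*</tr>", "|", "all")>
                  <!--- Every remaining cell end becomes the inner (cell) delimiter --->
                  <cfset rows = REReplaceNoCase(rows, "</td>", ";", "all")>
                  <!--- Remove all other tags together with the whitespace around them --->
                  <cfset rows = REReplace(rows, "\s*<[^>]*>\s*", "", "all")>

                  <cfoutput>#rows#</cfoutput>
                  ```

                  For the sample table this should leave a nested list of the form Column 1;Column 2;Column 3|Row1Value1;Row1Value2;Row1Value3| that the list functions can work with.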

                  • 6. Re: How to transform/parse an html table
                    tomtomtom Level 1

                    Dear Dan

                     

                    I have now reached the point where my string contains the following:

                     

                     

                       new,

                       8,

                       7.69$$

                     

                       time,

                       7,

                       6.73$$

                     

                       happened,

                       6,

                       5.77$$

                     

                       one,

                       5,

                       4.81$$

                     

                       patient,

                       5,

                       4.81$$

                     

                       planing,

                       5,

                       4.81$$

                     

                       cheat,

                       4,

                       3.85$$

                     

                       apple,

                       4,

                       3.85$$

                     

                    I used $$ to replace the </tr> after the last cell of each row, and put a comma at the end of every other cell value. You mentioned nested lists. Can you give me a hint, or even better an example, of how to put this into a query? That would be fantastic...

                    • 7. Re: How to transform/parse an html table
                      Dan Bracuk Level 5

                      First, don't use $$ as a delimiter.  The list functions treat each character of a delimiter string as a separate single-character delimiter.  Next, using commas is dangerous because there might be commas in the data.

                       

                      With the nested lists, you should be able to create your text file.  In fact, if you use chr(10) as your outer delimiter and chr(9) as the inner, you might even be able to use the result as a simple string and not have to loop through your lists.  That of course is based on the assumption that you want a tab-delimited file.
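
                      A small sketch of that idea, assuming the rows are currently separated by "|" and the cells by ";" (both delimiters, the sample data, and the file name are placeholders for illustration):

                      ```cfml
                      <!--- Hypothetical input: rows separated by "|", cells by ";" --->
                      <cfset rows = "new;8;7.69|time;7;6.73|happened;6;5.77">

                      <!--- Line feed as the outer delimiter, tab as the inner one --->
                      <cfset tabbed = Replace(rows, "|", chr(10), "all")>
                      <cfset tabbed = Replace(tabbed, ";", chr(9), "all")>

                      <!--- Write the string as it is; the file name is an assumption --->
                      <cffile action="write" file="#ExpandPath('./words.txt')#" output="#tabbed#">
                      ```

                      No looping is needed here because the string itself already is the tab-delimited file content.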

                      • 8. Re: How to transform/parse an html table
                        tomtomtom Level 1

                        Dear Dan

                         

                        Thank you very much for your input. So far I have changed my string to:

                         

                         

                           dies;

                           1;

                           1.88|

                         

                           untersuchung;

                           6;

                           2.88|

                         

                           nicht;

                           3;

                           2.45|

                         

                           info;

                           7;

                           1.33|

                         

                        The text file I was talking about is actually the very end of my story. It would be even more helpful if I could start by creating a query from the string above. Most probably I have to loop through the lists, but I'm stuck well before that:

                        How do I create n lists of 3 values each (word;number;number with two decimal places) from the string above?

                        How can I count the sets of data in the string above (there are 4 in my example, but the number varies)?

                         

                        I think that as soon as these two questions are answered I will be able to do what I need.

                         

                        Thanks for your continued patience...

                        • 9. Re: How to transform/parse an html table
                          tomtomtom Level 1

                          Meanwhile I have even managed to remove all tabs, carriage returns and line feeds. My string now looks like this:

                           

                          fuehren;4;2.92|meglich;2;1.72|werden;2;1.23|diese;1;4.56|

                          • 10. Re: How to transform/parse an html table
                            Dan Bracuk Level 5

                            From here, all you need to do to get the 3 values (word;number;number) is to treat the string as a | delimited list. Each element of that list is then itself a ; delimited list of the 3 values.
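
                            As a rough sketch of both points (counting the sets and splitting each one), using the sample string from the previous post; the variable names are illustrative:

                            ```cfml
                            <cfset myRow = "fuehren;4;2.92|meglich;2;1.72|werden;2;1.23|diese;1;4.56|">

                            <!--- Count the data sets; by default the list functions ignore the
                                  empty element after the trailing "|", so this gives 4 --->
                            <cfset rowCount = ListLen(myRow, "|")>
                            <cfoutput>#rowCount# sets<br></cfoutput>

                            <!--- Outer list: one element per row --->
                            <cfloop list="#myRow#" delimiters="|" index="row">
                              <!--- Inner list: the three values of that row --->
                              <cfset word = ListGetAt(row, 1, ";")>
                              <cfset occurrences = ListGetAt(row, 2, ";")>
                              <cfset percentage = ListGetAt(row, 3, ";")>
                              <cfoutput>#word# / #occurrences# / #percentage#<br></cfoutput>
                            </cfloop>
                            ```

                            Because ListLen works on whatever string it is given, the count adjusts automatically when the number of sets changes.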

                            • 11. Re: How to transform/parse an html table
                              tomtomtom Level 1

                              Thank you. With your input I was finally able to solve my problem like this:

                              <!--- Empty query with one column per cell of a row --->
                              <cfset myQuery = QueryNew("fruit,quantity,quality")>

                              <!--- myRow is presumably the | delimited string shown above; each list element is one row --->
                              <cfloop index="ListElement" list="#myRow#" delimiters="|">
                                <cfset temp = QueryAddRow(myQuery)>
                                <!--- Split the row on ";" and store each trimmed value in its column --->
                                <cfset temp = QuerySetCell(myQuery, "fruit", Trim(ListGetAt(ListElement, 1, ';', 'yes')))>
                                <cfset temp = QuerySetCell(myQuery, "quantity", Trim(ListGetAt(ListElement, 2, ';', 'yes')))>
                                <cfset temp = QuerySetCell(myQuery, "quality", Trim(ListGetAt(ListElement, 3, ';', 'yes')))>
                              </cfloop>
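
                              For the last step mentioned earlier in the thread (writing everything in column 2 to a text file), a possible sketch is shown here. It rebuilds a small stand-in query so that it runs on its own; the sample data and the file name are assumptions.

                              ```cfml
                              <!--- Small stand-in for the query built above --->
                              <cfset myQuery = QueryNew("fruit,quantity,quality")>
                              <cfloop list="fuehren;4;2.92|meglich;2;1.72" delimiters="|" index="row">
                                <cfset QueryAddRow(myQuery)>
                                <cfset QuerySetCell(myQuery, "fruit", ListGetAt(row, 1, ";"))>
                                <cfset QuerySetCell(myQuery, "quantity", ListGetAt(row, 2, ";"))>
                                <cfset QuerySetCell(myQuery, "quality", ListGetAt(row, 3, ";"))>
                              </cfloop>

                              <!--- All values of column 2 as one string, one value per line --->
                              <cfset columnTwo = ValueList(myQuery.quantity, chr(10))>

                              <!--- Hypothetical output file next to the current template --->
                              <cffile action="write" file="#ExpandPath('./column2.txt')#" output="#columnTwo#">
                              ```

                              ValueList takes a single query column and an optional delimiter, so any other column can be exported the same way by changing the column name.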