10 Replies Latest reply on Oct 14, 2010 8:13 AM by La_Salamandra

    Looking for a script that get images' urls from a website

    La_Salamandra

      Hi,

       

      I have a website where users can enter their hotel information.

       

      Most of them do not add photos, I guess because the procedure is too long and difficult for the average user.

       

      I'm looking for a script that takes the URL of a website as input and returns the URLs of the images on that site, so the user can decide which ones to upload to his tab.

       

      Can someone help me?

        • 1. Re: Looking for a script that get images' urls from a website
          ilssac Level 5

          <cfhttp....> could be used to return the HTML of a URL.  Note that you will want to set its resolveURL option so that all the relative links in the HTML are made absolute.

           

          reFind() could be used to find all the <img...> tags, or other desired content.

           

          For example, I think this is a pretty good start for the regular expression: <img[^>]*>.  With a little work from regex people more skilled than me, that could be refined to find the src attribute inside that tag.
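
          As a rough, untested-in-the-wild sketch of that refinement: an image's URL lives in its src attribute, and reFindNoCase() with its fourth argument (returnsubexpressions) set to true can pull that value out of each matched tag. The sample HTML string and the variable names below are made up for illustration; in practice the input would be cfhttp.FileContent. Unquoted src values are deliberately not handled.

          ```cfml
          <!--- Sample HTML standing in for cfhttp.FileContent (illustrative only). --->
          <cfset html = '<p><img src="http://www.example.com/a.jpg" alt="A"><IMG alt="B" SRC=''http://www.example.com/b.png''></p>'>

          <!--- Step 1: find every <img ...> tag, ignoring case. --->
          <cfset imgTags = reMatchNoCase("<img[^>]*>", html)>

          <!--- Step 2: capture the quoted src value of each tag. --->
          <cfset imageUrls = []>
          <cfloop array="#imgTags#" index="imgTag">
              <!--- pos[1]/len[1] describe the whole match, pos[2]/len[2] the captured group. --->
              <cfset m = reFindNoCase("\ssrc\s*=\s*[""']([^""']+)[""']", imgTag, 1, true)>
              <cfif m.pos[1] GT 0>
                  <cfset arrayAppend(imageUrls, mid(imgTag, m.pos[2], m.len[2]))>
              </cfif>
          </cfloop>

          <cfdump var="#imageUrls#">
          ```

          The leading \s in the pattern is there so that attributes such as data-src are not mistaken for src.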

           

          Just note that not all images will be in <img...> tags on all web sites; they may be Flash-based, for example.

           

          Also, not all the images you return will be very attractive: spacer GIFs, or a single image divided up into several <img...> tags, for example.
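
          One crude way to weed out the obvious junk, sketched here on the assumption that the page declares width and height attributes (many pages do not): drop any <img> tag whose declared size falls below a threshold. The sample tags, the minSize value, and the variable names are all invented for illustration, and this does nothing about sliced-up images.

          ```cfml
          <!--- Sample tags standing in for the result of reMatchNoCase() (illustrative only). --->
          <cfset imgTags = [
              '<img src="http://www.example.com/spacer.gif" width="1" height="1">',
              '<img src="http://www.example.com/lobby.jpg" width="640" height="480">',
              '<img src="http://www.example.com/pool.jpg">'
          ]>

          <!--- Arbitrary threshold in pixels; tune it to taste. --->
          <cfset minSize = 100>

          <cfset candidates = []>
          <cfloop array="#imgTags#" index="imgTag">
              <cfset keep = true>
              <cfloop list="width,height" index="dim">
                  <!--- Capture the leading digits of the declared dimension, if there is one. --->
                  <cfset m = reFindNoCase("\s#dim#\s*=\s*[""']?(\d+)", imgTag, 1, true)>
                  <cfif m.pos[1] GT 0 AND mid(imgTag, m.pos[2], m.len[2]) LT minSize>
                      <cfset keep = false>
                  </cfif>
              </cfloop>
              <!--- Tags that declare no size are kept, since nothing is known about them. --->
              <cfif keep>
                  <cfset arrayAppend(candidates, imgTag)>
              </cfif>
          </cfloop>

          <cfdump var="#candidates#">
          ```

          With the sample data, the spacer is dropped and the other two tags remain. It is only a heuristic; the final choice still belongs to a human.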

           

          The basic parsing is pretty easy to do; deciding what to do with the extracted data, not so easy.

          • 2. Re: Looking for a script that get images' urls from a website
            La_Salamandra Level 1

            Thanks ilssac,

             

            > The basic parsing is pretty easy to do; deciding what to do with the extracted data, not so easy.

             

            I agree. It's not easy to write a professional parser. That's why I should perhaps point out that I'm asking for a professional script, which doesn't have to be completely free.

            • 3. Re: Looking for a script that get images' urls from a website
              ilssac Level 5

              Define professional?

               

              Something that is going to be smart enough to handle all the ways that images are included in web pages and all the ways images are used in web pages is going to be a complicated, messy, and hard-to-control product.  I have never heard of anybody who has tried to make something like this.

               

              As I said before, parsing the images out of HTML is pretty easy to do; knowing what each image is and how to use it, not so much.  That pretty much requires Google-level engineering and/or human intervention.

              • 4. Re: Looking for a script that get images' urls from a website
                La_Salamandra Level 1

                > Something that is going to be smart enough to handle all the ways

                > that images are included in web pages and all the ways images are

                > used in web pages is going to be a complicated, messy, and

                > hard-to-control product.

                > I have never heard of anybody who has tried to make something like this.

                 

                When you share a link on your profile, Facebook lets you pick one of the images from the page you are sharing to accompany the link. Something like that would be quite good.

                • 5. Re: Looking for a script that get images' urls from a website
                  ilssac Level 5

                  > When you share a link on your profile, Facebook lets you pick one of the images from the page you are sharing to accompany the link. Something like that would be quite good.

                   

                  Yes, but whenever I share one of my personal web site pages, it picks the logo image and only gets half of it, because I've used CSS to composite a background image behind the foreground image.  Facebook only sees the foreground image and, frankly, it looks like crap on the Facebook page background without the other image.  So I always tell it not to use an image in my postings.  Then again, I don't post much, so it's not much of an issue for me.

                   

                  Personally, I find your Facebook example to be a good case for why this is NOT so easy to do.

                   

                  But if you want to do it, I've told you the two commands you would probably use to accomplish the basics.  The devil is in the details, and I am not aware of anybody who is offering to sell or give away a tool that has worked out any of the devilish details.

                  • 6. Re: Looking for a script that get images' urls from a website
                    BKBK Adobe Community Professional & MVP

                    La_Salamandra wrote:

                    > I have a website where users can enter their hotel information.

                    > Most of them do not add photos, I guess because the procedure is too long and difficult for the average user.

                    > I'm looking for a script that takes the URL of a website as input and returns the URLs of the images on that site, so the user can decide which ones to upload to his tab.

                    > Can someone help me?

                     

                    You have to improve your design. Even if you can find the ColdFusion code, your design will still fall short in two ways.

                     

                    First, it is unreliable, because you're depending on some arbitrary site being available and up to speed. Second, it is ethically wrong to collect pictures, especially large numbers of them, dynamically from someone else's site.  Think of their copyright and bandwidth.

                     

                    Fortunately, there are simple solutions. First, identify, by eye, the web pages containing the pictures you're interested in. Ask for permission from the owner.

                     

                    You could indeed use ColdFusion's cfhttp or any other script to download the JPGs, PNGs, and so on. But then, why waste your time re-inventing the wheel? It is infinitely better to use a web crawler!

                     

                    With most crawlers, you only have to supply the URL of the site, and the file extensions it has to grab (in your case, jpg, png, bmp, and so on).

                    One click on the button, and they come reeling in. Automatically. Some crawlers are considerate enough to let you adjust the download bandwidth. (We can learn from a million years of evolutionary wisdom. The vampire bat is known to inject a painkiller before sucking!)

                     

                    Now that you've downloaded the images to your site, the links you display to your users are all yours. You may choose to resize some of the images, display them as you wish, and the issues of reliability and bandwidth are now up to you.

                    • 7. Re: Looking for a script that get images' urls from a website
                      La_Salamandra Level 1

                      Hi BKBK,

                       

                      Users would see photos and images from their own websites, so there would not be any copyright violation.

                       

                      Yes, you're right. What I am looking for is just a crawler: one that takes a URL as input and returns a list of the images contained in that website. Users will decide which ones to upload to their hotel tabs.

                       

                      Do you have in mind a ColdFusion crawler I could use to achieve that?

                      • 8. Re: Looking for a script that get images' urls from a website
                        ilssac Level 5

                        I got this from Google for "ColdFusion web crawler", after I passed up the top couple of sites, which seem to be just search spam returning results only tangentially related to what I wanted.

                         

                        http://ketanjetty.com/coldfusion/useful-code/web-crawler/

                        • 9. Re: Looking for a script that get images' urls from a website
                          BKBK Adobe Community Professional & MVP

                          La_Salamandra wrote:

                           

                           

                          > Users would see photos and images from their own websites, so there would not be any copyright violation.

                          OK. Then the issue of permission is sorted.

                           

                          > What I am looking for is just a crawler: one that takes a URL as input and returns a list of the images contained in that website. Users will decide which ones to upload to their hotel tabs.

                          You should be glad to know that most crawlers will in fact take more than one URL! So you can give such a crawler the URLs of 10 sites, then just sit back and wait.

                           

                          The point I wished to make is that you have to download the images you require to your own site. Organize the downloaded images on your web site, and let the user make a choice from your copies of their images.

                           

                          > Do you have in mind a ColdFusion crawler I could use to achieve that?

                          The crawler doesn't have to be in ColdFusion; it just has to be good. Good enough to download the images.

                           

                          My all-time favourite crawler is HTTrack.  It is open-source and free. It's wonderful software written by folks like you and me who spend their private time serving fellow developers. If you find HTTrack useful, as I did, then please give the developers a donation.

                          • 10. Re: Looking for a script that get images' urls from a website
                            La_Salamandra Level 1

                            Just for posterity ...

                             

                            It was easier than I thought.

                             

                            Here is a little code for a little image parser:

                             

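                            <!--- Site to scan (placeholder address). --->
                            <cfset website = "www.example.com">

                            <!--- Fetch the page. resolveURL="yes" turns relative links, including img src, into absolute URLs. --->
                            <cfhttp
                               url = "#website#"
                               method = "GET"
                               resolveURL = "yes"
                               throwOnError = "yes"
                               redirect = "yes"
                               timeout = "15">
                            </cfhttp>

                            <!--- Collect every <img ...> tag in the returned HTML, ignoring case. --->
                            <cfset images = reMatchNoCase("<img([^>]*[^/]?)>", cfhttp.FileContent)>

                            <!--- Show the array of matched tags. --->
                            <cfdump var="#images#">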

                             

                            With a little more work, what I needed is done.
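
                            One piece of that extra work, sketched under stated assumptions: once a user has picked an image, cfhttp can save it to disk through its path and file attributes. The image URL and the uploads folder below are placeholders, and taking the file name from the last URL segment is an illustrative shortcut (it assumes no query string; real code should validate the name).

                            ```cfml
                            <!--- Hypothetical values: the image the user picked, and an uploads folder next to this template. --->
                            <cfset imageUrl  = "http://www.example.com/images/lobby.jpg">
                            <cfset targetDir = expandPath("./uploads")>

                            <!--- Create the folder on first use. --->
                            <cfif NOT directoryExists(targetDir)>
                                <cfdirectory action="create" directory="#targetDir#">
                            </cfif>

                            <!--- With path and file set, cfhttp saves the response body to that file. --->
                            <cfhttp
                               url = "#imageUrl#"
                               method = "GET"
                               path = "#targetDir#"
                               file = "#listLast(imageUrl, '/')#"
                               throwOnError = "yes"
                               timeout = "15">
                            </cfhttp>
                            ```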

                             

                            Thanks guys.