Hi,
I have a website where users can enter their hotel information.
Most of them do not add photos, I guess because the procedure is quite long and difficult for the average user's skills.
I'm looking for a script that takes the URL of a website as input and returns the URLs of the images on that site, so the user can decide which ones to upload to his tab.
Can someone help me?
<cfhttp....> could be used to return the HTML of a URL. Note that you will want to use the resolveURL option to make all the relative links in the HTML absolute.
reFind() could be used to find all the <img...> tags, or other desired content.
For example, I think this is a pretty good start for the regular expression: <img[^>]*>. With a little work from regex people more skilled than me, that could be refined to find the src attribute inside that tag.
Just note that not all images will be in <img...> tags on all websites; they may be Flash-based, for example.
Also, not all the images you return will be very attractive: spacer GIFs, or a single image divided up into several <img...> tags, for example.
The basic parsing is pretty easy to do; deciding what to do with the extracted data, not so easy.
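As a rough illustration of the refinement mentioned above, the sketch below first collects the <img...> tags and then cuts the quoted src value out of each one. The function name extractImageUrls and the sample HTML are made up for this example, and the pattern only handles src values wrapped in single or double quotes, so treat it as a starting point rather than a complete parser.

```cfml
<cfscript>
// Hypothetical helper, not from the thread: returns the src value of every <img> tag.
function extractImageUrls(html) {
    var urls = arrayNew(1);
    // Step 1: collect the raw <img ...> tags.
    var tags = reMatchNoCase("<img[^>]*>", html);
    var m = "";
    var i = 0;
    for (i = 1; i lte arrayLen(tags); i = i + 1) {
        // Step 2: look for a quoted src attribute and ask for subexpression positions.
        m = reFindNoCase("\ssrc\s*=\s*[""']([^""']+)[""']", tags[i], 1, true);
        if (m.pos[1] gt 0) {
            // Step 3: pos[2] and len[2] describe the text between the quotes.
            arrayAppend(urls, mid(tags[i], m.pos[2], m.len[2]));
        }
    }
    return urls;
}

// Assumed sample input for illustration only.
sample = '<p><img src="http://www.example.com/a.jpg" alt="A"><IMG class="x" SRC=''/b.png''></p>';
writeOutput(arrayToList(extractImageUrls(sample)));
// writes: http://www.example.com/a.jpg,/b.png
</cfscript>
```

When the HTML comes from cfhttp with resolveURL enabled, relative values such as /b.png should already have been made absolute before this step.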
Thanks ilssac,
> The basic parsing is pretty easy to do; deciding what to do with the extracted data, not so easy.
I agree. It's not easy to program a professional parser. That's why I should probably stress that I'm asking for a professional script, even one that is not completely free.
Define professional?
Something that is going to be smart enough to handle all the ways that images are included in web pages, and all the ways images are used in web pages, is going to be a complicated, messy, and hard-to-control product. I have never heard of anybody who has tried to make something like this.
As I said before, parsing the images out of HTML is pretty easy to do; knowing what an image is and how to use it, not so much. That pretty much requires Google-level engineering and/or human intervention.
> Something that is going to be smart enough to handle all the ways
> that images are included in web pages, and all the ways images are
> used in web pages, is going to be a complicated, messy, and
> hard-to-control product.
> I have never heard of anybody who has tried to make something like this.
When you share a link on your profile, Facebook lets you pick one of the images from the page you are sharing to accompany the link. Something like that would be quite good.
> When you share a link on your profile, Facebook lets you pick one of the images from the page you are sharing to accompany the link. Something like that would be quite good.
Yes, but whenever I share one of my personal website pages, it picks the logo image but only gets half of it, because I've used CSS to composite a background image behind the foreground image. Facebook only sees the foreground image and, frankly, it looks like crap on the Facebook page background without the other image. So I always tell it not to use an image in my postings. Then again, I don't post much, so it's not much of an issue for me.
Personally, I find your Facebook example to be a good case for why this is NOT so easy to do.
Message was edited by: ilssac
But if you want to do it, I've told you the two commands you would probably use to accomplish the basics. The devil is in the details, and I am not aware of anybody who is offering to sell or give away a tool that has worked out any of the devilish details.
Just for posterity ...
It was easier than I thought.
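Here is a bit of code for a small image parser:
<!--- Site whose images should be listed --->
<cfset website = "www.example.com">
<!--- Fetch the page; resolveURL turns relative links into absolute ones --->
<cfhttp
    url = "#website#"
    method = "GET"
    resolveURL = "yes"
    throwOnError = "yes"
    redirect = "yes"
    timeout = "15">
</cfhttp>
<!--- Collect every <img ...> tag from the returned HTML --->
<cfset images = reMatchNoCase("<img([^>]*[^/]?)>", cfhttp.FileContent)>
<cfdump var="#images#">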
With a little more work, what I needed is done.
Thanks guys.
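The thread does not show what the extra work was. One plausible next step, given the earlier warning about spacer GIFs, is to tidy the list before showing it to users. The sketch below assumes the src values have already been pulled out into an array of URLs; the function name filterImageUrls, the extension list, and the "spacer" file-name check are all assumptions made for illustration.

```cfml
<cfscript>
// Hypothetical helper, not from the thread: drops duplicates, non-image
// extensions, and file names that look like spacer images.
function filterImageUrls(urls) {
    var kept = arrayNew(1);
    var seen = structNew();
    var allowed = "jpg,jpeg,png,gif,bmp"; // assumed list, adjust as needed
    var cleanUrl = "";
    var ext = "";
    var i = 0;
    for (i = 1; i lte arrayLen(urls); i = i + 1) {
        // Ignore any query string or fragment when reading the extension.
        cleanUrl = listFirst(urls[i], "?##");
        ext = listLast(cleanUrl, ".");
        if (listFindNoCase(allowed, ext)
                and not findNoCase("spacer", cleanUrl)
                and not structKeyExists(seen, cleanUrl)) {
            // Struct keys are case-insensitive, so URLs differing only by case count as duplicates.
            seen[cleanUrl] = true;
            arrayAppend(kept, urls[i]);
        }
    }
    return kept;
}

// Assumed sample input for illustration only.
candidates = listToArray("http://www.example.com/img/pool.jpg,http://www.example.com/img/spacer.gif,http://www.example.com/img/pool.jpg,http://www.example.com/img/room.PNG?v=2,http://www.example.com/banner.swf");
writeOutput(arrayToList(filterImageUrls(candidates)));
// writes: http://www.example.com/img/pool.jpg,http://www.example.com/img/room.PNG?v=2
</cfscript>
```

A file-name check like this is only a heuristic. It cannot tell a logo from a room photo, so the final choice still belongs to the user.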
La_Salamandra wrote:
> I have a website where users can enter their hotel information.
> Most of them do not add photos, I guess because the procedure is quite long and difficult for the average user's skills.
> I'm looking for a script that takes the URL of a website as input and returns the URLs of the images on that site, so the user can decide which ones to upload to his tab.
> Can someone help me?
You have to improve your design. Even if you can find the ColdFusion code, your design will still fall short in two ways.
First, it is unreliable, because you're depending on some arbitrary site to be available and up to speed. Second, it is ethically wrong to be collecting pictures, especially large numbers of them, dynamically from someone else's site. Think of their copyright and bandwidth.
Fortunately, there are simple solutions. First, identify, by eye, the web pages containing the pictures you're interested in. Ask for permission from the owner.
You could indeed use ColdFusion's cfhttp or any other script to download the JPGs, PNGs, and so on. But then, why waste your time re-inventing the wheel? It is far better to use a web crawler!
With most crawlers, you only have to supply the URL of the site and the file extensions it has to grab (in your case, jpg, png, bmp, and so on).
One click on the button, and you have them reeling in. Automatically. Some crawlers are considerate enough to let you adjust the download bandwidth. (We can learn from a million years of evolutionary wisdom. The vampire bat is known to inject a painkiller before sucking!)
Now that you've downloaded the images to your site, the links you display to your users are all yours. You may choose to resize some of the images, display them as you wish, and the issues of reliability and bandwidth are now up to you.
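For anyone who does take the cfhttp route mentioned above, saving one image to your own server might look roughly like the sketch below. The image URL and the hotel-images folder are assumed values, not anything from the thread. The file is given a generated name and a checked extension because the remote name should not be trusted on a server that executes .cfm files.

```cfml
<!--- Assumed values for illustration; neither comes from the thread --->
<cfset imageUrl = "http://www.example.com/images/lobby.jpg">
<cfset targetDir = expandPath("./hotel-images")>

<!--- Accept only known image extensions, ignoring any query string or fragment --->
<cfset ext = lCase(listLast(listFirst(imageUrl, "?##"), "."))>
<cfif not listFind("jpg,jpeg,png,gif,bmp", ext)>
    <cfthrow message="Unsupported image type: #ext#">
</cfif>

<!--- Create the target folder on first use --->
<cfif not directoryExists(targetDir)>
    <cfdirectory action="create" directory="#targetDir#">
</cfif>

<!--- Use our own file name instead of the remote one --->
<cfset fileName = createUUID() & "." & ext>

<!--- path and file tell cfhttp to write the response body to disk --->
<cfhttp
    url = "#imageUrl#"
    method = "GET"
    getAsBinary = "yes"
    path = "#targetDir#"
    file = "#fileName#"
    throwOnError = "yes"
    timeout = "15">
</cfhttp>
```

This sketch does not verify that the downloaded bytes really are an image; a production version would want that check before showing the file to users.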
Hi BKBK,
Users would see photos and images from their own websites, so there would not be any copyright violation.
Yes, you're right. What I am looking for is just a crawler: one that takes a URL as input and returns a list of the images contained in that website. Users will decide which ones to upload to their hotel tabs.
Do you have in mind a ColdFusion crawler I could use to achieve that?
I got this from Google for "ColdFusion web crawler", after I passed up the top couple of sites that seemed to be just search spam, returning results only tangentially related to what I wanted.
http://ketanjetty.com/coldfusion/useful-code/web-crawler/
La_Salamandra wrote:
> Users would see photos and images from their own websites, so there would not be any copyright violation.
OK. Then the issue of permission is sorted.
> What I am looking for is just a crawler: one that takes a URL as input and returns a list of the images contained in that website. Users will decide which ones to upload to their hotel tabs.
You should be glad to know that most crawlers will in fact take more than one URL! So you can just give such a crawler the URLs of 10 sites, and sit back and wait.
The point I wished to make is that you have to download the images you require to your own site. Organize the downloaded images on your web site, and let the user make a choice from your copies of their images.
> Do you have in mind a ColdFusion crawler I could use to achieve that?
The crawler doesn't have to be in ColdFusion; it just has to be good. Good enough to download the images.
My all-time favourite crawler is HTTrack. It is open-source and free. It's wonderful software written by folks like you and me who spend their private time serving fellow developers. If you find HTTrack useful, as I did, then please give the developers a donation.