One of the easiest and most maintainable ways would be to use a FileMaker database web viewer to retrieve the HTML source, parse it into the appropriate records and fields, and then proceed by setting InDesign frame content directly, or by outputting and placing tagged-text or XML files. (AppleScript is easiest for this.)
Thank you very much for your input.
I don't know of a way for an InDesign script to grab the textual data from a web page ... but it is possible using Adobe's Creative Suite Extension Builder technology, which is based on Adobe AIR. Once the Extension has retrieved the HTML from the web page, it can pour the text contents into an InDesign document, just like a script would, but faster.
I'm pretty sure it is possible with the ExtendScript Socket object to get the contents of a web page. It would then have to be parsed like any other HTML.
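As a rough sketch of that approach (it only runs inside an Adobe host application, since Socket is an ExtendScript object; the host name and path here are hypothetical), a plain HTTP/1.0 GET might look like this:

```
// ExtendScript sketch: fetch a page over plain HTTP with the Socket object.
// Note: Socket speaks plain TCP only, so this works for http:// URLs, not https://.
function fetchPage(host, path) {
    var conn = new Socket();
    var response = "";
    if (conn.open(host + ":80", "UTF-8")) { // adjust encoding to match the page
        conn.write("GET " + path + " HTTP/1.0\r\n" +
                   "Host: " + host + "\r\n" +
                   "Connection: close\r\n\r\n");
        response = conn.read(999999); // read until the server closes the connection
        conn.close();
    }
    return response; // raw response, headers included;
                     // the body starts after the first blank line
}

var html = fetchPage("example.com", "/blog/latest"); // hypothetical URL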
What you are asking for is indeed possible, but it may not be the most efficient way to achieve what I believe are your actual goals, and here is why:
When you publish data to a website, a lot of formatting markup usually gets added, so the actual content ends up mixed with what, from InDesign's point of view, is garbage, and you then have to spend a lot of effort cleaning it up.
If, as I presume is true for a blog, the content is stored in a database, it is much, much better to retrieve the raw content, without the formatting "garbage", directly from the server. How to do that depends on your blog backend, but usually it involves calling some kind of API, or a server-side script that pulls the requested data.
Once you have the data, it is simply a matter of putting it into InDesign. If you are lucky enough to get the data as XML, then it is simply a matter of importing and styling, possibly with some XSL transformations.
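For the import-and-style step, a minimal ExtendScript sketch could look like the following (it must run inside InDesign; the file path, tag names, and style names are assumptions, and the styles must already exist in the document):

```
// ExtendScript sketch (InDesign): import blog content delivered as XML
// and map its tags to paragraph styles.
var doc = app.activeDocument;

// Pull in the XML file exported from the blog backend (hypothetical path)
doc.importXML(File("~/Desktop/posts.xml"));

// Map the imported XML tags to existing paragraph styles (assumed names)
doc.xmlImportMaps.add(doc.xmlTags.itemByName("title"),
                      doc.paragraphStyles.itemByName("Headline"));
doc.xmlImportMaps.add(doc.xmlTags.itemByName("body"),
                      doc.paragraphStyles.itemByName("Body Text"));

// Apply the tag-to-style mapping to the imported content
doc.mapXMLTagsToStyles();
```

From there you can place the tagged content into frames, or let an XSL transformation (set via the document's XML import preferences) reshape the feed before import.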
If you are really lucky, your blog backend may already publish the content as XML/XHTML/RSS/Atom etc., in which case there is no need for any specific server-side work, as you can simply pull the content through a regular HTTP request.
The point I am trying to make is that you should pick up the content before it is littered with "garbage", instead of trying to clean it up afterwards. Cleaning up is what you do when you have to work by hand, but you should not replicate your manual workflow; go for a proper automated solution instead.