When you read a binary file, you get binary data. Word .doc files
are not text files. If you cannot convert the files to .txt, or at
least .rtf, you will have to use the Word COM object to parse them.
This is a very problematic solution, as it involves installing MS
Word on the server. The trouble is that MS Word is not designed to
run on a server, and both Adobe (née Macromedia) and Microsoft warn
against doing so.
If you do so anyway, make sure you have good access to the server.
While you are developing, any time you do something that causes MS
Word to ask a question with a dialog box, it will send that dialog
to the server's screen, lock up, and wait for somebody sitting at
the server to answer it. Since it is not a server application, it
has no way to send these dialogs to clients.
Now, since you can read some of the text from the binary, you may be
able to get it out with regex or other string processing, but that
does not sound like fun to me.
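
For what it's worth, here is a rough sketch of what that string
processing might look like. It is written in Python purely to show the
idea; the same patterns should carry over to whatever regex functions
you have available. The function name extract_text_runs and the
min_len cutoff are my own inventions, and the sketch assumes the text
sits in the file either as plain one-byte characters or as UTF-16LE
(each character followed by a zero byte). It keeps printable ASCII
only, so smart quotes and accented characters will be lost.

```python
import re


def extract_text_runs(data, min_len=6):
    """Return readable text runs found in raw bytes, in file order.

    Hypothetical helper: it only looks for printable ASCII, stored
    either one byte per character or as UTF-16LE.
    """
    # Text stored one byte per character (printable ASCII only).
    single_byte = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Text stored as UTF-16LE: each ASCII character followed by a zero byte.
    double_byte = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = []
    for match in single_byte.finditer(data):
        found.append((match.start(), match.group().decode("ascii")))
    for match in double_byte.finditer(data):
        found.append((match.start(), match.group().decode("utf-16-le")))

    # Sort by byte offset so the runs come back in the order they appear.
    found.sort()
    return [text for _, text in found]
```

You would feed it the raw bytes of one file, read in binary mode.
Expect junk mixed in with the real content: font names, document
properties, and field codes are stored as strings too, so the output
will still need cleaning up by hand or with further rules.
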
kitty1967 wrote:
> Hello all:
>
> I have a directory of word documents that I need to loop through and
> read the contents and save various parts of the textual content into
> a database.
>
> I've used cfdirectory to loop through the directory and then cffile
> action="read" to read the contents of the file into a variable.
> However, I have what appears to be binary information stored before
> and after the text that is saved in the variable specified in the
> cffile tag.
>
> How can I get rid of this so that I'm left with just the text
> contained in the Word file?
>
> TIA
> Lisa
>