Read contents of Word docs?

New Here
Jun 19, 2006

Hello all:

I have a directory of Word documents that I need to loop through, reading the contents and saving various parts of the text into a database.

I've used cfdirectory to loop through the directory and then cffile action="read" to read the contents of each file into a variable. However, that variable contains what appears to be binary information before and after the text.

How can I get rid of this so that I'm left with just the text contained in the Word file?

TIA
Lisa
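
For comparison, the loop described above (list a directory, read each file, store the text in a database) can be sketched in Python. This is a minimal sketch that assumes the documents have already been converted to plain .txt files and uses SQLite as a stand-in database. The `documents` table and the `load_text_files` function are hypothetical names, not part of ColdFusion.

```python
import sqlite3
from pathlib import Path


def load_text_files(directory: str, conn: sqlite3.Connection) -> int:
    """Read every .txt file in `directory` and store its text in the
    hypothetical `documents` table. Returns the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (filename TEXT PRIMARY KEY, body TEXT)"
    )
    count = 0
    # sorted() gives a deterministic processing order
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps one badly encoded file from aborting the loop
        text = path.read_text(encoding="utf-8", errors="replace")
        conn.execute(
            "INSERT OR REPLACE INTO documents (filename, body) VALUES (?, ?)",
            (path.name, text),
        )
        count += 1
    conn.commit()
    return count
```

Usage: `conn = sqlite3.connect("docs.db")` followed by `load_text_files("/path/to/docs", conn)`. Only `*.txt` files are picked up, so raw .doc files in the same directory are skipped rather than stored as binary noise.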
LEGEND
Jun 19, 2006

When you read a binary file, you get binary data. Word .doc files are not
text files. If you cannot convert the files to .txt, or at least .rtf,
you will have to use the Word COM object to parse them. This is a very
problematic solution because it requires installing MS Word on the
server. The trouble is that MS Word is not designed to run on a server,
and both Adobe (née Macromedia) and Microsoft warn against doing so.
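
One way to confirm that a .doc is not a text file is to look at its first bytes. A minimal Python sketch, assuming the files are Word 97-2003 .doc files: those are stored as OLE2 compound documents, which begin with the fixed signature D0 CF 11 E0 A1 B1 1A E1, so a plain text file will not match. `looks_like_ole2` is an illustrative name.

```python
from pathlib import Path

# 8-byte header signature of an OLE2 compound document (Word 97-2003 .doc).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(path: Path) -> bool:
    """Return True if the file starts with the OLE2 compound-file signature."""
    with path.open("rb") as fh:
        # Reading only the header avoids loading large documents into memory
        return fh.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE
```

A check like this lets a directory loop route true .doc files to a conversion step and read everything else as text.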

If you do so, make sure you have good access to the server. As you
program, any time you do something that causes MS Word to ask a question
with a dialog box, it will display that dialog on the server's screen,
lock up, and wait for somebody sitting at the server to answer it. Since
Word is not a server application, it has no way of sending these
prompts to clients.

Now, since you can read some of the text in the binary, you may be able
to extract it with regex or other string processing, but that does not
sound like fun to me.
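
A minimal Python sketch of the "regex or other string processing" route, working like the Unix `strings` tool. It assumes the text you need is plain ASCII stored either as 8-bit characters or as UTF-16LE (Word 97-2003 files can hold text in either form). It is a heuristic, not a Word parser: non-ASCII characters and formatting are lost, and some junk runs will slip through. `extract_text_runs` and `MIN_RUN` are illustrative names.

```python
import re

# Minimum length of a run worth keeping; shorter runs are usually binary noise.
MIN_RUN = 4
# Printable ASCII (space through tilde) plus tab, CR and LF, stored one byte each.
_ASCII_RUN = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % MIN_RUN)
# The same characters stored as UTF-16LE: each byte followed by a zero byte.
_UTF16_RUN = re.compile(rb"(?:[\x20-\x7e\t\r\n]\x00){%d,}" % MIN_RUN)


def extract_text_runs(data: bytes) -> list[str]:
    """Return readable runs found in a binary blob: 8-bit runs first,
    then UTF-16LE runs, each in file order."""
    runs = [m.group().decode("ascii") for m in _ASCII_RUN.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in _UTF16_RUN.finditer(data)]
    return runs
```

Usage: `extract_text_runs(Path("report.doc").read_bytes())`. Raising `MIN_RUN` filters more noise at the cost of dropping short words; you would still need your own rules to pick out the parts you want to save.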

