7 Replies Latest reply on Oct 11, 2016 10:30 AM by Loic.Aigon

    Read data from .doc file

    tpk1982 Level 4

      Hi,

      Is it possible to read the values from a Word document? I know we can get them from a .txt or .csv file.

      But I don't want to convert the file format. My Word file contains a table.

      [Screenshot of the Word table]

      Code I tried:

      var datafile = File.openDialog("Select the Word file to proceed");
      if (datafile != null) {
          // Only works for plain-text, tab-delimited files (.txt/.csv), assumed UTF-8;
          // a binary .doc file still comes out as junk characters.
          datafile.encoding = "UTF-8";
          datafile.open("r");
          while (!datafile.eof) {
              var strLineIn = datafile.readln();
              var colorArray = strLineIn.split("\t"); // columns are separated by tabs
              var myUK = colorArray[0];
              var myUSA = colorArray[1];
              alert(myUK);
          }
          datafile.close();
      }

      The code above returns the values as junk characters.

      Regards,

      K
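Why the script returns junk: a legacy .doc file is not text but a binary container, so reading it line by line just yields raw bytes. A minimal sketch of telling the formats apart by their first bytes: legacy .doc files start with the Compound File Binary signature D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives starting with "PK" (50 4B 03 04). The helper names are hypothetical; the code sticks to ES3-style syntax, so it should also run in ExtendScript, although it was written with Node.js in mind.

```javascript
// Hypothetical helper: guess a Word file's container format from its first bytes.
// `bytes` is any array-like of byte values (0-255), e.g. a Node.js Buffer.
var OLE2_SIGNATURE = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]; // legacy .doc (Compound File Binary)
var ZIP_SIGNATURE = [0x50, 0x4B, 0x03, 0x04]; // "PK\x03\x04": .docx is a ZIP archive of XML parts

// True if `bytes` begins with every byte of `signature`.
function startsWithBytes(bytes, signature) {
  if (bytes.length < signature.length) {
    return false;
  }
  for (var i = 0; i < signature.length; i++) {
    if (bytes[i] !== signature[i]) {
      return false;
    }
  }
  return true;
}

function detectWordFormat(bytes) {
  if (startsWithBytes(bytes, OLE2_SIGNATURE)) {
    return "ole2"; // binary .doc: reading it line by line yields junk characters
  }
  if (startsWithBytes(bytes, ZIP_SIGNATURE)) {
    return "zip"; // .docx (or another ZIP-based file): unzip, then read the XML
  }
  return "unknown"; // possibly plain text such as .txt or .csv
}
```

In Node.js, `detectWordFormat(require("fs").readFileSync("table.doc"))` (the file name is a placeholder) works because a Buffer is indexable and has a length. In ExtendScript, the bytes could presumably be collected by setting the File object's encoding to "BINARY" and calling charCodeAt on the characters returned by read(8); treat that as an untested assumption.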

        • 1. Re: Read data from .doc file
          jakec88782761 Level 2

          Hi,

          What is colorArray[0]?

          Also, on a side note, I'm pretty sure that, in the context of the table, Fete wouldn't translate as Free in English.

          • 2. Re: Read data from .doc file
            tpk1982 Level 4

            Hi,

            It's just a variable name. colorArray[0] holds the English words, because I split each line on tabs.

            • 3. Re: Read data from .doc file
              jakec88782761 Level 2

              Maybe the font used in Word isn't compatible with InDesign.

              There may be an issue with the Unicode characters.

              Sorry I'm not much help; I'm sure someone else has an answer.

              • 4. Re: Read data from .doc file
                Loic.Aigon Adobe Community Professional

                I don't think you can read such files just by "opening" them. Unless I am wrong, a .doc file is a binary format, whereas .docx is a ZIP archive containing XML files. The latter would be less difficult, I guess, because you would only have to unzip it and look for the content. Reading the .doc format would require much more work. I am not aware of an ExtendScript library that does so, but there seem to be plenty of Python libraries you could try.

                FWIW

                Loic
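A minimal sketch of the .docx route described above, assuming the ZIP has already been unpacked (that step is not shown) and its word/document.xml part read into a string. In WordprocessingML, a table is a w:tbl element whose rows are w:tr, cells are w:tc, paragraphs are w:p, and text sits in w:t runs. The function names are hypothetical, and regular expressions stand in for a real XML parser, so nested tables and unusual markup are not handled.

```javascript
// Hypothetical helpers: pull table rows out of the XML in word/document.xml.
// Regex-based sketch, not a real XML parser: nested tables are NOT handled.

// Returns the inner XML of every <tag ...>...</tag> element; self-closing ones are skipped.
function elementBodies(tag, xml) {
  var re = new RegExp("<" + tag + "(?:\\s[^>]*[^/>])?>([\\s\\S]*?)</" + tag + ">", "g");
  var bodies = [];
  var m;
  while ((m = re.exec(xml)) !== null) {
    bodies.push(m[1]);
  }
  return bodies;
}

function decodeEntities(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
          .replace(/&quot;/g, "\"").replace(/&apos;/g, "'")
          .replace(/&amp;/g, "&"); // &amp; last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Returns [[cell, cell, ...], ...]; paragraphs inside one cell are joined with "\n".
function extractTableRows(documentXml) {
  var rows = [];
  var trs = elementBodies("w:tr", documentXml);
  for (var r = 0; r < trs.length; r++) {
    var cells = [];
    var tcs = elementBodies("w:tc", trs[r]);
    for (var c = 0; c < tcs.length; c++) {
      var paragraphs = [];
      var ps = elementBodies("w:p", tcs[c]);
      for (var p = 0; p < ps.length; p++) {
        // A paragraph's text may be split across several runs, each with its own <w:t>.
        paragraphs.push(decodeEntities(elementBodies("w:t", ps[p]).join("")));
      }
      cells.push(paragraphs.join("\n"));
    }
    rows.push(cells);
  }
  return rows;
}
```

Keeping each cell's paragraphs together inside one cell string means a multi-line cell still maps to one row, which is the information that gets lost once a table is flattened to lines of text.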

                • 5. Re: Read data from .doc file
                  [Jongware] Most Valuable Participant

                  Right. Expecting to just open a .doc and get all the text out of it is quite naïve. For starters, there are dozens of variations; the plain .doc format is ancient. The data itself consists of a "plain text" thread and several formatting threads, and tables are one of the most complicated constructions in it. On top of it all, this entire composite data format is wrapped in a COM Object (I think it was that) binary storage format.

                  I spent several months going through Microsoft's documentation (it's all there, you just have to type it into a search engine!) and finally created a JavaScript that could successfully read... well, parts of some files.

                  It must be far, far easier to just import the text as usual into a temporary InDesign document and then fish out the data you want.

                  • 6. Re: Read data from .doc file
                    tpk1982 Level 4

                    Thanks a lot for your information, Loic and Jongware.

                    Yes, I tried the import option in InDesign, but ran into a problem. After importing the table and converting it to text, single-line cells come out fine. But if a cell contains more than one line, the result is not what I want.

                    Like this:

                    [Screenshot of the converted output]

                    Is it possible to fix this?

                    Thanks,

                    K

                    • 7. Re: Read data from .doc file
                      Loic.Aigon Adobe Community Professional

                      What you get is logical, as your original lines are just one row. You would need one line per row to get the expected output.
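One way to act on this, sketched under assumptions: instead of relying on line breaks, convert the table with a row separator that cannot occur inside a cell, so a cell's internal line breaks no longer look like new rows. As far as I know, InDesign's Table.convertToText method takes optional column and row separator strings (check the InDesign scripting DOM reference); "##ROW##" below is a hypothetical marker. The parser is plain ES3-style JavaScript.

```javascript
// Minimal sketch: split table text exported with a custom row separator.
// ROW_SEP is a hypothetical marker, chosen because it never appears in the cell text;
// columns are tab-separated, as in the original script.
var ROW_SEP = "##ROW##";
var COL_SEP = "\t";

// Returns an array of rows, each an array of cell strings. Line breaks inside a
// cell (\r or \n) stay in that cell instead of starting a new row.
function parseRows(text, rowSep, colSep) {
  rowSep = rowSep || ROW_SEP;
  colSep = colSep || COL_SEP;
  var rows = [];
  var pieces = text.split(rowSep);
  for (var i = 0; i < pieces.length; i++) {
    // Strip line breaks sitting right next to a row separator.
    var row = pieces[i].replace(/^[\r\n]+|[\r\n]+$/g, "");
    if (row === "") {
      continue; // e.g. the empty piece after a trailing separator
    }
    rows.push(row.split(colSep));
  }
  return rows;
}
```

Reading the cells directly through the table's rows and cells in the InDesign DOM would avoid the separator altogether; the separator approach simply keeps the original script's tab-splitting logic.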