4 Replies Latest reply on Mar 10, 2011 5:49 AM by dbldutch21

    Connecting CF to OpenOffice (convert html to doc)


      I have a simple problem that appears to have a difficult solution.


      I need to convert an HTML file to a DOC file (an actual Word document, not just a file with the .doc extension).


      In the past I would use cfobject to connect to Word, open the file and then save it as a Word file.  This required Word to be installed on the server and got a little fuzzy when running in 64-bit, so the solution needs to work without Word being installed.


      I've seen Apache's POI and some talk about using Open Office.  Using Open Office would be the preferred solution as this will already be installed on the server.


      Does anyone know an easy way (free or cheap) to convert an HTML file to a DOC file?


      Even if you only know how to use ColdFusion to connect to Open Office, I can start there.


      Thanks for any help!!

        • 1. Re: Connecting CF to OpenOffice (convert html to doc)
          Adam Cameron. Level 5

          Can I recommend you start with the docs - http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WS56EA2935-FBD2-4089-8402-FDDA2BAF55FB.html - and see what you can come up with?  Otherwise all anyone would be doing is reciting the docs to you anyhow.  Or at least their take on the docs ;-)




          • 2. Re: Connecting CF to OpenOffice (convert html to doc)
            dbldutch21 Level 1

            Unfortunately I think I'm going to have to connect to OpenOffice with CreateObject or something, but I can't get it to connect.  You can see in the link you sent that the built-in tags do not support HTML to DOC.  The link also says "POI libraries support conversion of all office files except Word documents".


            So I can't use anything built in....

            • 3. Re: Connecting CF to OpenOffice (convert html to doc)
              Adam Cameron. Level 5

              Bloody hell: sorry.  I just looked up the doc page and posted it (without, ahem, reading it).  It would never occur to me in a million years that if CF claimed to have office interoperability, that would come with the caveat of "except for Word".  Which is kind of the most fundamental "office interoperability" requirement one might have.  Jeez.  What a gyp.  So anyway, I didn't think to check that, as I just assumed it to be the case.  Oops.


              Anyway, guilt set in and I started looking at how to do this in POI last night.  The Java code I found (just googling "poi create word doc", and wading through a lot of chaff) made it seem very straightforward, but I could not get it going in CF.  I did not try to just run the Java code to see if that worked.  I probably should have.  I could get the Word doc written, but:

              a) no styles were showing up (despite the object confirming it thought they were set);

              b) Word died if I tried to do a "Save As..." on the generated doc;

              c) Open Office would not open the file.


              So it was obviously bung.


              Then it occurred to me it was 11pm and I had food to eat and TV to watch, so I went "yeah, well f*ck that", and downed-tools.


              However I still feel guilty for giving you bad info, plus it's piqued my interest now, so I will try to revisit later on.  I'm tied up this evening and Fri evening, but I might have a look @ it over the w/end.


              I guess I could at least post the code I was working with in case it pushes you in the right direction, but I'm not currently sure it is in the right direction.  And it's on my home PC and I'm @ work now, so it'll have to wait anyhow.


              Sorry again for the bum steer.  I assure you I was not just trying to blat any old answer out, dismissively.  I thought those were the correct docs to be looking at.




              • 4. Re: Connecting CF to OpenOffice (convert html to doc)
                dbldutch21 Level 1

                Hey Adam, no need to apologize at all.  I was really hoping you had a solution with that link!!


                Please don't feel like you have to spend too much time on it, I know I've spent tons of hours on this over the years.


                Solution that DID work:

                • Create a Word document and save it as an HTML file (these files have all the Microsoft tags in them, allowing me to set how the file will open in Word as well as enable Track Changes.  It's not just a simple HTML file.)
                • Use CF to replace text and create new HTML files
                • Start Word as a COM object using CF
                • Open the Word file using CF
                • Save as a DOC file using CF (It must be a REAL doc file, not just a file with a doc extension)
                • Close Word


                This solution worked until I had to move the site to a 64-bit server.  It did not like to open Word after the move.  I also understand that Word isn't really made to be run on a server, and I don't want to install it and rely on it anymore, so I'm moving past this as a solution.


                Solution I want to work:

                • Create the HTML files like I have before
                • Use OpenOffice or POI to convert to an actual Word file


                I've had problems with this.  OpenOffice doesn't open the HTML files in Writer.  It opens them in Writer/Web, which doesn't have the ability to save as a doc, only as other HTML files.  I tried to use POI but I can't seem to get it to simply open a file and save it.  Creating the document on the fly would be way too difficult as I have too much formatting, and I also need Track Changes enabled when the file is opened.
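
                For what it's worth, the Writer/Web problem can sometimes be sidestepped by running the office suite headless and naming the import filter explicitly. The sketch below is in Python only to show the shape of the command; the same argument list could be launched from CF. It is untested against these files and rests on assumptions: that the installed build accepts the `--headless`, `--infilter`, `--convert-to` and `--outdir` switches (LibreOffice does; older OpenOffice.org releases may not), that the filter names "HTML (StarWriter)" and "MS Word 97" are available, and that the helper names are purely illustrative. Whether the Track Changes setting survives the conversion would need checking.

```python
import subprocess
from pathlib import Path


def build_convert_command(html_path, out_dir, soffice="soffice"):
    """Build the argument list for a headless HTML -> DOC conversion.

    Assumption: the `soffice` binary on this server supports these switches.
    """
    return [
        soffice,
        "--headless",
        # Assumed filter name: import as a Writer document, not Writer/Web.
        "--infilter=HTML (StarWriter)",
        # Assumed filter name: export as a binary Word document.
        "--convert-to", "doc:MS Word 97",
        "--outdir", str(out_dir),
        str(html_path),
    ]


def convert_html_to_doc(html_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the .doc is expected."""
    cmd = build_convert_command(html_path, out_dir, soffice)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(out_dir) / (Path(html_path).stem + ".doc")
```

                Calling `convert_html_to_doc("letter.html", "out")` (file names made up) would be expected to leave `out/letter.doc` behind if the switches are supported. Keeping the command builder separate from the runner makes it easy to log or inspect the exact command before letting it loose on 100 files.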


                Anyway, that's where I am. I think a company called Aspose has a product called "Aspose.Words for Java" which may work, BUT it costs more than the project's budget will allow and it's a yearly cost, not just a one-time up-front cost.


                Thanks for taking a look. I may need to break up my overall process: create the files, have the user download a zip file of the 100 HTML files, have them manually open each one and save it as a doc file, and then upload a new zip file.  I could then continue the processes my files go through.