I have a sizeable HTML Help project that worked fine in RH7. The raw HTML files are installed into an application that uses the SAX parser to parse the HTML. This all worked correctly in RH7. After upgrading to RH8, the same HTML files installed into the application now fail with the following error message: "org.xml.sax.SAXParseException: Content is not allowed in prolog."
Upon examination of the same HTML file that works in RH7 but not in RH8, we note the following differences:
In RH7, the HTML preceding the first <head> token is:
<!doctype HTML public "-//W3C//DTD HTML 4.0 Frameset//EN">
In RH8, the HTML preceding the first <head> token is:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
We have verified that if we manually modify the RH8 file to look like the RH7 file above, it works in the application.
Is there a setting somewhere in RH8 that I need to change?
Any suggestions will be greatly appreciated, but please don't tell me to manually modify the HTML in 300+ help topics.
Bob Boller
Hi Bob.
I think you may have to change all 300+ files, but you could easily do this with a find and replace tool like BK ReplaceEm or FAR. Both are excellent for this type of thing.
The problem here is that you are using the raw topic files and effectively generating the output outside of RH. If you are doing this you can't expect Adobe to support the SAX parser. RH8 uses XHTML as opposed to RH7's HTML and upgrades each topic when the project is first opened. I can't see any way back unless there is something on the SAX parser side to allow for XHTML.
Read the RoboColum(n).
Hi,
Changing the doctype of your output files from XHTML to HTML might not be such a good idea. XHTML has a (very slightly) different syntax than HTML, and changing the DTD may have unforeseen consequences, although it will probably work for most browsers. In any case, your output will no longer be 'valid', as you will be using some syntax that is incorrect for HTML. See http://www.w3schools.com/XHTML/xhtml_html.asp for an overview of the differences between HTML and XHTML.
I don't know anything about the SAX parser, but I agree with Colum that the only (and probably the best) way is to get the parser to work with XHTML.
Greet,
Willam
Well, here is some more insight into this problem. The edited HTM files in RH8 are in XHTML, whereas in RH7 and earlier they were in HTML. As a result, in RH8 there is what is known as a signature at the beginning of each HTM file that specifies the character encoding. If you use UTF-16 encoding, then you must include a byte order mark to indicate whether the encoding is big endian or little endian. If you use UTF-8 encoding, then the byte order mark is not needed. There seems to be a difference of opinion between the XHTML standards folks and the Java folks as to whether specifying a byte order mark with UTF-8 constitutes invalid syntax. The XHTML folks say it is not invalid syntax; the Java folks say it is. What is happening in my situation is that the application using the HTM files for on-line help uses the Java built-in parser, and it rejects the UTF-8 byte order mark as invalid.
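For anyone who wants to check their own topic files, here is a minimal sketch in Python of how to tell whether a file starts with a byte order mark. The function name `detect_bom` is just an illustration, not part of RoboHelp or any parser. It only looks for the UTF-8 BOM (the three bytes EF BB BF at the very start of the file) and the two UTF-16 BOMs, and it assumes you pass it the raw bytes of the file.

```python
import codecs
from typing import Optional

# Byte order marks this sketch knows about, checked in this order.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading a topic with `open(path, "rb").read()` and passing the result to `detect_bom` should show whether the RH8-edited file carries the mark and the RH7 file does not.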
I have found a work-around. For my project, I am generating HTML Help. Instead of using the RoboHelp edited HTM files, I use the Microsoft HTML Help Workshop to decompile the generated chm file. Before generating the chm file, I check the check box labeled "Convert RoboHelp edited topics to HTML" on the General tab of the Options dialog in RoboHelp. This results in the HTM files in the chm being in HTML rather than XHTML. With these files, the Java parser is happy.
I don't know if we should raise this issue with Adobe. Any thoughts?
Bob Boller
Hi Bob
Interesting information. I will bring your post to Adobe's attention.
Turning to your problem, I have seen posts before where tools used by developers whinge about the output and the problem is the tool, not the help. I am not saying that is the case here but maybe some searching outside this forum might reveal something. Whilst the parser fails the output, does it nonetheless work with the application?
You say
The raw HTML files are installed into an application...
Later you say
For my project, I am generating HTML Help. Instead of using the RoboHelp edited HTM files, I use the Microsoft HTML Help Workshop to decompile the generated chm file. Before generating the chm file, I check the check box labeled "Convert RoboHelp edited topics to HTML" on the General tab of the Options dialog in RoboHelp. This results in the HTM files in the chm being in HTML rather than XHTML.
To me, raw HTML files means the source files before RH has done any processing. That conflicts with the later statement you are generating a CHM.
I am also not clear on the process, as you seem to be using the Help Workshop to create a CHM, then decompiling it, then using the source files that creates to get RH to create another CHM. Why not have RH create the CHM with the Convert to HTML option set?
See www.grainge.org for RoboHelp and Authoring tips
Peter - When I used the term "raw HTML", I was referring to the RH-edited
files. The check box "Convert RoboHelp edited topics to HTML" seems to
affect only the HTML that is compiled into the chm. It does not change the
RH-edited files, they remain in XHTML. I am decompiling the RH-produced chm
to get to the HTML files inside. I cannot provide the chm file to the
application as it uses the built-in Java parser and, thus, isn't able to
decompile the chm file.
Bob
So why not generate webhelp with that checkbox ticked? That will give you output files which is surely what your developers want?
See www.grainge.org for RoboHelp and Authoring tips
Peter - That's what I'm doing. It's just that only the files compiled into the
chm are in HTML. The RH-edited files remain in XHTML.
Bob
I did a quick test, and generating webhelp with the setting ticked, my topics get the following doc type:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Just to clarify, the setting only applies to the output files (i.e. the conversion takes place as part of the Generation process), not the source files that you edit in RH.
So if you output to WebHelp, you won't need to do the decompile step in HTML Help Workshop - all your topics will be in HTML.
The problem is not the doctype, the problem is the signature -
<?xml version="1.0" encoding="utf-8" ?>
Specifically, it's the three characters following the "utf-8", known as the
byte order mark (BOM). The BOM is required for UTF-16, but is not needed for
UTF-8. The disagreement seems to center on whether having a BOM with UTF-8
constitutes invalid syntax in XHTML.
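If the BOM really is the only thing the parser objects to, stripping it from copies of the topic files can be scripted rather than done by hand. The following is a rough Python sketch under that assumption. The function names and the `*.htm` pattern are my own choices for illustration, it rewrites files in place (so run it on a copy of the project), and I have not confirmed that the Eclipse parser accepts the result.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_folder(folder: str, pattern: str = "*.htm") -> int:
    """Strip the UTF-8 BOM from every matching file under folder.

    Files are rewritten in place; returns the number of files changed.
    """
    changed = 0
    for path in Path(folder).rglob(pattern):
        original = path.read_bytes()
        cleaned = strip_utf8_bom(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling `strip_bom_in_folder` on a copy of the topics folder returns how many files were rewritten, and running it a second time should return 0.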
Bob
Hi,
An HTML file only uses the XML declaration when it is XHTML. The XML declaration is not required but is encouraged by the W3C; see http://www.w3.org/TR/xhtml1/#docconf. Since the DTD is not the problem, I can think of two things:
- Remove the XML declaration from your output files. The Byte Order Mark problem should disappear, as long as you only use UTF-8 or UTF-16 encoding.
- Output your files as HTML. 'Regular' HTML 4.01 doesn't use the XML declaration, regardless of the DTD.
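The first option could be scripted along these lines. This is only a sketch in Python, with a hypothetical function name, and it assumes the file has already been read and decoded to text. It removes a leading `<?xml ... ?>` declaration and the whitespace after it, and leaves the DOCTYPE and everything else untouched. If the decoded text still begins with the BOM character, that character is kept, so removing the BOM remains a separate step.

```python
import re

# A leading XML declaration, optionally preceded by a BOM character (U+FEFF).
# The BOM, if present, is captured so that it can be put back unchanged.
_XML_DECL = re.compile(r"\A(\ufeff?)\s*<\?xml\s.*?\?>\s*", re.DOTALL)


def remove_xml_declaration(text: str) -> str:
    """Return text without a leading <?xml ... ?> declaration."""
    return _XML_DECL.sub(r"\1", text, count=1)
```

Text that does not start with an XML declaration is returned unchanged, so it should be safe to run over a mix of HTML and XHTML topics.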
Greet,
Willam
Hi Bob,
I should have been more clear. When I generate to webhelp with the 'convert to html' setting ticked, the start of the file is the doctype declaration I posted above. The '<?xml' tag was not included.
Amebr
I don't think you are doing what I suggested. To create webhelp, you use a different layout. CHMs are generated from RH using the Microsoft HTML layout and you get one file. To create WebHelp, you use the WebHelp layout and you get the HTML files that your developers want as a whole bunch of separate HTML files. If you use the Convert to HTML option, they will be what you want.
Tell me if I am misunderstanding you but you seem to be going around the houses to get HTML when you can get RH to do it for you.
See www.grainge.org for RoboHelp and Authoring tips
Peter - Well, I tried that and now when I plug the HTML files into the
application, I get the error message shown in the attachment on every topic.
I noticed that for every topic I looked at, the line number was several
lines beyond the last line of the HTML file. I guess that suggests there is
something missing that the HTML parser is looking for.
I also notice that in the portion of the HTML line called the signature,
that is, the part that specifies the character set, RH did not include a BOM
for UTF-8, whereas in the RH-8 edited XHTML files, the BOM is present.
Bob
Hi,
There's no attachment to your post.
Greet,
Willam
Hi Bob
In glancing at this thread a question forms in my mind.
It would seem that all you want is a basic editor to regurgitate raw HTML files that are used by your application in a way that avoids any features really added by RoboHelp.
So my question is to ask quite simply: Why are you using RoboHelp for this?
Is it simply because you are accustomed to its interface? If it's causing issues for you in working with your application, you might find it simpler to move to a more basic HTML editor. Perhaps Dreamweaver? CoffeeCup HTML Editor? There are many different HTML editors around.
Note that RoboHelp HTML is very near and dear to my heart, so suggesting a move from it doesn't come lightly. But if you aren't really utilizing its output, I'm not seeing what the purpose is. It would seem to compare to using a compound miter saw with laser guides to slice the butter for your bread.
Cheers... Rick
Rick - I use RH for two primary reasons: the WYSIWYG editor, so I don't have to mess with HTML, and the ability to generate a User Guide from the on-line help that I know is content-identical to it - write once, publish twice.
Bob
By the way, in case anyone is wondering, the application I'm writing the help for is not some off-the-wall, weird application. It is a plug-in to Eclipse 3.4, which is used by many thousands of users. And yes, the handling of the help files is done by Eclipse, not by the plug-in, so the problem isn't in the plug-in.
Bob
Bob
RH8 has a script for producing Eclipse Help. Maybe that is what you should be using?
My topic About RH8 has a link to the RH8 Reviewers Guide where there is mention of it. My topic also has a link to my RH Tour where there is a basic reference.
Beyond that, I don't think I can suggest anything else on this one.
See www.grainge.org for RoboHelp and Authoring tips
I have a different, but possibly related problem.
I developed several WebHelp systems with Madcap Flare, only to discover that when served by our product's homegrown HTTP 1.1 web server, though they worked fine for Firefox, they did not work with IE 7 or IE 8. The problem is that our web server, seeing that the Madcap output files have a .htm extension, sends a MIME type of text/html with the response. Internet Explorer, apparently, inspects the file itself for the MIME type, and sees that the files are actually XML files, starting with this:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Now, we could adjust our web server--apparently Apache and IIS have done so--but instead I've been told to migrate my help systems to a tool that works with our web server.
So I downloaded a trial version of RoboHelp 8 and created a test system to be sure RoboHelp works. Unfortunately, it has the same problem, and this is not surprising since the RoboHelp 8 output files also begin with:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
My first question is (and from what I've read I think I know the answer), does RoboHelp 7 have this <?xml...> and XHTML prolog?
My second question, assuming the answer to the first is No, is how easy is it to obtain RoboHelp 7 at this point? Does Adobe insist that new licensees get RoboHelp 8?
Incidentally, with Madcap, I tried removing the "<?xml version="1.0" encoding="utf-8"?>" line from the output .htm files, but this did not solve the problem for Internet Explorer (so maybe Internet Explorer is looking somewhere else to conclude that these are XML files--maybe it's the DOCTYPE line, which I didn't remove). I haven't tried this for my RoboHelp 8 test system, but hesitate to do so because the developer that would test it doesn't have time to test things likely to fail.
Has anyone else run into this problem with non-Apache and non-IIS web servers? I can supply further particulars about it (for example, it works on IE if we turn compression off; but doing so would slow our system down too much, apparently).
Thanks,
- Willie
Your problem is sort of related.
The server has to be set up to recognise UTF-8, and it sounds like the problem is that yours is not.
I had a problem where the output was OK in IE but only the BOM characters showed in Firefox. This is what I was advised by the company hosting my site.
"I would therefore conclude that the solution to this problem (on Linux systems running Apache) is to add the AddDefaultCharset utf-8 directive to either the Apache config or the site .htaccess file. The advantage of the latter is that it only affects individual sites. The default Apache character set is taken from the locale file on Linux and defaults to iso-8859-1. It is the conflict between the Apache header with iso-8859-1 and the page character set of utf-8 that obviously causes Firefox a problem."
In a forum post Chrissy_Tissy added
My machine is Windows, but this fix still worked - some notes about making the fix visible:
1. Do the fix itself (httpd.conf: AddDefaultCharset utf-8).
2. Restart the box to apply the fix.
3. Once the box is restarted, clear your cache in FireFox to make sure you don't continue to see the cached file.
Once all this is done you will see the output content as expected.
I am wondering if your server can be amended in a similar way? If not, in RH8 look in Tools > Options and tick the options I have highlighted. See if that produces an output that will be agreeable to your server.
Finally, if not, Adobe does have a tool that works on the output and changes the encoding to whatever you want. Trouble is it works on one folder at a time so it can be painful if you have many folders.
I would appreciate you posting back the solution you finally go for. It all helps us when people have similar problems.
See www.grainge.org for RoboHelp and Authoring tips