Hi, I recently upgraded to v10, but still have quite a lot of old content originally created in v7. While the content looks pretty OK on screen when I open, for instance, a Russian file, it turns into a total disaster when I create a MIF file from it. It seems that somewhere in the process the whole encoding goes bananas.
The fonts used originally were HelveticaCyr Upright and Times Ten Cyr Upright. When I install these on my machine the content looks pretty OK on my screen, but I can't change the font to anything else without totally screwing up the layout. Changing it to any other font supporting Cyrillic (for instance plain old Arial) turns the content into gibberish again.
An example: take the following Russian word: Оглавление (it means "contents", for those interested...)
When I copy this word, freshly typed in a text editor, into FM10 using one of the Cyr Upright fonts, it's shown as ????????. Changing the font to, for instance, Arial makes it show up as expected.
When I select the original word in FM (the one that looks OK on screen) and change the font to Arial, it looks like Îãëàâëåíèå. The same thing happens if I copy the word directly from FM10 into any text editor. So, while everything looks OK on screen when I open the file, I can't change the font without destroying the document, and worst of all, when I export to MIF all my Russian is converted to unreadable strings looking something like Îãëàâëåíèå.
It's driving me nuts. Does anybody have a clue where my problem lies?
Thanks in advance !
I don't have a clue as to where the problem is, but I've found after doing
upgrades for years and years that I generally get better results saving the
lower number file out as MIF and then opening it in the new version. Seems
to have the end effect of cleaning up things better than either just
opening the lower number .fm file or opening it in the new package and then
saving out as MIF.
Your FM7 document is using legacy "overlay" aka "code page" Cyrillic font technology. The Russian text was created by applying the Cyr font in Paragraph Format, Character Format, or just as a local override, then typing the 8-bit code points for the glyphs in that font (which map to various accented characters in a roman font, as you see).
What's encoded in the .fm file (and .mif) for that text is 8-bit characters which could be displayed as roman, symbol, dingbat, cyrillic, Thai, or other glyph families, depending on what Font Family is applied.
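To make that concrete, here is a minimal Python sketch of the same 8-bit values being read two ways. It assumes the legacy Cyr fonts follow the Windows-1251 layout and that the "roman" reading is Windows-1252; neither is confirmed for these particular fonts, but the pairing does reproduce the example word from the original post.

```python
# Assumption: the legacy Cyr font uses the Windows-1251 layout,
# and the western fallback font uses Windows-1252.
word = "Оглавление"

raw = word.encode("cp1251")          # the 8-bit values stored for the text
as_western = raw.decode("cp1252")    # what a roman font shows for those bytes
as_cyrillic = raw.decode("cp1251")   # what the Cyr overlay font shows

print(raw.hex(" "))   # ten single-byte values
print(as_western)     # accented western letters
print(as_cyrillic)    # the Russian word again
```

The bytes never change; only the font-dependent interpretation does.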
FM8 and later, including your FM10, support Unicode (UTF-8, specifically), in which every character, from every script, planet-wide, has a unique code point (UTF-8 stores each one as one to four bytes; Cyrillic letters take two). No overlays. No confusion about whether a given byte sequence is supposed to be a D or a gamma. The only issue is whether or not the font has that code point populated with outlines.
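As a side note on byte counts, UTF-8 is a variable-length encoding, which a quick Python check shows. The sample characters below are picked only for illustration.

```python
# Sample characters chosen for illustration only.
samples = ["D", "γ", "О", "€", "\U0001F600"]

# Number of bytes each character occupies when encoded as UTF-8.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s) in UTF-8")
```

The code point is what identifies the character; the byte length is just a property of the encoding.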
FM apparently continues to support legacy overlay fonts, as long as the font is loaded on the machine during edit (and this is as it has always been).
The real concern here is: for FM8 and later, was any legacy-to-Unicode conversion supposed to be happening when you open a legacy document in a Unicode FM?
I'm guessing that it doesn't happen, because there would have to be some dialog controlling how to map the legacy font to some arbitrary Unicode font. FM would have to know what the language (glyph map) was for the old font, and would have to verify that each code point exists in the new Unicode font - seems unlikely.
We haven't needed to do this, so I have no direct experience with how it is supposed to work (I'm only watching the issue so we handle our non-roman special character usage in a portable way). This Adobe FM forum doesn't seem to go back to the FM8 transition period, and thus hasn't much content on this issue. There may be some work-arounds. Possibly FM gets a little more magical if your PC is set to Russian when you open the old doc in FM10.
The whole problem is that it saves the same garbage in the MIF file, so it's only readable inside FrameMaker, when using the old fonts. I've tried to 're-code' the weird characters (the Îãëàâëåíèå stuff), but without any luck.
> The whole problem is that it saves the same garbage in the MIF file, so it's only readable inside Framemaker, when using the old fonts.
Yep, and that's the case for all legacy codepage apps and their documents. This is a major part of why Unicode was invented.
> I've tried to 're-code' the weird characters (the Îãëàâëåíèå stuff), but without any luck.
Those aren't weird. They are merely the roman/western glyphs for the 8-bit code points of what are Russian characters in a legacy codepage Cyrillic font.
You need a code page to Unicode converter. There are commercial products for that. There may be free ones, or even web pages into which you can paste legacy text, select the language, and copy UTF-8 back out. Never having needed to do this, the foregoing is already more than I know.
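For text that has already been copied out as the western-looking string, a converter of that kind can be sketched in a few lines of Python. This is only a sketch: the function name is made up for illustration, and the code pages (Windows-1252 for what is displayed, Windows-1251 for what was meant) are assumptions that happen to fit the example word in this thread.

```python
def legacy_to_unicode(text, shown_as="cp1252", meant_as="cp1251"):
    """Re-interpret text typed in a code-page font (hypothetical helper).

    shown_as: code page that produced the characters currently visible.
    meant_as: code page the legacy font was really laid out for.
    Both defaults are assumptions, not confirmed for the Cyr Upright fonts.
    """
    # Go back to the original 8-bit values, then read them as Cyrillic.
    return text.encode(shown_as).decode(meant_as)


repaired = legacy_to_unicode("Îãëàâëåíèå")
print(repaired)
```

Plain ASCII passes through unchanged, but genuine accented western text would be converted too, so this should only be applied to runs that were tagged with the Cyrillic font.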
Monitor this thread for a few days. Perhaps some old hands who went through all of this back at FM8 will recall the process.
You could try a utility like this: http://www.codeproject.com/Articles/18393/CodePage-File-Converter
However, it wants a text file, so you'll need to re-tag afterwards. The alternatives are to use brute-force find/change to replace the old FrameRoman Cyrillic code point values with Unicode ones, or to create a script (ExtendScript or FrameScript) that essentially does the same thing as the manual approach. A royal PITA....
AFAIK, there never was any techdoc or blog entry created by Adobe on how to handle legacy non-codepage 1252 documents when migrating to the Unicode versions starting with FM8.
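If you go the find/change route, the list of pairs does not have to be typed by hand. Below is a rough Python sketch that generates it, assuming the on-screen western characters come from Windows-1252 and the legacy font follows Windows-1251 (both assumptions). It works on characters as they appear on screen; how FrameMaker escapes them inside a MIF file is not covered here.

```python
def build_find_change_table(shown_as="cp1252", meant_as="cp1251"):
    """Map each western character to the Cyrillic one sharing its byte value.

    Hypothetical helper; both code pages are assumptions.
    """
    table = {}
    for byte in range(0x80, 0x100):
        raw = bytes([byte])
        try:
            western = raw.decode(shown_as)
            cyrillic = raw.decode(meant_as)
        except UnicodeDecodeError:
            continue  # byte value is undefined in one of the code pages
        table[western] = cyrillic
    return table


table = build_find_change_table()
fixed = "Îãëàâëåíèå".translate(str.maketrans(table))
print(fixed)
```

Each entry in the table is one find/change pair, so the same data could drive a script inside FrameMaker instead of a manual pass.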