I'm struggling a little with this UTF-8 topic currently. I can sympathize with your several painful hours now. :-)
1) Can you (or somebody else) reproduce the following issue? (Win 8.1, LR 5.6)
Suppose your photos are stored in a directory with a UTF-8 encoded name such as c:\users\username\Pictøäöüש (the last letter being the Hebrew letter shin). (This is my test case after users from Norway and Israel reported problems.) When I call:
local picName = selectedPhoto:getRawMetadata ("path")
I get the wrong result:
If I use, on the other hand, getFormattedMetadata:
outputToLog (selectedPhoto:getFormattedMetadata ("folderName") .. " and " .. selectedPhoto:getFormattedMetadata ("fileName"))
I get a correct result (but not the full pathname):
Pictøäöüש and 7L6B7931.CR2
Going from there, I could probably figure out the full path name (which does not seem to be offered in getFormattedMetadata), but I would like to figure out what's wrong with selectedPhoto:getRawMetadata ("path").
2) The following is more for reference: I cannot seem to pass the previews.db path name to sqlite if the path of previews.db (the LR catalog path) contains non-ASCII UTF-8 characters. (Other commands with UTF-8 characters work well on the command line.) chcp 65001 doesn't help. sqlite is supposed to accept UTF-8 characters in the db name, but somehow doesn't (at least my version, which is somewhat older). I have worked around this by first cd-ing to the directory and then starting sqlite, i.e. along the lines of "cd <previews-dir> && sqlite3 previews.db". This seems to work so far, although some new issues have come up and I don't know yet whether they are related to this.
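For reference, the workaround can be sketched in plain Lua as a small string builder. The helper name buildSqliteCommand and the example directory are made up for illustration, and the "/d" switch (so that cd also changes the drive) is my own addition to the command quoted above. The resulting string would be handed to whatever function the plugin uses to launch commands.

```lua
-- Hypothetical helper: build the "cd first, then open by bare file name"
-- command line, so the non-ASCII directory never appears as a sqlite3 argument.
local function buildSqliteCommand(previewsDir, dbName)
  -- "cd /d" also switches the drive if previewsDir is on another volume.
  return 'cd /d "' .. previewsDir .. '" && sqlite3 "' .. dbName .. '"'
end

-- Example with a made-up directory; "\195\184" is the UTF-8 encoding of "ø".
print(buildSqliteCommand("C:\\Pict\195\184\\cat Previews.lrdata", "previews.db"))
```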
Re 1): I can't reproduce the problem on LR 5.6 / Windows 8.1. Here's what photo:getRawMetadata() returns for me:
When I log the result to a file and then examine it with Sublime 2, I see the expected answer:
Perhaps the problem you're observing is somewhere between the call to your function outputToLog() and the text editor you're using to examine the log file. Even in 2014, Unicode is an unnatural act for much software.
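One way to take the text editor out of the picture is to log the raw bytes of the string rather than the string itself. Below is a minimal sketch in plain Lua; hexBytes is a hypothetical helper, not part of the SDK, and in a plugin you would pass it the value returned by getRawMetadata ("path").

```lua
-- Hypothetical helper: return the bytes of a string as space-separated hex,
-- so the encoding can be checked without relying on an editor or dialog.
local function hexBytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02X", string.byte(s, i))
  end
  return table.concat(out, " ")
end

-- "ø" is U+00F8, which UTF-8 encodes as the two bytes C3 B8.
print(hexBytes("Pict\195\184"))  -- 50 69 63 74 C3 B8
```

If the "ø" in the path shows up as C3 B8 in the log, the string is well-formed UTF-8 and the damage is most likely happening on the display side.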
Re 2): Though you said you used "chcp 65001", you didn't post an example .bat file, and this does smell like a problem with cmd.exe and its antiquated concept of code pages. A couple of things to narrow this down:
- Try another shell, e.g. Cygwin's bash, to invoke sqlite3.exe. If it runs under that shell, then the problem is related to cmd.exe.
- Use the Windows 7 "run" command to run the sqlite3 command line. (On Windows 7, you type "run" into the Start search box; I forget the details of how you do it on Windows 8.) I don't believe the "run" command goes through cmd.exe, so it could avoid cmd.exe's issues with code pages.
- Rather than opening the database by passing it on the command line, write all the sqlite commands to a temporary file, beginning with a ".open" command that names the database, and invoke sqlite3 with:
sqlite3 < tempfile
You'll need a newer version of sqlite3 that has the ".open" command.
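The temporary-file approach could look roughly like this in plain Lua. The helper names and the example path are hypothetical. I convert backslashes to forward slashes so that no question of backslash escaping arises inside the quoted ".open" argument; that is a precaution on my part, not something I have verified against every sqlite3 version.

```lua
-- Hypothetical helper: build the text of a sqlite3 script that opens the
-- database via ".open" instead of taking it from the command line.
local function buildSqliteScript(dbPath, statements)
  -- Forward slashes work on Windows and avoid backslashes inside the quotes.
  local path = dbPath:gsub("\\", "/")
  local lines = { '.open "' .. path .. '"' }
  for _, stmt in ipairs(statements) do
    lines[#lines + 1] = stmt
  end
  return table.concat(lines, "\n") .. "\n"
end

-- Hypothetical helper: write the script as raw bytes, exactly as given.
local function writeFile(path, text)
  local f = assert(io.open(path, "wb"))
  f:write(text)
  f:close()
end

-- Example with a made-up catalog path; "\195\184" is UTF-8 for "ø".
local script = buildSqliteScript(
  "C:\\Pict\195\184\\previews.db",
  { "SELECT count(*) FROM sqlite_master;" })
print(script)
-- writeFile("tempfile", script)   -- then run: sqlite3 < tempfile
```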
re 1) Many thanks for checking this. It was indeed a problem with the text editor not showing the result correctly; I didn't expect Notepad not to handle UTF-8 by default (Windows is not my native platform and I haven't used it much for a couple of years). Worse, LrDialogs.message gives the wrong output too, which at the time I had taken as confirmation. I have now checked with Sublime 2 and it looks good. One problem solved...
re 2) I'll get back to you later on this.
You mention using a .bat file in the context of chcp 65001. In my earlier unsuccessful attempts, I simply passed "chcp 65001 && sqlite ... ", assuming that this would switch the codepage before passing the sqlite UTF-8 parameter. Now I'm thinking that maybe the command still gets passed with the old codepage, and thus the UTF-8 is mangled. Is this why you are referring to a batch file?
I simply passed "chcp 65001 && sqlite ... ",
Do you mean you passed that string to LrTasks.call()? A couple of thoughts:
- It may be that the command line is completely parsed before it is executed, so by the time chcp executes, the rest of the command line has already been interpreted as ASCII rather than UTF-8.
- I'm not sure how LrTasks.call() executes its command line, and in particular, whether it is passing a UTF-8 string or an ASCII string to cmd.exe. In the past, when I've wanted complete control over how a command line gets executed on Windows, I've written a temporary batch file with "chcp 65001" as the first line and then executed that.
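A rough sketch of the temporary batch file idea in plain Lua follows. The helper name buildBatchFile, the file name run_sqlite.bat, and the example command are all made up for illustration. Lua writes the bytes of the string as they are, so a UTF-8 command line ends up as UTF-8 in the file.

```lua
-- Hypothetical helper: build the text of a batch file whose first line
-- switches the console code page to UTF-8 before the real command runs.
local function buildBatchFile(commandLine)
  -- CRLF line endings, as is conventional for .bat files.
  return "chcp 65001\r\n" .. commandLine .. "\r\n"
end

-- Example with a made-up command; "\195\184" is UTF-8 for "ø".
local batch = buildBatchFile('sqlite3 "C:\\Pict\195\184\\previews.db"')
print(batch)

-- To use it, write the text in binary mode and execute the resulting file:
-- local f = assert(io.open("run_sqlite.bat", "wb"))
-- f:write(batch)
-- f:close()
```

Because chcp then runs on its own line, the code page has already changed by the time cmd.exe reads the line with the sqlite3 command.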