
Japanese characters work as URL parameters but turn into question marks in the URL path itself

Guest
May 18, 2006

I'm having some trouble getting ColdFusion to see Japanese characters in the URL string.

To clarify, if I have something like this:

http://my.domain.com/index.cfm?categorylevel0=Search&categorylevel1=%E3%82%A2%E3%82%B8%E3%82%A2%E3%8...

All of my code works correctly, and the server is able to pass the Japanese characters to the database and retrieve the correct data.

If I have this instead:

http://my.domain.com/index.cfm/Search/%E3%82%A2%E3%82%B8%E3%82%A2%E3%83%BB%E3%83%93%E3%82%B8%E3%83%8...

My script (which works fine with English characters) parses the CGI variables and converts them into the same URL parameters as in the first URL, using a loop and a CFSET on url.categorylevelN.

In the first example, the CF debug info shows what I expect to see:

URL Parameters:
CATEGORYLEVEL0=Search
CATEGORYLEVEL1=アジア・ビジネス開発

In the second example, it shows this:

URL Parameters:
CATEGORYLEVEL0=Search
CATEGORYLEVEL1=???·??????

Can anyone suggest a way to debug this? I'm not sure whether this is a CF problem, an IIS problem, a JRun problem, or something else altogether that causes the characters to be lost when they are in the URL path but NOT in a query parameter.
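A minimal way to start debugging is to reproduce the symptom outside ColdFusion. Since CFMX 7 runs on the JVM, the Java sketch below converts the category text into a single-byte character set that cannot represent it; each unmappable character comes back as "?", the same pattern as in the debug output. ISO-8859-1 is an assumed stand-in for whatever non-Unicode code page is involved, and the class name is illustrative. Plain Java turns all ten characters into "?", while the debug output kept "·" for the katakana middle dot, which may hint at a best-fit character mapping somewhere in the chain.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyConversionDemo {
    // Encodes text into the given charset and decodes it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for ISO-8859-1) for every character it cannot map.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "アジア・ビジネス開発", written with Unicode escapes so the source
        // compiles the same regardless of the compiler's default encoding.
        String category = "\u30A2\u30B8\u30A2\u30FB\u30D3\u30B8\u30CD\u30B9\u958B\u767A";

        // A Unicode encoding preserves the text: prints "true".
        System.out.println(roundTrip(category, StandardCharsets.UTF_8).equals(category));
        // A single-byte charset does not: prints ten question marks.
        System.out.println(roundTrip(category, StandardCharsets.ISO_8859_1));
    }
}
```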
TOPICS
Advanced techniques

LEGEND
May 18, 2006

maxgsilverscape wrote:
> I'm having some trouble getting coldfusion to see japanese characters in the
> URL string.

What version of CF? What encoding are you using? Did you remember to use the
setEncoding() function?
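For URL parameters, setEncoding("url", "utf-8") tells ColdFusion which character set to use when it turns percent-encoded bytes back into text. The Java sketch below illustrates the idea with java.net.URLDecoder (the Charset overload needs Java 10 or later): the same bytes decode to three katakana characters as UTF-8, but to nine unrelated Latin-1 characters as ISO-8859-1. It illustrates the concept only and is not how ColdFusion implements setEncoding. Note that a wrong decoding charset produces mojibake like this, not question marks.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeCharsetDemo {
    // Percent-decodes a query-string value using the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String encoded = "%E3%82%A2%E3%82%B8%E3%82%A2"; // "アジア" as UTF-8 bytes
        String utf8 = decode(encoded, StandardCharsets.UTF_8);
        String latin1 = decode(encoded, StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length());   // 3: one character per katakana
        System.out.println(latin1.length()); // 9: one character per byte
    }
}
```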

Guest
May 18, 2006

Thanks for responding. This is CFMX 7 with the most recent updater. I did use setEncoding, and as I said, it works like a charm on URL parameters, but it doesn't seem to have any effect when it's done the other way.

I think the issue might be that you can't run setEncoding on the CGI scope, only on the URL and form scopes, so when I run code like this:


<!--- Turn the path segments after the script name into url.categorylevelN,
      e.g. /index.cfm/Search/X gives categorylevel0=Search, categorylevel1=X --->
<cfset cgipathinfo = cgi.SCRIPT_NAME & cgi.path_info>
<cfset query_string_length = Len(cgipathinfo)-Len(CGI.SCRIPT_NAME)>

<cfif query_string_length neq 0>
  <cfset query_string = Right(cgipathinfo, query_string_length)>
  <cfset i = 0>
  <cfloop list="#query_string#" delimiters="/" index="currentcatname">
    <cfset "url.categorylevel#i#" = currentcatname>
    <cfset i = i + 1>
  </cfloop>
</cfif>


It mangles the data while looping through cgi.path_info. I'm not sure how to get around that, though.
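If the raw, still percent-encoded path were available to the page, each segment could be decoded explicitly as UTF-8 instead of relying on whatever conversion produced cgi.path_info. The Java sketch below mirrors the loop above under that assumption: it strips the script name, splits on "/", skips empty segments (as CFML list loops do), and decodes each segment into a categorylevelN entry. The class and method names are hypothetical, and in this thread the server only exposed the already-mangled PATH_INFO.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class PathParams {
    // Hypothetical helper: maps "/index.cfm/Search/%E3..." to
    // {categorylevel0=Search, categorylevel1=...}, counting from 0 like the CFML loop.
    static Map<String, String> pathToParams(String rawPath, String scriptName) {
        Map<String, String> params = new LinkedHashMap<>();
        if (!rawPath.startsWith(scriptName)) {
            return params; // not a path-style request for this script
        }
        String pathInfo = rawPath.substring(scriptName.length());
        int i = 0;
        for (String segment : pathInfo.split("/")) {
            if (segment.isEmpty()) {
                continue; // CFML list functions ignore empty elements too
            }
            // In a path, '+' is a literal plus, but URLDecoder treats it as a
            // space (form encoding), so protect it before decoding.
            String value = URLDecoder.decode(segment.replace("+", "%2B"), StandardCharsets.UTF_8);
            params.put("categorylevel" + i, value);
            i++;
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(pathToParams("/index.cfm/Search/%E3%82%A2", "/index.cfm").keySet());
    }
}
```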

LEGEND
May 18, 2006

maxgsilverscape wrote:
> Thanks for responding... this is CFMX7 with the most recent updater. I did use
> setEncoding and like I said, it works like a charm on URL parameters but it
> doesn't seem to have any effect on it when its done the other way.

OK, I think I'm catching on. I've seen this on occasion with UTF-8 file names.
Try writing out a CF page named in Japanese (or any other non-ASCII script) on
an en_US server; when you try to access that file via CF, you should see
something strange like "Either the Macromedia application server is
unreachable or it does not have a mapping to process this request." I *think*
it's an interaction between the CF server and IIS. I never really looked hard
for a solution; let me dig into this.

Guest
May 19, 2006

I've been researching this quite a bit over the last day or two, and I'm leaning toward this being impossible on an en_US server. The server assumes the request is for a file (understandably) and can't process it, even though we are not actually requesting a file but are passing URL parameters separated by slashes to create search-engine-friendly URLs. I'm not sure why it can pass query-string parameters along in UTF-8 but not the path part of the URL, but that seems to be the case all the same. Still open to suggestions if anyone has them.
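One check on that theory: on the wire, both the query string and the path are plain ASCII, because each Japanese character is sent as percent-encoded UTF-8 bytes. The Java sketch below uses java.net.URI on shortened placeholder versions of the two URLs to show that both parts decode to the same text, and that the path-style URL has no query component at all. This suggests, without proving it, that the characters are lost in a server-side conversion rather than in the request itself.

```java
import java.net.URI;

class RawVsDecoded {
    // getPath()/getQuery() decode %XX escapes as UTF-8;
    // getRawPath()/getRawQuery() return the ASCII form sent on the wire.
    static String decodedPath(String url) { return URI.create(url).getPath(); }
    static String decodedQuery(String url) { return URI.create(url).getQuery(); }
    static String rawQuery(String url) { return URI.create(url).getRawQuery(); }

    public static void main(String[] args) {
        // Placeholder URLs shaped like the two in the question ("アジア" only).
        String queryStyle = "http://my.domain.com/index.cfm?categorylevel1=%E3%82%A2%E3%82%B8%E3%82%A2";
        String pathStyle = "http://my.domain.com/index.cfm/Search/%E3%82%A2%E3%82%B8%E3%82%A2";
        String asia = "\u30A2\u30B8\u30A2";

        System.out.println(decodedQuery(queryStyle).endsWith(asia)); // true
        System.out.println(decodedPath(pathStyle).endsWith(asia));   // true
        System.out.println(rawQuery(pathStyle));                     // null: no query string
    }
}
```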

LEGEND
May 19, 2006

maxgsilverscape wrote:
> I've been researching this quite a bit in the last day or two, and I'm leaning
> toward this being impossible on an en_US server. It assumes that the request is

Cheap test: can you serve a UTF-8-named .htm file via IIS?

Guest
May 19, 2006

Funny, I expected this not to work, but I just tried it and it served the page perfectly. Now I'm really confused about where the characters are getting lost.

LEGEND
May 19, 2006

maxgsilverscape wrote:
> Funny, I expected this not to work but just tried it and it served the page
> up perfectly. Now I'm really confused as to where the characters are getting
> lost.

Can you serve a similarly named CF file? If not, then it looks like CF. I'll ask
the CF team, I guess.

Guest
May 19, 2006

Changing it to a .cfm breaks it, and the error gives some more insight into where the problem is:

Exceptions

15:46:33.033 - java.io.IOException - in : line -1
The filename, directory name, or volume label syntax is incorrect


I recall there being lots of issues with Unicode in file I/O on Sun's JVM, and I'm not sure whether they were ever resolved. I'm assuming not, since this server runs a fairly new JVM.

LEGEND
May 19, 2006

maxgsilverscape wrote:
> 15:46:33.033 - java.io.IOException - in : line -1
> The filename, directory name, or volume label syntax is incorrect

No, I've used Java before to do recursive directory deletes (in CF6), and one
requirement was to handle UTF-8-named files and directories, which it did. It's
probably something about the way CF handles this.

Community Expert
May 20, 2006

> My script (which works fine with English characters) parses CGI variables and converts these to the same URL parameters that I had in the first URL using a loop and a CFSET url.etc..

I suspect the problem lies here: the characters and URL parameters you produce are not in an encoding compatible with Japanese. To be sure, could we see your code?

Guest
May 22, 2006

You bet. The third post in this topic has the exact code I'm using for that.

If it's helpful, here it is again:


<cfset cgipathinfo = cgi.SCRIPT_NAME & cgi.path_info>
<cfset query_string_length = Len(cgipathinfo)-Len(CGI.SCRIPT_NAME)>

<cfif query_string_length neq 0>
  <cfset query_string = Right(cgipathinfo, query_string_length)>
  <cfset i = 0>
  <cfloop list="#query_string#" delimiters="/" index="currentcatname">
    <cfset "url.categorylevel#i#" = currentcatname>
    <cfset i = i + 1>
  </cfloop>
</cfif>

It's not surprising that it doesn't work at this step, since the debug info shows the CGI variables like this:

CGI Variables:
HTTP_ACCEPT=text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
HTTP_ACCEPT_ENCODING=gzip,deflate
HTTP_ACCEPT_LANGUAGE=en-us,en;q=0.5
PATH_INFO=/Search/???·??????
QUERY_STRING=

Note the question marks in PATH_INFO and the blank QUERY_STRING.

Community Expert
May 22, 2006

I would use cgi.query_string directly and modify the ambiguous statement <cfset "url.categorylevel#i#" = currentcatname>. Do things get better with this?

<cfif Len(cgi.query_string) neq 0>
  <cfset categorylevel = StructNew()>
  <cfset i = 1>
  <cfloop list="#cgi.query_string#" delimiters="/" index="currentcatname">
    <cfset categorylevel["#i#"] = currentcatname>
    <cfset i = i + 1>
  </cfloop>
</cfif>


Added edit: this needs correcting; see my next posts.


Guest
May 22, 2006

Thanks for the fast response, but no dice. It doesn't get past that first if statement: cgi.query_string seems to lose everything, not just the UTF-8 characters.

Community Expert
May 22, 2006

> It doesn't get past that first if statement

Please clarify.

Guest
May 22, 2006

Sure. I just meant that it doesn't set anything at all because of this if:

<cfif Len(cgi.query_string) neq 0>
...
</cfif>

The length of cgi.query_string is 0, so it never gets past this.

Community Expert
May 23, 2006

My suggestion was that you test with the first URL, not the second. However, I can see a source of confusion: I overlooked your delimiter, "/". For a query string, it should be "&" (between pairs) and "=" (within each pair). With these modifications, we get:

<cfif Len(cgi.query_string) neq 0>
  <cfset i = 1>
  <!--- Pairs are separated by "&", names from values by "=" --->
  <cfloop list="#cgi.query_string#" delimiters="&" index="currentcatname">
    <cfoutput>categorylevel#i# = #ListGetAt(currentcatname,2,"=")#</cfoutput><br>
    <cfset i = i + 1>
  </cfloop>
</cfif>

If it is a failing of ColdFusion, the above test should fail, too.

Now, an adaptation of the same test to your second URL:

<cfset url2 = " http://my.domain.com/index.cfm/Search/%E3%82%A2%E3%82%B8%E3%82%A2%E3%83%BB%E3%83%93%E3%82%B8%E3%83%8...

<cfset query_str = ListGetAt(replacenocase(url2,".cfm/","?"),2,"?")>
<cfif Len(query_str) neq 0>
  <cfset i = 1>
  <cfloop list="#query_str#" delimiters="/" index="currentcatname">
    <cfoutput>categorylevel#i# = #currentcatname#</cfoutput><br>
    <cfset i = i + 1>
  </cfloop>
</cfif>
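The first of the two tests above (splitting the query string into pairs) can be sketched in Java as well. This version splits on "&" between pairs and on the first "=" within a pair, then decodes names and values as UTF-8. Splitting on the first "=" also handles an empty value such as categorylevel2=, which ListGetAt(currentcatname,2,"=") would likely reject because CFML list functions skip empty elements. The class name is illustrative, and URLDecoder's Charset overload needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParams {
    // Splits "a=1&b=2" into an ordered map, decoding names and values as UTF-8.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("categorylevel0=Search&categorylevel2=").keySet());
    }
}
```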

Guest
May 23, 2006

Ah, OK. I ran this test on both URLs, and it worked fine using the first URL as the one I actually accessed the page with and cfset'ing the second URL just like you did. The problem seems to happen earlier than that, with the data getting mangled before ColdFusion sets the CGI variables. CGI.PATH_INFO would be the only way to access the URL in a search-engine-friendly URL scenario, right? Unfortunately, it is always this: PATH_INFO=/Search/???·??????

Community Expert
May 23, 2006

So, the Japanese doesn't stick. Would be interesting to see what the CF team makes of it.

Explorer
Feb 25, 2007

Hi
Did anyone get to the bottom of this? I have a similar problem, but with the actual file path, not the URL string. I have directories and files in Russian.

On my dev PC (XP, using the built-in JRun web server), this works fine.
On my live server (W2k3, IIS), I get this error:

java.io.IOException: The system cannot find the file specified
at java.io.WinNTFileSystem.canonicalize0(Native Method)

It's basically because CF is seeing:
/path/??????/index.cfm rather than /path/россия/index.cfm, and obviously a ? is an illegal character in a file path.
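The failure can be reproduced as a string operation: once the folder name has been through a lossy conversion, the path contains "?", which Windows does not allow in file or folder names. The Java sketch below assumes ISO-8859-1 as a stand-in for the non-Unicode code page and checks each path segment against the Windows-reserved characters < > : " | ? * (slashes are treated as separators). It only inspects strings and never touches the file system.

```java
import java.nio.charset.StandardCharsets;

class WindowsPathCheck {
    // Characters Windows rejects inside a single file or folder name.
    private static final String RESERVED = "<>:\"|?*";

    // True if any segment of a '/' or '\' separated path contains a reserved character.
    static boolean hasReservedChars(String path) {
        for (String segment : path.split("[/\\\\]")) {
            for (char c : segment.toCharArray()) {
                if (RESERVED.indexOf(c) >= 0) {
                    return true;
                }
            }
        }
        return false;
    }

    // Simulates the lossy conversion: unmappable characters become '?'.
    static String lossy(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String path = "/path/\u0440\u043E\u0441\u0441\u0438\u044F/index.cfm"; // "/path/россия/index.cfm"
        System.out.println(lossy(path));                   // /path/??????/index.cfm
        System.out.println(hasReservedChars(path));        // false
        System.out.println(hasReservedChars(lossy(path))); // true
    }
}
```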

Note that it's not the fault of Windows or IIS: if I rename index.cfm to index.htm, it works fine.

I guess it's something not quite right with the IIS->CF connector.

I strongly suspect that unencoded Cyrillic characters in a URL are illegal anyhow, so I shall probably translate it back to English.
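On the legality question: a URI may only contain ASCII characters, so non-ASCII text such as Cyrillic is normally sent percent-encoded as UTF-8 bytes, which is what the first URL in this thread does with its Japanese. The Java sketch below produces that encoded form with java.net.URI; the host is the placeholder used earlier in the thread, and the helper name is illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AsciiUri {
    // Builds a URI from components and returns its ASCII (on-the-wire) form.
    // toASCIIString() percent-encodes non-ASCII characters as UTF-8 bytes.
    static String toAscii(String host, String path) {
        try {
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/path/\u0440\u043E\u0441\u0441\u0438\u044F/index.cfm"; // "/path/россия/index.cfm"
        // http://my.domain.com/path/%D1%80%D0%BE%D1%81%D1%81%D0%B8%D1%8F/index.cfm
        System.out.println(toAscii("my.domain.com", path));
    }
}
```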

Thoughts?

--
Adam

LEGEND
Feb 25, 2007

> It's basically because CF is seeing:
> /path/??????/index.cfm rather than /path/??????/index.cfm, and obviously a ?
> is an illegal character in a file path.

Err... OK, the Cyrillic characters didn't make it through to the news feed,
it seems (it's OK on the web UI though). Imagine the second path there has
the word "Russia" in Russian, instead of question marks ;-)

--
Adam

LEGEND
Feb 25, 2007

Adam Cameron wrote:
> I strongly suspect that having unencoded Cyrillic characters in the URL is
> probably illegal anyhow, so I shall probably translate it back to English.

Check what the JVM is using as its default file encoding (and its other default encodings).
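A few lines of Java will show what the JVM picked up as its defaults. file.encoding is a long-standing system property; sun.jnu.encoding, which Sun/Oracle JVMs use for file names and similar platform strings, is implementation-specific and may be absent on other JVMs, so the sketch below reports a missing property as "(not set)". The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class JvmEncodings {
    // Collects the JVM's default encodings; a property that is not set
    // on this JVM is reported as "(not set)".
    static Map<String, String> report() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("defaultCharset", Charset.defaultCharset().name());
        for (String key : new String[] {"file.encoding", "sun.jnu.encoding"}) {
            info.put(key, System.getProperty(key, "(not set)"));
        }
        return info;
    }

    public static void main(String[] args) {
        report().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```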
