5 Replies Latest reply on May 23, 2013 1:07 PM by johnrellis

    Anybody have some unicode string handling code I could use?

    areohbee Level 5

      Text handling plugins are not working for some users, since Lua and Lr are not on the same page char-representation-wise.

       

      For example, if you enter

       

      À

       

      in Lightroom, it's getting interpreted as

       

      À

       

      in Lua.

       

      In a non-plugin environment, one could just use slnunicode; in a plugin, however, one needs a pure-Lua solution.

       

      Anybody?

       

      ref: http://lua-users.org/wiki/LuaUnicode
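For the "pure Lua" part of the question, here is a minimal sketch of one common idiom: splitting a UTF-8 string into characters with a byte-range pattern. It assumes the strings are well-formed UTF-8 and that Lua 5.1 string patterns are available. `utf8_chars` is a hypothetical helper name, not part of the Lightroom SDK or slnunicode.

```lua
-- One UTF-8 character: a lead byte (ASCII, or 0xC2-0xF4) followed by
-- any number of continuation bytes (0x80-0xBF).
local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: return the characters of a UTF-8 string as a list.
-- No validation is done; malformed input gives unspecified splits.
local function utf8_chars(s)
   local chars = {}
   for ch in string.gmatch(s, UTF8_CHAR) do
      chars[#chars + 1] = ch
   end
   return chars
end

local title = "x\195\128y"      -- "xÀy" written out as UTF-8 bytes
local chars = utf8_chars(title)
print(#title, #chars)           -- byte count (4) vs. character count (3)
```

The point of the sketch is only that `#s` and `string.sub` count bytes, so any per-character logic has to step over multi-byte sequences explicitly.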

       

      R

        • 1. Re: Anybody have some unicode string handling code I could use?
          johnrellis Most Valuable Participant

          I'm not sure I understand the example issue you raised.  You've seen before what I learned two years ago:

           

          http://forums.adobe.com/message/3251706#3251706

           

          It should be the case that Lua can properly handle and store Unicode strings coming and going from the LR SDK?

          • 2. Re: Anybody have some unicode string handling code I could use?
            areohbee Level 5

            Try

             

            Debug.pause( "uni-find", string.find( photo:getFormattedMetadata( 'title' ), 'À' ) )

             

            when title field has an 'À' in it.

             

            Instead of seeing the coordinates of the 'À' that is there, you'll see nil.

             

            PS - this code looks promising (?) found at http://forums.gaspowered.com/viewtopic.php?f=19&t=29879

             

               function conv2utf8(unicode_list)
                  local result = ''
                  local w,x,y,z = 0,0,0,0
                  local function modulo(a, b)
                     return a - math.floor(a/b) * b
                  end
                  for i,v in ipairs(unicode_list) do
                     if v ~= 0 and v ~= nil then
                        if v <= 0x7F then -- same as ASCII
                           result = result .. string.char(v)
                        elseif v >= 0x80 and v <= 0x7FF then -- 2 bytes
                           --[[
                           y = (v & 0x0007C0) >> 6
                           z = v & 0x00003F
                           ]]--
                           y = math.floor(modulo(v, 0x000800) / 64)
                           z = modulo(v, 0x000040)
                           result = result .. string.char(0xC0 + y, 0x80 + z)
                        elseif (v >= 0x800 and v <= 0xD7FF) or (v >= 0xE000 and v <= 0xFFFF) then -- 3 bytes
                           --[[
                           x = (v & 0x00F000) >> 12
                           y = (v & 0x000FC0) >> 6
                           z = v & 0x00003F
                           ]]--
                           x = math.floor(modulo(v, 0x010000) / 4096)
                           y = math.floor(modulo(v, 0x001000) / 64)
                           z = modulo(v, 0x000040)
                           result = result .. string.char(0xE0 + x, 0x80 + y, 0x80 + z)
                        elseif (v >= 0x10000 and v <= 0x10FFFF) then -- 4 bytes
                           --[[
                           w = (v & 0x1C0000) >> 18
                           x = (v & 0x03F000) >> 12
                           y = (v & 0x000FC0) >> 6
                           z = v & 0x00003F
                           ]]--
                           w = math.floor(modulo(v, 0x200000) / 262144)
                           x = math.floor(modulo(v, 0x040000) / 4096)
                           y = math.floor(modulo(v, 0x001000) / 64)
                           z = modulo(v, 0x000040)
                           result = result .. string.char(0xF0 + w, 0x80 + x, 0x80 + y, 0x80 + z)
                        end
                     end
                  end
                  return result
               end

             

            or maybe this: (?)

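            function unichr(ord)
               -- Returns an escaped text form (e.g. \u00c0), not UTF-8 bytes.
               if ord == nil then return nil end
               -- control characters: two-digit hex escape
               if ord < 32 then return string.format('\\x%02x', ord) end
               -- printable ASCII (32..126) passes through unchanged
               if ord < 127 then return string.char(ord) end
               -- rest of the Basic Multilingual Plane: four hex digits
               if ord < 0x10000 then return string.format("\\u%04x", ord) end
               -- supplementary planes up to U+10FFFF: eight hex digits
               if ord <= 0x10FFFF then return string.format("\\u%08x", ord) end
            end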

            from http://stackoverflow.com/questions/7780179/what-is-the-way-to-represent-a-unichar-in-lua
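Both snippets above go from code points to text. For searching and comparing, the opposite direction is often what is needed, so here is a rough sketch of a decoder that turns a UTF-8 string into a list of code points. `utf8_to_codepoints` is a hypothetical name, and the sketch assumes well-formed input: it does not check continuation bytes, overlong forms, or surrogates.

```lua
-- Hypothetical helper: decode well-formed UTF-8 into a list of code points.
local function utf8_to_codepoints(s)
   local result = {}
   local i = 1
   while i <= #s do
      local b = string.byte(s, i)
      local cp, extra
      if b < 0x80 then
         cp, extra = b, 0            -- 1 byte (ASCII)
      elseif b < 0xE0 then
         cp, extra = b - 0xC0, 1     -- 2-byte sequence
      elseif b < 0xF0 then
         cp, extra = b - 0xE0, 2     -- 3-byte sequence
      else
         cp, extra = b - 0xF0, 3     -- 4-byte sequence
      end
      for j = 1, extra do
         -- each continuation byte contributes its low 6 bits
         cp = cp * 64 + (string.byte(s, i + j) - 0x80)
      end
      result[#result + 1] = cp
      i = i + extra + 1
   end
   return result
end

local cps = utf8_to_codepoints("x\195\128y")
print(cps[1], cps[2], cps[3])   -- 120, 192 (U+00C0, 'À'), 121
```

It uses only arithmetic, like the `modulo` trick in the first snippet, since Lua 5.1 has no bitwise operators.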

             

            R

            • 3. Re: Anybody have some unicode string handling code I could use?
              johnrellis Most Valuable Participant

              Hmm, I'm seeing something different.  I tried the following from within a Debug window:

               

              photo:getFormattedMetadata ("title") =>

              "xÀy"

               

              string.find (photo:getFormattedMetadata ("title"), "À") =>

              2

              3

               

              which is the right answer. Perhaps I'm misunderstanding?

              • 4. Re: Anybody have some unicode string handling code I could use?
                areohbee Level 5

                I figured out the difference - I was entering the character 'À' into my text editor which encodes it differently than when you enter it into a Lr text field.

                 

                They look the same, but alas: they are *not* the same.

                 

                In the former case, it's a single-byte 192 (decimal), in the latter case it's a double-byte: 195 & 128 (decimal).
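To make the byte difference concrete, here is a small sketch. Bytes 195, 128 are the UTF-8 encoding of 'À' (U+00C0), while a single byte 192 is what a Latin-1 style single-byte encoding would store; that the text editor used such an encoding is an assumption based on the byte value. `latin1_to_utf8` is a hypothetical helper, and it treats bytes 128-159 as Latin-1 control codes, so it is not a correct converter for Windows-1252 punctuation in that range.

```lua
local title = "x\195\128y"   -- "xÀy" with 'À' as the two bytes 195, 128
local typed = "\192"         -- 'À' as the single byte 192

-- Plain (non-pattern) search: byte 192 never occurs in the title.
print(string.find(title, typed, 1, true))                   -- nil

-- Hypothetical helper: re-encode a Latin-1 string as UTF-8.
local function latin1_to_utf8(s)
   return (string.gsub(s, "[\128-\255]", function(c)
      local b = string.byte(c)
      -- code points 0x80-0xFF always take exactly two UTF-8 bytes
      return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
   end))
end

print(string.find(title, latin1_to_utf8(typed), 1, true))   -- 2  3
```

The second search returns the same 2, 3 reported earlier in the thread, because both strings are then in the same encoding.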

                 

                Thanks again John.

                 

                Turns out this wild goose chase has led me full circle: my client said it (search and replace of fields with accented characters) wasn't working, so I assumed it wasn't working because of text handling, and tested that theory: sure enough, it wouldn't work (wrong)... Actually, it works just fine (not sure what my client had been smoking, or how he entered the character being searched for...) - this story still unfolding...

                 

                Cheers,

                Rob

                • 5. Re: Anybody have some unicode string handling code I could use?
                  johnrellis Most Valuable Participant

                  In the former case, it's a single-byte 192 (decimal), in the latter case it's a double-byte: 195 & 128 (decimal).

                  Ack, how frustrating.  I've gotten myself tied into knots in similar situations, trying to understand how various tools are encoding characters.  (Almost as bad as the situation with metadata date/time :->)