jeffbottom

how to save a UTF-8 encoded text file ?

Jun 18, 2012 7:01 PM

Tags: #file #utf-8

hi People

 

I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac and all was good until recently when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all whack.

 

Resaving the .txt file in TextEdit with Unicode (UTF-8) encoding solved the problem; it now opens fine in Notepad.

 

But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding in myfile.encoding = "UTF8", but the resulting file is still Western (and the special characters have wigged out again).

 

any help greatly appreciated../daniel

 

 

{
    var theComp = app.project.activeItem;
    var dataRO = theComp.layer("dataRO").sourceText;
    
    // prompt user to save file
    var theFile = new File ("~/Desktop/"+ theComp.name + "_output.txt");
    theFile = theFile.saveDlg("Save an ASCII export file.");
 
    if (theFile != null) {          // check user didn't cancel dialog
        theFile.lineFeed = "windows";
        //theFile.encoding = "UTF8";
        theFile.open("w","TEXT","????");
        theFile.writeln("move details:");
        theFile.writeln(dataRO.value.toString());
        theFile.close();            // close only when a file was actually opened
    }
}
 
 
Replies
    Jun 18, 2012 9:51 PM   in reply to jeffbottom

    Have you tried setting the encoding after you open the file?

     

    Dan

     
    Jun 19, 2012 6:44 PM   in reply to jeffbottom

    Hi,

     

    I remember working hard two years ago on getting a correct text file out of OS X, but I don't remember whether it was a UTF-8 case or something else. My home computer is not a Mac, so I have no means to test it tonight, but here is the gist of it:

     

    var theFile = new File(.........);
    theFile.open("w", "TEXT");
    theFile.encoding = "BINARY";
    theFile.lineFeed = "Unix";    // the property name is lineFeed, with a capital F
    theFile.writeln("éàçËôù");
    theFile.close();

     

    Let me know if it is working.

     
    Jun 20, 2012 7:18 AM   in reply to jeffbottom

    Hi, I was just looking at how text software works out the encoding of a file, and I found this on Wikipedia: http://en.wikipedia.org/wiki/Byte_order_mark

    So I created a UTF-8 file in Notepad and looked at the binary. At the start of the file there are these bytes: 0xEF, 0xBB, 0xBF (they show up as ï»¿ when read as Latin-1).

    So you should try adding those bytes at the start of the file.

     

    var theFile = new File(.........);
    theFile.open("w", "TEXT");
    theFile.encoding = "BINARY";
    theFile.lineFeed = "Unix";
    // write the three BOM bytes 0xEF, 0xBB, 0xBF
    theFile.write("\u00EF\u00BB\u00BF"); // or theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
    theFile.write("Your stuff éàçËôù");
    theFile.close();
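
As a rough illustration of the byte order mark idea above, here is a minimal sketch in plain JavaScript. The names `UTF8_BOM`, `startsWithUtf8Bom` and `withUtf8Bom` are made up for this example and are not part of ExtendScript. It assumes the text is held as a "byte string", one character per byte, which is how the BINARY encoding is being used in this thread.

```javascript
// Hypothetical helpers (not an ExtendScript API): the UTF-8 byte order mark
// held as a three-character "byte string", one character per byte.
var UTF8_BOM = String.fromCharCode(0xEF, 0xBB, 0xBF);

// True when the byte string already begins with the three BOM bytes.
function startsWithUtf8Bom(byteString) {
    return byteString.length >= 3 && byteString.substring(0, 3) === UTF8_BOM;
}

// Prepend the BOM unless it is already there, so it is never written twice.
function withUtf8Bom(byteString) {
    return startsWithUtf8Bom(byteString) ? byteString : UTF8_BOM + byteString;
}
```

The result of `withUtf8Bom(...)` is what would be handed to `theFile.write(...)`. Note that the BOM only labels the file as UTF-8; the text after it still has to be real UTF-8 bytes.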

     
    Jun 21, 2012 5:39 PM   in reply to jeffbottom

    Hi,

     

    Got it, it seems. The UTF-8 standard uses two or more bytes to encode accented and special characters.
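
To make the "two or more bytes" point concrete, here is a small sketch in plain JavaScript. The function name `utf8ByteCount` is hypothetical; it simply applies the standard UTF-8 range boundaries to a Unicode code point.

```javascript
// Hypothetical helper: how many bytes UTF-8 needs for one Unicode code point.
function utf8ByteCount(codePoint) {
    if (codePoint < 0x80) return 1;     // plain ASCII
    if (codePoint < 0x800) return 2;    // accented Latin letters, the ring and degree signs
    if (codePoint < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                           // everything above U+FFFF
}

// A few characters relevant to this thread, written as code points.
var samples = [
    0x41,    // A
    0xE9,    // é
    0x2DA,   // ˚ (ring above)
    0xB0,    // ° (degree sign)
    0x20AC   // € (euro sign)
];
```

So "A" stays one byte, while é, ˚ and ° each become two bytes and € becomes three. That is why a file written with one byte per character cannot be valid UTF-8 once it contains such characters.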

     

    I found some info, with code, here: http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php

    However, there were some errors in it, so I fixed them. (I didn't test it for 3- and 4-byte characters, though, so you may have to change the 0xbf back to 0x3f, or something else.)

     

    So here is the code.

     

     


    // Convert one character (one UTF-16 code unit) to a string of UTF-8 bytes.
    function convertCharToUTF(character){
        var utfBytes = "";
        var c = character.charCodeAt(0);
        // In the continuation bytes, 0x80 | (x & 0xBF) yields the same value as
        // 0x80 | (x & 0x3F): the extra bit that 0xBF keeps is the one 0x80 sets anyway.
        if (c < 0x80) {
            utfBytes = String.fromCharCode (c);
        }
        else if (c < 0x800) {
            utfBytes  = String.fromCharCode (0xC0 | c>>6);
            utfBytes += String.fromCharCode (0x80 | c & 0xBF);
        }
        else if (c < 0x10000) {
            utfBytes  = String.fromCharCode (0xE0 | c>>12);
            utfBytes += String.fromCharCode (0x80 | c>>6 & 0xBF);
            utfBytes += String.fromCharCode (0x80 | c & 0xBF);
        }
        else if (c < 0x200000) {
            // charCodeAt() returns 16-bit code units, so this branch is not reached
            // when the function is called from convertStringToUTF below.
            utfBytes  = String.fromCharCode (0xF0 | c>>18);
            utfBytes += String.fromCharCode (0x80 | c>>12 & 0xBF);
            utfBytes += String.fromCharCode (0x80 | c>>6 & 0xBF);
            utfBytes += String.fromCharCode (0x80 | c & 0xBF);
        }
        return utfBytes;
    }

    // Convert a whole string, one character at a time.
    function convertStringToUTF(stringToConvert){
        var utfString = "";
        for (var i = 0 ; i < stringToConvert.length; i++){
            utfString = utfString + convertCharToUTF(stringToConvert.charAt (i));
        }
        return utfString;
    }

     

    var theFile = new File("~/Desktop/_output.txt");
    theFile.open("w", "TEXT");
    theFile.encoding = "BINARY";
    theFile.lineFeed = "Unix";
    // write the three BOM bytes 0xEF, 0xBB, 0xBF
    theFile.write("\u00EF\u00BB\u00BF"); // or theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
    theFile.write(convertStringToUTF("Your stuff éàçËôù"));
    theFile.close();
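
The 4-byte case mentioned above as untested can be covered by combining surrogate pairs before encoding. What follows is only a sketch in plain JavaScript, not the code from the linked page: the name `stringToUtf8Bytes` is hypothetical, and it assumes the same "one character per byte" output that the BINARY write expects.

```javascript
// Hypothetical helper: encode a JavaScript string as a UTF-8 "byte string"
// (each character of the result has a code in the range 0..255).
function stringToUtf8Bytes(text) {
    var bytes = "";
    for (var i = 0; i < text.length; i++) {
        var cp = text.charCodeAt(i);
        // Characters above U+FFFF are stored as a high + low surrogate pair;
        // merge the pair into a single code point and skip the second unit.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.length) {
            var low = text.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i++;
            }
        }
        if (cp < 0x80) {
            bytes += String.fromCharCode(cp);
        } else if (cp < 0x800) {
            bytes += String.fromCharCode(0xC0 | (cp >> 6),
                                         0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // An unpaired surrogate also lands here and is encoded as is.
            bytes += String.fromCharCode(0xE0 | (cp >> 12),
                                         0x80 | ((cp >> 6) & 0x3F),
                                         0x80 | (cp & 0x3F));
        } else {
            bytes += String.fromCharCode(0xF0 | (cp >> 18),
                                         0x80 | ((cp >> 12) & 0x3F),
                                         0x80 | ((cp >> 6) & 0x3F),
                                         0x80 | (cp & 0x3F));
        }
    }
    return bytes;
}
```

It would be used the same way as convertStringToUTF, for example `theFile.write(stringToUtf8Bytes("Your stuff éàçËôù"));`. For text with no characters above U+FFFF it should produce the same bytes.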

     