Encoding issues


17 replies to this topic

#1 Goofy

Goofy

    Advanced Member

  • Super Mod
  • 8,435 posts
  • Gender:Male
  • Location:GoofyLand


  • Extension Developer: Yes
  • Extensions: BabelZillaMenu-BabelZilla Glossary-OpenTran...
  • Translator for French (fr)
  • My OS Gnu/Linux
  • Translation Credits to Goofy

Posted 08 September 2005 - 10:31 PM

Hello all,

Like everyone, I still run into encoding problems from time to time. Even if experience lets me avoid most of them, I am very ignorant about ANSI, UTF-8 (for which I have a superstitious fervor) and all the other weird things.
Sometimes I am proud to have found the bug after spending hours hunting for an "&" in a locale file...
Sometimes I am ashamed of having destroyed the ja-JP locale of a BabelZillian friend, just by opening and closing a file in PSPad set to UTF-8.

So... some of you seem to know a lot about this, some others are real hackers, some are dedicated to code sorcery...

So please post your tips, tricks and knowledge here!



Think Global, Make Locales!


Sometimes I am on irc://moznet/BabelZilla
but you can also drop a word in the shoutbox

#2 victory

victory

    Advanced Member

  • Members
  • 237 posts


  • Extension Developer: No
  • Translator for [No translator]

Posted 09 September 2005 - 09:09 AM

This is just a passing idea, though.

When we include Japanese characters in *.properties files,
we should encode those files with native2ascii (or something like it) after editing and before zipping.
Couldn't you use this to add sentences to multi-byte files?

native2ascii -encoding utf-8 native-file ascii-file
native2ascii -reverse ascii-file native-file

If you are only appending to the end of the file, you can use a shell redirect, of course.
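For readers without a JDK at hand, the kind of conversion native2ascii performs can be sketched in Python. This is only an illustration of the idea, not the tool itself: the helper names to_ascii_escapes and from_ascii_escapes are made up for this example, and the sketch ignores edge cases such as a literal backslash placed in front of a "u".

```python
import re

def to_ascii_escapes(text):
    # Hypothetical helper: replace each non-ASCII character with \uXXXX escapes.
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Escapes are written per UTF-16 code unit, so a character beyond
        # U+FFFF becomes two escapes (a surrogate pair).
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

def from_ascii_escapes(text):
    # Hypothetical helper: turn \uXXXX escapes back into characters.
    decoded = re.sub(r"\\u([0-9a-fA-F]{4})",
                     lambda m: chr(int(m.group(1), 16)), text)
    # Re-join any surrogate pairs produced by the substitution above.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16")

line = "greeting=こんにちは"
escaped = to_ascii_escapes(line)
print(escaped)                              # ASCII-only form of the line
print(from_ascii_escapes(escaped) == line)  # the round trip restores the text
```

The escaped form is plain ASCII, so it survives being opened and saved by an editor set to almost any encoding, which is the point of running the conversion just before zipping.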


Other options?
Use a binary editor to merge in your sentences, together with your favorite text editor :-)
Try xyzzy :-)

In fact, I used a binary editor when I edited http://www.bugzilla.org/download/
because that file contains numerous characters.

#3 Luana

Luana

    (Just a passionate localizer since 2004!)

  • Admin
  • 3,478 posts
  • Gender:Female
  • Location:(Previously known as MatrixIsAllOver, now simply Luana; I'm from the past, like a ghost...)
  • Interests:Let me dream one better world...<br />Let me believe that we can begin together from here...<br />Let me the time in order to see the volunteers of all international Forums, without distinctions, here for this project!


  • Extension Developer: No
  • Translator for Italian (it)
  • Translation Credits to it: Luana Di Muzio - BabelZilla

Posted 10 September 2005 - 02:59 PM

QUOTE(Goofy @ Sep 10 2005, 15:11)
- Dear MatrixIsAllOver, I have often read your messages about using ANSI, and I never really tried it. Not that I don't trust you, but I thought it was compulsory to use UTF-8. Can you confirm that nothing strange happens to the extension when your locale is in ANSI? (I suppose that applies to the .properties file; what about the contents.rdf and the .dtd?)


Dear Goofy,
I can confirm that the method quoted below works really well.
QUOTE
Let me try to explain better with an Italian example:
Suppose I have to translate "it will open". I start my translation in UTF-8 encoding, so I write "aprirà", but before saving the file I change the encoding to ANSI -> "aprirÃ ", and this method works fine!


NOTE: ANSI is 8-bit, and not Unicode!
E.g., in the example above, "it will open" becomes "aprirà" in UTF-8 encoding, "aprirÃ " in ANSI, and "aprir\u00E0" in * Unicode.
* I use Unicode (with PSPad) when I have to localize .properties files!
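The three forms in the example can be reproduced with a few lines of Python. This is a minimal sketch that assumes "ANSI" here means Windows-1252 (cp1252), the usual Western European Windows code page; the thread does not say which code page PSPad actually used.

```python
word = "aprirà"

# The same text as bytes in each 8-bit-oriented encoding.
utf8_bytes = word.encode("utf-8")    # à takes two bytes: 0xC3 0xA0
ansi_bytes = word.encode("cp1252")   # à takes one byte: 0xE0 (cp1252 is an assumption)

# UTF-8 bytes displayed as if they were cp1252: 0xC3 shows as Ã and
# 0xA0 as a non-breaking space, which looks like "aprirÃ ".
misread = utf8_bytes.decode("cp1252")

# Escaped Unicode, written by hand because Python's built-in
# "unicode_escape" codec would emit \xe0 rather than \u00E0 here.
escaped = "".join(c if ord(c) < 128 else "\\u%04X" % ord(c) for c in word)

print(utf8_bytes)
print(ansi_bytes)
print(repr(misread))
print(escaped)
```

Seen this way, the text displayed after switching the editor view to ANSI is not a new encoding of the word; it is the unchanged UTF-8 byte sequence shown through a different code page.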


Naturally, you can only confirm it for yourself by trying it too.

Regards
It is better to be silent and to be, than to speak and not to be (Ignatius, 2nd century AD)
May what is committed to writing be of benefit not only to the present but also to the future...

#4 Goofy

Posted 10 September 2005 - 03:41 PM

Thank you for explaining, but I am so very goofy, as you know, that some points remain unclear to me.

1. For .dtd files, I always use PSPad set once and for all to UTF-8, and no encoding problem ever occurs with our weird characters: I just write "mêmes les élèves garçons mangeront du maïs où ils veulent", save it as it is, and it shows up without problems when the extension runs.
Is it for .dtd files that you use ANSI? (I suspect I did not understand anything once again.)

2. For .js and .properties files, I use just the same UTF-8 setting, BUT (as you know) I convert the special characters just before saving with a special file (see attached zip below) that I have patched once and for all into the PSPad application folder.
Is it in this case that ANSI can be useful?

Or else did you mean that the very recent builds of Firefox require ANSI for extensions?

Help, I am in a pretty kettle of fish!

Attached Files



#5 Luana

Posted 10 September 2005 - 05:20 PM

QUOTE(Goofy @ Sep 10 2005, 16:41)
Is it for .dtd files that you use ANSI? (I suspect I did not understand anything once again.)

Exactly.
For *.dtd files I use UTF-8 encoding while I translate, but I switch to ANSI encoding when I save the files.

QUOTE(Goofy @ Sep 10 2005, 16:41)
For .js and .properties files, can ANSI be useful?

Absolutely not!
For these files I use Unicode encoding too.


QUOTE(Goofy @ Sep 10 2005, 16:41)
Or else did you mean that the very recent builds of Firefox require ANSI for extensions?

Not "very recent builds"...
I use this "method" to localize into Italian every extension that is compatible with both the FF Aviary builds (1.0.x) and FF DP alpha1/alpha2 and DP beta1.

#6 Goofy

Posted 10 September 2005 - 05:45 PM

QUOTE("The very patient Miao")
For *.dtd files I use UTF-8 encoding while I translate, but I switch to ANSI encoding when I save the files.


Good, but... I still cannot figure out WHY you switch to ANSI, because UTF-8 alone is sufficient for me and needs no further trick. What if you do NOT save in ANSI? Do you get parsing errors? I don't when I leave the file in UTF-8.

Sorry for insisting; I must be missing something somewhere, and maybe everybody knows it but me.

#7 Pedro

Pedro

    eXtenZilla IT Member

  • Members
  • 711 posts
  • Gender:Male
  • Location:Ferrara - Italia


  • Extension Developer: No
  • Translation Credits to Luca Pedrazzi - www.extenzilla.org

Posted 10 September 2005 - 06:11 PM

QUOTE("Goofy")
Sorry for insisting; I must be missing something somewhere, and maybe everybody knows it but me.


You are not the only one! This discussion is clearing up some doubts that I had about encoding.
Are you sure that I am a translator? Have you seen my English?
Goofy's corrections © inside. The dog with the glasses has come back.

#8 Luana

Posted 10 September 2005 - 06:38 PM

QUOTE(Goofy @ Sep 10 2005, 18:45)
What if you do NOT save in ANSI?

I don't know whether this behaviour depends on the keyboard (I'm not sure at all, but I read somewhere that French keyboards differ from Italian ones [OT]is that true?[/OT]), but if I don't switch to ANSI encoding I get an XML error.

I think that tittoproject can confirm this strange behaviour.


#9 Goofy

Posted 11 September 2005 - 08:42 AM

@Pedro: my previous post and your reaction have been deleted because they were too irrelevant and, though not intended that way at all, might be considered humor in bad taste. Sorry for being goofy.

@Miao: I don't know if the keyboard has anything to do with it. Maybe it is the encoding chosen in our browser (it is quite possible I am writing something silly here, who cares). Mine is "Occidental (ISO-8859-1)".

If you mean the ability to type accented characters directly from the keys, it is true that my French (?) keyboard has é, è, ç, à and ù, plus the diaeresis as on ï and the circumflex as on ê, î, û and ô.
Is there anything like that on an "Italian" keyboard?

#10 Luana

Posted 11 September 2005 - 09:00 AM

QUOTE(Goofy @ Sep 11 2005, 09:42)
If you mean the ability to type accented characters directly from the keys, it is true that my French (?) keyboard has é, è, ç, à and ù, plus the diaeresis as on ï and the circumflex as on ê, î, û and ô.
Is there anything like that on an "Italian" keyboard?


No, on the Italian keyboard we don't have ç, ï, ê, î, û or ô.
* * *
And I don't think that this strange UTF-8/ANSI behaviour depends on the browser encoding, because I use the same one as you, and I suppose tittoproject uses the same one too.

#11 Goofy

Posted 11 September 2005 - 09:09 AM

QUOTE("TheVeryPatientMatrixIsAllOver")
No, on the Italian keyboard we don't have ç, ï, ê, î, û or ô.


Oh yes, I am too silly; I should have known that, considering you have no words with these characters!

* * *
QUOTE
And I don't think that this strange UTF-8/ANSI behaviour depends on the browser encoding, because I use the same one as you, and I suppose tittoproject uses the same one too.

Oh. I am pleased to see we have some common ground (in spite of our different keyboard layouts).

To get back to serious matters (which is pretty difficult these days, I reckon), I still don't understand why I should switch to ANSI. Does it have anything to do with the OS selected in my PSPad settings (Format menu)? Mine is always set to DOS and not to UNIX.

#12 Goofy

Posted 11 September 2005 - 09:12 AM

QUOTE("MIAO")
we don't have ç, ï, ê, î, û or ô


But then how did you type them in this very post? Copy/paste?

- Anything about the OS selection in PSPad? (This seems to be another goofy idea of mine.)

#13 xavivars

xavivars

    Advanced Member

  • Members
  • 62 posts
  • Location:Benissa - País Valencià
  • Extension Developer: No
  • Translation Credits to Xavi Ivars - Softcatalà

Posted 12 September 2005 - 09:41 AM

I'm not sure, but maybe MIAO has to change the encoding to ANSI because the editor doesn't save UTF-8 correctly. By changing the encoding, she forces the editor to save with that encoding (notice that just before saving the text file, the character à is displayed as in "aprirÃ ", which is exactly what you see when you view a UTF-8 file in a non-UTF-8 editor).

Another thing is that I never save .properties files with "escaped Unicode"; I always save them as UTF-8, and I haven't had any problems. Could someone explain to me why?

#14 Goofy

Posted 12 September 2005 - 09:54 AM

QUOTE
I never save .properties files with "escaped Unicode"; I always save them as UTF-8, and I haven't had any problems. Could someone explain to me why?


I am unable to explain this other mystery. It must depend on the configuration of the editor you are using.

QUOTE
I'm not sure, but maybe MIAO has to change the encoding to ANSI because the editor doesn't save UTF-8 correctly. By changing the encoding, she forces the editor to save with that encoding (notice that just before saving the text file, the character à is displayed as in "aprirÃ ", which is exactly what you see when you view a UTF-8 file in a non-UTF-8 editor).

I see, that is very likely. The strange point for me is that this forced encoding (which I must use for .properties or .js files) is unnecessary for me with .dtd files, yet it is necessary for Miao and tittoproject, who use the same editor (PSPad) as I do. (I am just wondering what different settings we have; of course I have not the smallest doubt about the effectiveness of the ANSI encoding as described above.)

#15 Ptit Lutin

Ptit Lutin

    Tech Admin

  • Admin
  • 901 posts


  • Extension Developer: No
  • Translator for French (fr)

Posted 26 September 2005 - 05:48 PM

Axel Hecht, the l10n coordinator, has the same trouble as we do:

http://www.axel-hech...ves/000190.html

According to the Java spec, the correct encoding for property files is ISO 8859-1, but Mozilla supports UTF-8 too.
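When it is unclear which of the two encodings a given .properties file was saved in, one common heuristic is to try a strict UTF-8 decode first and fall back to ISO 8859-1. The sketch below assumes nothing about how Mozilla itself reads these files; the function name decode_properties is hypothetical.

```python
def decode_properties(raw):
    # Hypothetical helper: return (text, encoding_name) for the raw bytes
    # of a .properties file.
    try:
        # Accented ISO 8859-1 text is almost never valid UTF-8, so a
        # successful strict decode is a strong hint the file is UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is defined in ISO 8859-1, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"

for data in ("menu=Édition".encode("utf-8"),
             "menu=Édition".encode("iso-8859-1")):
    text, encoding = decode_properties(data)
    print(encoding, text)
```

A file containing only ASCII is reported as UTF-8, which is harmless because ASCII text decodes identically in both encodings.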

#16 Sil

Sil

    Member

  • Members
  • 13 posts
  • Location:Hungary
  • Extension Developer: No
  • Translator for Hungarian (hu-HU)

Posted 26 September 2005 - 10:05 PM

QUOTE(Ptit Lutin @ Sep 26 2005, 18:48)
According to Java spec, the right encoding of property-files should be ISO 8859-1 but Mozilla supports UTF-8 too.


I think it's better (or at least much easier) to work with a file that is entirely in escaped Unicode than with a file in ISO-8859-1 encoding mixed with some characters in escaped Unicode (for those characters that don't exist in ISO-8859-1). For example, Hungarian has characters like ő and ű (o and u with two strokes on them) that have no equivalents in ISO-8859-1, unlike the other special Hungarian characters (á, é, í, ó, ö, ú and ü). If I save the .properties files in escaped Unicode, I won't have any encoding issues and don't have to worry about the accents. Those who use a Western European language never even run into the difficulties that a mixed encoding would present (and I think that's why the specifications suggest it this way), because the ISO-8859-1 encoding was tailored specifically for them.
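The difference between the two approaches can be made concrete with a short Python sketch. Both helper names are invented for this illustration, and the sketch only covers characters up to U+FFFF, which is enough for Hungarian.

```python
def escape_outside_latin1(text):
    # "Mixed" style: keep everything ISO-8859-1 can represent
    # (code points below 256) and escape only the rest.
    return "".join(c if ord(c) < 256 else "\\u%04x" % ord(c) for c in text)

def escape_all_non_ascii(text):
    # "Entirely escaped" style: escape every non-ASCII character.
    return "".join(c if ord(c) < 128 else "\\u%04x" % ord(c) for c in text)

sample = "ő, ű, á"
mixed = escape_outside_latin1(sample)
uniform = escape_all_non_ascii(sample)

print(mixed)    # á stays literal, so the file must still be saved as ISO-8859-1
print(uniform)  # pure ASCII, identical bytes in ISO-8859-1 and UTF-8
```

The fully escaped output contains only ASCII, so it no longer matters which encoding an editor uses when it saves the file.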

BTW, when I started translating extensions, I didn't know much about encodings, I simply followed the instructions at mozdev.org. (And even today I'm using UniRed for editing, because that's what I'm used to.)

#17 victory

Posted 28 September 2005 - 05:59 AM

Unlike languages derived from Latin, Japanese characters (and of course Chinese and Korean ones too, I think) are completely different, so we cannot edit files in "escaped Unicode" format.
As I said above, we use native2ascii to convert them after editing is finished.
I hate this step, so I use a semi-automated script to avoid it.
Anyway, Latin characters pass through as-is, so an author's editor should be able to handle at least Latin characters correctly :-p

Additionally, authors who don't have a public repository (a real repository, or a web-based one) should provide diffs between the previous version and the latest version :-p
My own work is done in svn, so I can see the diffs anyway :-)

#18 victory

Posted 05 October 2005 - 11:06 PM

try this
http://www.omegat.or...gat/omegat.html

