Hi, I'm having trouble with encoding. I thought I was using only UTF-8, as defined in my locale.
Well, I guessed that was enough, but it is not. Today I downloaded a file (and you are welcome to do so too...)
The file on my HD should be Constituiçao_Compilado.htm, but when I run 'ls' on it the ç shows up as '?'.
All right, something's wrong here. Have to figure out what's going on...
See that e7 on the first line? That's a ccedil (ç) in ISO-8859-1, not in UTF-8 as it should be. Why is it messed up?
OK, I'll try a move
Gotcha! Now c3 a7 appears, which is ccedil in UTF-8, so the downloaded name was ISO-8859-1. Geez, where's that coming from?
I know fonts are not standard on Linux, but how can I apply UTF-8 across the whole system and get rid of that annoying ISO-8859-1?
I think what I'm seeing is related to X (its configuration is not locale-related), but I had no luck in System Settings; the setting is probably somewhere else. Ideas are welcome.
Code:
$ locale
LANG=pt_BR.UTF-8
...
Code:
$ wget -k https://www.planalto.gov.br/ccivil_03/Constituicao/Constitui%E7ao_Compilado.htm
Code:
$ ls Cons*
Constitui?ao_Compilado.htm
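As far as I know, GNU ls prints '?' on a terminal for bytes it cannot display in the current locale, so a '?' in a name under a UTF-8 locale hints that the name is not valid UTF-8. A minimal sketch of a check, assuming iconv is installed; the function names is_utf8 and list_non_utf8 are made up for this example:

```shell
# Succeeds (exit status 0) when the argument is a valid UTF-8 byte sequence.
# iconv exits non-zero when it meets bytes that are illegal in the source encoding.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print every entry of directory $1 whose name is not valid UTF-8.
list_non_utf8() {
    for path in "$1"/*; do
        name=${path##*/}
        is_utf8 "$name" || printf 'not UTF-8: %s\n' "$name"
    done
}
```

Running `list_non_utf8 .` in the download directory should flag the file that ls shows with a '?'.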
Code:
$ ls Cons* | hexdump -C
00000000  43 6f 6e 73 74 69 74 75  69 e7 61 6f 5f 43 6f 6d  |Constitui.ao_Com|
00000010  70 69 6c 61 64 6f 2e 68  74 6d 0a                 |pilado.htm.|
0000001b
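To see which encoding a byte like e7 belongs to, the same character can be dumped in both encodings. This is only a sketch, assuming od, xargs and iconv are available; hex_bytes is a helper name invented here:

```shell
# Print the bytes read from stdin as space-separated hex.
hex_bytes() {
    od -An -tx1 | xargs echo
}

# \347 is the octal escape for byte 0xe7, which is ç in ISO-8859-1.
printf '\347' | hex_bytes                                   # e7
# The same character converted to UTF-8 takes two bytes.
printf '\347' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes    # c3 a7
```

The %E7 in the wget URL is that same single byte, so the saved file name carries the ISO-8859-1 form of ç.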
Code:
$ mv Constitui�ao_Compilado.htm Constituiçao_Compilado.htm
$ ls Cons*
Constituiçao_Compilado.htm
$
$ ls Cons* | hexdump -C
00000000  43 6f 6e 73 74 69 74 75  69 c3 a7 61 6f 5f 43 6f  |Constitui..ao_Co|
00000010  6d 70 69 6c 61 64 6f 2e  68 74 6d 0a              |mpilado.htm.|
0000001c
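The manual mv could be generalized to a whole directory. A rough sketch, assuming every name that is not valid UTF-8 is really ISO-8859-1 (other legacy encodings would need a different -f value); latin1_to_utf8 and fix_names are hypothetical names, not existing tools:

```shell
# Convert a file name from ISO-8859-1 bytes to UTF-8 bytes and print it.
latin1_to_utf8() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

# Rename every entry of directory $1 whose name is not already valid UTF-8.
fix_names() {
    for path in "$1"/*; do
        old=${path##*/}
        # Names that already decode cleanly as UTF-8 are left alone.
        if printf '%s' "$old" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            continue
        fi
        new=$(latin1_to_utf8 "$old")
        # -i asks before overwriting an existing file with the new name.
        mv -i -- "$path" "$1/$new"
    done
}
```

Calling `fix_names .` would then rename the downloaded file the same way the mv above did, so that its name contains c3 a7 instead of e7.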