Index characters

Fredrik Ekman ekman at lysator.liu.se
Sun Dec 18 02:54:19 CET 1994


After having followed the debate on different character encoding 
standards that I started a few days ago, I have decided that I personally 
prefer Latin-1, for the following reasons:

1) ISO 646 looks ugly. Yes, I can read it too, but I do not like it.

2) ISO 646 messes up the comments.

3) Using ISO 646 will result in the same character (|) being used for the 
Swedish ö and the Norwegian ø, for example, even though the characters 
are actually different. This is not a major problem, since the different 
languages will probably not be mixed in the same files, but anyway.

4) Latin-1 is the standard for WWW documents. I can see a future where 
the database will be immediately accessible for searching through HTML 
forms.

5) As Bror pointed out, Latin-1 is used by MS-Windows, Amiga and many X 
implementations (and also by Linux, I believe). MS-DOS has its own 
"standard" and must be converted anyway. What about Mac?

6) Mail transfer is a problem, but it can be solved by using uuencode 
(that goes for FTP-mail too) or MIME.

7) I have the impression that ISO 646 is slowly being phased out.

8) I believe that ISO 646 is local to the Nordic countries and possibly 
Germany. What if, say, someone wants to index French or Italian comics? 
That would be solved automatically if we decided on Latin-1. Latin-1 
still excludes most Slavic languages, however.
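
To make point 3 concrete, here is a rough Python sketch of how the same 
7-bit text reads differently depending on which national character set 
the reader assumes. The table and function names are made up for this 
example, and the tables cover only the six letter positions, not the 
complete national standards.

```python
# Hypothetical tables for illustration: only the bracket/brace positions
# that the Swedish and Norwegian 7-bit sets reuse for national letters.
SWEDISH_7BIT = {"[": "Ä", "\\": "Ö", "]": "Å", "{": "ä", "|": "ö", "}": "å"}
NORWEGIAN_7BIT = {"[": "Æ", "\\": "Ø", "]": "Å", "{": "æ", "|": "ø", "}": "å"}


def decode_7bit(text, table):
    """Replace the reused ASCII positions with the letters they stand for."""
    return text.translate(str.maketrans(table))


raw = "f|r"  # identical characters in a Swedish and a Norwegian file

swedish = decode_7bit(raw, SWEDISH_7BIT)      # Swedish reading of the text
norwegian = decode_7bit(raw, NORWEGIAN_7BIT)  # Norwegian reading of the text

# In Latin-1 the two letters get separate codes, so nothing is lost.
print(swedish.encode("latin-1"), norwegian.encode("latin-1"))
```

The point is only that the 7-bit file does not say which reading is 
meant, whereas the Latin-1 file does.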

I think that this covers most of the points that have been up for 
discussion and some that have not. Further comments?

Anders wrote, about me changing his a, a and o:
>You are aware that this means quite a lot of work, right? That's the
>unfortunate consequence of me starting to index before such a standard
>was settled. Sorry 'bout that.

I know. But I am a very stubborn kind of person. Much could be gained by 
doing careful "search & replace" for common words. The rest will just 
have to be gone through manually, one title at a time. I own very few of 
those books, but few ambiguities should be encountered, and I can always 
ask about those I do find. Could you send me what you have now? Then I 
can see what to do about it during Christmas.
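
The "search & replace" pass could look something like the Python sketch 
below. It assumes (my guess, not something confirmed above) that the 
existing entries spell the Swedish letters as plain a and o. The word 
lists are tiny made-up samples, and the function name is hypothetical. 
Words that could be read more than one way are not touched, only 
reported, so they can be checked by hand.

```python
import re

# Hypothetical sample lists; a real pass would need far more words and
# should only be run on Swedish titles ("for" and "over" are also English).
COMMON_WORDS = {"for": "för", "pa": "på", "fran": "från", "over": "över"}
AMBIGUOUS = {"ar"}  # could be "är" (is) or "år" (year): leave for a human

WORD = re.compile(r"[A-Za-z]+")


def restore(title):
    """Return (new_title, flagged_words) for one title."""
    flagged = []

    def fix(match):
        word = match.group(0)
        key = word.lower()
        if key in AMBIGUOUS:
            flagged.append(word)
            return word
        new = COMMON_WORDS.get(key)
        if new is None:
            return word  # unknown word: keep as is, check manually later
        # Keep an initial capital if the original word had one.
        return new.capitalize() if word[0].isupper() else new

    return WORD.sub(fix, title), flagged


print(restore("Kalle Anka pa nya aventyr"))
print(restore("Ett ar for sent"))
```

Anything outside the word list ("aventyr" in the first title) is left 
alone, which is why the remaining titles still have to be gone through 
one at a time.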

One final point that seems to be confusing some people: When transferring 
files through FTP, setting ASCII mode does not affect the individual 
characters. Only the end-of-line characters are changed. This, of course, 
applies only to transfers over the FTP protocol, not to files sent by 
mail.
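
As a simplified model of the ASCII-mode behaviour described above, the 
Python sketch below rewrites line endings and nothing else. The function 
names are made up, and it assumes the FTP client and server pass all 
eight bits through untouched, which a particular implementation might 
not do.

```python
def to_wire(data: bytes) -> bytes:
    """Sender side: normalise local line endings to CRLF for the transfer."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def from_wire(data: bytes, local_eol: bytes = b"\n") -> bytes:
    """Receiver side: turn CRLF into whatever the local system uses."""
    return data.replace(b"\r\n", local_eol)


# One Latin-1 line with a Unix-style line ending.
original = "Kalle Anka på äventyr\n".encode("latin-1")

unix_copy = from_wire(to_wire(original))            # received on Unix
dos_copy = from_wire(to_wire(original), b"\r\n")    # received on MS-DOS

# The bytes for å and ä are the same in both copies; only the line
# ending differs.
print(unix_copy, dos_copy)
```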

  /F


