Index characters
Fredrik Ekman
ekman at lysator.liu.se
Sun Dec 18 02:54:19 CET 1994
After having followed the debate on different character encoding
standards that I started a few days ago, I have decided that I personally
prefer Latin-1, for the following reasons:
1) ISO-646 looks ugly. Yes, I can read it too, but I do not like it.
2) ISO-646 messes up the comments.
3) Using ISO-646 will result in the same character (|) being used for the
Swedish ö and the Norwegian ø, for example, even though the characters
are actually different. This is not a major problem since the different
languages probably will not be mixed in the same files, but anyway.
4) Latin-1 is the standard for WWW documents. I can see a future where
the database will be immediately accessible for searching through HTML
forms.
5) As Bror pointed out, Latin-1 is used by MS-Windows, Amiga and many X
implementations (and also by Linux, I believe). MS-DOS has its own
"standard" and must be converted anyway. What about Mac?
6) Mail transfer is a problem, but it can be solved by using uuencode
(that goes for FTP-mail too) or MIME.
7) I have the impression that ISO-646 is, slowly, being phased out.
8) I believe that ISO-646 is local to the Nordic countries and possibly
Germany. What if, say, someone wants to index French or Italian comics?
That would be solved automatically if we decided on Latin-1. Latin-1
still excludes most Slavic languages, however.
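To make point 3 concrete, here is a minimal Python sketch of why a file in one of the 7-bit national variants cannot be converted without knowing which country it came from. The two tables cover only the six letter positions, as far as I know them for the Swedish and Norwegian variants; the names SWEDISH, NORWEGIAN and to_latin1 are made up for this example.

```python
# Partial 7-bit national-variant tables (assumed; only the six letter positions).
SWEDISH = str.maketrans("[\\]{|}", "ÄÖÅäöå")
NORWEGIAN = str.maketrans("[\\]{|}", "ÆØÅæøå")


def to_latin1(text, table):
    """Translate 7-bit text with the given variant table, then encode as Latin-1."""
    return text.translate(table).encode("latin-1")


# The same 7-bit string comes out as different Latin-1 bytes per language.
swedish_bytes = to_latin1("sm|r", SWEDISH)      # Swedish "smör"
norwegian_bytes = to_latin1("sm|r", NORWEGIAN)  # Norwegian "smør"
```

The drawback is visible in the tables themselves: any genuine bracket, brace or bar in the text would be converted as well, so a real conversion would need some care.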
I think that this covers most of the points that have been up for
discussion and some that have not. Further comments?
Anders wrote, about me changing his å, ä and ö:
>You are aware that this means quite a lot of work, right? That's the un-
>fortunate consequence of me starting to index before such a standard
>was settled. Sorry 'bout that.
I know. But I am a very stubborn kind of person. Much could be gained by
doing careful "search & replace" for common words. The rest will just
have to be gone through manually, one title at a time. I own very few of
those books, but few ambiguities should be encountered, and I can always
ask about those I do find. Could you send me what you have now? Then I
can see what to do about it during Christmas.
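The "search & replace" step could look roughly like the Python sketch below. It assumes, purely for illustration, that the titles are written with the 7-bit letters ({, | and } and their upper-case counterparts); I have not seen the actual files, so the word list COMMON_WORDS and the function convert_title are hypothetical. Known words are replaced, and anything else containing a suspect character is collected for manual checking.

```python
import re

# Hypothetical replacement list: spelling found in the file -> Latin-1 spelling.
COMMON_WORDS = {"p}": "på", "f|r": "för", "{r": "är"}

# Characters that may stand for a national letter and so need attention.
SUSPECT = set("[\\]{|}")

# A token is a run of ASCII letters and suspect characters.
TOKEN = re.compile(r"[A-Za-z\[\\\]{|}]+")


def convert_title(title):
    """Return (converted title, list of tokens left for manual review)."""
    unresolved = []

    def fix(match):
        word = match.group(0)
        if word in COMMON_WORDS:
            return COMMON_WORDS[word]
        if SUSPECT & set(word):
            unresolved.append(word)  # not in the list: leave it for a human
        return word

    return TOKEN.sub(fix, title), unresolved
```

Matching whole words only, and case-sensitively, keeps the replacement "careful": nothing is changed unless it is in the list.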
One final point that seems to be confusing some people: When transferring
files through FTP, setting ASCII mode will not affect the transfer of the
individual characters. Only end-of-line characters will be changed. This,
of course, holds only for transfer through FTP, not when
using mail.
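As a rough model of what I mean, the Python sketch below mimics an ASCII-mode transfer from a Unix machine. It assumes an implementation that passes all eight bits through untouched, which is what I have seen in practice but cannot promise for every server; the function name ascii_mode_transfer is made up.

```python
def ascii_mode_transfer(data: bytes, receiver_eol: bytes = b"\n") -> bytes:
    """Model an FTP ASCII-mode transfer from a Unix sender (assumed 8-bit clean)."""
    on_wire = data.replace(b"\n", b"\r\n")         # sender: local LF -> CRLF
    return on_wire.replace(b"\r\n", receiver_eol)  # receiver: CRLF -> local EOL


# A Latin-1 line and a 7-bit line, received on a machine that uses CRLF.
received = ascii_mode_transfer(b"sm\xf6r\nsm|r\n", receiver_eol=b"\r\n")
```

Only the line endings differ between the input and `received`; the byte for ö (0xF6) and the bar character arrive unchanged.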
/F