Norwegian index
Jørgen Andreas Bangor
jorgenb at ifi.uio.no
Sat Dec 17 00:24:40 CET 1994
Norwegian Index
---------------
Thanx for your comments on my proposal for a Norwegian index. I got
a private mail from Harry (thanx!), but I think I will answer in
public, since others might be interested (yeah, right...).
My format is indeed inspired by the general format as devised by Harry.
My format:
> 1994-44 1 D91397 10 DD Stjernereporteren
> 1994-44 2 8 DD Oppdrift i manesjen
> 1994-44 3 D91409 10 Vi DD Sponsor ma man ha
> 1994-44 4 D93323 8FH Fer MM Fossiljegerne 2
> 1994-44 5 KF2.2.58 1 Fa Den maten!
Harry's format:
> 74-18b WDCD 11 3 @ DA Flyg och far
> 74-18c D 2572 4 GO Den rullande skrothogen
> 74-18d D 2645 4 LW Att grava en grop at andra
> 74-18e W WDC 189-?1 10 CB CB DD Kampa inte mot naturen
> 74-18f WDCD 12 3 @ BR Tornadorken
> 74-18g ZX 70-03-29 1 SC Ludde aker dit
> 74-18h I 846 6 US Den fantastiska tekniken [part 1]
Comments mostly to Harry:
I use 1994 instead of 94, because otherwise the format will look a bit
strange in about five years. If you (Harry) left that out to save space,
why not remove plotter and inker instead? You haven't used those columns
much. Couldn't that information be included as comments where needed?
I will change my numbering of stories to lettering. That will save space
and make searching easier.
How do you 'normalize' the story codes? I haven't looked at your code, so
I don't know why you do it, but if I'm just looking for where the story
D2572 is, it's easier to use grep on 'D2572' than 'D 2572'. I could of
course just look for '2572'...
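To show what I mean by searching without caring about the space, here is
a rough sketch in Python. It is not Harry's normalization (I haven't seen
his code); the function names normalize_code and find_story are made up
for the example, and it simply throws away all whitespace before
comparing.

```python
import re


def normalize_code(text: str) -> str:
    """Drop all whitespace so 'D 2572' and 'D2572' compare equal."""
    return re.sub(r"\s+", "", text)


def find_story(lines, code):
    """Return the index lines that contain the code, ignoring whitespace."""
    wanted = normalize_code(code)
    return [line for line in lines if wanted in normalize_code(line)]


sample = [
    "74-18c D 2572 4 GO Den rullande skrothogen",
    "74-18d D 2645 4 LW Att grava en grop at andra",
]

# Both spellings of the code find the same line.
print(find_story(sample, "D2572"))
print(find_story(sample, "D 2572"))
```

Because the whole line is squeezed together, a short code could match by
accident across two columns, so this is only good enough for a quick
lookup.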
I will standardise the creator abbreviations, and if standardising the
character abbreviations makes the index easier to use, I'll do that too.
Instead of '[part 1]' I just say '1', as Ole does. Long comments could
also make a line wider than eighty columns, which makes it difficult for
humans to read.
Too many comments can also make machine reading more difficult.
The Extras
----------
The Norwegian Extras aren't just facsimiles of old issues, although some
of them are. Of course they could be put into a file of their own, but
then I would have to look through more files to find a certain story. Oh,
maybe I'm just too lazy...
Empty columns
-------------
Maybe they should be filled with something, like '-', if the information
isn't known. That would make it easier to use, for instance, awk on them.
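A small Python sketch of why the placeholder helps. The column names in
FIELDS are only my guess at a layout (issue, position, story code, pages,
creator, character, title), and the sample line with '-' filled in is
constructed for the example; neither is a fixed part of the format.

```python
# Assumed column layout; the names are hypothetical.
FIELDS = ("issue", "position", "code", "pages", "creator", "character", "title")


def parse_line(line: str) -> dict:
    """Split an index line into named fields, treating '-' as unknown."""
    # Split on whitespace, but keep the title (last field) in one piece.
    parts = line.split(None, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return {
        name: (None if value == "-" else value)
        for name, value in zip(FIELDS, parts)
    }


# Story code and creator are unknown, so they are written as '-'.
record = parse_line("1994-44 2 - 8 - DD Oppdrift i manesjen")
print(record["pages"], record["character"], record["title"])
```

Without the '-' the same line splits into the wrong columns: the page
count would land in the story code field and the title would be cut into
pieces.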
Character standards
-------------------
1) {|}[\] is ISO-646-something, usually called ASCII. Some terminals,
or terminal emulators, display those seven bit characters as
Scandinavian letters. Even if they look Scandinavian on the screen,
they will still look like {|} etc. in other places, where they are
not translated or converted.
2) ISO-Latin-1, or ISO-8859-1. These are eight bit characters. That means
the set can hold many more characters than a seven bit set.
If a file containing eight bit characters is transferred as eight bit
(for instance on a disk, or with ftp in binary mode) to a system that
only handles seven bit text, the eighth bit will simply be stripped away.
That makes the three last characters in the Norwegian alphabet look like
'fxe'.
3) Electronic mail makes it even more difficult. SMTP, the mail protocol,
is built to transfer seven bit text. That means that an eight bit file
has to be translated to seven bit when it leaves the sender, and
translated back to eight bit when it reaches the receiver. One way to
handle this is MIME. MIME adds a couple of extra headers which tell the
receiving system that MIME is in use. Eight bit characters are then
translated to sequences of seven bit characters. The last characters in
the Norwegian alphabet (in lower case) will then be =E6 =F8 =E5. If the
receiver doesn't use MIME, they will remain that way.
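The =E6 =F8 =E5 sequences in point 3 can be reproduced with Python's
standard quopri module, which implements the quoted-printable encoding
that MIME uses for this. This is only a sketch of the encoding step, not
of a whole mail system; the variable names are my own.

```python
import quopri

# The three last letters of the Norwegian alphabet, written as escapes so
# that this example does not depend on the encoding of the file itself.
text = "\u00e6\u00f8\u00e5"

# As ISO-8859-1 each letter is one eight bit byte: E6, F8, E5.
latin1 = text.encode("iso-8859-1")

# Quoted-printable replaces each eight bit byte with '=' plus two hex digits.
encoded = quopri.encodestring(latin1)
print(encoded)

# A receiver that understands MIME reverses the step.
decoded = quopri.decodestring(encoded).decode("iso-8859-1")
print(decoded == text)
```

A receiver without MIME support never runs the decoding step, which is
why the '=' sequences stay in the text.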
Your problem with the ftp mail service, Harry, is that the file is put
into the mail as eight bit (if you told it to use binary mode) and then
stripped to seven bit in the mail.
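What the stripping does to each byte can be shown in a few lines of
Python. The function name strip_eighth_bit is made up for the example; a
real mail gateway does this on its own, without asking. In ISO-8859-1 the
letter with code E6 becomes f (66), F8 becomes x (78) and E5 becomes
e (65).

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a seven bit channel would."""
    return bytes(b & 0x7F for b in data)


# The three last Norwegian letters, lower and upper case, as ISO-8859-1.
lower = "\u00e6\u00f8\u00e5".encode("iso-8859-1")  # bytes E6 F8 E5
upper = "\u00c6\u00d8\u00c5".encode("iso-8859-1")  # bytes C6 D8 C5

print(strip_eighth_bit(lower).decode("ascii"))
print(strip_eighth_bit(upper).decode("ascii"))
```

The result is plain ASCII letters, so there is no way to tell afterwards
which of them used to be Norwegian characters.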
Suggestions to solve the problem
--------------------------------
Stop using MS-DOS ;-) Anyway, if your system can handle eight bit
characters, but your mail system can't, the problem is most easily solved
by using UUencode, ftp in binary mode, or a WWW-client. The WWW-protocol
is, as far as I know, eight bit.
In my opinion the ISO-Latin-1 standard should be universal, so I will
use that. It isn't that difficult to translate it to ISO-646 if that
should be necessary.
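A sketch of such a translation in Python. The table assumes the usual
Norwegian seven bit convention, where the six characters [\]{|} stand for
the upper and lower case Norwegian letters; the names LATIN1_TO_ISO646_NO
and to_iso646_no are made up for the example.

```python
# Assumed mapping: Norwegian letters to their seven bit stand-ins.
LATIN1_TO_ISO646_NO = str.maketrans({
    "\u00c6": "[",   # capital AE
    "\u00d8": "\\",  # capital O with stroke
    "\u00c5": "]",   # capital A with ring
    "\u00e6": "{",   # small ae
    "\u00f8": "|",   # small o with stroke
    "\u00e5": "}",   # small a with ring
})


def to_iso646_no(text: str) -> bytes:
    """Translate Norwegian letters and return seven bit bytes."""
    # encode("ascii") raises UnicodeEncodeError for any other eight bit
    # character, instead of silently damaging it.
    return text.translate(LATIN1_TO_ISO646_NO).encode("ascii")


print(to_iso646_no("Bl\u00e5b\u00e6r"))
```

Going the other way is not safe in general, since a real '{' or '[' in
the text cannot be told apart from a translated letter.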
Jorgen