Norwegian index

Jørgen Andreas Bangor jorgenb at ifi.uio.no
Sat Dec 17 00:24:40 CET 1994


Norwegian Index
---------------
Thanx for your comments on my proposal for a Norwegian index. I got
a private mail from Harry (thanx!), but I think I will answer in
public, since others might be interested (yeah, right...).

My format is indeed inspired by the general format as devised by Harry.

My format:
> 1994-44    1  D91397     10            DD  Stjernereporteren
> 1994-44    2              8            DD  Oppdrift i manesjen
> 1994-44    3  D91409     10       Vi   DD  Sponsor ma man ha
> 1994-44    4  D93323      8FH     Fer  MM  Fossiljegerne 2
> 1994-44    5  KF2.2.58    1            Fa  Den maten!

Harry's format:
> 74-18b     WDCD 11      3  @                DA  Flyg och far
> 74-18c   D  2572        4                   GO  Den rullande skrothogen
> 74-18d   D  2645        4                   LW  Att grava en grop at andra
> 74-18e   W WDC 189-?1  10       CB  CB      DD  Kampa inte mot naturen
> 74-18f     WDCD 12      3  @                BR  Tornadorken
> 74-18g   ZX 70-03-29    1                   SC  Ludde aker dit
> 74-18h   I  846         6                   US  Den fantastiska tekniken [part 1]

Comments mostly to Harry:

I use 1994 instead of 94, because otherwise the format will look a bit
strange in about five years. If you (Harry) left that out to save space,
why not remove plotter and inker instead? You haven't used those columns
much. Couldn't that information be included as comments where needed?

I will change my numbering of stories to lettering. That will save space
and make searching easier.

How do you 'normalize' the story codes? I haven't looked at your code, so
I don't know why you do it, but if I'm just looking for where the story
D2572 is, it's easier to use grep on 'D2572' than 'D  2572'. I could of
course just look for '2572'... 
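
One way to search either layout is to compare with the whitespace
squeezed out. This is only a sketch, assuming that padding spaces are the
only difference between 'D2572' and 'D  2572'; the names squeeze_code and
find_story are mine and not part of Harry's code.

```python
def squeeze_code(code: str) -> str:
    """Remove all whitespace, so 'D  2572' becomes 'D2572'."""
    return "".join(code.split())


def find_story(lines, wanted):
    """Return the index lines containing the wanted code, ignoring spaces."""
    target = squeeze_code(wanted)
    return [line for line in lines if target in squeeze_code(line)]
```

Because the whole line is squeezed, a code could in principle match across
a column boundary, so the hits should be checked by eye.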

I will standardise the creator abbreviations, and if standardising the
character abbreviations makes the index easier to use, I'll do that too.

Instead of '[part 1]' I just say '1', as Ole does. Long comments could
also make the lines wider than eighty columns, which makes them difficult
for humans to read. Too many comments can also make machine reading more
difficult.

The Extras
----------
The Norwegian Extras aren't just facsimiles of old issues, although some
of them are. Of course they could be put into a file of their own, but
then I would have to look through more files to find a certain story. Oh,
maybe I'm just too lazy...

Empty columns
-------------
Maybe they should be filled with something, like '-', if the information
isn't known. That would make it easier to use, for instance, awk on them.
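
To show why the filler helps: once every column holds something, a line
can be split on whitespace and the fields always land in the same place.
A small sketch, where the column names are my own guesses at what the
columns mean and not an agreed standard:

```python
# Hypothetical column names, guessed from the sample lines in this mail.
COLUMNS = ["issue", "number", "code", "pages", "creator", "hero", "title"]


def parse_entry(line: str) -> dict:
    """Split one index line into named fields; unknown values must be '-'."""
    # maxsplit keeps the title, which may contain spaces, in one piece.
    fields = line.split(maxsplit=len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))
```

With empty columns left blank, the same split would silently shift the
later fields to the left.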

Character standards
-------------------
1) {|}[\] is ISO-646-something, usually called ASCII. Some terminals,
or terminal emulators, translate those seven bit characters into seven
bit Scandinavian characters. Even if they look Scandinavian on the
screen, they will still look like {|} etc. in other places, where they
are not translated or converted.
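
What such a terminal does can be sketched in a few lines of Python. I
assume the usual Norwegian mapping of those six positions here; the names
are only for illustration.

```python
# Assumed Norwegian variant of ISO 646: six ASCII positions are redefined.
ISO646_NO_TO_UNICODE = str.maketrans("{|}[\\]", "æøåÆØÅ")


def show_as_norwegian(text: str) -> str:
    """Display seven bit text the way a Norwegian terminal would show it."""
    return text.translate(ISO646_NO_TO_UNICODE)
```

The bytes in the file never change, only the way they are displayed, which
is why the text looks wrong everywhere else.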

2) ISO-Latin-1, or ISO-8859-1. These are eight bit characters. That means
the set can hold a lot more different characters than a seven bit set.
If a file containing eight bit characters is transferred as eight bit,
for instance on a disk or with ftp in binary mode, to a system that only
handles seven bit text, the eighth bit will simply be stripped away. That
makes the last three characters in the Norwegian alphabet look like 'fxe'.
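
The stripping is easy to reproduce. A minimal sketch, assuming the text is
stored as ISO-8859-1 bytes (the function name is mine):

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a seven bit system would."""
    return bytes(b & 0x7F for b in data)
```

Fed the Latin-1 bytes E6, F8 and E5, it returns 66, 78 and 65, which are
the plain letters f, x and e.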

3) Electronic mail makes it even more difficult. SMTP, the mail protocol,
is built to transfer seven bit text. That means that an eight bit file
has to be translated to seven bit when it leaves the sender, and
translated back to eight bit when it reaches the receiver. One way to
handle this is MIME. MIME puts in a couple of extra headers which tell the
receiving system that it is in use. Eight bit characters are then
translated to sequences of seven bit characters. The last characters in
the Norwegian alphabet (in lower case) will then be =E6 =F8 =E5. If the
receiver doesn't use MIME, they will remain that way.
Your problem with the ftp mail service, Harry, is that the file is put
into the mail as eight bit (if you told it to use binary mode) and then
stripped to seven bit in the mail.
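
The =E6 style translation is what MIME calls quoted-printable. Python's
standard quopri module does the same conversion, so the round trip can be
sketched like this (the two function names are mine, and I assume the text
is ISO-8859-1):

```python
import quopri


def to_quoted_printable(text: str) -> bytes:
    """Encode Latin-1 text as seven bit quoted-printable bytes."""
    return quopri.encodestring(text.encode("latin-1"))


def from_quoted_printable(data: bytes) -> str:
    """Decode quoted-printable bytes back into Latin-1 text."""
    return quopri.decodestring(data).decode("latin-1")
```

A receiver without MIME simply never runs the second step, so the =XX
sequences stay in the text.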

Suggestions to solve the problem
--------------------------------
Stop using MS-DOS ;-) Anyway, if your system can handle eight bit
characters, but your mail system can't, the problem is most easily solved
by using UUencode, ftp in binary mode, or a WWW-client. The WWW-protocol
is, as far as I know, eight bit.
In my opinion the ISO-Latin-1 standard should be universal, so I will
use that. It isn't that difficult to translate it to ISO-646 if that
should be necessary.
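
Such a translation could look roughly like this. It is only a sketch that
assumes the Norwegian ISO-646 mapping and handles just the six Norwegian
letters; the names are mine.

```python
# Assumed mapping from Latin-1 bytes to the Norwegian ISO 646 positions.
LATIN1_TO_ISO646_NO = bytes.maketrans("æøåÆØÅ".encode("latin-1"), b"{|}[\\]")


def latin1_to_iso646_no(data: bytes) -> bytes:
    """Translate Latin-1 bytes to seven bit Norwegian ISO 646 bytes."""
    out = data.translate(LATIN1_TO_ISO646_NO)
    # Any other eight bit character has no seven bit equivalent here.
    if any(b > 0x7F for b in out):
        raise ValueError("eight bit character with no ISO 646 equivalent")
    return out
```

Going this way loses information, since real braces and brackets in the
text can no longer be told apart from the Norwegian letters.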



   Jorgen



