Dan rants: Archiving issues

I took the archives apart today and tried to build some scripts to reconstruct them into an SQL-served database, and as I was doing so, a couple of issues came up. I'd appreciate feedback.

I've been utterly anal-retentive about trying to maintain web integrity. When I create a file, I leave it (or a placeholder) there. In fact, I feel really guilty that I haven't been able to con the keeper of the server that carried my original site in 1994 into keeping those files around, even though I haven't been able to access them for nearly two years and have updated copies in the same tree structure on my current server.

But I don't expect that people are linking into the archives of my web log, or at least not the "quickies" and "urls" sections.

So as I faced down over a megabyte of text today and tried to cram it into a format that'd be useful, I had to decide what to do with it.

There are lots of broken links. If I have an entry that refers only to that link, do I delete the entire entry?

There are lots of "Marylaine updated today" sorts of links. One every Monday for the past year and a half. Do those still matter?

Is the current archive structure (and the associated trash that comes up because of the big search granularity) used by anyone? If not, should I keep it just in case some web historian feels like digging through it at some far distant time? And if I do keep it, should I index it with the search engine, or index just my other database entries?

And is sucking the old archive entries into the new database worth anything?

I'm interested in your comments, and if you e-mail me, tell me what I can excerpt and how you'd like to be cited.



Saturday, December 4th, 1999 danlyke@flutterby.com