Update times
2002-01-27 19:49:47+00 by
Dan Lyke
5 comments
Dave commented on my use of Weblogs.com date feeds, which prompted Dori to ask why Backup Brain wasn't in that list. That list down there on the right is derived directly from my Opera bookmarks list. It should be pretty much impossible to get http://www.backupbrain.com (note the lack of trailing slash) as the address that a browser finally reports getting, and therefore as the address it puts in the bookmarks file. I've noticed there are a few other pages missing updates too, and I need to figure out what's going on there. I also believe that Dave has blocked Large American Penis from the weblogs.com update list, so I'd love to be pulling this from the aggregate of several update checkers. Anyway, let's see if the new code does any better. If I've got a few spare minutes, maybe I'll try to fix the current database as well, rather than just waiting for it to update.
[ related topics: Weblogs Dave Winer Flutterby Meta ]
comments in ascending chronological order:
#Comment made: 2002-01-27 21:07:03+00 by:
jim winstead
http://blo.gs/ is the aggregate of several update checkers. it also supports the weblogs.com xml-rpc api for being pinged directly. and i'm working on incorporating polling.
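For readers who have not seen the ping interface mentioned above, here is a minimal sketch of a weblogUpdates.ping call using Python's standard xmlrpc.client. The endpoint URL, the weblog name, and the example.com address are assumptions for illustration only; check the service's own documentation for the real endpoint before sending anything.

```python
import xmlrpc.client

# Assumed endpoint, for illustration only; substitute the one the service documents.
PING_ENDPOINT = "http://rpc.weblogs.com/RPC2"


def build_ping_payload(site_name, site_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((site_name, site_url),
                               methodname="weblogUpdates.ping")


def send_ping(site_name, site_url, endpoint=PING_ENDPOINT):
    """Send the ping over the network and return whatever the server answers."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(site_name, site_url)


if __name__ == "__main__":
    # Hypothetical weblog; prints the request body without contacting any server.
    print(build_ping_payload("Example Weblog", "http://www.example.com/"))
```

Building the payload separately from sending it makes it possible to inspect the request without touching the network.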
blo.gs also makes sure that urls have at least a single slash for the path part of the url, as part of the (evolving) duplicate prevention.
you can also search the blo.gs database to see if it has ever seen an update about a weblog.
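The trailing-slash rule described above can be sketched in a few lines of Python. This is not the blo.gs code, only an illustration of the idea under the assumption that an empty path is the only thing being repaired; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_weblog_url(url):
    """Give a URL with an empty path a single "/" so both spellings compare equal."""
    parts = urlsplit(url)
    path = parts.path or "/"  # "http://host" and "http://host/" name the same page
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(normalize_weblog_url("http://www.backupbrain.com"))
```

With this in place, http://www.backupbrain.com and http://www.backupbrain.com/ map to the same key, which is what a bookmark-derived list needs in order to match an update feed.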
#Comment made: 2002-02-21 05:34:49+00 by:
Dori
It looks like things are better, but not completely fixed. It's showing an updated time, which is an improvement, but not the most current one.
#Comment made: 2002-01-28 00:59:07+00 by:
DaveP
I tried to do something like Dan's list of links with update times back at the end of last November.
My plan was to initially ask blogs about their updates directly if they weren't in the weblogs.com list (and worry about writing the code to talk to weblogs.com later).
The problem is that I tried to use the HTTP HEAD request, and numerous sites don't respond correctly to a HEAD request (or refuse to respond at all, returning 403 Forbidden). I based the script on the surl and xurl scripts that come with the perl.cookbook.examples from http://examples.oreilly.com/cookbook/pcookexamples.tar.gz
I also swapped some mail with Ev, and he said he was going to add some code to the server at blogspot so it would return correct update times, which it wasn't doing. I haven't checked to see if he's made that change or not, but without the change, the "Last-Modified" time in the header just wouldn't change on an update.
Sadly, I seem to have gotten upset at the whole project after a while and just chucked the code because I was finding so many sites that wouldn't tell me when they'd been updated. Bleh.
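For illustration, the HEAD-based check described above might look roughly like this in Python (the original script was Perl and is not reproduced here). The function names, the timeout, and the fallback from HEAD to GET when a server rejects HEAD are all assumptions of this sketch, not part of the original code.

```python
import urllib.error
import urllib.request
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Parse a Last-Modified header value; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def fetch_last_modified(url, timeout=10):
    """Ask with HEAD first; if the server answers with an HTTP error, retry with GET."""
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return parse_last_modified(response.headers.get("Last-Modified"))
        except urllib.error.HTTPError:
            continue  # e.g. 403 on HEAD; try the next method
        except urllib.error.URLError:
            return None  # host unreachable, DNS failure, and so on
    return None
```

Even when the request succeeds, the returned time is only as trustworthy as the server that sent it, which is the problem the comment describes.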
#Comment made: 2002-02-21 05:34:51+00 by:
Dan Lyke
Yeah, I've taken special care to make sure that my Last-Modified header is relatively correct, but that sure isn't the case across the web. And so we build baroque work-arounds because people don't care to implement standards (Yes, this is a hypocritical take relative to my complaints with the Web Standards Project, but they don't have a Last-Modified header either).
Dori, my server polls weblogs.com every 3 hours, but my front page only updates when the message counts change or there are new entries. Don't know a good way to change that, so I may have to tweak my CMS to update the page, but keep the headers consistent and not poll weblogs.com unless one of those two things triggered the change.
#Comment made: 2002-02-21 05:34:51+00 by:
jim winstead
what do people think about pinging weblogs.com when there are new comments to a weblog? one of the reasons i switched to polling flutterby for http://blo.gs/ is because i assumed the frequent listing on weblogs.com was due to a helper application gone awry that didn't know enough to ignore the number of comments when checking if flutterby was updated. i didn't even stop to think it might be an intentional ping. (my polling script grabs the update time from the rss feed.)
the blo.gs poller is really just an adaptation of the polling script i was using for my own news page before. like everyone else, i found the last-modified header totally unreliable, and ended up coding up little regular expressions for most of the sites to grab whatever timestamp they included in their page. for those that didn't have a timestamp at all, i just stripped out a bunch of stuff (images, '# comments', etc) and did an md5sum. for the two dozen or so sites i was tracking that way, it proved to be quite reliable. when i started using weblogs.com data in preference to my own polling, i found that a number of the sites would show up as updated on weblogs.com when they weren't really updated (like http://www.evhead.com/ and http://www.camworld.org/).
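The strip-and-hash approach described above can be sketched as follows. This is not the blo.gs poller; the regular expressions and the function name are hypothetical, and a real script would need patterns tuned to each site.

```python
import hashlib
import re

# Hypothetical patterns for illustration; real pages need site-specific ones.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
COMMENT_COUNT = re.compile(r"\b\d+\s+comments?\b", re.IGNORECASE)
WHITESPACE = re.compile(r"\s+")


def page_fingerprint(html):
    """Hash a page after removing parts that change without a real update."""
    stripped = IMG_TAG.sub("", html)               # drop images (rotating banners etc.)
    stripped = COMMENT_COUNT.sub("", stripped)     # drop "5 comments" style counters
    stripped = WHITESPACE.sub(" ", stripped).strip()
    return hashlib.md5(stripped.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    before = '<p>Update times</p><img src="a.gif"> <a href="#c">5 comments</a>'
    after = '<p>Update times</p><img src="b.gif"> <a href="#c">6 comments</a>'
    # The two fingerprints match because only the image and comment count differ.
    print(page_fingerprint(before) == page_fingerprint(after))
```

A poller would store the fingerprint from the previous visit and treat the site as updated only when the new fingerprint differs.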