Trackback
2005-12-07 01:35:59.222555+00 by Dan Lyke, 1 comment
A week or three ago the Flutterby server locked up. It still responded to pings, but I couldn't get into it to see what was going wrong. So I had 'em reboot it, and let it run, monitoring it somewhat carefully.
After a few days, I noticed something curious: There were multiple copies of the TrackBack checker running. The script is set to fire off once a day, runs through all of the outstanding TrackBack requests, and verifies that the requested page actually has a link to the page it claims to. Bogus requests were coming in so fast that my machine wasn't keeping up with them.
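The check itself is simple; roughly this, as a Perl sketch (not the actual Flutterby checker, and the pending queue and URLs here are placeholders):

    #!/usr/bin/perl
    # Toy TrackBack verifier: fetch each claimed source page and keep the
    # ping only if that page really links back to the entry it was sent to.
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Placeholder queue; the real script would read outstanding pings from disk.
    my @pending = (
        { source => 'http://example.com/their-post.html',
          target => 'http://www.flutterby.com/' },
    );

    my $ua = LWP::UserAgent->new( timeout => 30 );

    for my $ping (@pending) {
        my $res = $ua->get( $ping->{source} );
        if ( !$res->is_success ) {
            print "drop (couldn't fetch): $ping->{source}\n";
            next;
        }
        my $target = quotemeta $ping->{target};
        if ( $res->decoded_content =~ /$target/ ) {
            print "keep: $ping->{source}\n";
        }
        else {
            print "drop (no link back): $ping->{source}\n";
        }
    }

Every ping costs the checker a full HTTP fetch, which is why a flood of bogus pings swamps the box: the spammers can send them far faster than the checker can verify them.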
So: Future communications systems will have to have trust engineered in from the start, and implementations which don't verify that trust will have to be quickly squelched; otherwise the system will become worthless.
And I could write a multi-threaded version that might be able to keep up with the spam and bad links and all, but I'm declaring TrackBack dead. A good idea in theory, but yet another commons ruined.
[ related topics: Web development Weblogs Spam Software Engineering ]
comments in ascending chronological order:
#Comment Re: made: 2005-12-08 16:42:22.458756+00 by: Jerry Halstead
At the mapping company I developed a system that I think could be utilized for comments/trackback problems.
We had the problem of folks (including competitors) trying to geocode their entire database programmatically through our mapping interface. You can certainly spot them after the fact, but by then it is water (er, lat/lons) under the bridge.
I wrote tools that tailed the web logs and kept track of connection rates and types for a given "user." For example, a spam script is going to pound on a comments or trackback page but won't be loading the graphics. There are a number of usage patterns you can code for (this was using Perl). As it identified problems it put them into an exceptions list.
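The rate-tracking piece of that might look roughly like this as a toy Perl sketch (the log path, URL patterns, threshold, and file name are invented here; other usage patterns, like POSTing without ever loading the graphics, would hang off the same bookkeeping):

    #!/usr/bin/perl
    # Toy log watcher: follow the access log, count recent hits per client
    # against the comment/trackback URLs, and write offenders to an
    # exceptions file for the page-rendering script to consult.
    use strict;
    use warnings;

    my $WINDOW    = 600;   # seconds of history kept per client
    my $THRESHOLD = 50;    # hits in the window before a client gets flagged

    my %hits;

    open my $log, '-|', 'tail -F /var/log/apache/access_log'
        or die "can't tail the log: $!";

    while ( my $line = <$log> ) {
        my ($ip, $path) = $line =~ /^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)/
            or next;
        next unless $path =~ m{/(comments|trackback)};

        my $now = time;
        push @{ $hits{$ip} }, $now;
        @{ $hits{$ip} } = grep { $_ > $now - $WINDOW } @{ $hits{$ip} };

        # Rewrite the exceptions file with every client currently over the
        # threshold (a real version would batch this).  Clients fall off the
        # list on their own once their hits age out of the window.
        my @flagged = grep { @{ $hits{$_} } > $THRESHOLD } keys %hits;
        open my $out, '>', '/tmp/exceptions.list' or next;
        print {$out} "$_ " . scalar( @{ $hits{$_} } ) . "\n" for @flagged;
        close $out;
    }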
The main script for rendering pages checked the list, and when a user hit certain thresholds they would first get pages back after some longer delay and then eventually would just get blank results or an error page.
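And the matching check on the rendering side, again with made-up thresholds, reading the same invented exceptions file:

    # Look the client up in the exceptions list and escalate from "slow"
    # to "refuse"; anyone not on the list gets the page rendered normally.
    use strict;
    use warnings;

    sub strikes_for {
        my ($ip) = @_;
        open my $in, '<', '/tmp/exceptions.list' or return 0;
        while ( my $line = <$in> ) {
            my ($addr, $count) = split ' ', $line;
            next unless defined $count;
            return $count if $addr eq $ip;
        }
        return 0;
    }

    my $ip      = $ENV{REMOTE_ADDR} || '';
    my $strikes = strikes_for($ip);

    if ( $strikes > 200 ) {
        # Well past the threshold: don't bother rendering at all.
        print "Status: 503 Service Unavailable\nContent-type: text/plain\n\n";
        print "Come back later.\n";
        exit;
    }
    elsif ( $strikes > 50 ) {
        # First stage: still answer, but slowly.
        sleep 15;
    }
    # Otherwise fall through and render the page as usual.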
The goal was to reduce the load on the rest of the server once questionable usage was identified. The exception list automatically timed out, so a user who somehow triggered the response wouldn't be blacklisted forever. It also wrote out a history file so we could tune the system and make permanent blacklists if needed.