Actually, the numbers in my previous post were not entirely correct. I base them on the activity logfile I keep, and I found I was graphing spam as ham, which means the 50% I reported will turn out to be even higher. I've corrected that, and I'll let it run for a little while and then report on the results (if anyone is interested, of course). Basically, in the beginning I was blocking 90% of POST requests to my forums on the strength of the API calls to stopforumspam, and all of that without any false positives. I think it's quite promising, but it needs a bit of tweaking to finish it off. Just in case you're interested, the forum is at age-tea-tea-pea forums.phplist.com.
Yesterday the SFS calls blocked 1400 posts to my forums. I only do an SFS check on POST requests (as those are the only ones that need blocking). In the very early stages I accidentally made an API call on every single request, and ended up going over the 20,000/day limit that SFS has put on its API (somewhere around noon).
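For anyone wanting to avoid the same mistake, here is a rough sketch of the gating idea: only spend a lookup on POST requests, and keep a per-day counter so you stop before the limit. It is written in Python purely for illustration and is not the actual class. `LookupBudget` and `should_check` are made-up names, and the 20,000 figure is just the daily limit mentioned above.

```python
from datetime import date

DAILY_LIMIT = 20000  # the per-day limit mentioned above; adjust as needed


class LookupBudget:
    """Hypothetical helper: counts lookups per calendar day."""

    def __init__(self, limit=DAILY_LIMIT, today=date.today):
        self.limit = limit
        self._today = today      # injectable clock, handy for testing
        self._day = today()
        self._used = 0

    def try_consume(self):
        """Return True and count one lookup if budget remains today."""
        now = self._today()
        if now != self._day:     # a new day has started: reset the counter
            self._day = now
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


def should_check(method, budget):
    """Only POST requests get a lookup, and only while budget remains."""
    if method.upper() != "POST":
        return False             # GET etc. never cost an API call
    return budget.try_consume()
```

What to do once the budget runs out is a separate decision. Letting the post through unchecked is the simplest option, but that is an assumption on my part and not something the setup above prescribes.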
The class I'm writing uses three sources to detect forum spam: stopforumspam.com, projecthoneypot.org and akismet.com. At the moment, I only use SFS as the active component, but I'm also checking honeypot and found that it catches a few entries that SFS misses, though not many. I haven't finished with akismet yet, but my impression is that it's not going to contribute very much. Akismet is probably too focused on WordPress-type comment spam, as opposed to forum spam.
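To illustrate the "one active source, others only observed" setup, here is a minimal sketch in Python. It is not the real class: `Source` and `evaluate` are hypothetical names, and the lookups are stubbed with in-memory sets instead of real API calls, so nothing here reflects how any of the three services actually respond.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    name: str
    check: Callable[[str], bool]  # True if the source lists this IP
    active: bool                  # True: a hit blocks; False: only recorded


def evaluate(ip: str, sources: List[Source]) -> Dict[str, object]:
    """Ask every source, block only if an active one reports a hit."""
    hits = [s for s in sources if s.check(ip)]
    return {
        "block": any(s.active for s in hits),
        "hits": [s.name for s in hits],  # kept so passive sources can be compared
    }


# Stub data standing in for real lookups (documentation-range IPs).
sfs_listed = {"203.0.113.5"}
honeypot_listed = {"203.0.113.5", "198.51.100.7"}

sources = [
    Source("stopforumspam", lambda ip: ip in sfs_listed, active=True),
    Source("projecthoneypot", lambda ip: ip in honeypot_listed, active=False),
]

result = evaluate("198.51.100.7", sources)
# Not blocked, because only the passive source listed it; the hit is
# still recorded, which is how you would spot entries the active source misses.
```

Recording the passive hits alongside the decision makes it easy to see later how much a second source would actually add before switching it to active.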
If anyone knows of any other Creative Commons sources of anti-spam data, please let me know and I'll add support for them to the class. I guess I should also look into Bad Behaviour, the original subject of this thread.