StupidFilter: Bayesian filtering for "stupidity"
StupidFilter is an attempt to automatically detect "stupid" English writing. I'm pretty skeptical of this -- we do a pretty crummy job of detecting spam, and stupidity is a lot more subjective (for example, text-messaging abbreviations and "LOL" are not necessarily indications of stupidity). Still, it seems like an entertaining way to pass the time:
The solution we're creating is simple: an open-source filter software that can detect rampant stupidity in written English. This will be accomplished with weighted Bayesian analysis and some rules-based processing, similar to spam detection engines. The primary challenge inherent in our task is that stupidity is not a binary distinction, but rather a matter of degree. To this end, we're collecting a ranked corpus of stupid text, gleaned from user comments on public websites and ranked on a five-point scale.
Link (Thanks, Eileen!)
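For the curious, here is a rough sketch of how Bayesian scoring against a five-point ranked corpus might look. This is not StupidFilter's code: the class name, the whitespace tokenizer, and the Laplace smoothing are all assumptions made for illustration, and the "weighted" part is guessed at by averaging the ranks by their posterior probability.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


class StupidityScorer:
    """Hypothetical multinomial naive Bayes over ranks 1 (fine) to 5 (stupid)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # rank -> token counts
        self.doc_counts = Counter()              # rank -> number of samples
        self.vocab = set()

    def train(self, text, rank):
        tokens = tokenize(text)
        self.word_counts[rank].update(tokens)
        self.doc_counts[rank] += 1
        self.vocab.update(tokens)

    def posteriors(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for rank, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[rank].values())
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                count = self.word_counts[rank][tok]
                # Laplace (add-one) smoothing so unseen words do not zero out a rank.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            log_scores[rank] = score
        # Normalize in log space to avoid underflow on long texts.
        peak = max(log_scores.values())
        weights = {r: math.exp(s - peak) for r, s in log_scores.items()}
        norm = sum(weights.values())
        return {r: w / norm for r, w in weights.items()}

    def expected_rank(self, text):
        # A matter of degree rather than a yes/no answer: the
        # posterior-weighted average of the ranks seen in training.
        return sum(r * p for r, p in self.posteriors(text).items())
```

Train it with calls like `scorer.train(comment_text, rank)` for each ranked sample, then `scorer.expected_rank(new_text)` gives a score between the lowest and highest rank in the training set.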


>> we do a pretty crummy job of detecting spam, and stupidity is a lot more subjective
The difference is that stupid people aren't out there aggressively changing their tactics to evade detection.
This filter actually measures statistical divergence from standard English rather than stupidity, but the link is definitely worth a visit for the eminently t-shirtable FAQ entry:
Do you really expect to be able to detect and filter anything that's conceivably stupid?
No, of course not. You'd need real AI for that, and beyond a certain point it's simply subjective; after all, a sufficiently advanced AI would probably filter out the whole of human discourse, which isn't the idea.
We actually do a pretty good job of detecting spam, with reliability levels of well over 90%, even though it's trying to avoid detection. A mere 90% success rate on detecting stupidity could make unmoderated public forums very much more pleasant.
'A software'? Ah, I'm off on the wrong foot already.
I like the idea but in practice the occurrence of certain words - the essence of Bayesian analysis - says nothing about the value of what's being said. Most of the humour on bash.org is funny and insightful without using any objectively "intelligent" words and I've known many a manager who, with an arsenal of "word of the day" vocabulary, couldn't string a coherent concept together for love or money.
Tom Stoppard explores this in Arcadia - if you have a ball to hit, you can hit it with anything... but if you don't have a point to make, it doesn't matter how well you can't make it.
I'm amused by the assumption that "real AI" is smarter than Internet stupidity. As far as I was aware, nothing is foolproof because fools are so ingenious.
If you can measure divergence from Standard English, you can initialize your database with YouTube Comment English, and also with the English from top writers. Then you have a scale of the youtubeyness of a passage of text, which is nearly the same thing as its stupidity.
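A minimal sketch of that two-corpus idea, assuming smoothed unigram models and an average per-word log-likelihood ratio as the "scale" (the function names and the scoring choice are mine, not anything from the project):

```python
import math
from collections import Counter


def word_logprob(counts, total, vocab_size, word):
    # Add-one smoothing; the extra +1 reserves mass for never-seen words.
    return math.log((counts[word] + 1) / (total + vocab_size + 1))


def youtubeyness(text, comment_corpus, polished_corpus):
    """Average per-word log-likelihood ratio between two reference corpora.

    Positive means the text looks more like comment_corpus,
    negative means it looks more like polished_corpus.
    """
    c_counts = Counter(comment_corpus.lower().split())
    p_counts = Counter(polished_corpus.lower().split())
    vocab_size = len(set(c_counts) | set(p_counts))
    c_total = sum(c_counts.values())
    p_total = sum(p_counts.values())

    tokens = text.lower().split()
    if not tokens:
        return 0.0
    ratio = sum(
        word_logprob(c_counts, c_total, vocab_size, tok)
        - word_logprob(p_counts, p_total, vocab_size, tok)
        for tok in tokens
    )
    # Divide by length so long passages are not penalized just for being long.
    return ratio / len(tokens)
```

With real corpora you would want far more text than a few comments on each side, and probably something better than single words, but the sign of the score already gives the two-ended scale described above.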
I have been aching for this for years. But I actually do think that text messaging abbreviations are stupid when entry is done with a keyboard.
Okay, actually I think they're stupid all the time. I don't even use them when I'm texting.
But maybe I'm getting old.
Hey, isn't this called "censorship" when other people do it?
Kyle, I came here to make the exact same point. I believe text message abbreviations, shortening words and such are stupid, too.
zomgwtfbbq!
Kyle n dculberson r teh stopid 1's.
txt abbrvshuns r teh best.
lol!
"for example, text-messaging abbreviations and "LOL" are not necessarily indications of stupidity"
Perhaps not, but LOLCats certainly are.
Well, this would explain where the missing 5 million emails from the White House disappeared to.
I'm with PineappleCharm: true things may be said in halting or stupid ways. This isn't going to be a true stupid-filter, but it should make an interesting experiment.
Matt Staggs, we're entitled to sort and exclude incoming signals. There's a world of difference between "I don't need that signal" and "This signal is not allowed to exist."
Heh, anybody that has spent their fair share of time on IRC can tell you that a majority of the occurrences of 'LOL' are postfix or prefix for something inanely stupid - but this, unfortunately, has only been my experience.
Dan Tentler, I find it much more unfortunate that it's also been my experience.
Teresa,
I just think that it's funny seeing this touted at BoingBoing.
Netscape developer Chris Finke wrote a Firefox add-on called YouTube Comment Snob that hides comments based on selectable criteria: More than x spelling mistakes, no capital letters, all capital letters, doesn't start with a capital letter, excessive punctuation, excessive capitalization.
http://www.chrisfinke.com/addons/youtube-comment-snob/
Matt, you think Boing Boing doesn't filter for stupidity?
So once the stupid filter is perfected, how is it going to be used? The instructions on how to set the digital clock on a microwave oven, as they are translated from Chinese, would never make it.
But hey, I'm a guy, I never need those stupid instructions.....