What Tom Sawyer Knew and Google Is Learning
Doing work isn’t fun. Getting other people to do work for you is.
You may recall from your grade school days that Tom Sawyer got stuck with the unfortunate task of painting a fence on a beautiful summer Saturday. It didn’t take him long to try to pass the task to someone else. His first attempt, with Aunt Polly’s servant Jim, was unsuccessful. He acknowledged to Jim that the job was undesirable and tried to trade for Jim’s job. Jim didn’t take the bait.
Then the neighborhood boys started arriving to make fun of Tom for being stuck with work while they swam and played. Tom could have sullenly taken the insults, but he decided to try a little reverse psychology instead.
[Ben, a neighborhood boy] “Say — I’m going in a-swimming, I am. Don’t you wish you could? But of course you’d druther WORK — wouldn’t you? Course you would!”
Tom contemplated the boy a bit, and said: “What do you call work?”
“Why, ain’t THAT work?” . . . . “Oh come, now, you don’t mean to let on that you LIKE it?” . . .
“Like it? Well, I don’t see why I oughtn’t to like it. Does a boy get a chance to whitewash a fence every day?”
That put the thing in a new light. Ben stopped nibbling his apple. Tom swept his brush daintily back and forth — stepped back to note the effect — added a touch here and there — criticised the effect again — Ben watching every move and getting more and more interested, more and more absorbed. Presently he said “Say, Tom, let ME whitewash a little.”
Tom considered, was about to consent; but he altered his mind:
“No — no — I reckon it wouldn’t hardly do, Ben. You see, Aunt Polly’s awful particular about this fence — right here on the street, you know — but if it was the back fence I wouldn’t mind and SHE wouldn’t. Yes, she’s awful particular about this fence; it’s got to be done very careful; I reckon there ain’t one boy in a thousand, maybe two thousand, that can do it the way it’s got to be done.”
“No — is that so? Oh come, now — lemme just try. Only just a little — I’d let YOU, if you was me, Tom.” … “Oh, shucks, I’ll be just as careful. Now lemme try. Say — I’ll give you the core of my apple.” . . . “I’ll give you ALL of it!”
Tom gave up the brush with reluctance in his face, but alacrity in his heart. And while [Ben] worked and sweated in the sun, the retired artist sat on a barrel in the shade close by, dangled his legs, munched his apple, and planned the slaughter of more innocents. There was no lack of material; boys happened along every little while; they came to jeer, but remained to whitewash.
Chapter II, The Adventures of Tom Sawyer
Tom solved his fence problem by wrapping it in a package that was attractive to someone else. Ben and the other boys even paid Tom for the opportunity to do his work. Does the story sound a bit far-fetched? Not when there are companies doing the same thing every day.
Google has a metadata problem. Metadata is “data about data” – a tag, classification, description, or taxonomy. In general, Google does a good job extracting metadata from text – it’s not hard to pull keywords from an article or blog post. But how can Google get metadata about a picture? Most web developers don’t write accurate alt text for their images. Image recognition technology is improving but still unreliable. Could you pay someone to spend all day labeling images? Sounds expensive and inefficient.
To solve this problem, Google created the Google Image Labeler. It’s an ingenious game with simple rules – you and an anonymous partner are both shown the same image. You type in words that describe the image, and when you have both typed in the same word, another image is displayed. Points are awarded for each image correctly matched. Players are motivated to label as many images as possible in the shortest amount of time.
The Image Labeler tool “Tom Sawyers” users into creating image metadata for free. The solution is both elegant and fun.
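The matching rule described above is simple enough to sketch in a few lines of Python. This is only an illustration, not Google's implementation: the function names, the turn-by-turn interleaving of guesses, and the 100-point award are all assumptions made for the example.

```python
def normalize(word):
    """Treat 'Dog ' and 'dog' as the same label."""
    return word.strip().lower()


def first_agreed_label(guesses_a, guesses_b):
    """Return the first label that both players have typed, or None.

    Guesses are processed in alternating turns (a, b, a, b, ...), which is a
    simplification of two people typing at the same time.
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            word = normalize(guesses_a[i])
            seen_a.add(word)
            if word in seen_b:
                return word
        if i < len(guesses_b):
            word = normalize(guesses_b[i])
            seen_b.add(word)
            if word in seen_a:
                return word
    return None


def score_session(rounds, points_per_match=100):
    """Collect one agreed label per image and total up the points.

    `rounds` is a list of (image_id, guesses_a, guesses_b) tuples.
    The point value is hypothetical.
    """
    labels = {}
    score = 0
    for image_id, guesses_a, guesses_b in rounds:
        label = first_agreed_label(guesses_a, guesses_b)
        if label is not None:
            labels[image_id] = label
            score += points_per_match
    return labels, score
```

The useful by-product is the `labels` dictionary: every match is a piece of image metadata that two strangers independently agreed on, which is a reasonable hint that the label is accurate.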
Other Tom Sawyer tricks are implemented elsewhere on the web. One of the most prominent examples is reCAPTCHA (also owned by Google). A CAPTCHA is one of those images at the bottom of a web form that shows distorted text and requires you to type the letters to prove that you’re not a computer. Instead of generating random sequences of numbers and letters, reCAPTCHA displays words from scanned book pages. Optical Character Recognition (OCR) can decipher some of the scanned text, but it is often inaccurate. reCAPTCHA gets around the inaccuracy by comparing the OCR result to the human-entered text. The result is a more accurate digitization of the text.
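To make the idea of checking OCR against human input concrete, here is a rough Python sketch. It is not reCAPTCHA's actual algorithm; the function name, the voting rule, and the `min_votes` threshold are assumptions chosen for illustration. The sketch accepts a word when a human reading confirms the OCR guess, or when enough humans agree with each other to overrule it.

```python
from collections import Counter


def resolve_word(ocr_guess, human_answers, min_votes=2):
    """Pick a transcription for one scanned word, or None if undecided.

    Comparison is case-insensitive, so the result is lowercased; a real
    system would need to preserve capitalization.
    """
    votes = Counter(a.strip().lower() for a in human_answers if a.strip())
    if not votes:
        return None  # nobody has typed this word yet

    best, count = votes.most_common(1)[0]
    if best == ocr_guess.strip().lower():
        return best  # the most common human reading confirms the OCR guess
    if count >= min_votes:
        return best  # several humans agree, so they overrule the OCR guess
    return None  # a single dissenting answer is not enough evidence
```

With these rules, an OCR misread such as "rnorning" is corrected to "morning" once two people type "morning", while a single unconfirmed answer leaves the word undecided.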
Success is in the numbers. According to the reCAPTCHA website, 200 million CAPTCHAs are solved every day. Assuming that each CAPTCHA takes 10 seconds to solve, that adds up to more than 550,000 hours of free work every single day. Not too shabby.
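The estimate is easy to check. The short Python snippet below works through the arithmetic using the two figures quoted above; the 10 seconds per CAPTCHA is the assumption from the text, not a measured value.

```python
CAPTCHAS_PER_DAY = 200_000_000  # figure quoted from the reCAPTCHA website
SECONDS_PER_CAPTCHA = 10        # assumed solving time per CAPTCHA
SECONDS_PER_HOUR = 3600


def daily_hours(captchas_per_day, seconds_each):
    """Total human time spent per day, in hours."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


hours = daily_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

Under those assumptions the total comes to roughly 555,000 hours of donated effort every day.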