Up to the Challenge: Computer Scientists Crack a Set of AI-based Puzzles

November 7, 2002

Sara Robinson

In a September 2000 conference call, Udi Manber, chief scientist at Yahoo, told several theoretical computer scientists at Carnegie Mellon University about a "chat room problem" at Yahoo.

Nefarious computer programs were infiltrating Yahoo chat rooms, marketing company products and gathering personal information. Spam companies, too, were taking advantage of Yahoo's free e-mail accounts, writing programs to sign up for hundreds of free accounts and using them to send bulk mailings. Manber wanted a mechanism for screening out "bots," as such programs are called, from Yahoo.

CMU professor Manuel Blum's solution to the problem was Captchas, an acronym for Completely Automated Public Turing Test to Tell Computers and Humans Apart. As described in the April 2002 issue of SIAM News, Captchas are little cognitive puzzles based on the hardest problems of artificial intelligence. They have the paradoxical property that computers are capable of generating and grading them, but not of passing them. But humans---or at least most humans---can pass them easily. Yahoo incorporated such a test into its registration process.

The Crypto Connection
The idea of using a Turing test to solve Yahoo's problem wasn't new. Manber had already been thinking along those lines, and Andrei Broder, several years earlier, had come up with a similar solution to problems with the registration process for the search engine Alta Vista.

But Blum envisioned Captchas as far more than a stopgap measure for Yahoo. As a cryptographer, he saw many parallels between Captchas and cryptographic codes, and he applied many elements of the cryptography paradigm to Captchas. In particular, he knew the value of having a competitive attack-and-defense interplay between researchers.

Captchas have another important crypto-like feature: The databases used to generate the puzzles are public, which means that the security of the scheme is determined only by the hardness of the underlying AI problem.

Blum and his student Luis von Ahn posed a few sample Captchas as a challenge to the artificial intelligence community. Eventually, Blum hopes to prove theorems that "reduce" Captchas to hard AI problems. That way, unsolved Captchas will be useful tools; the solved Captchas will also solve hard AI problems.

"We want people to break our Captchas because we want people to do good AI," von Ahn has stated.

Blum and von Ahn recently got their wish. In September, Jitendra Malik, a professor of computer science at the University of California, Berkeley, and Greg Mori, his PhD student, devised a program that cracks Gimpy, the Captcha that Yahoo is using as part of its registration process.

Their crack follows an effort by Henry Baird, a researcher at Palo Alto Research Center, and Monica Chew, a graduate student at the University of California at Berkeley. Baird and Chew modified an OCR (optical character recognition) program so that it can defeat Gimpy about 20% of the time.

Because Baird and Chew's algorithm is deterministic, it defeats the same instances of Gimpy each time. This means that Gimpy can be bolstered just by omitting the instances that their program can solve. Malik and Mori's algorithm, by contrast, seems to be a thorough crack of the concept of Gimpy. It solves more than 80% of the instances of the Yahoo puzzle, and because the algorithm is randomized, it can defeat different fractions of the puzzle each time.

Figure 1. Examples of Gimpy: the EZ version (top), which Yahoo incorporated into its registration process to screen out computer programs, and the hard version (bottom), which asks viewers to identify some of the distorted or overlapping words shown. A team of Berkeley computer scientists has written programs that can solve instances of both.

Gimpy comes in two flavors: EZ Gimpy and Hard Gimpy (see Figure 1). EZ Gimpy consists of a single distorted word surrounded by lots of background noise. Designed to defeat commercial OCR codes, EZ Gimpy is extremely easy for humans to solve (a big part of its appeal for Yahoo). A Hard Gimpy puzzle is a picture of several distorted and overlapping words chosen at random from an 850-word dictionary. Solving the puzzle requires identifying about half of the words and typing them into the box provided.

Malik and Mori wrote one program that can crack EZ Gimpy more than 80% of the time and another that cracks Hard Gimpy about 20% of the time. Both programs use a general object recognition technique called shape context matching, developed last year by Malik and his former student Serge Belongie, who is now at the University of California, San Diego.

Manuel Blum says that he is delighted that Gimpy was cracked, particularly since the researchers used a very general technique rather than a specialized program. "I'm impressed . . . and grateful," he says.

Manber, who was a professor of computer science at the University of Arizona before joining Yahoo, was less grateful but nonetheless quite gracious. "When I put my academic hat on, it's nice," he says, "but with my Yahoo hat on, there's more work to be done."

To Catch a Gimpy
Belongie and Malik's shape context matching technique was designed to solve the general computer vision problem of finding instances of known objects in a cluttered environment. The technique works by identifying each object with a descriptor, a small amount of information about the object, that is robust to background clutter and deformation.

One excellent descriptor for an object is obtained by taking a random set of n points from the interior and exterior contours and then, for each point, computing the set of vectors from that point to all other points. This collection of points and associated vectors is an excellent descriptor because, as n grows, it becomes an exact description of the shape of the object. But it's too detailed for making rapid comparisons of objects.

To overcome that obstacle, Belongie and Malik approximate the vector descriptor, computing, for each point, a coarse histogram of the relative coordinates of the remaining n - 1 points. The points are assigned to bins according to a log-polar coordinate system. This way the descriptor is more sensitive to nearby points than to those farther away. In addition, the researchers associate to each bin a vector that represents the average direction of the edge points in that bin.

To find instances of a shape in a cluttered environment with shape context matching, the researchers first use other techniques to locate boundaries between objects; they then compute descriptors and do comparisons. They were able to identify objects in standard data sets with more than 99% accuracy.

Applying this technique to Captchas, however, which are selected from rather large databases, would be prohibitively slow. Malik and Mori decided to break the technique into a fast-pruning stage, followed by a more detailed matching on the remaining, smaller set of objects. The fast pruning consists of shape contexts based on a very small sample of points. That effort is followed by the computationally expensive detailed stage, performed only on the pruned database: The researchers attempt to deform each candidate model shape, bringing it into alignment with the query shape. They compute the cost of each deformation and select the model with the lowest cost.

For the EZ Gimpy attack, the researchers start by computing descriptors for a sample set of letters. Then, given an image, they use a fast-pruning stage to guess at possible locations of letters in the image. They then find strings of these possible letters that form candidate words, after which they use the detailed matching stage to choose the most likely words. Once the letters are preprocessed, their Matlab program breaks Gimpy in about a minute. Optimized in C, Malik says, the program could run in real time.

For the Hard Gimpy attack, the researchers knew that comparing letters wouldn't work. Because the the words overlap, even a human has difficulty making out many of the letters. Instead, the researchers devised a scheme for finding and comparing the shape contexts of entire words. The method is far slower and less accurate, but running the algorithm again and again improves the success rate.

Beyond Gimpy
Now that Gimpy is broken, the onus is back on Blum and von Ahn to design something better. The two CMU researchers are confident that they can meet the challenge, but Malik doesn't think it will be easy to create a character recognition Captcha that will hold up for long.

Gimpy was designed to defeat standard OCR programs, which are made to recognize font distortions or handwriting and must run very fast, far faster than a human can read. Once you throw out these constraints, Malik says, text recognition algorithms can become much more powerful.

"Humans have a general-purpose object recognition ability," he says. "OCR programs have only dealt with it in a special limited sense."

Thus, from a computer vision standpoint, Malik says, the most challenging Captcha is Pix (Figure 2). Each instance of Pix displays a set of images that look very different, but have in common an idea that can be expressed by a single word. One image might be a photograph of a woman, another a Picasso-esque painting of a woman; a third picture might show a discarded evening gown and shoes with no woman in sight.

Figure 2. Asked what the images in an instance of the Captcha known as Pix have in common, most human beings easily come up with an answer. But "a computational solution that matches the human method will be a long time coming," says Jitendra Malik, who recently cracked two versions of the Captcha Gimpy.

A human being brings the resources of years of learning to bear on a puzzle of this type; a computational solution that matches the human method will be a long time coming, Malik says. The open database requirement for Captchas, however, can make an instance of Pix far easier than the underlying AI problem it's based on. Because the database of pictures used to generate an instance of Pix must be public, the puzzle is vulnerable to an easy database attack. The researchers preclude such attacks by requiring that each picture be deformed by some transformation after it has been selected. But then the problem of solving Pix is still different from the underlying AI problem. This example demonstrates some of the challenges of designing good Captchas.

Still, Blum's project is fulfilling his vision as more and more researchers join the effort to create new Captchas and defeat existing ones.

Baird, at PARC, is working on both challenges. Across the bay at Berkeley, Doug Tygar a professor of computer security, has joined the effort as well. He and Chew, his student, are devising user authentication schemes that rely on tasks humans are good at, like recognizing faces or objects, rather than those they perform poorly, like remembering passwords.

"Up to now, the focus was on what machines do well," said Tygar. "Captchas are putting what people can do at the forefront."

Sara Robinson is a freelance writer based in Berkeley, California.

Renew SIAM · Contact Us · Site Map · Join SIAM · My Account
Facebook Twitter Youtube