New Winds in Applied Mathematics: The Road to Numerocracy
March 21, 2009
Philip J. Davis
The Numerati. By Stephen Baker, Houghton Mifflin, New York, 2008, 256 pages, $26.00.
A New York Times article (October 13, 2008) on the work of James Pennebaker, a professor of psychology at the University of Texas, described his computerized text analyses of material ranging from Beatles songs to terrorist communications. Pennebaker wanted to determine "how much could be learned by looking at every single word that people used." I would enroll him in the ranks of the Numerati.
Who are the Numerati? They are mathematicians, computer scientists, engineers, physicists, economists, biologists, psychologists, linguists, data miners--anyone, in fact, who consciously devises or uses algorithms that extract certain intimate patterns from the behavior of people, individual or collective. This information is then utilized in a way that has a direct impact on their personal lives. Admittedly, this definition is rather broad, for it might apply to almost any quantitative social science. I hope that it will become a bit more clear as I mention a few of the author's many examples.
"Numerati" is the coinage of the author; the final "i" is merely the Latin masculine plural ending. In this particular term, as with the words "literati," "epigoni," "dilettanti," and many others, there is to my mind implied something elitist, exclusivist, yet dubious, conspiratorial, spurious, if not downright sinister, about the views espoused by members of the group, or about ideas they have put into practice or projects they have designed for the future.
The Numerati has alerted me to a subworld of applied mathematics that is slowly moving onto a stage that for years has been dominated by the laws of Newton or of quantum physics. This is a world whose existence I suspected, while having little knowledge of its increasing extent and impact. What is reported in the book dropped my jaw.
Stephen Baker, a journalist writing in the mode of an investigative reporter, has given us a vivid window into the professional lives of the Numerati. He has smoked out practitioners in their lairs, be they universities, companies, institutes, centers, clinics. He has interviewed them, shared pizza with them, described their modes of thought, their products, and their plans or dreams for the future.
A few years ago, under public pressure, the U.S. Congress scrapped the administration's plan for a Total Information Awareness program, but the idea seems to be alive and well in the various enterprises Baker surveyed. He introduced me to totally unfamiliar terms and acronyms, such as CGM (consumer-generated media derived from the Internet: forums, blogs, discussions, wikis); NORA (non-obvious relationship awareness); ANNA (derived from anonymity); Bluetooth technology. He has surveyed new companies, such as Accenture, Alliance Tech, Clairvoyance, In-Q-Tel, Tacoda, Umbria, comScore, that employ Numerati by the basketful. And he has not forgotten such oldies as IBM or BBN Technologies.
But enough generalities; let me get down to a few products.
Working a vein similar to that mined by Pennebaker, Baker reports that Jack Hermansen, a student of computational linguistics, has set up a name-identifying company, LAS (Language Analysis Systems), which has now been sold to IBM. (How do you tell whether Mr. Chang is the same fellow as Mr. Tchang, or even Mr. Tchung?) Well, they're working to improve the algorithms they have.
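The book, as reported here, does not say how the LAS algorithms actually work. Purely as an illustration of the kind of comparison involved, here is a minimal Python sketch that scores two names by Levenshtein edit distance (the number of single-letter insertions, deletions, or substitutions needed to turn one into the other). The function names `edit_distance` and `name_similarity` are hypothetical, and the choice of edit distance is my assumption, not a description of the company's method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, ignoring case."""
    a, b = a.lower(), b.lower()
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means identical spellings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    for pair in [("Chang", "Tchang"), ("Tchang", "Tchung"), ("Chang", "Smith")]:
        print(pair, round(name_similarity(*pair), 2))
```

A score near 1 flags spellings that may belong to the same person; deciding whether they really do would presumably require far more than spelling, such as transliteration rules and other records.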
Consider too the work of Eric Dishman at Intel, who dreams of humans totally wired up for purposes of predictive and prescriptive medicine.
"Dishman sees sensors eventually recording and building statistical models of almost every aspect of our behavior," Baker writes. "They'll track our pathways in the house. . . . They'll diagram our thrashing in bed and chart our nightly trips to the bathroom. Some of these gadgets will even measure the pause before we recognize a familiar voice on the phone." All this personal surveillance is in the service (or the hope) of providing us with longer, healthier lives.
Other groups are extending work of this type in other directions, and the efforts aren't limited to humans. Dan Andresen and Steve Warren dream of wiring up a half million Kansas cows, "tracking every animal from birth to the slaughterhouse"; they have received funding from the National Science Foundation for the project. What is at stake---enough to make a vegetarian shriek in horror---is an increase in the quality and availability of T-bone steaks.
Market research? That's been around since the ancient Babylonians kept cuneiform records of the price of onions, but the sophistication of today's version is mind-boggling. Baker informs us that Acxsiom (sic) of Conway, Arkansas, keeps shopping and life style data for some 200 million Americans. "The company buys just about every bit of data about us that is sold," and then "sells selections of it to anyone out to target us in a [political or other] campaign."
ComScore Media Metrix, according to its Web site, is "a preferred source of Internet audience measurement for advertising agencies, publishers, marketers and financial analysts."
Yankelovich, also according to its Web site, "gathers vast amounts of consumer data---not just demographics, but also consumer attitudes, beliefs and aspirations. Next, our senior marketing experts analyze the data to gain insight into who buys what from whom, and why."
Such research, of course, extends beyond our preference for classic orange juice as opposed to orange–pineapple juice. Thus, Nielsen BuzzMetrics is monitoring CGM. (Remember? Consumer-generated media.)
All this I find worrisome. More worrisome or even scary to me is personal information being made available to potentially antagonistic institutions or individuals---for example, the FICO financial credit score.
Turning toward law, Baker writes of ChoicePoint, a company in Georgia that "quietly amasses court rulings, tax and real estate transactions, birth and death notices" so as to enhance, among other things, law and child support enforcement, public safety, and health care.
Suppose that as the chief strategist for a political party, you are worried by the latest polls. Spotlight Analysis, a Washington-based company that divides the electorate into ten groups with characteristic voting patterns, will help you get the swing voters onto your bandwagon.
If you are looking for a felicitous marriage, and ask if you should buy a baby carriage, a company known as chemistry.com will compare your "vector" (i.e., parameters) against the vectors in a large database and suggest some optimum sweeties.
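What "comparing vectors" might mean can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the company's method: the similarity measure (cosine similarity), the function names, and the toy profiles are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def best_matches(query, database, k=2):
    """Return the names of the k database entries most similar to query."""
    ranked = sorted(database.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical profiles; each coordinate stands for some scored trait.
    profiles = {
        "A": [1.0, 0.0, 1.0],
        "B": [0.0, 1.0, 0.0],
        "C": [1.0, 0.1, 0.9],
    }
    print(best_matches([1.0, 0.0, 1.0], profiles, k=2))
```

With millions of profiles, a real service would presumably need indexing or approximate search rather than the full sort used here.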
Search engines, which seek to hire the finest mathematical talent wherever they can find it, are expanding their horizons. Larry Page, co-founder of Google, has set his sights high:
"The ultimate search engine would understand everything in the world. It would understand everything that you asked it and give you back the exact right thing instantly."
Is this information technology and artificial intelligence gone berserk? Or, paralleling the dying words of the Emperor Julian, shall we say: "Thou hast conquered, O Engine of Search"?
What math do the Numerati and their minions employ? Baker, whose technical knowledge of mathematics I suspect to be modest, hardly delves into the mathematics, but he does manage to toss around some terms---vectors, matrices, probability, Markoff models---and invites us to understand that all of us, individually and collectively, have now been reduced, summed up, and epitomized in this language.
Technically speaking, what is going on is part of learning/computational statistics, a very hot area in applied mathematics and computer science. The useful formal training would include linear algebra, multi-variable calculus, optimization, probability, and statistics on the mathematics side, and algorithms and database theory on the computer science side. Specialized knowledge of the particular domain is of course necessary.
In the case of DNA studies, which have considerable social approval, the vector sequences of A, T, G, and C are very long indeed, and linking the sequences with outward behavior---a very hot topic---requires much field data along with sophisticated mathematical techniques. (See www.personalgenomes.org.) The old "your date and fate" promised by astrologic algorithms is slowly morphing into "your DNA and fate."
Over the years, I may very well have trained students in applied mathematics who have gone on, happily, productively, and lucratively, to enter the ranks of the Numerati. There are fortunes to be made in the Numerocratic domains, and young people know it. The low-hanging fruit in engineering may for the most part have been plucked by now. To go forward with schemes for clean energy may be difficult, whereas creating successful new data mining applications is comparatively easy. It would be interesting to know how the number of hires in these areas among people with recent degrees at any level in mathematics or computer science compares with the number in traditional engineering applications, and how the pay scales compare.
So what price do we pay for all these brave-new-world, big-brother computerized benefits? I get regular mailings from banks, hospitals, and so forth, assuring me that my privacy has been respected. I scoff, believing that my private life is now an open book that anyone interested can read; that my weekly trash has been picked over, my computer surfing, ogles, and googles have been recorded and analyzed; and that I myself have been a silent co-conspirator in this enterprise. Anyone who wants to know anything about me can download my vectors and matrices. Has my identity been totally compromised, or do I still retain a "real me, a private me" that is impervious to such interrogations and analyses? Baker quotes Scott McNealy, the CEO of Sun Microsystems: "Privacy is dead; deal with it."
Baker is only mildly judgmental in his presentations. Yes, loss of privacy is the principal complaint, and I certainly agree. Must we then, in the words of Benjamin Disraeli,* "denounce to a perplexed world the frigid theories of a generalizing age that have destroyed the individuality of man?"
Yet there is more. A component of some of the services offered by the Numerati is risk abatement. Why risk the consequences of snap, top-of-one's-head judgments when the compilation and analyses of hundreds, thousands, nay, millions of facts can be brought to bear?
But risk has many aspects. The risk to whom? On the one hand, we usually want to avoid or to mitigate risk. Most of us have numerous insurance policies, commercial or governmental, optional or mandated, that act to ameliorate matters. Without such policies, we would feel ourselves naked before the winds of chance.
One of the malign and worrisome consequences of data mining is actually an increase in individual risk, as with health insurance. As insurance companies become better able to predict the future health of a customer, the incentive increases to insure only customers who will remain healthy. The risk to the companies decreases, and universal health care becomes ever more important.
On the other hand, humans are risk-taking animals. Every time we cross a busy street we take a "calculated" risk. We gamble in many different ways, not just in casinos, when we take steps to avoid dull, routinized lives but in the process put our lives, our wealth, and our economic systems in jeopardy. Eliminate risk and you increase security. But the price is a decrease in personal liberty associated with the exercise of our free will. A near total elimination of risk would work against human nature.
There are numeratic companies whose product is the protection of the individual against inroads made by other numeratic companies. But there is no system of analysis, supervision, monitoring, or risk modeling that can't be frustrated by hackers or by life itself. When inadequacies or loopholes are plugged, others emerge even as the heads of the mythic Hydra when struck off grew back doubly. (I venture that this could be established by Gödel's theorem.) Such open-endedness bodes well for the future of the field of numeratology. As society moves from democracy to numerocracy, there is always much, much more that can be done. And we must "deal with it."
I wish to acknowledge the help of Ernest S. Davis, who supplied me with some elucidating material.
Philip J. Davis, professor emeritus of applied mathematics at Brown University, is an independent writer, scholar, and lecturer. He lives in Providence, Rhode Island, and can be reached at email@example.com.