Jonathan Gray - On Archiving Everything: Borges, Calvino, Google

In a sense Google’s approach to meaning is uncannily like that of the later Wittgenstein: don’t look for deeper structures underlying the way we make sense of things, pay attention to the surface, to what people do and how they interact with language, with words, sentences, and signs. Don’t derive an arbitrary ontology or an abstract rule from particular cases: watch what people do, how they behave, and iterate accordingly. The success of their algorithms is predicated on the recognition that meaning is not something fixed which can be analysed and understood apart from what people do. Statistical modelling based on actual user behaviour will win out over attempting to second guess what they want with static schema. In Google’s total archive, the company don’t just retain every book, every page, every sentence, but every interaction with every item: every click, pause, foray, allusion, babble, farrago and yawn. For our cacophonies are Google’s gold.

There can be no doubt that Google’s use of statistical techniques has helped it advance far beyond earlier attempts at “artificial intelligence,” since it can use its data supply to automate the process of learning, instead of relying on experts to mold the data to perfection.

Still, whatever Google is doing is clearly inferior to whatever the brain is doing. A normal human brain doesn’t need to read every book ever written in order to make terrible, ungrammatical translations from Chinese to English. Nor does it need to process thousands of training messages to tell spam from ham. However the brain works, it learns much more, and much more quickly, than Google does from the same data set.
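
To make the spam example concrete, here is a minimal sketch of the kind of statistical learner being discussed: a naive Bayes filter that only counts which words appear in labelled messages. This is not Google's method, just a textbook technique chosen for illustration. The class name `NaiveBayesFilter`, the whitespace tokenizer, and the four toy training messages are all assumptions made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting is good enough for a toy example.
    return text.lower().split()


class NaiveBayesFilter:
    """Hypothetical two-class (spam/ham) naive Bayes classifier."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Learning is nothing more than counting words per label.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        # Requires at least one training message per label,
        # otherwise the prior below would be log(0).
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior: how common this label is in the training data.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one (Laplace) smoothing so unseen words do not zero out the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set (invented for illustration only).
model = NaiveBayesFilter()
model.train("win cash prize now", "spam")
model.train("cheap pills buy now", "spam")
model.train("meeting notes attached", "ham")
model.train("lunch tomorrow at noon", "ham")

print(model.classify("buy cheap prize"))
print(model.classify("lunch meeting tomorrow"))
```

The filter has no idea what any word means. It only knows how often each word turned up under each label, which is why a real one needs thousands of messages where a person needs a handful.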