In 2006, I was lucky enough to get in on an analyst briefing at IBM's Almaden Research Center, near San Jose. It was a memorable day. The place itself -- a monumental California modernist building on a hilltop -- is striking. And they kept a copy of the whole public internet downstairs.
The idea of IBM's WebFountain project was to dynamically capture the internet and mine this very large dataset -- about half a petabyte -- in search of trends and concepts. IBM would sell this knowledge to corporate customers with an interest in knowing what was about to happen before it actually happened. It was proposed that this service would be valuable in spheres ranging from marketing and business strategy to "government affairs".
In the end, WebFountain didn't go far, perhaps because it didn't have enough juice. Unlike Google and Facebook, IBM doesn't have access to private communications. It doesn't have our emails sitting on its servers, ready to be sliced and diced by data scientists. Knowledge really is power, and IBM just didn't know enough.
And in reality, all those companies had been behind the real game since at least 2002. Only five months after the 9/11 terrorist attacks, the Defense Advanced Research Projects Agency (which sponsored the original development of the technologies that underpin today's internet) opened the Information Awareness Office, which was to track, surveil and analyse asymmetric threats to the USA, in pursuit of what DARPA called Total Information Awareness. And in case anyone wasn't unnerved by this concept, it adopted this extraordinary logo:
Concerns in both houses of Congress about the implications for privacy were enough to see the IAO defunded in 2003. But a number of the office's projects proceeded separately -- including Evidence Extraction and Link Discovery, which developed systems to, as the Wikipedia article puts it:
... extract data from multiple sources (e.g., text messages, social networking sites, financial records, and web pages). It was to develop the ability to detect patterns comprising multiple types of links between data items or people communicating (e.g., financial transactions, communications, travel, etc.). It is designed to link items relating potential "terrorist" groups and scenarios, and to learn patterns of different groups or scenarios to identify new organizations and emerging threats.
So it did what IBM was playing with -- but with much bigger, much better data and with a quite different purpose. In particular, it could avail itself of the NSA Call Database, which compiles records of trillions of phone calls made by Americans (but not the calls themselves), sourced directly from telephone companies, and Pinwale, a less-well-understood archive of foreign and domestic emails.
The existence of the database -- and the fact that phone companies were directly supplying the records to the National Security Agency -- was revealed in 2006 by Seymour Hersh in The New Yorker. Which is why it should not have been too much of a surprise last week when The Guardian revealed that the NSA was "currently collecting the telephone records of millions of US customers" of the phone company Verizon, under a "top secret court order". This has been going on for years. The only news was that the Bush-era programme was carried on by Obama, although that should hardly have been a surprise.
It is worth reading this robust column by David Simon, the creator of The Wire. Even if you don't buy his nothing-to-see-here-you-clowns view -- and I don't -- he does bring some useful perspective to the argument:
... the fact remains that for at least the last two presidential administrations, this kind of data collection has been a baseline logic of an American anti-terrorism effort that is effectively asked to find the needles before they are planted into haystacks, to prevent even such modest, grass-rooted conspiracies as the Boston Marathon Bombing before they occur.
The next Guardian revelation was more perplexing. A programme called Prism directly sources user data from the servers of Microsoft, Yahoo, Facebook, Google, Apple and others. The evidence -- individual slides from a PowerPoint presentation -- seems good. But all the companies named have emphatically denied participating in, or even having heard of, the scheme. The Washington Post, which has had access to the same leaks as The Guardian, walked back its claims about Prism in some quite remarkable ways. Is this less than it seems? What's the status of the PowerPoint from which this is drawn? And why would such a massive project have a budget of only $20 million annually?
The last shoe to drop (so far) has been the extraordinary emergence of the source for these news stories: 29-year-old NSA contractor Edward Snowden. But although the story -- like the others, the work of US lawyer and activist Glenn Greenwald, his colleague, the film-maker Laura Poitras, and Guardian Washington bureau chief Ewen MacAskill -- gives an admirable account of Snowden's motivations, it fails to put to him a number of important questions about the provenance of the PowerPoint and why the companies involved feel able to deny their (voluntary, at least) participation so strongly. There are gaps in this story that could have been closed by its source, and it's puzzling that they haven't been.
In part, this may be down to Greenwald himself. His dreadful invocation of a rape metaphor to respond to his critics in 2012 fell well short of the conduct we might expect of a public-interest journalist. He clearly isn't inclined to give his source a grilling. But it's impossible to ignore what Snowden does say in the accompanying Guardian Q&A:
The NSA has built an infrastructure that allows it to intercept almost everything. With this capability, the vast majority of human communications are automatically ingested without targeting. If I wanted to see your emails or your wife's phone, all I have to do is use intercepts. I can get your emails, passwords, phone records, credit cards.
Although Snowden carefully distinguishes his own actions from the indiscriminate leaking of Pfc. Bradley Manning, one observation can be made of both of them: how on earth could relatively junior staff (more so Manning than Snowden) apparently have access to everything? It is quite extraordinary.
Our own country falls into the mix in some dizzying ways. Another Guardian story covers Boundless Informant, the NSA's current tool for summarising metadata about signals intelligence and where it's being generated. It's not surprising in the least that the NSA has such a tool, or that its most intensive surveillance occurs in north and central Asia and the Middle East.
But where does New Zealand fit in all this, given that via the Government Communications Security Bureau, we are intelligence partners with the US? How safe are our own communications? Should we be less acquiescent about the bill going through under urgency to legitimise illegal surveillance of New Zealanders?
Throw in Kim Dotcom -- whose chief employee is going ballistic about the GCSB bill and whose encryption-built-in file locker business Mega is in an interesting space given that Dropbox is supposedly about to be added to the Prism stable -- and the completely bizarre case of Peter Dunne, and it's clear there's plenty for us to unpack.
We'll be looking at both the reporting and the story on Media3 this week, with technology commentator Ben Gracewood and internet security expert Adam Boileau. We've cleared out two thirds of the show to talk about it and I think it'll be worth watching.
If you'd like to join us for tomorrow evening's Media3 recording, come to the Villa Dalmacija, 10 New North Road, at 5.30pm. If you're an intelligence official, do say hi.