Long term trends: The Ngrams Viewer

“A database of intentions” is how John Battelle described Google. It is a thrilling concept, at times unsettling, that you can see into the searching soul of the connected populace by seeing the words they use to find things.

Google Trends is one of those miraculous tools of the web that has quickly become commonplace. With a prophylactic time-lapse to keep its powerful advantage of insight, Google lets us see what people were searching for by year and by region.

The other day I came across the Google Ngrams Viewer for the first time. This gives a slightly longer trends view in language, taking all the books since 1800 as its data set (actually up to 2008, I think).

Facebook planet

This image has been on my computer desktop and on my mind since I saw it in December. High time I shared it here, really.

It’s a data visualisation of 10 million pairs of friends on Facebook and where they live in relation to one another, created by an intern on Facebook’s data infrastructure engineering team. Read the original blog post in full – it is fascinating stuff.

As Iain Tait points out, what’s amazing is that there is no map underneath, and yet you can pick out the shapes of the continents.

Interesting too are the gaps – China, Brazil and Russia are underrepresented, perhaps because other social networks are more prevalent in those territories (RenRen, Orkut and Vkontakte respectively).

Via Broadstuff. Reminded by Crackunit.

Mining the where: CIA-funded start-up says 80% of online content contains location information

Last week at Local Social Summit I talked about some of the issues around location and privacy. Especially problematic is people’s inadvertent tagging of their photos with location information – something potentially of interest to crooks, stalkers and others you might not want to know everything about you.

Open-source spying is a term which has been around for a while, reflecting the fact that when it comes to gathering information, the web is often as good a place to look as the field.

Well, it turns out the CIA is also interested in this kind of information. The investments of its Silicon Valley VC firm, In-Q-Tel, reflect a justified fascination with the social web on the part of intelligence agencies. In a post about In-Q-Tel, the Not So Private Parts blog on Forbes found the firm…

…likes companies coming up with better ways to mine social networking sites and geospatial location data. One of its investments, Geosemble, a private spin-off from USC, estimates that “80% of online content has location information.”

80%? Wow.

“Our mission is to shine a torchlight on geographic unknowns and help organizations neutralize threats and capitalize on opportunities in their areas of geographic interest,” says its website. Another of IQT’s geospatial investments, FortiusOne, promises instant maps based on Tweets and photo uploads, for mapping election-day threats in Afghanistan, for example.

The idealist in me is attracted to the stories of humanitarian data mining on platforms like Ushahidi, but we should remember that governments and their agencies are interested in our geo-location information as well.

Bonus link: a really interesting presentation from FortiusOne on analysing geo-data for business.

Mining the now: Real time sense making

Here’s an emerging idea about how we can and will make sense of the vast amounts of data becoming available about what is happening right now…

Juliana Rotich (whose work at Ushahidi I discussed on this blog about a year ago) writes on her Afromusing blog about combining two ideas: crowdsourcing the gathering and making sense of data in real time (which is what Ushahidi and the related Swift River project do), and the internet of things (objects and systems connected to the internet and making data available continuously):

Allow me to use the word holistic. As in holistic near ‘real time sense-making’, incorporating the internet of things, with crowdsourced data delivered through channels that encourage participation. There is an opportunity to see things dynamically and not just do after-the-fact post mortem. This could work for flash point events like the Haiti earthquake (taking data [from] Geiger counters etc + crowdsourced data like that available on the haiti deployment run by Noula.ht). It could also work for longer term events such as the BP Oil spill in Louisiana.

Juliana quotes Jeff Jonas, a scientist at IBM:

…the closer to real-time one can get the right answer and respond, the better. And milliseconds matter.

The concept of real time sense making offers so many tantalising possibilities, from predicting the behaviour of human social networks to helping those networks (countries, companies, NGOs) respond to emergencies and more broadly to the challenges we face globally.

Be sure to check out Juliana’s original post for some amazing examples of how this concept is being put to work.

Image: Literally, a swift river…

Web shadows: Looking after ourselves online

 

What does your web shadow look like?

I’ve been reading and therefore thinking a fair bit about privacy and personal online reputation.

It’s something I’ve touched on in the past and the posts Managing your online reputation will be a core life skill and Online overshare: the personal rep pitfalls have had a small but steady trickle of traffic ever since. 

I tend to be an optimist, and in accepting the considerable benefits of living in part online in social media, I have learned more and more to be open, while also being clear with myself about where the boundaries of my public online life are set.

For some time we’ve had the concept of Google Shadows – what people find out about you when they put your name into Google (I first heard Jeff Jarvis use the term).

I like the idea. A shadow is something that’s always with us, that follows us, that’s not separate. We increasingly need to be conscious of the online shadows cast by our actions in everyday life.

It’s not just Google, though, these days, but our other online places, all the public and private databases and spaces in our working and personal lives and in our social graphs, of course. 

We need to be aware not only of what our web shadows are, but also of how we affect them through all of our everyday actions. Sometimes when people want to know more about you, the shadow is all they will see.

Here are some of the most interesting posts and articles that I’ve been chewing over:

  • When an indiscretion by a governor in Arizona was picked up by an open mic, Lawrence Lessig took CNN to task for broadcasting it, and mused on how it seems we have to “remember that there are a million privacy invading technologies surrounding us”. The discussion in the comments is very good indeed.
  • Like Lessig, private investigator Steve Rambam summons the spectre of life in the Cold War communist bloc in this video of his presentation called Privacy is Dead: Get Over It. Rather than worrying about Big Brother though, the proliferation of digital photography and video, among other things, means it’s more “Little Cousin” – as in we never know when we’re being recorded by one another, even inadvertently.
  • Rambam, an individual with a colourful Google shadow, to say the least, was also quoted in The Perils of Sharing, an article in the Economist’s The World in 2009 special edition. More on that later…
  • Lastly, the brilliant David Spark’s 12 Great Tales of De-Friending reminds us how quickly we are having to evolve new social strategies to deal with relationship issues online.

Meantime, one last recommended read on privacy – the New York Times had a great piece called You’re Leaving a Digital Trail. What About Privacy?

In part, it follows an experiment at the MIT Media Lab where 100 students’ electronic trails (emails, calls, etc.) are recorded and followed. It also recalls how the data about us can be mined to interesting effect:

In 2006, Sense Networks, based in New York, proved that there was a wealth of useful information hidden in a digital archive of GPS data generated by tens of thousands of taxi rides in San Francisco. It could see, for example, that people who worked in the city’s financial district would tend to go to work early when the market was booming, but later when it was down.

One suspects that early morning cabs are pretty plentiful at the moment then… Anyhow, Dr Pentland, who heads the project, describes this sort of thing as “reality mining”.

 

A map expressing collaboration between students in an MIT Media Lab experiment