How do open source data analysis and visualizations by individuals compare to big data projects by data scientists using cutting edge software and infrastructure? Some clue may lie in looking at the impact of blogging on professional journalism.
A few years ago I worked for Lycos and helped launch a blogging tool. Blogging was pretty new then, and Blogger, Typepad, and other tools were just getting off the ground. These new tools were primarily aimed at consumers who posted pictures, kept a diary, or shared home improvement projects. There were sharp distinctions between “journalists,” who were credentialed and backed by fact checking and an editorial process, and “bloggers,” who could write about any topic and self-edit. Encyclopedias had a similar divide between “real” encyclopedias like the Encyclopaedia Britannica, with extensive editorial processes, and open source reference works like Wikipedia, with open review processes.
Over time, the distinctions between “blogger” and “journalist” have become less clear, and even something as big an undertaking as an encyclopedia can be democratized.
Can the same thing happen to data analysis? It has a lot in common with a large editorial enterprise: it was once done by specialists under editorial control, and now a person can self-edit and self-publish using free or low-cost tools. There will always be a high end (data scientists working on big data), but given freely available data sources and low-cost tools like Excel, PowerPivot, Many Eyes, and Tableau Public, it’s possible for anyone to analyze millions of records and visualize them. A recent article in the Guardian, “Data Journalism is the New Punk,” argues that data journalism has reached the point where everyone can do it.
The results will probably be similar to blogging: some analysis will be spectacularly good, deep, and insightful; some will be very wrong. The debates over whether it should be so readily available, and over its impact on “professional” work, will continue, just as they have with blogging and open source encyclopedias.