In big news on the government data and transparency front, the premier provider of federal campaign finance information, the Center for Responsive Politics (CRP), has announced it is opening for bulk download 20 years' worth of the data that powers its web site, OpenSecrets.org. More than 200 million records are being made available, covering itemized contributions, campaign spending, lobbying, personal finance, and sponsored travel. CRP began tracking campaign contributions in the late 1980s. Its stats and staff are trusted and quoted by the media as the gold standard reference.
The opening of the archive of bulk, standardized, and industry-coded data underlying OpenSecrets.org is a seminal event for transparency and Web 2.0 political data. Federal bailouts, a $5.3 billion election season, newspaper bankruptcies, and an administration pledging "unprecedented transparency" are forces enough to justify making this data archive available. Access to two decades' worth of campaign finance data will make it significantly easier for the many hands of the Web to forensically and predictively examine influence in federal government.
What makes OpenSecrets.org's data so valuable - and this bulk release so significant - is the added standardization and industry-coding OpenSecrets.org applies to the source data it acquires from the Federal Election Commission (FEC). This data conditioning will save many a scholar, reporter, and political junkie many hours of frustration. OpenSecrets.org cleans up the publicly available FEC data as much as possible to identify and standardize all the contributions from the same individual, whose name and employer often vary significantly across FEC filings by different campaigns. Furthermore, OpenSecrets.org relentlessly tracks and assigns a NAICS-like industry code to nearly every individual's employer in order to categorize the contribution according to its most likely economic affiliation.
This categorizing, or "coding", makes it possible to aggregate and sum millions of individual contributions made during an election cycle by employer, industry, and economic sector. Without someone categorizing employers, and therefore contributions, it would be impossible for the rest of us to add up the numbers in meaningful ways or to look systematically for trends in the contributions. We can see whom those working in the banking and financial sector have been supporting financially only because OpenSecrets.org categorizes the individual contributions.
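To make the point concrete, here is a minimal sketch in Python of what coded records allow. The column names (contributor, employer, industry_code, amount), the code values, and the sample rows are all made up for illustration; they are not the layout or the codes used in the actual OpenSecrets.org files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. Column names and code values are assumptions
# made for this sketch, not the real OpenSecrets.org file layout.
SAMPLE = """contributor,employer,industry_code,amount
Jane Doe,First Example Bank,BANKING,500
J. Doe,First Example Bank,BANKING,250
Sam Roe,Example Motors,AUTO,1000
"""


def totals_by(rows, key):
    """Sum the 'amount' column, grouped by the column named in `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row["amount"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_industry = totals_by(rows, "industry_code")
by_employer = totals_by(rows, "employer")

print(by_industry)
print(by_employer)
```

Once every record carries an employer and an industry code, rolling up by either one is the same few lines. Without the coding there would be no column to group on in the first place.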
It is this standardized and coded data that is now available in zipped CSV files for each two-year election cycle since 1990, and it makes up the biggest piece of CRP's newly opened archive. Many of the files are too big for a spreadsheet, however, so it is likely that only those experienced with campaign finance or skilled with databases will be working with the data in the near term. The 200MB zipped file I downloaded for the 2008 election cycle expanded to nearly a gigabyte. Thankfully, the 59-page user guide provides a useful introduction to the data itself and recommendations on how to do some basic calculations.
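For anyone who wants to poke at files of that size without a database, one option is to stream a CSV straight out of the zip archive one row at a time, so the full file never has to fit in memory or in a spreadsheet. The sketch below uses only the Python standard library. The member name contribs.csv, the header row, the column names, and the UTF-8 encoding are assumptions for illustration; check the user guide for the real layout.

```python
import csv
import io
import zipfile
from collections import Counter


def count_rows_by_column(zip_source, member, column):
    """Stream one CSV member of a zip archive and count rows per value of `column`."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so csv can read text; encoding is an assumption.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                counts[row[column]] += 1
    return counts


# Build a tiny stand-in archive in memory so the sketch runs on its own.
# File name, columns, and values are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "contribs.csv",
        "cycle,industry_code,amount\n"
        "2008,BANKING,500\n"
        "2008,AUTO,1000\n"
        "2008,BANKING,250\n",
    )
buffer.seek(0)

counts = count_rows_by_column(buffer, "contribs.csv", "industry_code")
print(counts)
```

With a real download, the first argument would be the path to the zip file on disk instead of the in-memory buffer.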
Although mostly funded by foundations, OpenSecrets.org enjoyed some revenue from selling this bulk data to the largest news organizations. So it isn't surprising they are a wee bit nervous about taking this step and have emphasized the attribution aspect of the data's Creative Commons Attribution Non-Commercial Share Alike license. The staff of OpenSecrets.org has a commitment to the quality and long-term preservation of this data, and like any good data steward they are concerned that the rest of the world uses the data in its proper context.
The burden is now squarely on the larger Web and "Government 2.0" crowd to show the opening of this data is worth it. Personally, I'm optimistic. There's more data available every day with which to mash up this information. I'm particularly looking forward to seeing OpenSecrets.org's industry codes applied to contractors receiving Recovery Act money, and to seeing people look for patterns in contributions over the past two decades.
ProgrammableWeb's Government vertical did not exist when I first started at the Sunlight Foundation in 2006, and it now lists dozens of government APIs. The Sunlight Labs developer Google group has 500 developers. And organizations like the New York Times are creating APIs for the first time. The OpenSecrets.org archive is another welcome asset to the growing data commons.