As promised, I wanted to “show my work” from my last post, which visualized just how much “Big Data” is being talked about in the tech press. So here goes: I work with an awesome PR database called ITDatabase that tracks what thousands of tech reporters are writing every day. (I’ll repeat my previous sidenote: If you’re in tech PR and don’t know about it… it’s definitely worth a look.) ITDatabase has some analytics already built into the product, but I wanted to take my analysis quite a bit further than ITDatabase currently allows, and then visualize my results. Thankfully, ITDatabase lets you build projects and export your results to a CSV file, which, handily, is one of the (tons of) file formats we support in Datameer.
The first thing I did was get the exact data I wanted from ITDatabase by searching for the term “Big Data” over the past 365 days. The result came back quickly: 13,405 articles. I got a complete list of those articles, including a ton of associated metadata, like the publication name, article date, article title, author name, the URL, and more. I exported the data from ITDatabase, opened up Datameer, and started on the first of three steps.
Step 1: Importing the Data
First, I headed to the “Browser” tab, which is sort of the home base for your Datameer work. In the upper right-hand corner, there’s a + button that you’ll use every time you create a new project in Datameer. For the first step, I selected + File Upload, then followed the prompts to bring in my CSV file with just a few clicks. Super easy.
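If you'd rather sanity-check the export outside of Datameer first, here is a minimal Python sketch of the same import step. It is only an illustration: the column headers and the three sample rows are made up, since the real ITDatabase export has its own headers and many more fields.

```python
import csv
import io

# Made-up stand-in for the exported CSV. The header names here are assumptions;
# check them against the first line of your own export.
SAMPLE_EXPORT = """Publication,Date,Title,Author,URL
Example Weekly,2012-06-01,Big Data Grows Up,A. Writer,http://example.com/1
Sample Times,2012-06-01,Why Big Data Matters,B. Reporter,http://example.com/2
Example Weekly,2012-06-02,Big Data in Practice,A. Writer,http://example.com/3
"""


def load_articles(csv_file):
    """Read an open CSV file into a list of dicts keyed by column header."""
    return list(csv.DictReader(csv_file))


# With a real export you would pass open("your_export.csv", newline="") instead.
articles = load_articles(io.StringIO(SAMPLE_EXPORT))
print(len(articles), "articles loaded")
print("Columns:", list(articles[0].keys()))
```

Printing the column names is a quick way to confirm the file came through the way you expected before doing any analysis on it.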
Step 2: Digging into the Analytics
Once again, on the Browser tab, I hit the + button and this time chose New Workbook. It popped open a blank spreadsheet and immediately asked me what data I wanted to work with. I chose my file upload and I was on my way.
The very first thing I wanted to see was simple: how many articles are being written each day about “Big Data”? So I clicked on the “New” tab at the bottom of the page to open a clean spreadsheet. I clicked on the first column and the formula builder popped up. I knew I wanted to group my data by date, so I chose the “GroupBy” function, clicked back over to the first tab with all of my data, then selected the “Date” column. Datameer popped me back over to my second sheet and had instantly listed each unique date. The last thing I had to do was count how many articles were hiding under each date, so I clicked on the next blank column, and the formula builder popped up again. This time I chose the “GroupCount” function, and voilà, I had exactly what I needed: a list of the past 365 days and exactly how many articles on each day contained the phrase “Big Data”. I clicked Save, and I was on my way to the third and final step.
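For anyone curious what that GroupBy plus GroupCount combination boils down to, here is a rough Python equivalent. This is a sketch of the idea, not how Datameer implements it; the rows are made-up sample data, and I'm assuming the dates are stored as ISO-style strings (YYYY-MM-DD) so that sorting them as text also sorts them chronologically.

```python
from collections import Counter

# Made-up rows standing in for the imported articles; only "Date" matters here.
rows = [
    {"Date": "2012-06-02", "Title": "Big Data in Practice"},
    {"Date": "2012-06-01", "Title": "Big Data Grows Up"},
    {"Date": "2012-06-01", "Title": "Why Big Data Matters"},
]


def articles_per_day(rows):
    """Group rows by their Date value and count the rows in each group."""
    counts = Counter(row["Date"] for row in rows)
    # Sort by date so the result reads as a timeline.
    return sorted(counts.items())


daily = articles_per_day(rows)
for date, count in daily:
    print(date, count)
```

Each unique date becomes one row of the result, with the number of articles published that day next to it, which is the same shape as the second sheet in my workbook.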
Step 3: Visualizing the Results
Back in the Browser tab, I clicked the + button for the third and final time, this time choosing “Infographic”. This opened up what is essentially a blank canvas for me to work with. On the left-hand side was the “Add Widget” Inspector, where I just dragged and dropped whichever widget I wanted to work with: a line and area chart, a bar graph, pie chart, sunburst, word cloud, etc. For the first graph I created, the number of articles about “Big Data” per day, I chose the line and area chart. I dragged it onto the blank canvas, then headed to the Inspector on the right-hand side of the screen, which housed all of the data I had just analyzed. I found the name of the Workbook I was working on, found the specific tabs I wanted, and again just dragged and dropped the data onto the line and area chart. It automatically populated the chart with all of my data. Then I headed back to the “Add Widget” Inspector and clicked on the widget settings button, where I chose how many rows I wanted showing (365 in this case), how big I wanted the margins to be, and how I wanted the labels oriented, and I was done. Here we have it:
That was basically it. After that, it was just a matter of rinse and repeat with each different data set I wanted to visualize. I picked my widget, dragged it onto the canvas, and then dragged the data on top of it. I fine-tuned it in the settings panel and I was done.
What might I do with these findings? Well, the easy answer is to start following these reporters and their publications closely. I’ve followed the ones who are on Twitter, added them to my RSS feed, and so on. You might be thinking to yourself, “just because they write the most content doesn’t mean they have the most reach, or influence.” That’s a great point, and a perfect example of how you can iterate on your data. The next thing I’ll be doing is joining publication circulation figures with this data, to see which publications that write about “Big Data” have the biggest “reach”. This is exactly what data analytics is all about. It’s about asking multiple, iterative questions of your data, not just the same question over and over.
Finally, I’d be remiss not to point out that this is yet another example of a non-data-scientist making use of data. There’s really no excuse not to get started working with whatever data you have today.
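To make that next step a little more concrete, here is a small Python sketch of what joining article counts with circulation could look like. Everything in it is a placeholder: the publication names and circulation numbers are invented for illustration, and I'm assuming an inner join, so a publication with no circulation figure simply drops out.

```python
from collections import Counter

# Made-up articles; only the publication name is needed for this question.
articles = [
    {"Publication": "Example Weekly"},
    {"Publication": "Example Weekly"},
    {"Publication": "Example Weekly"},
    {"Publication": "Sample Times"},
    {"Publication": "Unlisted Blog"},
]

# Invented circulation figures, purely for illustration.
circulation = {"Example Weekly": 50000, "Sample Times": 200000}


def join_with_circulation(articles, circulation):
    """Count articles per publication, then attach circulation (inner join)."""
    counts = Counter(a["Publication"] for a in articles)
    joined = [
        (pub, n, circulation[pub])
        for pub, n in counts.items()
        if pub in circulation  # publications without a figure are dropped
    ]
    # Rank by circulation, largest first.
    return sorted(joined, key=lambda row: row[2], reverse=True)


ranked = join_with_circulation(articles, circulation)
for pub, n_articles, reach in ranked:
    print(pub, n_articles, reach)
```

In this toy data the publication that writes the most is not the one with the largest circulation, which is exactly the kind of follow-up question the join is meant to answer.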