Like a lot of people, I have been trying to lose weight lately. Rather than eating fewer calories than usual and just hoping for the best, why not be more systematic and measure progress over time? Recent books about getting in shape, like the popular The 4-Hour Body by Tim Ferriss, emphasize the need for tracking one’s improvement.
After some research, I found that I needed to decide what method to use to lose weight and get in shape: should I focus solely on my diet? Emphasize working out instead? Or concentrate my effort on sleeping more? Or perhaps a combination thereof?
So I purchased a Fitbit and started getting metrics about daily steps taken and stairs climbed. This was a start, but I quickly realized I needed more metrics, such as weight and caloric intake, to make sense of my daily progress. I soon learned that I was not alone in that quest: there is already a whole movement devoted to measuring one’s progress against a goal, called the Quantified Self, which has gained a lot of steam lately. The Quantified Self movement is about “self-knowledge through numbers”: improving oneself with the help of various self-tracking devices. Unfortunately, the commonly used tools, such as Fitbit, Withings (weight tracking), and apps such as RunKeeper, are mostly independent of each other. While each usually offers some basic visualization of its own metric, there is no easy way to look at the different tools and metrics all at once.
So it hit me: why not use Datameer to do this? Apparently I am not the only one to have thought of using a data analytics tool to get the single view I was looking for; as Stefean Heeke put it on the Quantified Self movement’s website, “Why would only companies have the benefit of data analytics? Why not apply their tools to personal solutions?” Since the problem was not about crunching voluminous datasets, but rather about visualizing the relationship between smaller, disparate data sources, our Datameer Personal Edition would do just fine. Of course, anyone can do this in Datameer without having to reconfigure and index some configuration files like in other tools.
Since I had just started collecting numbers with my Fitbit, I ended up using data from Chris Volinsky, who made his personal data available for anyone to use: calories burned from Fitbit, FitPoints from gym visits, average pace from runs measured on RunKeeper, sleep measurements, and Withings weight metrics, among other things. Now I was able to study his data and see how he can best lose weight!
I first massaged the raw data by applying some of our date functions to the different data sets, in order to bring them to a state where they could be joined together.
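For readers who want to try the same preparation step outside Datameer, here is a minimal Python sketch of the idea: reduce every timestamp to a plain calendar day so that the data sets share a common key. The function name and the list of timestamp layouts are assumptions made for illustration; the actual exports may use other formats, and this is not how Datameer’s date functions are implemented.

```python
from datetime import datetime

# Hypothetical timestamp layouts; real exports from each tool may differ.
KNOWN_FORMATS = (
    "%Y-%m-%d",
    "%m/%d/%Y",
    "%Y-%m-%d %H:%M:%S",
)


def to_day_key(raw):
    """Return an ISO day string (YYYY-MM-DD) for a raw timestamp string."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            # Drop the time of day so every source is keyed by calendar day.
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Once every source uses the same day key, measurements taken at different times on the same day line up on a single row.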
I then created a multi-join from these datasets, using the dates as the join keys. Since not all of the data was measured on the same days, I ended up using outer joins to merge all of the data together. My join looked like this:
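Outside Datameer, the same idea (a full outer join of several sources on the date) can be sketched in a few lines of plain Python. The function name, column names, and sample values below are made up for illustration; days missing from a source simply come through as None.

```python
def outer_join_by_date(sources):
    """Full outer join of several {day: value} mappings on the day key.

    `sources` maps a column name to a {day_string: value} dict.
    Returns a list of row dicts sorted by day; gaps are filled with None.
    """
    # The union of all days keeps rows that appear in only one source.
    all_days = sorted(set().union(*(s.keys() for s in sources.values())))
    return [
        {"date": day, **{name: series.get(day) for name, series in sources.items()}}
        for day in all_days
    ]


# Made-up sample values, only to show the shape of the result.
weight = {"2012-01-14": 80.1, "2012-01-15": 79.8}
sleep = {"2012-01-15": 412, "2012-01-16": 390}
rows = outer_join_by_date({"weight_kg": weight, "minutes_asleep": sleep})
```

An inner join would have kept only January 15 here, the single day present in both sources, which is why outer joins are the better fit for measurements taken on different days.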
Next, I created an infographic out of this data, using a multi-scatter plot graph to show the main data points that we have.
The chart below shows a time-series scatter plot of weight (in blue), calories burned (in red), Fitbit points (in green), and number of minutes asleep (in orange). You will notice that these four series are all drawn on the same chart at different y-axis levels, and that each carries a third dimension of the data in the size of each square, which encodes, in order: fat mass (weight), steps (calories burned), reps (FitPoints), and time in bed (minutes asleep).
To better understand what is going on, I’ve normalized the values onto a 0-1 y-axis, which was done by simply calculating the ratio (value N - value Min) / (value Max - value Min). Then I broke the chart down to draw parallels between the different data sets we have.
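This kind of scaling is commonly called min-max normalization. As a small sketch of the calculation in Python (the function name is an assumption, and keeping missing days as None is a choice made here, not something taken from the Datameer workbook):

```python
def min_max_normalize(values):
    """Scale numbers to the 0-1 range; None entries (missing days) stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lowest, highest = min(present), max(present)
    span = highest - lowest
    if span == 0:
        # A constant series has no spread; map every value to 0.0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lowest) / span for v in values]
```

For example, min_max_normalize([10, 20, 30]) gives [0.0, 0.5, 1.0], so a weight series in kilograms and a sleep series in minutes end up on the same scale.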
In the first graph, which plots weight (in blue) against calories burned (in red), there does not seem to be a strong correlation.
Perhaps Chris’ weight loss around January 15 is due to his getting more sleep?
Focusing on the sleep data, I used a Polar Area Chart to see when Chris sleeps the most during the week:
Apparently it’s on Fridays, the end of the traditional work week. Next, I compared weight data (in blue) against sleep data (in red):
It seems like we don’t have a strong correlation there either; Chris seems to get a good amount of sleep most of the time (red dots represent minutes slept, here scaled to match the weight data).
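The day-of-week view of sleep used earlier comes down to a simple aggregation: group the nightly totals by weekday and average them. Here is a rough Python sketch under the assumption that sleep is stored as a mapping from ISO day strings to minutes asleep; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def average_by_weekday(minutes_by_day):
    """Average minutes asleep per weekday from a {YYYY-MM-DD: minutes} dict."""
    buckets = defaultdict(list)
    for day, minutes in minutes_by_day.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[date.fromisoformat(day).weekday()].append(minutes)
    return {
        WEEKDAYS[index]: sum(values) / len(values)
        for index, values in sorted(buckets.items())
    }
```

The resulting per-weekday averages are the kind of values a polar area chart would draw as one wedge per day.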
However, plotting weight (in blue) against work-out data (in red), there is a clearer correlation between the two starting around January 2012, with the Fitbit point values going up (in red) and weight dropping around the end of January, as expected:
This also seems to concur with the RunKeeper data, where you can see a drop after he has run or cycled for long periods of time; note the drop at the end of June after Chris ran a 5k race:
So it seems like working out is clearly the best choice for Chris, according to the data.
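The comparisons above were made by eye. One way to put a number on them would be the Pearson correlation coefficient between two date-aligned series, sketched below in Python. This was not part of the original analysis; the function name is an assumption, and days where either value is missing are skipped.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two aligned series, ignoring pairs with None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    spread_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)
```

A result near -1 between work-out points and weight would support the visual impression that more exercise goes with lower weight, while a result near 0 would match the weak relationships seen for calories and sleep.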
Plotting all of the normalized data points noted above, we arrive at this:
So, what else can we do with these data sets in Datameer?
I also played around with Chris’s public tweets (@statpumpkin) via Datameer (you can find instructions on how to connect to Twitter with Datameer here). I filtered his posts to retrieve only Foursquare updates, and plotted the aggregation of these against weight using our Sunburst:
Unfortunately, not many of Chris’ Foursquare check-ins appear on Twitter, but for the ones shown it seems that the more check-ins there are, the higher the weight.
Chris, two pieces of advice for you: