
Fishing the Clickstream…


First, I’m excited to announce that a major new release of DAS (1.3) is available. Among other things, 1.3 includes powerful tools for performing clickstream analysis in just a few simple steps, and it makes visualizing user behavior a breeze. I want to give you an overview of these new tools and some food for thought on how simple it is to extract meaningful insights into visitor behavior from raw web logs, a common use case for DAS and Hadoop.

The goal here is to be able to scrape raw log files from your Apache or IIS web servers and visualize something like this:

This new visualization in DAS, called the “Circular Connection Graph,” shows the relative density of one-hop clickpaths, that is, how often visitors move directly from one page to another. It’s an easy way to measure and visualize click-through rate (CTR) from various campaign landing pages, or to compare the popularity of referring web sites (i.e. marketing partners who drive traffic to your site). But this is just one small fish in the sea of weblogs (see what our customers say about the importance of behavioral analytics).
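To make “relative density of one-hop clickpaths” a bit more concrete, here is a rough Python sketch of the underlying counting. It is not how DAS computes the graph internally; the function names, the sample pages and the simplified CTR definition (share of hops leaving a landing page that go to a given target, ignoring visitors who bounce) are all assumptions made for illustration.

```python
from collections import Counter


def one_hop_counts(sessions):
    """Count (from_page, to_page) transitions across all sessions.

    `sessions` is an iterable of page lists, each in click order.
    """
    counts = Counter()
    for pages in sessions:
        # zip pairs every page with the page clicked right after it
        counts.update(zip(pages, pages[1:]))
    return counts


def click_through_rate(counts, landing_page, target_page):
    """Share of hops leaving `landing_page` that lead to `target_page`."""
    outgoing = sum(n for (src, _), n in counts.items() if src == landing_page)
    if outgoing == 0:
        return 0.0
    return counts[(landing_page, target_page)] / outgoing


# Hypothetical sample sessions, purely for illustration
sessions = [
    ["/landing/spring", "/pricing", "/signup"],
    ["/landing/spring", "/pricing"],
    ["/landing/spring", "/blog"],
    ["/landing/fall", "/pricing"],
]

counts = one_hop_counts(sessions)
ctr = click_through_rate(counts, "/landing/spring", "/pricing")
```

Each key in `counts` corresponds to one connection in the graph, and its value to the thickness of that connection.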

The real magic for Hadoop and DAS is that this data, when enriched with visitor profiles or other interaction data (think: MySQL, Oracle, Teradata, Twitter), can give you fine-grained, visitor-level insights that were previously out of reach. Canned web traffic reports from a traditional application might only give you aggregated data. Cloud-based analytics solutions might show you detail in the clickstream, but they can’t correlate that behavior with the transaction systems of record that track the rest of the customer lifecycle, namely purchases, balance history, call center interactions or in-store visits. There’s more about that here.

Let me show you a bit of what I mean. With standard web analytics packages, you can easily get answers to the basic questions of web behavior (including popular pages, session duration and clicks per session) from canned reports. These are straightforward aggregations (roll-ups), which are easily done in DAS and much easier than in raw Hadoop, where you’d write HiveQL, Pig or MapReduce code.
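For comparison, this is roughly what those three roll-ups look like when coded by hand in plain Python. It’s a minimal sketch that assumes the clicks have already been parsed and assigned to sessions; the tuple layout and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already-sessionized clicks: (session_id, timestamp, page)
clicks = [
    ("s1", datetime(2013, 1, 1, 9, 0, 0), "/home"),
    ("s1", datetime(2013, 1, 1, 9, 1, 30), "/pricing"),
    ("s1", datetime(2013, 1, 1, 9, 4, 0), "/signup"),
    ("s2", datetime(2013, 1, 1, 10, 0, 0), "/home"),
    ("s2", datetime(2013, 1, 1, 10, 0, 45), "/blog"),
]

# Popular pages: number of views per page
page_views = Counter(page for _, _, page in clicks)

# Track first click, last click and click count for every session
first_seen, last_seen = {}, {}
clicks_per_session = Counter()
for session_id, ts, _ in clicks:
    first_seen[session_id] = min(ts, first_seen.get(session_id, ts))
    last_seen[session_id] = max(ts, last_seen.get(session_id, ts))
    clicks_per_session[session_id] += 1

# Session duration: time between first and last click, in seconds
session_seconds = {
    session_id: (last_seen[session_id] - first_seen[session_id]).total_seconds()
    for session_id in first_seen
}
```

On a single machine this is trivial; the point of the comparison is that on Hadoop the same logic has to be expressed as distributed jobs.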

Here are a few examples of those (click the images if you’d like a larger view).

Thanks to the game-changing economics of Hadoop, you can always afford to save every click.  What does that mean?

1. Raw server logs can be fed into Hadoop, eliminating a separate ETL, modeling or pre-processing stage in the data pipeline.  With DAS, this requires zero coding.

2. Using DAS, key elements of user behavior (not just session stats, but page dwell time and the click paths preferred by specific users) can easily be extracted and sliced on any dimension. That provides insightful stats like what you see below. It could also mean dense visualizations like the one at the top of this post, which can serve up daily insights to the folks responsible for customer acquisition or marketeers managing campaigns.
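If you’re curious what “raw server logs” involve without a tool like DAS, here is a hedged sketch of parsing one line of Apache’s common/combined log format in Python. The regular expression, the field names and the sample line are my own assumptions for illustration; real logs with a custom `LogFormat` will need a different pattern.

```python
import re
from datetime import datetime

# Pattern for Apache common log format, with the optional referrer and
# user-agent fields that the combined format appends.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 10/Oct/2012:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line, purely for illustration
sample = (
    '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] '
    '"GET /pricing HTTP/1.0" 200 2326 '
    '"http://example.com/landing" "Mozilla/5.0"'
)
record = parse_line(sample)
```

The `referrer` and `path` fields together are what a one-hop clickpath is built from.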

DAS also gives you flexibility. First, it separates the wheat from the chaff: filtering errors, image requests and page refreshes from the clickstream is simple. Second, DAS lets you divide and conquer the data pipeline. Data warehousing expertise can be applied to cleanse, enrich and pre-process the data (e.g. sessionizing traffic your own way, with any timeout), which can then be served on a platter to the BI and marketing teams to create roll-ups, or to data scientists to look for clusters of visitors or develop predictive models. Finally, you can go wild and join this with anything you can throw at DAS: user profiles, demographics, emails from your CRM, Twitter feeds, or last month’s blog post. Sound like a fantasy? All you need is a handful of spreadsheets and an imagination. Click to zoom in on the screenshot below to get a taste. Or wait for the video I’ll be posting soon.
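The three steps in that paragraph (filter, sessionize, join) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DAS implementation: the record layout, the `clean` and `sessionize` helpers, the 30-minute default timeout and the `profiles` lookup are all hypothetical.

```python
from datetime import datetime, timedelta

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".ico")


def clean(clicks):
    """Drop errors, image requests and repeats of the visitor's previous page."""
    last_page = {}
    for c in sorted(clicks, key=lambda c: (c["visitor"], c["time"])):
        if c["status"] >= 400:
            continue  # client or server error
        if c["path"].split("?", 1)[0].lower().endswith(IMAGE_SUFFIXES):
            continue  # image request
        if last_page.get(c["visitor"]) == c["path"]:
            continue  # same page again: treated as a refresh (ignores elapsed time)
        last_page[c["visitor"]] = c["path"]
        yield c


def sessionize(clicks, timeout=timedelta(minutes=30)):
    """Tag each click with a session id; a gap longer than `timeout` starts a new one.

    Expects clicks ordered by visitor, then time (as produced by `clean`).
    """
    last_time, session_no = {}, {}
    for c in clicks:
        v = c["visitor"]
        if v not in last_time or c["time"] - last_time[v] > timeout:
            session_no[v] = session_no.get(v, 0) + 1
        last_time[v] = c["time"]
        yield {**c, "session": f"{v}-{session_no[v]}"}


# Hypothetical sample data, purely for illustration
t0 = datetime(2013, 1, 1, 9, 0, 0)
raw = [
    {"visitor": "u1", "time": t0, "path": "/home", "status": 200},
    {"visitor": "u1", "time": t0 + timedelta(seconds=1), "path": "/logo.png", "status": 200},
    {"visitor": "u1", "time": t0 + timedelta(seconds=10), "path": "/home", "status": 200},
    {"visitor": "u1", "time": t0 + timedelta(minutes=1), "path": "/missing", "status": 404},
    {"visitor": "u1", "time": t0 + timedelta(minutes=2), "path": "/pricing", "status": 200},
    {"visitor": "u1", "time": t0 + timedelta(hours=2), "path": "/home", "status": 200},
]
profiles = {"u1": {"segment": "trial"}}  # stand-in for a CRM or profile table

# Join: enrich every surviving click with the visitor's profile fields
enriched = [
    {**c, **profiles.get(c["visitor"], {})}
    for c in sessionize(clean(raw))
]
```

Passing a different `timeout` to `sessionize` is all it takes to sessionize traffic your own way in this sketch.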

This is clearly a rudimentary example of clickstream analytics, but it’s a starting point that contains valuable nuggets of insight, and it’s easy to extend.  Most importantly, it makes this machine-generated data accessible.  And that’s what data science is all about.

Want to get started today? Contact us for a free trial download, a VMware image, or a turnkey instance in the cloud.

Happy fishing!

Matt Schumpert is Director of Product Management at Datameer.
