Datameer Blog

Sneak Peek? Geeking Out on Hive Plots

By Karel Kolman on August 22, 2012

Once a month or so at Datameer, the engineers get to take a day to hit ‘pause’ on our current development projects and instead work on something that we’ve personally envisioned for the product. Adobe calls these “JDI” (Just Do It) days, Atlassian calls them “ShipIt” Days, and at Datameer, we call it our GeekOut.

For this month’s GeekOut, I chose to develop what will hopefully become a new visualization feature in a future release of Datameer: a variant of the Hive plot. Hive plots are a perceptually uniform way of visualizing network data, in which graph nodes are laid out along a small number of axes and the connections between nodes are drawn as curves running from one axis to another.

Take, for example, the Apache Hadoop user mailing list. To create a Hive plot visualization of the list’s email communications, we first create a workbook that extracts two things from the email list data: the creator of each email thread, and a list (or a JSON array, to be precise) of all the people who replied to that thread. This is done with a few of our pre-built point-and-click analytic functions (GROUPBY, GROUP_JSON_ARRAY, etc.).
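For readers who think in code rather than spreadsheets, here is a rough sketch of what that grouping step amounts to. It is plain Python, not Datameer’s GROUPBY or GROUP_JSON_ARRAY functions, and the sample rows, the field layout and the `group_threads` helper are all made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample rows: (thread_id, sender, is_thread_start).
# The real mailing list data will look different.
emails = [
    ("t1", "alice@example.org", True),
    ("t1", "bob@example.org", False),
    ("t1", "carol@example.org", False),
    ("t2", "bob@example.org", True),
    ("t2", "alice@example.org", False),
    ("t1", "bob@example.org", False),  # second reply by the same person
]


def group_threads(rows):
    """Return one record per thread: its creator and a JSON array of repliers."""
    creators = {}
    repliers = defaultdict(list)
    for thread_id, sender, is_start in rows:
        if is_start:
            creators[thread_id] = sender
        elif sender not in repliers[thread_id]:
            # Keep each replier once per thread, in order of first reply.
            repliers[thread_id].append(sender)
    return [
        {"creator": creators[t], "repliers": json.dumps(repliers[t])}
        for t in creators
    ]


for record in group_threads(emails):
    print(record["creator"], record["repliers"])
```

Each output record pairs a thread creator with the JSON array of people who replied, which is the shape of data the visualization needs.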

Once we’ve done that, we move over to the visualization module, choose the Hive widget, and then attach the data we prepared in the workbook.

So how do you interpret this visualization? The nodes along the upper axis represent the email addresses of people who started email threads and did not participate in any other conversations. The left axis contains nodes of users who participated in email conversations without starting a thread of their own. The two axes in the lower-right quadrant map people who both started new email threads and participated in threads created by other mailing list participants. Nodes on these two axes are duplicated so that connections between nodes are easier to see (i.e. no curve starts and ends on the same axis).
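The axis assignment described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the widget’s actual code: the axis names, the sample rows and the `build_hive` helper are hypothetical, and drawing each link from thread creator to replier is my assumption here.

```python
import json

# Hypothetical workbook output: one row per thread.
rows = [
    {"creator": "alice@example.org",
     "repliers": '["bob@example.org", "carol@example.org"]'},
    {"creator": "bob@example.org", "repliers": '["alice@example.org"]'},
    {"creator": "dave@example.org", "repliers": '["carol@example.org"]'},
]


def build_hive(rows):
    """Assign every address to an axis and build creator -> replier links."""
    threads = [(r["creator"], json.loads(r["repliers"])) for r in rows]
    starters = {creator for creator, _ in threads}
    repliers = {person for _, replies in threads for person in replies}
    both = starters & repliers

    axes = {
        "starter_only": sorted(starters - both),  # upper axis
        "replier_only": sorted(repliers - both),  # left axis
        # People who did both appear twice, once on each lower-right axis.
        "both_as_starter": sorted(both),
        "both_as_replier": sorted(both),
    }

    def source_axis(addr):
        return "both_as_starter" if addr in both else "starter_only"

    def target_axis(addr):
        return "both_as_replier" if addr in both else "replier_only"

    # A link always leaves a "starter" axis and lands on a "replier" axis,
    # so it can never start and end on the same axis.
    links = [
        ((source_axis(creator), creator), (target_axis(replier), replier))
        for creator, replies in threads
        for replier in replies
    ]
    return axes, links


axes, links = build_hive(rows)
for source, target in links:
    print(source, "->", target)
```

Duplicating the “both” group is what makes the last guarantee work: a link between two people who each started and replied runs from one copy of the axis to the other instead of looping back onto itself.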

There you have it, a brand new widget that you might just see in a future version of Datameer. This Datameer Hive plot visualization is powered by the D3.js library and is an adaptation of Mike Bostock’s Hive plot script. To see it in all its interactive glory, check out this demo video I created. Enjoy!

Karel Kolman