The Challenge – Build an Automated Sankey Diagram in Tableau
Hello folks! I recently had an interesting challenge at work. I wanted to visualize customer change from one period to another to see which customers were increasing or decreasing in regards to order activity. I decided that I wanted to rank customers by the # of orders they had placed on the Jobaline.com platform from the 30 day period ending 90 days ago (old period) to the 30 day period ending yesterday (new period). These periods had to be dynamic, the dashboard had to refresh automatically using a live connection to our Amazon Redshift data warehouse, and had to effectively show big changes so we could easily see who was ordering a lot more and who was ordering less relative to all our other customers. I decided the way to go was to build a Sankey Diagram in Tableau.
A Sankey Diagram (Wikipedia) is a way to visualize process flows. It is not a native chart type in Tableau, but I knew that someone in the wonderful Tableau Community had figured it out. After some searching I came across this fantastic blog post on Jeffrey Schaffer’s(@HighVizAbility) Data + Science Blog. He, together with input from Jonathan Drummey (@jonathandrummey), both Tableau Zen Masters, came up with the perfect solution.
This allows you to pick two time periods, and as long as you have the rank for both, you can build a fantastic visualization that shows the change.
My problem was that this method involved using Excel and so the datasource was static and would have to be updated manually whenever a refresh was required. I wanted to do the data prep in SQL and use dynamic date filters to make everything ‘just work’ and be fully automated. I was able to do this and the below video outlines the process. In this video I use MS AdventureWorks 2014 Full sample dataset, which I hosted on an AWS MS SQL Server RDS box. For this demo I visualize the change in ranking of sales people based on the value of orders that fall under them from 2012 to 2013. The SQL is intermediate here, I use subqueries and window functions to get everything together. I also use Excel to generate a little bit of SQL. The Excel file, the final query and the Tableau workbook can be found below:
Even if you aren’t a SQL guru, by following the video and the supplemental files you should be able to reverse engineer how to do this. With a little tweaking to make it relevant to your dataset you too can create an automated Sankey diagram. Keep in mind you will have to change the quintiles that are hardcoded into the color formula to reflect the # of entities you are ranking. Also, the SQL you see is meant to be able to be cut and pasted into the Tableau data connection. There are shorter and more elegant ways to do it using variables and CTE’s (See this post on Data + Science) that you could use to create a table or view to hook Tableau up to. However that was overkill for my use case and for this video.
Continue The Discussion
Thank for watching and please leave a comment below to continue the discussion. If you like this post, join my mailing list by providing your name and email address on the right to be kept up to date with new posts and discussions on the Biznalysis blog!