Business Intelligence, SQL Intermediate, Tableau Intermediate

Automated Sankey Diagram Using Tableau and SQL

The Challenge – Build an Automated Sankey Diagram in Tableau

Hello folks! I recently had an interesting challenge at work. I wanted to visualize customer change from one period to another to see which customers were increasing or decreasing in regards to order activity. I decided that I wanted to rank customers by the # of orders they had placed on the platform from the 30 day period ending 90 days ago (old period) to the 30 day period ending yesterday (new period). These periods had to be dynamic, the dashboard had to refresh automatically using a live connection to our Amazon Redshift data warehouse, and had to effectively show big changes so we could easily see who was ordering  a lot more and who was ordering less relative to all our other customers. I decided the way to go was to build a Sankey Diagram in Tableau.

The Method

A Sankey Diagram (Wikipedia) is a way to visualize process flows. It is not a native chart type in Tableau, but I knew that someone in the wonderful Tableau Community had figured it out. After some searching I came across this fantastic blog post on Jeffrey Schaffer’s(@HighVizAbilityData + Science Blog. He, together with input from Jonathan Drummey (@jonathandrummey), both Tableau Zen Masters, came up with the perfect solution.

This allows you to pick two time periods, and as long as you have the rank for both, you can build a fantastic visualization that shows the change.

Sankey Diagram in Tableau
Sankey Diagram in Tableau showing change in state rank by happiness. From Data + Science blog.

The Solution

My problem was that this method involved using Excel and so the datasource was static and would have to be updated manually whenever a refresh was required. I wanted to do the data prep in SQL and use dynamic date filters to make everything ‘just work’ and be fully automated. I was able to do this and the below video outlines the process. In this video I use MS AdventureWorks 2014 Full sample dataset, which I hosted on an AWS MS SQL Server RDS box. For this demo I visualize the change in ranking of sales people based on the value of orders that fall under them from 2012 to 2013. The SQL is intermediate here, I use subqueries and window functions to get everything together. I also use Excel to generate a little bit of SQL. The Excel file, the final query and the Tableau workbook can be found below:

  1.  SQL Generation (.xlsx)
  2. Sankey Diagram SQL (.txt)
  3. Workbook on Tableau Public (.twbx)

Even if you aren’t a SQL guru, by following the video and the supplemental files you should be able to reverse engineer how to do this. With a little tweaking to make it relevant to your dataset you too can create an automated Sankey diagram. Keep in mind you will have to change the quintiles that are hardcoded into the color formula to reflect the # of entities you are ranking. Also, the SQL you see is meant to be able to be cut and pasted into the Tableau data connection. There are shorter and more elegant ways to do it using variables and CTE’s (See this post on Data + Science) that you could use to create a table or view to hook Tableau up to. However that was overkill for my use case and for this video.

Continue The Discussion

Thank for watching and please leave a comment below to continue the discussion. If you like this post, join my mailing list by providing your name and email address on the right to be kept up to date with new posts and discussions on the Biznalysis blog!

  • Niki Le Prohn

    This was great. I could follow right along, stop it when I needed, and by then end I’d created my own. thanks!

    • Thanks Niki, I am very glad that you got a lot out of this. Your experience is exactly what I was striving for. Please let me know if there are any other techniques or viz’s you have seen out there that you would like this sort of video created for in the future.

  • Stephen Crocker

    Great. How did you build the side bar legend? The different quintiles?

    • biznal5_wp

      I just eyeballed the graph, there is nothing dynamic or special about the sizes of the colors. You could make a dashboard with 5 discrete items on rows with measure names on colors or make 5 different floating objects with the colors specified.

  • Bree Lloyd

    This is very slick… Sankey Charts are awesome to use for mapping product pathway (What type of product did a person buy first, then second… etc) Have you ever thought of trying with three rankings?

    • biznal5_wp

      There is a post on Data + Science where they put a third ranking item in the chart. That can definitely be useful. Thanks for the feedback!

  • Wilson Chua

    I was able to create a slightly easier way to create the 49 rows per value.
    In excel create the data across the rows of -6.0 to 6.0 in .25 increments.

    Then copy it down across the rows of the table containing the rankings.

    Import into Tableau and PIVOT the data!

    • Interesting solution Wilson. However, if I understand your method properly you would have to repeat the copy and pivot every time you want to refresh the view (assuming rankings change). In my method it is all automated so once you put in the up front overhead of writing the query it is hands off. Your method and the original is definitely easier if your dashboard is meant to be static, but if your data changes don’t it on sql is better. Thanks for sharing your solution!