Mary Beth Kery, PhD student, and Brad Myers, professor, won the best paper award at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), held in Lisbon, Portugal, October 1-4, 2018. The paper, “Interactions for Untangling Messy History in a Computational Notebook,” presents their research on a new code versioning prototype that helps data analysts quickly explore, understand and recreate a visualization or process.
Unlike regular code development, data science programming requires frequent versioning of more than just the code itself. Data scientists need the code along with its supporting context, including the conditions under which the code was run, relationships among the data, visualizations and notes. Despite the demonstrated need for interaction techniques to quickly explore multiple versions of the code and its outputs, the current process is manual, difficult, and still very limited.
According to an excerpt from the paper, “In this work, we explore the design space of new interactions for providing easy-to-use history support for data scientists in their day-to-day tasks. Untangling messy history logs to deliver them in a useful form requires both advances in how edit history is modeled, and active testing of potential user interactions on actual log data from realistic data science tasks.”
Kery and Myers developed a prototype tool called Verdant (named for the word's meaning, “an abundance of growing plants”) as an extension for Jupyter notebooks. The Jupyter Notebook is an open-source web application that allows data scientists to create documents containing live code, equations, visualizations and explanatory notes, and then share them with others for easy collaboration. Jupyter supports over 40 programming languages, including Python, R, Julia, and Scala, and notebooks can be shared via email, GitHub or Dropbox.
Verdant aims to store a relational history for all artifacts, let data scientists retrieve the history specifically relevant to a given task, and clearly communicate how versions of different artifacts were combined during experimentation. This should help data analysts explore more carefully, and later go back to understand and recreate what they did.
“Verdant is part of our long-term research effort to support exploratory programming, where what the code will do emerges and evolves during the programming, rather than being known or specified in advance,” said Myers.
A stable beta release of Verdant will be ready for all to use in December 2018. Verdant is an open-source project, freely available at https://github.com/mkery/Verdant/.