Many of the activities of a data scientist or analyst require visualization, but it can be difficult to assemble a set of tools that cover all of the tasks involved. Initial exploration needs to be in a flexible, open-ended environment where it is simple to try out and test hypotheses. Once key aspects of the data have been identified, the analyst might prepare a specific image or figure to share with colleagues or a wider audience. Or, they might need to set up an interactive way to share a set of data that would be unwieldy as a fixed figure, using interactive controls to let others explore the effects of certain variables. Eventually, for particularly important data or use cases, the analyst might get involved in a long-term project to develop a full-featured web application or dashboard to deploy, allowing decision makers to interact directly with live data streams to make operational decisions.
With Python, initial exploration is typically in a Jupyter notebook, using tools like Matplotlib and Bokeh to develop static or interactive plots. These tools support a simple syntax for making certain kinds of plots, but showing more complex relationships in data can quickly turn into a major software development exercise, making it difficult to achieve understanding during exploration. Simple apps can be built using ipywidgets to control these visualizations, but the resulting combinations end up being tightly coupled to the notebook environment, unable to migrate into a standalone server context with an application that can be shared more widely. Bokeh includes widgets that can work in both notebook and server environments, but these can be difficult to work with for initial exploration. Bokeh and Matplotlib both also have limitations on how much data they can handle, in part because Bokeh requires the data to be put into the web browser’s limited memory space.
To address these issues, we have developed a set of open-source Python packages to streamline the process of working with small and large datasets (from a few points to billions) in a web browser, whether doing exploratory analysis, making simple widget-based tools, or building full-featured dashboards. The libraries in this ecosystem include:
Panel : Assembling objects from many different libraries into a layout or app, whether in a Jupyter notebook or in a standalone serveable dashboardBokeh : Interactive plotting in web browsers, running JavaScript but controlled by PythonhvPlot : Quickly return interactive HoloViews or GeoViews objects from your Pandas, Xarray, or other data structuresHoloViews : Declarative objects for instantly visualizable data, building Bokeh plots from convenient high-level specificationsGeoViews : Visualizable geographic data that that can be mixed and matched with HoloViews objectsDatashader : Rasterizing huge datasets quickly as fixed-size imagesParam : Declaring user-relevant parameters, making it simple to work with widgets inside and outside of a notebook context.
Comments