We introduce geoplotlib, an open-source python toolbox for visualizing geographical data. geoplotlib supports the development of hardware-accelerated interactive visualizations in pure python, and provides implementations of dot maps, kernel density estimation, spatial graphs, Voronoi tessellation, shapefiles and many more common spatial visualizations. We describe the design, functionalities and use cases of geoplotlib.
Geographical data visualization is a fundamental tool for communicating results related to geospatial analyses, and for generating hypotheses during exploratory data analysis [1, 2]. The constantly increasing availability of geolocated data from social media, mobile devices and spatial databases implies that we need new tools for exploring, mining and visualizing large-scale spatial datasets.
The python programming language has been gaining attention as a data analysis tool in the scientific community [4, 5] thanks to the clarity and simplicity of its syntax, and to an abundance of third-party libraries within many disciplines, including scientific computing [6, 7, 8], Bayesian modeling, neuroscience, and bioinformatics. Currently, however, there is limited support for geographical visualization.
Here, we introduce geoplotlib, a python toolbox for visualizing geographical data. geoplotlib provides a simple yet powerful API to generate geographical visualizations on OpenStreetMap tiles. We release geoplotlib as open-source software, accompanied by a rich set of examples and documentation.
In the remainder of this paper, we discuss existing tools for geographical visualization and document the geoplotlib functionalities in detail, and finally we evaluate the computational performance on a large-scale dataset.
In this section we compare existing tools for visualizing geographical data using python. We divide the related work into three categories: pure python packages, HTML-based packages and Geographical Information System plug-ins.
The matplotlib library has become the de-facto standard for data visualization in python and provides a large array of visualization tools including scatter and line plots, surface views, 3D plots, barcharts, and boxplots, but it does not provide any support for visualization on a geographical map by default.
The Basemap and Cartopy packages support multiple geographical projections, and provide several visualizations including point plots, heatmaps, contour plots, and shapefiles. PySAL is an open-source library of spatial analysis functions written in Python and provides a number of basic plotting tools, mainly for shapefiles. These libraries however do not allow a user to draw on map tiles, and have limited support for custom visualizations, interactivity, and animation.
There is a very rich ecosystem for data visualization for the web. A number of frameworks allow users to generate plots and charts: we cite as representative Protovis, d3, Google Charts, and sigmajs. There is also a large number of libraries for displaying online tile maps, including Google Maps, Bing Maps, Leaflet, OpenLayers, ModestMaps, and PolyMaps.
Geographical Information System plugins
Geographic Information Systems (GIS) such as QGIS, GrassGIS, ArcGIS, and MapInfo provide very powerful tools for spatial data analysis and visualization. GIS tools usually provide some support for python scripting, although the availability varies from one to another. The main limitation of GIS products is their complexity, requiring a significant amount of training to be used effectively, and as discussed before, the need to export the data from python.
An overview of the geoplotlib architecture is given in Fig. 1. geoplotlib builds on top of numpy and scipy for numerical computations, and OpenGL/pyglet for graphical rendering. geoplotlib implements the map rendering, the geographical projection, the user interface interaction and a number of common geographical visualizations.
geoplotlib is designed according to three key principles:
simplicity: geoplotlib tries to minimize the complexity of designing visualizations by providing a set of built-in tools for the most common tasks such as density visualization, spatial graphs, and shapefiles. The geoplotlib API is inspired by the matplotlib programming model and syntax, the de-facto standard for data visualization in python; this makes it easier for matplotlib users to get started.
integration: geoplotlib visualizations are standard python scripts, and may contain arbitrary python code and use any other package. There is no need to export to other formats (e.g. shapefiles, HTML) or use external programs. This supports a complete integration with the rich python data analysis ecosystem such as scientific computing, machine learning and numerical analysis packages. The visualization can even run within an IPython session, supporting interactive data analysis and facilitating the iterative design of visualizations.
performance: under the hood, geoplotlib uses numpy/scipy for fast numerical computations, and pyglet/OpenGL for hardware-accelerated graphical rendering. This allows the visualizations to scale to millions of datapoints in real time.
A first script
A simple geoplotlib script looks like this:
This script launches the geoplotlib window and shows a dot map of the data points, in this example the location of bus stops in Denmark (Fig. 2). geoplotlib automatically determines the map bounding box, downloads the map tiles, performs the geographical projection, and draws the base map and the visualization layers (the dots in this example). The map is interactive and allows a user to zoom and pan with mouse and keyboard.
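A minimal sketch of such a script is given below. It assumes that geoplotlib is installed, that the `read_csv` helper is importable from `geoplotlib.utils`, and that the CSV file contains `lat` and `lon` columns; the function name `show_bus_stops` and the file path passed to it are placeholders introduced for this illustration.

```python
def show_bus_stops(csv_path):
    """Display a dot map of the points stored in a CSV file.

    Assumes geoplotlib is installed and that the file has 'lat' and 'lon'
    columns; csv_path is a placeholder supplied by the caller.
    """
    # Imported inside the function so that defining it needs no display.
    import geoplotlib
    from geoplotlib.utils import read_csv

    data = read_csv(csv_path)   # load the samples
    geoplotlib.dot(data)        # add a dot-map layer
    geoplotlib.show()           # open the interactive window
```

Calling `show_bus_stops` with the path of a suitable CSV file should open the interactive map window.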
As discussed above, the usage of the geoplotlib API is very similar to matplotlib.
The visualization canvas is initially empty, and each command adds a new layer of graphics.
The geoplotlib window is displayed when show() is called.
Alternatively, the map can be rendered to an image file using savefig('filename'), or displayed inline in an IPython notebook.
The geoplotlib package provides several common geographical visualizations in the form of layers. The API provides convenient methods for quickly adding a new visualization layer. In this section we provide a summary of the built-in visualizations. The data for all examples is available on the project website.
An elementary operation in geographical visualization is to display “what is where”, that is to place a graphic element on the map for each of the objects in consideration.
This provides an immediate idea of the absolute and relative locations of objects.
Moreover, the density of points directly maps to the density of objects on geographical surface, identifying zones of higher and lower density.
An example of a dot map is shown in Fig. 2.
The dot map shows the spatial distribution of bus stops in Denmark at a glance.
The zones of higher density, corresponding to the Copenhagen metropolitan area and to the other major cities, are immediately recognizable.
The dot method allows users to configure point size, color and transparency, and optionally to attach a dynamic tooltip to each point.
One limitation of dot maps is that it is hard to distinguish between areas of high density, as the number of points is so high that they uniformly cover the visualization canvas.
A more direct visualization of density is to compute a 2D histogram of point coordinates.
A uniformly spaced grid is placed on the map, and the number of samples within each cell is counted.
This value is an approximation of the density, and can be visualized using a color scale.
In geoplotlib, a 2D histogram of the data can be generated as a layer, where the binsize parameter refers to the size in pixels of the histogram bins.
The example in Fig. 3 uses data related to cell tower positions in Denmark, and shows a histogram with a specific colorscale and bin size. Compared to the dot map example, the histogram provides a clearer depiction of the density distribution.
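The grid-counting step behind a 2D histogram can be sketched in a few lines of plain python. This is only an illustration of the technique, not the geoplotlib implementation; the function name `grid_histogram` and the sample coordinates are made up for this example.

```python
from collections import Counter

def grid_histogram(xs, ys, binsize):
    """Count how many (x, y) screen points fall in each square bin.

    Returns a Counter mapping (column, row) bin indices to counts.
    """
    counts = Counter()
    for x, y in zip(xs, ys):
        # Integer division assigns each point to the bin that contains it.
        counts[(int(x // binsize), int(y // binsize))] += 1
    return counts

# Four points in pixel coordinates, counted in bins of 16 x 16 pixels.
xs = [3, 10, 20, 40]
ys = [5, 12, 8, 40]
hist = grid_histogram(xs, ys, binsize=16)
```

Each count can then be mapped to a color; a larger `binsize` merges neighbouring cells and gives a coarser picture of the density.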
The main deficiency of histogram visualizations is that they are discrete approximations of an (effectively continuous) density function. This creates a dependence on the bin size and offset, rendering histograms sensitive to noise and outliers. To generate a smoother approximation, a kernel density estimator approximates the true density function by applying kernel functions in a window around each point. The size of this window depends on the bandwidth parameter: a smaller bandwidth will produce a more detailed but also noisier estimate, while a larger bandwidth will produce a less detailed but smoother estimate. A kernel density estimate can then be visualized by a surface where the color encodes the density value (this visualization is often called a “heatmap”). In geoplotlib, the kde method generates a kernel density estimation visualization.
Fig. 4 shows the kernel density estimation applied to the cell tower data. Comparing the histogram from Fig. 3 with the kernel density estimation in Fig. 4, it is evident how the latter produces a smoother and consequently clearer visualization of density. The kernel bandwidth (in screen coordinates) can be configured to regulate the smoothness. The density upper bound can be set to clip density values over a threshold. The density lower bound can also be set, to avoid rendering areas of very low density.
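To make the role of the bandwidth and of the lower bound concrete, the following sketch evaluates a Gaussian kernel density estimate at a single location. It is a plain-python illustration under simplifying assumptions (isotropic Gaussian kernel, brute-force sum over all points), not the geoplotlib implementation; the names `gaussian_kde_at`, `bw` and `cut_below` are chosen for this example.

```python
import math

def gaussian_kde_at(px, py, points, bw, cut_below=0.0):
    """Estimate the density at (px, py) from 2D points with a Gaussian kernel.

    bw is the kernel bandwidth (same units as the coordinates); estimates
    below cut_below are reported as 0 so that sparse areas are not drawn.
    """
    total = 0.0
    for x, y in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bw * bw))
    # Normalise so that the estimate integrates to 1 over the plane.
    density = total / (len(points) * 2.0 * math.pi * bw * bw)
    return density if density >= cut_below else 0.0

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
near = gaussian_kde_at(0.5, 0.0, points, bw=1.0)      # between two samples
far = gaussian_kde_at(5.0, 5.0, points, bw=1.0)       # far from all samples
clipped = gaussian_kde_at(5.0, 5.0, points, bw=1.0, cut_below=1e-3)
```

Increasing `bw` spreads each sample over a wider window, which raises the estimate far from the samples and lowers it near them.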
In some cases it is useful to represent objects on the map using custom symbols with specific meaning.
The markers method allows a user to place custom markers on the map.
Fig. 5 shows an example of custom markers for metro and train stops in Copenhagen. Marker graphics can be in any common raster format (png, jpeg, tiff), and can be rescaled to a custom size. Optionally a dynamic tooltip can be attached to each marker.
Spatial graphs are a special type of graphs where nodes have a well-defined spatial configuration.
Examples include transport networks (bus routes, train tracks, flight paths), supply chain networks, phone call networks and commute networks.
The graph method renders a spatial graph.
Fig. 6 shows the resulting spatial graph of airport locations, where each node represents an airport and each edge represents a flight connection. Edges are colored using a colormap encoding the edge length.
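One possible way to obtain an edge length that can be fed to a colormap is the great-circle distance between the two endpoints. The sketch below uses the haversine formula and rescales the lengths to the unit interval; whether geoplotlib measures edge length geographically or in screen coordinates is not stated here, so this is an assumption, and the function names and sample edges are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def normalised_edge_lengths(edges):
    """Scale edge lengths to [0, 1] so they can index a color scale."""
    lengths = [haversine_km(*src, *dst) for src, dst in edges]
    longest = max(lengths)
    return [length / longest for length in lengths]

# Hypothetical edges given as ((lat, lon), (lat, lon)) pairs.
edges = [((0.0, 0.0), (0.0, 1.0)), ((0.0, 0.0), (0.0, 2.0))]
scaled = normalised_edge_lengths(edges)
```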
A Voronoi tessellation is a partition of space into regions induced by some seed points, so that each region (called a Voronoi cell) consists of all points closer to a specific seed than to any other. The analysis of Voronoi tessellation is used in numerous fields including ecology, hydrology, epidemiology, mining and mobility studies.
The voronoi method can be used to generate a Voronoi tessellation visualization.
Voronoi cell fill, shading and colors can be configured.
Fig. 7 provides an example of Voronoi tessellation of bus stops in Denmark. Voronoi cells provide a measure of the space closer to one stop than to any other. The density of points is also captured by the size of Voronoi cells, as smaller cells indicate more densely covered areas.
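The defining property of a Voronoi cell, namely that every location belongs to its nearest seed, can be illustrated with a brute-force sketch that assigns each point of a small grid to the closest seed and counts the points per cell. This is a didactic approximation, not how a tessellation is computed in practice; the function names and seed coordinates are invented for this example.

```python
def nearest_seed(point, seeds):
    """Return the index of the seed closest to point (squared distance)."""
    px, py = point
    return min(range(len(seeds)),
               key=lambda i: (seeds[i][0] - px) ** 2 + (seeds[i][1] - py) ** 2)

def voronoi_cell_sizes(seeds, width, height):
    """Approximate each cell's area by counting the grid points it owns."""
    sizes = [0] * len(seeds)
    for x in range(width):
        for y in range(height):
            sizes[nearest_seed((x, y), seeds)] += 1
    return sizes

# Two seeds on a 10 x 5 grid; each should own half of the grid points.
seeds = [(2, 2), (7, 2)]
sizes = voronoi_cell_sizes(seeds, width=10, height=5)
```

Adding more seeds in one region shrinks the cells there, which is why cell size can be read as an inverse measure of point density.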
A Delaunay triangulation is a convenient method for generating triangle meshes from a set of points.
In geoplotlib the delaunay method can be used for this purpose.
The edge color can be configured to a fixed value, or to encode the length of the edges.
Fig. 8 shows the Delaunay triangulation of bus stops, with edges colored according to length.
A convex hull of a finite set of points is the smallest convex polygon that contains all the points. Convex hulls can be used, for example, to visualize the approximate area corresponding to a set of points. geoplotlib can render convex hulls as a layer.
Fig. 9 shows the bus stop points split into 6 groups, with each group represented by a differently colored convex hull.
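For readers unfamiliar with the construction, a convex hull can be computed with Andrew's monotone chain algorithm, sketched below in plain python. This is a standard textbook method shown for illustration; it is not claimed to be the algorithm used inside geoplotlib, and the sample points are made up.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would create a clockwise turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Corners of a square plus one interior point, which must be discarded.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(points)
```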
Shapefile is a popular file format for describing vector graphics for geographical information systems. geoplotlib uses pyshp to parse the shapefiles. The line color can be configured and an optional tooltip can be attached to each shape. As an example, Fig. 10 displays the kommuner administrative regions in Denmark.
GeoJSON is a human-readable format for encoding geographical data, such as polygons and lines. geoplotlib can render shapes from the GeoJSON format, and shape color and tooltip can be dynamically altered to encode data. For instance GeoJSON shapes can be used to generate a choropleth where each geographic unit is colored to encode a continuous variable. As an example, Fig. 11 shows a choropleth of unemployment in the USA.
The DataAccessObject class is the fundamental interface between the raw data and all the geoplotlib visualizations. A DataAccessObject is conceptually similar to a table with one column for each field and one row for each sample. This paradigm is very common in data analysis terminology, and is equivalent to ndarrays in numpy, and dataframes in pandas and R. A DataAccessObject can be initialized by reading a comma-separated values (CSV) file with the built-in read_csv method, or can be constructed from a python dict, or from a pandas dataframe.
The only two fields required are lat and lon, which represent the geographic coordinates. Most of the built-in visualizations implicitly refer to these two fields to locate entities in space.
DataAccessObject also provides a few methods for basic data wrangling, such as filtering, grouping, renaming and deleting rows and columns.
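The column-oriented table model described above can be pictured with a plain dict holding one list per field. The sketch below parses CSV text into such a dict and filters its rows with a boolean mask; it only illustrates the data model and does not reproduce the DataAccessObject interface. The helper names and the sample rows are hypothetical.

```python
import csv
import io

def read_columns(text):
    """Parse CSV text into a dict with one list per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    # The coordinate columns become floats; other fields stay as text.
    for name in ("lat", "lon"):
        columns[name] = [float(value) for value in columns[name]]
    return columns

def filter_rows(columns, mask):
    """Keep only the rows whose mask entry is true, in every column."""
    return {name: [v for v, keep in zip(values, mask) if keep]
            for name, values in columns.items()}

# Made-up sample data with the two required coordinate fields.
text = "name,lat,lon\nA,55.7,12.6\nB,56.2,10.2\nC,57.0,9.9\n"
table = read_columns(text)
north = filter_rows(table, [lat > 56.0 for lat in table["lat"]])
```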
Any OpenStreetMap tile server can be configured using the tile_provider method (users are kindly asked to check the tile usage policy for the selected server, and make sure to provide attribution as needed).
A number of common free tile providers are supported, including Stamen Watercolor and Toner, and CartoDB Positron and DarkMatter.
Defining custom layers
The built-in visualizations provide various commonly used tools for geographical data visualization.
Multiple layers can be combined into a single visualization for richer display.
For even more complex visualizations, geoplotlib allows users to define custom layers.
In order to generate a new visualization, a new class extending BaseLayer must be defined.
The custom layer must at least define an invalidate and a draw method.
The invalidate method is called each time the map projection must be recalculated, which typically happens each time the map zoom level changes.
The invalidate method receives a Projection object, which provides methods for transforming the data points from geographic coordinates to screen coordinates.
The screen coordinates can then be passed to a BatchPainter object for the rendering.
A BatchPainter can efficiently draw OpenGL primitives such as points, lines and polygons.
The draw method is called at each frame, and typically calls the batch_draw method of the painter prepared during invalidate.
The following is a complete example of a custom layer, which simply draws samples as points:
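The division of work between invalidate and draw can be sketched without any graphics library by using stand-in classes. In the sketch below, `SketchProjection` (a Web Mercator projection to pixel coordinates) and `SketchPainter` (which records points instead of drawing them) are hypothetical substitutes for the Projection and BatchPainter objects, and `PointsLayer` does not extend the real BaseLayer; only the overall pattern is meant to carry over.

```python
import math

class SketchProjection:
    """Stand-in projection: Web Mercator pixels at a given zoom level."""
    def __init__(self, zoom, tile_size=256):
        self.world = tile_size * 2 ** zoom   # world map width in pixels

    def lonlat_to_screen(self, lon, lat):
        x = (lon + 180.0) / 360.0 * self.world
        merc = math.asinh(math.tan(math.radians(lat)))
        y = (1.0 - merc / math.pi) / 2.0 * self.world
        return x, y

class SketchPainter:
    """Stand-in painter that records points instead of drawing them."""
    def __init__(self):
        self.batch = []
        self.drawn = 0

    def points(self, xs, ys):
        self.batch.extend(zip(xs, ys))

    def batch_draw(self):
        self.drawn += len(self.batch)

class PointsLayer:
    """Follows the invalidate/draw split described in the text."""
    def __init__(self, lons, lats):
        self.lons, self.lats = lons, lats
        self.painter = None

    def invalidate(self, proj):
        # Re-project all samples once, whenever the projection changes.
        screen = [proj.lonlat_to_screen(lon, lat)
                  for lon, lat in zip(self.lons, self.lats)]
        self.painter = SketchPainter()
        self.painter.points([x for x, _ in screen], [y for _, y in screen])

    def draw(self):
        # Called at every frame: only replays the prepared batch.
        self.painter.batch_draw()

layer = PointsLayer(lons=[0.0, 12.57], lats=[0.0, 55.68])
layer.invalidate(SketchProjection(zoom=0))
layer.draw()
```

Keeping the projection work in invalidate means that the per-frame cost of draw does not depend on recomputing coordinates.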
The final step needed is to add the layer to the visualization using add_layer, then call show.
A custom layer can be also used for creating animated visualizations. Each time the draw method is called, the custom layer can update its state to the next frame. As an example, let us imagine having data containing the position of an object over time. A simple animation can use a frame counter, and at each frame render only the datapoint at the current instant:
Notice that in this case we do not initialize the BatchPainter in invalidate, but we create a new one at each frame.
We also keep track of the current frame with the frame counter.
Even this very simple code is able to visualize a non-trivial animation of an object moving over time.
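The frame-counter logic on its own can be sketched as follows. The class `FramePlayer` is a hypothetical stand-in that returns the sample to be rendered rather than painting it, and the trajectory is made up; in an actual layer the selected sample would be projected and painted inside draw.

```python
class FramePlayer:
    """Return one trajectory sample per frame, wrapping around at the end."""
    def __init__(self, trajectory):
        self.trajectory = trajectory
        self.frame_counter = 0

    def draw(self):
        # Select only the sample for the current instant, then advance.
        current = self.trajectory[self.frame_counter]
        self.frame_counter = (self.frame_counter + 1) % len(self.trajectory)
        return current

# Hypothetical positions of one object at three consecutive instants.
player = FramePlayer([(0, 0), (1, 1), (2, 1)])
frames = [player.draw() for _ in range(4)]
```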
To produce a movie from the animation, individual frames can be captured using the screenshot method, and then combined together.
Colors can be used as an additional mapping for encoding information into a visualization.
Continuous variables (for example point density or edge distances) can be mapped to a continuous color scale.
The ColorMap class allows a user to perform this conversion.
A ColorMap object is constructed by passing any of the matplotlib colorscales, and optionally an alpha value and a number of discretization levels.
The to_color method performs the conversion from a real value to a color.
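The idea of mapping a real value to a color, optionally through a fixed number of discrete levels, can be sketched with a linear interpolation between two RGB colors. This is a simplified illustration that does not use matplotlib colorscales and does not reproduce the ColorMap class; the function name `value_to_rgb` and its parameters are assumptions of this sketch.

```python
def value_to_rgb(value, vmax, low=(255, 255, 0), high=(255, 0, 0), levels=None):
    """Map a value in [0, vmax] to an RGB tuple between low and high.

    If levels is given, the value is first snapped to that many steps.
    """
    t = min(max(value / vmax, 0.0), 1.0)      # clamp to [0, 1]
    if levels is not None and levels > 1:
        t = round(t * (levels - 1)) / (levels - 1)
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))

half = value_to_rgb(5, vmax=10)               # midway between the two colors
stepped = value_to_rgb(2, vmax=10, levels=3)  # snapped to the lowest level
capped = value_to_rgb(20, vmax=10)            # values above vmax saturate
```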
Discrete variables such as categories can be represented using categorical colormaps.
The colorbrewer method provides access to the ColorBrewer colors.
Categorical colormaps can also be generated from regular colormaps.
Controlling the map view
The map view is determined by the projection parameters: the latitude offset, the longitude offset and the zoom level.
By default, the projection is chosen so as to fit all selected points, with the maximum zoom level possible.
The view can be changed to a specific portion of the map by supplying a BoundingBox object.
A BoundingBox object defines the map view boundaries, and can be constructed in multiple ways.
The most direct way is to specify two ranges of latitudes and longitudes.
Alternatively, a BoundingBox can be constructed to fit a subset of points.
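Fitting a bounding box to a set of points amounts to taking the extreme latitudes and longitudes. The sketch below represents the box as a plain tuple and is not the BoundingBox class; the function names, the (north, west, south, east) ordering and the sample coordinates are assumptions made for this illustration, and boxes crossing the antimeridian are not handled.

```python
def bbox_from_points(lons, lats):
    """Return (north, west, south, east) enclosing all the given points."""
    return max(lats), min(lons), min(lats), max(lons)

def bbox_contains(bbox, lon, lat):
    """True if the point lies inside the box (edges included)."""
    north, west, south, east = bbox
    return south <= lat <= north and west <= lon <= east

# Made-up coordinates of three points.
lons = [12.57, 10.20, 9.92]
lats = [55.68, 56.16, 57.05]
bbox = bbox_from_points(lons, lats)
```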
Finally, geoplotlib allows users to create interactive visualizations by providing support for rendering a user interface, and for dynamically changing the visualization on user input:
on-screen text such as information or status can be added to the user interface
mouseover tooltips can be configured on arbitrary graphical elements or screen regions
layers can be configured to react to specific key presses by defining a handler method
We test the performance of geoplotlib by generating some of the described visualizations on a dataset consisting of one million samples, using the default visualization parameters. All tests consider only the time needed for the actual rendering of the visualization, excluding the time for loading the data. The measurements are repeated 10 times for each visualization type. The experiments were performed on a MacBook Pro 2012 with an Intel 2.3 GHz i7 CPU, 8 GB RAM and nVidia GeForce GT 650M GPU. Table 1 shows that in all cases the visualizations require only a few seconds, thus demonstrating that geoplotlib is suitable even for large-scale datasets.
| visualization type | mean time [s] | SD [s] |
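A measurement protocol of this kind (repeated runs summarized by mean and standard deviation) can be sketched with the standard library. The helper `time_repeated` is hypothetical, and the workload below is a placeholder that stands in for the rendering of one visualization; it does not reproduce the reported measurements.

```python
import statistics
import time

def time_repeated(func, repeats=10):
    """Run func several times; return (mean, sample SD) of wall time in s."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

# Placeholder workload standing in for the rendering of one visualization.
mean_s, sd_s = time_repeated(lambda: sum(i * i for i in range(10000)))
```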
We have presented geoplotlib, a python toolbox for generating geographical visualizations. We demonstrated how geoplotlib provides a simple yet powerful API to visualize geographical data, greatly facilitating exploratory data analysis of geographical information. We believe that geoplotlib can become a powerful tool in the data analyst toolbox, both for analyzing complex spatial patterns and for communicating results in the form of geographical visualizations. Future work includes the addition of more visualization tools, and the integration of spatial analysis methods.
This work is funded in part by the High Resolution Networks project (The Villum Foundation), as well as Social Fabric (University of Copenhagen).
-  Tukey JW. Exploratory Data Analysis. Addison-Wesley; 1977.
-  Andrienko N, Andrienko G. Exploratory Analysis of Spatial and Temporal Data: A Systematic Approach. Springer Science & Business Media; 2006.
-  Van Rossum G, Drake Jr FL. Python Reference Manual. Centrum voor Wiskunde en Informatica Amsterdam; 1995.
-  Millman KJ, Aivazis M. Python for Scientists and Engineers. Computing in Science & Engineering. 2011;13(2):9–12. doi:http://dx.doi.org/10.1109/MCSE.2011.36.
-  Oliphant TE. Python For Scientific Computing. Computing in Science and Engineering. 2007;9(3):10–20.
-  Van Der Walt S, Colbert SC, Varoquaux G. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering. 2011;13(2):22–30.
-  Jones E, Oliphant T, Peterson P, et al. SciPy: Open Source Scientific Tools for Python; 2001–. Available from: http://www.scipy.org/.
-  Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. The Journal of Machine Learning Research. 2011;12:2825–2830.
-  Patil A, Huard D, Fonnesbeck CJ. PyMC: Bayesian Stochastic Modelling in Python. Journal of statistical software. 2010;35(4):1.
-  Peirce JW. PsychoPy — Psychophysics Software in Python. Journal of neuroscience methods. 2007;162(1):8–13.
-  Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. BioPython: Freely Available Python Tools for Computational Molecular Biology and Bioinformatics. Bioinformatics. 2009;25(11):1422–1423.
-  Haklay M, Weber P. OpenStreetMap: User-Generated Street Maps. Pervasive Computing, IEEE. 2008;7(4):12–18.
-  Cuttone A. geoplotlib; 2015-2016. Available from: https://github.com/andrea-cuttone/geoplotlib.
-  Hunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science and Engineering. 2007;9(3):90–95.
-  Whitaker J. Basemap. Available from: http://matplotlib.org/basemap/.
-  Met Office. Cartopy: A Cartographic Python Library with a Matplotlib Interface; 2010 - 2015. Available from: http://scitools.org.uk/cartopy.
-  Rey SJ, Anselin L. In: Fischer MM, Getis A, editors. PySAL a Python Library for Spatial Analytical Methods. Berlin: Springer-Verlag; 2010. p. 175–193.
-  Bostock M, Heer J. Protovis: A Graphical Toolkit for Visualization. Visualization and Computer Graphics, IEEE Transactions on. 2009;15(6):1121–1128.
-  Bostock M, Ogievetsky V, Heer J. D3: Data-Driven Documents. Visualization and Computer Graphics, IEEE Transactions on. 2011;17(12):2301–2309.
-  Google Charts. Available from: https://developers.google.com/chart/.
-  Alexis Jacomy GP. sigmajs. Available from: http://sigmajs.org/.
-  Google Maps. Available from: https://developers.google.com/maps/.
-  Bing Maps. Available from: https://www.bingmapsportal.com/.
-  leafletjs. Available from: http://leafletjs.com/.
-  openlayers. Available from: http://openlayers.org/.
-  modestmaps. Available from: http://modestmaps.com/.
-  polymaps. Available from: http://polymaps.org/.
-  folium. Available from: https://github.com/wrobstory/folium.
-  vincent. Available from: https://github.com/wrobstory/vincent.
-  mplleaflet. Available from: https://github.com/jwass/mplleaflet.
-  QGIS. Quantum GIS Geographic Information System. Open Source Geospatial Foundation Project; 2011.
-  GrassGIS. Available from: http://grass.osgeo.org/.
-  ArcGIS. Available from: http://www.arcgis.com/.
-  MapInfo. Available from: http://www.mapinfo.com/.
-  pyglet. Available from: https://bitbucket.org/pyglet/pyglet/wiki/Home.
-  Pérez F, Granger BE. IPython: A System for Interactive Scientific Computing. Computing in Science & Engineering. 2007;9(3):21–29.
-  Parzen E. On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics. 1962; p. 1065–1076.
-  Aurenhammer F. Voronoi Diagrams — A Survey of a Fundamental Geometric Data Structure. ACM Computing Surveys (CSUR). 1991;23(3):345–405.
-  De Berg M, Van Kreveld M, Overmars M, Schwarzkopf OC. Computational Geometry. Springer; 2000.
-  ESRI. ESRI Shapefile Technical Description; 1998.
-  pyshp. Available from: https://github.com/GeospatialPython/pyshp.
-  Butler H. geoJSON. Available from: http://geojson.org/geojson-spec.html.
-  Bostock M. Unemployment in USA. Available from: http://bl.ocks.org/mbostock/4060606.
-  McKinney W. Data Structures for Statistical Computing in Python. In: van der Walt S, Millman J, editors. Proceedings of the 9th Python in Science Conference; 2010. p. 51 – 56.
-  Stamen. Available from: http://maps.stamen.com/.
-  CartoDB. Available from: http://cartodb.com/basemaps/.
-  Harrower M, Brewer CA. ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps. The Cartographic Journal. 2003;40(1):27–37.