6  Reprojecting geographic data

6.1 Prerequisites

Let’s import the required packages:

import matplotlib.pyplot as plt
import numpy as np
import shapely.geometry
import geopandas as gpd
import rasterio
import rasterio.warp
from rasterio.plot import show
import pyproj

and load the sample data:

src_srtm = rasterio.open('data/srtm.tif')
src_nlcd = rasterio.open('data/nlcd.tif')
zion = gpd.read_file('data/zion.gpkg')
world = gpd.read_file('data/world.gpkg')

6.2 Introduction

Section 6.3 introduced coordinate reference systems (CRSs), with a focus on the two major types: geographic (‘lon/lat’, with units in degrees longitude and latitude) and projected (typically with units of meters from a datum) coordinate systems. This chapter builds on that knowledge and goes further. It demonstrates how to set and transform geographic data from one CRS to another and, furthermore, highlights specific issues that can arise due to ignoring CRSs that you should be aware of, especially if your data is stored with lon/lat coordinates.

In many projects there is no need to worry about, let alone convert between, different CRSs. It is important to know if your data is in a projected or geographic coordinate system, and the consequences of this for geometry operations. However, if you know the CRS of your data and the consequences for geometry operations (covered in the next section), CRSs should just work behind the scenes: people often suddenly need to learn about CRSs when things go wrong. Having a clearly defined project CRS that all project data is in, plus understanding how and why to use different CRSs, can ensure that things don’t go wrong. Furthermore, learning about coordinate systems will deepen your knowledge of geographic datasets and how to use them effectively.

This chapter teaches the fundamentals of CRSs, demonstrates the consequences of using different CRSs (including what can go wrong), and how to ‘reproject’ datasets from one coordinate system to another. In the next section we introduce CRSs in Python, followed by Section 6.4 which shows how to get and set CRSs associated with spatial objects. Section Section 6.5 demonstrates the importance of knowing what CRS your data is in with reference to a worked example of creating buffers. We tackle questions of when to reproject and which CRS to use in Section Section 6.6 and Section Section 6.7, respectively. We cover reprojecting vector and raster objects in sections Section 6.8 and Section 6.9 and modifying map projections in Section 6.10.

6.3 Coordinate Reference Systems

Most modern geographic tools that require CRS conversions, including Python packages and desktop GIS software such as QGIS, interface with PROJ, an open source C++ library that “transforms coordinates from one coordinate reference system (CRS) to another”. CRSs can be described in many ways, including the following.

  • Simple yet potentially ambiguous statements such as “it’s in lon/lat coordinates”.
  • Formalized yet now outdated ‘proj4 strings’ such as +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs.
  • With an identifying ‘authority:code’ text string such as EPSG:4326.

Each refers to the same thing: the ‘WGS84’ coordinate system that forms the basis of Global Positioning System (GPS) coordinates and many other datasets. But which one is correct?

The short answer is that the third way to identify CRSs is correct: EPSG:4326 is understood by geopandas and rasterio packages covered in this book, plus many other software projects for working with geographic data including QGIS and PROJ. EPSG:4326 is future-proof. Furthermore, although it is machine readable, unlike the proj-string representation “EPSG:4326” is short, easy to remember and highly ‘findable’ online (searching for EPSG:4326 yields a dedicated page on the website epsg.io, for example). The more concise identifier 4326 is also understood by geopandas and rasterio, but we recommend the more explicit AUTHORITY:CODE representation to prevent ambiguity and to provide context.

The longer answer is that none of the three descriptions are sufficient and more detail is needed for unambiguous CRS handling and transformations: due to the complexity of CRSs, it is not possible to capture all relevant information about them in such short text strings. For this reason, the Open Geospatial Consortium (OGC, which also developed the simple features specification that the sf package implements) developed an open standard format for describing CRSs that is called WKT (Well Known Text). This is detailed in a 100+ page document that “defines the structure and content of a text string implementation of the abstract model for coordinate reference systems described in ISO 19111:2019” (Open Geospatial Consortium 2019…to add citation!). The WKT representation of the WGS84 CRS, which has the identifier EPSG:4326 is as follows:

crs = pyproj.CRS.from_string('EPSG:4326') # or '.from_epsg(4326)'
print(crs.to_wkt(pretty=True))
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

The output of the command shows how the CRS identifier (also known as a Spatial Reference Identifier or SRID) works: it is simply a look-up, providing a unique identifier associated with a more complete WKT representation of the CRS. This raises the question: what happens if there is a mismatch between the identifier and the longer WKT representation of a CRS? On this point Open Geospatial Consortium (2019… to add citation!) is clear, the verbose WKT representation takes precedence over the identifier:

Should any attributes or values given in the cited identifier be in conflict with attributes or values given explicitly in the WKT description, the WKT values shall prevail.

The convention of referring to CRSs identifiers in the form AUTHORITY:CODE, which is also used by geographic software written in other languages, allows a wide range of formally defined coordinate systems to be referred to.26 The most commonly used authority in CRS identifiers is EPSG, an acronym for the European Petroleum Survey Group which published a standardized list of CRSs (the EPSG was taken over by the oil and gas body the Geomatics Committee of the International Association of Oil & Gas Producers (…to add citation!) in 2005). Other authorities can be used in CRS identifiers. ESRI:54030, for example, refers to ESRI’s implementation of the Robinson projection, which has the following WKT string:

crs = pyproj.CRS.from_string('ESRI:54030')
print(crs.to_wkt(pretty=True))
PROJCRS["World_Robinson",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["World_Robinson",
        METHOD["Robinson"],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Not known."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["ESRI",54030]]

WKT strings are exhaustive, detailed, and precise, allowing for unambiguous CRSs storage and transformations. They contain all relevant information about any given CRS, including its datum and ellipsoid, prime meridian, projection, and units.

Recent PROJ versions (6+) still allow use of proj-strings to define coordinate operations, but some proj-string keys (+nadgrids, +towgs84, +k, +init=epsg:) are either no longer supported or are discouraged. Additionally, only three datums (i.e., WGS84, NAD83, and NAD27) can be directly set in proj-string. Longer explanations of the evolution of CRS definitions and the PROJ library can be found in Bivand (2021), Chapter 2 of Pebesma and Bivand (2022), and a blog post by Floris Vanderhaeghe (…to add citations!). As outlined in the PROJ documentation there are different versions of the WKT CRS format including WKT1 and two variants of WKT2, the latter of which (WKT2, 2018 specification) corresponds to the ISO 19111:2019 (Open Geospatial Consortium 2019…to add citations!).

6.4 Querying and setting coordinate systems

Let’s look at how CRSs are stored in Python spatial objects and how they can be queried and set. First we will look at getting and setting CRSs in vector geographic data objects. Consider the GeoDataFrame object named world, imported from a file world.gpkg. The object world represents countries worldwide. Its CRS can be retrieved using the .crs property:

world.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

The output specifies the following pieces of information:

  1. The CRS type (Geographic 2D CRS) and EPSG code (EPSG:4326)
  2. The CRS name (WGS 84)
  3. The axes (latitude, longitude) and their units (degree)
  4. The applicable area name (World) and bounding box ((-180.0, -90.0, 180.0, 90.0))
  5. The datum (WGS 84)

The WKT representation, which is internally used when saving the object to a file or doing any coordinate operations, can be extracted using .crs.to_wkt() as shown above (Section 6.3). Above, we can see that the world object has the WGS84 ellipsoid, uses the Greenwich prime meridian, and the latitude and longitude axis order. We also have the suitable suitable area specification for the use of this CRS, and CRS identifier: EPSG:4326.

The CRS specification object, such as world.crs, has several other useful properties and methods to retrieve additional information about the used CRS. For example, try to run:

  • world.crs.is_geographic to check is the CRS is geographic or not
  • world.crs.axis_info[0].unit_name and world.crs.axis_info[1].unit_name to find out the CRS units of both axes (typically having identical units)
  • world.crs.to_authority() extracts the authority (e.g., EPSG) and the identifier (e.g., 4326)
  • world.crs.to_proj4() returns the proj-string representation

In cases when a coordinate reference system (CRS) is missing or the wrong CRS is set, the .set_crs method can be used on a GeoSeries or a GeoDataFrame to set it. The CRS can be specified using an EPSG code as the first argument. In case the object already has a different CRS definition, we must also specify allow_override=True to replace it (otherwise we get an error). For example, here we set the EPSG:4326 CRS, which has no effect because world already has that exact CRS definition:

world2 = world.set_crs(4326)

and here we replace the definition from the existing EPSG:4326, to a new definition EPSG:3857:

world3 = world.set_crs(3857, allow_override=True)

A number is interpreted as an EPSG code. We can also use strings, as in 'EPSG:4326', which is useful to make the code more clear and when using other authorities:

world4 = world.set_crs('ESRI:54009', allow_override=True)

In rasterio, the CRS information is stored as part of a raster file connection metadata (Section 1.3.2). Replacing the CRS definition for a rasterio file connection is typically not necessary, because it is not considered in any operation, only the transformation matrix and coordinates are. One exception is when writing the raster, in which case we need to construct the metadata of the raster file to be written, and therein specify the CRS anyway (Section 1.3.3). However, if we do for some reason need to change the CRS definition in the file connection metadata, we can do that when opening the file in r+ (reading and writing) mode:

src_nlcd2 = rasterio.open('data/nlcd.tif', 'r+')
src_nlcd2.crs
CRS.from_epsg(3857)
src_nlcd2.crs = 3857
src_nlcd2.crs
CRS.from_epsg(3857)

Importantly, the .set_crs (for vector layers) or the assignment to .crs (for rasters), as shown above, do not alter coordinates’ values or geometries. Their role is only to set a metadata information about the object CRS. Consequently, the objects we created, world3, world4, and src_nlcd2 are “incorrect”, in the sense that the geometries are in fact given in a different CRS than specified in the associated CRS definition.

In some cases the CRS of a geographic object is unknown, as is the case in the london dataset created in the code chunk below, building on the example of London introduced in Section 1.2.6:

lnd_point = shapely.geometry.Point(0.1, 51.5)
lnd_geom = gpd.GeoSeries([lnd_point])
lnd_layer = gpd.GeoDataFrame({'geometry': lnd_geom})
lnd_layer
geometry
0 POINT (0.10000 51.50000)
lnd_layer.crs

Nothing is printed as a result of the last expression, because the value of the CRS definition in .crs is None. This implies that geopandas does not know what the CRS is and is unwilling to guess. Unless a CRS is manually specified or is loaded from a source that has CRS metadata, geopandas does not make any explicit assumptions about which coordinate systems, other than to say “I don’t know”. This behavior makes sense given the diversity of available CRSs but differs from some approaches, such as the GeoJSON file format specification, which makes the simplifying assumption that all coordinates have a lon/lat CRS: EPSG:4326.

A CRS can be added to GeoSeries or GeoDataFrame objects using the .set_crs method, as mentioned above. Since the definition is missing, we do not need to specify allow_override=True, as there is nothing to override. For example:

lnd_layer = lnd_layer.set_crs(4326)
lnd_layer.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In general, all geographic coordinates have a coordinate system and software can only make good decisions around plotting and and geometry operations if it knows what type of CRS it is working with. When working with geopandas and rasterio, datasets without a specified CRS are not an issue in most workflows, since only the coordinates are considered. It is up to the user to make sure that, when working with more than one layer, all of the coordinates are given in the same CRS (whether specified or not). When exporting the results, though, it is important to keep the CRS definition in place, because other software typically do use, and require, the CRS definition in calculation. It should also be mentioned that, in some cases the CRS specification is left unspecified on purpose, for example when working with layers in arbitrary or non-geographic space (simulations, internal building plans, analysis of plot-scale ecological patterns, etc.).

6.5 Geometry operations on projected and unprojected data

6.6 When to reproject?

6.7 Which CRS to use?

6.8 Reprojecting vector geometries

Chapter 1 demonstrated how vector geometries are made-up of points, and how points form the basis of more complex objects such as lines and polygons. Reprojecting vectors thus consists of transforming the coordinates of these points, which form the vertices of lines and polygons.

Section 6.5 contains an example in which at least one GeoDataFrame object must be transformed into an equivalent object with a different CRS to calculate the distance between two objects (?).

6.9 Reprojecting raster geometries

The projection concepts described in the previous section apply equally to rasters. However, there are important differences in reprojection of vectors and rasters: transforming a vector object involves changing the coordinates of every vertex but this does not apply to raster data. Rasters are composed of rectangular cells of the same size (expressed by map units, such as degrees or meters), so it is usually impracticable to transform coordinates of pixels separately. Raster reprojection involves creating a new raster object, often with a different number of columns and rows than the original. The attributes must subsequently be re-estimated, allowing the new pixels to be ‘filled’ with appropriate values. In other words, raster reprojection can be thought of as two separate spatial operations: a vector reprojection of the raster extent to another CRS (Section 6.8), and computation of new pixel values through resampling (Section 4.4.4). Thus in most cases when both raster and vector data are used, it is better to avoid reprojecting rasters and reproject vectors instead.

Note

Reprojection of the regular rasters is also known as warping. Additionally, there is a second similar operation called “transformation”. Instead of resampling all of the values, it leaves all values intact but recomputes new coordinates for every raster cell, changing the grid geometry. For example, it could convert the input raster (a regular grid) into a curvilinear grid. The rasterio, like common raster file formats (such as GeoTIFF), does not support curvilinear grids (?).

The raster reprojection process is done using two functions from the rasterio.warp sub-package:

  • rasterio.warp.calculate_default_transform
  • rasterio.warp.reproject

The first function, calculate_default_transform, is used to calculate the new transformation matrix in the destination CRS, according to the source raster dimensions and bounds. Alternatively, the destination transformation matrix can be obtained from an existing raster; this is common practice when we need to align one raster with another, for instance to be able to combine them in raster algebra operations (Section 3.4.3) (see below). The second function rasterio.warp.reproject then actually calculates cell values in the destination grid, using the user-selected resampling method (such as nearest neighbor, or bilinear).

Let’s take a look at two examples of raster transformation: using categorical and continuous data. Land cover data are usually represented by categorical maps. The nlcd.tif file provides information for a small area in Utah, USA obtained from National Land Cover Database 2011 in the NAD83 / UTM zone 12N CRS, as shown in the output of the code chunk below (only first line of output shown). We already created a connection to the nlcd.tif file, named src_nlcd:

src_nlcd
<open DatasetReader name='data/nlcd.tif' mode='r'>

Recall that the raster transformation matrix and dimensions are accessible from the file connection as follows. This information will be required to calculate the destination transformation matrix (hereby printed collectively in a tuple):

src_nlcd.transform, src_nlcd.width, src_nlcd.height
(Affine(31.530298224786595, 0.0, 301903.344386758,
        0.0, -31.52465870178793, 4154086.47216415),
 1073,
 1359)

First, let’s define the destination CRS. In this case, we choose WGS84 (EPSG code 4326):

dst_crs = 'EPSG:4326'

Now, we are ready to claculate the destination raster transformation matrix (dst_transform), and the destination dimensions (dst_width, dst_height), as follows:

dst_transform, dst_width, dst_height = rasterio.warp.calculate_default_transform(
    src_nlcd.crs,
    dst_crs,
    src_nlcd.width,
    src_nlcd.height,
    *src_nlcd.bounds
)
dst_transform, dst_width, dst_height
(Affine(0.00025326212175780043, 0.0, 2.7120438858942424,
        0.0, -0.00025326212175780043, 34.92789350789737),
 1200,
 1248)

Note that *, in *src_nlcd.bounds, is used to unpack src_nlcd.bounds to four separate arguments, which calculate_default_transform requires:

src_nlcd.bounds
BoundingBox(left=301903.344386758, bottom=4111244.46098842, right=335735.354381954, top=4154086.47216415)

Next, we will create the metadata file used for writing the reprojected raster to file. For convenience, we are taking the metadata of the source raster (src_nlcd.meta), making a copy (dst_kwargs), and then updating those specific properties that need to be changed. Note that the reprojection process typically creates “No Data” pixels, even when there were none in the input raster, since the raster orientation changes and the edges need to be “filled” to get back a rectangular extent. We need to specify a “No Data” value of our choice, if there is none, or use the existing source raster setting, such as 255 in this case:

dst_kwargs = src_nlcd.meta.copy()
dst_kwargs.update({
    'crs': dst_crs,
    'transform': dst_transform,
    'width': dst_width,
    'height': dst_height
})
dst_kwargs
{'driver': 'GTiff',
 'dtype': 'uint8',
 'nodata': 255.0,
 'width': 1200,
 'height': 1248,
 'count': 1,
 'crs': 'EPSG:4326',
 'transform': Affine(0.00025326212175780043, 0.0, 2.7120438858942424,
        0.0, -0.00025326212175780043, 34.92789350789737)}

We are ready to create the reprojected raster. Here, reprojection takes place between two file connections, meaning that the raster value arrays are not being read into memory at once. It is also possible to reproject into an in-memory ndarray object, see the documentation.

To write the reprojected raster, we first create a destination file connection dst_nlcd, pointing at the output file path of our choice (output/nlcd_4326.tif), using the updated metadata object created earlier (dst_kwargs):

dst_nlcd = rasterio.open('output/nlcd_4326.tif', 'w', **dst_kwargs)

Then, we use the rasterio.warp.reproject function to calculate and write the reprojection result into the dst_nlcd file connection. Note that the source and destination accept a “band” object, created using rasterio.band. In this case, there is just one band. If there were more bands, we would have to repeat the procedure for each band, using i instead of 1 inside a loop:

rasterio.warp.reproject(
    source=rasterio.band(src_nlcd, 1),
    destination=rasterio.band(dst_nlcd, 1),
    src_transform=src_nlcd.transform,
    src_crs=src_nlcd.crs,
    dst_transform=dst_transform,
    dst_crs=dst_crs,
    resampling=rasterio.warp.Resampling.nearest
)
(Band(ds=<open DatasetWriter name='output/nlcd_4326.tif' mode='w'>, bidx=1, dtype='uint8', shape=(1248, 1200)),
 Affine(0.00025326212175780043, 0.0, 2.7120438858942424,
        0.0, -0.00025326212175780043, 34.92789350789737))

Finally, we close the file connection so that the data are actually written:

dst_nlcd.close()

Many properties of the new object differ from the previous one, including the number of columns and rows (and therefore number of cells), resolution (transformed from meters into degrees), and extent, as summarized again below (note that the number of categories increases from 8 to 9 because of the addition of NA values, not because a new category has been created — the land cover classes are preserved).

src_nlcd.meta
{'driver': 'GTiff',
 'dtype': 'uint8',
 'nodata': 255.0,
 'width': 1073,
 'height': 1359,
 'count': 1,
 'crs': CRS.from_epsg(3857),
 'transform': Affine(31.530298224786595, 0.0, 301903.344386758,
        0.0, -31.52465870178793, 4154086.47216415)}
src_nlcd_4326 = rasterio.open('output/nlcd_4326.tif')
src_nlcd_4326.meta
{'driver': 'GTiff',
 'dtype': 'uint8',
 'nodata': 255.0,
 'width': 1200,
 'height': 1248,
 'count': 1,
 'crs': CRS.from_epsg(4326),
 'transform': Affine(0.00025326212175780043, 0.0, 2.7120438858942424,
        0.0, -0.00025326212175780043, 34.92789350789737)}

Examining the unique raster values tells us that the new raster has the same categories, plus the value 255 representing “No Data”:

np.unique(src_nlcd.read(1))
array([1, 2, 3, 4, 5, 6, 7, 8], dtype=uint8)
np.unique(src_nlcd_4326.read(1))
array([1, 2, 3, 4, 5, 6, 7, 8], dtype=uint8)
fig, axes = plt.subplots(ncols=2, figsize=(9,5))
show(src_nlcd, ax=axes[0], cmap='Set3')
show(src_nlcd_4326, ax=axes[1], cmap='Set3')
axes[0].set_title('Original (EPSG:26912)')
axes[1].set_title('Reprojected (EPSG:4326)');

Figure 6.1: Reprojecting a categorical raster using nearest neighbor resampling

In the above example, we automatically calculated an optimal (i.e., most information preserving) destination grid using rasterio.warp.calculate_default_transform. This is appropriate when there are no specific requirements for the destination raster spatial properties. Namely, we are not required to otain a specific origin and resolution, but just wish to preserve the raster values as much as possible. To do that, calculate_default_transform “tries” to keep the extent and resolution of the destination raster as similar as possible to the source. In other situations, however, we need to reproject a raster into a specific “template”, so that it corresponds, for instance, with other rasters we use in the analysis. In the following code section, we reproject the nlcd.tif raster, again, buit this time using the nlcd_4326.tif reprojection result as the “template” to demonstrate this alternative workflow.

First, we create a connection to our “template” raster to read its metadata:

template = rasterio.open('output/nlcd_4326.tif')
template.meta
{'driver': 'GTiff',
 'dtype': 'uint8',
 'nodata': 255.0,
 'width': 1200,
 'height': 1248,
 'count': 1,
 'crs': CRS.from_epsg(4326),
 'transform': Affine(0.00025326212175780043, 0.0, 2.7120438858942424,
        0.0, -0.00025326212175780043, 34.92789350789737)}

Then, we create a write-mode connection to our destination raster, using this metadata, meaning that as the resampling result is going to have identical metadata as the “template”:

dst_nlcd_2 = rasterio.open('output/nlcd_4326_2.tif', 'w', **template.meta)

Now, we can resample and write the result:

rasterio.warp.reproject(
    source=rasterio.band(src_nlcd, 1),
    destination=rasterio.band(dst_nlcd_2, 1),
    src_transform=src_nlcd.transform,
    src_crs=src_nlcd.crs,
    dst_transform=dst_nlcd_2.transform,
    dst_crs=dst_nlcd_2.crs,
    resampling=rasterio.warp.Resampling.nearest
)
(Band(ds=<open DatasetWriter name='output/nlcd_4326_2.tif' mode='w'>, bidx=1, dtype='uint8', shape=(1248, 1200)),
 Affine(0.00025326212175780043, 0.0, 2.7120438858942424,
        0.0, -0.00025326212175780043, 34.92789350789737))
dst_nlcd_2.close()

Naturally, in this case, the outputs nlcd_4326.tif and nlcd_4326_2.tif are identical, as we used the same “template” and the same source data:

d = rasterio.open('output/nlcd_4326.tif').read(1) == rasterio.open('output/nlcd_4326_2.tif').read(1)
d
array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ...,
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]])
np.all(d)
True

The difference is that in the first example we calculate the template automatically, using rasterio.warp.calculate_default_transform, while in the second example we used an existing raster as the “template”.

Importantly, when the template raster has much more “coarse” resolution than the source raster, the:

  • rasterio.warp.Resampling.average (for continuous rasters), or
  • rasterio.warp.Resampling.mode (for categorical rasters)

resampling method should be used, instead of rasterio.warp.Resampling.nearest. Otherwise, much of the data will be lost, as the “nearest” method can capture one pixel value only for each destination raster pixel.

Reprojecting continuous rasters (with numeric or, in this case, integer values) follows an almost identical procedure. This is demonstrated below with srtm.tif from the Shuttle Radar Topography Mission (SRTM), which represents height in meters above sea level (elevation) with the WGS84 CRS.

We will reproject this dataset into a projected CRS, but not with the nearest neighbor method which is appropriate for categorical data. Instead, we will use the bilinear method which computes the output cell value based on the four nearest cells in the original raster. The values in the projected dataset are the distance-weighted average of the values from these four cells: the closer the input cell is to the center of the output cell, the greater its weight. The following code section create a text string representing WGS 84 / UTM zone 12N, and reproject the raster into this CRS, using the bilinear method. The code is practically the same, except for changing the source and destination file names, and replacing nearest with bilinear:

dst_crs = 'EPSG:32612'
dst_transform, dst_width, dst_height = rasterio.warp.calculate_default_transform(
    src_srtm.crs,
    dst_crs,
    src_srtm.width,
    src_srtm.height,
    *src_srtm.bounds
)
dst_kwargs = src_srtm.meta.copy()
dst_kwargs.update({
    'crs': dst_crs,
    'transform': dst_transform,
    'width': dst_width,
    'height': dst_height
})
dst_srtm = rasterio.open('output/srtm_32612.tif', 'w', **dst_kwargs)
rasterio.warp.reproject(
    source=rasterio.band(src_srtm, 1),
    destination=rasterio.band(dst_srtm, 1),
    src_transform=src_srtm.transform,
    src_crs=src_srtm.crs,
    dst_transform=dst_transform,
    dst_crs=dst_crs,
    resampling=rasterio.warp.Resampling.bilinear
)
dst_srtm.close()

Figure 6.2 shows the input and the reprojected SRTM rasters.

fig, axes = plt.subplots(ncols=2, figsize=(9,5))
show(src_srtm, ax=axes[0])
show(rasterio.open('output/srtm_32612.tif'), ax=axes[1])
axes[0].set_title('Original (EPSG:4326)')
axes[1].set_title('Reprojected (EPSG:32612)');

Figure 6.2: Reprojecting a continuous raster using bilinear resampling

6.10 Custom map projections

6.11 Exercises