import numpy as np
import shapely
import matplotlib.pyplot as plt
import geopandas as gpd
import rasterio
import rasterio.mask
import rasterstats
import rasterio.plot
import rasterio.features
import math
import os
5 Raster-vector interactions
5.1 Prerequisites
Let’s import the required packages:
and load the sample data:
= rasterio.open('data/srtm.tif')
src_srtm = rasterio.open('data/nlcd.tif')
src_nlcd = rasterio.open('data/grain.tif')
src_grain = rasterio.open('data/elev.tif')
src_elev = rasterio.open('data/dem.tif')
src_dem = gpd.read_file('data/zion.gpkg')
zion = gpd.read_file('data/zion_points.gpkg')
zion_points = gpd.read_file('data/cycle_hire_osm.gpkg')
cycle_hire_osm = gpd.read_file('data/us_states.gpkg')
us_states = gpd.read_file('data/nz.gpkg')
nz = rasterio.open('data/nz_elev.tif') src_nz_elev
5.2 Introduction
This Chapter focuses on interactions between raster and vector geographic data models, both introduced in Chapter 1. It includes four main techniques:
- raster cropping and masking using vector objects (Section 5.3),
- extracting raster values using different types of vector data (Section Section 5.4), and
- raster-vector conversion (Section 5.5 and Section 5.6).
These concepts are demonstrated using data from in previous chapters, to understand their potential real-world applications.
5.3 Raster masking and cropping
Many geographic data projects involve integrating data from many different sources, such as remote sensing images (rasters) and administrative boundaries (vectors). Often the extent of input raster datasets is larger than the area of interest. In this case raster masking, cropping, or both, are useful for unifying the spatial extent of input data (Figure 5.2 (b) and (c), and the following two examples, illustrate the difference between masking and cropping). Both operations reduce object memory use and associated computational resources for subsequent analysis steps, and may be a necessary preprocessing step before creating attractive maps involving raster data.
We will use two layers to illustrate raster cropping:
- The
srtm.tif
raster representing elevation (meters above sea level) in south-western Utah (a rasterio file connection namedsrc_srtm
) (see Figure 5.2 (a)) - The
zion.gpkg
vector layer representing the Zion National Park boundaries (aGeoDataFrame
namedzion
)
Both target and cropping objects must have the same projection. Since it is easier and more precise to reproject vector layers, compared to rasters, we use the following expression to reproject the vector layer zion
into the CRS of the raster src_srtm
:
= zion.to_crs(src_srtm.crs) zion
To mask the image, i.e., convert all pixels which do not intersect with the zion
polygon to “No Data”, we use the rasterio.mask.mask
function, as follows:
= rasterio.mask.mask(
out_image_mask, out_transform_mask
src_srtm,
zion.geometry, =False,
crop=9999
nodata )
Note that we need to choose and specify a “No Data” value, within the valid range according to the data type. Since srtm.tif
is of type uint16
(how can we check?), we choose 9999
(a positive integer that is guaranteed not to occur in the raster). Also note that rasterio does not directly support geopandas data structures, so we need to pass a “collection” of shapely geometries: a GeoSeries
(see below) or a list
of shapely geometries (see next example) both work.
The result is the out_image
array with the masked values:
out_image_mask
array([[[9999, 9999, 9999, ..., 9999, 9999, 9999],
[9999, 9999, 9999, ..., 9999, 9999, 9999],
[9999, 9999, 9999, ..., 9999, 9999, 9999],
...,
[9999, 9999, 9999, ..., 9999, 9999, 9999],
[9999, 9999, 9999, ..., 9999, 9999, 9999],
[9999, 9999, 9999, ..., 9999, 9999, 9999]]], dtype=uint16)
and the new transformation matrix out_transform
:
out_transform_mask
Affine(0.0008333333332777796, 0.0, -113.23958321278403,
0.0, -0.0008333333332777843, 37.512916763165805)
Note that masking (without cropping!) does not modify the raster spatial configuration. Therefore, the new transform is identical to the original:
src_srtm.transform
Affine(0.0008333333332777796, 0.0, -113.23958321278403,
0.0, -0.0008333333332777843, 37.512916763165805)
Unfortunately, the out_image
and out_transform
object do not contain any information indicating that 9999
represents “No Data”. To associate the information with the raster, we must write it to file along with the corresponding metadata. For example, to write the cropped raster to file, we need to modify the “No Data” setting in the metadata:
= src_srtm.meta
dst_kwargs =9999)
dst_kwargs.update(nodata dst_kwargs
{'driver': 'GTiff',
'dtype': 'uint16',
'nodata': 9999,
'width': 465,
'height': 457,
'count': 1,
'crs': CRS.from_epsg(4326),
'transform': Affine(0.0008333333332777796, 0.0, -113.23958321278403,
0.0, -0.0008333333332777843, 37.512916763165805)}
Then we can write the cropped raster to file:
= rasterio.open('output/srtm_masked.tif', 'w', **dst_kwargs)
new_dataset
new_dataset.write(out_image_mask) new_dataset.close()
Now we can re-import the raster:
= rasterio.open('output/srtm_masked.tif') src_srtm_mask
The .meta
property contains the nodata
entry. Now, any relevant operation (such as plotting) will take “No Data” into account:
src_srtm_mask.meta
{'driver': 'GTiff',
'dtype': 'uint16',
'nodata': 9999.0,
'width': 465,
'height': 457,
'count': 1,
'crs': CRS.from_epsg(4326),
'transform': Affine(0.0008333333332777796, 0.0, -113.23958321278403,
0.0, -0.0008333333332777843, 37.512916763165805)}
Cropping means reducing the raster extent to the extent of the vector layer:
- To just crop, without masking, we can derive the bounding box polygon of the vector layer, and then crop using that polygon (see further below), also combined with
crop=True
- To crop and mask, we can use
rasterio.mask.mask
same shown as above for masking, just settingcrop=True
instead of the defaultcrop=False
(see further below)
For the example of cropping only, here is how we can obtain the extent polygon of zion
, as a shapely
geometry object (Figure 5.1):
= zion.unary_union.envelope
bb bb
'Polygon'
geometry of the zion
layerThe extent can now be used for masking. Here, we are also using the all_touched=True
option so that pixels partially overlapping with the extent are included:
= rasterio.mask.mask(
out_image_crop, out_transform_crop
src_srtm,
[bb], =True,
crop=True,
all_touched=9999
nodata )
Finally, here is how we crop and mask, using rasterio.mask.mask
with crop=True
. When writing to file, it is now crucual to update the transform and dimensions, since they were modified as a result of cropping. Also note that out_image_mask_crop
is a three-dimensional array (even tough it has one band in this case), so the number of rows and columns are in .shape[1]
and .shape[2]
(rather than .shape[0]
and .shape[1]
), respectively:
= rasterio.mask.mask(
out_image_mask_crop, out_transform_mask_crop
src_srtm,
zion.geometry, =True,
crop=9999
nodata
)= src_srtm.meta
dst_kwargs
dst_kwargs.update({'nodata': 9999,
'transform': out_transform_mask_crop,
'width': out_image_mask_crop.shape[2],
'height': out_image_mask_crop.shape[1]
})= rasterio.open('output/srtm_masked_cropped.tif', 'w', **dst_kwargs)
new_dataset
new_dataset.write(out_image_mask_crop)
new_dataset.close()= rasterio.open('output/srtm_masked_cropped.tif') src_srtm_mask_crop
out_image_mask_crop.shape
(1, 436, 439)
Figure 5.2 shows the original raster, and the cropped and masked results:
# Original
= plt.subplots(figsize=(3.5, 3.5))
fig, ax =ax)
rasterio.plot.show(src_srtm, ax=ax, color='none', edgecolor='black');
zion.plot(ax
# Masked
= plt.subplots(figsize=(3.5, 3.5))
fig, ax =ax)
rasterio.plot.show(src_srtm_mask, ax=ax, color='none', edgecolor='black');
zion.plot(ax
# Cropped
= plt.subplots(figsize=(3.5, 3.5))
fig, ax =out_transform_crop, ax=ax)
rasterio.plot.show(out_image_crop, transform=ax, color='none', edgecolor='black');
zion.plot(ax
# Masked+Cropped
= plt.subplots(figsize=(3.5, 3.5))
fig, ax =ax)
rasterio.plot.show(src_srtm_mask_crop, ax=ax, color='none', edgecolor='black'); zion.plot(ax
5.4 Raster extraction
Raster extraction is the process of identifying and returning the values associated with a ‘target’ raster at specific locations, based on a (typically vector) geographic ‘selector’ object. The reverse of raster extraction—assigning raster cell values based on vector objects—is rasterization, described in Section 5.5.
In the following examples, we use a third-party package called rasterstats
, which is specifically aimed at extracting raster values:
- to points (Section 5.4.1) or to lines (Section 5.4.2), via the
rasterstats.point_query
function, or - to polygons (?sec-extraction-to-polygons), via the
rasterstats.zonal_stats
function.
5.4.1 Extraction to points
The simplest type of raster extraction is extracting the values of raster cells at specific points. To demonstrate extraction to points, we will use zion_points
, which contains a sample of 30 locations within the Zion National Park (Figure 5.3):
= plt.subplots()
fig, ax =ax)
rasterio.plot.show(src_srtm, ax=ax, color='black'); zion_points.plot(ax
The following expression extracts elevation values from srtm.tif
according to zion_points
, using rasterstats.point_query
:
= rasterstats.point_query(
result
zion_points, 1),
src_srtm.read(= src_srtm.nodata,
nodata = src_srtm.transform,
affine ='nearest'
interpolate )
The resulting object is a list
of raster values, corresponding to zion_points
. For example, here are the elevations of the first five points:
5] result[:
[1802, 2433, 1886, 1370, 1452]
To get a GeoDataFrame
with the original points geometries (and other attributes, if any), as well as the extracted raster values, we can assign the extraction result into a new column:
'elev'] = result
zion_points[ zion_points
geometry | elev | |
---|---|---|
0 | POINT (-112.91587 37.20013) | 1802 |
1 | POINT (-113.09369 37.39263) | 2433 |
2 | POINT (-113.02462 37.33466) | 1886 |
... | ... | ... |
27 | POINT (-113.03655 37.23446) | 1372 |
28 | POINT (-113.13933 37.39004) | 1905 |
29 | POINT (-113.09677 37.24237) | 1574 |
30 rows × 2 columns
5.4.2 Extraction to lines
Raster extraction is also applicable with line selectors. The typical line extraction algorithm is to extract one value for each raster cell touched by a line. However, this particular approach is not recommended to obtain values along the transects, as it is hard to get the correct distance between each pair of extracted raster values.
For line extraction, a better approach is to split the line into many points (at equal distances along the line) and then extract the values for these points using the “extraction to points” technique (Section 5.4.1). To demonstrate this, the code below creates (see Section 1.2 for recap) zion_transect
, a straight line going from northwest to southeast of the Zion National Park (Figure 5.4):
= [[-113.2, 37.45], [-112.9, 37.2]]
coords = shapely.LineString(coords)
zion_transect zion_transect
Here is a printout demonstrating that this is a 'LineString'
geometry representing a straight line between two points:
print(zion_transect)
LINESTRING (-113.2 37.45, -112.9 37.2)
The line is also illustrated in the context of the raster in Figure 5.5.
The utility of extracting heights from a linear selector is illustrated by imagining that you are planning a hike. The method demonstrated below provides an ‘elevation profile’ of the route (the line does not need to be straight), useful for estimating how long it will take due to long climbs.
First, we need to create a layer zion_transect_pnt
consisting of points along our line (zion_transect
), at specified intervals (e.g., 250
). To do that, we need to transform the line into a projected CRS (so that we work with true distances, in \(m\)), such as UTM. This requires going through a GeoSeries
, as shapely geometries have no CRS definition nor concept of reprojection (see Section 1.2.6):
= gpd.GeoSeries(zion_transect, crs=4326).to_crs(32612)
zion_transect_utm = zion_transect_utm.iloc[0] zion_transect_utm
The printout of the new geometry shows this is still a straight line between two points, only with coordinates in a projected CRS:
print(zion_transect_utm)
LINESTRING (305399.67208180577 4147066.650206682, 331380.8917453843 4118750.0947884847)
Next, we need to calculate the distances, along the line, where points are going to be generated, using np.arange
. This is a numeric sequence starting at 0
, going up to line .length
, in steps of 250
(\(m\)):
= np.arange(0, zion_transect_utm.length, 250)
distances 7] ## First 7 distance cutoff points distances[:
array([ 0., 250., 500., 750., 1000., 1250., 1500.])
The distances cutoffs are used to sample (“interpolate”) points along the line. The shapely .interpolate
method is used to generate the points. The points are then reprojected back to the geographic CRS of the raster (EPSG:4326
):
= [zion_transect_utm.interpolate(distance) for distance in distances]
zion_transect_pnt = gpd.GeoSeries(zion_transect_pnt, crs=32612)
zion_transect_pnt = zion_transect_pnt.to_crs(src_srtm.crs)
zion_transect_pnt zion_transect_pnt
0 POINT (-113.20000 37.45000)
1 POINT (-113.19804 37.44838)
2 POINT (-113.19608 37.44675)
...
151 POINT (-112.90529 37.20443)
152 POINT (-112.90334 37.20280)
153 POINT (-112.90140 37.20117)
Length: 154, dtype: geometry
Finally, we extract the elevation values for each point in our transect and combine the information with zion_transect_pnt
(after “promoting” it to a GeoDataFrame
, to accomodate extra attributes), using the point extraction method shown earlier (Section 5.4.1). We also attach the respective distance cutoff points distances
:
= rasterstats.point_query(
result
zion_transect_pnt, 1),
src_srtm.read(= src_srtm.nodata,
nodata = src_srtm.transform,
affine ='nearest'
interpolate
)= gpd.GeoDataFrame(geometry=zion_transect_pnt)
zion_transect_pnt 'dist'] = distances
zion_transect_pnt['elev'] = result
zion_transect_pnt[ zion_transect_pnt
geometry | dist | elev | |
---|---|---|---|
0 | POINT (-113.20000 37.45000) | 0.0 | 2001 |
1 | POINT (-113.19804 37.44838) | 250.0 | 2037 |
2 | POINT (-113.19608 37.44675) | 500.0 | 1949 |
... | ... | ... | ... |
151 | POINT (-112.90529 37.20443) | 37750.0 | 1837 |
152 | POINT (-112.90334 37.20280) | 38000.0 | 1841 |
153 | POINT (-112.90140 37.20117) | 38250.0 | 1819 |
154 rows × 3 columns
The information in zion_transect_pnt
, namely the 'dist'
and 'elev'
attributes, can now be used to draw an elevation profile, as illustrated in Figure 5.5 (b):
# Raster and a line transect
= plt.subplots()
fig, ax =ax)
rasterio.plot.show(src_srtm, ax=ax, color='black')
gpd.GeoSeries(zion_transect).plot(ax=ax, color='none', edgecolor='darkgrey');
zion.plot(ax
# Elevation profile
= plt.subplots()
fig, ax 'dist')['elev'].plot(ax=ax)
zion_transect_pnt.set_index('Distance (m)')
ax.set_xlabel('Elevation (m)'); ax.set_ylabel(
5.4.3 Extraction to polygons
The final type of geographic vector object for raster extraction is polygons. Like lines, polygons tend to return many raster values per polygon. For continuous rasters (Figure 5.6 (a)), we typically want to generate summary statistics for raster values per polygon, for example to characterize a single region or to compare many regions. The generation of raster summary statistics, by polygons, is demonstrated in the code below, which creates a list of summary statistics (in this case a list of length 1, since there is just one polygon), using rasterstats.zonal_stats
:
= rasterstats.zonal_stats(
result
zion, 1),
src_srtm.read(= src_srtm.nodata,
nodata = src_srtm.transform,
affine = ['mean', 'min', 'max']
stats
) result
/usr/local/lib/python3.11/site-packages/rasterstats/main.py:156: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
if 'Point' in geom.type:
[{'min': 1122.0, 'max': 2661.0, 'mean': 1818.211830154405}]
Transformation of the list
to a DataFrame
(e.g., to attach the derived attributes to the original polygon layer), is straightforward:
pd.DataFrame(result)
min | max | mean | |
---|---|---|---|
0 | 1122.0 | 2661.0 | 1818.21183 |
Because there is only one polygon in the example, a DataFrame
with a single row is returned. However, if zion
was composed of more than one polygon, we would accordingly get more rows in the DataFrame
. The result provides useful summaries, for example that the maximum height in the park is around 2661
\(m\) above see level.
Note the stats
argument, where we determine what type of statistics are calculated per polygon. Possible values other than 'mean'
, 'min'
, 'max'
are:
'count'
—The number of valid (i.e., excluding “No Data”) pixels'nodata'
—The number of pixels with ’No Data”'majority'
—The most frequently occurring value'median'
—The median value
See the documentation of rasterstats.zonal_stats
for the complete list. Additionally, the rasterstats.zonal_stats
function accepts user-defined functions for calculating any custom statistics.
To count occurrences of categorical raster values within polygons (Figure 5.6 (b)), we can use masking (Section 5.3) combined with np.unique
, as follows:
= rasterio.mask.mask(
out_image, out_transform
src_nlcd,
zion.geometry.to_crs(src_nlcd.crs), =False,
crop=9999
nodata
)= np.unique(out_image, return_counts=True) counts
According to the result, for example, pixel value 2
(“Developed” class) appears in 4205
pixels within the Zion polygon:
counts
(array([ 2, 3, 4, 5, 6, 7, 8, 15], dtype=uint8),
array([ 4205, 98285, 298299, 203701, 235, 62, 679, 852741]))
The input data for raster to polygon extraction examples in this section are illustrated in Figure 5.6:
# Continuous raster
= plt.subplots()
fig, ax =ax)
rasterio.plot.show(src_srtm, ax=ax, color='none', edgecolor='black');
zion.plot(ax
# Categorical raster
= plt.subplots()
fig, ax =ax, cmap='Set3')
rasterio.plot.show(src_nlcd, ax=ax, color='none', edgecolor='black'); zion.to_crs(src_nlcd.crs).plot(ax
5.5 Rasterization
5.5.1 Rasterizing points
Rasterization is the conversion of vector objects into their representation in raster objects. Usually, the output raster is used for quantitative analysis (e.g., analysis of terrain) or modeling. As we saw in Chapter 1, the raster data model has some characteristics that make it conducive to certain methods. Furthermore, the process of rasterization can help simplify datasets because the resulting values all have the same spatial resolution: rasterization can be seen as a special type of geographic data aggregation.
The rasterio
package contains the rasterio.features.rasterize
function for doing this work. To make it happen, we need to have the “template” grid definition, i.e., the “template” raster defining the extent, resolution and CRS of the output, in the out_shape
(the output dimensions) and transform
(the transformation matrix) arguments of rasterio.features.rasterize
. In case we have an existing template raster, we simply need to query its .shape
and .transform
. In case we create a custom template, e.g., covering the vector layer extent with specified resolution, there is some extra work to calculate both of these objects (see next example).
Furthermore, the rasterio.features.rasterize
function requires the input vector shapes in the form for a generator of (geom,value)
tuples, where:
geom
is the given geometry (shapely geometry object)value
is the value to be “burned” into pixels coinciding with the geometry (int
orfloat
)
Again, this will be made clear in the next example.
The geographic resolution of the “template” raster has a major impact on the results: if it is too low (cell size is too large), the result may miss the full geographic variability of the vector data; if it is too high, computational times may be excessive. There are no simple rules to follow when deciding an appropriate geographic resolution, which is heavily dependent on the intended use of the results. Often the target resolution is imposed on the user, for example when the output of rasterization needs to be aligned to the existing raster.
To demonstrate rasterization in action, we will use a template raster that has the same extent and CRS as the input vector data cycle_hire_osm_projected
(a dataset on cycle hire points in London, illustrated in Figure 5.7 (a)) and a spatial resolution of 1000 \(m\). First, we take the vector layer and tansform it to a projected CRS:
= cycle_hire_osm.to_crs(27700) cycle_hire_osm_projected
Next, we need to calculate the out_shape
and transform
of our template raster. To calculate the transform, we combine the top-left corner of the cycle_hire_osm_projected
bounding box with the required resolution (e.g., 1000 \(m\)):
= cycle_hire_osm_projected.total_bounds
bounds = 1000
res = rasterio.transform.from_origin(
transform =bounds[0],
west=bounds[3],
north=res,
xsize=res
ysize
) transform
Affine(1000.0, 0.0, 523038.61452275474,
0.0, -1000.0, 184971.40854297992)
To calculate the out_shape
, we divide the x-axis and y-axis extent by the resolution, and take the ceiling of the results:
= math.ceil((bounds[3] - bounds[1]) / res)
rows = math.ceil((bounds[2] - bounds[0]) / res)
cols = (rows, cols)
shape shape
(11, 16)
Now, we can rasterize. Rasterization is a very flexible operation: the results depend not only on the nature of the template raster, but also on the type of input vector (e.g., points, polygons), the pixel “activation” method, and the function applied when there is more than one match.
To illustrate this flexibility we will try three different approaches to rasterization (Figure 5.7 (b)-(d)). First, we create a raster representing the presence or absence of cycle hire points (known as presence/absence rasters). In this case, we transfer the value of 1
to all pixels where at least one point falls in. To transform the point GeoDataFrame
into a generator of shapely
geometries and the (fixed) values, we use the following expression:
= ((g, 1) for g in cycle_hire_osm_projected.geometry.to_list())
g g
<generator object <genexpr> at 0x7f76daa6a810>
Then, the rasterizing expression is:
= rasterio.features.rasterize(
ch_raster1 =g,
shapes=shape,
out_shape=transform
transform )
The result ch_raster1
is an ndarray
with the burned values of 1
where the pixel coincides with at least one point, and 0
in “unaffected” pixels:
ch_raster1
array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
To count the number of bike hire stations, we can use the fixed value of 1
combined with the merge_alg=rasterio.enums.MergeAlg.add
, which means that multiple values burned into the same pixel are summed, rather than replaced keeping last (the default):
= ((g, 1) for g in cycle_hire_osm_projected.geometry.to_list())
g = rasterio.features.rasterize(
ch_raster2 =g,
shapes=shape,
out_shape=transform,
transform=rasterio.enums.MergeAlg.add
merge_alg )
Here is the resulting array of point counts ch_raster2
:
ch_raster2
array([[ 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 3, 3],
[ 0, 0, 0, 1, 3, 3, 5, 5, 8, 9, 1, 3, 2, 6, 7, 0],
[ 0, 0, 0, 8, 5, 4, 11, 10, 12, 9, 11, 4, 8, 5, 4, 0],
[ 0, 1, 4, 10, 10, 11, 18, 16, 13, 12, 8, 6, 5, 2, 3, 0],
[ 3, 3, 9, 3, 5, 14, 10, 15, 9, 9, 5, 8, 0, 0, 12, 2],
[ 4, 5, 9, 11, 6, 7, 7, 3, 10, 9, 4, 0, 0, 0, 0, 0],
[ 4, 0, 7, 8, 8, 4, 11, 10, 7, 3, 0, 0, 0, 0, 0, 0],
[ 0, 1, 3, 0, 0, 1, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 1, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
dtype=uint8)
The new output, ch_raster2
, shows the number of cycle hire points in each grid cell. The cycle hire locations have different numbers of bicycles described by the capacity variable, raising the question, what’s the capacity in each grid cell? To calculate that, we must sum the field ('capacity'
) rather than the fixed values of 1
. This requires using an expanded generator of geometries and values, where we (1) extract both geometries and attribute values, and (2) filter out “No Data” values, as follows:
= ((g, v) for g, v in cycle_hire_osm_projected[['geometry', 'capacity']] \
g ='capacity')
.dropna(subset\
.to_numpy()
.tolist())= rasterio.features.rasterize(
ch_raster3 =g,
shapes=shape,
out_shape=transform,
transform=rasterio.enums.MergeAlg.add
merge_alg )
Here is the resulting array of capacity sums ch_raster3
:
ch_raster3
array([[ 0., 0., 0., 0., 0., 11., 34., 0., 0., 0., 0.,
0., 0., 11., 35., 24.],
[ 0., 0., 0., 7., 30., 46., 60., 73., 72., 75., 6.,
50., 25., 47., 36., 0.],
[ 0., 0., 0., 89., 36., 31., 167., 97., 115., 80., 138.,
61., 65., 109., 43., 0.],
[ 0., 11., 42., 104., 108., 138., 259., 206., 203., 135., 107.,
37., 0., 25., 60., 0.],
[ 88., 41., 83., 28., 64., 115., 99., 249., 107., 117., 60.,
33., 0., 0., 0., 0.],
[ 0., 89., 107., 95., 73., 119., 69., 23., 140., 141., 46.,
0., 0., 0., 0., 0.],
[ 0., 0., 55., 97., 101., 59., 119., 109., 75., 12., 0.,
0., 0., 0., 0., 0.],
[ 0., 10., 23., 0., 0., 5., 41., 0., 8., 0., 0.,
0., 0., 0., 0., 0.],
[ 0., 19., 9., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.],
[ 0., 29., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.]], dtype=float32)
The input point layer cycle_hire_osm_projected
and the three variants of rasterizing it ch_raster1
, ch_raster2
, and ch_raster3
are shown in Figure 5.7:
# Input points
= plt.subplots()
fig, ax ='capacity', ax=ax);
cycle_hire_osm_projected.plot(column
# Presence/Absence
= plt.subplots()
fig, ax =transform, ax=ax);
rasterio.plot.show(ch_raster1, transform
# Point counts
= plt.subplots()
fig, ax =transform, ax=ax);
rasterio.plot.show(ch_raster2, transform
# Summed attribute values
= plt.subplots()
fig, ax =transform, ax=ax); rasterio.plot.show(ch_raster3, transform
5.5.2 Rasterizing lines and polygons
Another dataset based on California’s polygons and borders (created below) illustrates rasterization of lines. There are three preliminary steps. First, we subset the California polygon:
= us_states[us_states['NAME'] == 'California']
california california
GEOID | NAME | REGION | ... | total_pop_10 | total_pop_15 | geometry | |
---|---|---|---|---|---|---|---|
26 | 06 | California | West | ... | 36637290.0 | 38421464.0 | MULTIPOLYGON (((-118.60337 33.4... |
1 rows × 7 columns
Second, we “cast” the polygon into a 'MultiLineString'
geometry, using the .boundary
property that GeoSeries
have:
= california.geometry.boundary
california_borders california_borders
26 MULTILINESTRING ((-118.60337 33...
dtype: geometry
Third, we create a template raster with a resolution of a 0.5
degree, using the same approach as in Section 5.5.1:
= california_borders.total_bounds
bounds = 0.5
res = rasterio.transform.from_origin(
transform =bounds[0],
west=bounds[3],
north=res,
xsize=res
ysize
)= math.ceil((bounds[3] - bounds[1]) / res)
rows = math.ceil((bounds[2] - bounds[0]) / res)
cols = (rows, cols)
shape shape
(19, 21)
Finally, we rasterize california_borders
based on the calculated template’s shape
and transform
. When considering line or polygon rasterization, one useful additional argument is all_touched
. By default it is False
, but when changed to True
—all cells that are touched by a line or polygon border get a value. Line rasterization with all_touched=True
is demonstrated in the code below (Figure 5.8, left). We are also using fill=np.nan
to set “background” values as “No Data”:
= rasterio.features.rasterize(
california_raster1 1) for g in california_borders.to_list()),
((g, =shape,
out_shape=transform,
transform=True,
all_touched=np.nan
fill )
Compare it to a polygon rasterization, with all_touched=False
(the default), which selects only raster cells whose centroids are inside the selector polygon, as illustrated in Figure 5.8 (right):
= rasterio.features.rasterize(
california_raster2 1) for g in california.geometry.to_list()),
((g, =shape,
out_shape=transform,
transform=np.nan
fill )
To illustrate which raster pixels are actually selected as part of rasterization, we also show them as points. This also requires the following code section to calculate the points, which we explain in Section 5.6:
= california_raster1.copy()
ids = np.arange(0, california_raster1.size) \
ids \
.reshape(california_raster1.shape)
.astype(np.int32)= rasterio.features.shapes(ids, transform=transform)
shapes = list(shapes)
pol = [shapely.geometry.shape(i[0]).centroid for i in pol]
pnt = gpd.GeoSeries(pnt) pnt
The input vector layer, the rasterization results, and the points pnt
, are shown in Figure 5.8:
# Line rasterization
= plt.subplots()
fig, ax =transform, ax=ax, cmap='Set3')
rasterio.plot.show(california_raster1, transform=ax, edgecolor='darkgrey', linewidth=1)
gpd.GeoSeries(california_borders).plot(ax=ax, color='black', markersize=1);
pnt.plot(ax
# Polygon rasterization
= plt.subplots()
fig, ax =transform, ax=ax, cmap='Set3')
rasterio.plot.show(california_raster2, transform=ax, color='none', edgecolor='darkgrey', linewidth=1)
california.plot(ax=ax, color='black', markersize=1); pnt.plot(ax
all_touched=True
all_touched=False
5.6 Spatial vectorization
Spatial vectorization is the counterpart of rasterization (Section 5.5), but in the opposite direction. It involves converting spatially continuous raster data into spatially discrete vector data such as points, lines or polygons.
There are three standard methods to convert a raster to a vector layer, which we cover next:
- Raster to polygons (Section 5.6.1)
- Raster to points (Section 5.6.2)
- Raster to contours (Section 5.6.3)
The most straightforward form of vectorization is the first one, converting raster cells to polygons, where each pixel is represented by a rectangular polygon. The second method, raster to points, has the additional step of calculating polygon centroids. The third method, raster to contours, is somewhat unrelated. Let us demonstrate the three in the given order.
5.6.1 Raster to polygons
The rasterio.features.shapes
function can be used to access to the raster pixel as polygon geometries, as well as raster values. The returned object is a generator, which yields geometry,value
pairs. The additional transform
argument is used to yield true spatial coordinates of the polygons, which is usually what we want.
For example, the following expression returns a generator named shapes
, referring to the pixel polygons:
= rasterio.features.shapes(
shapes
src_grain.read(), =src_grain.transform
transform
) shapes
<generator object shapes at 0x7f76daa5b450>
We can generate all shapes at once, into a list
named pol
, as follows:
= list(shapes) pol
Each element in pol
is a tuple
of length 2, containing:
- The GeoJSON-like
dict
representing the polygon geometry - The value of the pixel(s) which comprise the polygon
For example:
0] pol[
({'type': 'Polygon',
'coordinates': [[(-1.5, 1.5),
(-1.5, 1.0),
(-1.0, 1.0),
(-1.0, 1.5),
(-1.5, 1.5)]]},
1.0)
Note that each raster cell is converted into a polygon consisting of five coordinates, all of which are stored in memory (explaining why rasters are often fast compared with vectors!).
To transform the list
into a GeoDataFrame
, we need few more steps of data reshaping:
# Create 'GeoSeries' with the polygons
= [shapely.geometry.shape(i[0]) for i in pol]
geom = gpd.GeoSeries(geom, crs=src_grain.crs)
geom # Create 'Series' with the values
= [i[1] for i in pol]
values = pd.Series(values)
values # Combine the 'Series' and 'GeoSeries' into a 'DataFrame'
= gpd.GeoDataFrame({'value': values, 'geometry': geom})
result result
value | geometry | |
---|---|---|
0 | 1.0 | POLYGON ((-1.50000 1.50000, -1.... |
1 | 0.0 | POLYGON ((-1.00000 1.50000, -1.... |
2 | 1.0 | POLYGON ((-0.50000 1.50000, -0.... |
... | ... | ... |
11 | 2.0 | POLYGON ((0.00000 -0.50000, 0.5... |
12 | 0.0 | POLYGON ((0.50000 -1.00000, 0.5... |
13 | 2.0 | POLYGON ((1.00000 -1.00000, 1.0... |
14 rows × 2 columns
The resulting polygon layer is shown in Figure 5.9. As shown using the edgecolor='black'
option, neighboring pixels sharing the same raster value are dissolved into larger polygons. The rasterio.features.shapes
function does not offer a way to avoid this type of dissolving. One way to work around that is to convert an array with consecutive IDs, instead of the real values, to polygons, then extract the real values from the raster (similarly to the “raster to points” example, see below).
='value', edgecolor='black', legend=True); result.plot(column
grain.tif
converted to a polygon layer5.6.2 Raster to points
To transform raster to points, we can use rasterio.features.shapes
, as in conversion to polygons, only with the addition of the .centroid
method to go from polygons to their centroids. However, to avoid dissolving nearby pixels, we will actually convert a raster with consecutive IDs, then extract the “true” values by point (it is not strictly necessary in this example, since the values of elev.tif
are all unique):
# Prepare IDs array
= src_elev.read(1)
r = r.copy()
ids = np.arange(0, r.size).reshape(r.shape).astype(np.int32)
ids ids
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]], dtype=int32)
# IDs raster to points
= rasterio.features.shapes(ids, transform=src_elev.transform)
shapes = list(shapes)
pol = [shapely.geometry.shape(i[0]).centroid for i in pol]
geom = gpd.GeoSeries(geom, crs=src_elev.crs)
geom = gpd.GeoDataFrame(geometry=geom) result
# Extract values to points
'value'] = rasterstats.point_query(
result[
result,
r, = src_elev.nodata,
nodata = src_elev.transform,
affine ='nearest'
interpolate )
/usr/local/lib/python3.11/site-packages/rasterstats/io.py:313: UserWarning: Setting nodata to -999; specify nodata explicitly
warnings.warn("Setting nodata to -999; specify nodata explicitly")
The result is shown in Figure 5.10:
# Input raster
= plt.subplots()
fig, ax ='value', legend=True, ax=ax)
result.plot(column=ax);
rasterio.plot.show(src_elev, ax
# Points
= plt.subplots()
fig, ax ='value', legend=True, ax=ax)
result.plot(column='Greys', ax=ax); rasterio.plot.show(src_elev, cmap
elev.tif
5.6.3 Raster to contours
Another common type of spatial vectorization is the creation of contour lines representing lines of continuous height or temperatures (isotherms) for example. We will use a real-world digital elevation model (DEM) because the artificial raster elev produces parallel lines (task for the reader: verify this and explain why this happens). Plotting contour lines is straightforward, using the contour=True
option of rasterio.plot.show
(Figure 5.11):
= plt.subplots()
fig, ax =ax)
rasterio.plot.show(src_dem, ax
rasterio.plot.show(
src_dem, =ax,
ax=True,
contour=np.arange(0,1200,50),
levels='black'
colors; )
Unfortunately, rasterio
does not provide any way of extracting the contour lines in the form of a vector layer, for uses other than plotting. There are two possible workarounds:
- Using
gdal_contour
on the command line (see below), or through its Python interfaceosgeo
- Writing a custom function to export contour coordinates generated by, e.g.,
matplotlib
orskimage
We hereby demonstrate the first and easiest approach, using gdal_contour
. Although we deviate from exclusively using the Python language, the benefit of gdal_contour
is the proven algorithm, customized to spatial data, and with many relevant options. gdal_contour
(along with other GDAL programs) should already be installed on your system since this is a dependency of rasterio
. For example, generating 50 \(m\) contours of the dem.tif
file can be done as follows:
'gdal_contour -a elev data/dem.tif output/dem_contour.gpkg -i 50.0') os.system(
Note that we ran the gdal_contour
command through os.system
, in order to remain in the Python environment. You can also run the standalone command in the command line interface you are using, such as the Anaconda Prompt:
gdal_contour -a elev data/dem.tif output/dem_contour.gpkg -i 50.0
Like all GDAL programs, gdal_contour
works with files. Here:
- The input is the
data/dem.tif
file - The result is exported to the
output/dem_contour.gpkg
file
To illustrate the result, let’s read the result back into the Python environment. Note that the layer contains an arrtibute named elev
(as specified using -a elev
) with the contour elevation values:
= gpd.read_file('output/dem_contour.gpkg')
contours contours
ID | elev | geometry | |
---|---|---|---|
0 | 0 | 750.0 | LINESTRING (795382.355 8935384.... |
1 | 1 | 800.0 | LINESTRING (795237.703 8935384.... |
2 | 2 | 650.0 | LINESTRING (798098.379 8935384.... |
... | ... | ... | ... |
29 | 29 | 450.0 | LINESTRING (795324.083 8931774.... |
30 | 30 | 450.0 | LINESTRING (795488.616 8931774.... |
31 | 31 | 450.0 | LINESTRING (795717.420 8931774.... |
32 rows × 3 columns
Here is a plot of the contour layer in dem_contour.gpkg
(Figure 5.12):
= plt.subplots()
fig, ax =ax)
rasterio.plot.show(src_dem, ax=ax, edgecolor='black'); contours.plot(ax
gdal_contour
program5.7 Distance to nearest geometry
Calculating a raster of distances to the nearest geometry is an example of a “global” raster operation (Section 3.4.6). To demonstrate it, suppose that we need to calculate a raster representing the distance to the nearest coast in New Zealand. This example also wraps many of the concepts introduced in this chapter and in previous chapter, such as raster aggregation, raster conversion to points, and rasterizing points.
For the coastline, we will dissolve the New Zealand administrative division polygon layer and “extract” the boundary (which is a "MultiLineString"
geometry):
= gpd.GeoSeries(nz.unary_union, crs=nz.crs) \
coastline \
.to_crs(src_nz_elev.crs)
.boundary coastline
0 MULTILINESTRING ((1229997.901 4...
dtype: geometry
For a “template” raster, we will aggregate the New Zealand DEM, in the nz_elev.tif
file, to 5 times coarser resolution. The code section below follows the aggeregation example in Section 4.4.3, then replaces the original (aggregated) values with unique IDs, which is a preliminary step when converting to points, as explained in Section 5.6.2. Finally, we also replace “erase” (i.e., replace with np.nan
) IDs which were np.nan
in the aggregated elevation raster, i.e., beyond the land area of New Zealand:
= 0.2
factor # Reading aggregated array
= src_nz_elev.read(1,
r =(
out_shapeint(src_nz_elev.height * factor),
int(src_nz_elev.width * factor)
),=rasterio.enums.Resampling.average
resampling
)# Updating the transform
= src_nz_elev.transform * src_nz_elev.transform.scale(
new_transform / r.shape[1]),
(src_nz_elev.width / r.shape[0])
(src_nz_elev.height
)# Generating unique IDs per cell
= r.copy()
ids = np.arange(0, r.size).reshape(r.shape).astype(np.float32)
ids # "Erasing" irrelevant IDs
= np.nan
ids[np.isnan(r)] ids
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
The result is an array named ids
with the IDs, and the corresponding new_transform
, as plotted in Figure 5.13:
= plt.subplots()
fig, ax =new_transform, ax=ax)
rasterio.plot.show(ids, transform=ax, edgecolor='black'); gpd.GeoSeries(coastline).plot(ax
To calculate distances, we must convert each pixel to a vector (point) geometry. We use the technique demonstrated in Section 5.6.2:
= rasterio.features.shapes(ids, transform=new_transform)
shapes = list(shapes)
pol = [shapely.geometry.shape(i[0]).centroid for i in pol] pnt
The result pnt
is a list
of shapely
geometries, representing raster cell centroids (excluding np.nan
pixels):
print(pnt[0])
POINT (1572956.546197626 6189460.927303582)
Next we calculate the correspinding list
of distances, using the .distance
method from shapely
:
= [(i, i.distance(coastline)) for i in pnt]
distances 0] distances[
(<POINT (1572956.546 6189460.927)>,
0 826.752396
dtype: float64)
Finally, we rasterize (see Section 5.5.1) the distances into our raster template:
= rasterio.features.rasterize(
image
distances,=ids.shape,
out_shape=np.float_,
dtype=new_transform,
transform=np.nan
fill
) image
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]])
The resulting raster of distances is shown in Figure 5.14:
= plt.subplots()
fig, ax =new_transform, ax=ax)
rasterio.plot.show(image, transform=ax, edgecolor='black'); gpd.GeoSeries(coastline).plot(ax