In this tutorial you will learn how to
- define geometries (points, lines, polygons)
- plot those geometries
- execute spatial joins (which points are contained in a polygon?)
- get the distance between a set of points
- do all of the above within the context of geospatial data (e.g. cities, roads, counties)
To get started we need to learn how to define and operate on abstract geometries without a coordinate reference system (CRS). In other words, we need to learn how to deal with points, lines and polygons on your vanilla XY plane as opposed to longitude and latitude coordinates on a globe. To do this, we will make heavy use of the Simple Feature (sf) package from Edzer Pebesma.
Simple Feature Geometries (sfg)
We can define a point with the coordinates (x,y) = (1,1) as follows
pt1 <- st_point(c(1,1)) pt1 # POINT(1 1)
We could also include a Z coordinate by doing
st_point(c(1,1,1)) # POINTZ(1 1 1)
and/or an M coordinate by doing
st_point(c(1,1,1,20)) # POINTZM(1 1 1 20) st_point(c(1,1,20), dim = "XYM") # POINTM(1 1 20)
Here X, Y, and Z represent spatial coordinates while M represents an associated measurement such as time. Inspecting the class of pt1 we see
class(pt1) # "XY" "POINT" "sfg"
Our point object inherits from multiple classes – “XY”, “POINT”, and “sfg”. Here sfg stands for Simple Feature Geometry. We can build other Simple Feature Geometries such as
line1 <- st_linestring(x=rbind(c(4,1), c(5,2), c(6,2))) line1 class(line1) # "XY" "LINESTRING" "sfg" plot(line1)
poly1 <- st_polygon(x=list(rbind(c(1,1), c(3,1), c(3,2), c(1,2), c(.5,.5), c(1,1)))) poly1 class(poly1) # "XY" "POLYGON" "sfg" plot(poly1)
POLYGON (with holes)
poly2 <- st_polygon(x=list( rbind(c(1,3), c(3,3), c(3,4), c(1,4), c(.5,2.5), c(1,3)), # outer polygon rbind(c(1.5, 3.5), c(1.7, 3.5), c(1.7, 3.7), c(1.5, 3.7), c(1.5, 3.5)), # hole1 rbind(c(2.5, 3.5), c(2.7, 3.5), c(2.7, 3.7), c(2.5, 3.5)) # hole2 )) poly2 class(poly2) # "XY" "POLYGON" "sfg" plot(poly2)
You can also define a Simple Feature Geometry that includes multiple points, linestrings, polygons, or a mixture of all three.
multipt <- st_multipoint(rbind(c(1,1), c(2,2), c(1.5,1.5), c(2,1))) multipt class(multipt) # "XY" "MULTIPOINT" "sfg" plot(multipt)
GEOMETRYCOLLECTION (a point, line, and polygon)
geomcoll <- st_geometrycollection(x = list(pt1, line1, poly2)) class(geomcoll) # "XY" "GEOMETRYCOLLECTION" "sfg" plot(geomcoll)
Notice that printing the objects returns a description of its coordinates. This textual description is called Well Known Text (WKT).
We can do lots of cool things with geometries like
create new point which is pt1 shifted one unit up
pt1 + c(0, 1)
get the area of polygon1
determine if pt1 lies inside poly2
st_within(pt1, poly1, sparse = F)
determine the distance between two points
Simple Feature List-Column (sfc)
Suppose we shoot bullets at a target. We’ll want to analyze where each bullet hit the target. For this we can use a Simple Feature List-Column (sfc). For example
shot1 <- st_point(c(0.4, 0.3)) shot2 <- st_point(c(0.8, 0.1)) shot3 <- st_point(c(0.6, 0.55)) shotsSFC <- st_sfc(shot1, shot2, shot3) shotsSFC class(shotsSFC) plot(shotsSFC)
Now we can do cool stuff like
get the distance between every pair of bullets
determine the minimum bounding box that contains all points
You might be wondering why we created a Simple Feature List-Column instead of a MULTIPOINT object. The reason is because MULTIPOINT is meant to represent one “thing” which happens to have multiple points. In this case, we can consider each shot to be a separate “thing”, so it make sense to represent them as multiple POINT objects within a Simple Feature List-Column. To make this more clear, suppose we use a shotgun (which sprays many beads for each shot) instead of a rifle (one bead per shot). Here we’d represent each shot as a MULTIPOINT, but still organize the collection of shots together as a Simple Feature List-Column.
shot1 <- st_multipoint(rbind(c(0.4, 0.3), c(0.45, 0.32), c(0.42, 0.35))) shot2 <- st_multipoint(rbind(c(0.8, 0.1))) shot3 <- st_multipoint(rbind(c(0.93, 0.55), c(0.97, 0.52), c(0.95, 0.48), c(1.05, 0.29))) shotsSFC <- st_sfc(shot1, shot2, shot3) shotsSFC class(shotsSFC) plot(shotsSFC)
Simple Feature (sf)
The next layer up from Simple Feature List-Column is the Simple Feature class. Consider our example of shooting a gun at target. With each shot there is associated data we’d like to retain. For example, the shot number and the type of gun used. ShotNumber and GunType are the features associated with the points representing the bullet locations. This concept of having a geometry associated with a set of features is the underlying concept behind the Simple Feature class type.
shotsDF <- data.frame( ShotNumber = c(1, 2, 3, 4, 5), GunType = c("rifle", "rifle", "shotgun", "shotgun", "shotgun") ) shot1 <- st_point(c(0.2, 0.75)) shot2 <- st_point(c(1.1, 0.9)) shot3 <- st_multipoint(rbind(c(0.4, 0.3), c(0.45, 0.32), c(0.42, 0.35))) shot4 <- st_multipoint(rbind(c(0.8, 0.1))) shot5 <- st_multipoint(rbind(c(0.93, 0.55), c(0.97, 0.52), c(0.95, 0.48), c(1.05, 0.48))) shotsSFG <- st_sfc(shot1, shot2, shot3, shot4, shot5) shotsSF <- st_sf(shotsDF, shotsSFG) shotsSF class(shotsSF) plot(shotsSF)
Notice that when we print shotsSF it looks much like a data.frame. In fact, class(shotsSF) returns “sf”, “data.frame” meaning shotsSF is an extension of a data.frame with additional properties defined for Simple Feature objects. The upshot of this is that we can use it much like we use a normal data.frame
subset the rows
Simple Feature is the class type you will find yourself using most frequently when working with spatial data. A major benefit to this is that ggplot2 has a native geom_sf() method for plotting your Simple Feature objects. For example
mypoints <- data.frame(x = c(1,2,3), y = c(1,2,3), foo = c("a", "b", "a")) mypoints <- st_as_sf(mypoints, coords = c("x", "y")) ggplot()+geom_sf(data = mypoints, aes(color = foo))+labs(title = "My simple plot")
(Note that geom_sf() is brand new, so it still has some bugs which are being worked out.)
Here we define a square target to illustrate some of the cool things you can do with sf objects.
target <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0)))) plot(target) plot(shotsSF, add=T, col=shotsSF$ShotNumber)
Which shots hit the target at least a little bit?
st_intersects(target, shotsSF, sparse = F)
Which shots landed entirely within the target?
st_contains(target, shotsSF, sparse = F)
For each shot, how close was the center of the target to the nearest bead?
For each shot, how close was the center of the target to the center of the beads?
Data with a Coordinate Reference System (CRS)
For geospatial data, we can mimmick the concepts above but we’ll need to include something extra – a Coordinate Reference System (CRS). The CRS will inform sf to treat x and y coordinates as longitude and latitude, and help with operations like plotting and measuring the distance between points. CRS is a deep, technical topic and I won’t pretend to be an expert on it. If you’re interested in understanding it better, I’ll defer you to the Wikipedia article for Spatial Reference Systems.
Geospatial data is typically stored as a “shape file” which is really a collection of files, each with the same prefix but with differing file extensions which store different data elements. As an example, the sf package comes with a shape file representing the counties in North Carolina. You can see them by navigating to
system.file("shape", package="sf") # For me, /Library/Frameworks/R.framework/Versions/3.4/Resources/library/sf/shape list.files(system.file("shape", package="sf"))
To load a shape file, we can do
nc <- st_read(system.file("shape/nc.shp", package="sf")) nc
Notice that the resulting object is a Simple Feature where each row in the data.frame represents a county, and the geometry used to represent counties is a MULTIPOLYGON. This is because some counties in NC have disjoint sections broken apart by water.
Heres a plot of the data
Also take note of the CRS. It’s signified by the SRID 4267.
Suppose we want to plot a few cities on the map. First we can build a table of some major cities in NC
cities <- data.frame( City = c("Charlotte", "Raleigh", "Greensboro"), Population = c(842051, 458880, 287027), lon = c(-80.8431, -78.6382, -79.7920), lat = c(35.2271, 35.7796, 36.0726) )
Convert to sf object
citiesSF <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4267)
Plot North Carolina counties with major cities overlaid
ggplot()+ geom_sf(data = nc)+ geom_sf(data = citiesSF, aes(size = Population), color = "red")
For more details, be sure to check out the sf vignettes, starting with this one.