R for GIS
BayGeo, Spring 2024

Common Manipulations for sf Objects



Simple Features

simple features is an ISO standard for storing and accessing spatial data. It is widely adopted in spatial databases, open source formats like GeoJSON, GIS software, tools, etc.

sf employs a standard called “Well Known Text” (WKT) to encode the geometry. WKT is a very simple written representation of the ‘connect-the-dots’ vector data model.

Geometry Sample WKT
point POINT (2 4)
multipoint MULTIPOINT (2 2, 3 3, 3 2)
linestring LINESTRING (0 3, 1 4, 2 3)
polygon POLYGON ((1 0, 3 4, 5 1, 1 0))

Import the Yosemite points-of-interest Shapefile.

library(sf)

## Import the Yosemite Points-of-Interest Shapefile
yose_poi_utm <- st_read(dsn="./data", layer="yose_poi")

## View column names in the attribute table
names(yose_poi_utm)

## Plot the points
plot(yose_poi_utm |> st_geometry(), pch=16, asp=1, cex=0.5, axes=TRUE)
## Reading layer `yose_poi' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_bgs24\exercises\data' using driver `ESRI Shapefile'
## Simple feature collection with 2720 features and 30 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 246416.2 ymin: 4153717 xmax: 301510.7 ymax: 4208419
## Projected CRS: NAD83 / UTM zone 11N
##  [1] "OBJECTID"   "POINAME"    "POIALTNAME" "POILABEL"   "POIFEATTYP" "POITYPE"    "UNITCODE"   "UNITNAME"   "GROUPCODE"  "REGIONCODE" "ISEXTANT"   "MAPMETHOD" 
## [13] "MAPSOURCE"  "SRCESCALE"  "SOURCEDATE" "XYERROR"    "LOCATIONID" "ASSETID"    "NOTES"      "DISTRIBUTE" "RESTRICTIO" "CREATEDATE" "CREATEUSER" "EDITDATE"  
## [25] "EDITUSER"   "FEATUREID"  "GEOMETRYID" "PLACESID"   "GlobalID"   "TAGS"       "geometry"

Take a closer look at the geometry column:

class(yose_poi_utm$geometry)
## [1] "sfc_POINT" "sfc"
head(yose_poi_utm[, "geometry"])
## Simple feature collection with 6 features and 0 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 260859.4 ymin: 4153717 xmax: 270931.1 ymax: 4179771
## Projected CRS: NAD83 / UTM zone 11N
##                   geometry
## 1 POINT (260859.4 4178493)
## 2 POINT (264115.4 4158415)
## 3 POINT (268276.8 4153717)
## 4 POINT (268620.1 4178344)
## 5 POINT (269052.6 4178870)
## 6 POINT (270931.1 4179771)

The geometry column of a sf object does not have to be called ‘geometry’. You can view the column name by running attr(x, "sf_column") where x is a sf object.

The features in a sf object can be a mix of points, lines, and polyons. The data class of the geometry column is therefore sfc (simple features collection).

Other properties of sf objects

Bounding box

It’s often useful to view the metadata of a sf object. You can get the bounding box (aka extent) with st_bbox().

st_bbox(yose_poi_utm)
##      xmin      ymin      xmax      ymax 
##  246416.2 4153717.3  301510.7 4208419.0

Coordinates

To extract the coordinates of a sf object as a matrix, use st_coordinates().

x <- st_coordinates(yose_poi_utm)
head(x)
##             X       Y
## [1,] 260859.4 4178493
## [2,] 264115.4 4158415
## [3,] 268276.8 4153717
## [4,] 268620.1 4178344
## [5,] 269052.6 4178870
## [6,] 270931.1 4179771

Projections

Projections are particularly important whenever you want to:

The more generic term for projections as is Coordinate Reference System (CRS).

CRS also includes unprojected geographic coordinates (longitude & latitude).

A CRS has three parts:

  1. a mathematical representation of the Earth (datum)
  2. a 2D to 3D transformation model (projected CRS’s)
  3. an origin

CRS functions

All sf objects are able to store projection info.

st_read() imports the CRS info from standard GIS file formats

st_crs() creates, views, or assigns the CRS

st_transform() (re)projects sf objects into a different crs

Using Established CRSs

The sf package uses the PROJ.4 library for established CRS. It provides two ways to specify a known projection:

EPSG codes were developed by the European Petroluem Survey Group.

The following are equivalent:

st_crs("+init=epsg:3857")
st_crs(3857)

If you happen to know the proj4string, you can use it also

st_crs("+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
## Coordinate Reference System:
##   User input: +proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=m +no_defs 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6269]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["UTM zone 10N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-123,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]],
##         ID["EPSG",16010]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

Common EPSG Numbers

Geographic CRSs

4326. Geographic, WGS84 (default for lon/lat)
4269. Geographic, NAD83 (USA Fed agencies like Census)

Projected CRSs

5070. USA Contiguous Albers Equal Area Conic
3310. CA ALbers Equal Area
32610. UTM Zone 10, WGS84 (Northern Cal)
32611. UTM Zone 11, WGS84 (Southern Cal)
3857. Web Mercator (web maps)

The one-stop shop for finding proj4 strings and epsg numbers is http://www.spatialreference.org. (Google usually works also).

Reference code for your convenience:

## Some common epsg numbers in California
epsg_geo_wgs84 <- 4326
epsg_geo_nad83 <- 4269
epsg_utm10n_wgs84 <- 32610
epsg_utm11n_wgs84 <- 32611
epsg_utm10n_nad83 <- 26910
epsg_utm11n_nad83 <- 26911
epsg_webmerc <- 3857
epsg_caalbers <- 3310

To view all EPSG codes (>5,500), run rgdal::make_EPSG()

See also Projection and Datum Guidelines for California (CDFW)

View or Assign the CRS

View the CRS

yose_trails <- sf::st_read("./data/yose_trails.gdb", layer="Trails")
## Reading layer `Trails' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_bgs24\exercises\data\yose_trails.gdb' using driver `OpenFileGDB'
## Simple feature collection with 1074 features and 13 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 245134 ymin: 4153668 xmax: 323239.7 ymax: 4250703
## Projected CRS: NAD83 / UTM zone 11N
st_crs(yose_trails)
## Coordinate Reference System:
##   User input: NAD83 / UTM zone 11N 
##   wkt:
## PROJCRS["NAD83 / UTM zone 11N",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["UTM zone 11N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-117,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["North America - between 120°W and 114°W - onshore and offshore. Canada - Alberta; British Columbia; Northwest Territories; Nunavut. United States (USA) - California; Idaho; Nevada, Oregon; Washington."],
##         BBOX[30.88,-120,83.5,-114]],
##     ID["EPSG",26911]]

Set the CRS

Occasionally, you have a sf object that doesn’t have a CRS recorded. This is most common when you are importing a CSV file and are turning it into a sf object. In thse cases, you can use st_crs() to assign the CRS.

st_crs(my_gps_pts) <- 4326

Note: this is not the same as projecting data. You’re simply telling it the CRS.

Projecting Data

You can project data from one CRS to another with:

st_transform(sf_object, new_crs)

Let’s ‘unproject’ the Yosemite trails layer from UTM to geographic coordinates.

## (Un)project the trails layer to geographic coordinates
yose_trails_ll <- st_transform(yose_trails, 4269)

Now that the trails are in geographic coordinates, we can overlay them on the boundary and historical points.

yose_hp_ll <- st_read("./data/yose_historic_pts.kml", layer="yose_historic_places", quiet=TRUE)

## Plot the trails, then overlay the historic points and park boundary
plot(yose_trails_ll |> st_geometry(), asp=1, col="bisque3", axes=T, main="Yosemite National Park")
plot(yose_hp_ll |> st_geometry(), col="gray30", pch=16, add=TRUE)
plot(yose_bnd_ll |> st_geometry(), col=NA, border="chartreuse4", lwd=3, add=TRUE)

You can project one layer to match another without knowing the exact CRS. Just extract the CRS of the layer you’re trying to match with st_crs(), and use that as the second argument in st_transform().

yose_trails_ll <- st_transform(yose_trails, st_crs(yose_bnd_ll))

Import the Yosemite roads and add those to your map. You may use the code below to get started.

## Get the path to the geodatabase 
gdbroads_fn <- "./data/yose_roads.gdb"
file.exists(gdbroads_fn)

## Import the roads layer
yose_roads_utm <- st_read(gdbroads_fn, layer="Yosemite_Roads")

[Solution]

## (Un)project the roads layer to geographic coordinates
yose_roads_ll <- st_transform(yose_roads_utm, 4269)

## Plot all four layers
plot(yose_trails_ll |> st_geometry(), asp=1, col="bisque3", axes=T, main="Yosemite")
plot(yose_roads_ll |> st_geometry(), col="navyblue", lwd=1, lty=5, add=TRUE)
plot(yose_hp_ll |> st_geometry(), col="gray30", pch=16, add=TRUE)
plot(yose_bnd_ll |> st_geometry(), col=NA, border="chartreuse4", lwd=3, add=TRUE)
## [1] TRUE

Convert Plain Data Frames to sf Data Frames

To turn a data frame into a sf object, use

st_as_sf(df, coords, crs)

where:

Example: Turn the quakes Data Frame into a sf Object

quakes is a dataframe that comes with R (in the datasets package). Here we turn it into a sf data frame.

## Make a sf object from the built-in 'quakes' dataset
head(quakes)
quakes_sf <- st_as_sf(quakes, coords=c("long", "lat"), crs=4326)
plot(quakes_sf %>% st_geometry(), axes = TRUE, asp = 1)
##      lat   long depth mag stations
## 1 -20.42 181.62   562 4.8       41
## 2 -20.62 181.03   650 4.2       15
## 3 -26.00 184.10    42 5.4       43
## 4 -17.97 181.66   626 4.1       19
## 5 -20.42 181.96   649 4.0       11
## 6 -19.68 184.31   195 4.0       12

If you don’t know the datum for geographic coordinates, use WGS84 (epsg 4326).

Exporting sf objects to disk

You can export sf objects as Shapefiles and other formats using st_write()

If you name the output file with a standard extension (e.g., .shp, .kml), st_write() will know which format to use. Otherwise you have to add the driver argument.

## For testing purposes, pull out fires from the 2000s
yose_fires20_pt <- yose_fires_pt |> filter(DECADE == 2000)

## Export to Shapefile
st_write(yose_fires20_pt, 
         dsn="yose_fires20s_pt.shp")

## Export to GeoPackage
st_write(yose_fires20_pt, 
         dsn="~yose_fires.gpkg", 
         layer="yose_fires20s")

By default, st_write() will throw an error if the output file already exists. To overwrite existing data (not recommended), add delete_dsn = TRUE.

Notebook Time!

nb_sf-manip.Rmd

Convert a CSV file to a simple feature data frame, project it, and export as GeoJSON

preview notebook | answer key

Repairing Geometries

Function Fix
tidyr::drop_na(geometry) drop rows where there geometry column contains NA
sf::st_zm() Drop Z and/or M dimensions from feature geometries
sf::st_is_valid() check if features have geometry problems
sf::st_make_valid() repair common geometry problems

Converting sf ↔︎ sp

sf is relatively new, and many R packages still use the older data classes from the sp package.

Fortunately it’s relatively easy to convert back and forth:

sp → sf
st_as_sf()

sf → sp
as(x, “Spatial”)

Example: sp → sf

## Get a sample sp object from the maptools package
library(maptools)
## Please note that 'maptools' will be retired during October 2023,
## plan transition at your earliest convenience (see
## https://r-spatial.org/r/2023/05/15/evolution4.html and earlier blogs
## for guidance);some functionality will be moved to 'sp'.
##  Checking rgeos availability: TRUE
data(wrld_simpl)
class(wrld_simpl)

## Convert sp to sf
wrld_simpl_sf <- sf::st_as_sf(wrld_simpl)
class(wrld_simpl_sf)

## Plot the sf object
plot(wrld_simpl_sf["REGION"])
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
## [1] "sf"         "data.frame"

See also:

Summary

Today we saw how to:

Additional Resources



Next: Visualization with tmap and leaflet