R for GIS
BayGeo, Spring 2024

Working with Text

Character Data

Combine vs. Concatenation

Combine means building a vector. For character (string) vectors, we generally use the c() function.

x <- c("I", "have", "a", "dream")
x
## [1] "I"     "have"  "a"     "dream"

Concatenate means joining strings end to end. In spreadsheets and many programming languages, you can concatenate strings with the & or + operators (e.g., "I " & "have " & "a " & "dream" in a spreadsheet returns "I have a dream").

R is different. To concatenate character objects in R, the usual tool is paste().

paste("I", "have", "a", "dream")
## [1] "I have a dream"

You can customize the paste() function by specifying the separator character.

paste("I", "have", "a", "dream", sep="-")
## [1] "I-have-a-dream"
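If you don't want any separator at all, base R also provides paste0(), which behaves like paste() with sep = "". The short sketch below shows both forms side by side; the file name pieces are made up for illustration.

```r
# paste0() joins its arguments with no separator
# (the file name parts here are hypothetical)
file_name <- paste0("parcel_", "042", ".shp")
file_name
## [1] "parcel_042.shp"

# The same result using paste() with an empty separator
same_name <- paste("parcel_", "042", ".shp", sep = "")
same_name
## [1] "parcel_042.shp"
```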

Note that in the paste() expression above, the strings are passed as separate arguments, not as elements of a single vector. See what happens when you pass a character vector instead.

paste(c("I", "have", "a", "dream"))
## [1] "I"     "have"  "a"     "dream"

paste() returned the vector unchanged because there was nothing to pair each element with. To concatenate the elements of a character vector into a single string, use the collapse argument.

paste(c("I", "have", "a", "dream"), collapse="--")
## [1] "I--have--a--dream"
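One way to see what collapse does is to compare the length of the result with and without it. The sketch below uses a made-up vector of column names (words is not from the examples above) and builds a comma-separated header line from it.

```r
# A hypothetical vector of column names
words <- c("lat", "lon", "elev")

# Without collapse, you get back a vector with one string per element
length(paste(words))
## [1] 3

# With collapse, the elements are joined into a single string
header <- paste(words, collapse = ",")
header
## [1] "lat,lon,elev"
length(header)
## [1] 1
```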

paste() is a vectorized function: if you pass it two or more character vectors, it concatenates the corresponding elements.

x <- c("hot", "cold", "stale")
y <- c("soup", "sandwich", "donut")
paste(x, y)
## [1] "hot soup"      "cold sandwich" "stale donut"
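Vectorization also works when the arguments have different lengths: R recycles the shorter one to match the longer one. This is a handy way to generate a series of labels. The "site" prefix below is just an illustrative choice.

```r
# The single string "site" is recycled and paired with each number in 1:3
site_ids <- paste("site", 1:3, sep = "_")
site_ids
## [1] "site_1" "site_2" "site_3"
```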

Factors

Factors are a compact way to store character vectors, particularly when there are many duplicate values. Under the hood, R creates a lookup table of the unique values (called levels) and stores each element as an integer code that points to its level.

animals_vec <- c("dog", "mouse", "horse")
animals_factor <- as.factor(animals_vec)
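To peek at the lookup table and the integer codes, you can use levels() and as.integer(). The sketch below uses its own small vector with repeated values (pets is a made-up name) so the duplicates are visible. By default, as.factor() sorts the levels alphabetically.

```r
# A hypothetical vector with duplicate values
pets <- c("dog", "mouse", "horse", "dog", "dog")
pets_factor <- as.factor(pets)

# The lookup table of unique values
levels(pets_factor)
## [1] "dog"   "horse" "mouse"

# The integer code stored for each element (an index into the levels)
as.integer(pets_factor)
## [1] 1 3 2 1 1
```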

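You can usually work with factors the same way you would work with character vectors. One exception is when the character values are actually numeric strings: calling as.numeric() on a factor returns the internal integer codes, not the numbers the labels spell out.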

nums_str <- c("34", "47", "99")
as.numeric(nums_str)
## [1] 34 47 99
nums_fact <- as.factor(nums_str)
as.numeric(nums_fact)
## [1] 1 2 3
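A common workaround is to convert the factor back to character first, and then to numeric. A minimal sketch, repeating the factor from above so it runs on its own:

```r
nums_fact <- as.factor(c("34", "47", "99"))

# Convert factor -> character -> numeric to recover the original numbers
nums_fixed <- as.numeric(as.character(nums_fact))
nums_fixed
## [1] 34 47 99
```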

When importing a CSV in R versions before 4.0.0, read.csv() silently converted character columns to factors by default. Since R 4.0.0 the default is to leave them as character. Either way, you can control this behavior explicitly with the stringsAsFactors argument. Even better, always check and if needed change the data types of columns after they've been brought into R.

flowers_df <- read.csv("flower_data.csv", stringsAsFactors = FALSE)
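Here is one way you might check and change a column type after import. So that the sketch runs on its own, it writes a tiny made-up CSV to a temporary file rather than reading flower_data.csv; the column names and values are assumptions for illustration only.

```r
# Write a small, made-up CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("species,petal_count", "poppy,4", "lupine,5", "poppy,4"), csv_path)

flowers_df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Check the data type of a column after import
class(flowers_df$species)
## [1] "character"

# Convert to a factor only where you actually want one
flowers_df$species <- as.factor(flowers_df$species)
nlevels(flowers_df$species)
## [1] 2

# Clean up the temporary file
unlink(csv_path)
```

str(flowers_df) gives a quick overview of all column types at once.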

Other Text Manipulations

Other things you might want to do with text:

All of these are definitely possible!


More Info

Working with character data in base R can be clunky. For a richer set of functions, see the stringr package.


