Ch. 14: Strings
Key questions:
- 14.2.5. #3, 6
- 14.3.2.1. #2
- 14.3.3.1 #1, #2
- writeLines: see raw contents of a string (prints each string in a vector on a new line)
- str_length: number of characters in a string
- str_c: combine two or more strings
  - use collapse arg to make a vector of strings into a single string
- str_replace_na: print NA as "NA"
- str_sub: start and end args specify the positions to extract (or replace); negative numbers count from the back
- str_to_lower, str_to_upper, str_to_title: for changing string case
  - locale arg (to handle slight differences in characters)
- str_order, str_sort: more robust versions of order and sort which allow a locale argument
- str_view, str_view_all: show how a character vector and a regular expression match
- \d: matches any digit.
- \s: matches any whitespace (e.g. space, tab, newline).
- [abc]: matches a, b, or c.
- [^abc]: matches anything except a, b, or c.
- {n}: exactly n
- {n,}: n or more
- {,m}: at most m
- {n,m}: between n and m
- str_detect: returns logical vector of TRUE/FALSE values
- str_subset: subset of TRUE values from str_detect
- str_count: number of matches in a string
- str_extract: extract actual text of a match
- str_extract_all: returns list with all matches
  - simplify = TRUE returns a matrix
- str_match: similar to str_extract but gives each individual component of the match in a matrix, rather than a character vector (there is also a str_match_all)
- tidyr::extract: like str_match, but you name the matches, which are moved into new columns
- str_replace, str_replace_all: replace matches with new strings
- str_split: split a string into pieces at each match of a pattern (returns list)
  - simplify = TRUE again will return a matrix
- boundary: use to specify level of split, e.g. str_view_all(x, boundary("word"))
- str_locate, str_locate_all: give starting and ending positions of each match
- regex: use in match to specify more options, e.g. str_view(bananas, regex("banana", ignore_case = TRUE))
  - multiline = TRUE allows ^ and $ to match start and end of each line (rather than of string)
  - comments = TRUE allows you to add comments on a complex regular expression
  - dotall = TRUE allows . to match everything, including \n
- fixed, coll: related alternatives to regex
- apropos: searches all objects available from global environment (e.g. say you can't remember function name)
- dir: lists all files in a directory
  - pattern arg takes a regex
- stringi: more comprehensive package than stringr (~5x as many funs)
14.2: String basics
Use writeLines to show what the string 'This string has a \n new line' looks like printed.
string_exp <- 'This string has a \n new line'
print(string_exp)
## [1] "This string has a \n new line"
writeLines(string_exp)
## This string has a
## new line
To see the full list of special characters:
?'"'
Objects of length 0 are silently dropped. This is particularly useful in conjunction with if:
name <- "Bryan"
time_of_day <- "morning"
birthday <- FALSE
str_c(
  "Good ", time_of_day, " ", name,
  if (birthday) " and HAPPY BIRTHDAY",
  "."
)
## [1] "Good morning Bryan."
Collapse vectors into single string
str_c(c("x", "y", "z"), c("a", "b", "c"), collapse = ", ")
## [1] "xa, yb, zc"
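The difference between sep and collapse is easy to mix up, so here is a minimal sketch that shows each on its own. The variable names (letters_a, letters_b, paired, joined) are only illustrative.

```r
library(stringr)

letters_a <- c("x", "y", "z")
letters_b <- c("a", "b", "c")

# sep is placed between the paired elements; the result still has length 3
paired <- str_c(letters_a, letters_b, sep = "-")
paired
# "x-a" "y-b" "z-c"

# collapse then joins the elements of that result into one string
joined <- str_c(letters_a, letters_b, sep = "-", collapse = ", ")
joined
# "x-a, y-b, z-c"
```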
Can use assignment form of str_sub()
x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
## [1] "apple" "banana" "pear"
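As a small sketch of the negative-index behaviour of str_sub(), the example below counts positions from the end of each string and also uses the assignment form. The names fruits, last_two and shout are made up for illustration.

```r
library(stringr)

fruits <- c("Apple", "Banana", "Pear")

# Negative positions count backwards from the end of each string
last_two <- str_sub(fruits, -2, -1)
last_two
# "le" "na" "ar"

# The assignment form works with negative positions as well
shout <- fruits
str_sub(shout, -1, -1) <- str_to_upper(str_sub(shout, -1, -1))
shout
# "ApplE" "BananA" "PeaR"
```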
str_pad looks interesting
str_pad("the dogs come for you.", width = 40, pad = ",", side = "both") #must specify width =, side = default is left
## [1] ",,,,,,,,,the dogs come for you.,,,,,,,,,"
14.2.5
In code that doesn’t use stringr, you’ll often see
paste()andpaste0(). What’s the difference between the two functions?paste0()has nosepargument and just appends any value provided like another string vector.- They differ from
str_c()in that they automatically convertNAvalues to character.
paste("a", "b", "c", c("x", "y"), sep = "-")## [1] "a-b-c-x" "a-b-c-y"paste0("a", "b", "c", c("x", "y"), sep = "-")## [1] "abcx-" "abcy-"What
stringrfunction are they equivalent to?paste()andpaste0()are similar tostr_c()though are different in how they handle NAs (see below). They also will return a warning when recycling vectors whose legth do not have a common factor.paste(c("a", "b", "x"), c("x", "y"), sep = "-")## [1] "a-x" "b-y" "x-x"str_c(c("a", "b", "x"), c("x", "y"), sep = "-")## Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE): ## longer object length is not a multiple of shorter object length## [1] "a-x" "b-y" "x-x"How do the functions differ in their handling of
NA?paste(c("a", "b"), c(NA, "y"), sep = "-")## [1] "a-NA" "b-y"str_c(c("a", "b"), c(NA, "y"), sep = "-")## [1] NA "b-y"In your own words, describe the difference between the
sepandcollapsearguments tostr_c().sepputs characters between items within a vector, collapse puts a character between vectors being collapsedUse
str_length()andstr_sub()to extract the middle character from a string.x <- "world" str_sub(x, start = ceiling(str_length(x) / 2), end = ceiling(str_length(x) / 2))## [1] "r"What will you do if the string has an even number of characters?
In this circumstance the above solution would take the anterior middle value, below is a solution that would return both middle values.
x <- "worlds" str_sub(x, ceiling(str_length(x) / 2 + 1), start = ceiling(str_length(x) / 2 + 1))## [1] "l"str_sub(x, start = ifelse(str_length(x) %% 2 == 0, floor(str_length(x) / 2), ceiling(str_length(x) / 2 )), end = floor(str_length(x) / 2) + 1)## [1] "rl"What does
str_wrap()do? When might you want to use it?- Use
indentfor first line,exdentfor others
- could use
str_wrap()for editing of documents etc., settingwidth = 1will give each word its own line
str_wrap("Tonight, we dine in Hell.", width = 10, indent = 0, exdent = 3) %>% writeLines()## Tonight, ## we dine in ## Hell.- Use
What does
str_trim()do? What’s the opposite ofstr_trim()? Removes whitespace from beginning and end of character,sideargument specifies which sidestr_trim(" so much white space ", side = "right") # (default is 'both')## [1] " so much white space"Write a function that turns (e.g.) a vector
c("a", "b", "c")into the stringa, b, and c. Think carefully about what it should do if given a vector of length 0, 1, or 2.vec_to_string <- function(x) { #If 1 or 0 length vector if (length(x) < 2) return(x) comma <- ifelse(length(x) > 2, ", ", " ") b <- str_c(x, collapse = comma) #replace ',' with 'and' in last str_sub(b,-(str_length(x)[length(x)] + 1), -(str_length(x)[length(x)] + 1)) <- " and " return(b) } x <- c("a", "b", "c", "d") vec_to_string(x)## [1] "a, b, c, and d"
14.3: Matching patterns w/ regex
x <- c("apple", "banana", "pear")
str_view(x, "an")
To match a literal \ need \\\\ because both string and regex will escape it.
x <- "a\\b"
writeLines(x)
## a\b
str_view(x,"\\\\")
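One way to keep the two escaping layers apart is to print the pattern with writeLines() and see what the regex engine actually receives. This is only a sketch; fixed() is shown as an alternative that skips the regex layer.

```r
library(stringr)

pattern <- "\\\\"          # four backslashes typed in the source
writeLines(pattern)        # prints \\ : the regex for one literal backslash
n_chars <- str_length(pattern)
n_chars
# 2

x <- "a\\b"                # the string a\b
regex_hit <- str_detect(x, pattern)
fixed_hit <- str_detect(x, fixed("\\"))  # fixed() needs only the string-level escape
c(regex_hit, fixed_hit)
# TRUE TRUE
```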
Using \b to set boundary between words (not used often)
apropos("\\bsum\\b")
## [1] "contr.sum" "sum"
apropos("^(sum)$")
## [1] "sum"
Other special characters:
- \d: matches any digit.
- \s: matches any whitespace (e.g. space, tab, newline).
- [abc]: matches a, b, or c.
- [^abc]: matches anything except a, b, or c.
Controlling number of times:
- ?: 0 or 1
- +: 1 or more
- *: 0 or more
- {n}: exactly n
- {n,}: n or more
- {,m}: at most m
- {n,m}: between n and m
By default these matches are “greedy”: they will match the longest string possible. You can make them “lazy”, matching the shortest string possible by putting a ? after them. This is an advanced feature of regular expressions, but it’s useful to know that it exists:
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"
str_view(x, 'C{2,3}')
str_view(x, 'C{2,3}?')
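Since str_view() renders in a viewer, the same greedy versus lazy comparison can be sketched with str_extract(), which returns the matched text directly. The result names are illustrative.

```r
library(stringr)

x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"

greedy <- str_extract(x, "C{2,3}")    # takes as many Cs as allowed
lazy   <- str_extract(x, "C{2,3}?")   # stops as soon as the minimum is met
c(greedy, lazy)
# "CCC" "CC"

greedy_set <- str_extract(x, "C[LX]+")
lazy_set   <- str_extract(x, "C[LX]+?")
c(greedy_set, lazy_set)
# "CLXXX" "CL"
```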
14.3.1.1
Explain why each of these strings doesn't match a \: "\", "\\", "\\\".
- "\" -> the backslash escapes the closing quote, so the string is left open
- "\\" -> resolves to a single \ in the regular expression, which is an escape with nothing after it
- "\\\" -> the third \ escapes the closing quote, so the string is left open as well

How would you match the sequence "'\?
x <- "alfred\"'\\goes"
writeLines(x)
## alfred"'\goes
str_view(x, "\\\"'\\\\")

What patterns will the regular expression \..\..\.. match?
It would match a 6 character string of the following form: "(dot)(anychar)(dot)(anychar)(dot)(anychar)"
x <- c("alf.r.e.dd.ss..lsdf.d.kj")
str_view(x, pattern = "\\..\\..\\..")

How would you represent it as a string?
x_pattern <- "\\..\\..\\.."
writeLines(x_pattern)
## \..\..\..
14.3.2.1
How would you match the literal string "$^$"?
x <- "so it goes $^$ here"
str_view(x, "\\$\\^\\$")

Given the corpus of common words in stringr::words, create regular expressions that find all words that:
- Start with "y".
str_view(stringr::words, "^y", match = TRUE)
- End with "x".
str_view(stringr::words, "x$", match = TRUE)
- Are exactly three letters long. (Don't cheat by using str_length()!)
str_view(stringr::words, "^...$", match = TRUE)
- Have seven letters or more.
str_view(stringr::words, ".......", match = TRUE)

Since this list is long, you might want to use the match argument to str_view() to show only the matching or non-matching words.
14.3.3.1
Create regular expressions to find all words that:
- Start with a vowel.
str_view(stringr::words, "^[aeiou]", match = TRUE)
- That only contain consonants. (Hint: thinking about matching "not"-vowels.)
str_view(stringr::words, "^[^aeiou]*[^aeiouy]$", match = TRUE)
- End with ed, but not with eed.
str_view(stringr::words, "[^e]ed$", match = TRUE)
- End with ing or ise.
str_view(stringr::words, "(ing|ise)$", match = TRUE)

Empirically verify the rule "i before e except after c". The pattern below looks for words that break the rule.
str_view(stringr::words, "(^(ei))|cie|[^c]ei", match = TRUE)

Is "q" always followed by a "u"?
str_view(stringr::words, "q[^u]", match = TRUE)
Of the words in the list, yes.

Write a regular expression that matches a word if it's probably written in British English, not American English.
str_view(stringr::words, "(l|b)our|parat", match = TRUE)

Create a regular expression that will match telephone numbers as commonly written in your country.
x <- c("dkl kls. klk. _", "(425) 591-6020", "her number is (581) 434-3242", "442", " dsi")
str_view(x, "\\(\\d\\d\\d\\)\\s\\d\\d\\d-\\d\\d\\d\\d")
The above is not a good way to solve this; better methods appear in the next section.
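With the repetition operators from the next section, the same telephone pattern can be written more compactly. This is a sketch for the "(ddd) ddd-dddd" format only; the name phone is illustrative.

```r
library(stringr)

x <- c("dkl kls. klk. _", "(425) 591-6020",
       "her number is (581) 434-3242", "442", " dsi")

# {n} repeats the preceding \d exactly n times
phone <- "\\(\\d{3}\\)\\s\\d{3}-\\d{4}"

has_phone <- str_detect(x, phone)
has_phone
# FALSE TRUE TRUE FALSE FALSE

numbers <- str_extract(x, phone)
numbers
# NA "(425) 591-6020" "(581) 434-3242" NA NA
```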
14.3.4.1
Describe the equivalents of ?, +, * in {m,n} form.
- ?: {0,1}
- +: {1,}
- *: {0,}

Describe in words what these regular expressions match: (read carefully to see if I'm using a regular expression or a string that defines a regular expression.)
- ^.*$: starts with anything and ends with anything, so it matches the whole string
str_view(x, "^.*$")
- "\\{.+\\}": matches curly braces with at least one character between them
x <- c("test", "some in {brackets}", "just {} no match")
str_view(x, "\\{.+\\}")
- \d{4}-\d{2}-\d{2}: 4 digits - 2 digits - 2 digits
x <- c("4444-22-22", "test", "333-4444-22")
str_view(x, "\\d{4}-\\d{2}-\\d{2}")
- "\\\\{4}": 4 backslashes
x <- c("\\\\\\\\", "\\\\\\", "\\\\", "\\")
writeLines(x)
## \\\\
## \\\
## \\
## \
str_view(x, "\\\\{4}")
x <- c("\\\\\\\\", "\\\\\\", "\\\\", "\\")
str_view(x, "\\\\\\\\")

Create regular expressions to find all words that:
- start with three consonants
str_view(stringr::words, "^[^aeiouy]{3}", match = TRUE)
Include y in the excluded set because when it shows up otherwise, it is in vowel form.
- have three or more vowels in a row
str_view(stringr::words, "[aeiou]{3}", match = TRUE)
In this case, do not include the y.
- have 2 or more vowel-consonant pairs in a row
str_view(stringr::words, "([aeiou][^aeiou]){2,}", match = TRUE)

Solve the beginner regexp crosswords at https://regexcrossword.com/challenges/beginner.
14.3.5.1
Describe, in words, what these expressions will match:
- I change questions 1 and 3 to what I think they were meant to be written as: (.)\\1\\1 and (..)\\1 respectively.
- (.)\\1\\1: a character, then that same character repeated twice more
- "(.)(.)\\2\\1": 1st char, 2nd char, followed by 2nd char, 1st char
- (..)\\1: a pair of characters repeated twice
- "(.).\\1.\\1": a character that shows up 3 times with one character between each
- "(.)(.)(.).*\\3\\2\\1": 3 chars in one order, any number of chars between, then the same 3 chars in reverse order

x <- c("steefddff", "ssdfsdfffsdasdlkd", "DLKKJIOWdkl", "klnlsd", "t11", "(.)\1\1")
str_view_all(x, "(.)\\1\\1", match = TRUE) #xxx
str_view_all(fruit, "(.)(.)\\2\\1", match = TRUE) #xyyx
str_view_all(fruit, "(..)\\1", match = TRUE) #xyxy
str_view(stringr::words, "(.).\\1.\\1", match = TRUE) #x.x.x
str_view(stringr::words, "(.)(.)(.).*\\3\\2\\1", match = TRUE) #xyz.*zyx

Construct regular expressions to match words that:
- Start and end with the same character.
str_view(stringr::words, "^(.).*\\1$", match = TRUE)
- Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice.)
str_view(stringr::words, "(..).*\\1", match = TRUE)
- Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.) Using .* rather than .+ also allows the repeats to sit next to each other.
str_view(stringr::words, "(.).*\\1.*\\1", match = TRUE)
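To check the three backreference patterns without scrolling through the viewer, here is a sketch that runs them with str_subset() on a small hand-picked vector (test_words is made up for illustration, not taken from stringr::words).

```r
library(stringr)

test_words <- c("church", "eleven", "banana", "treat", "apple")

# \\1 refers back to whatever the first parenthesised group captured
same_ends   <- str_subset(test_words, "^(.).*\\1$")
same_ends
# "treat"

repeat_pair <- str_subset(test_words, "(..).*\\1")
repeat_pair
# "church" "banana"

three_times <- str_subset(test_words, "(.).*\\1.*\\1")
three_times
# "eleven" "banana"
```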
14.4 Tools
noun <- "(a|the) ([^ \\.]+)"
has_noun <- sentences %>%
str_subset(noun) %>%
head(10)
has_noun %>%
str_extract_all(noun, simplify = TRUE)
#splits each match into separate pieces (one column per group)
has_noun %>%
str_match_all(noun)
#Can make a dataframe with tidyr::extract, but need to name all the groups
tibble(has_noun = has_noun) %>%
extract(has_noun, into = c("article", "noun"), regex = noun)
- When using boundary() with str_split, can set it to "character", "line", "sentence", or "word", which gives an alternative to splitting by pattern.
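A minimal sketch of the difference between splitting on a literal space and splitting on a word boundary; the sentence is made up and the result names are illustrative.

```r
library(stringr)

x <- "This is a sentence. This is another one."

# Splitting on a space keeps the punctuation attached to the words
by_space <- str_split(x, " ")[[1]]
by_space
# "This" "is" "a" "sentence." "This" "is" "another" "one."

# boundary("word") drops the punctuation and whitespace pieces
by_word <- str_split(x, boundary("word"))[[1]]
by_word
# "This" "is" "a" "sentence" "This" "is" "another" "one"
```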
14.4.2
For each of the following challenges, try solving it by using both a single regular expression, and a combination of multiple str_detect() calls.
- Find all words that start or end with x.
str_subset(words, "^x|x$")
## [1] "box" "sex" "six" "tax"
- Find all words that start with a vowel and end with a consonant.
str_subset(words, "^[aeiou].*[^aeiouy]$")
##   [1] "about"       "accept"      "account"     "across"      "act"
##   [6] "actual"      "add"         "address"     "admit"       "affect"
##  [11] "afford"      "after"       "afternoon"   "again"       "against"
##  [16] "agent"       "air"         "all"         "allow"       "almost"
##  [21] "along"       "alright"     "although"    "always"      "amount"
##  [26] "and"         "another"     "answer"      "apart"       "apparent"
##  [31] "appear"      "appoint"     "approach"    "arm"         "around"
##  [36] "art"         "as"          "ask"         "at"          "attend"
##  [41] "awful"       "each"        "east"        "eat"         "effect"
##  [46] "egg"         "eight"       "either"      "elect"       "electric"
##  [51] "eleven"      "end"         "english"     "enough"      "enter"
##  [56] "environment" "equal"       "especial"    "even"        "evening"
##  [61] "ever"        "exact"       "except"      "exist"       "expect"
##  [66] "explain"     "express"     "if"          "important"   "in"
##  [71] "indeed"      "individual"  "inform"      "instead"     "interest"
##  [76] "invest"      "it"          "item"        "obvious"     "occasion"
##  [81] "odd"         "of"          "off"         "offer"       "often"
##  [86] "old"         "on"          "open"        "or"          "order"
##  [91] "original"    "other"       "ought"       "out"         "over"
##  [96] "own"         "under"       "understand"  "union"       "unit"
## [101] "unless"      "until"       "up"          "upon"        "usual"
Counted y as a vowel at the end of a word, but not at the start. This does not work perfectly. For example, words like ygritte would still be included even though y is acting as a vowel there, whereas words like boy would be excluded even though y is acting as a consonant there. From here on out I am going to always exclude y.
- Are there any words that contain at least one of each different vowel?
vowels <- c("a","e","i","o","u")
words[str_detect(words, "a") & str_detect(words, "e") & str_detect(words, "i") & str_detect(words, "o") & str_detect(words, "u")]
## character(0)
No.

What word has the highest number of vowels? What word has the highest proportion of vowels? (Hint: what is the denominator?)
vowel_counts <- tibble(words = words, n_string = str_length(words), n_vowel = str_count(words, vowels), prop_vowel = n_vowel / n_string)
Note that str_count(words, vowels) recycles the five-element vowels vector along words, so each word is only checked against a single vowel and the counts below are undercounts; a character class such as "[aeiou]" would count every vowel.

'Experience' has the most vowels
vowel_counts %>% arrange(desc(n_vowel))
## # A tibble: 980 x 4
##    words      n_string n_vowel prop_vowel
##    <chr>         <int>   <int>      <dbl>
##  1 experience       10       4      0.4
##  2 individual       10       3      0.3
##  3 achieve           7       2      0.286
##  4 actual            6       2      0.333
##  5 afternoon         9       2      0.222
##  6 against           7       2      0.286
##  7 already           7       2      0.286
##  8 america           7       2      0.286
##  9 benefit           7       2      0.286
## 10 choose            6       2      0.333
## # ... with 970 more rows

'a' has the highest proportion
vowel_counts %>% arrange(desc(prop_vowel))
## # A tibble: 980 x 4
##    words n_string n_vowel prop_vowel
##    <chr>    <int>   <int>      <dbl>
##  1 a            1       1      1
##  2 too          3       2      0.667
##  3 wee          3       2      0.667
##  4 feed         4       2      0.5
##  5 in           2       1      0.5
##  6 look         4       2      0.5
##  7 need         4       2      0.5
##  8 room         4       2      0.5
##  9 so           2       1      0.5
## 10 soon         4       2      0.5
## # ... with 970 more rows
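A small sketch of counting vowels with a character class, next to the recycling behaviour that a vector of patterns triggers in str_count(). The sample vector w is made up; it is not the full words corpus.

```r
library(stringr)

w <- c("experience", "a", "too", "rhythm")

# One pattern for all vowels: every vowel in every word is counted
n_vowel <- str_count(w, "[aeiou]")
n_vowel
# 5 1 2 0

prop_vowel <- n_vowel / str_length(w)

# A vector of patterns is paired element by element with the strings,
# so here "experience" is checked once for "a" and once for "e"
recycled <- str_count(c("experience", "experience"), c("a", "e"))
recycled
# 0 4
```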
14.4.3.1
In the previous example, you might have noticed that the regular expression matched "flickered", which is not a colour. Modify the regex to fix the problem.
Add a space in front of the colours:
colours <- c("red", "orange", "yellow", "green", "blue", "purple") %>% paste0(" ", .)
colour_match <- str_c(colours, collapse = "|")
more <- sentences[str_count(sentences, colour_match) > 1]
str_view_all(more, colour_match)

From the Harvard sentences data, extract (using [A-Za-z] rather than [A-z], since the latter range also covers the punctuation characters that sit between Z and a):
- The first word from each sentence.
str_extract(sentences, "[A-Za-z]*")
- All words ending in ing.
#ends in "ing" or "ing."
sent_ing <- str_subset(sentences, ".*ing(\\.|\\s)")
str_extract_all(sent_ing, "[A-Za-z]+ing", simplify = TRUE)
- All plurals.
str_subset(sentences, "[A-Za-z]*s(\\.|\\s)") %>% #take all sentences that have a word ending in s
  str_extract_all("[A-Za-z]*s\\b", simplify = TRUE) %>%
  .[str_length(.) > 3] %>% #get rid of the short words
  str_subset(".*[^s]s$") %>% #get rid of words ending in 'ss'
  str_subset(".*[^i]s$") #get rid of 'this'
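Another way to avoid matching "flickered", sketched here as an alternative to the leading-space approach, is to wrap the alternation in word boundaries (\b) so a colour only matches as a whole word. The two test sentences are made up for illustration.

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")

# \\b on both sides means the colour must be a whole word
colour_match <- str_c("\\b(", str_c(colours, collapse = "|"), ")\\b")

x <- c("The fire flickered brightly.", "A red and blue kite.")

plain_hit <- str_detect(x[1], "red")    # TRUE: "red" is inside "flickered"
hits <- str_extract_all(x, colour_match)
hits[[2]]
# "red" "blue"
```

Unlike the leading-space version, this also matches a colour at the very start of a sentence.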
14.4.4.1
Find all words that come after a "number" like "one", "two", "three" etc. Pull out both the number and the word.
#Create regex expression
nums <- c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
nums_c <- str_c(nums, collapse = "|")
# see stringr cheatsheet: "(?<![:alpha:])" means not preceded by
re <- str_c("(", "(?<![:alpha:])", "(", nums_c, "))", " ", "([^ \\.]+)", sep = "")
sentences %>%
  str_subset(regex(re, ignore_case = TRUE)) %>%
  str_extract_all(regex(re, ignore_case = TRUE)) %>%
  unlist() %>%
  tibble::enframe(name = NULL) %>%
  separate(col = "value", into = c("num", "following"), remove = FALSE)
## # A tibble: 30 x 3
##    value       num   following
##    <chr>       <chr> <chr>
##  1 Four hours  Four  hours
##  2 Two blue    Two   blue
##  3 seven books seven books
##  4 two met     two   met
##  5 two factors two   factors
##  6 three lists three lists
##  7 Two plus    Two   plus
##  8 seven is    seven is
##  9 two when    two   when
## 10 Eight miles Eight miles
## # ... with 20 more rows
- I'd initially appended "\\b" in front of each number to prevent things like "someone" being captured. However, this didn't work with cases where a sentence started with a number, hence I switched to using the "not preceded by" method in the stringr cheatsheet.
Find all contractions. Separate out the pieces before and after the apostrophe.
#note the () facilitate the split with functions
contr <- "([^ \\.]+)'([^ \\.]*)"
sentences %>%
  #note the improvement this word definition is to the above [^ ]+
  str_subset(contr) %>%
  str_match_all(contr)
## [[1]]
##      [,1]   [,2] [,3]
## [1,] "It's" "It" "s"
##
## [[2]]
##      [,1]    [,2]  [,3]
## [1,] "man's" "man" "s"
##
## [[3]]
##      [,1]    [,2]  [,3]
## [1,] "don't" "don" "t"
##
## [[4]]
##      [,1]      [,2]    [,3]
## [1,] "store's" "store" "s"
##
## [[5]]
##      [,1]        [,2]      [,3]
## [1,] "workmen's" "workmen" "s"
##
## [[6]]
##      [,1]    [,2]  [,3]
## [1,] "Let's" "Let" "s"
##
## [[7]]
##      [,1]    [,2]  [,3]
## [1,] "sun's" "sun" "s"
##
## [[8]]
##      [,1]      [,2]    [,3]
## [1,] "child's" "child" "s"
##
## [[9]]
##      [,1]     [,2]   [,3]
## [1,] "king's" "king" "s"
##
## [[10]]
##      [,1]   [,2] [,3]
## [1,] "It's" "It" "s"
##
## [[11]]
##      [,1]    [,2]  [,3]
## [1,] "don't" "don" "t"
##
## [[12]]
##      [,1]      [,2]    [,3]
## [1,] "queen's" "queen" "s"
##
## [[13]]
##      [,1]    [,2]  [,3]
## [1,] "don't" "don" "t"
##
## [[14]]
##      [,1]       [,2]     [,3]
## [1,] "pirate's" "pirate" "s"
##
## [[15]]
##      [,1]         [,2]       [,3]
## [1,] "neighbor's" "neighbor" "s"
14.4.5.1
Replace all forward slashes in a string with backslashes.
x <- c("test/dklsk/")
str_replace_all(x, "/", "\\\\") %>% writeLines()
## test\dklsk\

Implement a simple version of str_to_lower() using replace_all().
x <- c("BIdklsKOS")
str_replace_all(x, "([A-Z])", tolower)
## [1] "bidklskos"

Switch the first and last letters in words. Which of those strings are still words?
str_replace(words, "(^.)(.*)(.$)", "\\3\\2\\1")
Any words that start and end with the same letter, e.g. 'treat', as well as a few other examples like war -> raw.
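The three replacement styles used in these answers (backreferences, a function, and a named vector of replacements) can be sketched side by side on small inputs. The inputs and result names are illustrative.

```r
library(stringr)

# Backreferences in the replacement reorder the captured groups
swapped <- str_replace("war", "^(.)(.*)(.)$", "\\3\\2\\1")
swapped
# "raw"

# A function is applied to each match
lowered <- str_replace_all("BIdklsKOS", "[A-Z]", tolower)
lowered
# "bidklskos"

# A named vector maps several patterns to their replacements
spelled <- str_replace_all("1 house, 2 cars", c("1" = "one", "2" = "two"))
spelled
# "one house, two cars"
```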
14.4.6.1
Split up a string like "apples, pears, and bananas" into individual components.
x <- "apples, pears, and bananas"
str_split(x, ",* ") #note that regular expression works to handle commas as well
## [[1]]
## [1] "apples"  "pears"   "and"     "bananas"

Why is it better to split up by boundary("word") than " "?
Handles commas and punctuation (see the footnote at the end of these notes).
str_split(x, boundary("word"))
## [[1]]
## [1] "apples"  "pears"   "and"     "bananas"

What does splitting with an empty string ("") do? Experiment, and then read the documentation.
Splitting by an empty string splits up each character.
str_split(x,"")
## [[1]]
##  [1] "a" "p" "p" "l" "e" "s" "," " " "p" "e" "a" "r" "s" "," " " "a" "n"
## [18] "d" " " "b" "a" "n" "a" "n" "a" "s"
- splits each character into an individual element (and creates elements for spaces between strings)
14.5: Other types of patterns
regex args to know:
- ignore_case = TRUE allows characters to match either their uppercase or lowercase forms. This always uses the current locale.
- multiline = TRUE allows ^ and $ to match the start and end of each line rather than the start and end of the complete string.
- comments = TRUE allows you to use comments and white space to make complex regular expressions more understandable. Spaces are ignored, as is everything after #. To match a literal space, you'll need to escape it: "\\ ".
- dotall = TRUE allows . to match everything, including \n.
Alternatives to regex():
* fixed(): matches exactly the specified sequence of bytes. It ignores
all special regular expressions and operates at a very low level.
This allows you to avoid complex escaping and can be much faster than
regular expressions.
* coll(): compare strings using standard collation rules. This is
useful for doing case insensitive matching. Note that coll() takes a
locale parameter that controls which rules are used for comparing
characters.
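A brief sketch of how the same pattern string behaves under regex(), fixed() and coll(); the inputs are made up for illustration.

```r
library(stringr)

# As a regex, "." matches any character; with fixed() it is a literal dot
dot_regex <- str_detect(c("a.b", "ab"), ".")
dot_fixed <- str_detect(c("a.b", "ab"), fixed("."))
dot_regex
# TRUE TRUE
dot_fixed
# TRUE FALSE

# Case-insensitive matching via regex() and via coll()
ci_regex <- str_detect("BANANA", regex("banana", ignore_case = TRUE))
ci_coll  <- str_detect("BANANA", coll("banana", ignore_case = TRUE))
c(ci_regex, ci_coll)
# TRUE TRUE
```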
14.5.1
How would you find all strings containing \ with regex() vs. with fixed()?
With fixed() the pattern would be "\\" instead of "\\\\".
str_view_all("so \\ the party is on\\ right?", fixed("\\"))

What are the five most common words in sentences?
str_extract_all(sentences, boundary("word"), simplify = TRUE) %>%
  as_tibble() %>%
  gather(V1:V12, value = "words", key = "order") %>%
  mutate(words = str_to_lower(words)) %>%
  filter(!words == "") %>%
  count(words, sort = TRUE) %>%
  head(5)
## Warning: `as_tibble.matrix()` requires a matrix with column names or a `.name_repair` argument. Using compatibility `.name_repair`.
## This warning is displayed once per session.
## # A tibble: 5 x 2
##   words     n
##   <chr> <int>
## 1 the     751
## 2 a       202
## 3 of      132
## 4 to      123
## 5 and     118
14.7: stringi
Other functions:
apropos searches all objects available from the global environment, which is useful if you can't remember a function's name.
Check those that start with replace:
apropos("^(replace)")
## [1] "replace" "replace_na"
Check those that start with str, but not stri
apropos("^(str)[^i]")
## [1] "str_c" "str_conv" "str_count"
## [4] "str_detect" "str_dup" "str_extract"
## [7] "str_extract_all" "str_flatten" "str_glue"
## [10] "str_glue_data" "str_interp" "str_length"
## [13] "str_locate" "str_locate_all" "str_match"
## [16] "str_match_all" "str_order" "str_pad"
## [19] "str_remove" "str_remove_all" "str_replace"
## [22] "str_replace_all" "str_replace_na" "str_sort"
## [25] "str_split" "str_split_fixed" "str_squish"
## [28] "str_sub" "str_sub<-" "str_subset"
## [31] "str_to_lower" "str_to_title" "str_to_upper"
## [34] "str_trim" "str_trunc" "str_view"
## [37] "str_view_all" "str_which" "str_wrap"
## [40] "strcapture" "strftime" "strheight"
## [43] "strOptions" "strptime" "strrep"
## [46] "strsplit" "strtoi" "strtrim"
## [49] "StructTS" "structure" "strwidth"
## [52] "strwrap"
14.7.1
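Find the stringi functions that:
- Count the number of words: stri_count_words()
- Find duplicated strings: stri_duplicated()
- Generate random text: stri_rand_strings() (random strings) or stri_rand_lipsum() (lorem ipsum text)

How do you control the language that stri_sort() uses for sorting?
The locale argument, which is passed on to stri_opts_collator(); the decreasing argument only controls the direction of the sort.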
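A minimal sketch of passing a locale to stri_sort(); the input vector is made up. The English ordering is shown as a comment, while the Hawaiian ("haw") result is left unannotated because it depends on that locale's collation rules.

```r
library(stringi)

x <- c("banana", "eggplant", "apple")

# locale is forwarded to the collator that does the comparisons
sorted_en <- stri_sort(x, locale = "en")
sorted_en
# "apple" "banana" "eggplant"

# decreasing only flips the direction; it does not change the language
sorted_desc <- stri_sort(x, decreasing = TRUE, locale = "en")
sorted_desc
# "eggplant" "banana" "apple"

# Same call with a different locale; print it to compare the ordering
sorted_haw <- stri_sort(x, locale = "haw")
sorted_haw
```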
Appendix
14.4.2.3
One way of doing this using iteration methods:
vowels <- c("a","e","i","o","u")
tibble(vowels = vowels, words = list(words)) %>%
mutate(detect_vowels = purrr::map2(words, vowels, str_detect)) %>%
spread(key = vowels, value = detect_vowels) %>%
unnest() %>%
mutate(unique_vowels = rowSums(.[2:6])) %>%
arrange(desc(unique_vowels))
## # A tibble: 980 x 7
## words a e i o u unique_vowels
## <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl>
## 1 absolute TRUE TRUE FALSE TRUE TRUE 4
## 2 appropriate TRUE TRUE TRUE TRUE FALSE 4
## 3 associate TRUE TRUE TRUE TRUE FALSE 4
## 4 authority TRUE FALSE TRUE TRUE TRUE 4
## 5 colleague TRUE TRUE FALSE TRUE TRUE 4
## 6 continue FALSE TRUE TRUE TRUE TRUE 4
## 7 encourage TRUE TRUE FALSE TRUE TRUE 4
## 8 introduce FALSE TRUE TRUE TRUE TRUE 4
## 9 organize TRUE TRUE TRUE TRUE FALSE 4
## 10 previous FALSE TRUE TRUE TRUE TRUE 4
## # ... with 970 more rows
#seems that nothing gets over 4
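The same "how many distinct vowels" count can also be sketched without reshaping, by building a logical matrix with one column per vowel and summing across rows. The sample vector w is made up, and this is only one possible alternative to the pipeline above.

```r
library(stringr)

vowels <- c("a", "e", "i", "o", "u")
w <- c("absolute", "a", "sky")

# One column per vowel, one row per word; TRUE where the vowel occurs
has_vowel <- sapply(vowels, function(v) str_detect(w, v))

# Row sums give the number of distinct vowels in each word
unique_vowels <- rowSums(has_vowel)
unique_vowels
# 4 1 0
```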
Footnote (14.4.6.1): I still sometimes prefer to use patterns where possible over the boundary function. Regex is also more generally applicable outside of R.