Merging 2 data frames in R on columns with the same values but different case
Problem description:
I have two data frames, but the values in the merge "by" columns differ in case, for example sn1capx1e0001 vs SN1CAPX1E0001.
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = I(c("tukey", "venables", "tierney",
               "tipley", "ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))
m1 <- merge(authors, books, by.x = "surname", by.y = "name")
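gives
  surname nationality deceased                     title other.author
1  McNeil   Australia       no Interactive Data Analysis         <NA>
Only the row whose key is spelled identically in both data frames is returned, so I want to merge them case-insensitively. I have not been able to do that with merge or join.
I have seen that we could loop over the values and match them with regular expressions.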
Answer
Why not convert both columns to the same case before merging?
library(stringr)
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = I(c("tukey", "venables", "tierney",
               "tipley", "ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))
authors$surname <- str_to_title(authors$surname)
books$name <- str_to_title(books$name)
m1 <- merge(authors, books, by.x = "surname", by.y = "name")
gives
   surname nationality deceased                         title other.author
1   Mcneil   Australia       no     Interactive Data Analysis         <NA>
2   Ripley          UK       no         Stochastic Simulation         <NA>
3  Tierney          US       no                     LISP-STAT         <NA>
4    Tukey          US      yes     Exploratory Data Analysis         <NA>
5 Venables   Australia       no Modern Applied Statistics ...       Ripley
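Note that str_to_title() also rewrites the key itself: "McNeil" comes back as "Mcneil". If the original spelling matters, one possible workaround is to merge on a temporary lowercase copy of each key and drop it afterwards. This is only a minimal base R sketch; the helper column name key and the result name m2 are made up for illustration, and the data frames are trimmed copies of the ones in the question.

```r
# Trimmed copies of the example data; stringsAsFactors = FALSE keeps the keys as character.
authors <- data.frame(
    surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    stringsAsFactors = FALSE)
books <- data.frame(
    name = c("tukey", "venables", "tierney", "tipley", "ripley", "McNeil", "R Core"),
    title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
              "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis", "An Introduction to R"),
    stringsAsFactors = FALSE)

# Hypothetical helper column: a lowercase copy of each join column.
authors$key <- tolower(authors$surname)
books$key <- tolower(books$name)

# Merge on the normalized key; surname and name keep their original spelling.
m2 <- merge(authors, books, by = "key")

# Drop the helper column once the join is done.
m2$key <- NULL
```

With this approach m2 carries both surname and name in their original case, at the cost of one extra column during the merge.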
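Answer
I found that this is very simple.
The trick is to use toupper() on both key columns before merging:
authors$surname <- toupper(authors$surname)
books$name <- toupper(books$name)
Simple....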
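To see what case normalization does and does not fix, here is a quick check with base R set operations on the key values from the question. The variable names are illustrative only.

```r
# Key values taken from the example data in the question.
surname <- c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")
name <- c("tukey", "venables", "tierney", "tipley", "ripley", "McNeil", "R Core")

# Exact comparison: only keys spelled identically on both sides match.
exact_matches <- intersect(surname, name)

# After upper-casing both sides, keys that differ only in case line up.
ci_matches <- intersect(toupper(surname), toupper(name))

# Keys in `name` that still have no partner (typos such as "tipley", extra rows).
unmatched <- setdiff(toupper(name), toupper(surname))

print(exact_matches)
print(ci_matches)
print(unmatched)
```

Changing case only helps with keys that differ in case; a misspelled key such as "tipley" still has no partner and is dropped by merge() unless all.x or all.y is set.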