Replace values in a data frame based on a lookup table


I am having trouble replacing values in a data frame. I would like to replace the values based on a separate table. Below is an example of what I am trying to do.

I have a table where every row is a customer and every column is an animal they purchased. Let's call this data frame table.

    > table
    #       P1     P2     P3
    # 1    cat lizard parrot
    # 2 lizard parrot    cat
    # 3 parrot    cat lizard

I also have a table that I will reference, called lookUp.

    > lookUp
    #      pet   class
    # 1    cat  mammal
    # 2 lizard reptile
    # 3 parrot    bird

What I want to do is create a new table called new, using a function that replaces every value in table with the matching entry from the class column of lookUp. I tried this myself using lapply, but I got the following warnings.

    new <- as.data.frame(lapply(table, function(x) {
      gsub(".*", lookUp[match(x, lookUp$pet), 2], x)
    }), stringsAsFactors = FALSE)

    Warning messages:
    1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
      argument 'replacement' has length > 1 and only the first element will be used
    2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
      argument 'replacement' has length > 1 and only the first element will be used
    3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
      argument 'replacement' has length > 1 and only the first element will be used

Any ideas on how to make this work?
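For context on the warnings: gsub() accepts a single replacement string, so when it is handed the whole vector of looked-up classes it uses only the first element. The sketch below rebuilds the lookup table by hand (an assumption, since the question does not show how it was created) and shows that the match() lookup on its own already returns one class per pet. The names x and classes are illustrative.

```r
# Lookup table from the question, rebuilt by hand for this sketch
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# Values of column P1 in the example table
x <- c("cat", "lizard", "parrot")

# match() gives the row of lookUp for each element of x;
# indexing column 2 (class) with those rows yields one class per pet.
classes <- lookUp[match(x, lookUp$pet), 2]
print(classes)
```

Because the lookup is already complete at this point, wrapping it in gsub() is not needed.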

Any time you have two separate data.frames and are trying to bring information from one into the other, the answer is to merge.
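As a rough illustration of the merge idea with base R only, here is a sketch for a single column. It assumes the example data rebuilt by hand; the helper names one and merged are made up for the illustration. Note that merge() sorts its result by the key, so a row id is carried along to restore the customer order.

```r
# Example data from the question, rebuilt by hand for this sketch
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# One column plus a row id, so the original order can be restored
one <- data.frame(id = seq_len(nrow(table)), pet = table$P1,
                  stringsAsFactors = FALSE)

# Left join on "pet"; all.x = TRUE keeps pets missing from lookUp as NA
merged <- merge(one, lookUp, by = "pet", all.x = TRUE)

# merge() sorts by the key, so put the rows back in customer order
merged <- merged[order(merged$id), ]
print(merged$class)
```

Repeating this per column works, but it is the kind of loop the reshaping approach below avoids.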

Everyone has their own favorite merge method in R. Mine is data.table.

Also, since you want to do this for many columns, it will be faster to melt and dcast: rather than looping over the columns, apply the merge once to a reshaped (long) table, then reshape it back.

    library(data.table)

    # the row names will be our ID variable for melting
    setDT(table, keep.rownames = TRUE)
    setDT(lookUp)

    # now melt, merge, recast
    # melting (reshape wide to long)
    table[ , melt(.SD, id.vars = 'rn')
           # merging
           ][lookUp, new_value := i.class, on = c(value = 'pet')
             # reshape back to the original shape
             ][ , dcast(.SD, rn ~ variable, value.var = 'new_value')]
    #    rn      P1      P2      P3
    # 1:  1  mammal reptile    bird
    # 2:  2 reptile    bird  mammal
    # 3:  3    bird  mammal reptile

In case you find dcast / melt a bit intimidating, here is an approach that simply loops over the columns; dcast / melt just sidesteps the loop for this problem.

    setDT(table)  # don't need row names this time
    setDT(lookUp)

    sapply(names(table),  # (or whichever columns are relevant)
           function(cc) table[lookUp, (cc) :=  # merge, replace
                                # need to pass a _named_ vector to 'on', so use setNames
                                i.class, on = setNames("pet", cc)])
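To make the setNames() step concrete, this small base R sketch shows the named vector that ends up in on for one column. The variable names cc and on_vec are only for illustration. The name is the column in table and the value is the column in lookUp.

```r
# Column of the wide table currently being processed
cc <- "P1"

# Build the named vector passed to 'on':
# name = column in the left table, value = column in the lookup table
on_vec <- setNames("pet", cc)
print(on_vec)

# The same object written out literally
print(identical(on_vec, c(P1 = "pet")))
```

Writing c(cc = "pet") directly would not work here, because the name would be the literal string "cc" rather than the value stored in cc.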

Make a named vector, then loop through every column and match on the names:

    # make lookup vector with names
    lookUp1 <- setNames(as.character(lookUp$class), lookUp$pet)
    lookUp1
    #       cat    lizard    parrot
    #  "mammal" "reptile"    "bird"

    # match on names, get values from the lookup vector
    # (as.character() guards against columns that were read in as factors)
    res <- data.frame(lapply(df1, function(i) lookUp1[as.character(i)]))
    # reset rownames
    rownames(res) <- NULL

    # res
    #        P1      P2      P3
    # 1  mammal reptile    bird
    # 2 reptile    bird  mammal
    # 3    bird  mammal reptile

data

    df1 <- read.table(text = "
           P1     P2     P3
     1    cat lizard parrot
     2 lizard parrot    cat
     3 parrot    cat lizard", header = TRUE)

    lookUp <- read.table(text = "
          pet   class
     1    cat  mammal
     2 lizard reptile
     3 parrot    bird", header = TRUE)
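One thing to watch with named-vector lookups: indexing with a character vector matches on names, while indexing with a factor uses the factor's integer codes. read.table() created factors by default before R 4.0.0, so the same code can quietly give different answers on older versions. The sketch below uses illustrative names (x_chr, x_fac, by_name, by_code) and a hand-built lookup vector.

```r
# Named lookup vector, as in the approach above
lookUp1 <- c(cat = "mammal", lizard = "reptile", parrot = "bird")

x_chr <- c("parrot", "cat")
x_fac <- factor(x_chr)        # levels: "cat", "parrot" -> codes 2, 1

# Character index: matched against the names of lookUp1
by_name <- unname(lookUp1[x_chr])
print(by_name)

# Factor index: the integer codes (2, 1) are used as positions
by_code <- unname(lookUp1[x_fac])
print(by_code)

# Converting the factor first restores the lookup by name
print(unname(lookUp1[as.character(x_fac)]))
```

Wrapping the index in as.character() is a cheap way to make the lookup behave the same for both column types.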

I tried other approaches and they took a really long time with my very large data set. I used the following instead:

    # make table "new" using ifelse. See data below to avoid re-typing it
    new <- ifelse(table1 == "cat", "mammal",
                  ifelse(table1 == "lizard", "reptile",
                         ifelse(table1 == "parrot", "bird", NA)))

This method requires you to write more code, but the vectorization of ifelse makes it run faster. You have to decide, based on your data, whether you would rather spend more time writing code or waiting for your computer to run it. If you want to make sure it worked (that is, that there were no typos in your ifelse commands), you can use apply(new, 2, function(x) mean(is.na(x))).

data

    # create the data table
    table1 <- read.table(text = "
           P1     P2     P3
     1    cat lizard parrot
     2 lizard parrot    cat
     3 parrot    cat lizard", header = TRUE)
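To show what the NA check catches, here is a small sketch with a deliberate typo ("lizzard") in one of the conditions. The data are rebuilt by hand, and the names bad and na_share are illustrative. Every lizard falls through to NA, so each column reports a non-zero share of missing values.

```r
# Example data, rebuilt by hand for this sketch
table1 <- data.frame(P1 = c("cat", "lizard", "parrot"),
                     P2 = c("lizard", "parrot", "cat"),
                     P3 = c("parrot", "cat", "lizard"),
                     stringsAsFactors = FALSE)

# Nested ifelse with a deliberate typo: "lizzard" never matches
bad <- ifelse(table1 == "cat", "mammal",
              ifelse(table1 == "lizzard", "reptile",
                     ifelse(table1 == "parrot", "bird", NA)))

# Share of NA per column; anything above 0 means a value was not mapped
na_share <- apply(bad, 2, function(x) mean(is.na(x)))
print(na_share)
```

With the typo fixed, the same check returns 0 for every column. colMeans(is.na(bad)) is an equivalent, shorter spelling of the check.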

The other answer showing how to do this in dplyr does not answer the question: the resulting table is full of NAs. The following worked; I would appreciate any comments showing a better way:

    library(dplyr)
    library(tidyr)

    # Add a customer column so that we can put things back in the right order
    table$customer = seq(nrow(table))
    classTable <- table %>%
        # put in long format, naming the column filled with P1, P2, P3 "petCount"
        gather(key="petCount", value="pet", -customer) %>%
        # add a new column based on the pet's class in data frame "lookUp"
        left_join(lookUp, by="pet") %>%
        # since you wanted to replace the values in "table" with their
        # "class", remove the pet column
        select(-pet) %>%
        # put data back into wide format
        spread(key="petCount", value="class")

Note that it would probably be useful to keep the long table that contains the customer, the pet, the pet's species (?) and its class. This example simply adds an intermediate save to a variable:

    table$customer = seq(nrow(table))
    petClasses <- table %>%
        gather(key="petCount", value="pet", -customer) %>%
        left_join(lookUp, by="pet")

    custPetClasses <- petClasses %>%
        select(-pet) %>%
        spread(key="petCount", value="class")
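In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below is one way the same long-then-wide pipeline might look with the newer functions. It assumes the example data rebuilt by hand and reuses the column names customer and petCount from the code above.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt by hand for this sketch
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

classTable <- table %>%
  # row id so each customer can be reassembled after reshaping
  mutate(customer = row_number()) %>%
  # wide to long: one row per customer/column pair
  pivot_longer(cols = -customer, names_to = "petCount", values_to = "pet") %>%
  # attach the class of each pet
  left_join(lookUp, by = "pet") %>%
  # keep only the class, then go back to wide format
  select(-pet) %>%
  pivot_wider(names_from = petCount, values_from = class)

print(classTable)
```

The result is a tibble with one row per customer and the classes in columns P1 to P3.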

Another option is a combination of tidyr and dplyr:

    library(dplyr)
    library(tidyr)

    table %>%
       gather(key = "pet") %>%
       left_join(lookup, by = "pet") %>%
       spread(key = pet, value = class)

You posted an approach in your question that was not bad. Here is a similar approach:

    new <- df  # create a copy of df
    # using lapply, loop over columns and match values to the lookup table. store in "new".
    new[] <- lapply(df, function(x) look$class[match(x, look$pet)])

An alternative approach that will be faster is:

    new <- df
    new[] <- look$class[match(unlist(df), look$pet)]

Note that I use empty brackets ([]) in both cases to keep the structure of new as it was (a data.frame).

(I am using df instead of table and look instead of lookUp in my answer.)
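To illustrate what the empty brackets do, this sketch compares assigning into new[] with a plain assignment. It assumes the example data rebuilt by hand under the names df and look; the names vals and flat are illustrative.

```r
# Example data, rebuilt by hand under the names used in this answer
df <- data.frame(P1 = c("cat", "lizard", "parrot"),
                 P2 = c("lizard", "parrot", "cat"),
                 P3 = c("parrot", "cat", "lizard"),
                 stringsAsFactors = FALSE)
look <- data.frame(pet   = c("cat", "lizard", "parrot"),
                   class = c("mammal", "reptile", "bird"),
                   stringsAsFactors = FALSE)

# One vectorized lookup over all cells (column by column)
vals <- look$class[match(unlist(df), look$pet)]

# With []: the values are filled into the existing data.frame
new <- df
new[] <- vals
print(is.data.frame(new))
print(new)

# Without []: the data.frame is replaced by a plain character vector
flat <- df
flat <- vals
print(is.data.frame(flat))
```

The [] form keeps the dimensions, column names and row names of df, which is why both variants in this answer use it.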