cheatsheet - mean by group in r

Why is my dplyr group_by & summarize not working properly? (name collision with plyr)

I believe you loaded plyr after dplyr. plyr's summarize then masks dplyr's version and ignores the grouping, which is why you get an overall summary instead of a grouped one.

This is what happens with plyr loaded last.

    library(dplyr)
    library(plyr)

    df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm=TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm=TRUE),
                max  = max(AUC0t, na.rm=TRUE),
                sd   = sd(AUC0t, na.rm=TRUE))

      mean low high min max sd
    1  150 105  195 100 200 50

Now detach plyr and try again, and you get the grouped summary.

    detach(package:plyr)

    df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm=TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm=TRUE),
                max  = max(AUC0t, na.rm=TRUE),
                sd   = sd(AUC0t, na.rm=TRUE))

    Source: local data frame [4 x 8]
    Groups: DRUG

      DRUG FED mean low high min max  sd
    1    0   0  150 150  150 150 150 NaN
    2    0   1  NaN  NA   NA  NA  NA NaN
    3    1   0  100 100  100 100 100 NaN
    4    1   1  200 200  200 200 200 NaN
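If you are not sure which package's summarize() your session is picking up, you can ask R directly. This is only a small sketch and assumes both dplyr and plyr are installed; it attaches them in the problematic order on purpose.

```r
library(dplyr)
library(plyr)  # attached last, so plyr::summarize() masks dplyr::summarize()

# Namespace that the bare name `summarize` resolves to at this point
environmentName(environment(summarize))
#> [1] "plyr"

# Every attached package that provides an object called summarize,
# listed in search-path order (the first entry is the one that gets called)
find("summarize")
```

If the first call reports plyr, a bare summarize() in a pipeline will ignore group_by().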

I have a data frame that looks like this:

    #df
    ID  DRUG  FED  AUC0t  Tmax  Cmax
    1   1     0    100    5     20
    2   1     1    200    6     25
    3   0     1    NA     2     30
    4   0     0    150    6     65

And so on. I want to summarize some statistics on AUC, Tmax, and Cmax by drug (DRUG) and fed status (FED). I use dplyr. For example, for AUC:

    CI90lo <- function(x) quantile(x, probs=0.05, na.rm=TRUE)
    CI90hi <- function(x) quantile(x, probs=0.95, na.rm=TRUE)

    summary <- df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm=TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm=TRUE),
                max  = max(AUC0t, na.rm=TRUE),
                sd   = sd(AUC0t, na.rm=TRUE))

However, the output is not grouped by DRUG and FED. It gives only a single row containing the statistics for all the data, not broken down by DRUG and FED.

Any idea why? And how can I make it do the right thing?

Or you could consider using data.table.

    library(data.table)
    setDT(df)  # set the data frame as a data.table

    df[, list(mean = mean(AUC0t, na.rm=TRUE),
              low  = CI90lo(AUC0t),
              high = CI90hi(AUC0t),
              min  = as.double(min(AUC0t, na.rm=TRUE)),
              max  = as.double(max(AUC0t, na.rm=TRUE)),
              sd   = sd(AUC0t, na.rm=TRUE)),
       by = list(DRUG, FED)]

    #    DRUG FED mean low high min  max sd
    # 1:    1   0  100 100  100 100  100 NA
    # 2:    1   1  200 200  200 200  200 NA
    # 3:    0   1  NaN  NA   NA Inf -Inf NA
    # 4:    0   0  150 150  150 150  150 NA
    # Warning messages:
    # 1: In min(AUC0t, na.rm = TRUE) :
    #   no non-missing arguments to min; returning Inf
    # 2: In max(AUC0t, na.rm = TRUE) :
    #   no non-missing arguments to max; returning -Inf
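The Inf and -Inf in the DRUG 0 / FED 1 row, and the two warnings, come from calling min() and max() on a group whose only AUC0t value is NA. If you would rather see NA there, one option is a pair of small wrappers. The names safe_min and safe_max are my own, not part of base R or data.table, and this is just a sketch.

```r
# Hypothetical helpers (not from any package): behave like
# min(x, na.rm = TRUE) / max(x, na.rm = TRUE), but return NA instead of
# Inf / -Inf (and raise no warning) when every value is missing.
safe_min <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else as.double(min(x))
}

safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else as.double(max(x))
}

safe_min(c(3, NA, 1))  # 1
safe_max(c(3, NA, 1))  # 3
safe_min(c(NA, NA))    # NA, no warning
```

In the data.table call you would then write min = safe_min(AUC0t) and max = safe_max(AUC0t). Both helpers always return a double, so every group yields the same column type.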

Try sqldf. In my view it is the best and easiest-to-learn way to group data. Below is an example for your need. The kinds of grouping you can do with the sqldf library are very useful.

    install.packages("sqldf")
    library(sqldf)

    dat1 <- sqldf("select x, y, y/sum(y) as Z from dat group by x")
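The query above uses placeholder names (dat, x, y). Adapted to the data frame in the question, it might look roughly like the sketch below. It assumes sqldf is installed, rebuilds only the columns it needs, and the aliases mean_auc, min_auc, and max_auc are names I picked for illustration.

```r
library(sqldf)

# The relevant columns of the example data frame from the question
df <- data.frame(
  ID    = 1:4,
  DRUG  = c(1, 1, 0, 0),
  FED   = c(0, 1, 1, 0),
  AUC0t = c(100, 200, NA, 150)
)

# One output row per DRUG/FED combination.
# NA values reach SQLite as NULL, which AVG, MIN and MAX skip.
auc_by_group <- sqldf("
  SELECT DRUG, FED,
         AVG(AUC0t) AS mean_auc,
         MIN(AUC0t) AS min_auc,
         MAX(AUC0t) AS max_auc
  FROM df
  GROUP BY DRUG, FED
")

auc_by_group
```

The quantile-based low and high columns are left out here because they rely on the R functions CI90lo and CI90hi, which cannot be called from inside the SQL string.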

A variant of aosmith's answer that might help some people: tell R to call the dplyr functions directly with the dplyr:: prefix. It is a good trick when one package interferes with another.

    df %>%
      dplyr::group_by(DRUG, FED) %>%
      dplyr::summarize(mean = mean(AUC0t, na.rm=TRUE),
                       low  = CI90lo(AUC0t),
                       high = CI90hi(AUC0t),
                       min  = min(AUC0t, na.rm=TRUE),
                       max  = max(AUC0t, na.rm=TRUE),
                       sd   = sd(AUC0t, na.rm=TRUE))