How to assign factor values in R or Python based on date: labeling school breaks


I have the following dataset (Break_data), taken from the school calendar, with the start and end dates of each break:

    print(Break_data)
           Start        End          Break Year
    1 2016-02-24 2016-02-29   Spring_Break 2016
    2 2016-03-23 2016-03-28  Easter_Recess 2016
    3 2016-10-05 2016-10-10 Mid_Term_Break 2016
    4 2017-03-01 2017-03-06   Spring_Break 2017
    5 2017-04-12 2017-04-17  Easter_Recess 2017
    6 2017-10-04 2017-10-09 Mid_Term_Break 2017
    7 2018-02-28 2018-03-05   Spring_Break 2018
    8 2018-03-28 2018-04-02  Easter_Recess 2018

And this is a very large dataset (df):

    head(df$date)
    [1] "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05"
    tail(df$date)
    [1] "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12"

Following the steps provided in: https://stackoverflow.com/a/51052626/9341589

I want to create a similar factor variable, Break, by comparing these break ranges against the dates in df. The df dataset covers 2016-02-05 to 2018-07-12 and contains many variables besides the date. The sampling interval is 15 minutes, so one day is 96 rows.

In my case, in addition to the break values shown in the table, I want every date that does not fall between a Start and an End to be labeled Non_Break.
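To make the intended result concrete, here is a minimal Python sketch of the labeling rule described above. The name `label_for` and the `BREAKS` list are hypothetical, introduced only for illustration; the date ranges are copied from Break_data and both ends are treated as inclusive, which is an assumption about how the calendar should be read.

```python
from datetime import date

# (start, end, label) rows copied from Break_data; both ends are inclusive.
BREAKS = [
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
    (date(2017, 3, 1), date(2017, 3, 6), "Spring_Break"),
    (date(2017, 4, 12), date(2017, 4, 17), "Easter_Recess"),
    (date(2017, 10, 4), date(2017, 10, 9), "Mid_Term_Break"),
    (date(2018, 2, 28), date(2018, 3, 5), "Spring_Break"),
    (date(2018, 3, 28), date(2018, 4, 2), "Easter_Recess"),
]


def label_for(day, breaks=BREAKS, default="Non_Break"):
    """Return the label of the break that covers `day`, or `default` if none does."""
    for start, end, label in breaks:
        if start <= day <= end:
            return label
    return default


print(label_for(date(2016, 2, 26)))  # Spring_Break
print(label_for(date(2016, 2, 5)))   # Non_Break
```

This scans every range for every date, which is fine for eight breaks but is only meant to pin down the expected output, not to be the fastest approach.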

Following the steps in the link above, this is my modified version of the code in R:

    Break_data$Start <- ymd(Break_data$Start)
    Break_data$End <- ymd(Break_data$End)
    df$date <- ymd(df$date)
    LU <- Map(`:`, Break_data$Start, Break_data$End)
    LU <- data.frame(value = unlist(LU),
                     index = rep(seq_along(LU), lapply(LU, length)))
    df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]
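For comparison, the lookup idea in the R snippet above (expand each Start to End range into individual days, then match every date against that table) can be sketched in Python with a dictionary. This is an illustration under assumptions, not a port of the linked answer: `build_lookup` is a hypothetical helper and only the two first ranges from Break_data are used to keep the example short.

```python
from datetime import date, timedelta


def build_lookup(breaks):
    """Map every calendar day inside each (start, end, label) range to its label."""
    lookup = {}
    for start, end, label in breaks:
        day = start
        while day <= end:  # the end date is included
            lookup[day] = label
            day += timedelta(days=1)
    return lookup


# Two ranges copied from Break_data.
breaks = [
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
]
lookup = build_lookup(breaks)

dates = [date(2016, 2, 5), date(2016, 2, 26), date(2016, 3, 28)]
labels = [lookup.get(d) for d in dates]
print(labels)  # [None, 'Spring_Break', 'Easter_Recess']
```

Dates outside every range come back as `None`, much as unmatched dates come back as NA from `match` in R; `lookup.get(d, "Non_Break")` would fill them in directly.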

I suppose that, in addition to this, I have to assign Non_Break, with a for loop or a simple if, to the dates that fall outside the Start and End ranges.

Edit: I have tried two different approaches:

FIRST: without using the mapping

    for (i in c(1:nrow(df))){
      if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29")
        df$Break[i]<-"Spring_Break"
      else if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28")
        df$Break[i]<-"Easter_Recess"
      else if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10")
        df$Break[i]<-"Mid_Term_Break"
      else if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06")
        df$Break[i]<-"Spring_Break"
      else if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17")
        df$Break[i]<-"Easter_Recess"
      else if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09")
        df$Break[i]<-"Mid_Term_Break"
      else if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05")
        df$Break[i]<-"Easter_Recess"
      else if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02")
        df$Break[i]<-"Easter_Recess"
      else (df$Break[i]<-"Not_Break")
    }

The first one runs forever :) and I only get 2 values: Not_Break and Spring_Break.

And this is the warning message:

    Warning messages:
    1: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    2: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    3: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    4: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    5: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    6: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    7: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
      the condition has length > 1 and only the first element will be used
    8: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
      the condition has length > 1 and only the first element will be used

(Warnings 9 to 50 repeat these same eight messages in the same order.)

SECOND: adding to the code from the link:

    LU <- Map(`:`, Break_data$Start, Break_data$End)
    LU <- data.frame(value = unlist(LU),
                     index = rep(seq_along(LU), lapply(LU, length)))
    for (i in c(1:nrow(df))){
      if (df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]])
      else (df$date[i] >= "2016-02-05" & df$date <= "2018-07-12") df$Break[i]<-"Not_Break"
    }

The second one also gives me an error. Any modification to the code, or an alternative implementation (in R or Python), would be appreciated.

Is there a more efficient way to do this?

Note: the datasets are publicly available at: https://github.com/tomiscat/data

    library(lubridate)

    # data
    Break_data <- data.table::fread("
    Start End Break Year
    2016-02-24 2016-02-29 Spring_Break 2016
    2016-03-23 2016-03-28 Easter_Recess 2016
    2016-10-05 2016-10-10 Mid_Term_Break 2016
    2017-03-01 2017-03-06 Spring_Break 2017
    2017-04-12 2017-04-17 Easter_Recess 2017
    2017-10-04 2017-10-09 Mid_Term_Break 2017
    2018-02-28 2018-03-05 Spring_Break 2018
    2018-03-28 2018-04-02 Easter_Recess 2018"
    )

    df <- data.frame(
      date = c("2016-02-05", "2016-02-05", "2016-02-05", "2016-02-05", "2016-02-05",
               "2016-02-05", "2016-02-26", "2016-10-07", "2018-03-30", "2018-07-12",
               "2018-07-12", "2018-07-12", "2018-07-12", "2018-07-12", "2018-07-12")
    )

    # mapping
    Break_data$Start <- ymd(Break_data$Start)
    Break_data$End <- ymd(Break_data$End)
    df$date <- ymd(df$date)
    LU <- Map(`:`, Break_data$Start, Break_data$End)
    LU <- data.frame(value = unlist(LU),
                     index = rep(seq_along(LU), lapply(LU, length)))
    df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]

    # if not mapped (df$Break is NA), set it to "Non_Break"
    df$Break <- ifelse(is.na(df$Break), "Non_Break", df$Break)
    df$Break <- factor(df$Break)

    df
    #>          date          Break
    #> 1  2016-02-05      Non_Break
    #> 2  2016-02-05      Non_Break
    #> 3  2016-02-05      Non_Break
    #> 4  2016-02-05      Non_Break
    #> 5  2016-02-05      Non_Break
    #> 6  2016-02-05      Non_Break
    #> 7  2016-02-26   Spring_Break
    #> 8  2016-10-07 Mid_Term_Break
    #> 9  2018-03-30  Easter_Recess
    #> 10 2018-07-12      Non_Break
    #> 11 2018-07-12      Non_Break
    #> 12 2018-07-12      Non_Break
    #> 13 2018-07-12      Non_Break
    #> 14 2018-07-12      Non_Break
    #> 15 2018-07-12      Non_Break

Created on 2018-08-19 by the reprex package (v0.2.0).
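Since the question also allows Python and asks about efficiency, here is a hedged sketch of one possible approach using only the standard library. It assumes the breaks never overlap and that each row carries a `datetime` timestamp at 15-minute spacing; the names `label_for` and `label_timestamps` are hypothetical. A binary search over the sorted start dates finds the candidate break, and each calendar day is resolved once and cached, because 96 consecutive rows share the same date.

```python
from bisect import bisect_right
from datetime import date, datetime, timedelta

# (start, end, label) rows copied from Break_data, sorted by start date.
# Assumption: the ranges do not overlap and both ends are inclusive.
BREAKS = sorted([
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
    (date(2017, 3, 1), date(2017, 3, 6), "Spring_Break"),
    (date(2017, 4, 12), date(2017, 4, 17), "Easter_Recess"),
    (date(2017, 10, 4), date(2017, 10, 9), "Mid_Term_Break"),
    (date(2018, 2, 28), date(2018, 3, 5), "Spring_Break"),
    (date(2018, 3, 28), date(2018, 4, 2), "Easter_Recess"),
])
STARTS = [start for start, _, _ in BREAKS]


def label_for(day, default="Non_Break"):
    """Binary-search the last break starting on or before `day`, then check its end."""
    i = bisect_right(STARTS, day) - 1
    if i >= 0 and day <= BREAKS[i][1]:
        return BREAKS[i][2]
    return default


def label_timestamps(timestamps):
    """Label each timestamp by its calendar day, resolving every day only once."""
    cache = {}
    labels = []
    for ts in timestamps:
        day = ts.date()
        if day not in cache:
            cache[day] = label_for(day)
        labels.append(cache[day])
    return labels


# Hypothetical sample: one day of 15-minute readings (96 rows) on 2016-02-26.
t0 = datetime(2016, 2, 26)
stamps = [t0 + timedelta(minutes=15 * k) for k in range(96)]
labels = label_timestamps(stamps)
print(len(labels), set(labels))  # 96 {'Spring_Break'}
```

If the rows hold plain dates rather than timestamps, the `ts.date()` call can be dropped and the dates passed to `label_for` directly.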

Edit: complete solution