How to assign factor variable values in R or Python based on date: creating school breaks
python-3.x for-loop (1)
I have the following dataset (Break_data), taken from the school calendar, which gives the start and end dates of the breaks:
print(Break_data)
Start End Break Year
1 2016-02-24 2016-02-29 Spring_Break 2016
2 2016-03-23 2016-03-28 Easter_Recess 2016
3 2016-10-05 2016-10-10 Mid_Term_Break 2016
4 2017-03-01 2017-03-06 Spring_Break 2017
5 2017-04-12 2017-04-17 Easter_Recess 2017
6 2017-10-04 2017-10-09 Mid_Term_Break 2017
7 2018-02-28 2018-03-05 Spring_Break 2018
8 2018-03-28 2018-04-02 Easter_Recess 2018
And this is a very large dataset (df):
head(df$date)
[1] "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05"
tail(df$date)
[1] "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12"
Following the steps provided in: https://stackoverflow.com/a/51052626/9341589
I want to create a similar factor variable, Break, by comparing the break ranges against the dates in the dataset df (which includes many variables besides date, and runs from 2016-02-05 to 2018-07-12). The sampling interval is 15 minutes, so one day is 96 rows.
In my case, in addition to the values shown in the table, I want every date that does not fall between the Start and End of one of these breaks to be labelled Non_Break.
Following the steps in the link mentioned above, this is the modified version of the code in R:
Break_data$Start <- ymd(Break_data$Start)
Break_data$End <- ymd(Break_data$End)
df$date <- ymd(df$date)
LU <- Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]
I suppose that, in addition to this, I have to assign Non_Break in a for loop or a simple if statement for the time periods that are not within the Start and End ranges.
Edit: I have tried two different ways:
FIRST - without using the mapping
for (i in c(1:nrow(df))){
if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29")
df$Break[i]<-"Spring_Break"
else if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28")
df$Break[i]<-"Easter_Recess"
else if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10")
df$Break[i]<-"Mid_Term_Break"
else if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06")
df$Break[i]<-"Spring_Break"
else if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17")
df$Break[i]<-"Easter_Recess"
else if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09")
df$Break[i]<-"Mid_Term_Break"
else if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05")
df$Break[i]<-"Easter_Recess"
else if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02")
df$Break[i]<-"Easter_Recess"
else (df$Break[i]<-"Not_Break")
}
The first one runs forever :) and I get only 2 values: Not_Break and Spring_Break.
And this is the warning message:
Warning messages:
1: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
2: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
3: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
4: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
5: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
6: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
7: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
8: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
the condition has length > 1 and only the first element will be used
(Warnings 9 to 50 repeat the same eight messages in the same order.)
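A note on the warning: in each condition the second comparison uses the whole column df$date rather than df$date[i], so the condition is a vector and R uses only its first element. Since the first date in df (2016-02-05) is earlier than every End date, each test effectively reduces to df$date[i] >= Start, which would explain why only Spring_Break and Not_Break appear. Below is a minimal Python sketch of the intended row-wise check, where both bounds are compared against the same single date. The names label_date and BREAKS are illustrative and not from the original post; the break table is copied from Break_data.

```python
from datetime import date

# Break periods copied from Break_data (Start, End, Break); both ends inclusive.
BREAKS = [
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
    (date(2017, 3, 1), date(2017, 3, 6), "Spring_Break"),
    (date(2017, 4, 12), date(2017, 4, 17), "Easter_Recess"),
    (date(2017, 10, 4), date(2017, 10, 9), "Mid_Term_Break"),
    (date(2018, 2, 28), date(2018, 3, 5), "Spring_Break"),
    (date(2018, 3, 28), date(2018, 4, 2), "Easter_Recess"),
]


def label_date(d, breaks=BREAKS, default="Non_Break"):
    """Return the name of the break whose [start, end] range contains d."""
    for start, end, name in breaks:
        # Both bounds test the same single date (the analogue of df$date[i]).
        if start <= d <= end:
            return name
    return default


# A few sample dates: before any break, inside two breaks, after all breaks.
dates = [date(2016, 2, 5), date(2016, 2, 26), date(2018, 3, 1), date(2018, 7, 12)]
labels = [label_date(d) for d in dates]
print(labels)
# ['Non_Break', 'Spring_Break', 'Spring_Break', 'Non_Break']
```

In the R loop the corresponding fix would be to write df$date[i] on both sides of each &.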
SECOND - adding to the code in the link:
LU <- Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
for (i in c(1:nrow(df))){
if (df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]])
else (df$date[i] >= "2016-02-05" & df$date <= "2018-07-12")
df$Break[i]<-"Not_Break"
}
In the second one I also get an error. Any modification of the code or an alternative implementation (in R or Python) would be appreciated.
Is there a more efficient way to do this?
Note: the datasets are publicly available at: https://github.com/tomiscat/data
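On the efficiency question, one possible approach (a sketch, not something from the original post) is to keep the break table sorted by start date and use a binary search per lookup instead of a chain of if/else tests. Because there are 96 rows per day, each distinct date also only needs to be labelled once. The sketch assumes the breaks do not overlap; label_date_fast, BREAKS and STARTS are hypothetical names.

```python
from bisect import bisect_right
from datetime import date

# Break periods copied from Break_data, sorted by start date.
# Assumption: the periods do not overlap.
BREAKS = sorted([
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
    (date(2017, 3, 1), date(2017, 3, 6), "Spring_Break"),
    (date(2017, 4, 12), date(2017, 4, 17), "Easter_Recess"),
    (date(2017, 10, 4), date(2017, 10, 9), "Mid_Term_Break"),
    (date(2018, 2, 28), date(2018, 3, 5), "Spring_Break"),
    (date(2018, 3, 28), date(2018, 4, 2), "Easter_Recess"),
])
STARTS = [start for start, _, _ in BREAKS]


def label_date_fast(d, default="Non_Break"):
    """Find the break containing d with a binary search over start dates."""
    # Index of the last break that starts on or before d (-1 if there is none).
    i = bisect_right(STARTS, d) - 1
    if i >= 0 and d <= BREAKS[i][1]:
        return BREAKS[i][2]
    return default


# Simulate repeated timestamps within a day: label each distinct date once,
# then map the cached labels back onto every row.
row_dates = [date(2016, 2, 5)] * 3 + [date(2016, 10, 7)] * 3 + [date(2018, 7, 12)] * 3
cache = {d: label_date_fast(d) for d in set(row_dates)}
row_labels = [cache[d] for d in row_dates]
print(row_labels)
```

With only eight breaks the gain over a linear scan is small; the caching by distinct date is likely the larger saving on a dataset with 96 rows per day.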
library(lubridate)
# data
Break_data <- data.table::fread(
" Start End Break Year
2016-02-24 2016-02-29 Spring_Break 2016
2016-03-23 2016-03-28 Easter_Recess 2016
2016-10-05 2016-10-10 Mid_Term_Break 2016
2017-03-01 2017-03-06 Spring_Break 2017
2017-04-12 2017-04-17 Easter_Recess 2017
2017-10-04 2017-10-09 Mid_Term_Break 2017
2018-02-28 2018-03-05 Spring_Break 2018
2018-03-28 2018-04-02 Easter_Recess 2018"
)
df <- data.frame(
date = c("2016-02-05","2016-02-05", "2016-02-05" ,"2016-02-05", "2016-02-05", "2016-02-05",
"2016-02-26", "2016-10-07", "2018-03-30",
"2018-07-12","2018-07-12", "2018-07-12", "2018-07-12", "2018-07-12" ,"2018-07-12")
)
# mapping
Break_data$Start <- ymd(Break_data$Start)
Break_data$End <- ymd(Break_data$End)
df$date <- ymd(df$date)
LU <- Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]
# if a date was not mapped (df$Break is NA), set it to "Non_Break"
df$Break <- ifelse(is.na(df$Break), "Non_Break", df$Break)
df$Break <- factor(df$Break)
df
#> date Break
#> 1 2016-02-05 Non_Break
#> 2 2016-02-05 Non_Break
#> 3 2016-02-05 Non_Break
#> 4 2016-02-05 Non_Break
#> 5 2016-02-05 Non_Break
#> 6 2016-02-05 Non_Break
#> 7 2016-02-26 Spring_Break
#> 8 2016-10-07 Mid_Term_Break
#> 9 2018-03-30 Easter_Recess
#> 10 2018-07-12 Non_Break
#> 11 2018-07-12 Non_Break
#> 12 2018-07-12 Non_Break
#> 13 2018-07-12 Non_Break
#> 14 2018-07-12 Non_Break
#> 15 2018-07-12 Non_Break
Created on 2018-08-19 by the reprex package (v0.2.0).
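Since the question also asks about Python, here is a rough standard-library sketch of the same idea as the R code above: expand every break into its individual days (the analogue of the LU lookup table), look each date up, and fall back to Non_Break when there is no match. The names BREAK_ROWS, build_lookup and df_dates are illustrative; the data is the same sample used in the R example.

```python
from datetime import date, timedelta

# Same break table as Break_data: Start, End, Break (both ends inclusive).
BREAK_ROWS = """\
2016-02-24 2016-02-29 Spring_Break
2016-03-23 2016-03-28 Easter_Recess
2016-10-05 2016-10-10 Mid_Term_Break
2017-03-01 2017-03-06 Spring_Break
2017-04-12 2017-04-17 Easter_Recess
2017-10-04 2017-10-09 Mid_Term_Break
2018-02-28 2018-03-05 Spring_Break
2018-03-28 2018-04-02 Easter_Recess
"""


def build_lookup(rows):
    """Map every single day inside a break to that break's name."""
    lookup = {}
    for line in rows.splitlines():
        if not line.strip():
            continue
        start_s, end_s, name = line.split()
        start = date.fromisoformat(start_s)
        end = date.fromisoformat(end_s)
        # Expand the inclusive range start..end into individual days.
        for offset in range((end - start).days + 1):
            lookup[start + timedelta(days=offset)] = name
    return lookup


lookup = build_lookup(BREAK_ROWS)

# Same sample dates as the R data frame df.
df_dates = (["2016-02-05"] * 6
            + ["2016-02-26", "2016-10-07", "2018-03-30"]
            + ["2018-07-12"] * 6)

# Dates with no entry in the lookup get the default label "Non_Break".
labels = [lookup.get(date.fromisoformat(s), "Non_Break") for s in df_dates]
for s, label in zip(df_dates, labels):
    print(s, label)
```

On this sample it produces the same labels as the R output above: Non_Break everywhere except rows 7 to 9.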
Edit: complete solution