I started getting a new message (see post title) when running `group_by()` and `summarise()` after updating to dplyr development version 0.8.99.9003.
Here is an example to recreate the output:
```r
library(tidyverse)
library(hablar)  # provides convert(), chr() and num()

df <- read_csv("year, week, rat_house_females, rat_house_males, mouse_wild_females, mouse_wild_males
2018,10,1,1,1,1
2018,10,1,1,1,1
2018,11,2,2,2,2
2018,11,2,2,2,2
2019,10,3,3,3,3
2019,10,3,3,3,3
2019,11,4,4,4,4
2019,11,4,4,4,4") %>%
  # make year/week character so select_if(is.numeric) skips them
  convert(chr(year,week)) %>%
  # row-wise total over the remaining numeric (rodent count) columns
  mutate(total_rodents = rowSums(select_if(., is.numeric))) %>%
  # turn year/week back into numbers
  convert(num(year,week)) %>%
  group_by(year,week) %>% summarise(average = mean(total_rodents))
```
The output tibble is correct, but this message appears:
`summarise()` regrouping output by 'year' (override with `.groups` argument)
How should this be interpreted? Why does it report regrouping only by 'year' when I grouped by both year and week? Also, what does it mean to override and why would I want to do that?
I don't think the message indicates a problem because it appears throughout the dplyr vignette:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
I believe it is a new message because it has only appeared in very recent SO questions such as "How to melt pairwise.wilcox.test output using dplyr?" and "R Aggregate over multiple columns" (neither of which addresses the regrouping/override message).
Thank you!
Best Answer
It is just a friendly message. By default, if the data is grouped before the `summarise`, it drops one grouping variable, namely the last one specified in the `group_by`. If there is only one grouping variable, there won't be any grouping attribute left after the `summarise`. If there is more than one (here it is two), the grouping is reduced by one, i.e. the result is still grouped by 'year'.

As a reproducible example (grouping `mtcars` by `am`): when there is a single `group_by` variable, the message says it is ungrouping, i.e. it drops that grouping after the `summarise`. When grouping by `am` plus a second variable, it drops the last grouping and regroups by 'am'.
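A minimal sketch of those two cases, assuming the example uses the built-in `mtcars` data; the original example code is not shown here, and `vs` as the second grouping variable is my own choice. `group_vars()` reports which grouping survives the `summarise`. The exact message text differs between dplyr versions, so only the resulting grouping is checked:

```r
library(dplyr)

# One grouping variable: "drop_last" peels it off, so the result is ungrouped
by_am <- mtcars %>%
  group_by(am) %>%
  summarise(mpg = mean(mpg))

# Two grouping variables: only the last one (vs, an illustrative choice)
# is peeled off, so the result stays grouped by 'am'
by_am_vs <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = mean(mpg))

group_vars(by_am)     # no grouping left
group_vars(by_am_vs)  # "am"
```

The question's `group_by(year, week)` behaves like `by_am_vs`: 'week' is dropped and the output stays grouped by 'year'.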
If we check `?summarise`, there is a `.groups` argument, which by default is `"drop_last"`; the other options are `"drop"`, `"keep"` and `"rowwise"`. If we set `.groups` explicitly in `summarise` (e.g. `.groups = "drop"`, which removes the group attributes), we don't get the message.

Previously, this message was not issued, which could lead to situations where the user does a `mutate` or something else assuming there is no grouping, and gets unexpected output. Now the message warns us to be careful because a grouping attribute is still present.

NOTE: `.groups` is currently `experimental` in its lifecycle, so its behaviour could be modified in future releases. Depending on whether we need further transformations of the data based on the same grouping variables, we can choose between the different options in `.groups`.
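A minimal sketch comparing the `.groups` options on the question's data, typed in directly as a tibble (with `total_rodents` already computed) so hablar is not needed; the object names are illustrative. Supplying `.groups` explicitly, whatever its value, also silences the message. The last part shows why the leftover 'year' grouping matters: a later `mutate` computes `sum()` within each year instead of over the whole table.

```r
library(dplyr)

# The question's data after computing total_rodents
rodents <- tibble(
  year = rep(c(2018, 2019), each = 4),
  week = rep(c(10, 10, 11, 11), times = 2),
  total_rodents = c(4, 4, 8, 8, 12, 12, 16, 16)
)
grouped <- group_by(rodents, year, week)

res_drop_last <- summarise(grouped, average = mean(total_rodents), .groups = "drop_last")
res_drop      <- summarise(grouped, average = mean(total_rodents), .groups = "drop")
res_keep      <- summarise(grouped, average = mean(total_rodents), .groups = "keep")
res_rowwise   <- summarise(grouped, average = mean(total_rodents), .groups = "rowwise")

group_vars(res_drop_last)  # still grouped by year (the default behaviour)
group_vars(res_drop)       # no grouping left
group_vars(res_keep)       # year and week

# Leftover 'year' grouping: shares add up to 1 within each year
share_by_year <- mutate(res_drop_last, share = average / sum(average))
# No grouping: shares add up to 1 over the whole table
share_overall <- mutate(res_drop, share = average / sum(average))
```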