Cross fit generalized linear models
A data frame
A list of formulas to apply to each subset of the data.
If named, these names will be used in the model column of the output.
Otherwise, the formulas will be converted to strings in the model column.
Columns to subset the data.
Can be any expression supported by <tidy-select>.
If NULL, the data is not subset into columns.
Defaults to NULL.
A list of columns passed to weights in fn.
If one of the elements is NULL or NA, that model will not be weighted.
Defaults to NULL.
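As a rough illustration of what weighting a single model means, the base-R sketch below calls glm() directly rather than this function. Using the wt column of mtcars as a weight is purely an assumption made for the example; it is not a recommendation from this documentation.

```r
# Unweighted fit: weights = NULL behaves like omitting the argument.
unweighted <- glm(am ~ gear, data = mtcars, weights = NULL)

# Weighted fit: wt is evaluated inside `data` and used as observation weights.
# (wt is an arbitrary column picked only to illustrate the mechanism.)
weighted <- glm(am ~ gear, data = mtcars, weights = wt)

# Reference fit with the weights argument left out entirely.
no_weights_arg <- glm(am ~ gear, data = mtcars)

print(coef(unweighted))
print(coef(weighted))
```

The two printed coefficient vectors differ, which is the effect a non-NULL element in the weights list would have on the corresponding model.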
A list of glm model families.
Defaults to gaussian("identity"), the equivalent of lm().
See family for examples.
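The equivalence between the default family and lm() can be checked with base R alone. The sketch below does not call this function; it fits one formula both ways on mtcars and compares the coefficient estimates.

```r
# Fit the same model with glm() (gaussian family, identity link) and with lm().
glm_fit <- glm(am ~ gear, data = mtcars, family = gaussian("identity"))
lm_fit  <- lm(am ~ gear, data = mtcars)

# The coefficient estimates agree up to numerical tolerance.
coefs_match <- isTRUE(all.equal(coef(glm_fit), coef(lm_fit)))
print(coefs_match)
```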
A list of additional arguments to glm().
A logical or function to use to tidy model output into data.frame columns.
If TRUE, uses the default tidying function: tidy_glance().
If FALSE, NA, or NULL, the untidied model output will be returned in
a list column named fit.
An alternative function can be specified with an unquoted function name or
a purrr-style lambda function with one argument (see usage
with broom::tidy(conf.int = TRUE) in examples).
Defaults to tidy_glance.
A list of additional arguments to the tidy function.
If "stop", the default, the function will stop and return an
error if any subset produces an error.
If "warn", the function will produce a warning for subsets that produce
an error and return results for all subsets that do not.
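The "warn" behavior described above can be pictured with a small base-R sketch. This is not this function's implementation: fit_or_warn() is a hypothetical helper written for this example, and the choice to split mtcars by cyl is an assumption made so that one subset fails (every 8-cylinder car has vs = 0, so factor(vs) has a single level there).

```r
# Hypothetical helper: fit one subset, turning an error into a warning.
fit_or_warn <- function(data, formula) {
  tryCatch(
    glm(formula, data = data, family = gaussian("identity")),
    error = function(e) {
      warning("Subset failed: ", conditionMessage(e), call. = FALSE)
      NULL  # placeholder for a subset that could not be fit
    }
  )
}

# Split by cyl; the "8" subset cannot be fit with a one-level factor predictor.
subsets <- split(mtcars, mtcars$cyl)
fits <- lapply(subsets, fit_or_warn, formula = mpg ~ factor(vs))

# Keep only the subsets that were fit successfully.
succeeded <- Filter(Negate(is.null), fits)
print(names(succeeded))
```

Under "stop", the equivalent of the failing subset would abort the whole call instead of being skipped.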
A tibble with a column for the model formula,
columns for subsets,
columns for the model family and type,
columns for the weights (if applicable),
and columns of tidy model output or a list column of models
(if tidy = FALSE)
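To make the shape of the result concrete, here is a base-R sketch of the crossing idea: every formula is paired with every family and every subset. It is an illustration under assumptions, not this function's implementation; the grid layout and the object names are invented for the example.

```r
formulas <- list(am ~ gear, am ~ cyl)
families <- list(gaussian("identity"), binomial("logit"))
subsets  <- split(mtcars, mtcars$vs)

# One row per formula x family x subset combination.
grid <- expand.grid(
  formula = seq_along(formulas),
  family  = seq_along(families),
  subset  = names(subsets),
  stringsAsFactors = FALSE
)

# Fit one glm per combination.
fits <- lapply(seq_len(nrow(grid)), function(i) {
  suppressWarnings(  # the binomial fits on these small subsets emit warnings
    glm(
      formulas[[grid$formula[i]]],
      family = families[[grid$family[i]]],
      data   = subsets[[grid$subset[i]]]
    )
  )
})

# Each fit contributes one row per coefficient (intercept and slope).
n_coef_rows <- sum(lengths(lapply(fits, coef)))
print(c(models = length(fits), coefficient_rows = n_coef_rows))
```

With two formulas, two families, and two values of vs, this gives 8 models and 16 coefficient rows.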
See cross_fit() to use any modeling function.
cross_fit_glm(
  data     = mtcars,
  formulas = list(am ~ gear, am ~ cyl),
  cols     = vs,
  families = list(gaussian("identity"), binomial("logit"))
)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 16 × 17
#> model family link vs term estimate std.e…¹ statis…² p.value null.…³
#> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 am ~ gear gaussi… iden… 0 (Int… -1.57 1.69e-1 -9.28e+0 7.73e-8 4
#> 2 am ~ gear gaussi… iden… 0 gear 0.536 4.64e-2 1.15e+1 3.59e-9 4
#> 3 am ~ gear gaussi… iden… 1 (Int… -1.58 9.07e-1 -1.74e+0 1.08e-1 3.5
#> 4 am ~ gear gaussi… iden… 1 gear 0.538 2.33e-1 2.31e+0 3.95e-2 3.5
#> 5 am ~ gear binomi… logit 0 (Int… -177. 4.09e+5 -4.34e-4 1.00e+0 22.9
#> 6 am ~ gear binomi… logit 0 gear 50.4 1.16e+5 4.36e-4 1.00e+0 22.9
#> 7 am ~ gear binomi… logit 1 (Int… -74.8 1.28e+4 -5.82e-3 9.95e-1 19.4
#> 8 am ~ gear binomi… logit 1 gear 18.8 3.21e+3 5.85e-3 9.95e-1 19.4
#> 9 am ~ cyl gaussi… iden… 0 (Int… 2.54 5.65e-1 4.51e+0 3.58e-4 4
#> 10 am ~ cyl gaussi… iden… 0 cyl -0.297 7.50e-2 -3.96e+0 1.12e-3 4
#> 11 am ~ cyl gaussi… iden… 1 (Int… 2.10 5.77e-1 3.64e+0 3.38e-3 3.5
#> 12 am ~ cyl gaussi… iden… 1 cyl -0.35 1.24e-1 -2.83e+0 1.52e-2 3.5
#> 13 am ~ cyl binomi… logit 0 (Int… 79.7 1.52e+4 5.26e-3 9.96e-1 22.9
#> 14 am ~ cyl binomi… logit 0 cyl -10.2 1.89e+3 -5.38e-3 9.96e-1 22.9
#> 15 am ~ cyl binomi… logit 1 (Int… 39.7 6.52e+3 6.08e-3 9.95e-1 19.4
#> 16 am ~ cyl binomi… logit 1 cyl -9.71 1.63e+3 -5.95e-3 9.95e-1 19.4
#> # … with 7 more variables: df.null <int>, logLik <dbl>, AIC <dbl>, BIC <dbl>,
#> # deviance <dbl>, df.residual <int>, nobs <int>, and abbreviated variable
#> # names ¹std.error, ²statistic, ³null.deviance