| Title: | A Comprehensive Set of Functions to Clean, Analyze, and Present Crime Data |
|---|---|
| Description: | A collection of functions that make it easier to understand crime (or other) data, and assist others in understanding it. The package helps you read data from various sources, clean it, fix column names, and graph the data. |
| Authors: | Jacob Kaplan [aut, cre] (ORCID: <https://orcid.org/0000-0002-0601-0387>) |
| Maintainer: | Jacob Kaplan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.1 |
| Built: | 2026-05-14 06:19:47 UTC |
| Source: | https://github.com/jacobkap/crimeutils |
Capitalizes the first letter of every word
capitalize_words(words, lowercase_of = TRUE)
words |
A string or vector of strings with words you want capitalized |
lowercase_of |
If TRUE (default), keeps the word " of " in lowercase, as is customary in English writing (e.g. District of Columbia). |
The original string with the first letter of each word capitalized
capitalize_words("district of columbia")
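The capitalization rule described above can be approximated in base R. This is only a minimal sketch of the idea, not the package's implementation; the helper name `capitalize_sketch` is hypothetical.

```r
# Hypothetical helper for illustration; not part of crimeutils.
capitalize_sketch <- function(words, lowercase_of = TRUE) {
  # Uppercase the first letter that follows the start of the string or a space.
  out <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", words, perl = TRUE)
  if (lowercase_of) {
    # Put " of " back into lowercase when it sits between two words.
    out <- gsub(" Of ", " of ", out, fixed = TRUE)
  }
  out
}

capitalize_sketch("district of columbia")
capitalize_sketch("district of columbia", lowercase_of = FALSE)
```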
Creates new columns to indicate which values are outliers based on the average value.
indicate_outliers(data, select_columns = NULL, group_variable, std_dev_value = 1.96, zero_is_outlier = FALSE)
data |
A data.frame |
select_columns |
A string or vector of strings with the name(s) of the numeric columns to check for outliers. If NULL (default), will use all numeric columns in the data. |
group_variable |
A string with the name of the column with the grouping variable. |
std_dev_value |
A number indicating how many standard deviations away from the mean to determine if a value is an outlier. |
zero_is_outlier |
If TRUE (not default), reports any zero value as an outlier. |
The initial data.frame with new columns for each numeric variable included with a value of 0 if not an outlier and 1 if that row is an outlier.
indicate_outliers(mtcars, "drat", group_variable = "am")
indicate_outliers(mtcars, "drat", group_variable = "am", zero_is_outlier = TRUE)
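To make the outlier rule concrete, here is a base-R sketch that flags values lying more than `std_dev_value` standard deviations from the mean. It assumes the mean and standard deviation are computed within each group, which the `group_variable` argument suggests but the text above does not state outright. The helper name `flag_outliers_sketch` is hypothetical.

```r
# Hypothetical helper illustrating the rule; not the package's code.
flag_outliers_sketch <- function(values, groups, std_dev_value = 1.96,
                                 zero_is_outlier = FALSE) {
  # ave() applies FUN within each group and returns a vector aligned
  # with the input rows.
  group_mean <- ave(values, groups, FUN = function(v) mean(v, na.rm = TRUE))
  group_sd   <- ave(values, groups, FUN = function(v) sd(v, na.rm = TRUE))
  is_outlier <- abs(values - group_mean) > std_dev_value * group_sd
  if (zero_is_outlier) {
    # Optionally treat every zero as an outlier as well.
    is_outlier <- is_outlier | values == 0
  }
  as.integer(is_outlier)  # 1 = outlier, 0 = not an outlier
}

cars_copy <- mtcars
cars_copy$drat_outlier <- flag_outliers_sketch(cars_copy$drat, cars_copy$am)
table(cars_copy$am, cars_copy$drat_outlier)
```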
Create a line graph with 95% confidence interval bars
make_average_graph(data, x_col, y_col, confidence_interval_error_bars = TRUE, mean_line = TRUE, type = c("line", "bar"))
data |
A data.frame with the data you want to graph |
x_col |
A string with the name of the x-axis column |
y_col |
A string with the name of the y-axis column |
confidence_interval_error_bars |
A boolean (default TRUE) for whether to include 95% confidence intervals or not. |
mean_line |
If TRUE (default), will add a dashed line with the overall mean. |
type |
A string for whether it should make a line graph ("line", default) or a bar graph ("bar"). |
A ggplot object. Also prints the graph to the Plots panel.
data = data.frame(x = sample(15:25, size = 200, replace = TRUE), y = sample(1:100, size = 200, replace = TRUE))
make_average_graph(data, "x", "y")
make_average_graph(data, "x", "y", confidence_interval_error_bars = FALSE)
make_average_graph(data, "x", "y", type = "bar", mean_line = FALSE)
make_average_graph(data, "x", "y", confidence_interval_error_bars = FALSE, type = "bar")
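For readers who want to see the numbers behind such a graph, the sketch below computes the mean of `y` and a 95% confidence interval for each value of `x` in base R. It assumes a normal-approximation interval (mean plus or minus 1.96 standard errors); the text above does not say how the package computes its interval, so treat this as an illustration only.

```r
set.seed(1)  # make the random example reproducible
data <- data.frame(x = sample(15:25, size = 200, replace = TRUE),
                   y = sample(1:100, size = 200, replace = TRUE))

# One entry per x value: mean of y and the assumed normal-approximation CI.
summary_rows <- lapply(split(data$y, data$x), function(y) {
  se <- sd(y) / sqrt(length(y))
  c(mean = mean(y), lower = mean(y) - 1.96 * se, upper = mean(y) + 1.96 * se)
})

ci_table <- data.frame(x = as.numeric(names(summary_rows)),
                       do.call(rbind, summary_rows))
ci_table
```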
Make a nice-looking barplot.
make_barplots(data, column, count = TRUE, title = NULL, ylab = NULL)
data |
A data.frame with the data you want to graph. |
column |
A string with the name of the column you want to make the plot from. |
count |
A boolean (default TRUE) indicating if you want the barplot to show a count of the column values or a percent. |
title |
A string with the text you want as the title. |
ylab |
A string with the text you want as the y-axis label. |
A barplot object.
make_barplots(mtcars, "cyl")
make_barplots(mtcars, "cyl", count = FALSE, title = "hello", ylab = "YLAB Label")
Create a descriptive statistics table from numeric variables
make_desc_stats_table(data, columns, output = c("min", "median", "mean", "sd", "max", "sum", "NAs"), decimals = 2, title = NULL, subtitle = NULL, footnote = NULL)
data |
A data.frame with the data you want to make the table from. |
columns |
A string or vector of strings with the names of the columns you want to use. |
output |
A string or vector of strings indicating which math functions you want to perform on the columns and present in the table. Options are: 'min', 'median', 'mean', 'sd', 'max', and 'N'. Default is to use all of these math functions. The order you put in these values is the order the table will present the columns. |
decimals |
A positive integer for how many decimal places you want to round to. |
title |
A string with the text you want as the title |
subtitle |
A string with the text you want as the subtitle. |
footnote |
A string with the text you want as the footnote. |
A data.frame with the data that generates the table, which is outputted in the Viewer tab.
make_desc_stats_table(mtcars, columns = c("mpg", "disp", "wt", "cyl"))
make_desc_stats_table(mtcars, c("mpg", "disp", "wt"), output = c("mean", "min"), decimals = 4, title = "hello", subtitle = "world")
Creates a .tex file with LaTeX code to create a table from an R data.frame.
make_latex_tables(data, file, caption = "", label = "", multi_column = NULL, footnote = "", sideways = FALSE, longtable = FALSE)
data |
A data.frame or a list of data.frames. If a data.frame, the table is created with the values in that data.frame. If a list of data.frames, the table gets one panel for each data.frame. If the list is named, will use the names to create panel labels. |
file |
A string with the name of the file to save the .tex as. |
caption |
(Optional) A string with the caption for the table (i.e. the table title). |
label |
(Optional) A string with the reference for the table - to be used when referencing the table in the text. If NULL, |
multi_column |
(Optional) A named vector with the names being the names of the multi-column and the values being the width of the multi-column. |
footnote |
(Optional) A string with text for the footnote of the table. |
sideways |
(Optional) If TRUE, will make a sideways table (useful for large tables), otherwise (default) will make a normal table. |
longtable |
(Optional) If TRUE, will make a longtable table (useful for long tables), otherwise (default) will make a normal table. |
Nothing. It will create a .tex file in the current working directory.
## Not run:
make_latex_tables(mtcars, file = "text.tex", caption = "This is a description of the table", label = "internal_table_label", footnote = "Here is some info you should know to read this table", longtable = TRUE)
## End(Not run)
Create a table showing the mean, median, and mode of a certain column
make_mean_median_mode_table_by_group(data, group_column, data_column, total_row = TRUE)
data |
A data.frame with the data you want to make the table from. |
group_column |
A string with the name of the variable you are grouping by |
data_column |
A string for the variable you want to get the mean, median, and mode from. The variable should be numeric. |
total_row |
A boolean (default TRUE) for whether to include a row at the bottom for the overall mean, median, and mode (i.e. not by group). |
A data.frame with the first column showing the category grouped by. Then one column for the mean, one column for the median, and one column for the mode.
make_mean_median_mode_table_by_group(mtcars, "gear", "mpg")
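Base R has no built-in function for the statistical mode, so a table like this needs a small helper. The sketch below shows one way to compute the three statistics by group. The helper `stat_mode` is hypothetical, and its tie-breaking rule (the smallest value wins) is an assumption rather than documented package behaviour.

```r
# Hypothetical helper: most frequent value; ties go to the smallest value.
stat_mode <- function(v) {
  counts <- table(v)  # names are sorted in increasing numeric order
  as.numeric(names(counts)[which.max(counts)])
}

# One row per gear value with the mean, median, and mode of mpg.
by_gear <- do.call(rbind, lapply(split(mtcars$mpg, mtcars$gear), function(v) {
  data.frame(mean = mean(v), median = median(v), mode = stat_mode(v))
}))
by_gear
```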
Get mean and standard deviation of variables by group
make_mean_std_dev_by_group_table(data, group_column, columns, total_row = TRUE)
data |
A data.frame with the data you want to make the table from. |
group_column |
A string with the name of the variable you are grouping by |
columns |
A string or vector of strings for the variables you want to get the mean and standard deviation for. |
total_row |
A boolean (default TRUE) for whether to include a row at the bottom for the overall mean and standard deviation (i.e. not by group). |
A data.frame with the first column showing the category grouped by. Then one column for each variable you want the mean and standard deviation for. Will give the mean and standard deviation as a single string with the standard deviation in parentheses.
make_mean_std_dev_by_group_table(mtcars, "gear", c("mpg", "disp"))
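The "mean with the standard deviation in parentheses" format can be reproduced for a single column with base R's `aggregate()`. This is an illustrative sketch; the two-decimal rounding is an assumption, not a documented detail of the package.

```r
# Mean and standard deviation of mpg by gear, as one string per group.
group_summary <- aggregate(mpg ~ gear, data = mtcars, FUN = function(v) {
  sprintf("%.2f (%.2f)", mean(v), sd(v))
})
group_summary
```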
Make a table showing the number (n) and percent of the population (e.g. % of nrow()) for each value in a variable(s).
make_n_and_percent_table(data, columns)
data |
A data.frame with the data you want to make the table from. |
columns |
A string or vector of strings with the column names to make the N and % from. |
A data.frame with one row for each value in the inputted variable(s) and columns showing the N and % for that value.
make_n_and_percent_table(mtcars, c("cyl", "gear"))
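As a point of comparison, the N and % for one column can be computed directly in base R. This sketch only illustrates the calculation (percent of `nrow()`); the column names in `n_pct` are made up for the example.

```r
# Count each value of cyl and express it as a percent of all rows.
counts <- table(mtcars$cyl)
n_pct <- data.frame(value   = names(counts),
                    n       = as.vector(counts),
                    percent = round(100 * as.vector(counts) / nrow(mtcars), 2))
n_pct
```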
Make a graph of coefficient values and 95 percent confidence interval for regression.
make_regression_graph(model, coefficients = NULL)
model |
An 'lm' object created with 'lm()'. |
coefficients |
A string or vector of strings with the coefficient names. Will then make the graph only with those coefficients. |
Outputs a 'ggplot2' graph
make_regression_graph(model = lm(mpg ~ cyl + disp + hp + drat, data = mtcars))
make_regression_graph(model = lm(mpg ~ cyl + disp + hp + drat, data = mtcars), coefficients = c("cyl", "disp"))
make_regression_graph(model = lm(mpg ~ cyl + disp, data = mtcars))
Turns regression results into a data.frame for easy conversion to a table
make_regression_table(model, coefficients_only = TRUE)
model |
An 'lm' object created with 'lm()'. |
coefficients_only |
If TRUE (default), returns only the coefficients, standard error, t-value, p-value, and confidence intervals. Otherwise also returns the r-squared, the adjusted r-squared, the f-stat, the p-value for the f-stat, and the degrees of freedom. |
A data.frame with the regression results
make_regression_table(lm(mpg ~ cyl, data = mtcars))
make_regression_table(lm(mpg ~ cyl, data = mtcars), coefficients_only = FALSE)
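The quantities listed above are all available from a fitted `lm` object through base R. The sketch below shows where each one comes from. It is not the package's implementation, and its column layout will differ from the package's output.

```r
model <- lm(mpg ~ cyl, data = mtcars)
model_summary <- summary(model)

# Coefficients, standard errors, t-values, p-values, and 95% confidence intervals.
coef_table <- data.frame(model_summary$coefficients, confint(model),
                         check.names = FALSE)
coef_table

# Model-level statistics corresponding to coefficients_only = FALSE.
model_summary$r.squared
model_summary$adj.r.squared
model_summary$fstatistic  # F value, numerator df, denominator df
```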
Make a nice-looking stat_count (similar to barplot) plot.
make_stat_count_plots(data, column, count = TRUE, title = NULL, ylab = NULL, xlab = NULL)
data |
A data.frame with the data you want to graph. |
column |
A string with the name of the column you want to make the plot from. |
count |
A boolean (default TRUE) indicating if you want the barplot to show a count of the column values or a percent. |
title |
A string with the text you want as the title. |
ylab |
A string with the text you want as the y-axis label. |
xlab |
A string with the text you want as the x-axis label. |
A stat_count object
make_stat_count_plots(mtcars, "mpg")
make_stat_count_plots(mtcars, "mpg", count = FALSE, title = "hello", ylab = "YLAB Label")
Returns abbreviations of state name input.
make_state_abb(state)
state |
A vector of strings with the names of US states. |
A vector of strings with the abbreviations of the inputted state names.
make_state_abb("california")
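A similar lookup can be sketched with the `state.name` and `state.abb` vectors that ship with base R. Those vectors cover only the 50 states, and whether the package also handles other jurisdictions is not stated here, so this is an approximation. The helper name `state_abb_sketch` is hypothetical.

```r
# Hypothetical helper: case-insensitive lookup against base R's state vectors.
state_abb_sketch <- function(state) {
  state.abb[match(tolower(state), tolower(state.name))]
}

state_abb_sketch(c("california", "New York"))
```

Names that are not found come back as `NA`.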
Pad decimal places with trailing zeros.
pad_decimals(numbers, digits = NULL)
numbers |
A number or vector of numbers. |
digits |
Number of decimal places to pad. If NULL (default), uses the maximum number of decimal places in the numbers input. If digits is less than the number of decimal places in the data, rounds the data to the decimal place specified. If rounding at a 5, follows R's rules to round to the nearest even number. |
The original numbers, now as strings with trailing zeros added to the decimal places.
pad_decimals(c(2, 3.4, 8.808))
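Two base R behaviours are relevant to the description above: `formatC()` can pad numbers with trailing zeros, and `round()` sends an exact half to the nearest even number. The sketch below demonstrates both. It illustrates the behaviour and is not necessarily how the package pads its values.

```r
numbers <- c(2, 3.4, 8.808)

# Pad every number to three decimal places; the result is a character vector.
padded <- formatC(numbers, format = "f", digits = 3)
padded

# R rounds an exact .5 to the nearest even number.
round(c(0.5, 1.5, 2.5))
```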
A set of colorblind friendly colors for graphs.
scale_color_crim(...)
... |
Arguments passed to discrete_scale() |
The ggplot graph with colors set.
ggplot2::ggplot(mtcars, ggplot2::aes(x = mpg, y = hp, color = as.character(cyl))) +
  ggplot2::geom_point(size = 2) +
  scale_color_crim()
A set of colorblind friendly fill colors for graphs.
scale_fill_crim(...)
... |
Arguments passed to discrete_scale() |
The ggplot graph with fills set.
ggplot2::ggplot(mtcars, ggplot2::aes(x = cyl, fill = as.character(cyl))) +
  ggplot2::geom_bar() +
  scale_fill_crim()
A set of linetypes
scale_linetype_crim(...)
... |
Arguments passed to discrete_scale() |
The ggplot graph with linetypes set.
ggplot2::ggplot(mtcars, ggplot2::aes(x = mpg, y = hp, linetype = as.character(cyl))) +
  ggplot2::geom_line(size = 1) +
  scale_linetype_crim() +
  theme_crim()
Create a PDF with one scatterplot for each group in the data.
scatterplot_data_graph(data, numeric_variable1, numeric_variable2, group_variable, file_name)
data |
A data.frame with the data you want to graph. |
numeric_variable1 |
A string with the name of the first column with numeric data to graph. |
numeric_variable2 |
A string with the name of the second column with numeric data to graph. |
group_variable |
A string with the name of the column with the grouping variable. |
file_name |
A string with the name of the PDF to be made with one page for each graph. |
A PDF with one page per graph
## Not run:
scatterplot_data_graph(mtcars, numeric_variable1 = "mpg", numeric_variable2 = "disp", group_variable = "gear", file_name = "test.pdf")
## End(Not run)
A minimalist theme designed for graphics in academic research
theme_crim()
The graph with the theme changed.
ggplot2::ggplot(mtcars) +
  ggplot2::geom_point(ggplot2::aes(x = wt, y = mpg)) +
  theme_crim()
Create a PDF with one time-series graph for each group in the data.
time_series_data_graph(data, numeric_variable, time_variable, group_variable, outlier_std_dev_value = 1.96, file_name)
data |
A data.frame with the data you want to graph. |
numeric_variable |
A string with the name of the column with numeric data to graph. |
time_variable |
A string with the name of the column that contains the time variable. |
group_variable |
A string with the name of the column with the grouping variable. |
outlier_std_dev_value |
A number that indicates how many standard deviations from the group mean an outlier is. Outliers will be colored orange in the data. |
file_name |
A string with the name of the PDF to be made with one page for each graph. |
A PDF with one page per graph
Get ORIs that consistently report their data every year.
ucr_constant_reporter_oris(data, minimum_months_reported)
data |
A data.frame with Uniform Crime Report (UCR) data. Requires at least the ORI, year, and number_of_months_reported columns. |
minimum_months_reported |
Integer indicating the minimum number of months an agency must report in a year to be kept in the data. |
A vector with the ORIs that report the minimum number of months for every year in the data.
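The filtering logic can be sketched in base R on a small made-up data set. Everything in the example is hypothetical: the ORI values are invented, and the lowercase column name `ori` is an assumption about how the ORI column is spelled. The idea is to keep only agency-years that meet the threshold and then retain the ORIs that still appear in every year.

```r
# Hypothetical UCR-style data: three agencies over three years.
ucr <- data.frame(
  ori  = rep(c("A1", "B2", "C3"), each = 3),
  year = rep(2018:2020, times = 3),
  number_of_months_reported = c(12, 12, 12,   # A1 reports all months every year
                                12,  6, 12,   # B2 falls short in 2019
                                12, 12, 11)   # C3 falls short in 2020
)

minimum_months_reported <- 12
n_years <- length(unique(ucr$year))

# Keep agency-years that meet the threshold, then count years per ORI.
kept <- ucr[ucr$number_of_months_reported >= minimum_months_reported, ]
years_per_ori <- tapply(kept$year, kept$ori, function(y) length(unique(y)))

# ORIs that meet the threshold in every year present in the data.
constant_oris <- names(years_per_ori)[years_per_ori == n_years]
constant_oris
```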