Title: | Dynamic Modeling and Machine Learning Environment |
---|---|
Description: | Estimates, predicts and forecasts dynamic models and computes Machine Learning metrics that assist in model selection for further analysis. The package also provides tools and metrics that are useful in machine learning and modeling, for example, quick summaries, per cent signs, Mallow's Cp and others. The ecosystem of this package is the analysis of economic data for national development. The package is stable and offers high reliability, efficiency and time savings. |
Authors: | Job Nmadu [aut, cre] |
Maintainer: | Job Nmadu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 11.11.24 |
Built: | 2024-10-14 03:25:57 UTC |
Source: | https://github.com/JobNmadu/Dyn4cast |
This function estimates the lower and upper 80% and 95% forecasts of the Model. The final values are within the lower and upper limits of the base data. Used in conjunction with the scaledlogit and invscaledlogit functions, it is adapted from Hyndman & Athanasopoulos (2021) and modified for independent use rather than being restricted to a particular package.
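For orientation, the Hyndman & Athanasopoulos (2021) transformation pair amounts to a few lines of base R. The helper names below are illustrative, not the package's internals; this is a minimal sketch of the arithmetic:

scaled <- function(x, lower, upper) log((x - lower) / (upper - x))  # forward map onto the real line
unscaled <- function(x, lower, upper) (upper - lower) * exp(x) / (1 + exp(x)) + lower  # back-transform
x <- c(5, 18, 30)
all.equal(unscaled(scaled(x, 1, 37), 1, 37), x)  # TRUE: the round trip stays within (1, 37)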
constrainedforecast(Model, lower, upper)
Model |
This is the exponential values from the forecast of the scaledlogit-transformed model (see the example below). |
lower |
The lower limit of the forecast |
upper |
The upper limit of the forecast |
A list of forecast values within the 80% and 95% confidence bands. The values are:
Lower 80% |
Forecast at lower 80% confidence level. |
Upper 80% |
Forecast at upper 80% confidence level. |
Lower 95% |
Forecast at lower 95% confidence level. |
Upper 95% |
Forecast at upper 95% confidence level. |
library(Dyn4cast)
library(splines)
library(forecast)

lower <- 1
upper <- 37

Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)

FitModel <- scaledlogit(x = fitted.values(Model), lower = lower,
                        upper = upper)
ForecastModel <- forecast(FitModel, h = length(200))
ForecastValues <- constrainedforecast(Model = ForecastModel, lower, upper)
This is a custom plot of a correlation matrix in which the coefficients are displayed along with graphics showing the magnitude of each coefficient.
corplot(r)
r |
Correlation matrix of the data for the plot |
The function returns a custom plot of the correlation matrix
corplot |
The custom plot of the correlation matrix |
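A minimal usage sketch, assuming any numeric dataset; mtcars here is only a stand-in for the user's data:

library(Dyn4cast)
r <- cor(mtcars[, 1:5])  # correlation matrix of five numeric variables
corplot(r)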
Transform data.frame for comparable Machine Learning prediction and visualization

Often, economic and other Machine Learning data are of different units or magnitudes, making estimation, interpretation or visualization difficult. These issues can be handled if the data are transformed into unitless values or values of similar magnitude. This is what data_transform is set up to do. It is simple and straightforward to use.
data_transform(data, method, MARGIN = 2)
data |
A data.frame of the data to be transformed. |
method |
The type of transformation. There are three options: 1 = min-max, 2 = log and 3 = Mean-SD (see the examples below). |
MARGIN |
Option to either transform the data column-wise (MARGIN = 2, the default) or row-wise (MARGIN = 1). |
This function returns the output of the data transformation process as
data_transformed |
A new data.frame of the transformed data. |
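For reference, the three methods, as used in the examples below (1 = min-max, 2 = log, 3 = Mean-SD), correspond to the following column-wise arithmetic; a hand-rolled sketch, not the package code:

x <- c(10, 250, 4300)
(x - min(x)) / (max(x) - min(x))  # method 1: min-max, rescales to [0, 1]
log(x)                            # method 2: log, compresses magnitudes
(x - mean(x)) / sd(x)             # method 3: Mean-SD, standardises to z-scores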
library(Dyn4cast)

# View the data without transformation
data0 <- Transform %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data0, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 1: Transformation by the min-max method.
# You could also transform the `X` column, but it is better not to.
data1 <- data_transform(Transform[, -1], 1)
data1 <- cbind(Transform[, 1], data1)
data1 <- data1 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data1, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 2: `log` transformation
data2 <- data_transform(Transform[, -1], 2)
data2 <- cbind(Transform[, 1], data2)
data2 <- data2 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data2, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 3: `Mean-SD` transformation
data3 <- data_transform(Transform[, -1], 3)
data3 <- cbind(Transform[, 1], data3)
data3 <- data3 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data3, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)
The function estimates and predicts models using time series datasets and provides subset forecasts within the length of the trend. The recognized models are lm, smooth spline, polynomial splines with or without knots, quadratic polynomial, and ARIMA. The robust output includes the models' estimates, time-varying forecasts and plots based on themes from ggplot. The main attraction of this function is the use of the newly introduced equal number of trend points (days, months, years) to estimate forecasts from the model. The function takes daily, monthly and yearly datasets for now.
DynamicForecast(date, series, Trend, Type, MaximumDate, x = 0, BREAKS = 0, ORIGIN = origin, Length = 0, ...)
date |
A vector containing the dates for which the data is collected. Must be the same length as series. |
series |
A vector containing data for estimation and forecasting. Must be the same length as date. |
x |
A vector of an optional dataset to be added to the model for forecasting. The modeling and forecasting are still done if it is not provided. Must be the same length as series. |
BREAKS |
A vector of numbers indicating points of breaks for estimation of the spline models. |
MaximumDate |
The date indicating the maximum (last) date in the data frame; forecasting starts on the date following it. The date must be in a recognized date format. Note that for forecasting, the date origin is set to 1970-01-01. |
Trend |
The type of trend. There are three options: Day, Month and Year. |
Type |
The type of response variable. There are two options: Continuous and Integer. For an integer variable, the forecasts are constrained between the minimum and maximum values of the response variable. |
Length |
The length for which the forecast would be made. If not given, it defaults to the length of the dataset, i.e. the sample size. |
ORIGIN |
If different from 1970-01-01, it must be in a recognized date format, e.g. "2020-02-29". |
... |
Additional arguments that may be passed to the function. If the maximum date is NULL, which is the default, it is set to the last date of the date vector. |
A list with the following components:
Spline without knots |
The estimated spline model without the breaks (knots). |
Spline with knots |
The estimated spline model with the breaks (knots). |
Smooth Spline |
The smooth spline estimates. |
ARIMA |
Estimated Auto Regressive Integrated Moving Average model. |
Quadratic |
The estimated quadratic polynomial model. |
Ensembled with equal weight |
Estimated Ensemble model with equal weight given to each of the models. To get this, the fitted values of each of the models are divided by the number of models and summed together (see the sketch after this list). |
Ensembled based on weight |
Estimated Ensemble model based on the weight of each model. To do this, the fitted values of each model serve as independent variables and are regressed against the trend with interaction among the variables. |
Ensembled based on summed weight |
Estimated Ensemble model based on the summed weight of each model. To do this, the fitted values of each model serve as independent variables and are regressed against the trend. |
Ensembled based on weight of fit |
Estimated Ensemble model. The fit of each model is measured by the rmse. |
Unconstrained Forecast |
The forecast if the response variable is continuous. The number of forecasts is equivalent to the length of the dataset (equal days forecast). |
Constrained Forecast |
The forecast if the response variable is integer. The number of forecasts is equivalent to the length of the dataset (equal days forecast). |
RMSE |
Root Mean Square Error (rmse) for each forecast. |
Unconstrained forecast Plot |
The combined plots of the unconstrained forecasts using ggplot. |
Constrained forecast Plot |
The combined plots of the constrained forecasts using ggplot. |
Date |
This is the date range for the forecast. |
Fitted plot |
This is the plot of the fitted models. |
Estimated coefficients |
These are the estimated coefficients of the various models in the forecast. |
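The equal-weight ensemble described in the list above reduces to a plain average of the fitted values. A minimal sketch on assumed toy data; the three component models and the regression in the weighted variant are illustrative, and the package's exact regressors may differ:

set.seed(1)
x <- 1:50
y <- 5 + 0.4 * x + rnorm(50)
m1 <- lm(y ~ x)                 # linear trend
m2 <- lm(y ~ poly(x, 2))        # quadratic polynomial
m3 <- smooth.spline(x, y)       # smooth spline
f3 <- predict(m3, x)$y          # fitted values of the smooth spline
# Equal weight: each model's fitted values divided by the number of models, then summed
ensemble_equal <- (fitted(m1) + fitted(m2) + f3) / 3
# Weight based (sketch): the fitted values serve as regressors
ensemble_weighted <- fitted(lm(y ~ fitted(m1) + fitted(m2) + f3))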
# COVID19$Date <- zoo::as.Date(COVID19$Date, format = '%m/%d/%Y')
# # The date is formatted to R format
# LEN <- length(COVID19$Case)
# Dss <- seq(COVID19$Date[1], by = "day", length.out = LEN)
# # data length for forecast
# ORIGIN = "2020-02-29"
# lastdayfo21 <- Dss[length(Dss)] # The maximum length
# uncomment to run
# Data <- COVID19[COVID19$Date <= lastdayfo21 - 28, ]
# # desired length of forecast
# BREAKS <- c(70, 131, 173, 228, 274) # The default breaks for the data
# DynamicForecast(date = Data$Date, series = Data$Case,
#                 BREAKS = BREAKS, MaximumDate = "2021-02-10",
#                 Trend = "Day", Length = 0, Type = "Integer")
#
# lastdayfo21 <- Dss[length(Dss)]
# Data <- COVID19[COVID19$Date <= lastdayfo21 - 14, ]
# BREAKS = c(70, 131, 173, 228, 274)
# DynamicForecast(date = Data$Date, series = Data$Case,
#                 BREAKS = BREAKS, MaximumDate = "2021-02-10",
#                 Trend = "Day", Length = 0, Type = "Integer")
This function provides graphic displays of the order of significance of the estimated coefficients of models. This assists in assessing models so as to decide which can be used for further analysis, prediction and policy consideration.
estimate_plot(Model, limit)
Model |
Estimated model for which the estimated coefficients would be plotted |
limit |
Number of variables to be included in the coefficients plots |
The function returns a plot of the order of importance of the estimated coefficients
estimate_plot |
The plot of the order of importance of estimated coefficients |
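A minimal usage sketch, assuming an ordinary lm fit; mtcars is only a stand-in dataset:

library(Dyn4cast)
Model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
estimate_plot(Model, limit = 4)  # plot the four most influential coefficients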
Often, when continuous data are converted to factors using the base R cut function, the resultant Class Interval column provides data in scientific notation, which normally appears confusing to interpret, especially to the casual data scientist. This function provides a more user-friendly output in a formatted manner. It is an easy function to implement.
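The issue being solved is easy to reproduce with base R alone; the labels that cut produces for large values default to scientific notation:

x <- rnorm(5) * 1e5
levels(cut(x, 3))  # labels such as "(-1.2e+05,-3.4e+04]" are hard to read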
formattedcut(data, breaks, cut = FALSE)
data |
A vector of the data to be converted to factors, if not already cut, or a vector of already-cut data. |
breaks |
Number of classes to break the data into |
cut |
Logical. TRUE if the data is already cut (as with the base cut function), FALSE (the default) if the raw data is supplied. |
The function returns a data frame with three or four columns, i.e. Lower class, Upper class, Class interval and Frequency (if cut is FALSE).
Cut |
The formatted data frame of the cut data. |
DD <- rnorm(100000)
formattedcut(DD, 12, FALSE)

DD1 <- cut(DD, 12)
DDK <- formattedcut(DD1, 12, TRUE)
DDK

# If the data is not from a data frame, the frequency distribution is required.
as.data.frame(DDK %>%
                group_by(`Lower class`, `Upper class`, `Class interval`) %>%
                tally())
There are three main types of ranking: standard competition, ordinal and fractional. Garrett's Ranking Technique is an application of fractional ranking in which the data points are ordered and given an ordinal number/rank. The ordering and ranking provide additional information which may not be available from a frequency distribution. Again, the ordering is based on the level of seriousness or severity of the data point from the viewpoint of the respondent. Ranking enables ease of comparison and makes grouping more meaningful. It is used in social science, psychology and other survey types of research. This function performs Garrett Ranking of up to 15 ranks.
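The three ranking types differ only in how ties are handled, which base R can illustrate independently of the package:

x <- c(80, 75, 75, 60)            # two data points tied at 75
rank(x, ties.method = "min")      # standard competition: 4 2 2 1
rank(x, ties.method = "first")    # ordinal: 4 2 3 1
rank(x, ties.method = "average")  # fractional, the basis of Garrett ranking: 4 2.5 2.5 1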
garrett_ranking(data, num_rank, ranking = NULL, m_rank = c(2:15))
data |
The data for the Garrett Ranking; it must be a data.frame. |
num_rank |
A vector representing the number of ranks applied to the data. If the data is five-point Likert-type data, then the number of ranks is 5. |
ranking |
A vector or list representing the ranks applied to the data. If not available, positional ranks are applied. |
m_rank |
The scope of the ranking methods which is between 2 and 15. |
A list with the following components:
Data mean table |
Table of data ranked using simple average. |
Garrett ranked data |
Table of data ranked using Garrett mean score. |
Garrett value |
Table of ranking Garrett values |
garrett_data <- data.frame(garrett_data)
ranking <- c("Serious constraint", "Constraint",
             "Not certain it is a constraint", "Not a constraint",
             "Not a serious constraint")

## ranking is supplied
garrett_ranking(garrett_data, 5, ranking)

# ranking not supplied
garrett_ranking(garrett_data, 5)

# you can rank a subset of the data
garrett_ranking(garrett_data, 8)
garrett_ranking(garrett_data, 4)
This function is used to estimate exponential lower (80% and 95%) and upper (80% and 95%) values from the outcome of the scaledlogit function. The exponentiation ensures that the forecast does not go beyond the upper and lower limits of the base data.
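The back-transformation is the logistic curve rescaled onto (lower, upper), so every exponentiated value lands strictly inside the limits; a one-line check of that property in plain arithmetic, not the package internals:

x <- seq(-10, 10, by = 2)                    # any real values
range((35 - 1) * exp(x) / (1 + exp(x)) + 1)  # both ends fall inside (1, 35)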
invscaledlogit(x, lower, upper)
x |
The forecast values, e.g. from the constrainedforecast function. Please specify the appropriate column containing the forecast values. |
lower |
Lower limits of the forecast values |
upper |
Upper limits of the forecast values |
x <- 1:35
lower <- 1
upper <- 35
invscaledlogit(x = x, lower = lower, upper = upper)
The linear model still remains a reference point for advanced modeling of some datasets, as a foundation for Machine Learning, Data Science and Artificial Intelligence, in spite of some of its weaknesses. The major task in modeling is to compare various models before one is selected, either on its own or for advanced modeling. Often, some trial-and-error methods are used to decide which model to select. This is where this function is unique. It helps to estimate 14 different linear models and provides their coefficients in a formatted table for quick comparison, so that time and energy are saved. The interesting thing about this function is its simplicity: it is a one-line code.
Linearsystems(y, x, mod, limit, Test = NA)
y |
Vector of the dependent variable. This must be numeric. |
x |
Data frame of the explanatory variables. |
mod |
The group of linear models to be estimated. It takes a value from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models. |
limit |
Number of variables to be included in the coefficients plots |
Test |
Test data to be used to predict y. If not supplied, the fitted y is used and may therefore be identical to the fitted values. It is important to be cautious if the data is to be divided between train and test subsets in order to train and test the model. If the sample size is not sufficient to provide enough data for the test, errors are thrown. |
A list with the following components:
Visual means of the numeric variable |
Plot of the means of the numeric variables. |
Correlation plot |
Plot of the Correlation Matrix of the numeric variables. To recover the plot, please use the canonical form object$`Correlation plot`. |
Linear |
The full estimates of the Linear Model. |
Linear with interaction |
The full estimates of the Linear Model with full interaction among the numeric variables. |
Semilog |
The full estimates of the Semilog Model. Here the independent variable(s) is/are log-transformed. |
Growth |
The full estimates of the Growth Model. Here the dependent variable is log-transformed. |
Double Log |
The full estimates of the double-log Model. Here both the dependent and independent variables are log-transformed. |
Mixed-power model |
The full estimates of the Mixed-power Model. This is a combination of linear and double log models. It has significant gains over the two models separately. |
Translog model |
The full estimates of the double-log Model with full interaction of the numeric variables. |
Quadratic |
The full estimates of the Quadratic Model. Here the square of numeric independent variable(s) is/are included as independent variables. |
Cubic model |
The full estimates of the Cubic Model. Here the third-power (x^3) of numeric independent variable(s) is/are included as independent variables. |
Inverse y |
The full estimates of the Inverse Model. Here the dependent variable is inverse-transformed (1/y). |
Inverse x |
The full estimates of the Inverse Model. Here the independent variable is inverse-transformed (1/x). |
Inverse y & x |
The full estimates of the Inverse Model. Here the dependent and independent variables are inverse-transformed (1/y & 1/x). |
Square root |
The full estimates of the Square root Model. Here the independent variable is square root-transformed (x^0.5). |
Cubic root |
The full estimates of the cubic root Model. Here the independent variable is cubic root-transformed (x^(1/3)). |
Significant plot of Linear |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Linear with interaction |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Semilog |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Growth |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Double Log |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Mixed-power model |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Translog model |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Quadratic |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Cubic model |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Inverse y |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Inverse x |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Inverse y & x |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Square root |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Significant plot of Cubic root |
Plots of the order of importance and significance of the estimated coefficients of the model. |
Model Table |
Formatted Tables of the coefficient estimates of all the models |
Machine Learning Metrics |
Metrics (47) for assessing model performance and metrics for diagnostic analysis of the error in estimation. |
Table of Marginal effects |
Tables of marginal effects of each model. Because of computational limitations, if you choose to estimate all the 14 models, the Tables are produced separately for the major transformations. They can easily be compiled into one. |
Fitted plots long format |
Plots of the fitted estimates from each of the models (long format). |
Fitted plots wide format |
Plots of the fitted estimates from each of the models (wide format). |
Prediction plots long format |
Plots of the predicted estimates from each of the models (long format). |
Prediction plots wide format |
Plots of the predicted estimates from each of the models (wide format). |
Naive effects plots long format |
Plots of the naive effects from each of the models (long format). |
Naive effects plots wide format |
Plots of the naive effects from each of the models (wide format). |
Summary of numeric variables |
of the dataset. |
Summary of character variables |
of the dataset. |
## Without test data (not run)
# y = linearsystems$MKTcost # to run all the exercises, uncomment.
# x <- select(linearsystems, -MKTcost)
# Linearsystems(y, x, 6, 15) # NaNs produced if run

## Without test data (not run)
# x = sampling[, -1]
# y = sampling$qOutput
# limit = 20
# mod <- 3
# Test <- NA
# Linearsystems(y, x, 3, 15) # NaNs produced if run

## With test data
# x = sampling[, -1]
# y = sampling$qOutput
# Data <- cbind(y, x)
# sampling <- sample(1:nrow(Data), 0.8 * nrow(Data)) # 80% of data is sampled for training the model
# train <- Data[sampling, ]
# Test <- Data[-sampling, ] # 20% of data is reserved for testing (predicting) the model
# y <- train$y
# x <- train[, -1]
# mod <- 4
# Linearsystems(y, x, 4, 15, Test) # NaNs produced if run
Mallow's Cp is one of the very useful metrics and selection criteria for machine learning algorithms (models). It is used to estimate the closest number to the number of predictors and the intercept (the approximate number of explanatory variables) of linear and non-linear models. The function inherits residuals from the estimated model. The uniqueness of this function compared to other procedures for computing Mallow's Cp is that it does not require nested models for computation, and it is not limited to lm-based models only.
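As a point of reference, the textbook form of the statistic is Cp = SSE / s^2 - n + 2p, where s^2 is an estimate of the residual variance, n the sample size and p the number of coefficients. A hand computation for a simple lm fit, a sketch of the formula rather than of the package's internal algorithm:

Model <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(Model)
p <- length(coef(Model))       # predictors plus the intercept
sse <- sum(residuals(Model)^2)
s2 <- summary(Model)$sigma^2   # residual variance estimate
sse / s2 - n + 2 * p           # Mallows Cp; equals p when s2 comes from the same model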
MallowsCp(Model, y, x, type, Nlevels = 0)
Model |
The estimated model from which the Mallows Cp would be computed |
y |
The vector of the LHS variable of the estimated model |
x |
The matrix of the RHS variables of the estimated model. Note that if the model adds additional factor variables to the output, the number of additional factors should be supplied through Nlevels. |
type |
The type of model (e.g. "LM", as in the example below). |
Nlevels |
Optional number of additional variables created if the model has categorical variables that generate additional dummy variables during estimation, or the number of additional variables created if the model involves interaction terms. |
A list with the following components
MallowsCp |
of the Model. |
library(Dyn4cast)
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
x <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
y <- c(ctl, trt)
Model <- lm(y ~ x)
Type <- "LM"
MallowsCp(Model = Model, y = y, x = x, type = Type, Nlevels = 0)
This function estimates over 40 Metrics for assessing the quality of Machine Learning Models. The purpose is to provide a wrapper which brings all the metrics on the table and makes it easier to use them to select a model.
MLMetrics(Observed, yvalue, Model, K, Name, Form, kutuf, TTy)
Observed |
The Observed data in a data frame format |
yvalue |
The Response variable of the estimated Model |
Model |
The Estimated Model (Model = a + bx) |
K |
The number of variables in the estimated Model to consider |
Name |
The name of the model, which needs to be specified. The recognized names are ARIMA; Values, if the model computes the fitted values without estimation (such as ensembles); SMOOTH (smooth.spline); Logit; EssemWet (ensembles based on weight); QUADRATIC polynomial; and SPLINE polynomial. |
Form |
Form of the Model Estimated (LM, ALM, GLM, N-LM, ARDL) |
kutuf |
Cutoff for the Estimated values (defaults to 0.5 if not specified) |
TTy |
Type of response variable (Numeric or Response - like binary) |
A list with the following components:
Absolute Error |
of the Model. |
Absolute Percent Error |
of the Model. |
Accuracy |
of the Model. |
Adjusted R Square |
of the Model. |
`Akaike's` Information Criterion AIC |
of the Model. |
Area under the ROC curve (AUC) |
of the Model. |
Average Precision at k |
of the Model. |
Bias |
of the Model. |
Brier score |
of the Model. |
Classification Error |
of the Model. |
F1 Score |
of the Model. |
fScore |
of the Model. |
GINI Coefficient |
of the Model. |
kappa statistic |
of the Model. |
Log Loss |
of the Model. |
`Mallow's` cp |
of the Model. |
Matthews Correlation Coefficient |
of the Model. |
Mean Log Loss |
of the Model. |
Mean Absolute Error |
of the Model. |
Mean Absolute Percent Error |
of the Model. |
Mean Average Precision at k |
of the Model. |
Mean Absolute Scaled Error |
of the Model. |
Median Absolute Error |
of the Model. |
Mean Squared Error |
of the Model. |
Mean Squared Log Error |
of the Model. |
Model turning point error |
of the Model. |
Negative Predictive Value |
of the Model. |
Percent Bias |
of the Model. |
Positive Predictive Value |
of the Model. |
Precision |
of the Model. |
R Square |
of the Model. |
Relative Absolute Error |
of the Model. |
Recall |
of the Model. |
Root Mean Squared Error |
of the Model. |
Root Mean Squared Log Error |
of the Model. |
Root Relative Squared Error |
of the Model. |
Relative Squared Error |
of the Model. |
`Schwarz's` Bayesian criterion BIC |
of the Model. |
Sensitivity |
of the Model. |
specificity |
of the Model. |
Squared Error |
of the Model. |
Squared Log Error |
of the Model. |
Symmetric Mean Absolute Percentage Error |
of the Model. |
Sum of Squared Errors |
of the Model. |
True negative rate |
of the Model. |
True positive rate |
of the Model. |
library(splines)
Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
MLMetrics(Observed = Data, yvalue = Data$states, Model = Model, K = 2,
          Name = "Linear", Form = "LM", kutuf = 0, TTy = "Number")
This function retrieves the latent factors and their variable loadings, which can be used as R objects to perform other analysis.
model_factors(data, DATA)
data |
An object containing the results of exploratory factor analysis, e.g. the output of psych::fa as in the example below. |
DATA |
A data.frame of the data used for the factor analysis. |
A list with the following components:
Latent_frame |
Data frame of all the retrieved latent factors and their variable loadings. |
Latent_1 |
Variables and loadings of the first latent factor. |
Latent_2 |
Variables and loadings of the second latent factor. |
Latent_3 |
Variables and loadings of the third latent factor. |
Latent_4 |
Variables and loadings of the fourth latent factor. |
Latent_5 |
Variables and loadings of the fifth latent factor. |
Latent_6 |
Variables and loadings of the sixth latent factor. |
Latent_7 |
Variables and loadings of the seventh latent factor. |
Latent_8 |
Variables and loadings of the eighth latent factor. |
Latent_9 |
Variables and loadings of the ninth latent factor. |
library(psych)
Data <- Quicksummary
GGn <- names(Data)
GG <- ncol(Data)
GGx <- c(paste0("x0", 1:9), paste("x", 10:ncol(Data), sep = ""))
names(Data) <- GGx
lll <- fa.parallel(Data, fm = "minres", fa = "fa")
dat <- fa(Data, nfactors = lll[["nfact"]], rotate = "varimax", fm = "minres")
model_factors(data = dat, DATA = Data)
This function is a wrapper for easily affixing the per cent sign (%) to a value, a vector or a data frame of values.
Percent(Data, Type, format = "f", ...)
Data |
The data to which the per cent sign is to be affixed. The data must be in raw form because, for the Frame argument, the per cent value of each cell is calculated before the sign is affixed. |
Type |
The type of data. The default arguments are Value for single numeric data or Frame for a numeric vector or data frame. In the case of a vector or data frame, the per cent value of each cell is calculated before the per cent sign is affixed. |
format |
The format of the output, which is internal; the default is "f" (fixed notation, giving a character output). |
... |
Additional arguments that may be passed to the function |
This function returns the result as
percent |
values with the percentage sign (%) affixed. |
Data <- c(1.2, 0.5, 0.103, 7, 0.1501)
Percent(Data = Data, Type = "Frame") # Value, Frame

Data <- 1.2
Percent(Data = Data, Type = "Value") # Value, Frame

Percent(Data = sample, Type = "Frame") # Value, Frame
There is an increasing need to make user-friendly and production-ready tables for machine learning data. This function is a simplified quick summary whose output is a formatted table. It is very handy for those who do not have the time to write code for user-friendly summaries.
quicksummary(x, Type, Cut, Up, Down, ci = 0.95)
x |
The data to be summarised. Only numeric data is allowed. |
Type |
The type of data to be summarised. There are two options, 1 or 2: 1 = continuous data, 2 = Likert-type data (see the examples below). |
Cut |
The cut-off point for Likert-type data |
Up |
The top Likert-type scale, for example, "Constraint". |
Down |
The lower Likert-type scale, for example, "Not a constraint". |
ci |
Confidence interval, which defaults to 0.95. |
The function returns a formatted Table of the Quick summary
ANS |
The formatted Table of the summary |
# Likert-type data
Up <- "Constraint"
Down <- "Not a constraint"
quicksummary(x = Quicksummary, Type = 2, Cut = 2.60, Up = Up, Down = Down)

# Continuous data
x <- select(linearsystems, 1:6)
quicksummary(x = x, Type = 1)
This function is a wrapper for scaling the fitted (predicted) values of a one-sided (positive or negative only) integer response variable of supported models. The scaling involves some log transformation of the fitted (predicted) values.
scaledlogit(x, lower, upper)
x |
The parameter to be scaled, which is the fitted values from supported models. The scaled parameter is used mainly for constrained forecasting of a response variable that is positive (0 to inf) or negative (-inf to 0). The scaling involves a log transformation of the parameter. |
lower |
Integer or variable representing the lower limit for the scaling (-inf or 0) |
upper |
Integer or variable representing the upper limit for the scaling (0 or inf) |
library(Dyn4cast)
library(splines)

lower <- 1
upper <- 37
Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
scaledlogit(x = fitted.values(Model), lower = lower, upper = upper)
Observational studies involve the evaluation of outcomes of participants not randomly assigned to treatments or exposures. To be able to assess the effects of the outcome, the participants are matched using propensity scores (propensity score matching, PSM). This then enables the determination of the effects of the treatments on those treated, against those who were not treated. Most of the earlier functions available for this analysis only enable the determination of the average treatment effect on the treated (ATT), while the other treatment effects are optional. This is where this function is unique, because five different average treatment effects are estimated simultaneously, in spite of the one-line code arguments. The five treatment effects are:
Average treatment effect for the entire (ATE) population
Average treatment effect for the treated (ATT) population
Average treatment effect for the controlled (ATC) population
Average treatment effect for the evenly matched (ATM) population
Average treatment effect for the overlap (ATO) population.
There are excellent materials dealing with each of the treatment effects; please see the references.
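The five estimands correspond to the standard propensity-score balancing weights from the treatment-effects literature. A sketch of those formulas given a binary treatment t and propensity scores e; this mirrors the usual definitions, not necessarily the package's internal code:

ps_weights <- function(t, e) {
  data.frame(
    ATE = t / e + (1 - t) / (1 - e),                     # entire population
    ATT = t + (1 - t) * e / (1 - e),                     # treated
    ATC = t * (1 - e) / e + (1 - t),                     # controlled
    ATM = pmin(e, 1 - e) / (t * e + (1 - t) * (1 - e)),  # evenly matched
    ATO = t * (1 - e) + (1 - t) * e                      # overlap
  )
}
set.seed(42)
head(ps_weights(t = rbinom(20, 1, 0.5), e = runif(20, 0.1, 0.9)))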
treatment_model(Treatment, x_data)
Treatment |
Vector of binary data (0, 1) LHS for the treatment effects estimation |
x_data |
Data frame of explanatory variables for the RHS of the estimation |
A list with the following components:
Model |
Estimated treatment effects model. |
Effect |
Data frame of the estimated various treatment effects. |
P_score |
Vector of estimated propensity scores from the model |
Fitted_estimate |
Vector of fitted values from the model |
Residuals |
Residuals of the estimated model |
`Experiment plot` |
Plot of the propensity scores from the model faceted into Treated and control populations |
`ATE plot` |
Plot of the average treatment effect for the entire population |
`ATT plot` |
Plot of the average treatment effect for the treated population |
`ATC plot` |
Plot of the average treatment effect for the controlled population |
`ATM plot` |
Plot of the average treatment effect for the evenly matched population |
`ATO plot` |
Plot of the average treatment effect for the overlap population |
weights |
Estimated weights for each of the treatment effects |
Treatment = treatments$treatment
data = treatments[, c(2:3)]
treatment_model(Treatment, data)