Package 'Dyn4cast'

Title: Dynamic Modeling and Machine Learning Environment
Description: Estimates, predicts and forecasts dynamic models and computes Machine Learning metrics that assist in model selection for further analysis. The package also provides tools and metrics that are useful in machine learning and modeling, for example a quick summary, a per cent sign formatter, Mallow's Cp and others. The ecosystem of this package is the analysis of economic data for national development. The package is stable and offers high reliability, efficiency and time savings.
Authors: Job Nmadu [aut, cre]
Maintainer: Job Nmadu <[email protected]>
License: MIT + file LICENSE
Version: 11.11.24
Built: 2024-10-14 03:25:57 UTC
Source: https://github.com/JobNmadu/Dyn4cast

Help Index


Constrained Forecast of One-sided Integer Response Model

Description

This function estimates the lower and upper 80% and 95% forecasts of the Model. The final values are within the lower and upper limits of the base data. Used in conjunction with the scaledlogit and invscaledlogit functions, they are adapted from Hyndman & Athanasopoulos (2021) and modified for independent use rather than being restricted to a particular package.

Usage

constrainedforecast(Model, lower, upper)

Arguments

Model

The exponential values from the invscaledlogit function.

lower

The lower limit of the forecast

upper

The upper limit of the forecast

Value

A list of forecast values within the 80% and 95% confidence bands. The values are:

Lower 80%

Forecast at lower 80% confidence level.

Upper 80%

Forecast at upper 80% confidence level.

Lower 95%

Forecast at lower 95% confidence level.

Upper 95%

Forecast at upper 95% confidence level.

Examples

library(Dyn4cast)
library(splines)
library(forecast)
lower <- 1
upper <- 37
Model   <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
FitModel <- scaledlogit(x = fitted.values(Model), lower = lower,
 upper = upper)
ForecastModel <- forecast(FitModel, h = 200)
ForecastValues <- constrainedforecast(Model = ForecastModel, lower, upper)

Custom plot of correlation matrix

Description

This is a custom plot for a correlation matrix in which the coefficients are displayed along with graphics showing the magnitude of each coefficient.

Usage

corplot(r)

Arguments

r

Correlation matrix of the data for the plot

Value

The function returns a custom plot of the correlation matrix

corplot

The custom plot of the correlation matrix
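
Examples

A minimal sketch, assuming any numeric correlation matrix; the built-in mtcars data is used purely for illustration.

library(Dyn4cast)
r <- cor(mtcars[, 1:5])  # correlation matrix of a few numeric columns
corplot(r)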


Standardize data.frame for comparable Machine Learning prediction and visualization

Description

Economic and other Machine Learning data often come in different units or magnitudes, making estimation, interpretation or visualization difficult. These issues can be handled if the data are transformed to be unitless or of similar magnitude. This is what data_transform is set up to do. It is simple and straightforward to use.

Usage

data_transform(data, method, MARGIN = 2)

Arguments

data

A data.frame with numeric data for transformation. All columns in the data are transformed

method

The type of transformation. There are three options: 1 is for log transformation, 2 is for min-max transformation and 3 is for mean-SD transformation. (A sketch of the three formulas follows this Arguments list.)

MARGIN

Option to transform the data column-wise (MARGIN = 2) or row-wise (MARGIN = 1). Defaults to column-wise transformation if no option is indicated.
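
For reference, a sketch of the three transformations in plain R, as commonly defined (standard formulas, not the package's internal code):

log_t    <- function(x) log(x)                            # log transformation
minmax_t <- function(x) (x - min(x)) / (max(x) - min(x))  # min-max: rescales to [0, 1]
meansd_t <- function(x) (x - mean(x)) / sd(x)             # mean-SD: mean 0, SD 1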

Value

This function returns the output of the data transformation process as

data_transformed

A new data.frame containing the transformed values

Examples

library(Dyn4cast)
library(dplyr)
library(tidyr)
library(ggplot2)
# View the data without transformation

data0 <- Transform %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data0, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 1: Transformation by min-max method.
# You could also transform the `X column` but it is better not to.

data1 <- data_transform(Transform[, -1], 1)
data1 <- cbind(Transform[, 1], data1)
data1 <- data1 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data1, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 2: `log` transformation

data2 <- data_transform(Transform[, -1], 2)
data2 <- cbind(Transform[, 1], data2)
data2 <- data2 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data2, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 3: `Mean-SD` transformation

data3 <- data_transform(Transform[, -1], 3)
data3 <- cbind(Transform[, 1], data3)
data3 <- data3 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data3, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

Dynamic Forecast of Five Models and their Ensembles

Description

The function estimates and predicts models using a time series dataset and provides subset forecasts within the length of the trend. The recognized models are lm, smooth spline, polynomial splines with or without knots, quadratic polynomial, and ARIMA. The robust output includes the models' estimates, time-varying forecasts and plots based on themes from ggplot. The main attraction of this function is the use of the newly introduced equal number of trend points (days, months, years) to estimate forecasts from the model. The function takes daily, monthly and yearly datasets for now.

Usage

DynamicForecast(date, series, Trend, Type, MaximumDate, x = 0, BREAKS = 0,
 ORIGIN = origin, Length = 0, ...)

Arguments

date

A vector containing the dates for which the data is collected. Must be the same length as series. The date must be in 'YYYY-MM-DD' format. If the data is a monthly series, the recognized date format is the last day of the month of the dataset, e.g. 2021-02-28. If the data is a yearly series, the recognized date format is the last day of the year of the dataset, e.g. 2020-12-31. There is no format for quarterly data for now.

series

A vector containing data for estimation and forecasting. Must be the same length as date.

x

An optional vector of data to be added to the model for forecasting. The modeling and forecasting are still done if it is not provided. Must be the same length as series.

BREAKS

A vector of numbers indicating points of breaks for estimation of the spline models.

MaximumDate

The date indicating the maximum (last) date in the data frame; forecasting starts from the date following it. The date must be in a recognized date format. Note that for forecasting, the date origin is set to 1970-01-01.

Trend

The type of trend. There are three options: Day, Month and Year.

Type

The type of response variable. There are two options: Continuous and Integer. For an integer variable, the forecasts are constrained between the minimum and maximum values of the response variable.

Length

The length for which the forecast would be made. If not given, it defaults to the length of the dataset, i.e. the sample size.

ORIGIN

If different from 1970-01-01, it must be in the format "YYYY-MM-DD". This is used to position the date of the data in order to properly date the forecasts.

...

Additional arguments that may be passed to the function. If the maximum date is NULL, which is the default, it is set to the last date of the series.

Value

A list with the following components:

Spline without knots

The estimated spline model without the breaks (knots).

Spline with knots

The estimated spline model with the breaks (knots).

Smooth Spline

The smooth spline estimates.

ARIMA

Estimated Auto Regressive Integrated Moving Average model.

Quadratic

The estimated quadratic polynomial model.

Ensembled with equal weight

Estimated Ensemble model with equal weight given to each of the models. To get this, the fitted values of each of the models are divided by the number of models and summed together.

Ensembled based on weight

Estimated Ensemble model based on the weight of each model. To do this, the fitted values of each model serve as independent variables and are regressed against the trend with interaction among the variables.

Ensembled based on summed weight

Estimated Ensemble model based on the summed weight of each model. To do this, the fitted values of each model serve as independent variables and are regressed against the trend without interaction (a sketch of these ensembling rules follows this list).

Ensembled based on weight of fit

Estimated Ensemble model based on the weight of fit of each model, where fit is measured by the RMSE.

Unconstrained Forecast

The forecast if the response variable is continuous. The number of forecasts is equivalent to the length of the dataset (equal days forecast).

Constrained Forecast

The forecast if the response variable is integer. The number of forecasts is equivalent to the length of the dataset (equal days forecast).

RMSE

Root Mean Square Error (RMSE) for each forecast.

Unconstrained forecast Plot

The combined plots of the unconstrained forecasts using ggplot.

Constrained forecast Plot

The combined plots of the constrained forecasts using ggplot.

Date

This is the date range for the forecast.

Fitted plot

This is the plot of the fitted models.

Estimated coefficients

The estimated coefficients of the various models in the forecast.
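
A rough sketch of the ensembling rules described above, assuming fit1 to fit5 hold the fitted values of the five models and series is the observed data (hypothetical objects; the package's internal computation may differ):

# Equal weight: average of the five models' fitted values
ens_equal <- (fit1 + fit2 + fit3 + fit4 + fit5) / 5
# Weight-based: fitted values regressed with full interaction
ens_weighted <- lm(series ~ fit1 * fit2 * fit3 * fit4 * fit5)
# Summed weight: fitted values regressed without interaction
ens_summed <- lm(series ~ fit1 + fit2 + fit3 + fit4 + fit5)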

Examples

# COVID19$Date <- zoo::as.Date(COVID19$Date, format = '%m/%d/%Y')
#  #The date is formatted to R format
# LEN <- length(COVID19$Case)
# Dss <- seq(COVID19$Date[1], by = "day", length.out = LEN)
#  #data length for forecast
# ORIGIN = "2020-02-29"
# lastdayfo21 <- Dss[length(Dss)] # The maximum length # uncomment to run
# Data <- COVID19[COVID19$Date <= lastdayfo21 - 28, ]
# # desired length of forecast
# BREAKS <- c(70, 131, 173, 228, 274) # The default breaks for the data
# DynamicForecast(date = Data$Date, series = Data$Case,
# BREAKS = BREAKS, MaximumDate = "2021-02-10",
#  Trend = "Day", Length = 0, Type = "Integer")
#
# lastdayfo21 <- Dss[length(Dss)]
# Data <- COVID19[COVID19$Date <= lastdayfo21 - 14, ]
# BREAKS = c(70, 131, 173, 228, 274)
# DynamicForecast(date = Data$Date, series = Data$Case,
# BREAKS = BREAKS , MaximumDate = "2021-02-10",
#  Trend = "Day", Length = 0, Type = "Integer")

Plot of Order of Significance of Estimated Regression Coefficients

Description

This function provides graphic displays of the order of significance of the estimated coefficients of models. This assists in assessing models so as to decide which can be used for further analysis, prediction and policy consideration.

Usage

estimate_plot(Model, limit)

Arguments

Model

Estimated model for which the estimated coefficients would be plotted

limit

Number of variables to be included in the coefficients plots

Value

The function returns a plot of the order of importance of the estimated coefficients

estimate_plot

The plot of the order of importance of estimated coefficients
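
Examples

A minimal sketch, assuming any estimated model accepted by the function; an lm fit on the built-in mtcars data is used purely for illustration.

library(Dyn4cast)
Model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
estimate_plot(Model, limit = 4)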


Convert continuous vector variable to formatted factors

Description

Often, when continuous data are converted to factors using the base R cut function, the resultant Class Interval column provides values in scientific notation, which can be confusing to interpret, especially for the casual data scientist. This function provides a more user-friendly output in a formatted manner. It is an easy function to implement.

Usage

formattedcut(data, breaks, cut = FALSE)

Arguments

data

A vector of the data to be converted to factors, if not already cut, or a vector of already-cut data

breaks

Number of classes to break the data into

cut

Logical to indicate whether the cut function has already been applied to the data; defaults to FALSE.

Value

The function returns a data frame with three or four columns, i.e. Lower class, Upper class, Class interval and Frequency (if cut is FALSE).

Cut

The data frame

Examples

library(dplyr)
DD <- rnorm(100000)
formattedcut(DD, 12, FALSE)

DD1 <- cut(DD, 12)
DDK <- formattedcut(DD1, 12, TRUE)
DDK
# If the data were already cut, the frequency distribution must be
# computed separately.
as.data.frame(DDK %>%
  group_by(`Lower class`, `Upper class`, `Class interval`) %>%
  tally())

Garrett Ranking of Categorical Data

Description

There are three main types of ranking: standard competition, ordinal and fractional. Garrett's Ranking Technique is an application of fractional ranking in which the data points are ordered and given an ordinal number/rank. The ordering and ranking provide additional information which may not be available from a frequency distribution. The ordering is based on the level of seriousness or severity of the data point from the viewpoint of the respondent. Ranking enables ease of comparison and makes grouping more meaningful. It is used in the social sciences, psychology and other survey types of research. This function performs Garrett Ranking of up to 15 ranks.
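
For reference, the conventional Garrett percent-position formula that underlies the technique (a sketch of the standard definition; the function's internal conversion table may differ):

# per cent position of the i-th rank out of N ranks
garrett_percent <- function(i, N) 100 * (i - 0.5) / N
garrett_percent(1:5, 5)  # 10 30 50 70 90 for a five-point scale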

Usage

garrett_ranking(data, num_rank, ranking = NULL, m_rank = c(2:15))

Arguments

data

The data for the Garrett Ranking, must be a data.frame.

num_rank

A vector representing the number of ranks applied to the data. If the data is five-point Likert-type data, then the number of ranks is 5.

ranking

A vector or list representing the ranks applied to the data. If not available, positional ranks are applied.

m_rank

The scope of the ranking methods which is between 2 and 15.

Value

A list with the following components:

Data mean table

Table of data ranked using simple average.

Garrett ranked data

Table of data ranked using Garrett mean score.

Garrett value

Table of ranking Garrett values

Examples

garrett_data <- data.frame(garrett_data)
ranking <- c("Serious constraint", "Constraint",
"Not certain it is a constraint", "Not a constraint",
"Not a serious constraint")

## ranking is supplied
garrett_ranking(garrett_data, 5, ranking)

# ranking not supplied
garrett_ranking(garrett_data, 5)

# you can rank a subset of the data
garrett_ranking(garrett_data, 8)

garrett_ranking(garrett_data, 4)

Exponential Values after One-Sided Response Integer Variable Forecasting

Description

This function is used to estimate the exponential lower (80% and 95%) and upper (80% and 95%) values from the outcome of the scaledlogit function. The exponentiation ensures that the forecast does not go beyond the upper and lower limits of the base data.

Usage

invscaledlogit(x, lower, upper)

Arguments

x

The forecast values from the constrained forecast model. Please specify the appropriate column containing the forecast values.

lower

Lower limits of the forecast values

upper

Upper limits of the forecast values

Examples

x <- 1:35
lower <- 1
upper <- 35
invscaledlogit(x = x, lower = lower, upper = upper)

Linear Model and various Transformations for Efficiency

Description

The linear model remains a reference point for advanced modeling of some datasets, and a foundation for Machine Learning, Data Science and Artificial Intelligence, in spite of some of its weaknesses. The major task in modeling is to compare various models before one is selected for reporting or for advanced modeling. Often, trial and error is used to decide which model to select. This is where this function is unique: it estimates 14 different linear models and provides their coefficients in a formatted Table for quick comparison, so that time and energy are saved. The attraction of this function is its simplicity: it is a one-line call.

Usage

Linearsystems(y, x, mod, limit, Test = NA)

Arguments

y

Vector of the dependent variable. This must be numeric.

x

Data frame of the explanatory variables.

mod

The group of linear models to be estimated. It takes a value from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models.

limit

Number of variables to be included in the coefficients plots

Test

Test data to be used to predict y. If not supplied, the fitted y is used and hence may be identical with the fitted values. Caution is needed if the data is to be divided between train and test subsets in order to train and test the model: if the sample size is not sufficient to provide enough data for the test, errors are thrown.

Value

A list with the following components:

Visual means of the numeric variable

Plot of the means of the numeric variables.

Correlation plot

Plot of the Correlation Matrix of the numeric variables. To recover the plot, please use the canonical form object$`Correlation plot`$plot().

Linear

The full estimates of the Linear Model.

Linear with interaction

The full estimates of the Linear Model with full interaction among the numeric variables.

Semilog

The full estimates of the Semilog Model. Here the independent variable(s) is/are log-transformed.

Growth

The full estimates of the Growth Model. Here the dependent variable is log-transformed.

Double Log

The full estimates of the double-log Model. Here both the dependent and independent variables are log-transformed.

Mixed-power model

The full estimates of the Mixed-power Model. This is a combination of linear and double log models. It has significant gains over the two models separately.

Translog model

The full estimates of the double-log Model with full interaction of the numeric variables.

Quadratic

The full estimates of the Quadratic Model. Here the square of numeric independent variable(s) is/are included as independent variables.

Cubic model

The full estimates of the Cubic Model. Here the third-power (x^3) of numeric independent variable(s) is/are included as independent variables.

Inverse y

The full estimates of the Inverse Model. Here the dependent variable is inverse-transformed (1/y).

Inverse x

The full estimates of the Inverse Model. Here the independent variable is inverse-transformed (1/x).

Inverse y & x

The full estimates of the Inverse Model. Here the dependent and independent variables are inverse-transformed (1/y & 1/x).

Square root

The full estimates of the Square root Model. Here the independent variable is square root-transformed (x^0.5).

Cubic root

The full estimates of the cubic root Model. Here the independent variable is cube root-transformed (x^(1/3)).

Significant plot of Linear

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Linear with interaction

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Semilog

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Growth

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Double Log

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Mixed-power model

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Translog model

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Quadratic

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Cubic model

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Inverse y

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Inverse x

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Inverse y & x

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Square root

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Cubic root

Plots of the order of importance and significance of the estimated coefficients of the model.

Model Table

Formatted Tables of the coefficient estimates of all the models

Machine Learning Metrics

Forty-seven metrics for assessing model performance, together with metrics for diagnostic analysis of the error in estimation.

Table of Marginal effects

Tables of marginal effects of each model. Because of computational limitations, if you choose to estimate all 14 models, the Tables are produced separately for the major transformations. They can easily be compiled into one.

Fitted plots long format

Plots of the fitted estimates from each of the models.

Fitted plots wide format

Plots of the fitted estimates from each of the models.

Prediction plots long format

Plots of the predicted estimates from each of the models.

Prediction plots wide format

Plots of the predicted estimates from each of the models.

Naive effects plots long format

Plots of the lm effects. May be identical with plots of marginal effects if performed.

Naive effects plots wide format

Plots of the lm effects. May be identical with plots of marginal effects if performed.

Summary of numeric variables

of the dataset.

Summary of character variables

of the dataset.

Examples

## Without test data (not run)
# y = linearsystems$MKTcost # to run all the exercises, uncomment.
# x <- select(linearsystems, -MKTcost)
# Linearsystems(y, x, 6, 15) # NaNs produced if run
## Without test data (not run)
# x = sampling[, -1]
# y = sampling$qOutput
# limit = 20
# mod <-3
# Test <- NA
# Linearsystems(y, x, 3, 15) # NaNs produced if run
# # with test data
# x = sampling[, -1]
# y = sampling$qOutput
# Data <- cbind(y, x)
# sampling <- sample(1:nrow(Data), 0.8*nrow(Data)) # 80% of data is sampled for training the model
# train <- Data[sampling, ]
# Test  <- Data[-sampling, ] # 20% of data is reserved for testing (predicting) the model
# y <- train$y
# x <- train[, -1]
# mod <- 4
# Linearsystems(y, x, 4, 15, Test) # NaNs produced if run

Computation of MallowsCp

Description

Mallow's Cp is a very useful metric and selection criterion for machine learning algorithms (models). It is used to estimate the closest number to the number of predictors and the intercept (the approximate number of explanatory variables) of linear and non-linear based models. The function inherits residuals from the estimated model. The uniqueness of this function compared with other procedures for computing Mallow's Cp is that it does not require nested models for the computation and it is not limited to lm-based models.
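
For reference, the classical textbook definition of Mallow's Cp (a sketch; the function itself works from the model residuals and its computation may differ in detail):

# Cp = SSE_p / s2 - n + 2p, where SSE_p is the candidate model's residual sum
# of squares, s2 an estimate of the error variance, n the sample size and p
# the number of parameters including the intercept
cp_classic <- function(sse_p, s2, n, p) sse_p / s2 - n + 2 * p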

Usage

MallowsCp(Model, y, x, type, Nlevels = 0)

Arguments

Model

The estimated model from which the Mallows Cp would be computed

y

The vector of the LHS variable of the estimated model

x

The matrix of the RHS variables of the estimated model. Note that if the model adds additional factor variables to the output, then the number of additional factors (Nlevels) is required, otherwise the computed Cp would be biased.

type

The type of model (LM, ALM, GLM, N-LM, nls, ARDL, SMOOTH, SPLINE, ARIMA, plm) for which Cp would be computed, broadly divided into linear (LM, ALM, GLM, ARDL, SMOOTH, SPLINE, ARIMA, plm) and non-linear (GLM, N-LM, nls). The type of model must be specified as indicated. Supported models are LM, ALM, GLM (for binary-based models), N-LM (not linear, for models not clearly defined as linear or non-linear, especially some of the ensemble models that are merely computed rather than estimated), nls for other non-linear models, ARDL, SMOOTH for smooth.spline, SPLINE for bs spline models, ARIMA and plm.

Nlevels

Optional number of additional variables created if the model has categorical variables that generate additional dummy variables during estimation, or the number of additional variables created if the model involves interaction terms.

Value

A list with the following components

MallowsCp

of the Model.

Examples

library(Dyn4cast)
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
x <- gl(2, 10, 20, labels = c("Ctl","Trt"))
y <- c(ctl, trt)
Model <- lm(y ~ x)
Type <- "LM"
MallowsCp(Model = Model, y = y, x = x, type = Type, Nlevels = 0)

Collection of Machine Learning Model Metrics for Easy Reference

Description

This function estimates over 40 metrics for assessing the quality of Machine Learning Models. The purpose is to provide a wrapper that brings all the metrics to the table and makes it easier to use them to select a model.

Usage

MLMetrics(Observed, yvalue, Model, K, Name, Form, kutuf, TTy)

Arguments

Observed

The Observed data in a data frame format

yvalue

The Response variable of the estimated Model

Model

The Estimated Model (Model = a + bx)

K

The number of variables in the estimated Model to consider

Name

The Name of the Model, which must be specified. Recognized names are ARIMA; Values, if the model computes the fitted value without estimation (like Essembles); SMOOTH (smooth.spline); Logit; EssemWet, for Ensembles based on weight; QUADRATIC polynomial; and SPLINE polynomial.

Form

Form of the Model Estimated (LM, ALM, GLM, N-LM, ARDL)

kutuf

Cutoff for the Estimated values (defaults to 0.5 if not specified)

TTy

Type of response variable (Numeric or Response - like binary)

Value

A list with the following components:

Absolute Error

of the Model.

Absolute Percent Error

of the Model.

Accuracy

of the Model.

Adjusted R Square

of the Model.

`Akaike's` Information Criterion AIC

of the Model.

Area under the ROC curve (AUC)

of the Model.

Average Precision at k

of the Model.

Bias

of the Model.

Brier score

of the Model.

Classification Error

of the Model.

F1 Score

of the Model.

fScore

of the Model.

GINI Coefficient

of the Model.

kappa statistic

of the Model.

Log Loss

of the Model.

`Mallow's` cp

of the Model.

Matthews Correlation Coefficient

of the Model.

Mean Log Loss

of the Model.

Mean Absolute Error

of the Model.

Mean Absolute Percent Error

of the Model.

Mean Average Precision at k

of the Model.

Mean Absolute Scaled Error

of the Model.

Median Absolute Error

of the Model.

Mean Squared Error

of the Model.

Mean Squared Log Error

of the Model.

Model turning point error

of the Model.

Negative Predictive Value

of the Model.

Percent Bias

of the Model.

Positive Predictive Value

of the Model.

Precision

of the Model.

R Square

of the Model.

Relative Absolute Error

of the Model.

Recall

of the Model.

Root Mean Squared Error

of the Model.

Root Mean Squared Log Error

of the Model.

Root Relative Squared Error

of the Model.

Relative Squared Error

of the Model.

`Schwarz's` Bayesian criterion BIC

of the Model.

Sensitivity

of the Model.

specificity

of the Model.

Squared Error

of the Model.

Squared Log Error

of the Model.

Symmetric Mean Absolute Percentage Error

of the Model.

Sum of Squared Errors

of the Model.

True negative rate

of the Model.

True positive rate

of the Model.

Examples

library(Dyn4cast)
library(splines)
Model   <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
MLMetrics(Observed = Data, yvalue = Data$states, Model = Model, K = 2,
 Name = "Linear", Form = "LM", kutuf = 0, TTy = "Number")

Latent Factors Recovery from Variables Loadings

Description

This function retrieves the latent factors and their variable loadings, which can be used as R objects to perform other analyses.

Usage

model_factors(data, DATA)

Arguments

data

An R object obtained from exploratory factor analysis (EFA) using the fa function in the psych package.

DATA

A data.frame, the raw data used to carry out the parallel analysis to obtain the data object.

Value

A list with the following components:

Latent_frame

data.frame of latent factors based on the variables loadings.

Latent_1

data.frame of variables in Latent factor 1 with their loadings.

Latent_2

data.frame of variables in Latent factor 2 with their loadings.

Latent_3

data.frame of variables in Latent factor 3 with their loadings.

Latent_4

data.frame of variables in Latent factor 4 with their loadings.

Latent_5

data.frame of variables in Latent factor 5 with their loadings.

Latent_6

data.frame of variables in Latent factor 6 with their loadings.

Latent_7

data.frame of variables in Latent factor 7 with their loadings.

Latent_8

data.frame of variables in Latent factor 8 with their loadings.

Latent_9

data.frame of variables in Latent factor 9 with their loadings.

Examples

library(psych)
Data <- Quicksummary
GGn <- names(Data)
GG <- ncol(Data)
GGx <- c(paste0('x0', 1:9), paste("x", 10:ncol(Data), sep = ""))
names(Data) <- GGx
lll <- fa.parallel(Data, fm = 'minres', fa = 'fa')
dat <- fa(Data, nfactors = lll[["nfact"]], rotate = "varimax", fm = "minres")

model_factors(data = dat, DATA = Data)

Attach Per Cent Sign to Data

Description

This function is a wrapper for easily affixing the per cent sign (%) to a value, a vector or a data frame of values.

Usage

Percent(Data, Type, format = "f", ...)

Arguments

Data

The Data to which the per cent sign is to be affixed. The data must be in raw form because, for the Frame argument, the per cent value of each cell is calculated before the sign is affixed.

Type

The type of data. The accepted arguments are Value for a single numeric value or Frame for a numeric vector or data frame. In the case of a vector or data frame, the per cent value of each cell is calculated before the per cent sign is affixed.

format

The format of the output, which is internal; the default is character format.

...

Additional arguments that may be passed to the function

Value

This function returns the result as

percent

values with the percentage sign (%) affixed.

Examples

Data <- c(1.2, 0.5, 0.103, 7, 0.1501)
Percent(Data = Data, Type = "Frame")  # Value, Frame
Data <- 1.2
Percent(Data = Data, Type = "Value")  # Value, Frame
Percent(Data = sample, Type = "Frame")  # Value, Frame

Quick Formatted Summary of Machine Learning Data

Description

There is an increasing need for user-friendly and production-ready Tables of machine learning data. This function provides a simplified quick summary whose output is a formatted table. It is very handy for those who do not have the time to write code for user-friendly summaries.

Usage

quicksummary(x, Type, Cut, Up, Down, ci = 0.95)

Arguments

x

The data to be summarised. Only numeric data is allowed.

Type

The type of data to be summarised. There are two options, 1 or 2: 1 = Continuous and 2 = Likert-type.

Cut

The cut-off point for Likert-type data

Up

The top of the Likert-type scale, for example Agree, Constraint etc., which would appear in the remark column.

Down

The lower end of the Likert-type scale, for example Disagree, Not a constraint etc., which would appear in the remark column.

ci

Confidence interval, which defaults to 0.95.

Value

The function returns a formatted Table of the Quick summary

ANS

The formatted Table of the summary

Examples

# Likert-type data
Up <- "Constraint"
Down <- "Not a constraint"
quicksummary(x = Quicksummary, Type = 2, Cut = 2.60, Up = Up, Down = Down)

# Continuous data
library(dplyr)
x <- select(linearsystems, 1:6)
quicksummary(x = x, Type = 1)

Scale Parameter for Integer Modeling and Forecast

Description

This function is a wrapper for scaling the fitted (predicted) values of a one-sided (positive-only or negative-only) integer response variable of supported models. The scaling involves a log transformation of the fitted (predicted) values.
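
The transformation follows the scaled logit of Hyndman & Athanasopoulos (2021); a minimal sketch of that mapping and its inverse, assuming this is the form used:

# maps x in (lower, upper) onto the whole real line
scaled_logit_sketch <- function(x, lower, upper) log((x - lower) / (upper - x))
# maps back into (lower, upper), so forecasts cannot escape the limits
inv_scaled_logit_sketch <- function(x, lower, upper) {
  (upper - lower) * exp(x) / (1 + exp(x)) + lower
}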

Usage

scaledlogit(x, lower, upper)

Arguments

x

The parameter to be scaled, which is the fitted values from supported models. The scaled parameter is used mainly for constrained forecasting of a response variable that is positive (0 to inf) or negative (-inf to 0). The scaling involves a log transformation of the parameter.

lower

Integer or variable representing the lower limit for the scaling (-inf or 0)

upper

Integer or variable representing the upper limit for the scaling (0 or inf)

Examples

library(Dyn4cast)
library(splines)
lower <- 1
upper <- 37
Model   <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
scaledlogit(x = fitted.values(Model), lower = lower,
 upper = upper)

Enhanced Estimation of Treatment Effects of Binary Data from Randomized Experiments

Description

Observational studies involve the evaluation of outcomes of participants not randomly assigned treatments or exposures. To be able to assess the effects of the treatment on the outcome, the participants are matched using propensity scores (propensity score matching, PSM). This enables the determination of the effects of the treatments on those treated against those who were not treated. Most of the earlier functions available for this analysis only enable the determination of the average treatment effect on the treated (ATT), while the other treatment effects are optional. This function is unique because five different average treatment effects are estimated simultaneously, in spite of the one-line call. The five treatment effects are:

  1. Average treatment effect for the entire (ATE) population

  2. Average treatment effect for the treated (ATT) population

  3. Average treatment effect for the controlled (ATC) population

  4. Average treatment effect for the evenly matched (ATM) population

  5. Average treatment effect for the overlap (ATO) population.

There are excellent materials dealing with each of the treatment effects.
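
A rough sketch of the first stage implied above: propensity scores from a logistic regression, followed by the standard inverse-probability weights for two of the five effects (illustrative only; the package's estimator may differ):

# propensity scores: probability of treatment given the covariates
dat <- data.frame(Treatment = Treatment, x_data)
ps_model <- glm(Treatment ~ ., data = dat, family = binomial())
p_score <- fitted(ps_model)
# standard weights: ATE and ATT
ate_w <- ifelse(Treatment == 1, 1 / p_score, 1 / (1 - p_score))
att_w <- ifelse(Treatment == 1, 1, p_score / (1 - p_score))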

Usage

treatment_model(Treatment, x_data)

Arguments

Treatment

Vector of binary data (0, 1) LHS for the treatment effects estimation

x_data

Data frame of explanatory variables for the RHS of the estimation

Value

A list with the following components:

Model

Estimated treatment effects model.

Effect

Data frame of the various estimated treatment effects.

P_score

Vector of estimated propensity scores from the model

Fitted_estimate

Vector of fitted values from the model

Residuals

Residuals of the estimated model

`Experiment plot`

Plot of the propensity scores from the model faceted into Treated and control populations

`ATE plot`

Plot of the average treatment effect for the entire population

`ATT plot`

Plot of the average treatment effect for the treated population

`ATC plot`

Plot of the average treatment effect for the controlled population

`ATM plot`

Plot of the average treatment effect for the evenly matched population

`ATO plot`

Plot of the average Treatment effect for the overlap population

weights

Estimated weights for each of the treatment effects

Examples

Treatment <- treatments$treatment
data <- treatments[, c(2:3)]
treatment_model(Treatment, data)