Package 'Dyn4cast'

Title: Dynamic Modeling and Machine Learning Environment
Description: Estimates, predicts and forecasts dynamic models, and computes Machine Learning metrics that assist in model selection for further analysis. The package also provides tools and metrics that are useful in machine learning and modeling, for example quick summary, percent sign and Mallows' Cp tools, among others. The package is aimed at the analysis of economic data for national development. So far, the package is stable, reliable, efficient and time-saving.
Authors: Job Nmadu [aut, cre]
Maintainer: Job Nmadu <[email protected]>
License: MIT + file LICENSE
Version: 11.11.24
Built: 2025-03-10 14:21:24 UTC

Help Index

Constrained Forecast of One-sided Integer Response Model


This function estimates the lower and upper 80% and 95% forecasts of the Model, with the final values kept within the lower and upper limits of the base data. It is used in conjunction with the scaledlogit and invscaledlogit functions. These functions are adapted from Hyndman & Athanasopoulos (2021) and modified for independent use rather than being restricted to a particular package.
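The bounding idea can be sketched in base R. This is a minimal sketch, assuming the scaled logit transform described by Hyndman & Athanasopoulos (2021); scaled_logit_sketch and inv_scaled_logit_sketch are hypothetical names, not the package's scaledlogit and invscaledlogit functions. It shows that any value on the transformed scale, including an interval endpoint, maps back inside the limits.

```r
# Hypothetical helpers illustrating the scaled logit transform and its inverse.
# x must lie strictly between lower and upper, otherwise the log is -Inf, Inf or NaN.
scaled_logit_sketch <- function(x, lower, upper) {
  log((x - lower) / (upper - x))
}

# plogis(y) equals exp(y) / (1 + exp(y)) but does not overflow for large y.
inv_scaled_logit_sketch <- function(y, lower, upper) {
  (upper - lower) * plogis(y) + lower
}

lower <- 1
upper <- 37
x <- c(2, 10, 36)

y <- scaled_logit_sketch(x, lower, upper)
inv_scaled_logit_sketch(y, lower, upper)   # recovers 2, 10, 36 (up to rounding)

# A symmetric interval on the logit scale (endpoints chosen arbitrarily here)
# becomes asymmetric after back-transformation, but stays within [lower, upper].
interval_logit <- c(lo95 = -3, point = 0, hi95 = 3)
inv_scaled_logit_sketch(interval_logit, lower, upper)
```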


constrainedforecast(Model, lower, upper)



Model

These are the exponential values from the invscaledlogit function.

lower

The lower limit of the forecast.

upper

The upper limit of the forecast.


A list of forecast values within the 80% and 95% confidence bands. The values are:

Lower 80%

Forecast at lower 80% confidence level.

Upper 80%

Forecast at upper 80% confidence level.

Lower 95%

Forecast at lower 95% confidence level.

Upper 95%

Forecast at upper 95% confidence level.


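library(Dyn4cast)  # Data, scaledlogit() and constrainedforecast()
library(splines)   # bs()
library(forecast)  # forecast()

lower <- 1
upper <- 37
Model   <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
# Map the fitted values onto the scaled-logit scale
FitModel <- scaledlogit(x = fitted.values(Model), lower = lower,
 upper = upper)
# Forecast 200 steps ahead on the transformed scale
ForecastModel <- forecast(FitModel, h = 200)
ForecastValues <- constrainedforecast(Model = ForecastModel, lower, upper)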

Custom plot of correlation matrix


This is a custom plot of a correlation matrix in which the coefficients are displayed along with graphics showing the magnitude of each coefficient.





Correlation matrix of the data for the plot


The function returns a custom plot of the correlation matrix


The custom plot of the correlation matrix

Standardize data.frame for comparable Machine Learning prediction and visualization


Economic and other Machine Learning data are often in different units or of different magnitudes, which makes estimation, interpretation and visualization difficult. These issues can be handled by transforming the data so that they are unitless or of similar magnitude, which is what data_transform does. It is simple and straightforward to use.


data_transform(data, method, MARGIN = 2)



data

A data.frame of numeric data for transformation. All columns in the data are transformed.

method

The type of transformation. There are three options: 1 for log transformation, 2 for min-max transformation and 3 for mean-SD transformation.

MARGIN

Whether to transform the data column-wise (2) or row-wise (1). Defaults to column-wise transformation if no option is given.


This function returns the output of the data transformation process as


A new data.frame containing the transformed values
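The three transformations named under method can be sketched in base R. This is a minimal sketch, assuming the method codes listed above (1 = log, 2 = min-max, 3 = mean-SD) and that MARGIN works as in apply(); transform_sketch is a hypothetical name, not the package's implementation.

```r
# Hypothetical sketch of log, min-max and mean-SD transformations.
transform_sketch <- function(data, method, MARGIN = 2) {
  stopifnot(method %in% 1:3, MARGIN %in% 1:2)
  f <- switch(method,
    function(x) log(x),                            # 1: log (values must be > 0)
    function(x) (x - min(x)) / (max(x) - min(x)),  # 2: min-max to [0, 1]; constant input gives NaN
    function(x) (x - mean(x)) / sd(x)              # 3: mean-SD (z-score)
  )
  out <- apply(as.matrix(data), MARGIN, f)
  # apply() returns one column per row when MARGIN = 1, so transpose back
  if (MARGIN == 1) out <- t(out)
  as.data.frame(out)
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
transform_sketch(df, 2)              # both columns become 0, 0.5, 1
transform_sketch(df, 2, MARGIN = 1)  # each row becomes 0, 1
```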


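library(Dyn4cast)  # Transform data and data_transform()
library(tidyr)     # pivot_longer() and the %>% pipe
library(ggplot2)

# View the data without transformation

data0 <- Transform %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")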

ggplot(data = data0, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 1: log transformation (method = 1).
# You could also transform the X column, but it is better not to.

data1 <- data_transform(Transform[, -1], 1)
# drop = FALSE keeps X as a named column so that pivot_longer(!X, ...) finds it
data1 <- cbind(Transform[, 1, drop = FALSE], data1)
data1 <- data1 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data1, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 2: min-max transformation (method = 2)

data2 <- data_transform(Transform[, -1], 2)
data2 <- cbind(Transform[, 1, drop = FALSE], data2)
data2 <- data2 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data2, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

# Example 3: `Mean-SD` transformation

data3 <- data_transform(Transform[, -1], 3)
data3 <- cbind(Transform[, 1, drop = FALSE], data3)
data3 <- data3 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data3, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

Dynamic Forecast of Five Models and their Ensembles


The function estimates and predicts models using a time series dataset and provides subset forecasts within the length of the trend. The recognized models are lm, smooth spline, polynomial splines with or without knots, quadratic polynomial, and ARIMA. The output includes the models' estimates, time-varying forecasts and plots based on ggplot themes. The main attraction of this function is that it forecasts an equal number of trend periods (days, months, years) to the number used to estimate the model. The function currently accepts daily, monthly and yearly data sets.


DynamicForecast(date, series, Trend, Type, MaximumDate, x = 0, BREAKS = 0,
 ORIGIN = origin, Length = 0, ...)



date

A vector containing the dates on which the data were collected. Must be the same length as series. The date must be in 'YYYY-MM-DD' format. For a monthly series, the recognized date is the last day of the month, e.g. 2021-02-28. For a yearly series, the recognized date is the last day of the year, e.g. 2020-12-31. Quarterly data are not supported yet.

series

A vector containing the data for estimation and forecasting. Must be the same length as date.

x

An optional vector of data to be added to the model for forecasting. Modeling and forecasting are still done if it is not provided. Must be the same length as series.

BREAKS

A vector of numbers indicating the break points (knots) for estimating the spline models.

MaximumDate

The maximum (last) date in the data frame; forecasting starts on the date that follows it. The date must be in a recognized date format. Note that for forecasting, the date origin is set to 1970-01-01.

Trend

The type of trend. There are three options: Day, Month and Year.

Type

The type of response variable. There are two options: Continuous and Integer. For an integer variable, the forecasts are constrained between the minimum and maximum values of the response variable.

Length

The length of the forecast horizon. If not given, it defaults to the length of the dataset, i.e. the sample size.

ORIGIN

If different from 1970-01-01, must be in the format "YYYY-MM-DD". This is used to position the dates of the data in order to date the forecasts properly.

...

Additional arguments that may be passed to the function. If the maximum date is NULL, which is the default, it is set to the last date of the series.
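The date conventions above (daily dates, month-end dates for monthly series and year-end dates for yearly series, with forecasting starting after the last date) can be sketched in base R. This is a hypothetical helper, forecast_dates_sketch, written for illustration under those assumptions; it is not part of the package. It produces a horizon of n periods, for example the sample size when Length is 0.

```r
# Hypothetical helper: dates for an n-period forecast horizon after max_date.
forecast_dates_sketch <- function(max_date, n, trend = c("Day", "Month", "Year")) {
  trend <- match.arg(trend)
  start <- as.Date(max_date) + 1   # first date after the last observed date
  switch(trend,
    Day   = seq(start, by = "day", length.out = n),
    # Stepping from the first day of each following period and subtracting one
    # day gives period-end dates, matching the last-day-of-month and
    # last-day-of-year convention for the date argument.
    Month = seq(start, by = "month", length.out = n + 1)[-1] - 1,
    Year  = seq(start, by = "year", length.out = n + 1)[-1] - 1
  )
}

forecast_dates_sketch("2021-02-10", 3, "Day")    # 2021-02-11, 2021-02-12, 2021-02-13
forecast_dates_sketch("2021-02-28", 2, "Month")  # 2021-03-31, 2021-04-30
forecast_dates_sketch("2020-12-31", 1, "Year")   # 2021-12-31
```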


A list with the following components:

Spline without knots

The estimated spline model without the breaks (knots).

Spline with knots

The estimated spline model with the breaks (knots).

Smooth Spline

The smooth spline estimates.


Estimated Auto Regressive Integrated Moving Average model.


The estimated quadratic polynomial model.

Ensembled with equal weight

Estimated Ensemble model with equal weight given to each of the models. To get this, the fitted values of each model are divided by the number of models and summed together.

Ensembled based on weight

Estimated Ensemble model based on the weight of each model. To do this, the fitted values of each model serve as independent variables and are regressed against the trend with interaction among the variables.

Ensembled based on summed weight

Estimated Ensemble model based on the summed weight of each model. To do this, the fitted values of each model serve as independent variables and are regressed against the trend.

Ensembled based on weight of fit

Estimated Ensemble model. The fit of each model is measured by the RMSE.

Unconstrained Forecast

The forecast if the response variable is continuous. The number of forecasts is equivalent to the length of the dataset (equal days forecast).

Constrained Forecast

The forecast if the response variable is integer. The number of forecasts is equivalent to the length of the dataset (equal days forecast).


Root Mean Square Error (RMSE) for each forecast.

Unconstrained forecast Plot

The combined plots of the unconstrained forecasts using ggplot.

Constrained forecast Plot

The combined plots of the constrained forecasts using ggplot.


This is the date range for the forecast.

Fitted plot

This is the plot of the fitted models.

Estimated coefficients

These are the estimated coefficients of the various models in the forecast.
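The equal-weight ensemble described above (the fitted values of each model divided by the number of models and summed) can be sketched in base R. This is a minimal illustration, assuming the fitted values of the individual models are held as columns of a matrix; equal_weight_ensemble is a hypothetical name. Dividing by the number of models and summing is the same as taking the row mean.

```r
# Hypothetical sketch: equal-weight ensemble of fitted values.
# fits: one column per model, one row per time point.
equal_weight_ensemble <- function(fits) {
  fits <- as.matrix(fits)
  rowSums(fits / ncol(fits))   # equivalent to rowMeans(fits)
}

fits <- cbind(spline = c(10, 12, 15),
              arima  = c(11, 13, 14),
              quad   = c(9, 14, 16))
equal_weight_ensemble(fits)   # 10 13 15
```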


# COVID19$Date <- zoo::as.Date(COVID19$Date, format = '%m/%d/%Y')
#  #The date is formatted to R format
# LEN <- length(COVID19$Case)
# Dss <- seq(COVID19$Date[1], by = "day", length.out = LEN)
#  #data length for forecast
# ORIGIN = "2020-02-29"
# lastdayfo21 <- Dss[length(Dss)] # The maximum length # uncomment to run
# Data <- COVID19[COVID19$Date <= lastdayfo21 - 28, ]
# # desired length of forecast
# BREAKS <- c(70, 131, 173, 228, 274) # The default breaks for the data
# DynamicForecast(date = Data$Date, series = Data$Case,
# BREAKS = BREAKS, MaximumDate = "2021-02-10",
#  Trend = "Day", Length = 0, Type = "Integer")
# lastdayfo21 <- Dss[length(Dss)]
# Data <- COVID19[COVID19$Date <= lastdayfo21 - 14, ]
# BREAKS = c(70, 131, 173, 228, 274)
# DynamicForecast(date = Data$Date, series = Data$Case,
# BREAKS = BREAKS , MaximumDate = "2021-02-10",
#  Trend = "Day", Length = 0, Type = "Integer")

Plot of Order of Significance of Estimated Regression Coefficients


This function provides graphic displays of the order of significance of the estimated coefficients of models. This helps in assessing models to decide which can be used for further analysis, prediction and policy consideration.


estimate_plot(Model, limit)



Model

The estimated model whose coefficients are to be plotted.

limit

The number of variables to be included in the coefficient plots.


The function returns a plot of the order of importance of the estimated coefficients


The plot of the order of importance of estimated coefficients
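The ordering step behind such a plot can be sketched with base R's summary() of an lm fit. This is a minimal sketch, assuming that significance is judged by the coefficient p-values and that limit caps the number of terms shown; coef_significance_order is a hypothetical name, and the package may rank and plot coefficients differently.

```r
# Hypothetical sketch: order the estimated coefficients of an lm fit by p-value.
coef_significance_order <- function(model, limit = NULL) {
  tab <- summary(model)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
  tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
  tab <- tab[order(tab[, "Pr(>|t|)"]), , drop = FALSE]   # most significant first
  if (!is.null(limit)) tab <- tab[seq_len(min(limit, nrow(tab))), , drop = FALSE]
  tab
}

set.seed(42)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 3 * x1 + 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
coef_significance_order(fit, limit = 2)
```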

Convert continuous vector variable to formatted factors


When continuous data are converted to factors using the base R cut function, the resulting class interval labels often use scientific notation, which can be confusing to interpret, especially for casual data scientists. This function provides a more user-friendly, formatted output and is easy to use.
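The kind of tidy-up described above can be sketched in base R by parsing the interval labels that cut() produces. This is a minimal sketch under assumptions: the column names follow the value description of this function, and plain decimal formatting is done with format(..., scientific = FALSE). format_cut_sketch is hypothetical, not the package's code.

```r
# Hypothetical sketch: split cut() labels into numeric bounds without scientific notation.
format_cut_sketch <- function(x, breaks) {
  f <- cut(x, breaks)
  labs <- levels(f)                                     # e.g. "(9.95e+04,2.67e+05]"
  bounds <- do.call(rbind, strsplit(gsub("[][()]", "", labs), ","))
  lower <- as.numeric(bounds[, 1])
  upper <- as.numeric(bounds[, 2])
  fmt <- function(v) format(v, scientific = FALSE, trim = TRUE)
  data.frame(`Lower class` = lower,
             `Upper class` = upper,
             `Class interval` = paste0(fmt(lower), " - ", fmt(upper)),
             Frequency = as.vector(table(f)),
             check.names = FALSE)
}

# cut() labels these large values in scientific notation; the sketch does not
format_cut_sketch(c(1e5, 2e5, 3e5, 4e5, 5e5, 6e5), 3)
```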


formattedcut(data, breaks, cut = FALSE)



data

A vector of data to be converted to factors or, if the data have already been cut, the vector of cut data.

breaks

The number of classes to break the data into.

cut

Logical, indicating whether the cut function has already been applied to the data. Defaults to FALSE.


The function returns a data frame with three or four columns, i.e. Lower class, Upper class, Class interval and Frequency (if cut is FALSE).


The data frame


library(Dyn4cast)
library(dplyr)  # group_by(), count() and the %>% pipe

DD <- rnorm(100000)
formattedcut(DD, 12, FALSE)

DD1 <- cut(DD, 12)
DDK <- formattedcut(DD1, 12, TRUE)
# If the data are not from a data frame, compute the frequency distribution:
DDK %>%
  group_by(`Lower class`, `Upper class`, `Class interval`) %>%
  count()

Garrett Ranking of Categorical Data


There are three main types of ranking: standard competition, ordinal and fractional. Garrett's Ranking Technique applies fractional ranking: the data points are ordered and given an ordinal number (rank). The ordering and ranking provide additional information that may not be available from a frequency distribution. The ordering is based on the level of seriousness or severity of each data point from the respondent's point of view. Ranking eases comparison and makes grouping more meaningful. It is used in social science, psychology and other survey research. This function performs Garrett ranking of up to 15 ranks.
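The first step of Garrett's technique converts each rank into a percent position. A minimal sketch, assuming the commonly used formula percent position = 100 * (R - 0.5) / N, where R is the rank and N the number of ranks; the percent positions are then converted to Garrett scores with Garrett's conversion table, which is not reproduced here. garrett_percent_position is a hypothetical name.

```r
# Hypothetical sketch: Garrett percent position for each of num_rank ranks.
garrett_percent_position <- function(num_rank) {
  ranks <- seq_len(num_rank)
  100 * (ranks - 0.5) / num_rank
}

garrett_percent_position(5)   # 10 30 50 70 90
```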


garrett_ranking(data, num_rank, ranking = NULL, m_rank = c(2:15))



data

The data for the Garrett ranking; must be a data.frame.

num_rank

The number of ranks applied to the data. For five-point Likert-type data, the number of ranks is 5.

ranking

A vector or list of the rank labels applied to the data. If not supplied, positional ranks are applied.

m_rank

The range of ranks supported by the ranking method, from 2 to 15.


A list with the following components:

Data mean table

Table of data ranked using simple average.

Garrett ranked data

Table of data ranked using Garrett mean score.

Garrett value

Table of the Garrett values used in the ranking.


garrett_data <- data.frame(garrett_data)
ranking <- c("Serious constraint", "Constraint",
"Not certain it is a constraint", "Not a constraint",
"Not a serious constraint")

## ranking is supplied
garrett_ranking(garrett_data, 5, ranking)

# ranking not supplied
garrett_ranking(garrett_data, 5)

# you can rank a subset of the data
garrett_ranking(garrett_data, 8)

garrett_ranking(garrett_data, 4)

Create Gender Variable


There is often a need to differentiate between sex and gender, and many wonder whether there is any difference at all. This function clarifies the distinction.





A data frame containing Age and Sex variables.


The data.frame with:


data frame with two additional variables.


# df <- data.frame(Age = c(49, 30, 44, 37, 29, 56, 28, 26, 33, 45, 45, 19,
#   32, 22, 19, 28, 28, 36, 56, 34),
#  Sex = c("male", "female", "female", "male", "male", "male", "female",
#  "female", "Prefer not to say", "male", "male", "female", "female", "male",
#  "Non-binary/third gender", "male", "female", "female", "male", "male"))
#  gender(df)

Exponential Values after One-Sided Response Integer Variable Forecasting


This function is used to estimate exponential lower (80% and 95%) and upper (80% and 95%) values from the outcome of the scaledlogit function. The exponentiation ensures that the forecast does not go beyond the upper and lower limits of the base data.
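For reference, the scaled logit transform described by Hyndman & Athanasopoulos (2021) and its inverse can be written as follows, with $a$ the lower and $b$ the upper limit; that invscaledlogit implements exactly this form is an assumption. The inverse maps any real $y$ into $(a, b)$, which is why back-transformed forecasts stay within the limits of the base data.

```latex
y = \log\!\left(\frac{x - a}{b - x}\right),
\qquad
x = \frac{(b - a)\,e^{y}}{1 + e^{y}} + a
```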


invscaledlogit(x, lower, upper)



x

The forecast values from the constrained forecast. Please specify the appropriate column containing the forecast values.

lower

The lower limit of the forecast values.

upper

The upper limit of the forecast values.


x <- 1:35
lower <- 1
upper <- 35
invscaledlogit(x = x, lower = lower, upper = upper)

Linear Model and various Transformations for Efficiency


The linear model remains a reference point for advanced modeling and a foundation for Machine Learning, Data Science and Artificial Intelligence, in spite of its weaknesses. A major task in modeling is to compare various models before one is selected for use or for advanced modeling, and this is often done by trial and error. This function is unique in that it estimates 14 different linear models and provides their coefficients in a formatted table for quick comparison, saving time and energy. It is simple to use, requiring only one line of code.


Linearsystems(y, x, mod, limit, Test = NA)



y

A vector of the dependent variable. This must be numeric.

x

A data frame of the explanatory variables.

mod

The group of linear models to be estimated. It takes values from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models.

limit

The number of variables to be included in the coefficient plots.

Test

Test data used to predict y. If not supplied, the fitted y is used, so the predictions may be identical to the fitted values. Take care when dividing the data into training and test subsets: if the sample is too small to leave enough data for the test, errors are thrown.


A list with the following components:

Visual means of the numeric variable

Plot of the means of the numeric variables.

Correlation plot

Plot of the correlation matrix of the numeric variables. To recover the plot, use the canonical form object$`Correlation plot`$plot().


The full estimates of the Linear Model.

Linear with interaction

The full estimates of the Linear Model with full interaction among the numeric variables.


The full estimates of the Semilog Model. Here the independent variable(s) is/are log-transformed.


The full estimates of the Growth Model. Here the dependent variable is log-transformed.

Double Log

The full estimates of the double-log Model. Here both the dependent and independent variables are log-transformed.

Mixed-power model

The full estimates of the Mixed-power Model. This is a combination of linear and double log models. It has significant gains over the two models separately.

Translog model

The full estimates of the double-log Model with full interaction of the numeric variables.


The full estimates of the Quadratic Model. Here the square of numeric independent variable(s) is/are included as independent variables.

Cubic model

The full estimates of the Cubic Model. Here the third-power (x^3) of numeric independent variable(s) is/are included as independent variables.

Inverse y

The full estimates of the Inverse Model. Here the dependent variable is inverse-transformed (1/y).

Inverse x

The full estimates of the Inverse Model. Here the independent variable is inverse-transformed (1/x).

Inverse y & x

The full estimates of the Inverse Model. Here the dependent and independent variables are inverse-transformed (1/y & 1/x).

Square root

The full estimates of the Square root Model. Here the independent variable is square root-transformed (x^0.5).

Cubic root

The full estimates of the cubic root Model. Here the independent variable is cubic root-transformed (x^(1/3)).

Significant plot of Linear

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Linear with interaction

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Semilog

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Growth

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Double Log

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Mixed-power model

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Translog model

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Quadratic

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Cubic model

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Inverse y

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Inverse x

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Inverse y & x

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Square root

Plots of the order of importance and significance of the estimated coefficients of the model.

Significant plot of Cubic root

Plots of the order of importance and significance of the estimated coefficients of the model.

Model Table

Formatted Tables of the coefficient estimates of all the models

Machine Learning Metrics

Metrics (47) for assessing model performance and metrics for diagnostic analysis of the error in estimation.

Table of Marginal effects

Tables of marginal effects of each model. Because of computational limitations, if you choose to estimate all the 14 models, the Tables are produced separately for the major transformations. They can easily be compiled into one.

Fitted plots long format

Plots of the fitted estimates from each of the models.

Fitted plots wide format

Plots of the fitted estimates from each of the models.

Prediction plots long format

Plots of the predicted estimates from each of the models.

Prediction plots wide format

Plots of the predicted estimates from each of the models.

Naive effects plots long format

Plots of the lm effects. May be identical to the plots of marginal effects, if these are computed.

Naive effects plots wide format

Plots of the lm effects. May be identical to the plots of marginal effects, if these are computed.

Summary of numeric variables

of the dataset.

Summary of character variables

of the dataset.
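Several of the model forms listed above can be written as ordinary lm() formulas. The sketch below is an illustration under assumptions: a single positive regressor, simulated data, one formula per transformation as described in the list (the cubic form is assumed to be a full cubic polynomial), and the Mixed-power and Translog forms omitted. It is not how Linearsystems builds its models internally.

```r
# Hypothetical sketch: some of the listed transformations as lm() formulas.
set.seed(1)
d <- data.frame(x = runif(50, 1, 10))
d$y <- 2 + 3 * log(d$x) + rnorm(50, sd = 0.2)   # strictly positive response

forms <- list(
  Linear        = y ~ x,
  Semilog       = y ~ log(x),                 # log-transformed independent variable
  Growth        = log(y) ~ x,                 # log-transformed dependent variable
  `Double Log`  = log(y) ~ log(x),
  Quadratic     = y ~ x + I(x^2),
  `Cubic model` = y ~ x + I(x^2) + I(x^3),    # assumed full cubic polynomial
  `Inverse y`   = I(1 / y) ~ x,
  `Inverse x`   = y ~ I(1 / x),
  `Square root` = y ~ sqrt(x),
  `Cubic root`  = y ~ I(x^(1 / 3))
)

fits <- lapply(forms, lm, data = d)
# R-squared per model; note that it is not comparable between models
# that transform y differently
round(sapply(fits, function(m) summary(m)$r.squared), 3)
```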


## Without test data (not run)
# y = linearsystems$MKTcost # to run all the exercises, uncomment.
# x <- select(linearsystems, -MKTcost)
# Linearsystems(y, x, 6, 15) # NaNs produced if run
## Without test data (not run)
# x = sampling[, -1]
# y = sampling$qOutput
# limit = 20
# mod <-3
# Test <- NA
# Linearsystems(y, x, 3, 15) # NaNs produced if run
# # with test data
# x = sampling[, -1]
# y = sampling$qOutput
# Data <- cbind(y, x)
# sampling <- sample(1:nrow(Data), 0.8*nrow(Data)) # 80% of data is sampled for training the model
# train <- Data[sampling, ]
# Test  <- Data[-sampling, ] # 20% of data is reserved for testing (predicting) the model
# y <- train$y
# x <- train[, -1]
# mod <- 4
# Linearsystems(y, x, 4, 15, Test) # NaNs produced if run

Computation of MallowsCp


Mallows' Cp is a useful metric and selection criterion for machine learning algorithms (models). For a well-specified model, Cp is close to the number of parameters (the predictors plus the intercept), so it helps to identify the approximate number of explanatory variables in linear and non-linear models. The function takes the residuals from the estimated model. Unlike other procedures for computing Mallows' Cp, this function does not require nested models and is not limited to lm-based models.
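The textbook form of Mallows' Cp compares a candidate model's error sum of squares with the residual variance of a reference (full) model: Cp = SSE_p / s^2 - n + 2p, where p counts the estimated coefficients including the intercept. A minimal sketch with two lm fits follows; mallows_cp_sketch is a hypothetical name, and MallowsCp itself avoids nested models and may estimate the variance differently.

```r
# Hypothetical sketch of the classical Mallows' Cp for lm fits.
mallows_cp_sketch <- function(candidate, full) {
  sse_p <- sum(residuals(candidate)^2)   # error sum of squares of the candidate
  p     <- length(coef(candidate))       # coefficients, including the intercept
  n     <- length(residuals(candidate))
  s2    <- summary(full)$sigma^2         # residual mean square of the full model
  sse_p / s2 - n + 2 * p
}

set.seed(7)
n  <- 60
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

full  <- lm(y ~ x1 + x2 + x3)
small <- lm(y ~ x1 + x2)
mallows_cp_sketch(small, full)  # expected to be near p = 3 when the smaller model is adequate
mallows_cp_sketch(full, full)   # equals p = 4 by construction (up to rounding)
```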


MallowsCp(Model, y, x, type, Nlevels = 0)



Model

The estimated model from which Mallows' Cp is computed.

y

The vector of the left-hand-side (LHS) variable of the estimated model.

x

The matrix of the right-hand-side (RHS) variables of the estimated model. If the model adds factor variables to the output, the number of additional variables (Nlevels) is required; otherwise the computed Cp will be biased.

type

The type of model for which Cp is computed: LM, ALM, GLM, N-LM, nls, ARDL, SMOOTH, SPLINE, ARIMA or plm. These are broadly divided into linear (LM, ALM, GLM, ARDL, SMOOTH, SPLINE, ARIMA, plm) and non-linear (GLM, N-LM, nls) models, and the type must be specified exactly as indicated. GLM is for binary-based models; N-LM (not linear) is for models not clearly defined as linear or non-linear, especially ensemble models whose values are computed rather than estimated; nls is for other non-linear models; SMOOTH is for smooth.spline; SPLINE is for bs spline models.

Nlevels

Optional number of additional variables created when the model has categorical variables that generate dummy variables during estimation, or when the model involves interaction terms.


A list with the following components:


of the Model.


ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
x <- gl(2, 10, 20, labels = c("Ctl","Trt"))
y <- c(ctl, trt)
Model <- lm(y ~ x)
Type <- "LM"
MallowsCp(Model = Model, y = y, x = x, type = Type, Nlevels = 0)

Sequential Computation of Dynamic Multidimensional Poverty Indices (MDPI)


This function computes the indices and all associated measures of multidimensional poverty sequentially and dynamically. In sequence, it computes the incidence of poverty (H), the adjusted incidence of poverty (H * (q/n)), the deprivation score of each dimension, the intensity of poverty (A), the multidimensional poverty index (MDPI = H * A), the percentage contribution of each dimension to the MDPI, and the average deprivation among the deprived (A * D).

Dynamically, it computes the indices for between three and nine dimensions (D). The first five dimensions are Health, Education, Living standard, Social security, and Employment and Income, depending on the user's choice, and four additional dimensions can be included. The computations are carried out either for the national sample data or disaggregated by grouping factors such as region, sex, gender, marital status or any other suitable variable. The cut-off separating the poor (q) from the non-poor (n - q) members of the sample (n) defaults to 0.4 but can be varied to suit the purpose of the computation.

The computations follow procedures outlined in the literature, starting with Alkire et al. (2015), but are expanded from three dimensions to nine. Each dimension is given equal weight, while indicators are weighted in line with the guidelines in Alkire & Foster (2011) and Alkire & Santos (2010). See also Alkire & Santos (2014) and Chan & Wong (2024).
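The core Alkire-Foster quantities (H, A and MDPI = H * A) can be sketched in base R. This is a minimal sketch, assuming 0/1 deprivation indicators, equal weights across dimensions split equally among each dimension's indicators, and a cut-off k applied as score >= k; af_mdpi_sketch is a hypothetical name and omits the adjusted incidence, contributions and grouping that mdpi reports.

```r
# Hypothetical sketch of the Alkire-Foster headcount (H), intensity (A) and MDPI.
af_mdpi_sketch <- function(data, dm, k = 0.4) {
  n_dim <- length(dm)
  ind   <- unlist(dm, use.names = FALSE)
  # Each dimension weighs 1/n_dim, shared equally by its indicators
  weights <- unlist(lapply(dm, function(v) rep(1 / (n_dim * length(v)), length(v))),
                    use.names = FALSE)
  g     <- as.matrix(data[, ind, drop = FALSE])   # 1 = deprived, 0 = not deprived
  score <- as.vector(g %*% weights)               # weighted deprivation score in [0, 1]
  poor  <- score >= k
  H <- mean(poor)                                 # q / n
  A <- if (any(poor)) mean(score[poor]) else 0    # average score among the poor
  list(score = score, H = H, A = A, MDPI = H * A)
}

people <- data.frame(health = c(1, 0, 1, 0),
                     school = c(1, 1, 0, 0),
                     water  = c(1, 0, 0, 0))
dm <- list(d1 = "health", d2 = c("school", "water"))
af_mdpi_sketch(people, dm)   # H = 0.5, A = 0.75, MDPI = 0.375
```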


mdpi(
  data,
  dm,
  Bar = 0.4,
  id_addn = NULL,
  Factor = NULL,
  plots = NULL,
  id = c("Health", "Education", "Living standard"),
  id_add = "Social security",
  id_add1 = "Employment and Income"
)



data

A data frame containing all the variables for the computation. Note that the variables used in the computation must be coded (0,1).

dm

A list of vectors of the indicators that make up each dimension to be computed.

Bar

The cut-off used to divide the population into the poor and the non-poor. Defaults to 0.4 if not supplied.

id_addn

A vector of additional dimensions to be used in the computation, up to a maximum of four.

Factor

A grouping factor for the computation, which must be a variable in the data.

plots

Plots of the various measures. For plots to be produced, the Factor argument must have fewer than 41 levels. The default is NULL; any character string overrides the default and produces the plots.

id

A vector of the first three dimensions used in the computation, given as Health, Education and Living standard.

id_add

The fourth dimension in the computation, given as Social security. Can be redefined but never NULL.

id_add1

The fifth dimension in the computation, given as Employment and Income. Can be redefined but never NULL.


A list with the following components:


Publication-ready table of the factor and national MDPI prepared with the modelsummary package. Will not be returned if only the national computation is carried out.


Data frame of the factor and national MDPI. Will not be returned if only the national computation is carried out.

MDPI mean

Data frame of the mean MDPI. Will not be returned if only the national computation is carried out.


Data frame of the SD of the MDPI. Will not be returned if only the national computation is carried out.


Data frame of the national MDPI with mean and SD.


Data frame of the scores for each dimension in the computation.


Data frame of the scores for each indicator in the computation.


Alkire, S. & Foster, J. (2011). Counting and Multidimensional Poverty Measurement. Journal of Public Economics 95(7-8): 476–487.

Alkire, S., Foster, J. E., Seth, S., Santos, M. E., Roche, J., & Ballon, P. (2015). Multidimensional poverty measurement and analysis. Oxford University Press.

Alkire, S. & Santos, M. E. (2010). Acute Multidimensional Poverty: A New Index for Developing Countries. Oxford Poverty and Human Development Initiative (OPHI) Working Paper No. 38.

Alkire, S. & Santos, M. E. (2014). Measuring Acute Poverty in the Developing World: Robustness and Scope of the Multidimensional Poverty Index. World Development 59: 251–274.

Chan, S. M. & Wong, H. (2024). Measurement and determinants of multidimensional poverty: the case of Hong Kong. Journal of Asian Public Policy. DOI: 10.1080/17516234.2024.2325857


# Not run, uncomment to run
# library(MPI)
# data("examplePovertydf")
# data <- examplePovertydf
# dm <- list(d1 = c("Child.Mortality", ""),
#            d2 = c("", "School.attendance", "School.lag"),
#            d3 = c("Cooking.Fuel", "",
#                   "", "Electricity",
#                   "Housing.Materials", "Asset.ownership"))
# mdpi(data, dm, plots = "t", Factor = "Region")
# library(mpitbR)
# data <- subset(syn_cdta)
# data <- na.omit(data)
# dm <- list(d1 = c("d_nutr","d_cm"),
#            d2 = c("d_satt","d_educ"),
#            d3 = c("d_elct","d_sani","d_wtr","d_hsg","d_ckfl","d_asst"))
# mdpi(data, dm, plots = "t", Factor = "region")

Collection of Machine Learning Model Metrics for Easy Reference


This function computes more than 40 metrics for assessing the quality of Machine Learning models. It is a wrapper that brings all the metrics together, making it easier to use them to select a model.
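A few of the error metrics listed below can be sketched directly from observed and fitted values. This is a minimal sketch using standard textbook definitions (MAPE expressed in per cent here, which is a convention choice); basic_metrics_sketch is a hypothetical name, and MLMetrics may compute or scale some metrics differently.

```r
# Hypothetical sketch: a handful of common regression metrics.
basic_metrics_sketch <- function(observed, fitted) {
  err <- observed - fitted
  c(MAE  = mean(abs(err)),
    MSE  = mean(err^2),
    RMSE = sqrt(mean(err^2)),
    MAPE = 100 * mean(abs(err / observed)),   # in per cent; undefined if any observed value is 0
    R2   = 1 - sum(err^2) / sum((observed - mean(observed))^2))
}

basic_metrics_sketch(observed = c(1, 2, 3, 4), fitted = c(1, 2, 3, 5))
# MAE 0.25, MSE 0.25, RMSE 0.5, MAPE 6.25, R2 0.8
```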


MLMetrics(Observed, yvalue, Model, K, Name, Form, kutuf, TTy)



Observed

The observed data in data frame format.

yvalue

The response variable of the estimated model.

Model

The estimated model (Model = a + bx).

K

The number of variables in the estimated model to consider.

Name

The name of the model, which must be specified. The options are ARIMA; Values, for models whose fitted values are computed without estimation, such as ensembles; SMOOTH (smooth.spline); Logit; EssemWet, for ensembles based on weight; QUADRATIC polynomial; and SPLINE polynomial.

Form

The form of the estimated model (LM, ALM, GLM, N-LM, ARDL).

kutuf

The cut-off for the estimated values (defaults to 0.5 if not specified).

TTy

The type of response variable (Numeric, or Response for binary-like responses).


A list with the following components:

Absolute Error

of the Model.

Absolute Percent Error

of the Model.


of the Model.

Adjusted R Square

of the Model.

`Akaike's` Information Criterion AIC

of the Model.

Area under the ROC curve (AUC)

of the Model.

Average Precision at k

of the Model.


of the Model.

Brier score

of the Model.

Classification Error

of the Model.

F1 Score

of the Model.


of the Model.

GINI Coefficient

of the Model.

kappa statistic

of the Model.

Log Loss

of the Model.

`Mallow's` cp

of the Model.

Matthews Correlation Coefficient

of the Model.

Mean Log Loss

of the Model.

Mean Absolute Error

of the Model.

Mean Absolute Percent Error

of the Model.

Mean Average Precision at k

of the Model.

Mean Absolute Scaled Error

of the Model.

Median Absolute Error

of the Model.

Mean Squared Error

of the Model.

Mean Squared Log Error

of the Model.

Model turning point error

of the Model.

Negative Predictive Value

of the Model.

Percent Bias

of the Model.

Positive Predictive Value

of the Model.


of the Model.

R Square

of the Model.

Relative Absolute Error

of the Model.


of the Model.

Root Mean Squared Error

of the Model.

Root Mean Squared Log Error

of the Model.

Root Relative Squared Error

of the Model.

Relative Squared Error

of the Model.

`Schwarz's` Bayesian criterion BIC

of the Model.


of the Model.


of the Model.

Squared Error

of the Model.

Squared Log Error

of the Model.

Symmetric Mean Absolute Percentage Error

of the Model.

Sum of Squared Errors

of the Model.

True negative rate

of the Model.

True positive rate

of the Model.


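library(Dyn4cast)
library(splines)  # provides bs()
# `Data` is assumed to contain the columns `states` and `sequence`
Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
MLMetrics(Observed = Data, yvalue = Data$states, Model = Model, K = 2,
          Name = "Linear", Form = "LM", kutuf = 0, TTy = "Number")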
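Several of the metrics listed above follow standard textbook definitions. The sketch below computes a handful of them in base R so their meaning is concrete; the helper names `regression_metrics()` and `classification_metrics()` are hypothetical, and the code is not the MLMetrics() source, whose exact formulas may differ.

```r
# Hypothetical helpers: textbook definitions of a few metrics listed above.
# This is NOT the MLMetrics() implementation.
regression_metrics <- function(observed, predicted) {
  err <- observed - predicted
  c(
    MAE   = mean(abs(err)),                   # mean absolute error
    RMSE  = sqrt(mean(err^2)),                # root mean squared error
    MAPE  = mean(abs(err / observed)) * 100,  # mean absolute percent error (%)
    SMAPE = mean(2 * abs(err) / (abs(observed) + abs(predicted))) * 100
  )
}

classification_metrics <- function(actual, predicted) {
  # actual and predicted are 0/1 vectors of the same length
  tp <- sum(actual == 1 & predicted == 1)
  tn <- sum(actual == 0 & predicted == 0)
  fp <- sum(actual == 0 & predicted == 1)
  fn <- sum(actual == 1 & predicted == 0)
  ppv <- tp / (tp + fp)  # positive predictive value
  tpr <- tp / (tp + fn)  # true positive rate
  c(
    PPV = ppv,
    TPR = tpr,
    TNR = tn / (tn + fp),               # true negative rate
    F1  = 2 * ppv * tpr / (ppv + tpr),  # harmonic mean of PPV and TPR
    MCC = (tp * tn - fp * fn) /         # Matthews correlation coefficient
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  )
}

regression_metrics(observed = c(10, 12, 8), predicted = c(11, 12, 6))
classification_metrics(actual = c(1, 0, 1, 1, 0), predicted = c(1, 0, 0, 1, 1))
```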

Latent Factors Recovery from Variables Loadings


This function retrieves the latent factors and their variable loadings as R objects that can be used in further analysis.

model_factors(data, DATA)

An R object obtained from exploratory factor analysis (EFA) using the fa function in the psych package.


A data.frame of the raw data used to carry out the parallel analysis that produced the data object.


A list with the following components:


data.frame of latent factors based on the variables loadings.


data.frame of variables in Latent factor 1 with their loadings.


data.frame of variables in Latent factor 2 with their loadings.


data.frame of variables in Latent factor 3 with their loadings.


data.frame of variables in Latent factor 4 with their loadings.


data.frame of variables in Latent factor 5 with their loadings.


data.frame of variables in Latent factor 6 with their loadings.


data.frame of variables in Latent factor 7 with their loadings.


data.frame of variables in Latent factor 8 with their loadings.


data.frame of variables in Latent factor 9 with their loadings.


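library(Dyn4cast)
library(psych)  # provides fa.parallel() and fa()
Data <- Quicksummary
GGn <- names(Data)  # original variable names
GG <- ncol(Data)
# Rename the columns x01, ..., x09, x10, ...
GGx <- c(paste0("x0", 1:9), paste0("x", 10:GG))
names(Data) <- GGx
# Parallel analysis suggests how many factors to retain
lll <- fa.parallel(Data, fm = "minres", fa = "fa")
dat <- fa(Data, nfactors = lll[["nfact"]], rotate = "varimax", fm = "minres")

model_factors(data = dat, DATA = Data)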
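One common way to turn a loadings matrix into per-factor variable lists is to assign each variable to the factor on which it has its largest absolute loading. The sketch below shows that idea in base R; `group_by_loading()` is a hypothetical helper, and this assignment rule is an assumption, not necessarily what model_factors() does internally. With psych, the matrix would come from `dat$loadings`.

```r
# Hypothetical helper: group variables by the factor with their largest
# absolute loading. Not necessarily the rule used by model_factors().
group_by_loading <- function(loadings) {
  L <- unclass(loadings)                 # plain numeric matrix (works for psych loadings)
  best <- apply(abs(L), 1, which.max)    # dominant factor index for each variable
  groups <- lapply(seq_len(ncol(L)), function(j) {
    vars <- rownames(L)[best == j]
    data.frame(variable = vars, loading = L[vars, j], row.names = NULL)
  })
  names(groups) <- colnames(L)
  groups
}

# Small illustrative loadings matrix (made-up values)
loads <- matrix(c(0.80, 0.10,
                  0.75, 0.20,
                  0.15, 0.70,
                  0.05, 0.65),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("x01", "x02", "x03", "x04"), c("MR1", "MR2")))
group_by_loading(loads)
```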

Attach Per Cent Sign to Data


This function is a convenience wrapper for affixing the per cent sign (%) to a single value, a vector, or a data frame of values.


Percent(Data, Type, format = "f", ...)



The data to which the per cent sign is to be affixed. The data must be in raw form because, when Type = "Frame", the per cent value of each cell is calculated before the sign is affixed.


The type of data: "Value" for a single numeric value, or "Frame" for a numeric vector or data frame. For a vector or data frame, the per cent value of each cell is calculated before the per cent sign is affixed.


The output format, used internally; the default is "f".


Additional arguments that may be passed to the function


This function returns the result as


values with the percentage sign (%) affixed.


library(Dyn4cast)
Data <- c(1.2, 0.5, 0.103, 7, 0.1501)
Percent(Data = Data, Type = "Frame")  # Type is either "Value" or "Frame"
Data <- 1.2
Percent(Data = Data, Type = "Value")
Percent(Data = sample, Type = "Frame")
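To make the Value/Frame distinction concrete, here is a minimal base-R sketch. It assumes that the "per cent value" of a cell means the cell multiplied by 100, and that "Value" only affixes the sign; `add_percent()` is a hypothetical helper, not the package's Percent() source.

```r
# Hypothetical helper illustrating the idea behind Percent().
# Assumption: "Frame" converts each cell to a per cent value (x * 100);
# "Value" affixes the sign to the number as given.
add_percent <- function(x, type = c("Value", "Frame"), digits = 2) {
  type <- match.arg(type)
  if (is.data.frame(x)) {
    # Apply the same rule column by column, keeping the data frame shape
    x[] <- lapply(x, add_percent, type = type, digits = digits)
    return(x)
  }
  if (type == "Frame") x <- x * 100
  paste0(formatC(x, format = "f", digits = digits), "%")
}

add_percent(1.2, type = "Value")
add_percent(c(1.2, 0.5, 0.103), type = "Frame")
add_percent(data.frame(a = c(0.25, 0.5)), type = "Frame")
```

Using formatC() with format = "f" gives fixed-point output, which keeps the number of decimals consistent across cells.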

Plots of Multidimensional Poverty Measures


Produces plots of the multidimensional poverty measures computed by mdpi.


plot_mdpi(data, kala, dma, factor = NULL)



Data frame of multidimensional poverty measures, obtained from mdpi.


Color palette with at least 15 colors; the number of colors must be equal to or greater than the number of levels in the factor argument.


Number of dimensions involved in the computation of the multidimensional poverty measures.


The optional grouping factor used in computing the measures. If it is not supplied, only the national plots are produced, irrespective of whether the factor was used in the computation.


A list of the following plots:

Multidimensional poverty index


Deprivation Score


Adjusted incidence of poverty


Intensity of poverty


Average deprivation among the deprived


Contribution of each Dimension


Combined dimensions




Combined dimensions at the national level



# Not run, uncomment to run
# library(MPI)
# data("examplePovertydf")
# data <- examplePovertydf
# dm <- list(d1 = c("Child.Mortality", ""),
#            d2 = c("", "School.attendance", "School.lag"),
#            d3 = c("Cooking.Fuel", "",
#                   "", "Electricity",
#                   "Housing.Materials", "Asset.ownership"))
# dp <- mdpi(data, dm, Factor = "Region")
# library(MetBrewer)
# kala <- met.brewer("OKeeffe1", 15, type = "continuous")
# dma <- 3
# plot_mdpi(dp$MDPI, kala, dma, "Region")
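Several of the plotted quantities correspond to the Alkire-Foster counting approach, in which the incidence H is the share of units identified as poor, the intensity A is the average deprivation score among the poor, and the adjusted headcount ratio is M0 = H × A. The sketch below computes these from a 0/1 deprivation matrix. It assumes equal indicator weights and a poverty cut-off `k`; `alkire_foster()` is a hypothetical helper, and mdpi() may weight indicators, nest them within dimensions, or label its outputs differently.

```r
# Hypothetical sketch of the Alkire-Foster measures; not the mdpi() source.
# deprivation: 0/1 matrix, rows = households, columns = indicators.
alkire_foster <- function(deprivation, k = 1/3) {
  w <- rep(1 / ncol(deprivation), ncol(deprivation))  # equal weights (assumption)
  score <- as.vector(deprivation %*% w)   # weighted deprivation score per household
  poor  <- score >= k                     # identified as multidimensionally poor
  H <- mean(poor)                         # incidence (headcount ratio)
  A <- if (any(poor)) mean(score[poor]) else 0  # intensity among the poor
  c(H = H, A = A, M0 = H * A)             # M0: adjusted headcount ratio
}

d <- rbind(c(1, 1, 0),
           c(0, 0, 0),
           c(1, 1, 1),
           c(1, 0, 0))
alkire_foster(d, k = 0.5)
```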

Quick Formatted Summary of Machine Learning Data


There is an increasing need for user-friendly, production-ready tables of machine learning data. This function produces a simplified quick summary whose output is a formatted table. It is handy for users who do not have the time to write code for user-friendly summaries.


quicksummary(x, Type, Cut, Up, Down, ci = 0.95)



The data to be summarised. Only numeric data is allowed.


The type of data to be summarised: 1 = continuous, 2 = Likert-type.


The cut-off point for Likert-type data


The label for the upper end of the Likert-type scale (for example, Agree or Constraint), which appears in the remark column.


The label for the lower end of the Likert-type scale (for example, Disagree or Not a Constraint), which appears in the remark column.


Confidence level for the interval; defaults to 0.95.


The function returns a formatted table of the quick summary.


The formatted table of the summary.


library(Dyn4cast)
library(dplyr)  # provides select()

# Likert-type data
Up <- "Constraint"
Down <- "Not a constraint"
quicksummary(x = Quicksummary, Type = 2, Cut = 2.60, Up = Up, Down = Down)

# Continuous data
x <- select(linearsystems, 1:6)
quicksummary(x = x, Type = 1)
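For readers who want to see what a column-wise quick summary involves, the sketch below computes the mean, standard deviation, count and a t-based confidence interval for each column, and attaches a Likert-type remark. The helper `quick_summary_sketch()` is hypothetical, and the remark rule (a mean at or above Cut receives the Up label) is an assumption rather than the documented behaviour of quicksummary().

```r
# Hypothetical sketch of a column-wise quick summary; not the quicksummary() source.
# Assumption: a column whose mean is >= Cut is labelled Up, otherwise Down.
quick_summary_sketch <- function(x, Cut = NULL, Up = "Agree", Down = "Disagree",
                                 ci = 0.95) {
  x <- as.data.frame(x)
  out <- data.frame(
    Variable = names(x),
    Mean = vapply(x, mean, numeric(1), na.rm = TRUE),
    SD   = vapply(x, sd, numeric(1), na.rm = TRUE),
    N    = vapply(x, function(v) sum(!is.na(v)), integer(1)),
    row.names = NULL
  )
  # Half-width of a t-based confidence interval for each column mean
  half <- qt(1 - (1 - ci) / 2, df = out$N - 1) * out$SD / sqrt(out$N)
  out$Lower <- out$Mean - half
  out$Upper <- out$Mean + half
  if (!is.null(Cut)) out$Remark <- ifelse(out$Mean >= Cut, Up, Down)
  out
}

likert <- data.frame(a = c(1, 2, 3, 4), b = c(3, 4, 5, 4))
quick_summary_sketch(likert, Cut = 2.6, Up = "Constraint", Down = "Not a constraint")
```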

Scale Parameter for Integer Modeling and Forecast


This function is a wrapper for scaling the fitted (predicted) values of a one-sided (positive-only or negative-only) integer response variable from supported models. The scaling applies a log transformation to the fitted (predicted) values.


scaledlogit(x, lower, upper)



The parameter to be scaled, that is, the fitted values from a supported model. The scaled parameter is used mainly for constrained forecasting of a response variable that is either positive (0 to inf) or negative (-inf to 0). The scaling applies a log transformation to the parameter.


Integer or variable representing the lower limit for the scaling (-inf or 0)


Integer or variable representing the upper limit for the scaling (0 or inf)


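library(Dyn4cast)
library(splines)  # provides bs()
lower <- 1
upper <- 37
# `Data` is assumed to contain the columns `states` and `sequence`
Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
scaledlogit(x = fitted.values(Model), lower = lower, upper = upper)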
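Hyndman & Athanasopoulos (2021) describe a scaled logit transform for keeping forecasts between limits a and b: y = log((x - a) / (b - x)), with inverse x = (b - a) e^y / (1 + e^y) + a. The sketch below implements that pair in base R so the round trip can be checked. The names `scaled_logit_sketch()` and `inv_scaled_logit_sketch()` are hypothetical; scaledlogit() and its inverse in this package may differ in detail.

```r
# Sketch of the scaled logit transform (Hyndman & Athanasopoulos, 2021);
# hypothetical helpers, not the package's scaledlogit() source.
scaled_logit_sketch <- function(x, lower, upper) {
  log((x - lower) / (upper - x))   # maps (lower, upper) onto the real line
}

inv_scaled_logit_sketch <- function(y, lower, upper) {
  (upper - lower) * exp(y) / (1 + exp(y)) + lower  # maps back into (lower, upper)
}

x <- c(2, 10, 36)
y <- scaled_logit_sketch(x, lower = 1, upper = 37)
inv_scaled_logit_sketch(y, lower = 1, upper = 37)      # recovers x
# Values on the transformed scale map back inside the limits
inv_scaled_logit_sketch(c(-5, 5), lower = 1, upper = 37)
```

Because a forecast is made on the unbounded y scale and then back-transformed, its point forecasts and interval limits stay within the chosen limits.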

Enhanced Estimation of Treatment Effects of Binary Data from Randomized Experiments


Observational studies evaluate the outcomes of participants who were not randomly assigned to treatments or exposures. To assess the effects on the outcome, participants are matched using propensity score matching (PSM). This makes it possible to compare the outcomes of those who were treated with those who were not. Most earlier functions for this analysis estimate only the average treatment effect on the treated (ATT), with the other treatment effects optional. This function differs in that it estimates five average treatment effects simultaneously from a one-line call. The five treatment effects are:

  1. Average treatment effect for the entire population (ATE)

  2. Average treatment effect for the treated population (ATT)

  3. Average treatment effect for the control population (ATC)

  4. Average treatment effect for the evenly matched population (ATM)

  5. Average treatment effect for the overlap population (ATO)

Excellent material on each of these treatment effects is available; see Understanding propensity score weighting.


treatment_model(Treatment, x_data)



Vector of binary data (0 = control population, 1 = treated population) used as the left-hand side (LHS) of the treatment effects estimation


Data frame of explanatory variables for the right-hand side (RHS) of the estimation


A list with the following components:


Estimated treatment effects model.


Data frame of the estimated various treatment effects.


Vector of estimated propensity scores from the model


Vector of fitted values from the model


Residuals of the estimated model

`Experiment plot`

Plot of the propensity scores from the model, faceted into treated and control populations

`ATE plot`

Plot of the average treatment effect for the entire population

`ATT plot`

Plot of the average treatment effect for the treated population

`ATC plot`

Plot of the average treatment effect for the control population

`ATM plot`

Plot of the average treatment effect for the evenly matched population

`ATO plot`

Plot of the average treatment effect for the overlap population


Estimated weights for each of the treatment effects


library(Dyn4cast)
Treatment <- treatments$treatment  # binary 0/1 treatment indicator
data <- treatments[, 2:3]          # explanatory variables
treatment_model(Treatment, data)
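The five estimands listed above differ only in how each unit is weighted by its propensity score e and treatment indicator z. The sketch below uses the standard weighting formulas from the propensity score weighting literature (inverse probability weights for ATE, ATT and ATC, matching weights for ATM, overlap weights for ATO) and applies them to simulated data with a logistic propensity model. `ps_weights()` and `weighted_effect()` are hypothetical helpers; treatment_model() may estimate the propensity scores or the effects differently.

```r
# Hypothetical sketch of balancing weights for ATE, ATT, ATC, ATM and ATO;
# not the treatment_model() source.
ps_weights <- function(z, e) {
  data.frame(
    ATE = z / e + (1 - z) / (1 - e),                     # inverse probability weights
    ATT = z + (1 - z) * e / (1 - e),                     # treated units weighted 1
    ATC = z * (1 - e) / e + (1 - z),                     # control units weighted 1
    ATM = pmin(e, 1 - e) / (z * e + (1 - z) * (1 - e)),  # matching weights
    ATO = z * (1 - e) + (1 - z) * e                      # overlap weights
  )
}

# Weighted difference in mean outcome between treated and control units
weighted_effect <- function(y, z, w) {
  weighted.mean(y[z == 1], w[z == 1]) - weighted.mean(y[z == 0], w[z == 0])
}

# Simulated example: outcome depends on treatment and one confounder x
set.seed(1)
n <- 200
x <- rnorm(n)
z <- rbinom(n, 1, plogis(0.5 * x))
y <- 2 * z + x + rnorm(n)
e <- fitted(glm(z ~ x, family = binomial))  # estimated propensity scores
w <- ps_weights(z, e)
sapply(w, function(wt) weighted_effect(y, z, wt))
```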