Commentary - (2021) Volume 0, Issue 0
Received: 10-Dec-2021; Published: 31-Dec-2021
Citation: Carin, Daniel Shu. “A Brief Note on Dimensional Economic Statistical Analysis.” Bus Econ J S6 (2021): 004.
Copyright: © 2021 Carin DS. This is an open-access article distributed under the
terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author
and source are credited.
In recent years, high-dimensional models have become increasingly common in business and economics. Modern corporate datasets contain thousands to millions of records on individuals. With the advent of new technology over the last few decades, it has become routine practice in economics, finance, and marketing to collect data on a huge number of traits for a comparatively small number of people. This note comments on the statistical analysis of high-dimensional models and their economic and financial applications. The number of parameters in a vector autoregressive model grows quadratically with the number of series, so estimating these models can quickly become computationally intractable, as the sketch below illustrates. Panel data, which is used extensively in economics, also lends itself to high-dimensional data analysis. Volatility matrix estimation is an example of high-dimensional statistics in finance. Another example, from marketing, is scanner data on household transactions across a wide range of products.
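As a quick illustration of how the parameter count of a vector autoregressive model explodes, the following Python sketch counts the free parameters of a standard VAR(p) with intercepts; the series counts and lag length are hypothetical choices, not figures from this note.

# Parameter count of a VAR(p): each of the k equations has an
# intercept plus k coefficients for each of the p lags.
def var_param_count(k: int, p: int) -> int:
    return k * (1 + k * p)

# With p = 4 lags, the parameter count grows quadratically in k.
for k in (5, 20, 100):
    print(f"k={k:>3} series, p=4 lags -> {var_param_count(k, 4):>6} parameters")
# k=  5 series, p=4 lags ->    105 parameters
# k= 20 series, p=4 lags ->   1620 parameters
# k=100 series, p=4 lags ->  40100 parameters

At a hundred series, the model already has far more parameters than any plausible macroeconomic sample has observations, which is precisely the high-dimensional setting discussed next.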
Situations such as these, in which the number of parameters is substantially larger than the sample size, are the subject of high-dimensional statistical analysis. Unfortunately, standard econometric procedures that work well with low-dimensional data, where the number of records exceeds the number of covariates, do not work well with high-dimensional data. So what can go wrong if a low-dimensional approach is used in a high-dimensional setting? The major issue is that low-dimensional methods such as least squares or logistic regression yield a perfect fit on training data but a poor fit on test data. This overfitting stems from the excessive flexibility of least squares and logistic regression in high-dimensional settings; overfitting means that a statistical procedure fits the noise in the data rather than the signal. To mitigate overfitting with high-dimensional data, one can use less flexible variants of least squares or logistic regression such as the LASSO, principal components regression, ridge regression, or forward stepwise selection. The basic notion behind these strategies is regularization, or shrinkage, which constrains the coefficient estimates and reduces the number of non-zero coefficients, as the sketch below illustrates. Another important concept is the curse of dimensionality: the quality of the fitted model and its forecasts can deteriorate as new features are added. The quality of the fit improves when additional signal features that are genuinely associated with the response are added to the model; when additional noise features unrelated to the response are added, the fit and the predictions deteriorate, and the risk of overfitting rises.
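The contrast between unregularized least squares and the LASSO can be made concrete with a minimal sketch on synthetic data, assuming NumPy and scikit-learn are available (neither is referenced in the note itself); the sample sizes, penalty level, and coefficient values are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
n, p = 50, 200                      # far more features than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                      # only 5 features carry signal
y = X @ beta + rng.standard_normal(n)

X_test = rng.standard_normal((n, p))
y_test = X_test @ beta + rng.standard_normal(n)

# Unregularized least squares: perfect in-sample fit, poor out of sample.
ols = LinearRegression().fit(X, y)
print("OLS   train R^2:", ols.score(X, y))          # ~1.0 (overfit)
print("OLS   test  R^2:", ols.score(X_test, y_test))

# The LASSO penalty shrinks most coefficients to exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
print("LASSO test  R^2:", lasso.score(X_test, y_test))
print("Non-zero LASSO coefficients:", int(np.sum(lasso.coef_ != 0)))

With many more features than observations, least squares interpolates the training data and generalizes poorly, while the LASSO zeroes out most of the noise coefficients and retains a usable test fit.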
High-dimensional data can be a blessing if the additional features are informative and associated with the response, yielding a better predictive model, but a curse if the additional features are noise, degrading prediction quality. Furthermore, even when relevant variables are included, the additional variance that comes with their inclusion may outweigh the reduction in bias. Finally, because of the severe multicollinearity among features, results from high-dimensional methods should be interpreted with caution. In high-dimensional settings, common measures of model fit, such as p-values and R-squared, can be misleading, because overfitting makes the fit appear almost perfect; the short demonstration below makes this concrete. High-dimensional statistical analysis is a rapidly expanding field of study that holds great promise for business and economics experts, and stronger approaches for prediction and inference are anticipated in the coming years. One promising direction for improved prediction and inference is the use of data mining techniques to augment penalized least squares and penalized likelihood approaches for variable selection in sparse regression models.
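To see how in-sample R-squared can mislead, the following sketch (synthetic data; NumPy and scikit-learn assumed, with illustrative sample sizes) appends pure-noise columns to a regression with two genuine predictors and watches the training R-squared climb mechanically toward one.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 60
signal = rng.standard_normal((n, 2))              # two genuine predictors
y = signal @ np.array([1.5, -1.0]) + rng.standard_normal(n)

# Append pure-noise columns: training R^2 rises toward 1 even though
# the extra features are unrelated to the response.
for extra in (0, 10, 30, 50):
    X = np.hstack([signal, rng.standard_normal((n, extra))])
    r2 = LinearRegression().fit(X, y).score(X, y)
    print(f"{2 + extra:>2} features -> training R^2 = {r2:.3f}")

The apparent improvement is an artifact of fitting noise, which is why in-sample fit statistics alone should never be used to judge a high-dimensional model.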