Consequences of Heteroscedasticity

Heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. One of the assumptions of linear regression (assumption number 2) is that there is no heteroscedasticity. It can arise in several ways. Following the error-learning models, as people learn, their errors of behavior become smaller over time. Heteroscedasticity can also arise as a result of the presence of outliers.

Regression analysis using heteroscedastic data will still provide an unbiased estimate of the relationship between the predictor variable and the outcome. The homoscedasticity assumption is not needed to show the unbiasedness of OLS, so the OLS coefficients remain unbiased for the true values, and they remain consistent. Pure (as opposed to impure) heteroscedasticity does not cause bias in the parameter estimates. Note also that the coefficient estimates do not change when the standard errors are corrected, which indicates that there is no bias in the estimates themselves in the presence of heteroscedasticity.

What is lost is efficiency and valid inference. The OLS estimators no longer have the lowest variance among all linear unbiased estimators, so they are no longer BLUE (Best Linear Unbiased Estimators), and the regression predictions based on them are inefficient too. The standard errors, and therefore the inferences obtained from the analysis, are suspect. This occurs because heteroscedasticity increases the variance of the coefficient estimates but the OLS procedure does not detect this increase. Consequently, OLS calculates the t-values and F-values using an underestimated amount of variance. When testing for heteroscedasticity, in the first stage we run the OLS regression disregarding the heteroscedasticity question.
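The claim that OLS stays unbiased while its conventional standard errors understate the true variability can be checked with a small simulation. The following is a minimal sketch, not taken from the source: the data-generating process (error standard deviation proportional to x squared), the parameter values, and the function names `ols_slope_and_se` and `simulate` are all assumptions chosen for illustration.

```python
import random
import statistics


def ols_slope_and_se(xs, ys):
    """Return the OLS slope and its conventional (homoscedastic) standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    # Conventional formula: s^2 / sum of squared deviations of x.
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, se


def simulate(n_reps=2000, true_alpha=1.0, true_beta=2.0, seed=0):
    """Repeatedly draw heteroscedastic samples and refit OLS.

    Assumed design: fixed x values from 1 to 10.8 and errors whose
    standard deviation is 0.2 * x**2, so the spread grows with x.
    """
    rng = random.Random(seed)
    xs = [1 + 0.2 * i for i in range(50)]
    slopes, ses = [], []
    for _ in range(n_reps):
        ys = [true_alpha + true_beta * x + rng.gauss(0.0, 0.2 * x ** 2) for x in xs]
        slope, se = ols_slope_and_se(xs, ys)
        slopes.append(slope)
        ses.append(se)
    # Average estimate, actual sampling spread, and average reported SE.
    return statistics.mean(slopes), statistics.stdev(slopes), statistics.mean(ses)


if __name__ == "__main__":
    mean_slope, sd_slope, mean_se = simulate()
    print("mean of slope estimates:", round(mean_slope, 3))
    print("empirical SD of slope estimates:", round(sd_slope, 3))
    print("mean conventional SE:", round(mean_se, 3))
```

Under this assumed design the average slope estimate should sit close to the true value of 2, while the spread of the estimates across replications should exceed the average standard error that OLS reports. The direction of the gap depends on how the error variance relates to x; other designs can make the conventional standard error too large instead.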
Homoscedasticity means that the variance of each disturbance term $\varepsilon_i$, conditional on the chosen values of the explanatory variables, is some constant number equal to $\sigma^2$. Heteroscedasticity violates this assumption and has serious consequences for the OLS estimator. If we still use OLS in the presence of heteroscedasticity, our standard errors could be inappropriate and hence any inferences we make could be misleading. Recall that under heteroscedasticity the OLS estimator still delivers unbiased and consistent coefficient estimates, $E(\hat{\beta}) = \beta$, because unbiasedness depends only on $E(\varepsilon_i) = 0$ and $\mathrm{Cov}(x_i, \varepsilon_i) = 0$. However, the homoscedasticity assumption is needed to show the efficiency of OLS: there exists an alternative to the OLS estimator that has a smaller variance, and the estimated standard errors of the OLS estimator are biased.

Heteroscedasticity is usually modeled using one of the following specifications:
- H1: $\sigma_t^2$ is a function of past $\varepsilon_t^2$ and past $\sigma_t^2$ (GARCH model).
- H2: $\sigma_t^2$ increases monotonically with one (or several) exogenous variable(s) ($x_1, \ldots$).

Heteroscedasticity is also caused by the omission of variables from the model. Considering an income-saving model, if the variable income is deleted from the model, the researcher would not be able to interpret anything from the model. Also note that heteroscedasticity tends to affect cross-sectional data more than time series.

In a test based on an auxiliary regression, if the coefficient $\beta_3$ turns out to be statistically significant, it would suggest that heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity.

Serial correlation has similar effects. When the disturbance term exhibits serial correlation, the OLS parameter estimates remain statistically unbiased, but the estimated variances of the OLS estimators are biased, so the standard errors of the parameter estimates are affected.
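The source does not say which test regression the coefficient β3 belongs to. As one possible illustration of judging heteroscedasticity by the significance of a coefficient, the sketch below assumes a Park-style two-stage procedure: run OLS ignoring heteroscedasticity, then regress ln(e²) on ln(x) and look at the t-statistic of the slope. The function names, the simulated data, and the choice of this particular test are assumptions, not the source's method.

```python
import math
import random


def simple_ols(xs, ys):
    """Return (intercept, slope, t-statistic of the slope) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


def park_style_t_stat(xs, ys):
    """t-statistic of the slope in the auxiliary regression ln(e^2) on ln(x)."""
    # First stage: OLS disregarding the heteroscedasticity question.
    intercept, slope, _ = simple_ols(xs, ys)
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Second stage: does the residual spread vary systematically with x?
    log_x = [math.log(x) for x in xs]  # requires x > 0
    log_e2 = [math.log(e ** 2) for e in residuals]
    _, _, t_stat = simple_ols(log_x, log_e2)
    return t_stat


def make_data(sd_fn, n=400, seed=1):
    """Hypothetical data: y = 1 + 2x + error, with error SD given by sd_fn(x)."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, sd_fn(x)) for x in xs]
    return xs, ys


if __name__ == "__main__":
    t_het = park_style_t_stat(*make_data(lambda x: 0.5 * x))  # spread grows with x
    t_hom = park_style_t_stat(*make_data(lambda x: 2.0))      # constant spread
    print("t-statistic, heteroscedastic data:", round(t_het, 2))
    print("t-statistic, homoscedastic data:", round(t_hom, 2))
```

A large t-statistic in the second stage points to heteroscedasticity related to x, while a small one gives no evidence against homoscedasticity. A small value does not prove homoscedasticity: the auxiliary regression only picks up the specific form of heteroscedasticity it models.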
In terms of unbiasedness, then, the regression is safe from heteroscedasticity. For the slope of the simple regression, with $x_i$ measured as deviations from the sample mean, $\hat{\beta} = \beta + \sum x_i \varepsilon_i / \sum x_i^2$. Applying expectations on both sides we get $E(\hat{\beta}) = \beta + \sum E(x_i \varepsilon_i) / \sum x_i^2 = \beta$, since $E(\varepsilon_i x_i) = 0$. The intercept is estimated as $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{X}$. Nothing in this argument uses the homoscedasticity assumption $\mathrm{Var}(\varepsilon_i) = \sigma^2$, where $i = 1, 2, \ldots, n$.

Heteroscedasticity does not cause ordinary least squares coefficient estimates to be biased, although it can cause ordinary least squares estimates of the variance (and, thus, standard errors) of the coefficients to be biased, possibly above or below the true or population variance. When heteroscedasticity is present in the data, we therefore cannot apply the usual formula for the variance of the coefficients to conduct tests of significance and construct confidence intervals. OLS and alternative linear estimators with smaller variance are alike in one respect: they are (linear) unbiased estimators, so in repeated sampling, on average, they will equal the true parameter.

Heteroscedasticity can arise from violating the assumption of the CLRM (classical linear regression model) that the regression model is correctly specified. It is more likely to occur, for example, when the range of values in the sample varies substantially across observations. The hours put in typing practice and the number of typing errors are related: as practice increases, errors are expected to decrease. As a further example, the concentration of H2O2 against time follows a half-life rule; heteroscedastic residuals from such a fit suggest that there was some other variable affecting the rate of decomposition that wasn't accounted for by the simple model.

As Robert D. Abbott and Howard P. Gutgesell note in "Effects of Heteroscedasticity and Skewness on Prediction in Regression: Modeling Growth of the Human Heart", two of the most common characteristics of data are heteroscedasticity (heterogeneity of variance) and skewness.
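To see why the usual variance formula cannot be applied, it may help to write out the variance of the slope. The sketch below assumes the simple regression with fixed regressors, $x_i$ measured as deviations from the sample mean, and mutually uncorrelated errors; the symbol $\sigma_i^2 = \mathrm{Var}(\varepsilon_i)$ is introduced here for illustration.

```latex
\begin{aligned}
\hat{\beta} - \beta &= \frac{\sum x_i \varepsilon_i}{\sum x_i^2}, \\[4pt]
\operatorname{Var}(\hat{\beta}) &= \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}, \\[4pt]
\text{and if } \sigma_i^2 = \sigma^2 \text{ for all } i: \quad
\operatorname{Var}(\hat{\beta}) &= \frac{\sigma^2}{\sum x_i^2}.
\end{aligned}
```

The conventional formula is the last line, so it is appropriate only when the variances are equal. Otherwise it can overstate or understate the true variance, depending on how $\sigma_i^2$ is related to $x_i^2$.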
One of the assumptions of the classical linear regression model is that there is no heteroscedasticity. When it is violated, the OLS estimators and the regression predictions based on them remain unbiased and consistent, and the fit will still be reasonable unless the heteroscedasticity is so severe as to cause outliers. The estimated standard errors, however, are biased, and as a result the t-tests and the F-test are invalid. Because of this, confidence intervals and hypothesis tests cannot be relied on. Other violations of assumptions have their own consequences, which we will deal with elsewhere.

Incorrect data transformation and incorrect functional form (linear or log-linear model) can cause heteroscedasticity, and it is often a result of the nature of the distribution of one or more regressors included in the model. Heteroscedasticity is more common in cross-sectional data than in time series data. For example, consider large countries such as the USA and small countries such as Cuba, or a comparison of drug store and general store sales. Another standard example is typing practice: the number of typing errors made in a given time period on a test is expected to decrease with the hours put in typing practice. As a laboratory example, a student conducted an experiment in his school chemistry class into the rate of decomposition of hydrogen peroxide in the presence of a catalyst.

A measure of heteroscedasticity utilizes the dispersion of the variance of the regression residuals. The problem of heteroscedasticity can be addressed by using a weighted least squares procedure or by requesting robust standard errors in the regression. There are online data banks that can be searched for heteroscedastic data. The symbols and formulas used here are from very common econometrics books.
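The two remedies mentioned above, weighted least squares and robust standard errors, can be sketched for the simple regression case. This is an illustration under stated assumptions rather than a reference implementation: the error standard deviations are assumed known (0.2 * x**2), the robust standard error is the basic White (HC0) form, and all function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def ols_with_ses(xs, ys):
    """Return the OLS slope, its conventional SE, and its White (HC0) robust SE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (y - y_bar) for d, y in zip(dx, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    conventional = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0: each squared residual stands in for that observation's variance.
    robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, conventional, robust


def wls_slope(xs, ys, weights):
    """Weighted least squares slope; weights should be proportional to 1/Var(e_i)."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    num = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    den = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return num / den


def compare(n_reps=2000, seed=0):
    """Simulate y = 1 + 2x + error and compare OLS with WLS across replications."""
    rng = random.Random(seed)
    xs = [1 + 0.2 * i for i in range(50)]
    sds = [0.2 * x ** 2 for x in xs]      # assumed known error SDs
    weights = [1 / s ** 2 for s in sds]   # inverse-variance weights
    ols, wls, conv, rob = [], [], [], []
    for _ in range(n_reps):
        ys = [1.0 + 2.0 * x + rng.gauss(0.0, s) for x, s in zip(xs, sds)]
        slope, c, r = ols_with_ses(xs, ys)
        ols.append(slope)
        conv.append(c)
        rob.append(r)
        wls.append(wls_slope(xs, ys, weights))
    return {
        "mean_ols": statistics.mean(ols),
        "mean_wls": statistics.mean(wls),
        "sd_ols": statistics.stdev(ols),
        "sd_wls": statistics.stdev(wls),
        "mean_conventional_se": statistics.mean(conv),
        "mean_robust_se": statistics.mean(rob),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(name, round(value, 3))
```

In this setup both estimators should centre on the true slope, the WLS estimates should vary less across replications than the OLS estimates, and the robust standard error should track the actual spread of the OLS estimates more closely than the conventional one. In practice the error variances are rarely known, so WLS weights have to be modelled or estimated, which is one reason robust standard errors are often used instead.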
To summarize, the unbiasedness property of OLS estimation is not violated by the presence of heteroscedasticity: the OLS estimator remains unbiased and consistent but is no longer BLUE. Although the OLS parameter estimates are still unbiased, the estimated standard errors do not reflect the true variability. The standard errors are therefore unreliable, which in turn distorts test results and confidence intervals.

The classical example of heteroscedasticity is the range in family income between the poorest and richest families in town. Heteroscedasticity is caused by different sources, including: incorrect data transformation, incorrect functional form (linear or log-linear model), omission of variables from the model, and the nature of the distribution of one or more regressors included in the model.

Bartlett's test is one of the most popular statistical tests for homoscedasticity. The weighted least squares procedure can solve the problem of heteroscedasticity in some cases. The effect of heteroscedasticity on regression trees has not yet been studied extensively.

Reference: Verbeek, M. (2008). A Guide to Modern Econometrics, 2nd ed., Chichester: John Wiley & Sons.

