Econometrics: A Whistle Stop Tour – Part 2

Recap of Part 1

Last time, we took our first steps into econometrics, looking at data types, variables, regressions, and the classic warning: correlation does not equal causation.

Now, we’re going to venture deeper into the econometric jungle. Regression analysis may look simple when a line neatly summarises the relationship between two variables, but in reality it comes with a list of hidden conditions…

Assumptions of OLS (and Why They Matter)

Ordinary Least Squares, or OLS, is the backbone of basic regression analysis. It looks simple enough: fit a line through the data that minimises the squared errors, and voilà, you have a model. But behind the scenes, this method relies on several assumptions that need to hold if the results are to be trustworthy. Without them, the whole framework can break down.
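For a single regressor, "minimising the squared errors" has a closed-form answer: the slope is the covariance of the two variables divided by the variance of the regressor, and the intercept follows from the means. A minimal sketch, using made-up illustrative numbers:

```python
# Closed-form OLS for one regressor: y = a + b*x + error.
# Hypothetical data, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope = covariance(x, y) / variance(x); intercept from the means.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

print(b, a)  # → 4.1 47.7
```

No line drawn through these points has a smaller sum of squared vertical distances to the data than this one — that is all "least squares" means.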

The first assumption is that the relationship between the dependent variable and the independent variable (which we looked at in the first part) is linear. In practice, this means we expect changes in the independent variable to have a straight-line effect on the dependent one. Think of studying for an exam: we might assume that each extra hour of revision steadily improves your mark. If the relationship is actually curved, so that the benefits of studying tail off after a certain point, then forcing a straight line through the data risks missing the true story.
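One quick way to spot a violated linearity assumption is to look at the residuals. A sketch, with hypothetical diminishing-returns data (marks rising with the square root of revision hours, an assumption of this example, not a real relationship):

```python
import math

# Hypothetical concave data: marks rise with diminishing returns.
hours = list(range(1, 10))
marks = [40 + 20 * math.sqrt(h) for h in hours]

# Fit a straight line by OLS.
n = len(hours)
mx, my = sum(hours) / n, sum(marks) / n
slope = sum((h - mx) * (m - my) for h, m in zip(hours, marks)) / \
        sum((h - mx) ** 2 for h in hours)
intercept = my - slope * mx

# If the true relationship were linear, residual signs would look random.
# For curved data they follow a pattern: negative at both ends,
# positive in the middle.
resid = [m - (intercept + slope * h) for h, m in zip(hours, marks)]
print(resid[0] < 0, resid[4] > 0, resid[-1] < 0)
```

A patterned run of residual signs like this is the straight line "missing the true story" in miniature.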

Another crucial assumption is that the independent variables aren’t perfect copies of one another. When they are, or nearly so, we run into a problem known as multicollinearity: the regression can’t separate their individual effects. Imagine using both “age in years” and “age in months” as predictors in the same model. They contain the same information, and the regression gets stuck trying to disentangle them.
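You can see why the regression gets stuck with a little linear algebra. OLS needs to invert the matrix X′X, and with perfectly duplicated predictors that matrix is singular — its determinant is exactly zero. A sketch with the age example:

```python
# Perfect multicollinearity: "age in years" and "age in months" carry
# exactly the same information.
years = [20, 25, 30, 35, 40]
months = [12 * y for y in years]  # an exact linear function of `years`

def centered(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

c1, c2 = centered(years), centered(months)

# The 2x2 Gram matrix X'X of the centered predictors:
a = sum(v * v for v in c1)
b = sum(v * w for v, w in zip(c1, c2))
d = sum(w * w for w in c2)

# A zero determinant means X'X cannot be inverted, so the OLS formula
# (X'X)^-1 X'y has no unique solution.
det = a * d - b * b
print(det)  # → 0.0
```

In practice predictors are rarely *perfect* copies, but the closer this determinant gets to zero, the shakier the individual coefficient estimates become.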

Then there is the issue of homoskedasticity, the expectation that the variance of the error terms remains constant across all observations. If this holds, it means the model’s errors are evenly spread out, no matter the values of the variables. But in many real-world cases, this neatness disappears. Take income data: errors in predicting the wages of people at the lower end of the scale might be quite small, while errors at the higher end balloon because high earners’ salaries vary much more. This imbalance is called heteroskedasticity, and although it doesn’t bias the coefficients themselves, it makes the usual standard errors — and so our confidence in the estimates — unreliable.
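A rough-and-ready way to see heteroskedasticity is to compare the spread of the residuals at the low and high ends of the data. A sketch with simulated income-style data (the true line y = 2x and the noise pattern are assumptions of this example):

```python
import random

random.seed(0)

# Hypothetical data where the noise around the true line y = 2x
# grows with x, as it might for high earners.
x = list(range(1, 101))
y = [2 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# Residuals around the true relationship:
resid = [yi - 2 * xi for xi, yi in zip(x, y)]

def variance(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

low = variance(resid[:50])   # spread at the low end of the scale
high = variance(resid[50:])  # spread at the high end
print(low < high)  # the error variance is not constant across observations
```

Formal versions of this idea (such as the Breusch–Pagan test) do essentially the same thing more carefully: they check whether the squared residuals move systematically with the regressors.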

Closely related is the assumption that the error terms are independent of one another. In time-series data, this often fails because today’s shocks are connected to yesterday’s. Stock prices, for instance, rarely move in isolation. If there’s a surge in volatility one day, it usually lingers for a while, creating what economists call autocorrelation. This violates the independence assumption and typically leads us to overstate the precision of our model, because the usual standard errors assume each observation brings fresh, independent information.
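The standard diagnostic here is the lag-1 autocorrelation: the correlation between each error and the one before it. A sketch with a simulated persistent series (the AR(1) structure and the 0.9 persistence are assumptions of this example):

```python
import random

random.seed(42)

# A persistent (AR(1)-style) series: each "shock" carries over
# most of yesterday's value, so neighbouring errors are linked.
e = [0.0]
for _ in range(499):
    e.append(0.9 * e[-1] + random.gauss(0, 1))

def lag1_autocorr(series):
    m = sum(series) / len(series)
    num = sum((series[t] - m) * (series[t - 1] - m)
              for t in range(1, len(series)))
    den = sum((s - m) ** 2 for s in series)
    return num / den

rho = lag1_autocorr(e)
print(round(rho, 2))  # close to the persistence parameter, 0.9
```

For genuinely independent errors this statistic hovers near zero; a value near one, as here, is the signature of shocks that linger.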

Together, these assumptions form the foundation of OLS. If they are broadly met, regression works as advertised, providing us with reliable insights into the relationships we care about. If they fail, however, our results can become misleading, and the elegant straight line through the data becomes less meaningful than it first appears.

Beyond Simple Regression (Multiple Regression)

Up to now, we’ve mostly looked at regression as a straight line between one independent variable and one dependent variable. But real life is rarely that simple. Economists often want to control for multiple factors at once, and that’s where multiple regression comes in.

For example, if we want to understand what drives wages, it’s not enough to look only at education. Experience, gender, location, and industry all matter too. Multiple regression lets us include these variables simultaneously, giving us a clearer picture of each factor’s individual effect.

The beauty of multiple regression is that it helps us get closer to answering causal questions. It doesn’t solve all problems, but it allows us to say things like: holding other factors constant, an additional year of education increases wages by X%.
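Mechanically, multiple regression solves the normal equations (X′X)b = X′y, where X now has one column per predictor plus a column of ones for the intercept. A minimal sketch with hypothetical, noise-free wage data generated from wage = 5 + 2·education + 1·experience (these numbers are invented for illustration), so the fitted coefficients should recover those values exactly:

```python
# Hypothetical data generated from: wage = 5 + 2*educ + 1*exper.
educ  = [10, 12, 14, 16, 12, 14]
exper = [1, 3, 5, 2, 4, 6]
wage  = [5 + 2 * ed + 1 * ex for ed, ex in zip(educ, exper)]

# Design matrix with an intercept column.
X = [[1.0, ed, ex] for ed, ex in zip(educ, exper)]

# Normal equations: A = X'X, c = X'y.
k = 3
A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
     for i in range(k)]
c = [sum(row[i] * yi for row, yi in zip(X, wage)) for i in range(k)]

# Solve A b = c by Gauss-Jordan elimination; c ends up holding b.
for i in range(k):
    p = A[i][i]
    A[i] = [v / p for v in A[i]]
    c[i] /= p
    for r in range(k):
        if r != i:
            f = A[r][i]
            A[r] = [v - f * w for v, w in zip(A[r], A[i])]
            c[r] -= f * c[i]

print([round(bi, 4) for bi in c])  # → [5.0, 2.0, 1.0]
```

Each recovered coefficient is the effect of its own variable *holding the others constant* — which is exactly the “controlling for multiple factors” that simple regression cannot do.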

Conclusion

In this part, we’ve gone a little deeper into the mechanics of econometrics. We saw how regression rests on certain assumptions, met some of the problems that can trip us up, and introduced multiple regression to better capture the real complexity of economic relationships.

Of course, this is only part of the story. Econometrics is a much bigger toolkit than we’ve covered here, and there’s still plenty more ground to explore. We’ll pick that up in the next part of the blog — so stay tuned!
