Real-world data often exhibit non-linear relationships. Before we can apply linear regression to such data sets, we first need to transform the data so that the relationship becomes linear. Population growth and economic growth are two common examples of exponential behavior.
The following table shows a familiar example of compound interest.
Year    Principal
1       10000
2       11000
3       12100
4       13310
5       14641
6       16105.1
7       17715.61
8       19487.171
9       21435.8881
10      23579.47691
11      25937.4246
12      28531.16706
13      31384.28377
14      34522.71214
15      37974.98336
16      41772.48169
17      45949.72986
18      50544.70285
19      55599.17313
20      61159.09045
21      67274.99949
22      74002.49944
23      81402.74939
24      89543.02433
25      98497.32676
26      108347.0594
27      119181.7654
28      131099.9419
29      144209.9361
30      158630.9297
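The table is consistent with a starting principal of 10,000 growing by 10% per year, since each row is 1.1 times the previous one. As a minimal sketch under that assumption, the values can be regenerated in Python. The names START, RATE, and principal are illustrative and do not come from the original.

```python
# Assumed parameters, inferred from the table rather than stated in the text:
# a starting principal of 10,000 and a 10% annual growth rate.
START = 10000.0
RATE = 0.10


def principal(year):
    """Principal in a given year, with year 1 equal to the starting amount."""
    return START * (1 + RATE) ** (year - 1)


years = list(range(1, 31))
principals = [principal(y) for y in years]

# Print the first few rows for comparison with the table.
for y, p in zip(years[:5], principals[:5]):
    print(y, round(p, 2))
```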
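Each year’s principal is 1.1 times the previous year’s, so it’s easy to see that the growth is exponential.
If we try to force a linear regression model onto this data, we’ll get poor results, because a straight line cannot follow the curvature.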
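To make the poor fit concrete, here is a small plain-Python sketch that fits a straight line to the raw principal values and inspects the residuals. It assumes the data follow 10% annual growth from 10,000, as the table suggests. The helper fit_line is a hypothetical name for ordinary least squares with a single predictor, not the tool used in the original.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data regenerated under the assumed 10% growth rate.
years = list(range(1, 31))
principals = [10000.0 * 1.1 ** (y - 1) for y in years]

slope, intercept = fit_line(years, principals)
fitted = [intercept + slope * x for x in years]
residuals = [y - f for y, f in zip(principals, fitted)]

# Coefficient of determination of the straight-line fit.
mean_y = sum(principals) / len(principals)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in principals)
r_squared = 1 - ss_res / ss_tot

print("R^2:", round(r_squared, 3))
print("residual signs (first, middle, last):",
      residuals[0] > 0, residuals[15] > 0, residuals[-1] > 0)
```

The residuals are positive at both ends and negative in the middle, which is the systematic pattern expected when a line is fitted to a convex curve. In this case the line even predicts a negative principal for year 1.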
The first step in fixing this problem is to understand how Y grows with respect to X. Is the function concave (e.g., logarithmic), or is it convex (e.g., quadratic or exponential)? In our case, principal growth is exponential, i.e., of the form y = a·e^(bx), so all we need to do to flatten the curve is take the natural log of y: ln(y) = ln(a) + bx, which is linear in x.
We can verify that ln(Principal) is indeed linear.
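One way to check linearity numerically, sketched here in Python under the assumption of 10% annual growth from 10,000, is to look at the year-to-year change in ln(Principal). If the log series is a straight line, every step should be the same size.

```python
import math

# Data regenerated under the assumed 10% growth rate.
years = list(range(1, 31))
principals = [10000.0 * 1.1 ** (y - 1) for y in years]

log_principals = [math.log(p) for p in principals]

# Differences between consecutive log values; constant steps mean a straight line.
steps = [b - a for a, b in zip(log_principals, log_principals[1:])]
spread = max(steps) - min(steps)

print("first step:", steps[0])
print("ln(1.1):   ", math.log(1.1))
print("spread of steps:", spread)
```

Each step equals ln(1.1) up to floating-point rounding, so the log series has a constant slope.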
We then run linear regression on ln(Principal). To obtain the fitted log values of the principal, we use the predict function.
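The text does not name the software behind the predict function, so the following is only a plain-Python stand-in for the same two steps: fit a line to ln(Principal), then evaluate it to get predicted log values. The helpers fit_line and predict_line are hypothetical names, and the data assume 10% annual growth from 10,000.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(slope, intercept, xs):
    """Evaluate the fitted line at each x (stand-in for a predict function)."""
    return [intercept + slope * x for x in xs]


# Data regenerated under the assumed 10% growth rate.
years = list(range(1, 31))
log_principals = [math.log(10000.0 * 1.1 ** (y - 1)) for y in years]

slope, intercept = fit_line(years, log_principals)
predicted_logs = predict_line(slope, intercept, years)

print("slope:    ", slope)
print("intercept:", intercept)
```

Because this data set has no noise, the slope recovers ln(1.1) and the intercept recovers ln(10000) − ln(1.1), the log of the principal extrapolated back to year 0.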
So far, we have only the log values of the principal. To recover the actual dollar values, we raise e to the power of each log value.
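The back-transformation can be sketched as follows. For this noise-free table, a straight-line fit to ln(Principal) has slope ln(1.1) and intercept ln(10000/1.1), so the sketch uses those values directly instead of refitting. The variable names and the three spot-check rows are illustrative choices.

```python
import math

# Coefficients of the log-scale line for this table (assumes 10% growth from 10,000).
slope = math.log(1.1)
intercept = math.log(10000.0 / 1.1)

# A few observed values copied from the table, keyed by year.
observed = {1: 10000.0, 10: 23579.47691, 30: 158630.9297}

# Predicted log value for each year, then exponentiate to get dollars.
predicted_dollars = {
    year: math.exp(intercept + slope * year) for year in observed
}

# Relative error between back-transformed predictions and the table.
relative_errors = {
    year: abs(predicted_dollars[year] - observed[year]) / observed[year]
    for year in observed
}

for year in observed:
    print(year, observed[year], round(predicted_dollars[year], 4))
```

One design note: exponentiating a log-scale prediction reproduces the data exactly here only because the table is noise-free. With noisy data the back-transformed values would be approximations.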
As we can see, the predicted values fit the observed ones remarkably well.