1 Introduction
The climate is the result of complex interactions between several elements. The sun and greenhouse gases together maintain a temperature in the troposphere at which we can live. The sun continually emits radiation, including UV and IR, some of which is absorbed by the ozone layer and some of which is reflected by the ice in the Arctic and the Antarctic. The clouds and the land also reflect the sun's rays. The fraction of sunlight that is reflected is called the albedo, and it helps keep our planet's climate temperate. Greenhouse gases like CO2, CH4 and water vapor absorb some of the sun's heat and warm the atmosphere, which can then support various species of plants and animals. However, mostly because of fossil fuel burning, which emits CO2 and CH4, the amount of greenhouse gases in the atmosphere is increasing, which is causing the temperature to gradually increase. Part of the emitted CO2 and CH4 is absorbed by the land and the ocean, which is called uptake. But the absorbed greenhouse gases may later be released back into the atmosphere, so the amount of greenhouse gases gradually stabilizes at a higher level than the present because of anthropogenic emissions [3] [2] [1].
In this article, I use Linear Regression (LM), Quadratic Regression, and Gaussian Process Regression (GPR) [5] [6] [7] to predict how greenhouse gas (GHG) levels affect the average temperature of the atmosphere, as measured by temperature anomalies. Note that carbon emissions in one area of the world affect the entire world once the atmospheric and oceanic forces have had sufficient time to stabilize.
2 Greenhouse Gas Models and Emission Models
Table 1 shows the correlation matrix and Table 2 shows the R-squared of various models on a test set. The correlation matrix shows a strong correlation between CO2 and the temperature anomalies (0.65) and between CH4 and the temperature anomalies (0.73). Although CH4 has the higher correlation coefficient, CO2 is far more abundant in the atmosphere and is therefore the more important indicator of temperature anomalies. Humidity has only a small correlation with the temperature anomalies (0.15).
I tried several models to see the effect of increasing greenhouse gas concentration in the atmosphere: linear regression, nonlinear quadratic regression and Gaussian process regression (GPR). The results are presented in Section 3. All three models can be queried beyond the range of the training data in the form of counterfactuals.
The greenhouse gas models predict the temperature anomaly when the greenhouse gas concentration in the atmosphere is changed to a lower or higher multiple of its concentration on 10/2017. On 10/2017, the CO2 concentration (as measured by Mauna Loa Observatory) was 404 PPM, the CH4 concentration was 1858 PPB and the humidity was 65.4. The temperature anomaly on 10/2017 was 0.90, meaning that this month was 0.90 °C warmer than the expected temperature. These values are used as the threshold for counterfactuals in the greenhouse gas models.
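As a rough illustration of how one entry of such a correlation matrix can be computed, the Python sketch below evaluates the Pearson correlation coefficient between two series. The function name `pearson` and the toy numbers are assumptions made for illustration only; they are not the data or the R code used in this article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy series (NOT the article's data set).
co2_ppm = [356.0, 370.0, 385.0, 395.0, 404.0]
anomaly_c = [0.40, 0.55, 0.60, 0.80, 0.90]
r = pearson(co2_ppm, anomaly_c)
```

Applying the same function to every pair of columns would give a symmetric matrix with ones on the diagonal, which is the shape of the correlation matrix reported here.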
The CO2 level on 7/1991 was 356 PPM and the CH4 level was 1716 PPB, and the relative humidity level was 53.4. The temperature anomaly on 7/1991 was C. This threshold for counterfactuals is used in the Emission models.
See figures 1 through 6.
3 Results
3.1 The Data and Packages
The temperature anomaly data is taken from:
https://climate.nasa.gov/vitalsigns/globaltemperature/.
The GHG data is taken from:
https://www.esrl.noaa.gov/gmd/dv/. (Mauna Loa Observatory)
The CO2 emissions data is taken from:
https://datahub.io/core/co2fossilglobal
Other data is available here:
https://esgfnode.llnl.gov/search/cmip5/.
We use the GauPro package for Gaussian Process Regression:
https://CRAN.Rproject.org/package=GauPro
We use lm and nls in R [4] for linear and nonlinear regression:
https://cran.rproject.org/
3.2 Scatter Plot of Greenhouse Gases
Figure 1 shows a scatter plot of standardized greenhouse gases against the temperature anomaly. The trend shows that as the concentration of greenhouse gases increases, the temperature anomaly also increases. The slope is steepest for methane (CH4), but because CH4 is present only in low quantities in the atmosphere, its effect on temperature is not very strong. CO2 and humidity also have a positive slope.
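The article does not state how the gases were standardized. A common choice, assumed here, is the z-score, which subtracts the mean and divides by the sample standard deviation so that series measured in PPM and PPB can share one axis. The helper name `standardize` and the values below are hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

# Hypothetical toy CH4 series in PPB (NOT the article's data set).
ch4_ppb = [1716.0, 1750.0, 1790.0, 1830.0, 1858.0]
ch4_z = standardize(ch4_ppb)
```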
3.3 Counterfactuals of Greenhouse Gases
Figure 2 and Figure 3 are plots with counterfactuals on the model. Starting from a multiplier of 0.02 and in increments of 0.001 I query the model with values of the greenhouse gases multiplied by the multiplier. Figure 2 shows the results of a linear model while Figure 3 shows the results in a nonlinear regression model (quadratic).
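The counterfactual sweep described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the R code behind the figures: the training data are invented, the upper multiplier of 2.0 is assumed (the text gives only the start and the increment), and `fit_line` and `counterfactual_sweep` are hypothetical helper names. Only the baseline of 404 PPM, the start of 0.02 and the step of 0.001 come from the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def counterfactual_sweep(intercept, slope, baseline,
                         start=0.02, step=0.001, stop=2.0):
    """Query the fitted line at baseline * multiplier for each multiplier."""
    results = []
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        # Build each multiplier from its index to avoid floating-point drift.
        multiplier = round(start + i * step, 3)
        level = baseline * multiplier
        results.append((multiplier, level, intercept + slope * level))
    return results

# Hypothetical toy training data (NOT the article's data set).
co2_ppm = [356.0, 370.0, 385.0, 395.0, 404.0]
anomaly_c = [0.40, 0.55, 0.60, 0.80, 0.90]
intercept, slope = fit_line(co2_ppm, anomaly_c)
sweep = counterfactual_sweep(intercept, slope, baseline=404.0)
```

Each tuple in `sweep` holds the multiplier, the counterfactual gas level and the predicted anomaly, which is the information plotted in a counterfactual chart.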
Figure 2 shows that as the greenhouse gas levels increase, the temperature anomaly also increases. For instance, if the CO2 level is increased 1.5 times as compared to the 10/2017 level (400 PPM to 600 PPM), the temperature anomaly will be about C. The same increase in CH4 causes a temperature anomaly of C, but this is relatively less important than the CO2 levels (unless there are gas hydrate eruptions in the ocean). The plot also shows that if the CO2 level was decreased to half (200 PPM), the temperature anomaly will be C. If the CO2 level was doubled to 800 PPM, the temperature anomaly will be C. This is easily reconciled with the IPCC reports which have almost the same results.
I also fitted a nonlinear quadratic regression model, shown in figure 3. The results are similar, but the quadratic curve has a much larger curvature at low multipliers and becomes almost linear beyond a multiplier of 1.5. This is consistent with what scientists expect, namely that there is a tipping point after which the temperature anomaly increases more rapidly.
Figures 7 and 8 show the counterfactual charts for Gaussian Process Regression (GPR). They show that if the CO2 level is increased to 600 PPM, the temperature will increase by C. For CH4, GPR shows a slightly higher increase of C if the level is increased to 1.5 times the level on 10/2017, i.e. to 2787 PPB.
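Appendix B notes that the quadratic model in this article omits the degree-1 term. Under that reading the model has the form y = a + b x^2, which can be fitted by ordinary least squares on the squared input. The Python sketch below is one way to do this; the function names and the toy data are hypothetical, and the article itself used nls in R.

```python
def fit_quadratic_no_linear(xs, ys):
    """Least squares fit of y = a + b * x**2 (no degree-1 term)."""
    zs = [x * x for x in xs]  # regress y on the squared input
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    b = szy / szz
    a = mean_y - b * mean_z
    return a, b

def predict_quadratic(a, b, x):
    """Evaluate the fitted curve at a single input value."""
    return a + b * x * x

# Hypothetical toy data: standardized gas level vs. anomaly (NOT the article's data).
gas_level = [-1.2, -0.5, 0.0, 0.6, 1.3]
anomaly_c = [0.35, 0.50, 0.58, 0.75, 0.95]
a_hat, b_hat = fit_quadratic_no_linear(gas_level, anomaly_c)
```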
3.4 Analysis of Emissions
As we increase carbon emissions, the CO2 and CH4 levels in the atmosphere increase. Figures 4, 5 and 6 show some analysis of emissions and the CO2 level as compared to the level on 7/1991 (356 PPM).
Figure 4 shows a linear fit and a scatter plot of CO2 levels and the emissions. Figure 5 shows the results of fitting a linear model with increasing emissions. It shows that if we increase the emissions 1.5 times that on 7/1991, we can expect the CO2 level in the atmosphere to increase to about 390 PPM (which we have already crossed). It also shows that if we decrease the emissions by half, the CO2 level will drop to about 330 PPM (not a very large decrease). If we reduce the emissions to , the CO2 level will decrease to about 310 PPM (as compared to the 7/1991 levels).
Figure 6 shows the results of fitting a quadratic regression model, with counterfactuals on the emissions. The results are quite similar but, as noted for the greenhouse gas models, the curvature is larger initially. This model shows that if we reduce emissions to , the CO2 level will stabilize at about 330 PPM, a slightly different result than the linear model.
3.5 Discussion
As noted in the previous section, the levels of greenhouse gases in the atmosphere are increasing at a rapid rate, thus causing a proportionate increase in the temperature. Global warming can cause many undesirable effects, such as the following [1]:

Increase in global temperatures by to .

Increase in the sea level, causing undesirable effects on coastal cities.

Increase in the frequency of severe weather events like storms, droughts and heat waves and the severity of winters.

Decrease in the world forests, which will cause a feedback effect.

Release of CH4 from deep ocean gas hydrate deposits.

Increase in epidemics of vector borne diseases like malaria because of a proliferation of disease spreading insects like mosquitoes.

Negative effect on farming, thus causing food shortages.

Further shortage in the availability of clean drinking water.

Greenhouse Gases     CO2   CH4   Humidity  Temperature Anomaly
CO2                  1     0.94  0.42      0.65
CH4                  0.94  1     0.38      0.73
Humidity             0.42  0.38  1         0.15
Temperature Anomaly  0.65  0.73  0.15      1

Date     Name                 Value
7/1991   CO2                  356 PPM
7/1991   CH4                  1716 PPB
7/1991   Humidity             53.4
7/1991   Temperature Anomaly  C
10/2017  CO2                  404 PPM
10/2017  CH4                  1858 PPB
10/2017  Humidity             65.4
10/2017  Temperature Anomaly  0.90 °C

GHG  Model      Multiplier  GHG Level Change   Temperature Anomaly
CO2  Linear     2           404 to 808 PPM
CO2  Quadratic  2           404 to 808 PPM
CO2  GPR        2           404 to 808 PPM
CH4  Linear     2           1858 to 3716 PPB
CH4  Quadratic  2           1858 to 3716 PPB
CH4  GPR        2           1858 to 3716 PPB
CO2  Linear     1.5         404 to 606 PPM
CO2  Quadratic  1.5         404 to 606 PPM
CO2  GPR        1.5         404 to 606 PPM
CH4  Linear     1.5         1858 to 2787 PPB
CH4  Quadratic  1.5         1858 to 2787 PPB
CH4  GPR        1.5         1858 to 2787 PPB
CO2  Linear     0.5         404 to 202 PPM
CO2  Quadratic  0.5         404 to 202 PPM
CO2  GPR        0.5         404 to 202 PPM
CH4  Linear     0.5         1858 to 929 PPB
CH4  Quadratic  0.5         1858 to 929 PPB
CH4  GPR        0.5         1858 to 929 PPB

Model                     R-squared
CO2 Linear Model          0.72
CH4 Linear Model          0.83
Humidity Linear Model     0.23
Combined Linear Model     0.84
CO2 Quadratic Model       0.73
CH4 Quadratic Model       0.83
Humidity Quadratic Model  0.20
Combined Quadratic Model  0.84
GPR CO2 Model             0.82
GPR CH4 Model             0.81
GPR Humidity Model        0.31
GPR Combined Model        0.73
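
For reference, an R-squared score on a held-out test set is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The article does not say which definition was used, so the Python sketch below assumes this one; the function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out anomalies and model predictions (NOT the article's data).
observed = [0.62, 0.71, 0.85, 0.90]
predicted = [0.60, 0.75, 0.80, 0.93]
score = r_squared(observed, predicted)
```

With this definition the score on a test set can be negative when a model predicts worse than the mean of the test targets, so values near zero, like those of the humidity models, indicate little predictive power.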
4 Conclusion
In this article, I analyzed data on the greenhouse gases CO2, CH4 and humidity, and found that CO2 and CH4 correlate well with NASA’s temperature anomaly data set, while humidity correlates only weakly. Correlation is not causation, but laboratory experiments have shown that greenhouse gases absorb the heat of the sun and so increase the atmospheric temperature. It is also conjectured that one of the reasons for the ice ages in the past is a reduction in greenhouse gases [1].
Through counterfactuals on the models, I predicted what the temperature anomaly would be at CO2 and CH4 levels scaled by a multiplier. The results are given in Section 3.
I found it difficult to collect and process data from disparate sources because of limited availability and unknown formats, which is why I was not able to use other variables such as cloud, land and ice albedo and ocean indicators, which are crucial to understanding global warming. Future work will include these variables to create more accurate predictions of climate change.
References
[1] (2008) Global warming: a very short introduction. OUP Oxford.
[2] (2013) Climate: a very short introduction. OUP Oxford.
[3] (2014) Climate change: a very short introduction. OUP Oxford.
[4] (year) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
[5] (2019) Tackling climate change with machine learning. arXiv preprint arXiv:1906.05433.
[6] (2006) Gaussian processes for machine learning. Vol. 2, MIT press Cambridge, MA.
[7] (2018) Analysis of global warming using machine learning. Computational Water, Energy, and Environmental Engineering 7 (03), pp. 127.
5 Appendix
5.1 Appendix A: Linear Regression
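A linear model is linear in the parameters and variables. It models the response variable in the following form:
$$y = \mathbf{w}^\top \mathbf{x} + \epsilon$$
where $\epsilon$ is zero-mean Gaussian noise.
With a flat prior on the parameters, maximizing the posterior is equivalent to maximizing the likelihood. In a linear regression model, the likelihood is Gaussian:
$$p(y \mid \mathbf{x}, \mathbf{w}, \sigma^2) = \mathcal{N}\left(y \mid \mu(\mathbf{x}), \sigma^2\right)$$
where $\mu(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$ and $\sigma^2$ is the noise variance.
$\mu$ is a parameter which can also be parametrized. The mapping between $\mathbf{w}^\top \mathbf{x}$ and the mean $\mu$ is called a link function, and in the case of linear regression it is the identity link function. In the case of logistic regression, the link function is a sigmoid.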
A prior can be placed in a Bayesian model (which I don’t use in this article).
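The connection between the Gaussian likelihood and least squares can be made explicit with a short derivation. The notation is assumed for this sketch: $N$ observations $(\mathbf{x}_i, y_i)$, a weight vector $\mathbf{w}$, a noise variance $\sigma^2$, and a design matrix $X$ whose rows are the $\mathbf{x}_i^\top$, with $X^\top X$ assumed invertible.

```latex
\begin{aligned}
\log p(\mathbf{y} \mid X, \mathbf{w}, \sigma^2)
  &= \sum_{i=1}^{N} \log \mathcal{N}\left(y_i \mid \mathbf{w}^\top \mathbf{x}_i, \sigma^2\right) \\
  &= -\frac{N}{2} \log\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 \\
\hat{\mathbf{w}}
  &= \arg\max_{\mathbf{w}} \log p(\mathbf{y} \mid X, \mathbf{w}, \sigma^2)
   = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2
   = \left(X^\top X\right)^{-1} X^\top \mathbf{y}
\end{aligned}
```

Only the second term of the log-likelihood depends on $\mathbf{w}$, so maximizing the likelihood is the same as minimizing the sum of squared residuals, which is the criterion used by ordinary least squares fits such as lm in R.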
5.2 Appendix B: Gaussian Process Regression
Gaussian Process Regression (GPR) and kernel methods are similar. A kernel maps the input feature space into a possibly infinite dimensional feature space using basis functions. For instance, a quadratic regression model maps each dimension of the input into a polynomial equation of degree 2 (in our implementation, we omit the term with degree 1). Support vector machines are also an example of using the kernel trick.
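GPR uses kernel methods with a Gaussian prior on the weights of the model. The kernel, or covariance function, could be a linear kernel or a squared exponential kernel, as shown below:
$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right)$$
where $\sigma_f^2$ is the signal variance and $\ell$ is the length scale.
The covariance function represents a distribution over basis functions.
$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right),$$
where $f$ is a draw in the function space and $m(\mathbf{x})$ is the mean function.
This is equivalent to Bayesian regression with an infinite dimensional feature space composed of basis functions of the original feature space. Note that kernel functions such as the squared exponential kernel above act as similarity functions: the kernel value is largest when the two inputs coincide and decays as they move apart. Please see [6] for a highly detailed exposition of Gaussian processes for machine learning.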
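To make the idea concrete, the Python sketch below computes the posterior mean of a zero-mean Gaussian process with a squared exponential kernel on scalar inputs, using the standard formula from [6]: the mean at a query point is the vector of kernel values against the training inputs multiplied by (K + noise_var I)^-1 y. It is an illustration under stated assumptions, not the GauPro implementation used in this article; the function names, hyperparameter defaults and toy data are hypothetical, and no hyperparameters are fitted.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def gp_posterior_mean(x_train, y_train, x_query, noise_var=1e-6, **kernel_args):
    """Posterior mean of a zero-mean GP at each query point."""
    n = len(x_train)
    gram = []
    for i in range(n):
        row = [sq_exp_kernel(x_train[i], x_train[j], **kernel_args) for j in range(n)]
        row[i] += noise_var  # observation noise on the diagonal
        gram.append(row)
    alpha = solve(gram, y_train)  # alpha = (K + noise_var * I)^-1 y
    return [
        sum(sq_exp_kernel(xq, x_train[i], **kernel_args) * alpha[i] for i in range(n))
        for xq in x_query
    ]

# Hypothetical toy data (NOT the article's data set).
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]
mean_at_train = gp_posterior_mean(x_train, y_train, x_train)
far_away = gp_posterior_mean(x_train, y_train, [100.0])
```

With a small noise variance the posterior mean passes close to the training targets. Far from the training inputs the kernel values vanish and the mean returns to the prior mean of zero, which is worth keeping in mind when a GPR model with this kernel is queried well outside the range of its training data.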