Analysis of Greenhouse Gases

by Shalin Shah, et al.
Johns Hopkins University

Climate change is the result of a complex system of interactions among greenhouse gases (GHG), the ocean, land, ice, and clouds. Large climate models run on many computers and solve many equations, ranging from simple polynomials to partial differential equations, to predict the future climate. Because of the uptake mechanism of the land and ocean, greenhouse gas emissions can take a while to affect the climate. The IPCC has published reports on how greenhouse gas emissions may affect the average temperature of the troposphere, and its predictions show that by the end of the century we can expect a temperature increase of 0.8 °C to 5 °C. In this article, I use Linear Regression (LM), Quadratic Regression and Gaussian Process Regression (GPR) on monthly GHG data going back several years to predict temperature anomalies based on counterfactuals. The results are quite similar to those of the IPCC reports.





1 Introduction

The climate is the result of complex interactions between several elements. Greenhouse gases and the sun are both responsible for maintaining a temperature in the troposphere at which we can live. The sun continually emits UV and IR radiation, some of which is reflected back from the ozone layer and from the ice in the Arctic and the Antarctic. The clouds and the land also reflect the sun's rays. This reflection is called albedo, and it is a reason for the temperate climate of our planet. Greenhouse gases like CO2, CH4 and water vapor absorb some of the sun's heat and warm the atmosphere, which can then support various species of plants and animals. However, mostly because of fossil fuel burning, which emits CO2 and CH4, the amount of greenhouse gases in the atmosphere is increasing, which is causing the temperature to increase gradually. Some of the emitted CO2 and CH4 is absorbed by the land and the ocean, a process called uptake. But the absorbed greenhouse gases may later be released back into the atmosphere, so because of anthropogenic emissions the amount of greenhouse gases gradually stabilizes at a level higher than the present one [3] [2] [1].

In this article, I use Linear Regression (LM), Quadratic Regression, and Gaussian Process Regression (GPR) [5] [6] [7] to predict how GHG levels affect the average temperature of the atmosphere, as measured by temperature anomalies. Note that carbon emissions in one area of the world affect the entire world once the atmospheric and oceanic forces have had sufficient time to stabilize.

2 Greenhouse Gas Models and Emission Models

Table 1 shows the correlation matrix and Table 4 shows the R-Squared of the various models on a test set. The correlation matrix shows a strong correlation between CO2 and the temperature anomalies (0.65) as well as between CH4 and the temperature anomalies (0.73). CO2 is more abundant in the atmosphere than CH4, although in Table 1 CH4 is the slightly stronger indicator of temperature anomalies. Humidity has a small correlation with the temperature anomalies (0.15); the other two greenhouse gases correlate much more strongly.
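The entries of a correlation matrix such as Table 1 are pairwise Pearson coefficients. The following is a minimal Python sketch of that computation, not the article's own code (the article works in R). The data values are made-up placeholders rather than the article's data set, and the names `pearson` and `correlation_matrix` are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return {name_i: {name_j: r_ij}} for a dict of named columns."""
    names = list(columns)
    return {i: {j: pearson(columns[i], columns[j]) for j in names} for i in names}

# Placeholder monthly values (hypothetical, for illustration only).
data = {
    "CO2": [356.0, 370.0, 385.0, 395.0, 404.0],
    "CH4": [1716.0, 1750.0, 1790.0, 1820.0, 1858.0],
    "Anomaly": [0.45, 0.40, 0.65, 0.70, 0.90],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, [round(row[other], 2) for other in data])
```

The matrix is symmetric with ones on the diagonal, which is the pattern visible in Table 1.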

I tried several models to see the effect of increasing greenhouse gas concentration in the atmosphere. The results section follows this section. I tried linear regression, non-linear quadratic regression and Gaussian process regression (GPR). All three models are able to extrapolate beyond what is in the training data in the form of counterfactuals.

The greenhouse gas models predict the temperature anomaly that would result if the greenhouse gas concentration in the atmosphere were changed to a lower or higher multiple of its concentration on 10/2017. On 10/2017, the CO2 concentration (as measured by Mauna Loa Observatory) was 404 PPM, the CH4 concentration was 1858 PPB and the humidity was 65.4. The temperature anomaly on 10/2017 was 0.90, which means that this month was 0.90 °C warmer than the expected temperature. These 10/2017 values serve as the baseline for the counterfactuals in the greenhouse gas models.

The CO2 level on 7/1991 was 356 PPM and the CH4 level was 1716 PPB, and the relative humidity level was 53.4. The temperature anomaly on 7/1991 was C. This threshold for counterfactuals is used in the Emission models.

See figures 1 through 6.

3 Results

3.1 The Data and Packages

The temperature anomaly data is taken from:

The GHG data is taken from: (Mauna Loa Observatory)

The CO2 emissions data is taken from:

Other data is available here:

We use the GauPro package for Gaussian Process Regression:

We use lm and nls in R [4] for linear and non-linear regression:

3.2 Scatter Plot of Greenhouse Gases

Figure 1 shows a scatter plot of the standardized greenhouse gases against the temperature anomaly. The trend shows that as the concentration of greenhouse gases increases, the temperature anomaly also increases. The slope is steepest for methane (CH4), but because CH4 is present in only low quantities in the atmosphere, its effect on temperature is not very strong. CO2 and humidity also have a positive slope.

3.3 Counterfactuals of Greenhouse Gases

Figure 2 and Figure 3 are plots with counterfactuals on the model. Starting from a multiplier of 0.02 and in increments of 0.001 I query the model with values of the greenhouse gases multiplied by the multiplier. Figure 2 shows the results of a linear model while Figure 3 shows the results in a non-linear regression model (quadratic).
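The multiplier sweep described above can be sketched as follows. This is a Python illustration under stated assumptions, not the article's R code. The training values are hypothetical placeholders, the upper end of the sweep (`STOP = 2.0`) is an assumption because the text gives only the start and the increment, and `fit_line` is an illustrative helper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Placeholder training data (hypothetical): CO2 in PPM, anomaly in degrees C.
co2 = [356.0, 370.0, 385.0, 395.0, 404.0]
anomaly = [0.45, 0.40, 0.65, 0.70, 0.90]
intercept, slope = fit_line(co2, anomaly)

BASELINE_CO2 = 404.0                  # 10/2017 level quoted in the text
START, STEP, STOP = 0.02, 0.001, 2.0  # STOP is an assumption

# An integer step counter avoids accumulating floating-point error.
n_steps = int(round((STOP - START) / STEP))
sweep = []
for k in range(n_steps + 1):
    m = START + k * STEP
    level = m * BASELINE_CO2          # counterfactual CO2 level
    sweep.append((m, level, intercept + slope * level))

m, level, pred = sweep[-1]
print(f"multiplier {m:.3f}: {level:.0f} PPM -> predicted anomaly {pred:.2f}")
```

Each tuple in `sweep` holds the multiplier, the counterfactual level, and the model's predicted anomaly, which is the kind of curve plotted in Figure 2.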

Figure 2 shows that as the greenhouse gas levels increase, the temperature anomaly also increases. For instance, if the CO2 level is increased 1.5 times as compared to the 10/2017 level (400 PPM to 600 PPM), the temperature anomaly will be about C. The same increase in CH4 causes a temperature anomaly of C, but this is relatively less important than the CO2 levels (unless there are gas hydrate eruptions in the ocean). The plot also shows that if the CO2 level was decreased to half (200 PPM), the temperature anomaly will be C. If the CO2 level was doubled to 800 PPM, the temperature anomaly will be C. This is easily reconciled with the IPCC reports which have almost the same results.

I also fit a non-linear quadratic regression model, as shown in Figure 3. The results are similar. But as Figure 3 shows, the quadratic has a much larger curvature in the initial stages and becomes almost linear after a multiplier of 1.5. This is consistent with what scientists expect, namely that there is a tipping point after which the temperature anomaly increases more rapidly.
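A quadratic model without the degree-1 term (Appendix B notes that this term is omitted) has the form y = a + b x^2, which is linear in x^2 and can therefore be fit by ordinary least squares. The Python sketch below illustrates this. The article uses nls in R, so this is a substitute rather than the author's code, and the data are synthetic values generated from a known curve so the fit can be checked.

```python
def fit_quadratic_no_linear(x, y):
    """Least squares for y = a + b * x**2 (no degree-1 term)."""
    z = [v * v for v in x]            # regress on the squared input
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    b = (sum((p - mz) * (q - my) for p, q in zip(z, y))
         / sum((p - mz) ** 2 for p in z))
    return my - b * mz, b

def predict(a, b, v):
    """Evaluate the fitted curve at input v."""
    return a + b * v * v

# Synthetic data from a known curve (hypothetical coefficients 0.2 and 0.3).
x = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [0.2 + 0.3 * v * v for v in x]
a, b = fit_quadratic_no_linear(x, y)
print(round(a, 3), round(b, 3), round(predict(a, b, 3.0), 3))
```

Because the model is linear in its parameters, no iterative optimizer is needed in this sketch. A general non-linear least squares routine would reach the same solution for this form.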

Figures 7 and 8 show the counterfactual charts for Gaussian Process Regression (GPR). Figure 8 shows that if the CO2 level is increased to 600 PPM, the temperature will increase by C. For CH4, Figure 7 shows a slightly higher increase of C if the level is increased to 1.5 times the level on 10/2017, i.e. if CH4 is increased to 2787 PPB.

3.4 Analysis of Emissions

As we increase carbon emissions, the CO2 and CH4 levels in the atmosphere increase. Figures 4, 5 and 6 analyze the emissions and the CO2 level relative to the level on 7/1991 (356 PPM).

Figure 4 shows a linear fit and a scatter plot of CO2 levels and the emissions. Figure 5 shows the results of fitting a linear model with increasing emissions. It shows that if we increase the emissions 1.5 times that on 7/1991, we can expect the CO2 level in the atmosphere to increase to about 390 PPM (which we have already crossed). It also shows that if we decrease the emissions by half, the CO2 level will drop to about 330 PPM (not a very large decrease). If we reduce the emissions to , the CO2 level will decrease to about 310 PPM (as compared to the 7/1991 levels).

Figure 6 shows the results of fitting a quadratic regression model, with counterfactuals on the emissions. The results are quite similar but, as noted for the greenhouse gas models, the curvature is larger initially. This model shows that if we reduce emissions to , the CO2 level will stabilize at about 330 PPM, a slightly different result from the linear model.

3.5 Discussion

As noted in the previous section, the levels of greenhouse gases in the atmosphere are increasing at a rapid rate, causing a proportionate increase in the temperature. Global warming can have many undesirable consequences, such as the following [1]:

  1. Increase in global temperatures by to .

  2. Increase in the sea level, causing undesirable effects on coastal cities.

  3. Increase in the frequency of severe weather events like storms, droughts and heat waves and the severity of winters.

  4. Decrease in the world forests, which will cause a feedback effect.

  5. Release of CH4 from deep ocean gas hydrate deposits.

  6. Increase in epidemics of vector borne diseases like malaria because of a proliferation of disease spreading insects like mosquitoes.

  7. Negative effect on farming, thus causing food shortage.

  8. Further shortage in the availability of clean drinking water.

Figure 1: Scatter Plot of Greenhouse Gases and Temperature Anomalies
Figure 2: Counterfactual Inference by Scaling GHG Levels using LM
Figure 3: Counterfactual Inference by Scaling GHG Levels using a Quadratic Model
Figure 4: Scatter Plot of Emissions and CO2 Levels
Figure 5: Counterfactual Inference by Scaling Emissions using LM
Figure 6: Counterfactual Inference by Scaling Emissions using a Quadratic Model
Figure 7: Counterfactual Inference by Scaling the CH4 Level using Gaussian Process Regression
Figure 8: Counterfactual Inference by Scaling the CO2 Level using Gaussian Process Regression

Greenhouse Gases     CO2    CH4    Humidity  Temperature Anomaly
CO2                  1      0.94   0.42      0.65
CH4                  0.94   1      0.38      0.73
Humidity             0.42   0.38   1         0.15
Temperature Anomaly  0.65   0.73   0.15      1
Table 1: Correlation Matrix

Date     Name                 Value
7/1991   CO2                  356 PPM
7/1991   CH4                  1716 PPB
7/1991   Humidity             53.4
7/1991   Temperature Anomaly  C
10/2017  CO2                  404 PPM
10/2017  CH4                  1858 PPB
10/2017  Humidity             65.4
10/2017  Temperature Anomaly  0.90 C
Table 2: Temperature Anomalies and GHG Levels

GHG  Model      Multiplier  GHG Level Change   Temperature Anomaly
CO2  Linear     2           404 to 808 PPM
CO2  Quadratic  2           404 to 808 PPM
CO2  GPR        2           404 to 808 PPM
CH4  Linear     2           1858 to 3716 PPB
CH4  Quadratic  2           1858 to 3716 PPB
CH4  GPR        2           1858 to 3716 PPB
CO2  Linear     1.5         404 to 606 PPM
CO2  Quadratic  1.5         404 to 606 PPM
CO2  GPR        1.5         404 to 606 PPM
CH4  Linear     1.5         1858 to 2787 PPB
CH4  Quadratic  1.5         1858 to 2787 PPB
CH4  GPR        1.5         1858 to 2787 PPB
CO2  Linear     0.5         404 to 202 PPM
CO2  Quadratic  0.5         404 to 202 PPM
CO2  GPR        0.5         404 to 202 PPM
CH4  Linear     0.5         1858 to 929 PPB
CH4  Quadratic  0.5         1858 to 929 PPB
CH4  GPR        0.5         1858 to 929 PPB
Table 3: Model Predictions

Model                     R-Squared
CO2 Linear Model          0.72
CH4 Linear Model          0.83
Humidity Linear Model     0.23
Combined Linear Model     0.84
CO2 Quadratic Model       0.73
CH4 Quadratic Model       0.83
Humidity Quadratic Model  0.20
Combined Quadratic Model  0.84
GPR CO2 Model             0.82
GPR CH4 Model             0.81
GPR Humidity Model        -0.31
GPR Combined Model        0.73
Table 4: R-Squared of Various Models
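
A negative R-Squared, as reported for the GPR humidity model, is possible when the score is computed on held-out data as 1 - SS_res / SS_tot, because a model can predict worse than the test-set mean. The article does not state the exact formula it uses, so the Python sketch below assumes this common definition. The numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_test = [0.2, 0.4, 0.6, 0.8]    # hypothetical held-out anomalies
good = [0.25, 0.35, 0.65, 0.75]  # predictions close to the truth
bad = [0.9, 0.1, 0.9, 0.1]       # predictions worse than the test-set mean

print(r_squared(y_test, good), r_squared(y_test, bad))
```

Under this definition a score of 1 means perfect prediction, 0 matches predicting the mean, and anything below 0 is worse than predicting the mean.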

4 Conclusion

In this article, I analyzed data on the greenhouse gases CO2, CH4 and humidity, and I find that CO2 and CH4 correlate well with NASA’s temperature anomaly data set, while humidity correlates only weakly. Correlation is not causation, but laboratory experiments have shown that greenhouse gases absorb the heat of the sun and so increase the atmospheric temperature. It is also conjectured that one of the reasons for the ice ages in the past is a reduction in greenhouse gases [1].

Through counterfactuals on the models, I was able to predict what the temperature anomaly will be across levels of CO2 and CH4 multiplied by a multiplier. I note the results in the results section of this article.

I found it difficult to collect and process data from disparate sources because of availability and unknown formats, which is why I was not able to use other variables like cloud, land and ice albedo, ocean indicators etc. which are crucial to understanding global warming. Future work will include these variables and create more accurate predictions of climate change.


  • [1] M. Maslin (2008) Global warming: a very short introduction. OUP Oxford.
  • [2] M. Maslin (2013) Climate: a very short introduction. OUP Oxford.
  • [3] M. Maslin (2014) Climate change: a very short introduction. OUP Oxford.
  • [4] R Core Team (year) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • [5] D. Rolnick, P. L. Donti, L. H. Kaack, K. Kochanski, A. Lacoste, K. Sankaran, A. S. Ross, N. Milojevic-Dupont, N. Jaques, A. Waldman-Brown, et al. (2019) Tackling climate change with machine learning. arXiv preprint arXiv:1906.05433.
  • [6] C. K. Williams and C. E. Rasmussen (2006) Gaussian processes for machine learning. Vol. 2, MIT Press, Cambridge, MA.
  • [7] H. Zheng et al. (2018) Analysis of global warming using machine learning. Computational Water, Energy, and Environmental Engineering 7 (03), pp. 127.

5 Appendix

5.1 Appendix A: Linear Regression

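A linear model is linear in the parameters and variables. It models the response variable in the following form:

$$y = \mathbf{w}^\top \mathbf{x} + \epsilon$$

where $\epsilon$ is zero-mean Gaussian noise.

In a likelihood formulation with a flat prior, maximizing the posterior is equivalent to maximizing the likelihood. In a linear regression model, the likelihood is Gaussian:

$$p(y \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(y \mid \mu, \sigma^2)$$

where $\mu = \mathbf{w}^\top \mathbf{x}$ and $\sigma^2$ is the noise variance.

$\mu$ is a parameter which can also be parametrized. The function that relates $\mu$ to $\mathbf{w}^\top \mathbf{x}$ is called a link function and in the case of linear regression, it is the identity link function. In the case of logistic regression, the link function is a sigmoid.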

A prior can be placed in a Bayesian model (which I don’t use in this article).
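
To make the connection between the Gaussian likelihood and least squares explicit, here is a short derivation. It assumes $N$ independent observations $(\mathbf{x}_i, y_i)$ stacked into a design matrix $X$ and a response vector $\mathbf{y}$, a weight vector $\mathbf{w}$, and a fixed noise variance $\sigma^2$. This notation is an assumption for illustration, not taken from the article.

```latex
\begin{aligned}
\log p(\mathbf{y} \mid X, \mathbf{w})
  &= \sum_{i=1}^{N} \log \mathcal{N}\!\left(y_i \mid \mathbf{w}^\top \mathbf{x}_i, \sigma^2\right) \\
  &= -\frac{N}{2}\log\!\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 , \\
\hat{\mathbf{w}}_{\mathrm{ML}}
  &= \arg\min_{\mathbf{w}} \sum_{i=1}^{N}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2
   = \left(X^\top X\right)^{-1} X^\top \mathbf{y} .
\end{aligned}
```

The first term does not depend on $\mathbf{w}$, so maximizing the log-likelihood is the same as minimizing the sum of squared residuals. The closed form in the last line holds when $X^\top X$ is invertible.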

5.2 Appendix B: Gaussian Process Regression

Gaussian Process Regression (GPR) and kernel methods are similar. A kernel maps the input feature space into a possibly infinite dimensional feature space using basis functions. For instance, a quadratic regression model maps each dimension of the input into a polynomial equation of degree 2 (in our implementation, we omit the term with degree 1). Support vector machines are also an example of using the kernel trick.

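GPR uses kernel methods with a Gaussian prior on the weights of the model. The kernel, or covariance function, could be a linear kernel or a squared exponential kernel such as the following, with signal variance $\sigma_f^2$ and length scale $\ell$:

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right)$$

The covariance function represents a distribution over basis functions.

$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right)$$

where $f$ is a draw in the function space and $m$ is the mean function.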

This is equivalent to Bayesian regression with an infinite dimensional feature space composed of basis functions of the original feature space. Note that kernel functions, such as the squared exponential kernel above, are in effect similarity functions. Please see [6] for a detailed exposition of Gaussian processes for machine learning.
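
As a concrete illustration of the ideas in this appendix, the Python sketch below computes the posterior mean of a zero-mean Gaussian process with a squared exponential kernel on scalar inputs. It is a minimal sketch under stated assumptions and not the GauPro configuration used in the article, which the article does not describe. The kernel hyperparameters, the noise variance, the helper names, and the training data are all hypothetical.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior_mean(x_train, y_train, x_query, noise_var=1e-6):
    """Posterior mean of a zero-mean GP at each query point."""
    n = len(x_train)
    K = [[sq_exp_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)  # alpha = (K + noise_var * I)^-1 y
    return [sum(sq_exp_kernel(q, x_train[i]) * alpha[i] for i in range(n))
            for q in x_query]

# Placeholder standardized inputs and targets (hypothetical).
x_train = [-1.0, -0.5, 0.0, 0.5, 1.0]
y_train = [math.sin(v) for v in x_train]
mean_near = gp_posterior_mean(x_train, y_train, [0.25])[0]
mean_far = gp_posterior_mean(x_train, y_train, [10.0])[0]
print(mean_near, mean_far)
```

With a zero prior mean and this kernel, the posterior mean tracks the data near the training inputs and reverts toward zero far from them, because the kernel similarity to every training point vanishes. This is worth keeping in mind when a GP is queried at counterfactual inputs well outside the training range.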