On the Energy Consumption of Different Dataframe Processing Libraries – An Exploratory Study

09/12/2022
by Shriram Shanbhag, et al.

Background: The energy consumption of machine learning and its impact on the environment have made energy-efficient ML an emerging area of research. However, most of the attention remains focused on model creation, training, and inference. Data-oriented stages such as preprocessing, cleaning, and exploratory analysis form a critical part of the machine learning workflow, yet their energy efficiency has received little attention from researchers. Aim: Our study explores the energy consumption of different dataframe processing libraries as a first step towards studying the energy efficiency of the data-oriented stages of the machine learning pipeline. Method: We measure the energy consumption of 3 popular dataframe libraries, namely Pandas, Vaex, and Dask, for 21 different operations grouped under 4 categories on 2 datasets. Results: Our analysis shows that, for a given dataframe processing operation, the choice of library can indeed influence energy consumption, with some libraries consuming 202 times less energy than others. Conclusion: The results indicate that there is potential for optimizing the energy consumption of the data-oriented stages of the machine learning pipeline and that further research is needed in this direction.
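To make the measurement idea in the Method concrete, here is a minimal sketch of how the energy of a single dataframe operation could be sampled. It is an illustration under stated assumptions, not the authors' setup: the abstract does not name a measurement tool, so the sketch assumes a Linux machine exposing the Intel RAPL package counter through the powercap interface, and the helper names `read_rapl_uj`, `measure`, and `energy_ratio` are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Optional

# Assumption: Linux powercap (Intel RAPL) package-0 counter. The abstract does
# not say which measurement tool the study used; this path is illustrative.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_rapl_uj() -> int:
    """Return the cumulative package energy counter in microjoules."""
    return int((RAPL_DIR / "energy_uj").read_text())


def measure(op: Callable[[], object],
            read_uj: Callable[[], int] = read_rapl_uj,
            wrap_uj: Optional[int] = None) -> dict:
    """Run op once; return the energy (joules) and wall time (seconds) it took."""
    start_uj = read_uj()
    start_s = time.perf_counter()
    op()
    elapsed_s = time.perf_counter() - start_s
    delta_uj = read_uj() - start_uj
    if delta_uj < 0 and wrap_uj is not None:
        # The cumulative counter wrapped around once during the run.
        delta_uj += wrap_uj
    return {"joules": delta_uj / 1e6, "seconds": elapsed_s}


def energy_ratio(joules_a: float, joules_b: float) -> float:
    """How many times more energy run A used than run B."""
    return joules_a / joules_b
```

With a dataframe `df` already loaded in each library, the same operation could be wrapped in a callable, for example `measure(lambda: df.groupby("key").sum())` for Pandas (`df` and `"key"` are placeholders), and the resulting joule values compared with `energy_ratio`. The counter covers the whole CPU package, so background activity adds noise; repeated runs on an otherwise idle machine would be needed for stable numbers.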


