A Variability-Aware Design Approach to the Data Analysis Modeling Process
The massive amount of current data has led to many different forms of data analysis processes that aim to explore this data to uncover valuable insights. Methodologies to guide the development of big data science projects, including CRISP-DM and SEMMA, have been widely used in industry and academia. The data analysis modeling phase, which involves decisions on the most appropriate models to adopt, is at the core of these projects. However, from a software engineering perspective, the design and automation of activities performed in this phase are challenging. In this paper, we propose an approach to the data analysis modeling process which involves (i) the assessment of the variability inherent in the CRISP-DM data analysis modeling phase and the provision of feature models that represent this variability; (ii) the definition of a framework structural design that captures the identified variability; and (iii) evaluation of the developed framework design in terms of the possibilities for process automation. The proposed approach advances the state of the art by offering a variability-aware design solution that can enhance system flexibility, potentially leading to novel software frameworks which can significantly improve the level of automation in data analysis modeling process.
READ FULL TEXT