Occam's Razor for Big Data? On Detecting Quality in Large Unstructured Datasets

11/12/2020
by Birgitta Dresp-Langley, et al.

Detecting quality in large unstructured datasets requires capacities far beyond the limits of human perception and communicability. As a result, data science is trending towards increasingly complex analytic solutions to cope with this problem. This trend towards analytic complexity represents a severe challenge to the principle of parsimony, or Occam's Razor, in science. This review article combines insights from various domains, such as physics, computational science, data engineering, and cognitive science, to review the specific properties of big data. Problems in detecting data quality without abandoning the principle of parsimony are then highlighted on the basis of specific examples. Computational building block approaches to data clustering can help to deal with large unstructured datasets in minimal computation time, and meaning can be extracted rapidly and parsimoniously from large sets of unstructured image or video data through relatively simple unsupervised machine learning algorithms. The article then reviews why we still largely lack the expertise to exploit big data wisely: to extract relevant information for specific tasks, recognize patterns, generate new information, or store and further process large amounts of sensor data. Examples illustrating why we need subjective views and pragmatic methods to analyze big data contents are brought forward. The review concludes with a discussion of how cultural differences between East and West are likely to affect the course of big data analytics and the development of increasingly autonomous artificial intelligence aimed at coping with the big data deluge in the near future.

research
05/03/2019

Big Data Model "Entity and Features"

The article deals with the problem which led to Big Data. Big Data infor...
research
09/21/2017

Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service

Recently, we have been witnessing huge advancements in the scale of data...
research
03/17/2020

A new paradigm for accelerating clinical data science at Stanford Medicine

Stanford Medicine is building a new data platform for our academic resea...
research
08/03/2021

Big Data Science Over the Past Web

Web archives preserve unique and historically valuable information. They...
research
03/01/2015

23-bit Metaknowledge Template Towards Big Data Knowledge Discovery and Management

The global influence of Big Data is not only growing but seemingly endle...
research
09/23/2022

KeypartX: Graph-based Perception (Text) Representation

The availability of big data has opened up big opportunities for individ...
research
03/30/2021

Text Classification Using Hybrid Machine Learning Algorithms on Big Data

Recently, there are unprecedented data growth originating from different...
