Occam's Razor for Big Data? On Detecting Quality in Large Unstructured Datasets

by Birgitta Dresp-Langley, et al.

Detecting quality in large unstructured datasets requires capacities far beyond the limits of human perception and communicability. As a result, there is an emerging trend in data science towards increasingly complex analytic solutions to cope with this problem. This trend towards analytic complexity represents a severe challenge for the principle of parsimony, or Occam's Razor, in science. This review article combines insight from various domains such as physics, computational science, data engineering, and cognitive science to review the specific properties of big data. Problems in detecting data quality without abandoning the principle of parsimony are then highlighted on the basis of specific examples. Computational building block approaches to data clustering can help to deal with large unstructured datasets in minimal computation time, and meaning can be extracted rapidly and parsimoniously from large sets of unstructured image or video data through relatively simple unsupervised machine learning algorithms. The article then reviews why we still largely lack the expertise to exploit big data wisely in order to extract relevant information for specific tasks, recognize patterns, generate new information, or store and further process large amounts of sensor data; examples illustrating why we need subjective views and pragmatic methods to analyze big data contents are brought forward. The review concludes by discussing how cultural differences between East and West are likely to affect the course of big data analytics and the development of increasingly autonomous artificial intelligence aimed at coping with the big data deluge in the near future.






