The Trip to The Enterprise Gourmet Data Product Marketplace through a Self-service Data Platform

07/28/2021
by Michał Zasadziński, et al.

Data analytics serves core business reporting needs in many software companies, acts as a source of truth for key information, and enables advanced solutions (e.g., predictive models, machine learning, and real-time recommendations) that grow the business. A self-service, multi-tenant, API-first, and scalable data platform is the foundational requirement for an enterprise data marketplace, which enables the creation, publishing, and exchange of data products. Such a marketplace supports the exploration and discovery of data products while providing high-level data governance and oversight of marketplace contents. In this paper, we describe our path to the gourmet data product marketplace. We cover the design principles, implementation details, technology choices, and the journey of building an enterprise data platform that meets the above characteristics. The platform comprises ingestion, streaming, storage, transformation, schema generation, fail-safe mechanisms, data sharing, access management, automatic identification of PII data, self-service storage optimization recommendations, and CI/CD integration. We then show how the platform enables and operates the data marketplace, facilitating the exchange of stable data products across users and tenants. We motivate and show how we run scalable, decentralized data governance. All of this is built and run for Cimpress Technology (CT), which operates the Mass Customization Platform for Cimpress and its businesses. The CT data platform serves thousands of users from different platform participants, with data drawn from heterogeneous sources. It ingests data at a rate of well over 1,000 individual messages per second and serves more than 100k analytical queries daily.


