Integration of Apache Hive with Spark (Integrazione di Apache Hive con Spark)

01/15/2019
by Michele Gentile, et al.

This document describes the solutions adopted to transfer large volumes of data between the best-known distributed SQL and NoSQL storage systems, so that analysis and modification operations can exploit the particular strengths of each system. The goal was achieved using the Spark engine together with the open source "Hive Warehouse Connector" library from Hortonworks, which provides new interoperability features between Hive and Spark. These APIs were chosen to take advantage of Spark's distributed computing through the Spark SQL libraries, to allow fast reads and writes on the databases chosen by the Network Contacts Systems Engineering Team, and to make the stored information available for consultation outside the Ambari cluster.


