Full-privacy secured search engine empowered by efficient genome-mapping algorithms

12/29/2021
by   Yuan-Yu Chang, et al.

Since the 1990s, keyword-based search engines have helped people locate relevant web content via a simple query, as have the more recent full-text search engines used mainly for plagiarism detection after an article is uploaded. However, these "free" or paid services operate by storing users' search queries and preferences for personal profiling and targeted ad delivery, while user-uploaded articles further profit the service providers by expanding their databases. In short, search engine privacy has not been an option for web exploration in the past decades. Here we demonstrate that a database or internet search that takes an entire article as the query can be carried out correctly without revealing users' sensitive queries, by combining an irreversible encoding scheme with an efficient FM-index search routine of the kind generally used in next-generation sequencing (NGS) of genomes. In our solution, Sapiens Aperio Veritas Engine (S.A.V.E.), every word in the query is encoded on the user's local machine into one of 12 "amino acids" (a.a.), which together form a pseudo-biological sequence (PBS). For PBS-mediated plagiarism detection, users submit the locally encoded PBS to our cloud service, which locates identical duplicates in collected web content that has been encoded in the same way as the query. It is found that PBSs with a length longer than 12 a.a. can return correct results with a false positive rate <0.8 Bowtie and is 4 orders of magnitude faster than BLAST. S.A.V.E., which functions in both regular and in-private search modes, provides a new option for efficient internet search and plagiarism detection in a compressed search space, with no possibility of storing or revealing users' confidential content. We expect that future privacy-aware search engines can reference the ideas proposed herein. S.A.V.E. is made available at https://dyn.life.nthu.edu.tw/SAVE/
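
The encode-then-search idea in the abstract can be illustrated with a minimal Python sketch. It is not S.A.V.E.'s implementation: the abstract does not specify the encoding, so the sketch assumes a SHA-256 hash reduced modulo 12, a hypothetical 12-letter alphabet (`ACDEFGHIKLMN`), and a toy FM-index built from a naively sorted suffix array. The names `encode_word`, `encode_text`, and `FMIndex` are illustrative only.

```python
import hashlib
import re

# Hypothetical 12-letter alphabet; the abstract only says "12 amino acids".
ALPHABET = "ACDEFGHIKLMN"


def encode_word(word: str) -> str:
    """Map a word to one of 12 letters (assumed hash-based, many-to-one)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return ALPHABET[int.from_bytes(digest, "big") % len(ALPHABET)]


def encode_text(text: str) -> str:
    """Encode a text into a pseudo-biological sequence, one letter per word."""
    return "".join(encode_word(w) for w in re.findall(r"\w+", text))


class FMIndex:
    """Toy FM-index: naive suffix array plus full occurrence tables."""

    def __init__(self, text: str):
        text += "$"  # sentinel that sorts before every alphabet letter
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # Burrows-Wheeler transform: character preceding each sorted suffix
        self.bwt = "".join(text[i - 1] for i in self.sa)
        chars = sorted(set(text))
        # first[c]: number of characters in text strictly smaller than c
        self.first, total = {}, 0
        for c in chars:
            self.first[c] = total
            total += text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i]
        self.occ = {c: [0] * (len(text) + 1) for c in chars}
        for i, b in enumerate(self.bwt):
            for c in chars:
                self.occ[c][i + 1] = self.occ[c][i] + (b == c)

    def locate(self, pattern: str) -> list:
        """Return sorted start positions of exact matches of pattern."""
        if not pattern:
            return []
        lo, hi = 0, len(self.bwt)  # half-open range of suffix-array rows
        for c in reversed(pattern):  # backward search, last letter first
            if c not in self.first:
                return []
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


# The "server" indexes encoded content; the "client" sends only the encoding.
document = "the quick brown fox jumps over the lazy dog and the quick brown cat"
index = FMIndex(encode_text(document))
hits = index.locate(encode_text("quick brown"))
print(hits)  # word offsets of candidate matches in the encoded document
```

Because many words share each letter, a two-word query like the one above can produce spurious hits; longer queries narrow the candidates, which is consistent with the abstract's observation that accuracy depends on PBS length.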


