A Frequency-Based Learning-To-Rank Approach for Personal Digital Traces

12/24/2020
by Daniela Vianna, et al.

Personal digital traces are constantly produced by connected devices, internet services and interactions. These traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, which makes it challenging for users to interact with and search their own data. We adopt a multidimensional data model based on the six natural questions (what, when, where, who, why and how) to represent and unify heterogeneous personal digital traces. On top of this model, we propose a learning-to-rank approach that uses the state-of-the-art LambdaMART algorithm together with frequency-based features that leverage the correlation between content (what), users (who), time (when), location (where) and data source (how) to improve the accuracy of search results. Because no personal training data is publicly available, we build our own training sets by combining known-item query generation techniques with an unsupervised ranking model (field-based BM25). Experiments over a publicly available email collection and a personal digital trace collection from a real user show that the frequency-based learning approach improves search accuracy compared with traditional search tools.
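The abstract names two ingredients for bootstrapping training data: known-item query generation and a field-based BM25 ranker over the six-dimension records. The sketch below shows one plausible way to combine them; it is not the authors' implementation. Assumptions: each trace is a dict keyed by the six question fields; `BM25F`, `known_item_query` and `build_training_set` are hypothetical names; field weights default to 1.0 with `b = 0.75` and `k1 = 1.2`; the IDF uses the non-negative Lucene-style variant; queries are sampled uniformly from the target record's terms; and the generated target is labelled relevant while the other top-k BM25F hits are labelled non-relevant.

```python
import math
import random
from collections import Counter

# Six-question dimensions; real trace schemas (and the paper's) may differ.
FIELDS = ("what", "when", "where", "who", "why", "how")


def tokenize(text):
    return text.lower().split()


class BM25F:
    """Simple BM25F: per-field length normalisation, weighted term
    frequencies summed into one pseudo-frequency before saturation."""

    def __init__(self, docs, weights=None, b=None, k1=1.2):
        self.docs = [{f: tokenize(d.get(f, "")) for f in FIELDS} for d in docs]
        self.weights = weights or {f: 1.0 for f in FIELDS}  # assumed defaults
        self.b = b or {f: 0.75 for f in FIELDS}
        self.k1 = k1
        self.n = len(self.docs)
        # Average field length; fall back to 1.0 for fields that are always empty.
        self.avglen = {f: (sum(len(d[f]) for d in self.docs) / self.n) or 1.0 for f in FIELDS}
        # Document frequency: number of records containing the term in any field.
        self.df = Counter()
        for d in self.docs:
            self.df.update({t for f in FIELDS for t in d[f]})

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log((self.n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query, i):
        doc, total = self.docs[i], 0.0
        for term in tokenize(query):
            pseudo_tf = 0.0
            for f in FIELDS:
                tf = doc[f].count(term)
                if tf:
                    norm = (1 - self.b[f]) + self.b[f] * len(doc[f]) / self.avglen[f]
                    pseudo_tf += self.weights.get(f, 0.0) * tf / norm
            if pseudo_tf:
                total += self.idf(term) * pseudo_tf / (self.k1 + pseudo_tf)
        return total

    def rank(self, query):
        return sorted(range(self.n), key=lambda i: self.score(query, i), reverse=True)


def known_item_query(doc, rng, length=2):
    """Hypothetical known-item query: sample terms uniformly from the target's fields."""
    terms = [t for f in FIELDS for t in tokenize(doc.get(f, ""))]
    return " ".join(rng.sample(terms, min(length, len(terms))))


def build_training_set(docs, n_queries=3, top_k=5, seed=0):
    """Return (query_id, query, doc_index, label) rows; exactly one positive per query."""
    rng = random.Random(seed)
    ranker = BM25F(docs)
    rows = []
    for qid in range(n_queries):
        target = rng.randrange(len(docs))
        query = known_item_query(docs[target], rng)
        candidates = ranker.rank(query)[:top_k]
        if target not in candidates:
            candidates.append(target)  # keep the known item so each query has a positive
        for i in candidates:
            rows.append((qid, query, i, int(i == target)))
    return rows


# Toy traces (invented for illustration only).
docs = [
    {"what": "flight booking confirmation", "who": "alice airline", "where": "boston", "how": "email"},
    {"what": "dinner photos", "who": "bob", "where": "new york", "how": "instagram"},
    {"what": "project meeting notes", "who": "alice carol", "where": "office", "how": "email"},
]
print(BM25F(docs).rank("alice flight"))
print(build_training_set(docs, n_queries=2, top_k=2))
```

Rows of this shape (query id, candidate, binary label) are the kind of input a LambdaMART implementation expects, once per-candidate feature vectors such as the paper's frequency-based features are attached.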

research
04/10/2019

Searching Heterogeneous Personal Digital Traces

Digital traces of our lives are now constantly produced by various conne...
research
12/29/2020

Supporting Human Memory by Reconstructing Personal Episodic Narratives from Digital Traces

Numerous applications capture in digital form aspects of people's lives....
research
05/11/2021

Federated Unbiased Learning to Rank

Unbiased Learning to Rank (ULTR) studies the problem of learning a ranki...
research
04/04/2022

Revealing Cumulative Risks in Online Personal Information: A Data Narrative Study

When pieces from an individual's personal information available online a...
research
02/22/2021

Entities of Interest

In the era of big data, we continuously - and at times unknowingly - lea...
research
01/29/2023

G-Rank: Unsupervised Continuous Learn-to-Rank for Edge Devices in a P2P Network

Ranking algorithms in traditional search engines are powered by enormous...
research
04/15/2019

Modeling Hierarchical Usage Context for Software Exceptions based on Interaction Data

Traces of user interactions with a software system, captured in producti...