Lorenzo Baraldi

research

∙ 08/23/2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Image captioning, like many tasks involving vision and language, current...

0 Manuele Barraco, et al. ∙

research

∙ 07/18/2023

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

Research in Image Generation has recently made significant progress, par...

0 Federico Betti, et al. ∙

research

∙ 06/12/2023

Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training

The use of self-supervised pre-training has emerged as a promising appro...

0 Lorenzo Baraldi, et al. ∙

research

∙ 04/04/2023

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering

Machine Unlearning has recently been emerging as a paradigm for selectiv...

0 Samuele Poppi, et al. ∙

research

∙ 04/04/2023

Evaluating Synthetic Pre-Training for Handwriting Processing Tasks

In this work, we explore massive pre-training on synthetic word images f...

0 Vittorio Pippi, et al. ∙

research

∙ 04/02/2023

Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images

Recent advancements in diffusion models have enabled the generation of r...

0 Roberto Amoroso, et al. ∙

research

∙ 03/21/2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

The CLIP model has been recently proven to be very effective for a varie...

0 Sara Sarto, et al. ∙

research

∙ 01/17/2023

Embodied Agents for Efficient Exploration and Smart Scene Description

The development of embodied agents that can communicate with humans in n...

0 Roberto Bigazzi, et al. ∙

research

∙ 08/17/2022

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Handwritten Text Recognition (HTR) in free-layout pages is a challenging...

0 Silvia Cascianelli, et al. ∙

research

∙ 08/16/2022

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

Handwritten Text Recognition (HTR) is an open problem at the intersectio...

22 Silvia Cascianelli, et al. ∙

research

∙ 07/29/2022

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

Image-text matching is gaining a leading role among tasks involving the ...

7 Nicola Messina, et al. ∙

research

∙ 07/26/2022

Retrieval-Augmented Transformer for Image Captioning

Image captioning models aim at connecting Vision and Language by providi...

7 Sara Sarto, et al. ∙

research

∙ 04/19/2022

Embodied Navigation at the Art Gallery

Embodied agents, trained to explore and navigate indoor photorealistic e...

0 Roberto Bigazzi, et al. ∙

research

∙ 04/18/2022

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments

Embodied AI is a recent research area that aims at creating intelligent ...

0 Federico Landi, et al. ∙

research

∙ 02/21/2022

CaMEL: Mean Teacher Learning for Image Captioning

Describing images in natural language is a fundamental step towards the ...

9 Manuele Barraco, et al. ∙

research

∙ 11/24/2021

Universal Captioner: Inducing Content-Style Separation in Vision-and-Language Model Training

While captioning models have obtained compelling results in describing n...

1 Marcella Cornia, et al. ∙

research

∙ 09/14/2021

Focus on Impact: Indoor Exploration with Intrinsic Motivation

Exploration of indoor environments has recently experienced a significan...

0 Roberto Bigazzi, et al. ∙

research

∙ 08/31/2021

Working Memory Connections for LSTM

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of...

0 Federico Landi, et al. ∙

research

∙ 07/14/2021

From Show to Tell: A Survey on Image Captioning

Connecting Vision and Language plays an essential role in Generative Int...

18 Matteo Stefanini, et al. ∙

research

∙ 06/02/2021

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

Image captioning models have lately shown impressive results when applie...

42 Marco Cagrandi, et al. ∙

research

∙ 05/12/2021

Out of the Box: Embodied Navigation in the Real World

The research field of Embodied AI has witnessed substantial progress in ...

0 Roberto Bigazzi, et al. ∙

research

∙ 04/20/2021

Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis

As the request for deep learning solutions increases, the need for expla...

0 Samuele Poppi, et al. ∙

research

∙ 02/15/2021

RMS-Net: Regression and Masking for Soccer Event Spotting

The recently proposed action spotting task consists in finding the exact...

0 Matteo Tomei, et al. ∙

research

∙ 07/20/2020

Inter-Homines: Distance-Based Risk Estimation for Human Safety

In this document, we report our proposal for modeling the risk of possib...

6 Matteo Fabbri, et al. ∙

research

∙ 07/14/2020

Explore and Explain: Self-supervised Navigation and Recounting

Embodied AI has been recently gaining attention as it aims to foster the...

5 Roberto Bigazzi, et al. ∙

research

∙ 04/27/2020

A Novel Attention-based Aggregation Function to Combine Vision and Language

The joint understanding of vision and language has been recently gaining...

9 Matteo Stefanini, et al. ∙

research

∙ 12/17/2019

M^2: Meshed-Memory Transformer for Image Captioning

Transformer-based architectures represent the state of the art in sequen...

10 Marcella Cornia, et al. ∙

research

∙ 12/09/2019

STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection

Spatio-temporal action localization is a challenging yet fascinating tas...

33 Matteo Tomei, et al. ∙

research

∙ 11/27/2019

Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a challenging task in which an a...

28 Federico Landi, et al. ∙

research

∙ 10/07/2019

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

The ability to generate natural language explanations conditioned on the...

15 Marcella Cornia, et al. ∙

research

∙ 07/05/2019

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

In Vision-and-Language Navigation (VLN), an embodied agent needs to reac...

3 Federico Landi, et al. ∙

research

∙ 03/05/2019

A Deep Learning based approach to VM behavior identification in cloud systems

Cloud computing data centers are growing in size and complexity to the p...

0 Matteo Stefanini, et al. ∙

research

∙ 03/04/2019

M-VAD Names: a Dataset for Video Captioning with Naming

Current movie captioning architectures are not capable of mentioning cha...

12 Stefano Pini, et al. ∙

research

∙ 11/26/2018

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation

The applicability of computer vision to real paintings and artworks has ...

0 Matteo Tomei, et al. ∙

research

∙ 11/26/2018

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Current captioning approaches can describe images using black-box archit...

8 Marcella Cornia, et al. ∙

research

∙ 06/26/2017

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Image captioning has been recently gaining a lot of attention thanks to ...

0 Marcella Cornia, et al. ∙

research

∙ 11/29/2016

Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

Data-driven saliency has recently gained a lot of attention thanks to th...

0 Marcella Cornia, et al. ∙

research

∙ 11/28/2016

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

The use of Recurrent Neural Networks for video captioning has recently g...

0 Lorenzo Baraldi, et al. ∙

research

∙ 10/05/2016

Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks

This paper presents a novel approach for temporal and semantic segmentat...

0 Lorenzo Baraldi, et al. ∙

research

∙ 09/05/2016

A Deep Multi-Level Network for Saliency Prediction

This paper presents a novel deep architecture for saliency prediction. C...

0 Marcella Cornia, et al. ∙

research

∙ 04/09/2016

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features

This paper presents a novel retrieval pipeline for video collections, wh...

0 Lorenzo Baraldi, et al. ∙
