Tanzila Rahman

Featured Co-authors

Yang Wang
233 publications
Leonid Sigal
73 publications
Sergey Tulyakov
46 publications
Jian Ren
42 publications
Giuseppe Carenini
34 publications
Hsin-Ying Lee
31 publications
Mrigank Rochan
16 publications
Shweta Mahajan
8 publications
Shih-Han Chou
6 publications
Bicheng Xu
5 publications
Mengyu Yang
4 publications

research

∙ 11/23/2022

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

There has been a recent explosion of impressive generative models that c...

Tanzila Rahman, et al.

research

∙ 10/26/2021

TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation

The recent success of transformer models in language, such as BERT, has ...

Tanzila Rahman, et al.

research

∙ 03/25/2021

Weakly-supervised Audio-visual Sound Source Detection and Separation

Learning how to localize and separate individual object sounds in the au...

Tanzila Rahman, et al.

research

∙ 11/04/2020

An Improved Attention for Visual Question Answering

We consider the problem of Visual Question Answering (VQA). Given an ima...

Tanzila Rahman, et al.

research

∙ 09/22/2019

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

Multi-modal learning, particularly among imaging and linguistic modaliti...

Tanzila Rahman, et al.

research

∙ 04/09/2019

Convolutional Temporal Attention Model for Video-based Person Re-identification

The goal of video-based person re-identification is to match two input v...

Tanzila Rahman, et al.

research

∙ 10/26/2018

Video-based Person Re-identification Using Spatial-Temporal Attention Networks

We consider the problem of video-based person re-identification. The goa...

Shivansh Rao, et al.
