Zhiyuan Fang

research

∙ 06/01/2023

End-to-end Knowledge Retrieval with Multi-modal Queries

We investigate knowledge retrieval with multi-modal queries, i.e. querie...

Man Luo, et al.

research

∙ 11/13/2022

Mining Unseen Classes via Regional Objectness: A Simple Baseline for Incremental Segmentation

Incremental or continual learning has been extensively studied for image...

Zekang Zhang, et al.

research

∙ 04/28/2022

Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos

In videos that contain actions performed unintentionally, agents do not ...

Arnav Chakravarthy, et al.

research

∙ 12/09/2021

Injecting Semantic Concepts into End-to-End Image Captioning

Tremendous progress has been made in recent years in developing better i...

Zhiyuan Fang, et al.

research

∙ 04/05/2021

Compressing Visual-linguistic Model via Knowledge Distillation

Despite exciting progress in pre-training for visual-linguistic (VL) rep...

Zhiyuan Fang, et al.

research

∙ 01/12/2021

SEED: Self-supervised Distillation For Visual Representation

This paper is concerned with self-supervised learning for small models. ...

Zhiyuan Fang, et al.

research

∙ 06/21/2020

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

A system capturing the association between video frames and textual quer...

Zhiyuan Fang, et al.

research

∙ 06/13/2020

HRDNet: High-resolution Detection Network for Small Objects

Small object detection is challenging because small objects do not conta...

Ziming Liu, et al.

research

∙ 05/15/2020

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Person search by natural language aims at retrieving a specific person i...

Zhe Wang, et al.

research

∙ 03/11/2020

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Captioning is a crucial and challenging task for video understanding. In...

Zhiyuan Fang, et al.

research

∙ 05/28/2019

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

The process of identifying changes or transformations in a scene along w...

Tejas Gokhale, et al.

research

∙ 04/07/2019

Modularized Textual Grounding for Counterfactual Resilience

Computer Vision applications often require a textual grounding module wi...

Zhiyuan Fang, et al.

research

∙ 05/01/2018

Weakly Supervised Attention Learning for Textual Phrases Grounding

Grounding textual phrases in visual content is a meaningful yet challeng...

Zhiyuan Fang, et al.

research

∙ 11/28/2016

Range Loss for Deep Face Recognition with Long-tail

Convolutional neural networks have achieved great improvement on face re...

Xiao Zhang, et al.

