VisualProg Distiller: Learning to Fine-tune Non-differentiable Visual Programming Frameworks

09/18/2023
by Wentao Wan, et al.

As an interpretable and universal neuro-symbolic paradigm built on Large Language Models, visual programming (VisualProg) can execute compositional visual tasks without training, but its performance is markedly inferior to that of task-specific supervised learning models. To increase its practicality, the performance of VisualProg on specific tasks needs to be improved. However, the non-differentiability of VisualProg rules out the standard fine-tuning strategy for achieving further improvements on specific tasks. In our analysis, we find that significant performance issues in VisualProg's execution originate from errors made by the sub-modules at the corresponding visual sub-task steps. To address this, we propose "VisualProg Distiller", a method of supplementing and distilling process knowledge to optimize the performance of each VisualProg sub-module on decoupled visual sub-tasks, thereby enhancing overall task performance. Specifically, we choose an end-to-end model that performs well on the given task as the teacher, and we distill the teacher's knowledge into the invoked visual sub-modules step by step, following the execution flow of the VisualProg-generated programs. In this way, our method effectively enables the fine-tuning of non-differentiable VisualProg frameworks. Extensive and comprehensive experimental evaluations demonstrate that our method achieves a substantial performance improvement over VisualProg and outperforms all compared state-of-the-art methods by large margins. Furthermore, to provide valuable process supervision for the GQA task, we construct a large-scale dataset from the distillation process of our method.
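To illustrate the step-by-step distillation idea described in the abstract, the sketch below shows how per-step supervision might look in PyTorch: a frozen teacher's intermediate features serve as targets for each invoked sub-module, in the order given by the generated program. The sub-module names ("locate", "answer"), the MSE objective, and the tensor shapes are illustrative assumptions for this sketch, not the paper's actual components or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(submodule, optimizer, step_input, teacher_target):
    """Pull one sub-module's output toward the teacher's intermediate feature."""
    student_out = submodule(step_input)
    loss = F.mse_loss(student_out, teacher_target.detach())  # teacher acts as a fixed target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy stand-ins for two sub-modules invoked by a generated program (hypothetical).
torch.manual_seed(0)
locate = nn.Linear(16, 16)   # hypothetical "locate object" step
answer = nn.Linear(16, 8)    # hypothetical "answer question" step
opts = {m: torch.optim.Adam(m.parameters(), lr=1e-3) for m in (locate, answer)}

x = torch.randn(4, 16)            # dummy visual features for a batch of 4
teacher_mid = torch.randn(4, 16)  # teacher feature aligned with the "locate" step
teacher_ans = torch.randn(4, 8)   # teacher feature aligned with the "answer" step

# Follow the program's execution flow, distilling each invoked sub-module in turn.
loss_locate = distill_step(locate, opts[locate], x, teacher_mid)
mid = locate(x).detach()          # pass the step output downstream without cross-step gradients
loss_answer = distill_step(answer, opts[answer], mid, teacher_ans)
print(f"step losses: locate={loss_locate:.4f}, answer={loss_answer:.4f}")
```

In a real pipeline, the teacher's intermediate signals would come from the end-to-end model chosen for the task, and each sub-module would be a vision model rather than a linear layer; the point of the sketch is only that the non-differentiable program structure is bypassed by optimizing each step against its own target.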

