PiClick: Picking the desired mask in click-based interactive segmentation
Click-based interactive segmentation enables efficient pixel-level annotation and image editing with simple user clicks, but target ambiguity remains a problem that hinders precise segmentation. In scenes with rich context, a single click may refer to several potential targets, each corresponding to a different mask, yet most interactive segmentors generate only one mask and so fail to capture these alternatives. To resolve target ambiguity, we propose PiClick, which produces semantically diversified masks. PiClick uses a transformer network design in which mutually interactive mask queries are integrated to infuse target priors. Moreover, a Target Reasoning Module is designed in PiClick to automatically infer the best-matched mask from all proposals, significantly reducing both target ambiguity and the need for extra human intervention. Extensive experiments on nine interactive segmentation datasets demonstrate the state-of-the-art segmentation performance of PiClick and show that generating multiple proposals and reasoning about the target reduces human intervention. To promote direct usage and future work, we release the source code of PiClick together with a plug-and-play annotation tool at https://github.com/cilinyan/PiClick.
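The idea of picking one mask out of several proposals can be illustrated with a minimal sketch. This is not the PiClick implementation: the function `pick_mask`, the per-proposal `scores`, and the rule of discarding proposals that do not cover the clicked pixel are all assumptions made for illustration, standing in for whatever the Target Reasoning Module actually predicts.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]  # binary mask, mask[row][col] is 0 or 1


def pick_mask(
    proposals: Sequence[Mask],
    scores: Sequence[float],
    click: Tuple[int, int],
) -> Optional[int]:
    """Return the index of the highest-scoring proposal that covers the click.

    Hypothetical helper: `scores` stands in for a learned matching score,
    and the click-coverage filter is an illustrative assumption.
    Returns None if no proposal contains the clicked pixel.
    """
    if len(proposals) != len(scores):
        raise ValueError("need exactly one score per proposal")
    row, col = click
    best_index: Optional[int] = None
    best_score = float("-inf")
    for index, (mask, score) in enumerate(zip(proposals, scores)):
        if not mask[row][col]:
            continue  # this proposal does not contain the clicked pixel
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy 3x3 scene: one click on the shirt could mean the shirt or the whole person.
whole_person = [[1, 1, 1], [1, 1, 1], [0, 0, 0]]
shirt_only = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
background = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]

proposals = [whole_person, shirt_only, background]
scores = [0.6, 0.8, 0.9]  # made-up scores for illustration

chosen = pick_mask(proposals, scores, click=(1, 1))
```

In this toy example the background proposal has the highest score but is skipped because it does not cover the click, so the choice falls between the two ambiguous targets that do. The remaining proposals could still be shown to the user as alternatives.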