Inbetweening generation from keyframes plays an important role in 2D animation production. Inbetweening frames are interpolated between hand-drawn keyframes. For example, approximately 800 keyframes and 5,000 to 7,000 inbetweening frames are required for a 30-minute animation. In conventional animation production, keyframes have to be vectorized manually: the original drawings, whose strokes are generally rough and uneven, are redrawn as unified strokes. This process is tedious and time-consuming even for skilled animators.
To address this issue, inbetweening generation approaches that use two consecutive keyframes have been studied [2, 6, 8, 13, 12, 10, 14, 9, 20, 19, 26]. However, these approaches are mainly based on vectorized drawings and on manually specified stroke correspondences between the keyframes. Automatically obtaining stroke correspondences from the original drawings without vectorization remains a challenge.
In raster keyframes, each stroke plays an important role in representing object contours and depth information. We therefore regard the drawn strokes of a keyframe as the boundary lines of the closed areas they separate, and propose a method to interactively estimate stroke correspondences between two raster keyframes using closed area correspondences, as shown in Figure 1.
The proposed system helps users avoid the tedious work of redrawing vector lines, and it can also generate inbetweening frames automatically. To verify its feasibility, we calculate the accuracy of the estimated closed area correspondences on several animation cuts. In addition, we measure the number of user corrections required until the closed area correspondence is completely determined. We assess the visual quality of the stroke correspondences by importing the estimated correspondences into commercial inbetweening software such as Cacani.
2 Related Work
2.1 Stroke Correspondence
Many techniques have been proposed to automatically generate inbetweening frames from keyframes composed of strokes (vector lines) [2, 6, 8, 13, 14, 9, 20, 19]. In general, stroke correspondences between keyframes are determined by imposing restrictions such as drawing order [4, 17]. However, a keyframe is often composed of hundreds of strokes, so users invest considerable time and effort in determining the correspondences. Several automatic methods focus on a narrower range of target strokes, such as boundary lines [7, 3], but they have difficulty determining the correspondences of inner strokes within closed areas. Hence, recent studies discuss how to construct stroke correspondences.
In methods based on a self-shaped canvas, two keyframes are projected onto a three-dimensional virtual canvas. Strokes specified by the user in each keyframe are made to overlap by deforming the canvas, and the correspondence is estimated by distance-based combinatorial optimization. Whited et al. construct a connection graph from the relationships between the keyframes and estimate the correspondence by a depth-first search on the graph. However, these methods fail in more complex scenes where strokes are occluded, or where the shapes and number of strokes change between keyframes.
Yang et al. utilize the characteristics of neighboring strokes (e.g., stroke shapes) and propose a greedy algorithm that constructs correspondences between keyframes. Their system enables users to manually specify constraints and edit the estimated correspondence. Yang et al. further extend this method to exploit the characteristics of stroke connections, improving the estimation accuracy. However, these methods require users to manually redraw the keyframes with vector lines by tracing over the original character images in advance.
Therefore, we focus on the closed areas of character images (i.e., the character parts) and propose a method that estimates stroke correspondences directly from raster character images. Because the manual vector redrawing process is skipped, users can concentrate on designing the movement of each stroke.
2.2 Closed Area Correspondence
Constructing closed area correspondences between two raster images has been thoroughly investigated. In the field of reference-based colorization of line drawings, Sato et al. utilize the positional and connection relationships of closed areas in the keyframes to construct their correspondence. Maejima et al. propose a method that (i) estimates the corresponding closed areas in the reference keyframes and (ii) colors the target inbetweenings based on the estimation in (i). While these approaches are suitable for colorization, they are unsuitable for inbetweening generation because they do not establish vertex correspondences between pairs of corresponding strokes. We therefore combine closed-area-based and stroke-based methods, an approach we consider well suited to inbetweening generation. With the proposed framework, inbetweenings can be designed directly from two keyframes without a vectorization step.
3 Proposed Method
In this section, we introduce a framework to construct stroke correspondences. First, we separate the two input images (keyframes) into closed areas while removing noise. Then, we estimate closed area correspondences between the keyframes using a greedy algorithm; the user can manually correct these correspondences through the correction interface. Finally, the system automatically estimates stroke correspondences from the estimated closed area correspondences.
Figure 2 shows an overview of the proposed system for estimating stroke correspondences between two consecutive keyframes. The closed areas in the two input keyframes are labeled, the depth context relationships of the closed areas are estimated, and a greedy algorithm then estimates all closed area correspondences between the two keyframes. Users correct the closed area correspondences if necessary, and the correspondences are updated from the corrected results. Finally, the stroke correspondences between the two input keyframes are determined from the estimated closed area correspondences.
3.2 Labeling Process
We first scan the keyframe images and import them into a PC. The color images are converted to grayscale so that the labeling process can be performed with OpenCV, and a median filter with a fixed kernel size then removes fine noise and repairs broken strokes. Subsequently, the closed areas in the keyframes are labeled after binarizing the pixels with a fixed threshold, which achieved the most stable results in multiple experiments. Figure 3 shows the labeled closed areas of keyframes A and B.
For each closed area of keyframes A and B, we obtain its assigned color, its area, the coordinates of its centroid, and the labels of its adjacent closed areas together with the angles to their centroids. The closed area that touches the greatest proportion of the screen's outer border is regarded as the background.
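The paper performs this labeling with OpenCV. To make the resulting data concrete, the sketch below reimplements the core of it in dependency-free Python on an image that is assumed to be already grayscale-converted, median-filtered and binarized. It assumes 4-connectivity, a binary grid in which 1 marks stroke pixels, and that "touching the greatest proportion of the outer border" can be measured by counting border pixels per label. The function names are hypothetical, and adjacency labels and centroid angles are omitted for brevity.

```python
from collections import deque


def label_closed_areas(binary):
    """Label the 4-connected closed areas (non-stroke pixels) of a binarized keyframe.

    Assumption: binary is a list of rows; 1 marks a stroke pixel, 0 a pixel inside
    a closed area. Returns (labels, stats): labels[y][x] is -1 on strokes and a
    label id elsewhere; stats[label] holds the pixel area and (x, y) centroid.
    """
    h, w = len(binary), len(binary[0])
    labels = [[-1] * w for _ in range(h)]
    stats = {}
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] == 1 or labels[sy][sx] != -1:
                continue  # stroke pixel or already labeled
            # Breadth-first flood fill of one closed area.
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 0 and labels[ny][nx] == -1):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            area = len(pixels)
            cx = sum(x for _, x in pixels) / area
            cy = sum(y for y, _ in pixels) / area
            stats[next_label] = {"area": area, "centroid": (cx, cy)}
            next_label += 1
    return labels, stats


def estimate_background(labels):
    """Return the label occupying the most border pixels.

    Assumption: at least one border pixel is not a stroke pixel.
    """
    h, w = len(labels), len(labels[0])
    border = [(0, x) for x in range(w)] + [(h - 1, x) for x in range(w)]
    border += [(y, 0) for y in range(1, h - 1)] + [(y, w - 1) for y in range(1, h - 1)]
    counts = {}
    for y, x in border:
        if labels[y][x] != -1:
            counts[labels[y][x]] = counts.get(labels[y][x], 0) + 1
    return max(counts, key=counts.get)


# Toy keyframe: one rectangular stroke enclosing a two-pixel closed area.
keyframe = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
labels, stats = label_closed_areas(keyframe)
background = estimate_background(labels)
```

In the toy keyframe, label 0 is the 18-pixel area outside the stroke and is chosen as the background, while label 1 is the enclosed two-pixel area.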
3.3 Depth Context of Closed Area
We extract the stroke intersections (junctions) in each input keyframe using OpenCV and draw a circle of radius $r$ centered at each intersection. The proposed system estimates the depth context of each closed area from the proportions of the closed areas within these circles. The radius is set to $r = \sqrt{(S - S_{\mathrm{bg}})/\pi}$, where $S$ is the screen area and $S_{\mathrm{bg}}$ is the area of the background label, so that the area of the circle equals the area occupied by the character on the screen.
As a result, information on the adjacent closed areas within the same keyframe and on their depth context relationships is added to the information of each closed area in keyframes A and B.
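Because the circle is meant to have the same area as the character, its radius follows from $\pi r^2 = S - S_{\mathrm{bg}}$, with $S$ the screen area and $S_{\mathrm{bg}}$ the background area. The sketch below computes that radius and the share of each closed-area label inside a circle centered on one intersection. It assumes a label grid in which -1 marks stroke pixels; the text does not say how these shares are turned into a front/back ordering, so the sketch stops at the shares. Function names are hypothetical.

```python
import math


def circle_radius(screen_area, background_area):
    """Radius r such that pi * r**2 equals the character area (screen minus background)."""
    return math.sqrt((screen_area - background_area) / math.pi)


def label_ratios_in_circle(labels, center, radius):
    """Share of each closed-area label among the labeled pixels inside a circle.

    labels: 2D list from a labeling step, with -1 marking stroke pixels (assumption).
    center: (x, y) position of a stroke intersection (junction).
    """
    cx, cy = center
    counts = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab == -1:
                continue  # stroke pixels belong to no closed area
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                counts[lab] = counts.get(lab, 0) + 1
    total = sum(counts.values())
    return {lab: c / total for lab, c in counts.items()} if total else {}


# A 1x4 strip split between two closed areas, sampled around its midpoint.
ratios = label_ratios_in_circle([[0, 0, 1, 1]], center=(1.5, 0.0), radius=1.0)
# A 640x480 screen whose character covers 100*pi pixels gives r = 10.
r = circle_radius(screen_area=640 * 480, background_area=640 * 480 - 100 * math.pi)
```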
3.4 Greedy Algorithm
Our system targets the problem of estimating all closed area correspondences. This problem can be decomposed into sub-problems, each of which estimates the correspondences of the closed areas near a seed pair, that is, a pair selected from all combinations of closed areas. Therefore, we solve the problem with a greedy algorithm that seeks locally optimal solutions.
In our algorithm, the scores of the closed areas near the seed pair are calculated and updated. The first seed pair is the starting point of the greedy algorithm. Then, the pair with the highest score becomes the new seed pair, and the scores are updated iteratively. We repeat this calculation until the scores of all closed areas have been updated at least once. Finally, we swap the first and second keyframes and update the scores again to improve accuracy.
To determine the seed pair, we calculate a seed score that quantifies the similarity between a closed area of keyframe A and a closed area of keyframe B. The seed score of a pair of closed areas is calculated as follows (Equation 1).
where the first term denotes the value calculated by matching feature points between the two closed areas using AKAZE, and the second term denotes the area ratio of the two closed areas, which is defined piecewise over two cases.
The seed score is calculated for all combinations of closed areas in keyframes A and B, and the combination with the highest seed score becomes the first seed pair. When the seed is determined, the seed scores with respect to the closed areas of the other keyframe are also stored as tentative scores in the information of each closed area.
To estimate the correspondences near the seed pair, the scores of the closed areas around it are updated using their positional relationships and depth contexts relative to the seed pair.
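A minimal sketch of the first-seed search, assuming closed areas are summarized by their pixel areas and that the AKAZE matching value is supplied by the caller (OpenCV exposes AKAZE through `cv2.AKAZE_create`, but that step is left out here). Equation (1) is not reproduced in this text, so both the smaller-over-larger area ratio and the product used to combine the two terms are assumptions for illustration only; the function names and data are hypothetical.

```python
def area_ratio(area_a, area_b):
    """Assumed form: smaller area over larger area, so identical sizes give 1.0."""
    return min(area_a, area_b) / max(area_a, area_b)


def first_seed_pair(areas_a, areas_b, match_score):
    """Score every combination of closed areas and return the best pair.

    areas_a / areas_b: dict mapping closed-area label -> pixel area (keyframes A, B).
    match_score: callable (i, j) -> feature-matching value, e.g. derived from AKAZE.
    Returns (best_pair, scores); scores can be kept as the tentative scores.
    """
    scores = {}
    for i, area_i in areas_a.items():
        for j, area_j in areas_b.items():
            # Hypothetical combination of the two terms: a simple product.
            scores[(i, j)] = match_score(i, j) * area_ratio(area_i, area_j)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: made-up areas and matching values.
areas_a = {0: 100, 1: 40}
areas_b = {0: 90, 1: 50}
matches = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.8}
best, scores = first_seed_pair(areas_a, areas_b, lambda i, j: matches[(i, j)])
```

Here the pair (0, 0) wins with 0.9 × 0.9 = 0.81, and the full `scores` table is what the later steps would reuse as tentative scores.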
3.5 Closed Area Association
Consider the set of closed areas near the seed's closed area in keyframe A and the set of closed areas near the seed's closed area in keyframe B. For each pair of closed areas taken from these two sets, we calculate a relationship score as follows (Equation 2).
where the two weights are constants. The angle term is computed from the absolute difference between the angle from the seed's closed area to the candidate closed area in keyframe A and the corresponding angle in keyframe B. The step function compares the depth context of the seed and the candidate in keyframe A with that of the corresponding closed areas in keyframe B, and outputs 1 if they are similar and 0 if they differ. The score of each closed area is then calculated as follows (Equation 3); the candidate with the highest score becomes the counterpart of the closed area.
where the feature-matching and area-ratio terms are the same as in the seed score of Equation (1), and the relationship term is the score of Equation (2) for the corresponding pair of closed areas. The remaining term is the tentative score obtained when the seed score was calculated.
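The relationship score of Equation (2) combines an angle term and a depth-context step function weighted by two constants. The sketch below shows one hypothetical instance, not the paper's formula: the angle difference is wrapped into [0, π] and mapped to [0, 1] by 1 − Δθ/π, the step function compares depth-context labels such as "front" and "back", and both weights default to 1.0. None of these concrete choices is given in the text.

```python
import math


def angle_difference(theta_a, theta_b):
    """Absolute angular difference in radians, wrapped into [0, pi]."""
    d = abs(theta_a - theta_b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def relation_score(theta_a, theta_b, depth_a, depth_b, alpha=1.0, beta=1.0):
    """Hypothetical relationship score for a candidate pair near the seed pair.

    theta_a / theta_b: angle from the seed's closed area to the candidate, in A and B.
    depth_a / depth_b: depth-context label of the candidate relative to the seed
        (e.g. "front" or "back"); the step term is 1 when they agree, else 0.
    alpha, beta: constant weights (the values here are assumptions).
    """
    angle_term = 1.0 - angle_difference(theta_a, theta_b) / math.pi
    step_term = 1.0 if depth_a == depth_b else 0.0
    return alpha * angle_term + beta * step_term
```

With identical angles and matching depth contexts the score reaches its maximum of alpha + beta; opposite directions with conflicting depth contexts give 0.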
At this step, the pair of closed areas with the highest score is determined as the new seed pair, the correspondences around this seed pair are estimated, and all closed area scores are updated. This loop is performed as many times as there are closed areas in the keyframes, and the pair with the highest score then becomes the first seed pair of a second pass in which keyframes A and B are swapped and the scores are updated again. To prevent an infinite loop, a pair that has already served as a seed pair is not selected again if an alternative seed pair is available.
Figure 5 illustrates how the greedy algorithm estimates the closed area correspondences.
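The seed-propagation loop of Sections 3.4 and 3.5 can be sketched as follows. This is a simplified reading, not the authors' implementation: scores start from the tentative seed scores, each seed pair adds a caller-supplied relationship term to the pairs formed by its neighbors, previously used seeds are not reselected, and the loop runs once per closed area of keyframe A. The second pass with keyframes A and B swapped would reuse the same function on transposed inputs and is omitted, and the final read-out takes the best partner per closed area without enforcing a one-to-one matching. All names and data are hypothetical.

```python
def greedy_correspondence(seed_scores, neighbors_a, neighbors_b, pair_score):
    """Greedy seed-propagation sketch for closed area correspondence.

    seed_scores: dict (i, j) -> seed score for every combination (tentative scores).
    neighbors_a / neighbors_b: dict label -> list of adjacent labels in A / B.
    pair_score: callable (seed, candidate) -> relationship term for the candidate pair.
    Returns dict i -> j with the best-scoring partner of each closed area of A.
    """
    scores = dict(seed_scores)  # start from the tentative seed scores
    used_seeds = set()
    seed = max(scores, key=scores.get)  # first seed pair
    n_areas = len({i for i, _ in seed_scores})
    for _ in range(n_areas):
        used_seeds.add(seed)
        si, sj = seed
        # Update the pairs formed by the neighbors of the current seed pair.
        for ni in neighbors_a.get(si, []):
            for nj in neighbors_b.get(sj, []):
                cand = (ni, nj)
                updated = seed_scores.get(cand, 0.0) + pair_score(seed, cand)
                scores[cand] = max(scores.get(cand, 0.0), updated)
        # Next seed: highest-scoring pair that has not been a seed yet.
        remaining = [p for p in scores if p not in used_seeds]
        if not remaining:
            break
        seed = max(remaining, key=scores.get)
    # Read out the best partner of each closed area of A.
    result = {}
    for (i, j), s in scores.items():
        if i not in result or s > scores[(i, result[i])]:
            result[i] = j
    return result


seed_scores = {
    (0, 0): 0.9, (0, 1): 0.1, (0, 2): 0.1,
    (1, 0): 0.1, (1, 1): 0.3, (1, 2): 0.4,
    (2, 0): 0.1, (2, 1): 0.35, (2, 2): 0.3,
}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
bonus = {(1, 1): 0.5, (2, 2): 0.5}  # hypothetical relationship terms
result = greedy_correspondence(seed_scores, neighbors, neighbors,
                               lambda seed, cand: bonus.get(cand, 0.0))
```

In the toy data, the seed scores alone would pair area 1 with area 2 and area 2 with area 1; propagating from the confident seed pair (0, 0) with the hypothetical relationship bonus yields the identity matching instead.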
3.6 User Correction
When errors are found in the closed area correspondences estimated by the greedy algorithm, users can correct them manually with our interface. Based on the corrections, the system recalculates the scores and the correspondences. These corrections can also improve the accuracy of the stroke correspondences.
3.7 Stroke Correspondence
Boundary lines of the closed areas in keyframes are regarded as strokes. Therefore, we construct stroke correspondences from the estimated closed area correspondences. Figure 6 shows the process of stroke correspondence estimation. For example, suppose that (1) two closed areas in keyframe A are separated by one stroke (green), (2) two closed areas in keyframe B are separated by one stroke (green), and (3) the two closed areas in keyframe A correspond to the two closed areas in keyframe B. We can then easily determine that the stroke in keyframe A corresponds to the stroke in keyframe B. In this way, we find correspondences for strokes that separate two or more closed areas.
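The rule above can be sketched by indexing each stroke by the unordered pair of closed areas on its two sides. This sketch assumes that every stroke separates exactly two closed areas and that at most one stroke lies between any given pair; strokes whose adjacent areas have no counterpart are left unmatched, which corresponds to the "no correspondence" case. The data layout and names are hypothetical.

```python
def stroke_correspondence(strokes_a, strokes_b, area_match):
    """Match strokes through the closed areas they separate.

    strokes_a / strokes_b: dict stroke id -> pair of closed-area labels on either side.
    area_match: dict closed-area label in A -> corresponding label in B.
    Returns dict stroke id in A -> stroke id in B; unmatched strokes are omitted.
    """
    # Index strokes of B by the unordered pair of closed areas they separate.
    index_b = {frozenset(sides): sid for sid, sides in strokes_b.items()}
    result = {}
    for sid, (left, right) in strokes_a.items():
        if left in area_match and right in area_match:
            key = frozenset((area_match[left], area_match[right]))
            if key in index_b:
                result[sid] = index_b[key]
    return result


strokes_a = {"s1": (0, 1), "s2": (1, 2)}
strokes_b = {"t1": (6, 5), "t2": (6, 7)}  # side order need not agree with A
pairs = stroke_correspondence(strokes_a, strokes_b, {0: 5, 1: 6, 2: 7})
```

Using a `frozenset` key makes the lookup independent of which side of the stroke each closed area lies on.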
This method of determining stroke correspondences between two consecutive keyframes can avoid the need to redraw the keyframes with vector lines, which previous work requires. In addition, matching accuracy can be improved by considering not only positional and connection relationships but also depth relationships when determining closed area correspondences.
4 User Study
We implemented the proposed approach on a desktop computer with an Intel Core i7-10700 CPU (2.90 GHz) and a GeForce RTX 2070 SUPER GPU. The purpose of the user study is to verify the feasibility of estimating stroke correspondences using closed area labeling and to determine whether the burden on the user is reduced compared with an existing method. As the existing method, we used Cacani to construct stroke correspondences. We investigated the effectiveness of the proposed system through questionnaires and, to determine its feasibility, measured the time required to carry out the work.
4.1 Comparison Study
Our main objective is not to construct the correspondence fully automatically but rather to provide a good starting point for manual editing. Hence, we incorporate a manual editing step to modify the closed area correspondences in the keyframes. This process is similar to Cacani's procedure: (1) initializing the correspondence using an inbetweening generator and (2) modifying it manually. Therefore, this experiment adopted two evaluation criteria: the accuracy of the estimated closed area correspondences (without user correction) and the number of user corrections required until the closed areas completely matched.
4.1.1 Experimental Method
To verify the estimation quality, we estimated closed area correspondences with the following three methods:
: shape, connection and depth relationship.
: shape and connection relationship.
: shape only.
We measured the matching accuracy of closed area correspondences and stroke correspondences between the two consecutive keyframes (without user correction). The accuracy of the closed area correspondence between the input keyframes A and B is calculated by the following equation.
where the three quantities are the number of closed areas in keyframe A, the number of closed areas in keyframe B, and the number of closed areas whose pairs changed between the correspondence before user correction and the complete correspondence after user correction. We also measured the number of manual corrections required until the closed areas completely matched.
Similarly, the accuracy of stroke correspondences is calculated by the following equation.
where the three quantities are the number of strokes in keyframe A as split by the author, the number of strokes in keyframe B as split by the author, and the number of stroke pairs that differ between the result obtained with Cacani and that of the proposed method. Note that a stroke for which the proposed method finds no pair is counted as a differing pair.
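The two accuracy equations are not reproduced in this text. One natural reading, assumed here for illustration only, subtracts the number of differing pairs from the total number of elements in both keyframes and normalizes by that total; under that assumption a single hypothetical helper serves both the closed area accuracy and the stroke accuracy.

```python
def correspondence_accuracy(n_a, n_b, n_diff):
    """Assumed accuracy in percent: (n_a + n_b - n_diff) / (n_a + n_b) * 100.

    n_a, n_b: number of closed areas (or strokes) in keyframes A and B.
    n_diff: number of elements whose pairing differs from the reference.
    """
    total = n_a + n_b
    if total == 0:
        raise ValueError("keyframes contain no elements")
    return (total - n_diff) / total * 100.0
```

For instance, 10 elements in each keyframe with 5 differing pairs would give 75% under this reading.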
In addition, we imported the estimated stroke correspondences of the two consecutive keyframes (after the user correction step) into Cacani to examine (1) the visual quality of the inbetweening frames and (2) the feasibility of stroke correspondence estimation using our method.
4.1.2 Case Study
We used four types of animation cuts from online video resources:
cut with little movement.
cut with large movement.
cut with little movement and change in the front-back relationship.
cut with large movement and change in the depth context relationship.
4.2 System Evaluation
We conducted a system evaluation with nine graduate students (six males and three females) aged 20–40 years in the field of information systems in Japan, including experienced animators. We compared the burden felt by users when determining the corresponding closed areas and estimating the corresponding strokes with the proposed method versus determining the corresponding strokes with Cacani.
4.2.1 Experimental Method
The participants were asked to determine the corresponding closed areas and the corresponding strokes between two keyframes using the proposed method and Cacani. Figure 7 shows the keyframes used in the experiment.
First, we spent 10 minutes explaining how to operate Cacani and then asked the participants to practice on simple figures (Figures 7a and 7b). We limited the practice to 20 minutes and used it to determine whether the participants fully understood Cacani. Next, we asked the participants to construct stroke correspondences for complex characters (Figure 7c). All participants used the same work environment.
At the end of the construction process, we examined the feasibility of the proposed method through questionnaires. In addition to their perceived burden, we asked whether the participants were satisfied with the results and whether the operation was easy to understand.
5.1 Objective Evaluation
Figure 8 shows the results estimated by the three methods, together with the user-corrected closed area correspondences, the stroke correspondences estimated from the closed area correspondences, and the inbetweening sequences generated automatically from the estimated stroke correspondences. Note that the colors of the closed areas and strokes are assigned randomly, but corresponding closed areas and strokes share the same color. Tables 1 and 2 show the matching accuracy of the closed area correspondences and the average number of user corrections.
The shape-only method has lower estimation accuracy than the other two methods, which indicates that connection relationships are needed for better estimation. Comparing the methods with and without depth relationship estimation, we find that the effect of depth estimation depends on the input scene. If the input keyframes have similar depth relationships, as in the cut shown in Figure 9d, we obtain plausible correspondences and reduce the number of user corrections. In contrast, when the depth relationships are difficult to determine (Figures 9a–9c), the method with depth estimation has lower accuracy than the method without it.
The average accuracy of stroke correspondence estimation with our method is 59.513% (69.595%, 58.670%, 41.304%, and 68.481% for the four cuts). A possible reason for this insufficient accuracy (below 80%) is that the actual relationships between closed areas change between the input keyframes and strokes may disappear due to occlusion, so several strokes are identified as having “no correspondence,” as shown in Figure 10.
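As a quick check, the reported average is the arithmetic mean of the four per-cut accuracies:

$$
\frac{69.595 + 58.670 + 41.304 + 68.481}{4} = \frac{238.050}{4} = 59.5125 \approx 59.513\,\%
$$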
In addition, the proposed system has difficulty automatically generating plausible inbetweening frames (Figure 8) because the accuracy of stroke correspondence estimation is insufficient. However, the current quality is sufficient to automatically generate tight inbetweenings for some cuts, as well as for the upper or lower parts of the characters in others.
5.2 User Experience
Figure 11 shows the post-experiment questionnaire results. Participants who were more accustomed to drawing illustrations felt more comfortable designing cartoon images with Cacani, but all participants answered that the proposed method was easier than working with Cacani. Regarding ease of understanding, they also answered that the proposed method was equivalent to or easier to understand than Cacani. The average operating time was 7 min 17 s for Cacani and 1 min 6 s for the proposed method. We therefore conclude that the proposed method imposed less burden on the participants than Cacani.
In this paper, we proposed a novel method to estimate stroke correspondences between two raster images with labeled closed areas. The proposed method allows users to manually edit the estimated results, thus improving the accuracy of the correspondences. We verified the feasibility of the proposed method using four types of sequences. In addition, the comparison study and the questionnaire suggest that the burden on the users can be reduced when using the proposed method compared to the conventional tools when determining stroke correspondence. Therefore, the proposed method of labeling closed areas is effective in estimating the stroke correspondence, especially in cases where the corresponding closed areas do not change significantly. The participants reported that estimating stroke correspondences was easier and quicker with the proposed approach.
In our prototype, we found that the accuracy of stroke correspondence estimation drops when the relationships of closed areas change significantly between the keyframes; for example, a new closed area may appear due to large movements of characters. If a stroke exists independently without bounding a closed area, such as a wrinkle on clothing, the proposed approach fails to calculate its correspondence. To solve these issues and increase estimation accuracy, we plan to split strokes at the correct positions and to improve the estimation of depth relationships using deep-learning-based 3D human shape and pose estimation. In addition, AKAZE may be unsuitable for extracting feature points from keyframes, which contain less information, such as color differences and shading, than photographs; the feature extraction method could therefore be improved in future work.
To improve the usability of the proposed approach, we would like to develop user interfaces not only for user correction but also for user guidance during sketch input. The depth estimation currently has low accuracy because the required junctions are extracted inaccurately. One possible solution is to detect closed areas occluded by other closed areas to improve the estimation of hidden strokes. Finally, we plan to consider inbetweening generation for freehand-drawn keyframes.
This work was supported by a grant from the Tateishi Science and Technology Foundation and by JSPS KAKENHI grants JP20K19845 and JP19K20316, Japan.
- (2013) Fast explicit diffusion for accelerated features in nonlinear scale spaces. In Proceedings of the British Machine Vision Conference, pp. 13:1–13:11.
- (2006) Latent doodle space. Computer Graphics Forum 25 (3), pp. 477–485.
- (2009) Compatible embedding for 2d shape animation. IEEE Transactions on Visualization and Computer Graphics (TVCG) 15 (5), pp. 867–879.
- (1975) Computer animation of free form images. In Proceedings of the 2nd Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, pp. 78–80.
- (2021) CACANi: 2d animation & inbetween software. https://cacani.sg/
- (2015) Vector graphics animation with time-varying topology. ACM Transactions on Graphics (TOG) 34 (4), pp. 145:1–145:12.
- (2006) Re-using traditional animation: methods for semi-automatic segmentation and inbetweening. In Proceedings of the 2006 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA ’06), Goslar, DEU, pp. 223–232.
- (2001) Automatic in-betweening in computer assisted animation by exploiting 2.5 d modelling techniques. In Proceedings Fourteenth Conference on Computer Animation (Cat. No. 01TH8596), Seoul, South Korea, pp. 192–200.
- (2018) Toonsynth: example-based synthesis of hand-colored cartoon animations. ACM Transactions on Graphics (TOG) 37 (4), pp. 167:1–167:11.
- (2017) Hand-drawn animation with self-shaped canvas. In ACM SIGGRAPH 2017 Posters, New York, NY, USA, pp. 5:1–5:2.
- (2019) Matching of interkeyframe strokes considering pairwise-constraints using self-shaped canvas. In Proceedings of the 81st National Convention of IPSJ, Vol. 2019, pp. 165–166.
- (2021) View-dependent formulation of 2.5d cartoon models. arXiv.
- (2016) Active comicing for freehand drawing animation. In Mathematical Progress in Expressive Image Synthesis III, Singapore, pp. 45–56.
- (2014) Quasi 3d rotation for hand-drawn characters. In ACM SIGGRAPH 2014 Posters, New York, NY, USA, pp. 12:1–12:1.
- (2021) DualFace: two-stage drawing guidance for freehand portrait sketching. CoRR abs/2104.12297.
- (2019) Graph matching based anime colorization with multiple references. In ACM SIGGRAPH 2019 Posters, New York, NY, USA, pp. 13:1–13:2.
- (1981) Inbetweening for computer animation utilizing moving point constraints. In Proceedings of the 8th Annual Conference on Computer Graphics and Interactive Techniques, Vol. 15, New York, NY, USA, pp. 263–269.
- (2014) Reference-based manga colorization by graph correspondence using quadratic programming. In SIGGRAPH Asia 2014 Technical Briefs, New York, NY, USA, pp. 15:1–15:4.
- (1993) 2-d shape blending: an intrinsic solution to the vertex path problem. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, Vol. 93, New York, NY, USA, pp. 15–18.
- (1992) A physically based approach to 2-d shape blending. ACM SIGGRAPH Computer Graphics 26 (2), pp. 25–34.
- (2010) Betweenit: an interactive tool for tight inbetweening. Computer Graphics Forum 29 (2), pp. 605–614.
- (2019) Visual feedback for core training with 3d human shape and pose. In 2019 Nicograph International (NicoInt), pp. 49–56.
- (2015) Autocomplete hand-drawn animations. ACM Transactions on Graphics (TOG) 34 (6).
- (2015) Region-based painting style transfer. In SIGGRAPH Asia 2015 Technical Briefs, New York, NY, USA, pp. 8:1–8:4.
- (2018) FTP-sc: fuzzy topology preserving stroke correspondence. Computer Graphics Forum 37 (8), pp. 125–135.
- (2017) Context-aware computer aided inbetweening. IEEE Transactions on Visualization and Computer Graphics (TVCG) 24 (2), pp. 1049–1062.