. To achieve these, a large volume of data is required to obtain satisfactory performance. However, deep learning models trained with traditional supervised learning often perform poorly, or even fail, when only a small amount of data is available or when they must adapt to unseen or time-varying tasks. In practical signal recognition tasks, collecting and annotating abundant data is notoriously expensive, especially for rare but important signals. Another critical challenge is the presence of noise: signal data varies across signal-to-noise ratios (SNRs), and in real-world scenarios deep neural networks (DNNs) have to adapt to real-time variations in SNR.
Meta-learning techniques Finn et al. (2017, 2018); Yoon et al. (2018); Zhang et al. (2018); Balaji et al. (2018) seek to resolve the above challenges by learning how to learn, as humans do. Humans can effectively utilize prior knowledge and experience to learn new skills rapidly from very few examples. Similarly, a meta-learner is trained on a distribution of homogeneous tasks, with the goal of learning internal features that are broadly applicable to all tasks rather than to a single individual task. Equipped with these sensitive internal features, the meta-learner can produce significant improvements in adaptation ability via fine-tuning. Recently, meta-learning has demonstrated promising performance in many fields Liu et al. (2019); Khodadadeh et al. (2019); Xie et al. (2019); Alet et al. (2019); Jerfel et al. (2019); Khodak et al. (2019); Zhou et al. (2019); Rajeswaran et al. (2019); Zhuang et al. (2020); Kong et al. (2020); Denevi et al. (2020); Chen et al. (2020); Yao et al. (2020); Baik et al. (2020); Sitzmann et al. (2020); Harrison et al. (2020); Ji et al. (2020); Boutilier et al. (2020); Confavreux et al. (2020); Goldblum et al. (2020). Please see Section G of the supplementary material for detailed related work. However, in some particular fields, especially signal recognition, existing meta-learning methods generally neglect the prior knowledge of the signals, i.e., temporal information and complex-domain information. For models with insufficient training data, it is crucial to incorporate this prior knowledge.
for signal recognition, respectively. Attention mechanisms have been widely adopted in many time series learning tasks, such as natural language processing. They were an integral component of recurrent neural networks (RNNs), long short-term memory Hochreiter and Schmidhuber (1997), and gated recurrent Chung et al. (2014) neural networks until the Transformer Vaswani et al. (2017) was proposed; since then, self-attention has been able to replace RNNs with better performance and parallel computation. Therefore, we adapt the attention mechanism to the signal recognition task. Since signals contain both magnitude and phase, complex numbers are used for the representation of signals, and complex arithmetic operations are consequently an essential part of signal processing. Intuitively, complex-valued neural networks should be built to address the signal recognition problem. However, to the best of our knowledge, meta-learning equipped with attention mechanisms in complex-valued neural networks has not been investigated.
In this paper, we propose a Complex-valued Attentional MEta Learner (CAMEL) for few-shot signal recognition, which generalizes meta-learning and attention to the complex domain. With the help of these novel designs, CAMEL succeeds in capturing more information from the signal data, and this prior knowledge assists CAMEL in preventing overfitting and improving its performance. For a better understanding of the proposed architecture, an overview of CAMEL is illustrated in Figure 1. Notice that CAMEL can be applied to any kind of complex-valued data. In extensive experiments against existing meta-learning and few-shot learning methods, the proposed method shows consistently better performance than the state of the art. The effectiveness of each novel component of CAMEL is verified via ablation studies. From the convergence analysis of complex-valued MAML, it is shown that CAMEL is able to find an $\epsilon$-first-order stationary point for any positive $\epsilon$ after at most $\mathcal{O}(1/\epsilon^{2})$ iterations with second-order information.
The code of this paper will be released upon acceptance. Please see the supplementary material for notations, detailed derivation of Lemma, and more experiment results.
Meta-learning is one of the techniques best suited to signal recognition problems because, in the real world, signal annotation is expensive and models need to adapt to changing SNRs, whereas meta-learning has the explicit goal of fast adaptation. To further improve the effectiveness of meta-learning in signal processing applications, we incorporate prior knowledge of signal data into the model through CAMEL, which generalizes meta-learning with attention to the complex domain, so that we can extract complex-domain and temporal information from signal data.
However, many so-called complex-valued neural networks treat a complex number as two real numbers, i.e., the real and imaginary parts of the complex number, and design special network structures to recover complex operations from these real numbers. We refer to these special complex-valued neural networks as in-phase/quadrature complex-valued neural networks (IQCVNNs). Although IQCVNNs can deal with complex-valued problems, the networks are essentially still working with real values, since IQCVNNs operate without defining complex derivatives or the complex chain rule in back-propagation. We refer to complex-valued neural networks that define complex derivatives and the complex chain rule as complex-derivative complex-valued neural networks (CDCVNNs). It turns out that, compared with IQCVNNs, CDCVNNs can perform complex operations at lower cost. To be more specific, we give the following lemmas to show the advantage of CDCVNNs over IQCVNNs with respect to time complexity.
Lemma 1. If a function $f$ is complex analytic, the time complexity of the derivative of $f$ in IQCVNNs is twice that of the complex derivative of $f$ in CDCVNNs.
Lemma 2. The complex-valued convolutional layer and the complex-valued fully connected layer are complex analytic.
As we know, the convolutional and fully connected layers are the most computationally intensive parts of a neural network. Therefore, although IQCVNNs have a similar effect to CDCVNNs, they far exceed CDCVNNs in the time complexity of back-propagation. In particular, meta-learning requires second-order information of the objective function to guarantee convergence Fallah et al. (2020), which motivates us to implement CDCVNNs. The complex chain rule is the key to implementing CDCVNNs; using it, we derive the outer-loop update process of CAMEL, which differs from that of MAML.
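To make the IQCVNN/CDCVNN distinction concrete, the following NumPy sketch (with illustrative shapes and names, not the paper's implementation) computes the same complex-valued fully connected layer both natively in the complex domain and via the in-phase/quadrature real-pair recovery:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy complex-valued fully connected layer y = W z, computed two ways.
n_in, n_out = 4, 3
W = rng.standard_normal((n_out, n_in)) + 1j * rng.standard_normal((n_out, n_in))
z = rng.standard_normal(n_in) + 1j * rng.standard_normal(n_in)

# CDCVNN-style: work directly in the complex domain.
y_complex = W @ z

# IQCVNN-style: represent the input as an (in-phase, quadrature) pair of real
# vectors and recover the complex product with real matrix products:
# (A + iB)(x + iy) = (Ax - By) + i(Bx + Ay), i.e. four real products.
A, B = W.real, W.imag
x, y = z.real, z.imag
y_iq = np.concatenate([A @ x - B @ y, B @ x + A @ y])

assert np.allclose(y_complex.real, y_iq[:n_out])
assert np.allclose(y_complex.imag, y_iq[n_out:])
```

Both paths produce the same layer output; the difference the lemmas quantify lies in the cost of computing derivatives through such layers.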
Complex-valued attention is also necessary for CAMEL to obtain temporal information from signal data. However, complex-valued attention requires computing the derivative of a mapping from the complex to the real domain, since the similarity coefficients between pairs must enter the activation (softmax) function of the complex-valued network as real numbers. From the following lemma we know that the derivative of such a function must be treated as non-analytic, because the only analytic alternative, a constant function, is useless for identifying features of the data.
Lemma 3. A function $f:\mathbb{C}^{n} \to \mathbb{R}$ is analytic if and only if $f$ is a constant function.
To the best of our knowledge, attention in the complex domain has rarely been studied. A closely related work is Yang et al. (2020), which proposed a complex transformer and developed attention and an encoder-decoder network operating on complex input; however, they utilized eight attentions to represent complex-valued attention without considering the nonlinear components of attention, such as the softmax and activation functions. Therefore, we here study complex-valued attention and propose CAMEL, as presented in the next section.
Please see the supplementary material for the definitions of complex derivative, analytic function, and the Cauchy-Riemann equations in Section D.
3.1 Algorithm Design
CAMEL utilizes complex-valued neural networks and attention to provide prior knowledge, i.e., complex-domain and temporal information, to prevent overfitting during training. It resembles its namesake animal, the camel, which stores water and nutrients in its hump to ensure its survival in extreme conditions.
CAMEL updates its parameters through back-propagation by the chain rule. However, the traditional chain rule does not apply, because CAMEL is non-analytic.
The chain rule for complex variables The chain rule is different when the function is non-analytic. For a non-analytic composite function $h(\mathbf{z}) = f(g(\mathbf{z}))$, where $\mathbf{z} \in \mathbb{C}^{n}$, we can apply the following chain rule:

$$\frac{\partial h}{\partial \mathbf{z}} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial \mathbf{z}} + \frac{\partial f}{\partial g^{*}}\,\frac{\partial g^{*}}{\partial \mathbf{z}}, \tag{1}$$

where $g$ is a continuous function and $g^{*}$ denotes the conjugate vector of $g$. Note that if the function $f$ is analytic, the second term equals zero and (1) turns into the normal chain rule. In the case of matrix derivatives, the chain rule can be written as:

$$\frac{\partial h}{\partial \mathbf{Z}} = \left(\frac{\partial \mathbf{G}}{\partial \mathbf{Z}}\right)^{T}\frac{\partial h}{\partial \mathbf{G}} + \left(\frac{\partial \mathbf{G}^{*}}{\partial \mathbf{Z}}\right)^{T}\frac{\partial h}{\partial \mathbf{G}^{*}}, \tag{2}$$

where $h = f(\mathbf{G}(\mathbf{Z}))$ is non-analytic, $\mathbf{Z}$ and $\mathbf{G}$ are two complex matrices (derivatives taken with respect to their vectorizations), and $(\cdot)^{T}$ denotes the transpose of a matrix.
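The Wirtinger-style chain rule for non-analytic composites can be checked numerically. The following sketch (our illustration, not the paper's code) takes $g(z) = z^{2}$ (analytic) composed with the non-analytic $f(w) = |w|^{2} = w\,w^{*}$, for which the chain rule gives $\partial h/\partial z = g^{*}(z)\cdot 2z = 2\,\bar{z}^{2} z$:

```python
import numpy as np

# Numeric check of the complex chain rule on h(z) = f(g(z)) with
# g(z) = z**2 (analytic) and f(w) = |w|**2 = w * conj(w) (non-analytic).
# Chain rule: dh/dz = (df/dg)(dg/dz) + (df/dg*)(dg*/dz)
#                   = conj(g(z)) * 2z + g(z) * 0 = 2 * conj(z)**2 * z.
z0 = 0.7 - 0.3j
eps = 1e-6

def h(z):
    w = z**2
    return (w * np.conj(w)).real  # real-valued output

# Wirtinger derivative dh/dz = (dh/dx - i dh/dy) / 2, by central differences.
dh_dx = (h(z0 + eps) - h(z0 - eps)) / (2 * eps)
dh_dy = (h(z0 + 1j * eps) - h(z0 - 1j * eps)) / (2 * eps)
dh_dz_numeric = (dh_dx - 1j * dh_dy) / 2

dh_dz_chain = 2 * np.conj(z0)**2 * z0
assert abs(dh_dz_numeric - dh_dz_chain) < 1e-6
```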
Under (1) and (2), CAMEL is able to update its parameters as expected. Formally, we define the base model of CAMEL to be a complex-valued attentional neural network with meta-parameters $\boldsymbol{\theta}$. The goal is to learn a sensitive initialization $\boldsymbol{\theta}$ for which the network performs well on the $i$th query set after a few gradient update steps on the $i$th support set, which yield the adapted parameters $\boldsymbol{\theta}_{i}'$. Here, $\mathcal{T}_{i}$ is a task randomly sampled from the task probability distribution $p(\mathcal{T})$. The update steps above are termed the inner-loop update process, which (for a single step) can be represented as:

$$\boldsymbol{\theta}_{i}' = \boldsymbol{\theta} - \alpha \nabla_{\boldsymbol{\theta}} \mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{s}}(\boldsymbol{\theta}),$$

where $\alpha$ is a learning rate and $\nabla_{\boldsymbol{\theta}} \mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{s}}(\boldsymbol{\theta})$ denotes the gradient of the loss on the support set of task $\mathcal{T}_{i}$. The meta-parameters $\boldsymbol{\theta}$ are trained by optimizing the performance of $\boldsymbol{\theta}_{i}'$. Consequently, the meta-objective is defined as follows:

$$\min_{\boldsymbol{\theta}}\; \mathbb{E}_{\mathcal{T}_{i} \sim p(\mathcal{T})}\Big[\mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{q}}(\boldsymbol{\theta}_{i}')\Big], \tag{4}$$
where $\mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{q}}(\boldsymbol{\theta}_{i}')$ denotes the loss on the query set of task $\mathcal{T}_{i}$ after the inner-loop update process. As the underlying distribution $p(\mathcal{T})$ is unknown, evaluating the expectation on the right-hand side of (4) is often computationally prohibitive. Therefore, we can minimize the objective over a batch of $B$ tasks independently drawn from $p(\mathcal{T})$, which can be expressed as:

$$\min_{\boldsymbol{\theta}}\; \frac{1}{B}\sum_{i=1}^{B} \mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{q}}(\boldsymbol{\theta}_{i}').$$

The optimization of the meta-objective is referred to as the outer-loop update process, which can be expressed as:

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \beta\, \nabla_{\boldsymbol{\theta}} \frac{1}{B}\sum_{i=1}^{B} \mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{q}}(\boldsymbol{\theta}_{i}'),$$

where $\beta$ denotes the meta learning rate. Define $F(\boldsymbol{\theta}) = \frac{1}{B}\sum_{i=1}^{B} \mathcal{L}_{\mathcal{T}_{i}}^{\mathrm{q}}(\boldsymbol{\theta}_{i}')$. In response to complex meta-parameters $\boldsymbol{\theta}$, we have

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \beta\, \nabla_{\boldsymbol{\theta}^{*}} F(\boldsymbol{\theta}),$$

where the conjugate gradient $\nabla_{\boldsymbol{\theta}^{*}} F$ is computed with the complex chain rule (1).
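To make the inner/outer-loop update structure concrete, here is a minimal NumPy sketch of MAML-style updates with complex parameters on synthetic complex least-squares tasks. The task family, shapes, and learning rates are illustrative assumptions, and the outer update uses the first-order approximation (the second-order term is dropped for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n = 0.01, 0.01, 3

# Shared task structure: each task is a noisy complex least-squares problem
# around a common w_true (an illustrative stand-in for the distribution p(T)).
w_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def make_task():
    X = rng.standard_normal((8, n)) + 1j * rng.standard_normal((8, n))
    noise = 0.01 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
    return X, X @ w_true + noise

def grad(w, X, y):
    # Conjugate (Wirtinger) gradient of the real loss ||X w - y||^2.
    return X.conj().T @ (X @ w - y)

def loss(w, X, y):
    r = X @ w - y
    return (r.conj() @ r).real

w = np.zeros(n, dtype=complex)             # meta-parameters
for _ in range(300):
    X, y = make_task()
    w_task = w - alpha * grad(w, X, y)     # inner-loop adaptation step
    w = w - beta * grad(w_task, X, y)      # first-order outer update

# The meta-learned initialization beats a cold start on a fresh task.
X_new, y_new = make_task()
assert loss(w, X_new, y_new) < loss(np.zeros(n, dtype=complex), X_new, y_new)
```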
3.2 Complex-valued Attention
Attention mechanisms are widely used in many areas of deep learning, but attention for the complex domain has rarely been addressed. A significant reason is that attention has to use the softmax function to calculate the similarity coefficients, which must be real rather than complex. According to Lemma 3, any function mapping complex inputs to real outputs is either a constant function or non-analytic. Constant functions are useless and discardable in neural networks, while non-analytic functions cannot be differentiated at arbitrary points in the complex domain. As a result, we have to utilize the complex gradient vector.
Complex gradient vector If $f$ is a real-valued function of a complex vector $\mathbf{z}$, then the complex gradient vector is given by Hjørungnes (2011):

$$\nabla_{\mathbf{z}} f(\mathbf{z}) = \frac{\partial f(\mathbf{z})}{\partial \mathbf{z}^{*}}. \tag{10}$$
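As a minimal illustration (our sketch, with an assumed step size), gradient descent on the real function $f(\mathbf{z}) = \mathbf{z}^{H}\mathbf{z}$ steps along the negative conjugate Wirtinger gradient, here $\partial f/\partial \mathbf{z}^{*} = \mathbf{z}$:

```python
import numpy as np

# Minimize the real function f(z) = z^H z of a complex vector by stepping
# along the negative conjugate (Wirtinger) gradient df/d(conj z) = z.
z = np.array([1.0 + 2.0j, -0.5 + 0.3j])
for _ in range(100):
    g = z                 # df/d conj(z) for f(z) = z^H z
    z = z - 0.1 * g       # steepest-descent step in the complex domain

assert np.linalg.norm(z) < 1e-3   # converges to the minimizer z = 0
```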
Complex-valued softmax function Under (10), we are able to define the generalized complex-valued softmax function as:

$$\mathrm{softmax}_{\mathbb{C}}(\mathbf{z}) = \mathrm{softmax}\big(g(\mathbf{z})\big),$$

where $\mathrm{softmax}(\cdot)$ denotes the softmax function in the real case and $g:\mathbb{C}\to\mathbb{R}$ denotes any function that maps complex numbers to real numbers, such as $|\cdot|$ (i.e., the magnitude of the complex numbers), $\Re(\cdot)$, and $\Im(\cdot)$, etc.
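A minimal NumPy sketch of this definition, assuming $g = |\cdot|$ as the complex-to-real map:

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # standard real softmax
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def complex_softmax(z, g=np.abs, axis=-1):
    """Complex-valued softmax: apply the real softmax to g(z), where g maps
    complex numbers to real ones (e.g. abs, real part, imaginary part)."""
    return softmax(g(z), axis=axis)

z = np.array([1 + 1j, 2 + 0j, 0 + 3j])
p = complex_softmax(z)            # uses magnitudes |z| = [sqrt(2), 2, 3]
assert np.isclose(p.sum(), 1.0)
assert p.argmax() == 2            # largest magnitude gets the largest weight
```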
Given a complex input matrix, we can compute the complex matrices $\mathbf{Q}$ (queries), $\mathbf{K}$ (keys), and $\mathbf{V}$ (values) using linear transformations, which are similar to complex-valued fully connected layers. Then the complex-valued attention can be written as:

$$\mathrm{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \mathrm{softmax}_{\mathbb{C}}\!\left(\frac{\mathbf{Q}\mathbf{K}^{T}}{\sqrt{d_{k}}}\right)\mathbf{V},$$

where $\mathrm{softmax}_{\mathbb{C}}$ acts on each row of the matrix and $d_{k}$ denotes the row dimension of $\mathbf{K}$, i.e., the scaling factor.
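A single-head sketch of complex-valued attention in NumPy; the conjugate transpose in the scores and the choice of $g=|\cdot|$ inside the row-wise softmax are illustrative assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def complex_attention(Q, K, V):
    # Similarity scores are complex; map them to real numbers with abs()
    # before the row-wise softmax (one of several possible choices of g).
    d_k = K.shape[-1]
    scores = Q @ K.conj().T / np.sqrt(d_k)
    weights = softmax(np.abs(scores), axis=-1)   # real, each row sums to 1
    return weights @ V                           # complex-valued output

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
K = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
V = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
out = complex_attention(Q, K, V)
assert out.shape == (5, 4) and np.iscomplexobj(out)
```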
Complex-valued multi-head attention Complex-valued multi-head attention allows the model to jointly attend to information from different representations:

$$\mathrm{MultiHead}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \mathrm{Concat}(\mathrm{head}_{1},\ldots,\mathrm{head}_{h})\,\mathbf{W}^{O}, \qquad \mathrm{head}_{i} = \mathrm{Attention}(\mathbf{Q}\mathbf{W}_{i}^{Q},\, \mathbf{K}\mathbf{W}_{i}^{K},\, \mathbf{V}\mathbf{W}_{i}^{V}),$$

where $\mathbf{W}_{i}^{Q}$, $\mathbf{W}_{i}^{K}$, $\mathbf{W}_{i}^{V}$, and $\mathbf{W}^{O}$ are the complex projection matrices and $\mathrm{Concat}(\cdot)$ denotes the concatenation of input matrices.
Normalization, such as batch normalization Ioffe and Szegedy (2015) and layer normalization Ba et al. (2016), is an important component of neural networks; batch normalization in particular is commonly employed. However, for a complex vector $\mathbf{z}$, the variance that must be computed in normalization is real. According to Lemma 3, the variance is then non-analytic as a function of $\mathbf{z}$. Therefore, in the back-propagation of complex-valued normalization, we have to utilize the complex gradient vector (10). Defining $\boldsymbol{\gamma}$ as the complex scaling parameters and $\boldsymbol{\beta}$ as the complex shift parameters, the complex-valued normalization can be expressed as:

$$\hat{\mathbf{z}} = \boldsymbol{\gamma}\,\frac{\mathbf{z} - \mathbb{E}[\mathbf{z}]}{\sqrt{\mathrm{Var}[\mathbf{z}] + \epsilon}} + \boldsymbol{\beta}, \qquad \mathrm{Var}[\mathbf{z}] = \mathbb{E}\big[(\mathbf{z} - \mathbb{E}[\mathbf{z}])^{H}(\mathbf{z} - \mathbb{E}[\mathbf{z}])\big],$$

where $\mathbb{E}[\cdot]$ and $\mathrm{Var}[\cdot]$ denote the expectation and variance, respectively, and $(\cdot)^{H}$ denotes the conjugate transpose.
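A NumPy sketch of the forward pass of this normalization over a batch of complex scalars; the parameter values and epsilon are illustrative, and the Wirtinger-gradient backward pass is not shown:

```python
import numpy as np

def complex_norm(z, gamma=1 + 0j, beta=0 + 0j, eps=1e-5):
    """Sketch of complex-valued normalization. The variance
    E[(z - mu) * conj(z - mu)] is real, so back-propagation through it
    requires the complex gradient machinery (omitted here)."""
    mu = z.mean()
    var = np.mean((z - mu) * np.conj(z - mu)).real
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta

rng = np.random.default_rng(0)
z = rng.standard_normal(256) + 1j * rng.standard_normal(256)
z_hat = complex_norm(z)
assert abs(z_hat.mean()) < 1e-6                          # centered
assert abs(np.mean(np.abs(z_hat)**2) - 1.0) < 1e-3       # unit variance
```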
Complex-valued activation function The activation function is nonlinear, so it is rarely analytic. Most well-known activation functions, such as Sigmoid, Tanh, and ReLU Goodfellow et al. (2016), are not analytic in the complex domain. In particular, the complex extensions of Sigmoid and Tanh are unbounded, while for a complex ReLU the complex numbers cannot be compared with zero. To this end, the complex-valued activation function can be defined as:

$$f_{\mathbb{C}}(\mathbf{z}) = f\big(\Re(\mathbf{z})\big) + i\, f\big(\Im(\mathbf{z})\big),$$

where $f$ denotes the activation function in the real case. In this way, the complex Sigmoid and Tanh are bounded, because their real and imaginary parts are bounded; meanwhile, the complex ReLU is well defined, because the real and imaginary parts of the input can each be compared with zero. However, since the complex-valued activation functions defined above are non-analytic in most cases, the complex chain rule is required for their derivatives.
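A sketch of this split real/imaginary construction, with tanh and ReLU as example real activations:

```python
import numpy as np

def complex_activation(z, f=np.tanh):
    """Apply a real activation separately to the real and imaginary parts;
    bounded whenever f is, but non-analytic in general."""
    return f(z.real) + 1j * f(z.imag)

z = np.array([3 - 4j, -0.5 + 0.2j])

out = complex_activation(z)                    # "complex tanh", bounded
assert np.all(np.abs(out.real) <= 1) and np.all(np.abs(out.imag) <= 1)

relu = lambda x: np.maximum(x, 0)
crelu = complex_activation(z, relu)            # split-style "complex ReLU"
assert crelu[0] == 3 + 0j                      # negative imaginary part zeroed
```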
Please see the supplementary material for detailed complex-valued convolutional layer and complex-valued fully connected layer in Section F.
4 Convergence of CAMEL
In this section, we show the convergence behavior of complex-valued MAML, following the previous work Fallah et al. (2020) that proved the convergence of MAML in the real domain. To prove convergence of complex-valued MAML, we need notions such as twice continuous differentiability, smoothness, Lipschitz continuity, and the Hessian in the complex domain. Please see Section H of the supplementary material for the detailed assumptions, lemmas, and proof of Theorem 1.
For any $\epsilon > 0$, Theorem 1 states that CAMEL finds an $\epsilon$-first-order stationary point, i.e., a solution $\boldsymbol{\theta}_{\epsilon}$ satisfying

$$\mathbb{E}\big[\|\nabla F(\boldsymbol{\theta}_{\epsilon})\|\big] \le \epsilon, \tag{16}$$

after running for $\mathcal{O}(1/\epsilon^{2})$ iterations, where the constants hidden in $\mathcal{O}(\cdot)$ are defined in Assumption 1 and depend on the sizes of the support set and query set, respectively. The result in Theorem 1 demonstrates that after running CAMEL for $\mathcal{O}(1/\epsilon^{2})$ iterations, we are able to find a point at which the expected gradient norm satisfies (16).
We train the model on three datasets. RadioML 2016.10A O'Shea et al. (2016), with 220,000 total samples (20,000 samples per class and 11,000 samples per SNR), consists of input samples in 11 classes corresponding to 11 modulation types: 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM. RadioML 2016.04C O'Shea et al. (2016), a synthetic dataset, is generated with GNU Radio and consists of about 110 thousand signals; these samples are uniformly distributed in SNR from -20 dB to +20 dB and tagged so that we can evaluate performance on specific subsets. In fact, 2016.10A is a cleaner and more normalized version of the 2016.04C dataset. The third dataset is SIGNAL 2020.02 Dong et al. (2021b), whose data is modulated at a rate of 8 samples per symbol with 128 samples per frame, at 20 different SNRs taking even values in [2 dB, 40 dB].
5.1 Experimental setup
CAMEL is implemented in PyTorch Paszke et al. (2019) with Python on an RTX 3090 GPU and trained using the Adam optimizer Kingma and Ba (2014). In the classification experiments on the three datasets (RadioML 2016.04C, RadioML 2016.10A, and SIGNAL 2020.02), the default hyper-parameters are as follows: 400,000 training epochs; a meta batch size of 2; a meta-level outer learning rate of 0.001 and a task-level inner update learning rate of 0.1; 5 task-level inner update steps and 10 update steps for fine-tuning. All of our experiments use these default hyper-parameters. We vary the support-set shot number between 1 and 5 to obtain results for the 5-way 1-shot and 5-way 5-shot cases.
5.2 Our Model
First, we study the influence of adding a multi-head self-attention mechanism to the network, which can focus attention on important information. We use multi-head attention with 8 heads. Instead of performing a single attention function on full-dimensional keys, values, and queries, it is beneficial to linearly project the queries, keys, and values several times with different learned linear projections to lower dimensions, perform the attention function in parallel, concatenate the outputs, and project again to obtain the final result Dauphin and Schoenholz (2019). In our experiments, as illustrated in Table 1, performance is much better with the addition of the multi-head attention mechanism. As the batch size increases, performance improves at the cost of more computation and time; as a trade-off, we set the batch size to 64 when using multi-head attention. We observe that the model with the attention mechanism demonstrates a greater ability to increase accuracy.
|MAML Finn et al. (2017)||86.57%||94.50%||43.26%||67.77%|
|SNAIL Mishra et al. (2018)||71.18%||78.48%||35.01%||36.34%|
|Reptile Nichol et al. (2018)||69.16%||92.32%||55.01%||69.39%|
|MAML+complex+CT Yang et al. (2020)||96.40%||97.50%||58.40%||69.80%|
shows 95% confidence intervals over tasks.
|MAML Finn et al. (2017)||88.93% ± 0.13%||93.59% ± 0.62%|
|SNAIL Mishra et al. (2018)||89.21% ± 0.75%||96.90% ± 0.19%|
|Reptile Nichol et al. (2018)||87.08% ± 2.88%||92.07% ± 5.65%|
|MAML+complex+CT Yang et al. (2020)||93.58% ± 1.15%||96.52% ± 0.08%|
A further study concerns the influence of adding a complex-valued neural network, because complex numbers can have a richer representational capacity. For signal inputs, complex numbers can capture more useful details than real numbers and may also facilitate noise-robust memory retrieval mechanisms Trabelsi et al. (2018b). To construct a complex-valued neural network, we need the following building blocks: the representation of complex numbers, complex gradient vectors, complex weight initialization, complex convolutions, complex-valued activation, complex-valued normalization, and the complex-valued multi-head attention mechanism. Each block is determined by its real-valued counterpart together with the rules of complex arithmetic. From the results in Table 1 and Table 2, we find that these complex features improve classification accuracy in both the 5-way 1-shot and 5-way 5-shot cases across the different datasets.
In the training process, we set the number of convolution kernels to 128. For the multi-head attention part, we set the source and output sequence lengths to 64 and the number of heads to 8. We observe that such complex-valued models are more competitive than their real-valued counterparts. These components build our final model, CAMEL: model-agnostic meta-learning with multi-head attention and a complex-valued neural network. Compared with the other meta-learning models, CAMEL achieves the best classification accuracy.
The Complex Transformer Yang et al. (2020) implements complex attention in another way: it rewrites each complex function as two separate real functions and computes the multiplication of queries, keys, and values to obtain complex attention from 8 attention functions with different inputs. We also run SNAIL Mishra et al. (2018), which combines a causal attention operation with context produced by temporal convolutions, and Reptile Nichol et al. (2018), which uses only first-order derivatives for meta-learning updates. For comparison, Table 1 and Table 2 list the accuracies of several MAML-based models on different datasets. The results in these two tables demonstrate that our model CAMEL has the state-of-the-art performance among all of them. In particular, some models do not perform well on the SIGNAL 2020.02 task, but CAMEL still performs stably and well on this challenging task. Figure 3 indicates that CAMEL attains the highest accuracy at a relatively fast convergence speed. The results also show that, on these challenging signal classification tasks, CAMEL clearly outperforms the other meta-learning models in accuracy and stability, as seen from its smooth accuracy curves and narrow confidence intervals in both the 1-shot and 5-shot cases.
5.3 Ablation study
In this section, we conduct ablation studies on CAMEL in three scenarios, as shown in Table 3. The first scenario uses samples with SNR ≥ 0, of which 75% are selected as the training set and 25% as the test set. For the second scenario, shown in the column "SNR = 0" of Table 3, we pick samples with SNR = 0 and randomly select 75% of them to form the training set and 25% as the test set. The third scenario forms the Prediction-Other (P-O) set as follows: pick 5 classes of signal samples (SNR ≥ 0) as set P, and let the remaining 5 classes of samples (SNR ≥ 0) form set O. All samples in set O and 5% of the samples in set P form the training set; the remaining 95% of the samples in set P constitute the test set.
On the three training and test sets described above, we first construct the MAML model and then add features to it step by step: we add the attention components and complex numbers separately and together. From the results, we observe that all of the features added to the original MAML model in CAMEL help improve the classification accuracy.
|Accuracy||SNR ≥ 0||SNR = 0||P-O set|
In this paper, we have proposed a complex-domain attentional meta-learning framework for signal recognition named CAMEL. CAMEL utilizes complex-valued neural networks and attention to provide prior knowledge, i.e., complex-domain and temporal information, which helps CAMEL improve performance and prevent overfitting. As two byproducts of CAMEL, we have designed complex-valued meta-learning and complex-valued attention, which may be of independent interest. With second-order information, CAMEL is able to find first-order stationary points of general nonconvex problems. Furthermore, CAMEL has achieved state-of-the-art results on extensive datasets. Finally, the ablation studies in three scenarios have demonstrated the effectiveness of the components of CAMEL.
- Neural relational inference with fast modular meta-learning. In Advances in Neural Information Processing Systems, Vol. 32.
- Layer normalization. arXiv preprint arXiv:1607.06450.
- Meta-learning with adaptive hyperparameters. In Advances in Neural Information Processing Systems, Vol. 33, pp. 20755–20765.
- MetaReg: towards domain generalization using meta-regularization. In Advances in Neural Information Processing Systems, Vol. 31, pp. 998–1008.
- Differentiable meta-learning of bandit policies. In Advances in Neural Information Processing Systems, Vol. 33, pp. 2122–2134.
- Modular meta-learning with shrinkage. In Advances in Neural Information Processing Systems, Vol. 33, pp. 2858–2869.
- Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
- A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network. In Advances in Neural Information Processing Systems, Vol. 33, pp. 16398–16408.
- MetaInit: initializing learning by learning to initialize. In Advances in Neural Information Processing Systems, Vol. 32.
- The advantage of conditional meta-learning for biased regularization and fine tuning. In Advances in Neural Information Processing Systems, Vol. 33, pp. 964–974.
- SSRCNN: a semi-supervised learning framework for signal recognition. IEEE Transactions on Cognitive Communications and Networking.
- SR2CNN: zero-shot learning for signal recognition. IEEE Transactions on Signal Processing 69, pp. 2316–2329.
- On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In International Conference on Artificial Intelligence and Statistics, pp. 1082–1092.
- Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pp. 1126–1135.
- Probabilistic model-agnostic meta-learning. In Advances in Neural Information Processing Systems, Vol. 31.
- Adversarially robust few-shot learning: a meta-learning approach. In Advances in Neural Information Processing Systems, Vol. 33, pp. 17886–17895.
- Deep learning. Vol. 1, MIT Press, Cambridge.
- Continuous meta-learning without tasks. In Advances in Neural Information Processing Systems, Vol. 33, pp. 17571–17581.
- Complex-valued neural networks. Vol. 400, Springer Science & Business Media.
- Complex-valued matrix derivatives: with applications in signal processing and communications. Cambridge University Press.
- Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
- Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456.
- Learning the morphology of brain signals using alpha-stable convolutional sparse coding. In Advances in Neural Information Processing Systems, Vol. 30.
- Reconciling meta-learning and continual learning with online mixtures of tasks. In Advances in Neural Information Processing Systems, Vol. 32.
- Convergence of meta-learning with task-specific adaptation over partial parameters. In Advances in Neural Information Processing Systems, Vol. 33, pp. 11490–11500.
- Unsupervised meta-learning for few-shot image classification. In Advances in Neural Information Processing Systems, Vol. 32.
- Adaptive gradient-based meta-learning methods. In Advances in Neural Information Processing Systems, Vol. 32.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Robust meta-learning for mixed linear regression with small batches. In Advances in Neural Information Processing Systems, Vol. 33, pp. 4683–4696.
- Self-supervised generalisation with meta auxiliary learning. In Advances in Neural Information Processing Systems, Vol. 32.
- A simple neural attentive meta-learner. In International Conference on Learning Representations.
- On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999.
- Convolutional radio modulation recognition networks. In International Conference on Engineering Applications of Neural Networks, pp. 213–226.
- PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
- Meta-learning with implicit gradients. In Advances in Neural Information Processing Systems, Vol. 32.
- MetaSDF: meta-learning signed distance functions. In Advances in Neural Information Processing Systems, Vol. 33, pp. 10136–10147.
- Graph signal processing approach to QSAR/QSPR model learning of compounds. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Deep complex networks. In 6th International Conference on Learning Representations (ICLR).
- Deep complex networks. In International Conference on Learning Representations.
- Complex-valued networks for automatic modulation classification. IEEE Transactions on Vehicular Technology 69 (9), pp. 10085–10089.
- Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30.
- Meta learning with relational information for short sequences. In Advances in Neural Information Processing Systems, Vol. 32.
- Complex transformer: a framework for modeling complex-valued sequence. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4232–4236.
- Online structured meta-learning. In Advances in Neural Information Processing Systems, Vol. 33, pp. 6779–6790.
- Bayesian model-agnostic meta-learning. In Advances in Neural Information Processing Systems, pp. 7343–7353.
- Convergence analysis of fully complex backpropagation algorithm based on Wirtinger calculus. Cognitive Neurodynamics 8 (3), pp. 261–266.
- MetaGAN: an adversarial approach to few-shot learning. In Advances in Neural Information Processing Systems.
- A complex-valued projection neural network for constrained optimization of real functions in complex variables. IEEE Transactions on Neural Networks and Learning Systems 26 (12), pp. 3227–3238.
- Efficient meta learning via minibatch proximal update. In Advances in Neural Information Processing Systems, Vol. 32.
- No-regret non-convex online meta-learning. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3942–3946.
Appendix A Proof of Lemma 1
If a function $f$ is complex analytic, the time complexity of the derivative of $f$ in IQCVNNs is twice that of the complex derivative of $f$ in CDCVNNs.
We consider two scenarios: the derivative of a simple analytic function and that of a composite analytic function.
1. For a simple analytic function $f(\mathbf{z}) = u(\mathbf{x}, \mathbf{y}) + i\,v(\mathbf{x}, \mathbf{y})$ with $\mathbf{z} = \mathbf{x} + i\mathbf{y}$, in CDCVNNs the complex derivative of $f$ with respect to the complex vector $\mathbf{z}$ is the single Jacobian $\partial f/\partial \mathbf{z}$. IQCVNNs instead consider the real pair $(u, v)$ as a function of $(\mathbf{x}, \mathbf{y})$ and must therefore compute the four real Jacobians $\partial u/\partial \mathbf{x}$, $\partial u/\partial \mathbf{y}$, $\partial v/\partial \mathbf{x}$, and $\partial v/\partial \mathbf{y}$, which, by the Cauchy-Riemann equations, contain each independent quantity twice. Thus, in this scenario, the time complexity of the derivative of $f$ in IQCVNNs is twice that of the complex derivative of $f$ in CDCVNNs.
2. For a composite analytic function $h(\mathbf{z}) = f(g(\mathbf{z}))$, where $g$ is also complex analytic, in CDCVNNs the complex derivative of $h$ with respect to a complex vector $\mathbf{z}$ can be computed according to the complex chain rule:

$$\frac{\partial h}{\partial \mathbf{z}} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial \mathbf{z}} + \frac{\partial f}{\partial g^{*}}\,\frac{\partial g^{*}}{\partial \mathbf{z}}. \tag{19}$$

Owing to the fact that $f$ is complex analytic, $\partial f/\partial g^{*}$ is equal to zero, so (19) can be simplified to

$$\frac{\partial h}{\partial \mathbf{z}} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial \mathbf{z}},$$

where $\partial f/\partial g \in \mathbb{C}^{m \times k}$ and $\partial g/\partial \mathbf{z} \in \mathbb{C}^{k \times n}$. Hence, the time complexity of the complex derivative of $h$ in CDCVNNs is that of one complex matrix product, i.e., $4mkn$ real multiplications. However, in IQCVNNs, writing $h = u_{h} + i v_{h}$, $f = u_{f} + i v_{f}$, and $g = u_{g} + i v_{g}$, we have

$$\begin{bmatrix} \dfrac{\partial u_{h}}{\partial \mathbf{x}} & \dfrac{\partial u_{h}}{\partial \mathbf{y}} \\ \dfrac{\partial v_{h}}{\partial \mathbf{x}} & \dfrac{\partial v_{h}}{\partial \mathbf{y}} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial u_{f}}{\partial u_{g}} & \dfrac{\partial u_{f}}{\partial v_{g}} \\ \dfrac{\partial v_{f}}{\partial u_{g}} & \dfrac{\partial v_{f}}{\partial v_{g}} \end{bmatrix} \begin{bmatrix} \dfrac{\partial u_{g}}{\partial \mathbf{x}} & \dfrac{\partial u_{g}}{\partial \mathbf{y}} \\ \dfrac{\partial v_{g}}{\partial \mathbf{x}} & \dfrac{\partial v_{g}}{\partial \mathbf{y}} \end{bmatrix}, \tag{21}$$
where each block in (21) is a real matrix (the blocks of the two factors have sizes $m \times k$ and $k \times n$, with $f:\mathbb{C}^{k}\to\mathbb{C}^{m}$ and $g:\mathbb{C}^{n}\to\mathbb{C}^{k}$), so the block product in (21) costs $8mkn$ real multiplications. Hence, the time complexity of the derivative of $f(g(\mathbf{z}))$ in IQCVNNs is twice that in CDCVNNs. Note that a composite function of many layers can be seen as composite functions of two layers evaluated serially; as a result, for each pair of layers, the time complexity of the complex derivative in CDCVNNs is $\mathcal{O}(4mkn)$, while the time complexity of the derivative in IQCVNNs is $\mathcal{O}(8mkn)$. Hence, the Lemma holds in the scenario of the derivative of a composite analytic function.
To sum up, Lemma 1 is established in both scenarios. This completes the proof.
Appendix B Proof of Lemma 2
The complex-valued convolutional layer and the complex-valued fully connected layer are complex analytic.
It is obvious that the complex-valued convolutional layer and the complex-valued fully connected layer are linear and continuous. Assume that a linear function $f(\mathbf{z})$ is continuous with respect to a complex vector $\mathbf{z} = \mathbf{x} + i\mathbf{y}$; then we can obtain
In a similar way,
Since the function $f$ is continuous and satisfies the Cauchy-Riemann equations, the linear function $f$ is complex analytic. Hence, the complex-valued convolutional layer and the complex-valued fully connected layer are complex analytic. This completes the proof.
Appendix C Proof of Lemma 3
A real-valued function $f(\mathbf{z}) \in \mathbb{R}$, $\mathbf{z} \in \mathbb{C}^{n}$, is analytic if and only if $f$ is a constant function.
Assume a real-valued function $f$ is analytic. Then $f$ has to satisfy the Cauchy-Riemann equations $\partial u / \partial \mathbf{x} = \partial v / \partial \mathbf{y}$ and $\partial u / \partial \mathbf{y} = -\partial v / \partial \mathbf{x}$, where $u$ and $v$ are the real and imaginary parts of $f$, and $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ is the complex input vector. Since $f$ is real-valued, $u = f$ and $v = 0$, so all the partial derivatives of $f$ are equal to 0 and $f$ is a constant function. Conversely, a constant function trivially satisfies the Cauchy-Riemann equations. This completes the proof.
Appendix D Definition Recall
In this section, we recall the definitions of complex derivative, analytic function, and the Cauchy-Riemann equations.
Complex derivative Let $f(z) = u(x, y) + iv(x, y)$, where $z = x + iy$. If $f$ is continuous at a point $z_{0}$, we can define its complex derivative as $f'(z_{0}) = \lim_{z \to z_{0}} \frac{f(z) - f(z_{0})}{z - z_{0}}$.
This is similar to the definition of the derivative of a function of a real variable. In the real case, the existence of the derivative implies that the limits of the difference quotient exist and are equal when the point converges to $z_{0}$ from both the left and the right. In the complex case, however, it means that the limits of the difference quotient exist and are equal when the point converges to $z_{0}$ from any direction in the complex plane. If a function satisfies this property at a point $z_{0}$, we say that the function is complex-differentiable at $z_{0}$.
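The direction-independence requirement can be made concrete with a small numerical sketch (illustrative only, using the standard difference quotient): the analytic function $z^{2}$ gives the same quotient limit along the real and imaginary axes, while the non-analytic conjugate map does not:

```python
# Illustrative sketch: the difference quotient of an analytic function has
# the same limit from every direction, while a non-analytic one does not.

def quotient(f, z0, h):
    return (f(z0 + h) - f(z0)) / h

def square(z):
    return z ** 2               # analytic: derivative 2z exists everywhere

def conjugate(z):
    return z.conjugate()        # nowhere complex-differentiable

z0 = 0.5 + 0.3j
h = 1e-7

# Analytic case: real-axis and imaginary-axis quotients agree (both -> 2*z0).
q_real = quotient(square, z0, h)
q_imag = quotient(square, z0, 1j * h)
assert abs(q_real - q_imag) < 1e-5

# Non-analytic case: the quotients approach 1 and -1, so no limit exists.
q_real_c = quotient(conjugate, z0, h)
q_imag_c = quotient(conjugate, z0, 1j * h)
assert abs(q_real_c - q_imag_c) > 1
```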
Analytic function If a function $f$ is complex-differentiable at all points in some domain $D$, then $f$ is said to be analytic in $D$, i.e., $f$ is a complex analytic function (also known as a holomorphic function) in $D$.
The Cauchy-Riemann equations
The Cauchy-Riemann equations are a pair of real partial differential equations that a complex analytic function needs to satisfy: $\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$ and $\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$, where $u$ and $v$ denote the real and imaginary parts of the function, respectively. The necessary and sufficient condition for $f$ to be a complex analytic function in $D$ is that $f$ is continuous and satisfies the Cauchy-Riemann equations in $D$.
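A minimal numerical sketch (illustrative, not part of the paper) that checks the Cauchy-Riemann equations for the analytic function $f(z) = e^{z}$, whose real and imaginary parts are $u = e^{x}\cos y$ and $v = e^{x}\sin y$:

```python
# Illustrative finite-difference check of the Cauchy-Riemann equations for
# the analytic f(z) = exp(z), with u = e^x * cos(y) and v = e^x * sin(y).
import cmath

def u(x, y):
    return cmath.exp(complex(x, y)).real

def v(x, y):
    return cmath.exp(complex(x, y)).imag

x0, y0, h = 0.4, -0.9, 1e-6

du_dx = (u(x0 + h, y0) - u(x0 - h, y0)) / (2 * h)
du_dy = (u(x0, y0 + h) - u(x0, y0 - h)) / (2 * h)
dv_dx = (v(x0 + h, y0) - v(x0 - h, y0)) / (2 * h)
dv_dy = (v(x0, y0 + h) - v(x0, y0 - h)) / (2 * h)

# Cauchy-Riemann: du/dx = dv/dy and du/dy = -dv/dx.
assert abs(du_dx - dv_dy) < 1e-6
assert abs(du_dy + dv_dx) < 1e-6
```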
Appendix E Proof of Lemma 4
In response to the complex meta-parameters, we have
According to (5), it is obvious that
Note that, since the loss function is real-valued, following the definition of the complex gradient vector (10), we have
where the second equality holds because the output of the loss function is real, the third equality follows from the complex chain rule, and the last equality is given by the definition of the complex gradient vector. Next, according to the inner-loop update process (3), we have
This completes the proof.
Appendix F Complex-valued Neural Networks
Neural networks require back-propagation to update their parameters via first-order derivatives, as do complex-valued neural networks. We would therefore prefer the functions in complex-valued neural networks to be analytic. Define $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ and $\mathbf{c} = \mathbf{p} + i\mathbf{q}$ as the complex input vector and the complex bias vector of each function, respectively.
Complex-valued convolutional layer The complex-valued convolutional layer implements the convolution operation on complex input signals. Define $\mathbf{W} = \mathbf{A} + i\mathbf{B}$ as the complex convolution kernel. Given $\mathbf{z}$, $\mathbf{W}$, and $\mathbf{c}$, since the complex-valued convolutional layer is linear, we can compute the real and imaginary parts of its output separately as $\Re(\mathbf{W} * \mathbf{z} + \mathbf{c}) = \mathbf{A} * \mathbf{x} - \mathbf{B} * \mathbf{y} + \mathbf{p}$ and $\Im(\mathbf{W} * \mathbf{z} + \mathbf{c}) = \mathbf{A} * \mathbf{y} + \mathbf{B} * \mathbf{x} + \mathbf{q}$, where $*$ denotes the convolution operation in the real case.
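The decomposition above can be sketched numerically; this minimal example (illustrative, using NumPy's 1-D `convolve` and omitting the bias for brevity) computes the complex convolution from four real convolutions and checks it against direct complex arithmetic:

```python
# Illustrative sketch: a complex-valued convolution W * z with kernel
# W = A + iB and input z = x + iy, computed from four real convolutions:
#   Re = A * x - B * y   and   Im = A * y + B * x.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)   # input parts
A, B = rng.standard_normal(3), rng.standard_normal(3)   # kernel parts

real_part = np.convolve(A, x) - np.convolve(B, y)
imag_part = np.convolve(A, y) + np.convolve(B, x)

# Reference: convolve the complex kernel and signal directly.
direct = np.convolve(A + 1j * B, x + 1j * y)
assert np.allclose(real_part + 1j * imag_part, direct)
```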
Complex-valued fully connected layer The complex-valued fully connected layer performs a linear transformation of complex inputs. Define $\mathbf{W} = \mathbf{A} + i\mathbf{B}$ as the complex weight matrix. Given $\mathbf{z}$, $\mathbf{W}$, and $\mathbf{c}$, the real and imaginary parts of the output of the complex-valued fully connected layer can be computed as $\Re(\mathbf{W}\mathbf{z} + \mathbf{c}) = \mathbf{A}\mathbf{x} - \mathbf{B}\mathbf{y} + \mathbf{p}$ and $\Im(\mathbf{W}\mathbf{z} + \mathbf{c}) = \mathbf{A}\mathbf{y} + \mathbf{B}\mathbf{x} + \mathbf{q}$.
Similarly, the complex-valued fully connected layer can be expressed in real block-matrix form as $\begin{bmatrix} \Re(\mathbf{W}\mathbf{z} + \mathbf{c}) \\ \Im(\mathbf{W}\mathbf{z} + \mathbf{c}) \end{bmatrix} = \begin{bmatrix} \mathbf{A} & -\mathbf{B} \\ \mathbf{B} & \mathbf{A} \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} + \begin{bmatrix} \mathbf{p} \\ \mathbf{q} \end{bmatrix}$.
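Likewise, a minimal sketch (illustrative, assuming NumPy) of the complex-valued fully connected layer built from real matrix products, checked against direct complex evaluation:

```python
# Illustrative sketch: a complex-valued fully connected layer with weights
# W = A + iB, input z = x + iy, and bias c = p + iq, computed with real
# matrix products: Re = A x - B y + p and Im = A y + B x + q.
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
x, y = rng.standard_normal(6), rng.standard_normal(6)
p, q = rng.standard_normal(4), rng.standard_normal(4)

real_out = A @ x - B @ y + p
imag_out = A @ y + B @ x + q

# Reference: evaluate the same layer directly in complex arithmetic.
direct = (A + 1j * B) @ (x + 1j * y) + (p + 1j * q)
assert np.allclose(real_out + 1j * imag_out, direct)
```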
Appendix G Related work
Recently, meta-learning has demonstrated promising performance in many fields. Khodadadeh et al. Khodadadeh et al. (2019) proposed an unsupervised, model-agnostic meta-learning algorithm for classification tasks. The work Liu et al. (2019) proposed a new method that automatically learns appropriate labels for auxiliary tasks. The work Xie et al. (2019) proposed a new meta-learning method to learn heterogeneous point process models from short event sequences and relational networks. In addition, the work Jerfel et al. (2019)
proposed a Dirichlet process mixture for hierarchical Bayesian models over the parameters of arbitrary parametric models. Khodak et al. Khodak et al. (2019) built a theoretical framework for the design and understanding of practical meta-learning methods. The authors in Zhou et al. (2019) proposed a meta-learning method based on minibatch proximal update for learning an effective hypothesis transfer.
Moreover, the work Rajeswaran et al. (2019) proposed an implicit MAML algorithm that relies only on the solution to the inner-level optimization. The work Chen et al. (2020) proposed a meta-learning approach that avoids the need for often sub-optimal hand-selection. The work Yao et al. (2020) proposed an online structured meta-learning framework. Additionally, the authors in Baik et al. (2020) proposed a new weight-update rule that greatly enhances the fast adaptation process. The work Harrison et al. (2020) proposed a meta-learning approach via online changepoint analysis, augmenting meta-learning with a differentiable Bayesian changepoint detection scheme. The work Goldblum et al. (2020) proposed an adversarial querying algorithm for generating adversarially robust meta-learners and thoroughly investigated the causes of adversarial vulnerability.
Appendix H Convergence Analysis
For ease of writing and derivation, in our notation, $\mathcal{L}_{i}$ denotes the loss function on task $\mathcal{T}_{i}$, $\tilde{\mathcal{L}}_{i}$ denotes the loss on task $\mathcal{T}_{i}$ after the inner-loop update process, and $F$ denotes the meta-objective. By drawing tasks $\mathcal{T}_{i}$ from the task probability distribution $p(\mathcal{T})$, our optimization problem can be rewritten as
A random vector $\mathbf{w}$ is called an $\epsilon$-approximate first-order stationary point for problem (33) if it satisfies $\mathbb{E}\left[\|\nabla F(\mathbf{w})\|\right] \le \epsilon$, where $F$ is the meta-objective.
Then, we formally state our assumptions as below.
is bounded below, and