1 Introduction
Many real-world scenarios exist in which a time series can be effectively complemented with external knowledge. One such scenario is information about trade between countries, represented as a temporally evolving knowledge graph. Trade between countries affects the corresponding currency exchange rates, so we want to use trade information to predict exchange rates with higher accuracy.
Time series forecasting deals with predicting future data points of a sequence based on the data available up to the current timestamp. Methods such as the Auto-Regressive Integrated Moving Average (ARIMA) model and Kalman filtering [16, 15] are popular for predicting time series. Representation learning on graph-structured data [9, 25, 12, 27] is a widely researched field, with considerable focus on temporally evolving graphs [24, 18, 13, 7]. The increasing amount of complex data that can be effectively represented using dynamic multi-relational graphs [17] has led to this increased focus on dynamic graph modeling. Several methods, such as ConvE [5], RGCN [21], and DistMult [26], have shown admirable results on modeling static, multi-relational graph data for link prediction. Other approaches attempt to model dynamic knowledge graphs by incorporating temporal information; these include Know-Evolve [23], HyTE [4], and TA-DistMult [6], among others.

The two aforementioned fields of time series prediction and representation learning on graphs have mainly been studied separately in the machine learning community. Recently, some work has been done on integrating the two [14], describing a method that incorporates a static, uni-relational graph for traffic flow prediction. However, this method is limited to static graphs with a single relation. To date, no method has been proposed for integrating temporally evolving graphs with time series prediction. In this paper, we propose a new method for exploiting the information in dynamic graphs for time series prediction. We propose a static learnable embedding to capture the spatial information from knowledge graphs and a dynamic embedding to capture the dynamics of the time series and the evolving graph.
We present the first solution to the problem of time series prediction with temporal knowledge graphs (TKG). Since, to the best of our knowledge, no datasets currently exist that align with this problem statement, we prepared five suitable datasets through web scraping (to be released upon acceptance) and evaluate our model on them. We show that our approach beats the current state-of-the-art methods for time series forecasting on all five datasets. Our approach can also predict the time series any number of time steps ahead and does not require a test-time graph structure for evaluation. We release the code of our model, DArtNet, for future research: https://github.com/INK-USC/DArtNet.
2 Related Work
We review work using static graphs for time series prediction and work on temporal knowledge graphs.
Time Series Prediction.
In addition to the general time series prediction task, there have been some recent studies on the spatial-temporal forecasting problem. The Diffusion Convolutional Recurrent Neural Network (DCRNN) [14] incorporates a static, uni-relational graph for time series (traffic flow) forecasting. Traffic flow is modeled as a diffusion process on a directed graph. The method uses bidirectional random walks on the graph to capture spatial dependency and an encoder-decoder framework with scheduled sampling to incorporate temporal dependence. However, this method cannot be extended to temporally evolving graphs or to multi-relational graphs. Relational Time Series Forecasting [19] also formulates the problem of using dynamic graphs for time series prediction, though not for multi-relational data. Neural relational inference [11] considers the inverse problem of predicting the dynamics of a graph with attribute information.

Temporal Knowledge Graph Reasoning and Link Prediction. There have been several attempts at reasoning on dynamically evolving graphs. HyTE [4] is a method for embedding knowledge graphs that views each timestamp in the graph data as a hyperplane. Each (head, relation, tail) triple at a particular timestamp is projected into the corresponding hyperplane, and the translational distance, as defined by the TransE model [1], of the projected embedding vectors is minimized. TA-DistMult [6] is a temporally aware version of DistMult. For a quadruple, a predicate sequence is constructed from the relation and timestamp tokens and passed into a GRU; the last hidden state of the GRU is taken as the representation of the predicate sequence. Know-Evolve [23] models a relationship between two nodes as a multivariate point process. Learned entity embeddings are used to calculate the score for a relation, which modulates the intensity function of the point process. RE-Net [10] uses neighborhood aggregation and an RNN to capture the spatial and temporal information in the graph.

3 Problem Formulation
A Knowledge Graph is a multi-relational graph that can be represented as a set of triples (s, r, o), where s denotes the head, o denotes the tail, and r is the relation between the nodes s and o. A TKG has a time dimension as well, and the graph can be represented as quadruples of the form (s, r, o, t), where t denotes the timestamp at which the relation r exists between the nodes s and o. We now introduce Dynamic Attributed Graphs, formalize our problem statement, and, in later sections, present our model for making predictions on Dynamic Attributed Graphs.
Problem Definition. A Dynamic Attributed Graph (DAG) is a directed graph whose edges are multi-relational, with a timestamp associated with each edge (an event) and attributes associated with the nodes at each time. An event in a dynamic graph is represented as a quadruple (s, r, o, t), where s is the head entity, r is the relation, o is the tail entity, and t is the timestamp of the event. In a dynamic attributed graph, an event is represented as a six-tuple (s, r, o, t, a_s^t, a_o^t), where a_s^t and a_o^t are the attributes associated with the head and tail at time t. The collection of all events at a time t constitutes a dynamic graph G_t, where G = {G_1, G_2, ..., G_T}. The goal of the DAG Prediction problem is to learn a representation of the dynamic graph events and predict the attributes of each node at future timestamps, by learning a set of functions that predict the events for the next time step. Links are predicted jointly, to aid the attribute prediction task. Formally, we want to learn a set of functions F such that F(G_1, ..., G_t) = G_{t+1}.
We divide the dynamic graph at any time t into two sets: one consisting of only the relational events and the other consisting of only the attribute values. Formally, G_t = G_t^r ∪ G_t^a, where G_t^r = {(s, r, o, t)} and G_t^a = {(s, a_s^t)}. We propose to predict G_{t+1}^a and G_{t+1}^r using this set of functions as follows:

G_{t+1}^a = f_a(G_1, ..., G_t),    G_{t+1}^r = f_r(G_1, ..., G_t).
We jointly predict the graph structure and the attribute values, and we show that the attribute values are predicted with greater accuracy than is possible with any existing method.
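For concreteness, the six-tuple events above can be represented directly as plain tuples. The following sketch (with hypothetical entity names and attribute values, not taken from our datasets) shows one way to slice out the graph G_t at a given time:

```python
from collections import namedtuple

# A hypothetical encoding of DAG events as six-tuples (s, r, o, t, a_s, a_o);
# the field names and example values are illustrative only.
Event = namedtuple("Event", ["head", "rel", "tail", "time", "head_attr", "tail_attr"])

def events_at(events, t):
    """Return G_t: the collection of all events observed at time t."""
    return [e for e in events if e.time == t]

events = [
    Event("USA", "exports_high", "GER", 0, 1.00, 0.92),
    Event("GER", "exports_low", "JPN", 0, 0.92, 108.30),
    Event("USA", "exports_high", "JPN", 1, 1.01, 109.00),
]
g0 = events_at(events, 0)  # the two events forming G_0
```
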
4 Proposed Framework: DArtNet
We now present our framework for learning the set of functions that predict the events at the next timestamp, given the history. We name our framework DArtNet, which stands for Dynamic Attributed Network.
We model the changing graph structure and attribute values by learning an entity-specific representation. We define the representation of an entity in the graph as a combination of a static learnable embedding, which does not change with time and represents the static characteristics of the node, and a dynamic embedding, which depends on the attribute value at that time and represents the dynamically evolving properties of the node. We then aggregate information using the mean over neighborhood entities. For every entity in the graph, we encode the history using a Recurrent Neural Network. Finally, we use fully connected networks for the attribute prediction and link prediction tasks.
4.1 Representation Learning on Events
The main component of our framework learns a representation over events that is used for predicting future events. We learn a head-specific representation and then model its history using a Recurrent Neural Network (RNN). Let N_s^t denote the events associated with head s at time t, i.e., N_s^t = {(r, o) : (s, r, o, t) ∈ G_t^r}. For each entity in the graph, we decompose the information into two parts: static information and dynamic information. Static information does not change over time and represents the inherent characteristics of the entity. Dynamic information changes over time; it represents the information affected by all external variables acting on the entity. For every entity s in the graph at time t, we construct an embedding that consists of two components:

- A static (constant) learnable embedding e_s, which does not change over time.

- A dynamic embedding d_s^t = W_d a_s^t, which changes over time,

where a_s^t is the attribute of entity s at time t and W_d is a learnable parameter. The attribute value can be a multi-dimensional vector, representing multiple time series associated with the same head. The embedding of entity s thus becomes e_s ⊕ d_s^t, where ⊕ is the concatenation operator. For every relation (link) r, we construct a learnable static embedding e_r.
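As a concrete sketch of this construction, with the embedding dimensions and weight values chosen arbitrarily for illustration, the combined embedding can be formed as:

```python
# Sketch of the per-entity embedding: a static vector concatenated with a
# dynamic part computed as (weight vector) * (scalar attribute).
# Dimensions and values here are illustrative, not from the paper.
def dynamic_embedding(attr, W_d):
    # attr: scalar attribute of the entity at time t; W_d: learnable weights
    return [w * attr for w in W_d]

def entity_embedding(e_static, attr, W_d):
    # concatenate the static embedding with the dynamic embedding
    return list(e_static) + dynamic_embedding(attr, W_d)

emb = entity_embedding([0.1, 0.2], 3.0, [0.5, -0.5])
# emb is the 4-dimensional vector [0.1, 0.2, 1.5, -1.5]
```
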
To capture the neighbourhood information of entity s at time t from the dynamic graph, we propose two spatial embeddings: an attribute embedding u_s^t and an interaction embedding v_s^t. u_s^t captures the spatio-attribute information from the neighbourhood of entity s, and v_s^t captures the spatio-interaction information from the neighbourhood of entity s. Mathematically, we define the spatial embeddings as:
u_s^t = W_1 · (1/|N_s^t|) Σ_{(r,o) ∈ N_s^t} (e_o ⊕ d_o^t)    (1)

v_s^t = W_2 · (1/|N_s^t|) Σ_{(r,o) ∈ N_s^t} e_r    (2)

where |N_s^t| is the cardinality of the set N_s^t, and W_1 and W_2 are learnable parameters.
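A minimal sketch of such neighbourhood mean aggregation, with a linear map applied to the pooled vector (the exact form of the aggregation is assumed here for illustration):

```python
def aggregate_neighbourhood(neigh_vecs, W):
    """Mean-pool the neighbour vectors, then apply a linear map W
    (one row of W per output dimension). A sketch of the spatial
    embeddings; the exact functional form is an assumption."""
    n, dim = len(neigh_vecs), len(neigh_vecs[0])
    mean = [sum(v[i] for v in neigh_vecs) / n for i in range(dim)]
    return [sum(row[i] * mean[i] for i in range(dim)) for row in W]

# two neighbours with 2-d vectors; an identity W returns the mean itself
u = aggregate_neighbourhood([[1.0, 2.0], [3.0, 4.0]], [[1, 0], [0, 1]])
# u == [2.0, 3.0]
```
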
4.2 Learning Temporal Dependency
The embeddings u_s^t and v_s^t capture the spatial information for entity s at time t. For predicting information at a future time, we also need to capture the temporal dependence of this information. To keep track of the interactions and the attribute evolution over time, we model the history using a Gated Recurrent Unit (GRU) [3], an RNN. For a head s, we define the encoded attribute history at time t as the sequence (u_s^1, ..., u_s^t) and the encoded interaction history at time t as the sequence (v_s^1, ..., v_s^t). These sequences provide the full information about the evolution of the head till time t. We encode the attribute history and the interaction history for head s as follows:

h_s^t = GRU_a(h_s^{t-1}, u_s^t)    (3)

g_s^t = GRU_i(g_s^{t-1}, v_s^t)    (4)

where the vector h_s^t captures the spatio-temporal information of the attribute evolution, i.e., how the attribute value of the entity evolves over time with respect to the evolving graph structure, while the vector g_s^t captures the spatio-temporal information of how relations are associated with the entity over time. We show DArtNet, with its input and output, in Figure 2.
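The history encoding amounts to unrolling a recurrent cell over the per-timestamp spatial embeddings. A toy sketch, with a simple averaging cell standing in for the learned GRU:

```python
def encode_history(cell, inputs, h0):
    """Fold a recurrent cell over the sequence of spatial embeddings and
    return the final hidden state. `cell` stands in for a learned GRU."""
    h = h0
    for x in inputs:
        h = cell(h, x)
    return h

# toy cell: an exponential moving average in place of a learned GRU update
ema = lambda h, x: [0.5 * hi + 0.5 * xi for hi, xi in zip(h, x)]
h_final = encode_history(ema, [[1.0], [3.0]], [0.0])
# h_final == [1.75]
```
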
Table 1: Dataset statistics.

Dataset     | # Train   | # Valid | # Test    | # Nodes | # Rel | Granularity
ATG         | 463,188   | 57,898  | 57,900    | 58      | 178   | Monthly
CAC (small) | 2,070     | 388     | 508       | 90      | 1     | Yearly
CAC (large) | 116,933   | 167,047 | 334,096   | 20,000  | 1     | Yearly
MTG         | 270,362   | 39,654  | 74,730    | 44      | 90    | Monthly
AGG         | 3,879,878 | 554,268 | 1,108,538 | 6,635   | 246   | Monthly
4.3 Prediction Functions
The main aim of the model is to predict future attribute values as well as interaction events. To obtain the complete information about the events at the next time step, we perform the prediction in two parts: (1) prediction of the attribute values for the whole graph and (2) prediction of the interaction events for the graph. We know G_{t+1} = G_{t+1}^r ∪ G_{t+1}^a, so to predict G_{t+1} we divide it into the two sets G_{t+1}^a and G_{t+1}^r. The attribute values in G_{t+1}^a are predicted directly and modelled as follows:

a_s^{t+1} = f_1(h_s^t ⊕ e_s)    (5)
The attribute value of an entity is thus a function of the spatio-attribute history of the entity and the static information about the entity. Attribute prediction requires graph structures, so we also predict the graph structure. The probability of G_{t+1}^r is modeled as P(G_{t+1}^r | G_t, ..., G_1). We can write this probability as

P(G_{t+1}^r | G_t, ..., G_1) = Π_{(s,r,o) ∈ G_{t+1}^r} P(o | s, r, G_t, ..., G_1) · P(s, r | G_t, ..., G_1).

In this work, we consider the case where the probability of (s, r) is independent of the past graphs G_t, ..., G_1 and model it using a uniform distribution, leading to

P(G_{t+1}^r | G_t, ..., G_1) ∝ Π_{(s,r,o) ∈ G_{t+1}^r} P(o | s, r, G_t, ..., G_1).
For predicting the interactions at a future timestamp, we model the probability of the tail as follows:

P(o | s, r, G_t, ..., G_1) = softmax(f_2(g_s^t ⊕ e_s ⊕ e_r))    (6)

The functions f_1 and f_2 can be arbitrary differentiable functions; in our experiments, we use single-layer feed-forward networks.
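Assuming single-layer heads as described, the two prediction functions can be sketched as a linear layer for the attribute and a softmax over per-entity scores for the tail. Weights and inputs below are illustrative:

```python
import math

def predict_attribute(history, e_static, W, b):
    """f1 sketch: one linear layer over the concatenated history and static
    embedding, producing a scalar attribute prediction."""
    x = list(history) + list(e_static)
    return sum(wi * xi for wi, xi in zip(W, x)) + b

def tail_distribution(scores):
    """f2 output turned into a distribution over candidate tail entities
    via a numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [z / total for z in exps]

a_pred = predict_attribute([0.2], [0.1], [1.0, 2.0], 0.5)  # ~0.9
probs = tail_distribution([2.0, 1.0, 0.0])                 # sums to 1
```
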
4.4 Parameter Learning
We use a multi-task learning [20, 2] loss for optimizing the parameters, minimizing the attribute prediction loss and the graph prediction loss jointly. The total loss is L = L_link + λ L_attr, where L_link is the interaction loss, L_attr is the attribute loss, and λ is a hyperparameter deciding the relative weight of the two tasks. For the attribute loss, we use the mean squared error L_attr = (1/N) Σ_s (â_s^t − a_s^t)², where â_s^t is the predicted attribute and a_s^t is the ground-truth attribute. For the interaction loss, we use the standard multi-class cross-entropy loss L_link = −Σ_{c=1}^{C} y_c log p_c, where C is the number of classes, i.e., the number of entities in our case.

4.5 Forecasting over Time
At inference time, DArtNet predicts future interactions and attributes based on the previous observations. To predict the interactions and attributes at time t+Δ, DArtNet adopts multi-step inference and predicts sequentially. At each time step, we compute the tail probabilities, rank the predicted tails, and choose the top-ranked tails as the predicted values, using them as the graph structure for further inference. We also predict the attributes, which yields the attribute set for the next time step. With the graph structure and attributes at time t+1 in hand, we repeat this process until we reach time t+Δ−1, after which we can predict the interactions and attributes at time t+Δ.
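The sequential rollout can be sketched as follows, with a toy stand-in model whose interface (`predict_topk_links`, `predict_attributes`) is assumed purely for illustration:

```python
def forecast(model, history_graphs, steps, k):
    """Multi-step inference sketch: at each step, predict the top-k links to
    form the next graph and predict the next attributes, then roll the
    predictions forward as if they were observations."""
    graphs = list(history_graphs)
    future_attrs = []
    for _ in range(steps):
        g_next = model.predict_topk_links(graphs, k)  # assumed interface
        a_next = model.predict_attributes(graphs)     # assumed interface
        graphs.append(g_next)
        future_attrs.append(a_next)
    return future_attrs

class ToyModel:
    # stand-in: always predicts the same link and counts observed graphs
    def predict_topk_links(self, graphs, k):
        return [("A", "r", "B")] * k
    def predict_attributes(self, graphs):
        return {"A": float(len(graphs))}

attrs = forecast(ToyModel(), [[]], steps=2, k=1)
# attrs == [{"A": 1.0}, {"A": 2.0}]
```
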
Table 2: MSE results for attribute prediction (lower is better).

Method                            | ATG   | CAC (small) | CAC (large) | MTG    | AGG
without graph:
Historic Average                  | 1.636 | 4.540       | 9.810       | 14.930 | 600.000
VAR [8]                           | 3.961 | 6.423       | 10.330      | 9.490  | 300.000
ARIMA                             | 1.463 | 4.245       | 9.102       | 2.860  | 51.240
Seq2Seq model [22]                | 1.323 | 4.554       | 8.080       | 2.975  | 28.000
with graph:
ConvE+LSTM (1 layer) [5]          | 0.763 | 3.899       | 8.220       | 7.240  | 202.580
ConvE+LSTM (2 layers)             | 0.728 | 4.321       | 8.440       | 9.460  | 206.640
HyTE+LSTM (1 layer) [4]           | 4.041 | 40.234      | 8.089       | 37.170 | 7.430
HyTE+LSTM (2 layers)              | 1.531 | 40.885      | 8.230       | 17.410 | 2.070
TA-DistMult+LSTM (1 layer) [6]    | 0.847 | 3.584       | 9.456       | 16.880 | 3.250
TA-DistMult+LSTM (2 layers)       | 0.796 | 3.432       | 9.034       | 9.770  | 7.030
RE-Net (mean)+LSTM (1 layer) [10] | 0.793 | 4.073       | 9.022       | 5.020  | 203.320
RE-Net (mean)+LSTM (2 layers)     | 0.857 | 3.865       | 8.856       | 4.348  | 200.220
RE-Net (RGCN)+LSTM (1 layer)      | 0.620 | 3.718       | 8.998       | 5.170  | 203.120
RE-Net (RGCN)+LSTM (2 layers)     | 0.550 | 3.984       | 8.201       | 12.700 | 201.560
DArtNet                           | 0.115 | 3.423       | 7.054       | 0.496  | 0.848
5 Experiments
In this section, we evaluate our proposed method, DArtNet, on the temporal attribute prediction task, i.e., predicting the future attributes of each node. We evaluate on two fronts: (1) prediction of the future attributes associated with each node on five datasets, and (2) the behaviour of model variants and the parameter sensitivity of our proposed method. We summarize the datasets, evaluation metrics, and baseline methods in the following sections.
5.1 Datasets
Due to the unavailability of datasets satisfying our problem statement, we curated appropriate datasets by scraping the web. We created and tested our approach on the datasets described below; their statistics are given in Table 1.
Attributed Trade graph (ATG). This dynamic graph represents the net export from one country (node) to another, where each edge belongs to an order of trade segment (in a million dollars). The monthaveraged currency exchange rate of the corresponding country in SDRs per currency unit is the time series attribute value.
Coauthorship-Citation dataset (CAC). Each edge in the graph represents a collaboration between the authors (nodes) of a research paper. The number of citations per year for an author is the corresponding time series attribute of the node.
Multi-attributed Trade Graph (MTG). This is a subset of ATG, with a multi-attribute time series representing the monthly Net Export Price Index and the value of International Reserve assets in millions of US dollars.
Attributed GDELT Graph (AGG). The Global Database of Events, Language, and Tone (GDELT) records different types of monthly events between entities such as political leaders, organizations, and countries. Here, only the country nodes are associated with a time series attribute, which is taken as the currency exchange rate.
5.2 Evaluation Metrics
The aim is to predict the attribute values of each node at future timestamps. For this purpose, the Mean Squared Error (MSE) is used; a lower MSE indicates better performance.
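The metric is the standard mean squared error over predicted and ground-truth attribute values:

```python
def mse(pred, true):
    """Mean Squared Error over paired predictions and ground-truth values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # -> 4/3, i.e. about 1.333
```
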
5.3 Baseline methods
We show that the results produced by our model outperform those of the existing time series forecasting models. We compare our approach against two kinds of methods for attribute prediction.
Time series prediction without TKG
These methods do not take the graph data into account and make predictions using only the available time series history. We compare our model to the Historic Average (HA), the Vector Auto-Regressive (VAR) model [8], the Auto-Regressive Integrated Moving Average (ARIMA) model, and a GRU-based Seq2Seq model [22]. HA makes predictions based on a weighted average of previous time series values, ARIMA uses lagged observations for prediction, and VAR predicts multiple time series simultaneously by capturing linear interdependencies among them.
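As an example of the simplest of these baselines, a Historic Average forecast over a fixed window can be written as follows (an unweighted sketch; the HA baseline above uses a weighted average):

```python
def historic_average(series, window=3):
    """Forecast the next value as the mean of the last `window` observations
    (an unweighted sketch of the HA baseline)."""
    tail = series[-window:]
    return sum(tail) / len(tail)

historic_average([1.0, 2.0, 3.0, 4.0])  # -> 3.0
```
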
Time series prediction with TKG
Node embeddings are learned using graph representation learning methods (both static and temporal) such as RE-Net, HyTE, TA-DistMult, and ConvE. For each node, the attribute value at a particular timestamp is concatenated with the corresponding node embedding, and this sequence is passed into a GRU network for making predictions.
Hyperparameter settings
All models are implemented in PyTorch and use the Adam optimizer for training. The best hyperparameters are chosen using the validation dataset. Typically, increasing the value of λ gives better results, and the best results on each dataset are reported.

5.4 Main Results
The results for attribute prediction on the different datasets are reported in Table 2. Our method, DArtNet, outperforms every baseline for attribute prediction by a large margin. From the results, it is clear that the neural-network-based models outperform the other baselines on these complicated datasets, demonstrating their long-term modeling capacity. We observe that the relational methods, which use graph information, generally outperform the non-relational methods on attribute prediction, with large performance gains on the more complicated datasets such as MTG and AGG. This suggests that using relational methods for attribute prediction is a promising research direction. DArtNet also outperforms the other relational methods, which do not jointly train embeddings for attribute prediction and link prediction. This suggests that jointly training embeddings for the two tasks improves attribute prediction performance compared with training embeddings separately and then using them for attribute prediction.
5.5 Performance Analysis
To study the effects of model parameter sharing and hyperparameter sensitivity on prediction, we perform several ablation studies for DArtNet on four datasets (AGG is excluded, as it does not have attribute values on all nodes).

- Decoupling the attribute prediction and interaction prediction tasks. We decouple the parameters shared between the two tasks and observe the performance. More formally, we use a different static entity embedding for each task, i.e., separate embeddings as the parameters for the link prediction and attribute prediction tasks, respectively.

- Sharing history. We study the effect of using the same history embedding for both link prediction and attribute prediction, which tells us whether similar history information is required for both tasks. Here the RNN input does not explicitly receive the task-specific information, so that the weights can be shared: the attribute history and the interaction history are encoded with a single RNN whose parameters are shared between the two tasks.

- Removing time-dependent information. We evaluate the performance of our model in the absence of any temporal information: we do not encode any history for either task and directly predict the tails and the attribute values at a future timestamp.
Analysis of variants of DArtNet. Figure 3 shows the variation of attribute loss across the variants of DArtNet proposed in Section 5.5. We observe that our full model outperforms the decoupled variant by a large margin for attribute prediction, confirming the hypothesis that jointly training attribute prediction and link prediction performs better than training them separately. We also see that sharing the history encoding for attribute prediction and link prediction deteriorates the results, indicating that the history information required for the two tasks is quite different. Lastly, the time-independent variant of our framework performs poorly, which clearly indicates that temporal evolution information is essential for proper inference.
Sensitivity analysis of hyperparameter λ. We perform a sensitivity analysis of the parameter λ, which specifies the relative weight given to the two tasks. Figure 4 shows the variation of the MSE loss for the attribute prediction task as λ increases. We observe that the attribute loss decreases with increasing λ; as expected, increasing λ favors the optimization of the attribute loss, while decreasing λ favors link prediction.
6 Conclusion and Future Work
In this paper, we propose to jointly model attribute prediction and link prediction on a temporally evolving graph. We propose a novel framework, DArtNet, which uses two recurrent neural networks to encode the history of the graph. The framework shares parameters between the two tasks and trains them jointly using multi-task learning. Through various experiments, we show that our framework achieves better performance on attribute prediction than previous methods, indicating that external knowledge is useful for time series prediction. Interesting future work includes performing link prediction at the graph level, rather than at the subject-and-relation level, in a memory-optimized way.
References
 [1] (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), pp. 2787–2795. External Links: Link Cited by: §2.
 [2] (1997) Multitask learning. Machine Learning 28 (1), pp. 41–75. External Links: Link, Document Cited by: §4.4.

 [3] (2014) On the properties of neural machine translation: encoder-decoder approaches. CoRR abs/1409.1259. External Links: Link, 1409.1259 Cited by: §4.2.
 [4] (2018) HyTE: hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 2001–2011. External Links: Document Cited by: §1, §2, Table 2.
 [5] (2017) Convolutional 2D knowledge graph embeddings. CoRR abs/1707.01476. External Links: Link, 1707.01476 Cited by: §1, Table 2.
 [6] (2018) Learning sequence encoders for temporal knowledge graph completion. CoRR abs/1809.03202. External Links: Link, 1809.03202 Cited by: §1, Table 2.
 [7] (2018) DynGEM: deep embedding method for dynamic graphs. CoRR abs/1805.11273. External Links: Link, 1805.11273 Cited by: §1.
 [8] (1994) Time series analysis. Princeton Univ. Press, Princeton, NJ. External Links: ISBN 0691042896, Link Cited by: Table 2, §5.3.
 [9] (2017) Representation learning on graphs: methods and applications. CoRR abs/1709.05584. External Links: Link, 1709.05584 Cited by: §1.
 [10] (2019) Recurrent event network for reasoning over temporal knowledge graphs. arXiv preprint arXiv:1904.05530. Cited by: §2, Table 2.
 [11] (2018) Neural relational inference for interacting systems. In ICML, Cited by: §2.
 [12] (2016) Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907. External Links: Link, 1609.02907 Cited by: §1.
 [13] (2018) Learning dynamic embeddings from temporal interactions. CoRR abs/1812.02289. External Links: Link, 1812.02289 Cited by: §1.
 [14] (2017) Graph convolutional recurrent neural network: datadriven traffic forecasting. CoRR abs/1707.01926. External Links: Link, 1707.01926 Cited by: §1, §2.

 [15] (2013) Short-term traffic flow forecasting: an experimental comparison of time-series analysis and supervised learning. IEEE Transactions on Intelligent Transportation Systems 14, pp. 871–882. Cited by: §1.
 [16] (2011) Discovering spatio-temporal causal interactions in traffic data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, New York, NY, USA, pp. 1010–1018. External Links: ISBN 9781450308137, Link, Document Cited by: §1.

 [17] (2018) Interpretable graph convolutional neural networks for inference on noisy knowledge graphs. CoRR abs/1812.00279. External Links: Link, 1812.00279 Cited by: §1.
 [18] (2019) EvolveGCN: evolving graph convolutional networks for dynamic graphs. CoRR abs/1902.10191. External Links: Link, 1902.10191 Cited by: §1.

 [19] (2018) Relational time series forecasting. The Knowledge Engineering Review 33, pp. e1. External Links: Document Cited by: §2.
 [20] (2017) An overview of multi-task learning in deep neural networks. CoRR abs/1706.05098. External Links: Link, 1706.05098 Cited by: §4.4.
 [21] (2018) Modeling relational data with graph convolutional networks. Lecture Notes in Computer Science, pp. 593–607. External Links: ISBN 9783319934174, ISSN 16113349, Link, Document Cited by: §1.
 [22] (2014) Sequence to sequence learning with neural networks. CoRR abs/1409.3215. External Links: Link, 1409.3215 Cited by: Table 2, §5.3.
 [23] (2017) Know-Evolve: deep reasoning in temporal knowledge graphs. CoRR abs/1705.05742. External Links: Link, 1705.05742 Cited by: §1, §2.
 [24] (2018) Representation learning over dynamic graphs. CoRR abs/1803.04051. External Links: Link, 1803.04051 Cited by: §1.
 [25] (2019) A comprehensive survey on graph neural networks. CoRR abs/1901.00596. External Links: Link, 1901.00596 Cited by: §1.
 [26] (2014) Embedding entities and relations for learning and inference in knowledge bases. CoRR abs/1412.6575. Cited by: §1.
 [27] (2018) GraphRNN: A deep generative model for graphs. CoRR abs/1802.08773. External Links: Link, 1802.08773 Cited by: §1.
Appendix
Appendix A Datasets
Due to the unavailability of datasets satisfying our problem statement, we curated appropriate datasets by scraping the web. We created and tested our approach on the datasets described below.
Attributed Trade Graph (ATG). This dataset consists of a directed, multi-relational, unweighted, dynamic knowledge graph with nodes representing different countries. A timestamped edge between two nodes represents the net exports between the respective countries in millions of dollars. To discretize the edges, the range of net export values is split into 200 equal-sized segments, resulting in 178 different edge types. The attribute value associated with each node is the month-averaged currency exchange rate of the corresponding country in SDRs per currency unit. The data is present in the form of tuples (s, r, o, t, a_s^t, a_o^t), where s, o, and t denote the head, tail, and timestamp respectively, relation r exists between s and o at timestamp t, and a_s^t and a_o^t are the attribute values of the head and tail at t. The graph evolves at a monthly rate.
The graph is obtained by using a script to scrape data from www.trademap.org. The exchangerate data is scraped from www.imf.org.
Coauthorship-Citation dataset (CAC). Here the knowledge graph is dynamic, uni-relational, unweighted, and undirected. The nodes denote authors, and an edge between two nodes at a particular timestamp denotes that the corresponding authors contributed to a research paper at that time. The time granularity is a year. The attribute value associated with each node is the number of citations received per year by the associated author on any paper they have written. Again, the data is present in the form of tuples as explained above. We use two versions of this dataset: a small version with 90 nodes and a large version with 20,000 nodes.
The citation dataset is curated from www.aminer.cn.
Multi-attributed Trade Graph (MTG). The graph in this dataset is a subset of the trade graph described above, and each node has multiple attribute values associated with it. One attribute is the Net Export Price Index, with individual commodities weighted by the ratio of net exports to total commodity trade. The other is the value of International Reserves and other foreign currency assets in millions of US dollars. Both form monthly time series.
Both the time series attributes are scraped from www.imf.org.
Attributed GDELT Graph (AGG). The knowledge graph in this case is derived from the Global Database of Events, Language, and Tone (GDELT). It is dynamic, directed, multi-relational, unweighted, and has multiple types of nodes. The nodes represent entities such as political leaders, organisations, and several others, each of which can be associated with a country. We modified this graph by adding nodes representing countries and connecting them to their respective entities through a self-defined edge type; 245 other edge types record events. We use this graph at the granularity of a month. Only the country nodes are associated with a time series attribute, which is taken to be the currency exchange rate (as described above).
Appendix B Experimental Settings
All models are written in PyTorch (https://pytorch.org/). We use the Adam optimizer for training. Gated recurrent units are used as the RNNs in all experiments. We use one GRU layer for the experiments involving knowledge graphs, while we use both one and two layers for the baselines. The default sequence length is used for input to the graph. We experiment with various values of λ and report the results of this study in Section 5.5. We first train DArtNet on the training dataset, then use checkpoints saved at various stages of training to obtain attribute prediction results on the validation data. From the validation results, we choose the best checkpoint and evaluate the test set on it; we report the attribute prediction results on the test data. All models are trained on an Nvidia GeForce GTX 1080 Ti.