1 Introduction
Finding paths is a basic problem in graph theory [DBLP:books/daglib/0030488], and several variants have been studied, including finding a shortest path between two vertices and finding a longest path in a graph. Recently, these problems have been considered for real-world data that require a description of vertex properties and of the dynamics of relations [DBLP:journals/bigdata/ThejaswiGL20]. For these data, a representation richer than the classical graph model has to be introduced, for example by associating labels or colors with vertices and by representing the evolution of relations with a temporal graph. In this latter model, edges are associated with timestamps that represent when an interaction occurred [holme2015modern].
In this paper we consider a problem that looks for a path in a temporal graph whose vertices are associated with colors. Given a set of colors, the problem asks for a temporal path whose vertices have distinct colors and that includes the maximum number of colors. A temporal path in a temporal graph is a path in which the timestamps of consecutive edges are strictly increasing, thus representing a path that does not violate the time constraint specified by the timestamps of the edges. The problem we consider is a variant of the one considered in [DBLP:journals/bigdata/ThejaswiGL20], which asks for a temporal path that exactly matches a multiset of colors (called a motif in [DBLP:journals/bigdata/ThejaswiGL20]). As outlined in [DBLP:journals/bigdata/ThejaswiGL20], this problem has several applications, for example in tour recommendation [DBLP:conf/ht/ChoudhuryFAGLY10, DBLP:conf/wsdm/GionisLPT14], where vertices correspond to interesting locations, colors represent activities available at those locations, and edges correspond to transportation links between different locations (a timestamp can represent, for example, a departure time). A set (or a multiset) of colors represents the activities a tourist may be interested in, and a path whose vertices have distinct colors is then a suggestion of activities that can be carried out while respecting the time constraints. Due to its structure, a temporal graph may not contain a temporal path that includes all the colors. Thus, a natural direction, which we consider in this paper, is to look for a temporal path that includes the maximum number of colors.
Related Works.
Given a (static) vertex-colored graph, the problem of finding a path whose vertices have distinct colors and that includes the maximum number of colors (called a tropical path) has recently been investigated in [TropicalPath]. In [TropicalPath], it is shown that, like the Longest Path problem, the problem is not approximable within a constant factor, unless P = NP, while hardness results or polynomial-time algorithms are given for several graph classes (bipartite chain graphs, threshold graphs, trees, block graphs, and proper interval graphs). A related problem on static graphs is that of finding a path whose vertices contain all the colors in a set and are all colored distinctly [DBLP:journals/jacm/AlonYZ95, DBLP:journals/tcs/KowalikL16].
Several variants of the problem of finding a temporal path in a temporal vertex-colored graph that matches a given multiset of colors (called a motif) have been introduced in [DBLP:journals/bigdata/ThejaswiGL20]. It is shown in [DBLP:journals/bigdata/ThejaswiGL20] that these variants are NP-complete, but fixed-parameter tractable when parameterized by the size of the motif.
Several problems related to finding paths in a temporal graph have been considered [DBLP:journals/pvldb/WuCHKLX14, DBLP:journals/tkde/WuCKHHW16]. A notable example is checking whether there exists a temporal path satisfying a waiting-time constraint, a problem that has been shown to be NP-complete [DBLP:conf/isaac/CasteigtsHMZ20]. A similar problem is temporal graph exploration [DBLP:journals/jcss/Erlebach0K21], which asks for a temporal walk that, starting at a given vertex, visits all vertices of a graph with the smallest arrival time. Other related problems ask for the deletion of vertices so that temporal paths connecting pairs of vertices are removed [DBLP:journals/jcss/ZschocheFMN20]. Some recent contributions have investigated the computational complexity of exploring a temporal graph whose underlying graph is a star and of finding an Eulerian walk in a temporal graph [DBLP:journals/jcss/AkridaMSR21, DBLP:conf/iwoca/BumpusM21, DBLP:conf/iwoca/MarinoS21].
Our Contribution.
In this paper, given a temporal vertex-colored graph, we consider the problem of finding a temporal path whose vertices have distinct colors and that includes the maximum number of colors (a problem called Max CPTG). First, we study the approximation complexity of Max CPTG and show in Section 3 that it is not approximable within factor , unless P = NP. Notice that the corresponding problem on static graphs (finding a tropical path) is only known to be not approximable within a constant factor, unless P = NP [TropicalPath].
In Section 4 we present a heuristic for Max CPTG, since our aim is to design a method that is applicable even for a large number of colors. Notice that the methods proposed in [DBLP:journals/bigdata/ThejaswiGL20] are for different variants of the problem, where all the colors of the motif have to be included in a solution. Moreover, the methods proposed in [DBLP:journals/bigdata/ThejaswiGL20] are fixed-parameter algorithms, where the parameter is the size of the motif; hence their running time is exponential in the size of the motif, and they can only process motifs of moderate size (up to colors are considered in [DBLP:journals/bigdata/ThejaswiGL20]). On the other hand, we point out that the methods in [DBLP:journals/bigdata/ThejaswiGL20] compute exact solutions, while our method is only a heuristic. In Section 5, we present an experimental evaluation of our heuristic on both synthetic and real-world graphs. We start in Section 2 by introducing some definitions and by defining the problem we are interested in. Some proofs are omitted due to space constraints (marked by ).
2 Preliminaries
We start this section by defining the discrete time domain over which a temporal graph is defined.
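A discrete time domain is a sequence of timestamps $T = t_1, t_2, \dots, t_\ell$, where each $t_i$ is an integer and $t_i < t_{i+1}$ for $1 \le i < \ell$. An interval $[t_a, t_b]$ over $T$, where $t_a, t_b \in T$ and $t_a \le t_b$, is the sequence of timestamps $t \in T$ such that $t_a \le t \le t_b$.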
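Two intervals $[t_a, t_b]$ and $[t_c, t_d]$ are disjoint if they do not share any timestamp, that is, $t_b < t_c$ or $t_d < t_a$. The concatenation $[t_a, t_b] \cdot [t_c, t_d]$ of two disjoint intervals is the sequence of timestamps obtained by merging them, that is, assuming without loss of generality that $t_b < t_c$, the timestamps of $[t_a, t_b]$ followed by the timestamps of $[t_c, t_d]$. Given a set of pairwise disjoint intervals $I_1, I_2, \dots, I_z$, listed in increasing order of time, we can define the concatenation of these intervals: $I_1 \cdot I_2 \cdots I_z$.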
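The interval notions above can be illustrated with a short Python sketch (Python being the language of the implementation described in Section 5). It is a minimal illustration under our own assumptions: an interval is stored as a pair `(lo, hi)` of timestamps, and the concatenation of pairwise disjoint intervals is returned as the list of their timestamps in increasing time order. The function names are hypothetical.

```python
from typing import List, Tuple

# An interval [t_a, t_b] over the time domain, stored as the pair (t_a, t_b).
Interval = Tuple[int, int]


def timestamps(interval: Interval, domain: List[int]) -> List[int]:
    """Timestamps t of the (sorted) time domain with t_a <= t <= t_b."""
    lo, hi = interval
    return [t for t in domain if lo <= t <= hi]


def disjoint(a: Interval, b: Interval) -> bool:
    """Two intervals are disjoint if one ends before the other starts."""
    return a[1] < b[0] or b[1] < a[0]


def concatenate(intervals: List[Interval], domain: List[int]) -> List[int]:
    """Concatenation of pairwise disjoint intervals, in increasing time order."""
    ordered = sorted(intervals)
    # After sorting by start, checking consecutive pairs suffices for pairwise disjointness.
    for first, second in zip(ordered, ordered[1:]):
        if not disjoint(first, second):
            raise ValueError(f"intervals {first} and {second} overlap")
    result: List[int] = []
    for interval in ordered:
        result.extend(timestamps(interval, domain))
    return result
```

For example, `concatenate([(6, 8), (1, 3)], list(range(1, 11)))` returns `[1, 2, 3, 6, 7, 8]`: the order in which the intervals are passed does not matter, since they are sorted by time first.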
We now present the definition of a temporal graph. We assume that the vertex set does not change over the time domain, that is, the vertex set is identical at each timestamp.
A temporal graph $G = (V, T, E)$ consists of:

a set $V$ of vertices;

a time domain $T$;

a set $E$ of temporal edges, where a temporal edge of $G$ is a triple $(u, v, t)$, with $u, v \in V$ and $t \in T$.
$E_t$ denotes the set of temporal edges active at timestamp $t \in T$, that is, $E_t = \{(u, v, t') \in E : t' = t\}$.
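To make the definition concrete, here is a minimal Python sketch of a temporal graph with a fixed vertex set and an index of the active edges at each timestamp. The class name `TemporalGraph` is hypothetical, and storing a temporal edge as a plain triple `(u, v, t)` (treated as undirected) is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A temporal edge (u, v, t); this sketch treats edges as undirected.
TemporalEdge = Tuple[Hashable, Hashable, int]


class TemporalGraph:
    """Minimal temporal graph G = (V, T, E) whose vertex set is fixed over time."""

    def __init__(self, vertices: Iterable[Hashable], time_domain: Iterable[int],
                 edges: Iterable[TemporalEdge]):
        self.vertices = set(vertices)
        self.time_domain: List[int] = sorted(time_domain)
        self.edges: List[TemporalEdge] = []
        self._active: Dict[int, List[TemporalEdge]] = defaultdict(list)
        allowed = set(self.time_domain)
        for u, v, t in edges:
            if u not in self.vertices or v not in self.vertices:
                raise ValueError(f"edge ({u}, {v}, {t}) uses an unknown vertex")
            if t not in allowed:
                raise ValueError(f"timestamp {t} is not in the time domain")
            self.edges.append((u, v, t))
            self._active[t].append((u, v, t))  # index edges by timestamp

    def active_edges(self, t: int) -> List[TemporalEdge]:
        """E_t: the temporal edges whose timestamp is exactly t."""
        return list(self._active.get(t, []))
```

Indexing edges by timestamp once makes each query for the active edge set a dictionary lookup rather than a scan of the whole edge set.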
Now, we introduce the definition of temporal path.
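Given a temporal graph $G = (V, T, E)$, a temporal path $P$ in $G$ is an alternating sequence of vertices and temporal edges $P = [v_1, (v_1, v_2, t_1), v_2, \dots, (v_{k-1}, v_k, t_{k-1}), v_k]$ such that:

$v_1, v_2, \dots, v_k$ are distinct vertices;

for each $i$, with $1 \le i \le k-1$, $(v_i, v_{i+1}, t_i) \in E_{t_i}$, with $t_i \in T$;

for each $i$, with $1 \le i < k-1$, it holds $t_i < t_{i+1}$.

Vertices $v_1$ and $v_k$ in $P$ are the start and end vertex of $P$. The length of $P$, denoted by $|P|$, is the number of vertices in $P$. We refer to Point 3 of Definition 2 as the time constraint of a temporal path.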
A vertexcolored temporal graph is defined by adding a coloring to the vertices of a temporal graph.
$(G, c)$ is a vertex-colored temporal graph, where $G = (V, T, E)$ is a temporal graph and $c : V \to C$ is a function that assigns a color from set $C$ to each vertex in $V$.
We can now define the concept of colorful set of vertices.
Given a vertex-colored temporal graph $(G, c)$, a set $V' \subseteq V$ is colorful if all the vertices in $V'$ have distinct colors.
A temporal path in a vertex-colored temporal graph is colorful if all its vertices have distinct colors. We are now able to define the problem we are interested in.
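The three conditions of a temporal path and the colorful property translate directly into a validity check. In this hypothetical sketch a path is given as its vertex list `v_1, ..., v_k` plus the list of edge timestamps `t_1, ..., t_{k-1}`, and temporal edges are treated as undirected, which is our assumption.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple


def is_temporal_path(vertices: Sequence[Hashable], times: Sequence[int],
                     edges: Iterable[Tuple[Hashable, Hashable, int]]) -> bool:
    """Check the conditions of a temporal path.

    The i-th temporal edge of the path is (v_i, v_{i+1}, t_i).
    """
    if len(times) != len(vertices) - 1:
        return False
    # Condition 1: the vertices are pairwise distinct.
    if len(set(vertices)) != len(vertices):
        return False
    edge_set: Set[Tuple[frozenset, int]] = {(frozenset((u, v)), t) for u, v, t in edges}
    for i, t in enumerate(times):
        # Condition 2: consecutive vertices are joined by a temporal edge at time t_i.
        if (frozenset((vertices[i], vertices[i + 1])), t) not in edge_set:
            return False
        # Condition 3 (time constraint): timestamps strictly increase.
        if i > 0 and times[i - 1] >= t:
            return False
    return True


def is_colorful(vertices: Sequence[Hashable], color: Dict[Hashable, str]) -> bool:
    """A vertex set (or path) is colorful if no two of its vertices share a color."""
    colors = [color[v] for v in vertices]
    return len(set(colors)) == len(colors)
```

The length of a path in the sense above is simply `len(vertices)`.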
Maximum Colorful Path in Temporal Graph (Max CPTG)
Input: A vertex-colored temporal graph $(G, c)$.
Output: A colorful temporal path $P$ in $G$
that includes the maximum number of colors
(that is, a colorful temporal path of maximum length).
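Since Max CPTG is hard to approximate (Section 3), an exact search is only viable on tiny instances, but it is useful as ground truth when testing a heuristic. The following sketch, with the hypothetical name `max_cptg_bruteforce` and edges treated as undirected, enumerates colorful temporal paths by depth-first search and keeps a longest one; its running time is exponential.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def max_cptg_bruteforce(vertices, edges, color):
    """Exhaustive search for a maximum colorful temporal path (tiny inputs only).

    Returns (path_vertices, path_times) of maximum length.
    """
    # Adjacency lists of (neighbour, timestamp); edges are treated as undirected.
    adj: Dict[Hashable, List[Tuple[Hashable, int]]] = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((v, t))
        adj[v].append((u, t))

    best: Tuple[List, List] = ([], [])

    def extend(path, times, used_colors):
        nonlocal best
        if len(path) > len(best[0]):
            best = (list(path), list(times))
        last_t = times[-1] if times else float("-inf")
        for w, t in adj[path[-1]]:
            # Time constraint, distinct vertices and distinct colors.
            if t > last_t and w not in path and color[w] not in used_colors:
                path.append(w)
                times.append(t)
                used_colors.add(color[w])
                extend(path, times, used_colors)
                path.pop()
                times.pop()
                used_colors.remove(color[w])

    for v in vertices:  # every vertex is a colorful temporal path of length 1
        extend([v], [], {color[v]})
    return best
```

For instance, with edges `a-b` at time 1, `b-c` at time 2, `c-d` at time 3 and `b-d` at time 1, and colors `a:1, b:2, c:3, d:1`, the search returns `(["a", "b", "c"], [1, 2])`, because extending to `d` would repeat color 1.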
3 Inapproximability of Max CPTG
In this section we prove that the Max CPTG problem cannot be approximated within factor , unless P = NP. We prove this result by giving an approximation-preserving reduction from the Maximum Independent Set problem (denoted by Max IS). For details on approximation-preserving reductions see [DBLP:books/daglib/0030297]. The Max IS problem, given a graph $G = (V, E)$, where $V = \{v_1, \dots, v_n\}$ and $E$ is the set of edges, asks for an independent set of maximum size (we recall that $I \subseteq V$ is an independent set if, for each $u, v \in I$, it holds that $\{u, v\} \notin E$).
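When experimenting with the reduction on toy inputs, the Max IS side can be checked by brute force. This is a minimal sketch with hypothetical helper names; the search tries larger subsets first and is exponential in the number of vertices.

```python
from itertools import combinations
from typing import Hashable, Iterable, Set, Tuple


def is_independent_set(candidate: Set[Hashable],
                       edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if no edge of G has both endpoints in the candidate set."""
    return not any(u in candidate and v in candidate for u, v in edges)


def max_independent_set(vertices, edges) -> Set[Hashable]:
    """Exhaustive Max IS for tiny graphs: return the first largest independent set found."""
    vertices, edges = list(vertices), list(edges)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_independent_set(set(subset), edges):
                return set(subset)
    return set()
```

On the path `a-b-c`, for example, `max_independent_set(["a", "b", "c"], [("a", "b"), ("b", "c")])` returns `{"a", "c"}`.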
Next, we describe our approximation-preserving reduction from Max IS to Max CPTG. Given an instance of Max IS, we define a corresponding vertex-colored temporal graph , which is an instance of Max CPTG (an overview of is given in Fig. 1).
For each , , contains a set of vertices: Furthermore, contains an additional set of vertices The vertex set of is defined as follows: .
The time domain consists of the concatenation of time intervals , where each , , is associated with vertex set . The idea is that only edges connecting vertices of are active in interval , except for the last timestamp. The interval , , is defined as follows:
Notice, for example, that and and so on. By construction, the intervals , , are disjoint. The time domain is then the concatenation of intervals , that is
The color function , is defined over the following set of colors:
Essentially, each color encodes an edge , with , while each color , , encodes the fact that is not adjacent to vertex . Notice that .
Now, we define the function . For the vertices in , with , is defined as follows:


, if and

, if and

, , if
Notice that , for each with , as we assume that does not contain self loops.
For the vertices of , the function is defined as follows: , .
Next, we define the set of temporal edges of . For each time interval , , contains a colorful temporal path induced by the vertices with . The temporal edges active in interval are defined as follows. At timestamp , with , ; notice that is the only active temporal edge of at timestamp .
The temporal path resulting from these edges is then:
Notice that, since by construction two intervals , , , are disjoint, the colorful temporal paths , are active in disjoint intervals.
The set also contains temporal edges that connect the colorful temporal paths , . At timestamp , , the following temporal edges belong to :

, with , such that edge

This completes the definition of the vertexcolored temporal graph . We prove now a property of .
Let be an instance of Max IS and let be the corresponding instance of Max CPTG. Then:

Each temporal path , with , is colorful

The vertices in temporal paths , , with and , have different colors.
Now, we show how to construct in polynomial time a solution of Max CPTG from a solution of Max IS.
Let be an instance of Max IS and let be the corresponding instance of Max CPTG. Given a solution of Max IS, we can construct in polynomial time a solution of Max CPTG of length at least . Consider an independent set of , where . Then define a solution of Max CPTG as follows. The temporal path includes the colored temporal paths in interval , , and the temporal colored path in interval . These colored paths are connected in by the following temporal edges: and , are connected by temporal edge with ; and are connected by temporal edge with .
Since , with , are not adjacent in , it follows from Lemma 3 that the vertices in and do not share any color. Since each vertex in has a color distinct from the other vertices in , it follows that is colorful. Furthermore, notice that by construction, since the paths , , are defined over disjoint intervals, is a temporal path. Finally, notice that consists of paths each of length , thus concluding the proof.
A solution of Max IS can be computed in polynomial time starting from a solution of Max CPTG.
Let be an instance of Max IS and let be the corresponding instance of Max CPTG. Given a solution of Max CPTG of length , we can construct in polynomial time an independent set of of size at least .
The inapproximability of Max CPTG follows from the two preceding lemmas and from the inapproximability of Max IS [DBLP:conf/stoc/Zuckerman06].
Max CPTG is not approximable within factor unless P = NP.
4 A Heuristic for Max CPTG
In this section, we present our efficient heuristic, called Colorful Temporal Path Local Search (CTPLS), for the Max CPTG problem. CTPLS consists of two phases: (1) a greedy preliminary step that computes an initial solution; (2) a local-search step that looks for possible improvements of the solution.
We start by describing the preliminary greedy step. Given a vertex-colored temporal graph, the step first computes a segmentation of the time domain into disjoint intervals of equal length. Then, in each interval, it greedily looks for a temporal edge to be added to the path computed so far. The path, denoted by $P$, is initialized as a temporal edge in the first interval. In the following intervals, the greedy step looks for a temporal edge that connects the last vertex of $P$ to a vertex that is not included in $P$.
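A minimal sketch of this greedy step is shown below; it is an illustration, not the authors' implementation. The text does not specify the number of intervals or how ties between candidate edges are broken, so both are assumptions here: `num_intervals` is a hypothetical parameter, each interval is scanned in time order and the first admissible edge is taken, the path is initialized in the first interval that offers an edge between differently colored vertices, and an added vertex must bring a new color so that the path stays colorful. Edges are treated as undirected.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def segment_time_domain(domain: List[int], num_intervals: int) -> List[List[int]]:
    """Split the sorted time domain into at most num_intervals consecutive blocks
    of (almost) equal length."""
    domain = sorted(domain)
    size = max(1, -(-len(domain) // num_intervals))  # ceiling division
    return [domain[i:i + size] for i in range(0, len(domain), size)]


def greedy_initial_path(edges, color: Dict[Hashable, int], domain: List[int],
                        num_intervals: int) -> Tuple[List[Hashable], List[int]]:
    """Greedy preliminary step: add at most one temporal edge per interval."""
    by_time: Dict[int, List[Tuple[Hashable, Hashable]]] = defaultdict(list)
    for u, v, t in edges:
        by_time[t].append((u, v))

    path: List[Hashable] = []
    times: List[int] = []
    for block in segment_time_domain(domain, num_intervals):
        added = False
        for t in block:                       # scan the interval in time order
            if times and t <= times[-1]:      # respect the time constraint
                continue
            used = {color[x] for x in path}
            for u, v in by_time[t]:
                if not path:
                    # Initialise P with an edge between differently colored vertices.
                    if color[u] != color[v]:
                        path, times = [u, v], [t]
                        added = True
                        break
                    continue
                # Other endpoint of an edge incident to the last vertex of P.
                w = v if u == path[-1] else (u if v == path[-1] else None)
                if w is not None and color[w] not in used:
                    path.append(w)
                    times.append(t)
                    added = True
                    break
            if added:
                break                         # at most one edge per interval
    return path, times
```

With a time domain `1..6` split into three intervals and edges `a-b@1`, `b-c@3`, `c-a@5`, `c-d@6` (all vertices colored differently), the sketch builds `a, b, c` in the first two intervals, rejects `c-a@5` because `a` is already on the path, and then takes `c-d@6`.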
Then CTPLS applies a local-search strategy, consisting of two possible modifications of the current path (unless it already contains all the colors).

LS1 (Edge replacement): starting from the first edge of the current path $P$, a temporal edge $(u, v, t)$ is possibly replaced with two temporal edges $(u, w, t_1)$ and $(w, v, t_2)$, with $t_1 < t_2$; notice that vertex $w$ is not already in $P$ and must be colored differently from the vertices already in $P$; furthermore, all the temporal edges of the new path must satisfy the time constraint.

LS2 (Vertex replacement): starting from the first vertex of the current path $P$, it possibly replaces a vertex $v$ in $P$, together with the temporal edges incident to $v$, with two vertices $x$ and $y$ and three temporal edges, so that the new path satisfies the time constraint. Notice that $x$ and $y$ must not be in $P$ and must have colors different from those of the vertices of $P$ (except for the replaced vertex $v$).
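The two local-search moves can be sketched as follows, again as an illustration rather than the authors' code. Assumptions: edges are undirected, each move makes a single left-to-right pass and applies the first admissible replacement it finds, and LS2 is only tried on internal vertices, which have two incident edges. Each move checks that the new vertices bring unused colors and that the new timestamps fit strictly between the timestamps of the neighbouring edges.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

INF = float("inf")


def build_adjacency(edges) -> Dict[Hashable, List[Tuple[Hashable, int]]]:
    """Map each vertex to its (neighbour, timestamp) pairs; edges treated as undirected."""
    adj: Dict[Hashable, List[Tuple[Hashable, int]]] = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((v, t))
        adj[v].append((u, t))
    return adj


def ls1_edge_replacement(path, times, adj, color):
    """LS1: replace (v_i, v_{i+1}, t_i) by (v_i, w, t1) and (w, v_{i+1}, t2), t1 < t2."""
    path, times = list(path), list(times)
    i = 0
    while i < len(times):
        lo = times[i - 1] if i > 0 else -INF             # timestamp of previous edge
        hi = times[i + 1] if i + 1 < len(times) else INF  # timestamp of next edge
        used = {color[x] for x in path}
        for w, t1 in adj[path[i]]:
            if color[w] in used or not lo < t1 < hi:
                continue
            t2 = min((t for x, t in adj[w] if x == path[i + 1] and t1 < t < hi),
                     default=None)
            if t2 is not None:
                path.insert(i + 1, w)      # w is new and has a new color
                times[i:i + 1] = [t1, t2]  # one edge becomes two edges
                break
        i += 1
    return path, times


def ls2_vertex_replacement(path, times, adj, color):
    """LS2: replace an internal vertex v_i (and its two edges) by x, y and three edges."""
    path, times = list(path), list(times)
    i = 1
    while i < len(path) - 1:
        v, prev, nxt = path[i], path[i - 1], path[i + 1]
        lo = times[i - 2] if i >= 2 else -INF
        hi = times[i + 1] if i + 1 < len(times) else INF
        used = {color[z] for z in path if z != v}  # colors kept after removing v
        found = None
        for x, ta in adj[prev]:
            if x == v or color[x] in used or not lo < ta < hi:
                continue
            for y, tb in adj[x]:
                if y == v or color[y] in used or color[y] == color[x] or not ta < tb < hi:
                    continue
                tc = min((t for z, t in adj[y] if z == nxt and tb < t < hi), default=None)
                if tc is not None:
                    found = (x, y, ta, tb, tc)
                    break
            if found:
                break
        if found:
            x, y, ta, tb, tc = found
            path[i:i + 1] = [x, y]              # v_i becomes x, y
            times[i - 1:i + 1] = [ta, tb, tc]   # two edges become three edges
        i += 1
    return path, times
```

Both moves strictly increase the length of the path, and since a colorful path cannot contain more vertices than there are colors, repeated application terminates.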
5 Experimental Results
In this section, we present an experimental evaluation of CTPLS on synthetic and real networks. The CTPLS heuristic described in Section 4 is implemented in Python 3.7 using the NetworkX package for managing networks [hagberg2008exploring]. We performed the experiments on a MacBook Pro (macOS 11.4) with a 2.9 GHz Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 RAM.
Synthetic Networks.
In the first part of our experimental evaluation, we analyse the performance of CTPLS on synthetic datasets. We start by describing the synthetic datasets, then we discuss the results of CTPLS.
Datasets. Each synthetic graph is built as follows. First, we generate a temporal graph consisting of vertices over timestamps, whose topology is based on one of the following models: Erdős-Rényi (ER) with parameter , Erdős-Rényi with parameter , and Barabási-Albert (BA) with parameter equal to . As many vertices as there are colors are then chosen at random, each of them is assigned a distinct color, and a temporal path connecting them is added. This ensures that each synthetic graph contains an optimal solution that includes all the colors, thus allowing us to compare the solutions returned by CTPLS with an optimal one. Then each of the remaining vertices of the graph is assigned a color chosen uniformly at random from the color set. We consider the following sizes of the color set: 10, 20, 30 and 50 colors. For each graph model and for each size of the color set, we generated 20 independent synthetic graphs.
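A dependency-free sketch of how such an instance with a planted optimal path could be generated is given below. The paper uses NetworkX; here the ER topology is drawn directly with Python's `random` module (`networkx.barabasi_albert_graph` could supply a BA topology instead). The function and parameter names, and the rule that each static edge receives one uniformly random timestamp, are our assumptions.

```python
import random
from typing import Dict, List, Tuple


def synthetic_instance(n: int, p: float, num_colors: int, num_timestamps: int,
                       seed: int = 0):
    """Random vertex-colored temporal graph with a planted colorful temporal path.

    Requires n >= num_colors and num_timestamps >= num_colors - 1.
    """
    rng = random.Random(seed)
    vertices = list(range(n))
    # Erdős-Rényi topology G(n, p); each static edge gets one uniform timestamp.
    edges: List[Tuple[int, int, int]] = [
        (u, v, rng.randint(1, num_timestamps))
        for u in range(n) for v in range(u + 1, n) if rng.random() < p
    ]
    # Plant a temporal path through num_colors random vertices, with distinct
    # colors and strictly increasing (distinct, sorted) timestamps.
    planted = rng.sample(vertices, num_colors)
    path_times = sorted(rng.sample(range(1, num_timestamps + 1), num_colors - 1))
    edges += [(planted[i], planted[i + 1], path_times[i]) for i in range(num_colors - 1)]
    color: Dict[int, int] = {v: i for i, v in enumerate(planted)}
    # Remaining vertices: a color drawn uniformly at random from the same color set.
    for v in vertices:
        if v not in color:
            color[v] = rng.randrange(num_colors)
    return vertices, edges, color, planted, path_times
```

The planted path has one vertex per color, so its length equals the size of the color set, which is exactly what makes the optimum known on these instances.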
Outcome. We present in Table 1 the results of our experimental evaluation on the synthetic datasets. In particular, we report the minimum, maximum, average and standard deviation of the length of the solutions returned by CTPLS over the 20 instances for each color set and each graph model. Furthermore, we report the average running time (in seconds). As reported in Table 1, the performance of CTPLS degrades as the number of colors increases. For the BA-based graphs, for example, for a set of 10 colors the returned solutions contain on average 9.1 of the 10 colors, while for 50 colors they contain on average 17.2 colors out of 50. The experimental results also show that the performance of CTPLS depends on the specific graph model. For the ER model with , the solutions returned are within of the optimal solutions (for colors). The performance is worse on ER with , within of the optimal solution (for colors). For the BA model, the solutions returned by CTPLS are close to the optimum only for the case of 10 colors, and contain on average 13.1, 14.25 and 17.2 colors for 20, 30 and 50 colors, respectively. It has to be pointed out that the Max CPTG problem is hard to approximate, as shown in Section 3, so it is not surprising that for some datasets the lengths of the solutions returned by CTPLS are not close to the optimum.
The method is always fast on synthetic datasets, requiring at most 0.68 seconds of average running time (ER model with 50 colors).
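Table 1: Results of CTPLS on the synthetic datasets. "path" is the number of colors in the returned solution (minimum, maximum, average and standard deviation over the 20 instances); "time" is the average running time in seconds.

| BA | 10 colors: path | 10 colors: time | 20 colors: path | 20 colors: time | 30 colors: path | 30 colors: time | 50 colors: path | 50 colors: time |
|---|---|---|---|---|---|---|---|---|
| Min | 8 | | 10 | | 12 | | 14 | |
| Max | 10 | | 16 | | 17 | | 21 | |
| Average | 9.1 | 0.06 | 13.1 | 0.08 | 14.25 | 0.11 | 17.2 | 0.13 |
| SD | 0.79 | | 1.41 | | 1.65 | | 2.28 | |

| ER | 10 colors: path | 10 colors: time | 20 colors: path | 20 colors: time | 30 colors: path | 30 colors: time | 50 colors: path | 50 colors: time |
|---|---|---|---|---|---|---|---|---|
| Min | 9 | | 11 | | 9 | | 5 | |
| Max | 10 | | 19 | | 25 | | 30 | |
| Average | 9.85 | 0.09 | 16.75 | 0.15 | 18.45 | 0.14 | 14.3 | 0.11 |
| SD | 0.37 | | 1.86 | | 4.67 | | 8.35 | |

| ER | 10 colors: path | 10 colors: time | 20 colors: path | 20 colors: time | 30 colors: path | 30 colors: time | 50 colors: path | 50 colors: time |
|---|---|---|---|---|---|---|---|---|
| Min | 10 | | 19 | | 25 | | 38 | |
| Max | 10 | | 20 | | 30 | | 46 | |
| Average | 10 | 0.24 | 19.8 | 0.35 | 28.3 | 0.66 | 42.4 | 0.68 |
| SD | 0 | | 0.41 | | 1.17 | | 1.82 | |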
Real Networks.
In the second part of our experimental evaluation, we analyse the performance of CTPLS on four realworld datasets.
Datasets. We consider four different real-world temporal graphs taken from SNAP [snapnets] for testing CTPLS: College messages (CollegeMsg, http://snap.stanford.edu/data/CollegeMsg.html), Email EU core (email-Eu-core-temporal, http://snap.stanford.edu/data/email-Eu-core-temporal.html), Bitcoin alpha (soc-sign-bitcoin-alpha, http://snap.stanford.edu/data/soc-sign-bitcoin-alpha.html) and Bitcoin otc (soc-sign-bitcoin-otc, http://snap.stanford.edu/data/soc-sign-bitcoin-otc.html). These temporal graphs are not colored; hence, following the same approach as [DBLP:journals/bigdata/ThejaswiGL20], we assigned colors uniformly at random, from a set of 30 colors and from a set of 50 colors. Since the length of an optimal solution of Max CPTG on these graphs is unknown, in order to evaluate the results of CTPLS we consider two variants of each real-world temporal graph: the original graph (denoted by NOOP) and a modified temporal graph, called YESOP, obtained by adding a colorful temporal path that contains every color of the color set. This latter temporal graph therefore contains an optimal solution whose length equals the number of colors.
The first dataset, CollegeMsg, consists of private messages sent on an online social network at the University of California, Irvine, where temporal edges represent private messages sent between users at a given time. The dataset contains 59835 temporal interactions, 1899 vertices and a time domain of length . The email-Eu-core-temporal dataset is generated from incoming and outgoing emails between members of a large European research institution, where temporal edges represent emails sent between users at a given time. This dataset contains 332334 temporal interactions, 986 vertices and a time domain of length . soc-sign-bitcoin-alpha and soc-sign-bitcoin-otc are datasets of members who trade Bitcoin on the platforms Bitcoin Alpha and Bitcoin OTC, respectively, and who rate each other to prevent transactions with risky users. A temporal edge represents a rating of one member given by another member at a given time. soc-sign-bitcoin-alpha contains 24186 temporal interactions, 3783 vertices and a time domain of length 1647, while soc-sign-bitcoin-otc contains 35592 temporal interactions, 5881 vertices and a time domain of length .
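The random coloring and the NOOP/YESOP construction described above can be sketched as follows. This is hypothetical: the parser assumes whitespace-separated `SRC DST TIMESTAMP` lines (other SNAP files may use a different layout), and since the text does not say how the added path is embedded, this sketch reuses existing vertices, recolored with distinct colors, and existing timestamps in increasing order.

```python
import random
from typing import Dict, Iterable, List, Tuple

TemporalEdge = Tuple[str, str, int]


def load_temporal_edges(lines: Iterable[str]) -> List[TemporalEdge]:
    """Parse 'SRC DST TIMESTAMP' lines, skipping blank lines and '#' comments."""
    edges: List[TemporalEdge] = []
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        src, dst, ts = line.split()[:3]
        edges.append((src, dst, int(ts)))
    return edges


def no_op_and_yes_op(edges: List[TemporalEdge], num_colors: int, seed: int = 0):
    """Uniform random coloring (NOOP) plus a copy with a planted colorful path (YESOP)."""
    rng = random.Random(seed)
    vertices = sorted({x for u, v, _ in edges for x in (u, v)})
    no_color: Dict[str, int] = {v: rng.randrange(num_colors) for v in vertices}
    # Assumption: the planted path reuses existing vertices and existing timestamps.
    planted = rng.sample(vertices, num_colors)
    stamps = sorted(rng.sample(sorted({t for _, _, t in edges}), num_colors - 1))
    yes_color = dict(no_color)
    for i, v in enumerate(planted):
        yes_color[v] = i  # distinct colors along the planted path
    yes_edges = edges + [(planted[i], planted[i + 1], stamps[i])
                         for i in range(num_colors - 1)]
    return no_color, yes_color, yes_edges
```

Keeping both colorings side by side makes it easy to run the heuristic on the NOOP and YESOP variants of the same network with the same random seed.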
Outcome. In Table 2 we report the number of colors included in the solutions returned by CTPLS and the running time (in minutes) for the two groups of real datasets we considered (NOOP and YESOP). As shown in Table 2, for the NOOP networks with 30 colors, CTPLS found in the worst case a path containing 20 out of 30 colors (soc-sign-bitcoin-alpha) and in the best case an optimal solution (email-Eu-core-temporal). For the other two networks, CollegeMsg and soc-sign-bitcoin-otc, CTPLS found suboptimal solutions that contain a significant number of colors, 27 and 25 colors out of 30, respectively.
For the YESOP networks with 30 colors, we do not report the result for email-Eu-core-temporal, as CTPLS already found an optimal solution for this dataset in the NOOP network. The results are not significantly different from those on the corresponding NOOP datasets. In one case (CollegeMsg), CTPLS found a path with the same number of colors as for the corresponding NOOP network. In another case (soc-sign-bitcoin-otc), CTPLS found a larger number of colors (27 instead of 25 out of 30), while in the remaining case (soc-sign-bitcoin-alpha) CTPLS found a slightly smaller number of colors (19 instead of 20 out of 30). This decrease is due to the fact that CTPLS considers a temporal edge that belongs to the YESOP instance but not to the NOOP instance, and this prevents CTPLS from including all the vertices of the solution found on the NOOP instance.
For the NOOP networks with 50 colors, CTPLS found in the worst case a path containing 36 colors (soc-sign-bitcoin-alpha) and in the best case 49 out of 50 colors (email-Eu-core-temporal). For the other two networks, CollegeMsg and soc-sign-bitcoin-otc, CTPLS found 38 and 40 colors out of 50, respectively. For the networks with 50 colors, CTPLS found the same number of colors in both the YESOP and NOOP networks.
The experiments on real-world datasets confirm that CTPLS is able to produce suboptimal results even for networks with 50 colors. For the networks with 30 colors, CTPLS found solutions with at least 19 out of 30 colors (soc-sign-bitcoin-alpha) and in one case an optimal solution. For the networks with a larger number of colors (50 colors), CTPLS found solutions with at least 36 and at most 49 out of 50 colors. Except for soc-sign-bitcoin-alpha, the quality of the solutions returned by CTPLS slowly degrades going from 30 colors to 50 colors. However, this deterioration is less pronounced than on the synthetic datasets.
As for the running time, CTPLS is able to find a solution of Max CPTG in reasonable time, even for a set of 50 colors (notice that this value is larger than what has been considered in [DBLP:journals/bigdata/ThejaswiGL20]). The running time varies considerably depending on the size of the temporal network and, in particular, on the length of the time domain. Indeed, CTPLS has the highest running time on CollegeMsg and email-Eu-core-temporal, whose time domains consist of and timestamps, respectively. On the other hand, CTPLS requires at most 0.51 minutes on soc-sign-bitcoin-alpha (NOOP, 50 colors), which has the smallest time domain (1647 timestamps).
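Table 2: Results of CTPLS on the real-world datasets. "path" is the number of colors in the returned solution; "time" is the running time in minutes; n/a marks the result not reported (see text).

| NOOP | 30 colors: path | 30 colors: time | 50 colors: path | 50 colors: time |
|---|---|---|---|---|
| CollegeMsg | 27 | 144.71 | 38 | 27.91 |
| email-Eu-core-temporal | 30 | 52.60 | 49 | 129.05 |
| soc-sign-bitcoin-alpha | 20 | 0.34 | 36 | 0.51 |
| soc-sign-bitcoin-otc | 25 | 10.98 | 40 | 10.35 |

| YESOP | 30 colors: path | 30 colors: time | 50 colors: path | 50 colors: time |
|---|---|---|---|---|
| CollegeMsg | 27 | 149.68 | 38 | 29.09 |
| email-Eu-core-temporal | n/a | n/a | 49 | 148.33 |
| soc-sign-bitcoin-alpha | 19 | 0.16 | 36 | 0.22 |
| soc-sign-bitcoin-otc | 27 | 6.34 | 40 | 9.82 |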
6 Conclusion
In this paper, we have introduced a problem called Max CPTG, which asks for a colorful temporal path of maximum length in a vertex-colored temporal graph. We have studied the approximation complexity of the problem and provided an inapproximability lower bound. Then we have presented a heuristic (CTPLS) based on a greedy preliminary step and on local search. We have provided an experimental evaluation on both synthetic and real-world graphs. The experimental results on synthetic datasets have shown that CTPLS returns near-optimal solutions for a set of 10 colors, while its performance degrades when the number of colors increases. On the real-world datasets, the algorithm is in many cases able to find suboptimal solutions in reasonable time, even for networks with 50 colors, despite the fact that Max CPTG is hard to approximate.
Future works include the application of CTPLS to larger temporal networks. It would also be interesting to consider whether it is possible to apply the algebraic approach proposed in [DBLP:journals/bigdata/ThejaswiGL20] to the Max CPTG problem and compare its performance with CTPLS.
References
Appendix
Proof of Lemma 3
Let be an instance of Max IS and let be the corresponding instance of Max CPTG. Then:

Each temporal path , with , is colorful

The vertices in temporal paths , , with and , have different colors.
1. The property follows from the fact that each vertex of , , is associated with a distinct color.
2. By definition of coloring , since , it follows that and and, by construction, . Since by construction all the other vertices of and have different colors, it follows that the lemma holds.
Proof of Lemma 3
Let be an instance of Max IS and let be the corresponding instance of Max CPTG. Given a solution of Max CPTG of length , we can construct in polynomial time an independent set of of size at least .

Given a colorful temporal path in of length , consider the temporal paths , with , in . We claim that the last vertex of is .
If the last vertex of is , for some with , we can compute a path such that by adding the temporal path that starts from vertex and ends in . By construction, is a temporal path, as this modification does not violate the time constraint. Furthermore, is colorful, since is colorful and each vertex , is assigned a color distinct from the other vertices of . Thus is colorful.
If the last vertex of is , similarly as the previous case, we can compute a temporal colorful path , with , by adding the path to .
If the last vertex of is , for some with and some with , we can compute a path with , as follows: (1) we remove from the temporal path connecting vertices , with , and (2) we add the colorful path . Notice that , since contains vertices. Notice that it is always possible to add path , since there is a temporal edge , where the last temporal edge in path is , hence the modification does not violate the time constraint. Since each vertex , , has color distinct from the other vertices of , it follows that is colorful.
We claim now that contains at least temporal paths , . Assume that this is not the case; then, by construction, the temporal colorful path contains fewer than paths , with , and path . Since by construction , with , it follows that .
Now, consider two paths , in , with . Since is colorful, it follows that the vertices of and are associated with different colors. Then , otherwise the two vertices and in , , respectively, would both be assigned the same color . It follows that we can define an independent set of as follows:
Notice that since , it follows that is an independent set of size at least , thus concluding the proof.
Proof of Theorem 3
Max CPTG is not approximable within factor unless P = NP. We show now that the reduction we have described is indeed an approximation preserving reduction. Denote the value of an optimal solution of Max CPTG (Max IS, respectively) by (, respectively); denote the value of an approximate solution of Max CPTG (Max IS, respectively) by (, respectively). Next, consider the approximation factor of Max CPTG, that is
By Lemma 3, it follows that . Thus
By Lemma 3, given an approximated solution of of size , we can compute in polynomial time a solution of Max IS of size at least . It follows that . Thus
Since we can assume that it follows that
Since is not approximable within factor , for any unless P = NP [DBLP:conf/stoc/Zuckerman06], it follows that
By construction, , hence we have that
thus concluding the proof.