1 Introduction
The popularization of social media and online social networking has given political parties, small and large corporations, celebrities, and ordinary people a platform to build, reach and broadcast information to their own audience. For example, political leaders use social media to present their character and personalize their message in hopes of tapping younger voters (footnote 1: http://www.nytimes.com/2012/10/08/technology/campaignsusesocialmediatolureyoungervoters.html); corporations increasingly rely on social media for a variety of tasks, from brand awareness to marketing and customer service [7]; celebrities leverage social media to bring awareness to themselves and strengthen their fans’ loyalty (footnote 2: http://www.wsj.com/articles/whatcelebritiescanteachcompaniesaboutsocialmedia1444788220); and ordinary people post about their lives and express their opinions to gain recognition from a mix of close friends and acquaintances (footnote 3: http://www.pewinternet.org/topics/socialnetworking/). However, social media users often follow hundreds of broadcasters and receive information at a rate far higher than their cognitive ability to process it [13]. As a result, many broadcasters share a sizable portion of their followers and constantly compete for these followers’ attention.
In this context, these followers’ attention becomes a scarce commodity of great value [8], and broadcasters would like to capture a good share of it so that the content they post is noticed and possibly liked or shared. As a consequence, there are myriads of articles and blog entries about the best times to broadcast information in social media and social networking, as well as data analytics tools to find these times (footnote 4: http://www.huffingtonpost.com/catrionapollard/thebesttimestoposton_b_6990376.html; footnote 5: http://blog.klout.com/2015/07/whensthebesttimetopostonsocial/). However, the best time to post on social media depends on a variety of factors, often specific to the broadcaster in question, such as their followers’ daily and weekly behavior patterns, their location or timezone, and the number of broadcasters and volume of information competing for attention in these followers’ feeds (be it in the form of a Twitter user’s timeline, a Facebook user’s wall or an Instagram user’s feed). Therefore, the problem of finding the best times to broadcast messages and elicit attention (be it views, likes or shares), in short, the when-to-post problem, requires careful reasoning and smart algorithms, which have been largely nonexistent until very recently [19].
In this paper, we develop a novel framework for the when-to-post problem, where we measure the attention or visibility gained by a broadcaster as the time during which at least one of her posts is among the most recent stories received in her followers’ feeds. A desirable property of this time-based visibility measure is that it is easy to estimate from real data. To measure the visibility achieved by a particular deployed broadcasting strategy, one only needs a separate held-out set of the followers’ feeds, independently of the broadcasted content. This is in contrast to other measures based on, e.g., the number of likes or shares caused by a broadcasting strategy. These latter measures are difficult to estimate from real data and often require actual interventions, since they depend on confounding factors such as the follower’s reaction to the post content [6], whose effect is difficult to model accurately [5].

More specifically, we model users’ feeds and posts as discrete events occurring in continuous time using the framework of temporal point processes. Our model explicitly characterizes the continuous time interval between posts by means of conditional intensity functions [1]. Under such a continuous-time model, choosing a strategy for a broadcaster becomes a problem of designing the conditional intensity of her posting events. We derive a novel formula that links the conditional intensity of an arbitrary broadcaster with her visibility in her followers’ feeds. Interestingly, we show that the average visibility is concave in the space of (piecewise) smooth intensity functions. Based on this result, we propose a convex optimization framework to address a diverse range of visibility shaping tasks given budget constraints. Our framework allows us to conduct fine-grained control of a broadcaster’s visibility across her followers. For instance, it can steer the visibility so that some time intervals are favored over others, e.g., times when the broadcaster’s followers are online. In addition to the novel framework, we develop an efficient gradient-based optimization algorithm, which allows us to find optimal broadcast intensities for a variety of visibility shaping tasks in a matter of milliseconds. Finally, we experiment on a large real-world dataset gathered from Twitter and show that our framework can consistently make broadcasters’ posts more visible than alternatives.
Related work. The work most closely related to ours is by Spasojevic et al. [19], who introduced the when-to-post problem. In their work, they first perform an empirical study on the best times to post in Twitter and Facebook by analyzing more than a billion messages and responses. Then, they design several heuristics to (independently) pinpoint the times that elicited the greatest number of responses in a training set and show that these times also lead to more responses in a held-out set. In our work, we measure attention by means of visibility, a measure that is not confounded with the message content and can be accurately evaluated on a held-out set, and then develop a convex optimization framework to design complete broadcasting strategies that are provably optimal.

There has been an increasing number of empirical studies on understanding attention and information overload on social and information networks [2, 14, 16, 13]. The common theme is to investigate whether there is a limit on the number of ties (e.g., friends, followees or phone contacts) people can maintain, how people distribute attention across them, and how attention influences the propagation of information. In contrast, in this work, we focus on optimizing a social media user’s broadcasting strategy to capture the greatest attention from their followers.
Our work also relates to the influence maximization problem, extensively studied in recent years [18, 15, 4, 10], which aims to find a set of nodes in a social network whose initial adoption of a certain idea or product can trigger the largest expected number of follow-ups. In this line of work, the goal is to find these influential users, not the best times for these users to broadcast their messages, which is our goal here. Only very recently, Farajtabar et al. [11] have developed a convex optimization framework to find broadcasting strategies. However, their focus is on steering the overall activity in the network to a certain state by incentivizing a few influential users. In contrast, we focus on maximizing visibility as measured on the feeds of a broadcaster’s audience.
2 Background on Point Processes
A temporal point process is a stochastic process whose realization consists of a list of discrete events localized in time, $\{t_i\}$ with $t_i \in \mathbb{R}^+$ and $i \in \mathbb{Z}^+$. Many different types of data produced in online social networks can be represented as temporal point processes, such as the times of tweets, retweets or likes in Twitter. A temporal point process can be equivalently represented as a counting process, $N(t)$, which records the number of events before time $t$. Then, in an infinitesimally small time window $dt$ around time $t$, the number of observed events is
$$dN(t) = \sum_{i} \delta(t - t_i)\, dt, \tag{1}$$
and hence $N(t) = \int_0^t dN(s)$, where $\delta(t)$ is a Dirac delta function. It is often assumed that only one event can happen in a small window of size $dt$, and hence $dN(t) \in \{0, 1\}$.
An important way to characterize temporal point processes is via the intensity function, a stochastic model for the time of the next event given all the times of previous events. The intensity function $\lambda(t)$ (intensity, for short) gives the probability of observing an event in a small window $[t, t + dt)$, i.e.,
$$\lambda(t)\, dt = \mathbb{P}\{\text{event in } [t, t + dt)\}. \tag{2}$$
Based on the intensity, one can obtain the expectation of the number of events in the windows $[t, t + dt)$ and $[0, t)$, respectively, as
$$\mathbb{E}[dN(t)] = \lambda(t)\, dt \quad \text{and} \quad \mathbb{E}[N(t)] = \int_0^t \lambda(\tau)\, d\tau. \tag{3}$$
There is a wide variety of functional forms for the intensity in the growing literature on social activity modeling using point processes, which are often designed to capture the phenomena of interest. For example, retweets have been modeled using multidimensional Hawkes processes [11, 22], new network links have been predicted using survival processes [21, 12], and daily and weekly variations on message broadcasting intensities have been captured using inhomogeneous Poisson processes [17].
In this work, since we are interested in optimizing message broadcasting intensities, we use inhomogeneous Poisson processes, whose intensity is a time-varying function $\lambda(t)$.
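To make the notion of an inhomogeneous Poisson process concrete, the following minimal sketch draws event times by thinning: candidates are generated from a homogeneous process with a rate that upper-bounds the intensity, and each candidate is kept with probability equal to the ratio of the two. The sinusoidal daily profile, the bound `lam_max`, and all function names are illustrative assumptions, not part of the original work.

```python
import math
import random


def sample_inhomogeneous_poisson(intensity, t_end, lam_max, rng):
    """Sample event times on [0, t_end) by thinning.

    `intensity` must satisfy 0 <= intensity(t) <= lam_max on [0, t_end).
    """
    events = []
    t = 0.0
    while True:
        # Next candidate from a homogeneous Poisson process of rate lam_max.
        t += rng.expovariate(lam_max)
        if t >= t_end:
            return events
        # Keep the candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max <= intensity(t):
            events.append(t)


def daily_intensity(t):
    # Hypothetical posting rate (events per hour) with a 24-hour period.
    return 2.0 + 1.5 * math.sin(2.0 * math.pi * t / 24.0)


rng = random.Random(0)
n_days = 200
events = sample_inhomogeneous_poisson(daily_intensity, 24.0 * n_days, 3.5, rng)

# Over whole periods the sine term integrates to zero, so the expected
# number of events is the baseline rate times the window length.
expected_count = 2.0 * 24.0 * n_days
```

Comparing `len(events)` with `expected_count` gives a quick sanity check on the identity that the expected number of events equals the integral of the intensity.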
3 From Intensities to Visibility
In this section, we will present our model for the posting times of broadcasters and the feed story arrival times of followers using point processes parameterized by intensity functions. Based on these models, we will then define our visibility measure, and derive a novel link between the visibility measure and the intensity functions of a broadcaster and her followers.
Representation of broadcast and feed. Given a directed social network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $m = |\mathcal{V}|$ users, we assume that each user can be both broadcaster and follower. We then use two sets of counting processes to model each user’s activity: the first set for the user’s broadcasting activity, and the second set for the user’s feed activity.
More specifically, we represent the broadcasting times of the users as a set of counting processes denoted by a vector $\boldsymbol{N}(t)$, in which the $i$-th dimension, $N_i(t)$, counts the number of messages user $i$ broadcasted up to but not including time $t$. Then, we can characterize the message rate of these users using their corresponding intensities
$$\mathbb{E}[d\boldsymbol{N}(t)] = \boldsymbol{\mu}(t)\, dt. \tag{4}$$
Furthermore, given the adjacency matrix $\boldsymbol{A} \in \{0, 1\}^{m \times m}$ corresponding to the social network $\mathcal{G}$, where $A_{ij} = 1$ indicates that $j$ follows $i$, and $A_{ij} = 0$ otherwise, we can represent the feed story arrival times of the users as a sum of the set of broadcasting counting processes. That is,
$$\boldsymbol{M}(t) = \boldsymbol{A}^{\top} \boldsymbol{N}(t), \tag{5}$$
which essentially aggregates for each user the counting processes of the broadcasters followed by this user. Then, we can characterize the feed rates using intensity functions
$$\mathbb{E}[d\boldsymbol{M}(t)] = \boldsymbol{\gamma}(t)\, dt, \tag{6}$$
where $\boldsymbol{\gamma}(t) = \boldsymbol{A}^{\top} \boldsymbol{\mu}(t)$.
Finally, from the perspective of a pair of broadcaster (or user) $i$ and her follower $j$, it is useful to define the feed rate of $j$ due to other broadcasters (or users) followed by $j$ as
$$\gamma_{j \setminus i}(t) = \gamma_j(t) - \mu_i(t), \tag{7}$$
where we assume if does not follow , .
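As a small illustration of how feed rates aggregate broadcast rates, the sketch below computes, at a fixed time, each user's feed rate as the sum of the rates of the broadcasters she follows, and the feed rate of a follower due to broadcasters other than a given one. The adjacency convention (`adjacency[i][j] == 1` when `j` follows `i`), the toy network, and the function names are assumptions made for this example; the case where `j` does not follow `i` is deliberately rejected rather than guessed.

```python
def feed_rates(adjacency, broadcast_rates):
    """Feed rate of each user: sum of the rates of the broadcasters she follows.

    Assumed convention: adjacency[i][j] == 1 means user j follows user i.
    """
    n = len(broadcast_rates)
    return [
        sum(adjacency[i][j] * broadcast_rates[i] for i in range(n))
        for j in range(n)
    ]


def feed_rate_due_to_others(adjacency, broadcast_rates, i, j):
    """Feed rate of follower j excluding the contribution of broadcaster i."""
    if adjacency[i][j] != 1:
        raise ValueError("user j must follow broadcaster i")
    return feed_rates(adjacency, broadcast_rates)[j] - broadcast_rates[i]


# Toy network with three users: user 2 follows users 0 and 1, user 1 follows user 0.
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
broadcast_rates = [0.5, 2.0, 1.0]  # messages per hour at some fixed time

gamma = feed_rates(adjacency, broadcast_rates)
others = feed_rate_due_to_others(adjacency, broadcast_rates, 0, 2)
```

In this toy network, user 2's feed receives stories from users 0 and 1, so removing broadcaster 0's contribution leaves only user 1's rate.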
Definition of Visibility. Consider a broadcaster and one of her followers, who may follow many broadcasters other than her. Thus, at any time, the follower may see stories originated from multiple broadcasters. In this work, we assume the social network sorts stories in each user’s feed in inverse chronological order. Under this assumption, we can model the times and origins of all the stories present in the follower’s current feed as a first-in-first-out (FIFO) queue of pairs
where denotes the th element in the queue, is the time when receives a story from broadcaster , denotes the set of broadcasters followed by , and is the length of the queue. The length accounts for the fact that online social platforms typically set a maximum number of stories that can be displayed in the feed, , currently Twitter has . The FIFO queue is to model the fact that when a new story arrives, the oldest story, , at the bottom of the feed will be removed, and the ordering of the remaining stories will be shifted down by one slot, ,
and the newly arrived story will be appended to the beginning of the queue as and appear at the top of the feed. For simplicity, we assume that the queue is always full at the time of modeling.
In the list , we keep track of the rank of the most recent story posted by the broadcaster among all the stories received by user by time , ,
(8) 
Then, given an observation time window , and a deterministic sequence of broadcasting events, we can define the deterministic visibility of broadcaster at with respect to follower as
(9) 
which is the amount of time during which at least one story from the broadcaster is among the most recent stories in the follower’s feed.
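The deterministic visibility can be computed directly from two lists of timestamps. The sketch below is one way to do so under stated assumptions: `k` stands for the number of feed positions considered, `horizon` for the length of the observation window, and the feed is assumed to contain no story from the broadcaster at time zero. These names and the initial condition are choices made for this example.

```python
def deterministic_visibility(own_times, other_times, k, horizon):
    """Time in [0, horizon] during which at least one of the broadcaster's
    stories is among the k most recent stories in the follower's feed."""
    events = sorted(
        [(t, True) for t in own_times if 0.0 <= t < horizon]
        + [(t, False) for t in other_times if 0.0 <= t < horizon]
    )
    rank = None  # rank of the broadcaster's most recent story (1 = top of feed)
    visible = 0.0
    last_t = 0.0
    for t, is_own in events:
        # Accumulate the elapsed interval if the story was within the top k.
        if rank is not None and rank <= k:
            visible += t - last_t
        if is_own:
            rank = 1          # a new own story goes to the top of the feed
        elif rank is not None:
            rank += 1         # a story from someone else pushes it down
        last_t = t
    if rank is not None and rank <= k:
        visible += horizon - last_t
    return visible


# One post at t = 1; stories from other broadcasters arrive at t = 2, 3 and 5.
v1 = deterministic_visibility([1.0], [2.0, 3.0, 5.0], 1, 10.0)
v2 = deterministic_visibility([1.0], [2.0, 3.0, 5.0], 2, 10.0)
v3 = deterministic_visibility([1.0], [2.0, 3.0, 5.0], 3, 10.0)
```

With a single post at time 1, the post stays visible until the `k`-th subsequent story from other broadcasters arrives, so the three calls cover the intervals from 1 to 2, 1 to 3 and 1 to 5, respectively.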
Since the sequence of broadcasting events are generated from stochastic processes, we will consider the expected value of instead. If we first denote the probability that at least one story from broadcaster is among the most recent stories in follower ’s feed as
(10) 
then the expected (or average) visibility can be defined as
(11) 
given the integral is well-defined. In some scenarios, one may like to favor some periods of time (e.g., times in which the follower is online), encode such a preference by means of a time significance function, and weight the integrand by this function instead of using the unweighted one.
Note that the visibility is defined for a pair of a broadcaster and one of her followers, and for a given feed length. We will focus our later exposition on one particular such pair and omit the corresponding subscripts to simplify the notation. However, we note that the computation of the visibility for a pair of users may depend on the broadcast and feed intensities of all users in the network.
Computation of Visibility. In this section, we derive an expression for the average visibility, given by Eq. 11, using the broadcaster posting and follower feed representation, given by Eqs. 4-7. This link is crucial for the convex visibility shaping framework in Section 5.
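Given a broadcaster $i$ with message intensity $\mu(t)$ and her follower $j$ with feed message intensity due to other broadcasters $\lambda(t)$, we first compute the probability $f_1(t)$ that the most recent message received by $j$ at time $t$ was posted by the broadcaster. By definition, one can easily realize that $f_1(t)$ satisfies the following equation:
$$f_1(t + dt) = f_1(t)\,\big(1 - \lambda(t)\, dt\big) + \big(1 - f_1(t)\big)\, \mu(t)\, dt, \tag{12}$$
where each term models one of the two possible situations:
(i) The most recent message received by follower $j$ by time $t$ was posted by broadcaster $i$ ($f_1(t)$) and none of the other broadcasters that $j$ follows posts a message in $[t, t + dt]$ ($1 - \lambda(t)\, dt$).
(ii) The most recent message received by follower $j$ by time $t$ was posted by a different broadcaster ($1 - f_1(t)$) and broadcaster $i$ posts a message in $[t, t + dt]$ ($\mu(t)\, dt$), which becomes the most recent one.
Then, by rearranging terms and letting $dt \to 0$, one finds that the probability satisfies the following differential equation:
$$f_1'(t) = -\big(\lambda(t) + \mu(t)\big)\, f_1(t) + \mu(t). \tag{13}$$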
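We can proceed with the induction step for the probability $f_k(t)$ that at least one message from the broadcaster is among the $k$ most recent ones received by the follower at time $t$, with $k > 1$. Here, $\mu(t)$ denotes the broadcaster’s message intensity and $\lambda(t)$ the follower’s feed message intensity due to other broadcasters. In particular, by definition, $f_k(t)$ satisfies the following equation:
$$f_k(t + dt) = f_{k-1}(t) + \big(f_k(t) - f_{k-1}(t)\big)\big(1 - \lambda(t)\, dt\big) + \big(1 - f_k(t)\big)\, \mu(t)\, dt, \tag{14}$$
where each term models one of the three possible situations:
(i) The last message posted by the broadcaster by time $t$ is among the $k - 1$ most recent ones received by the follower ($f_{k-1}(t)$) and, independently of whether a message is posted by any other broadcaster or not, this message will remain among the $k$ most recent at $t + dt$.
(ii) The last message posted by the broadcaster by time $t$ is the $k$-th most recent one ($f_k(t) - f_{k-1}(t)$) and none of the other broadcasters followed by the follower posts a message in $[t, t + dt]$ ($1 - \lambda(t)\, dt$).
(iii) The last $k$ messages received by the follower by time $t$ were posted by other broadcasters ($1 - f_k(t)$) and the broadcaster posts a message in $[t, t + dt]$ ($\mu(t)\, dt$), becoming the most recent one.
By rearranging terms and letting $dt \to 0$, we uncover a recursive relationship between $f_k(t)$ and $f_{k-1}(t)$, by means of the following differential equation:
$$f_k'(t) = -\big(\lambda(t) + \mu(t)\big)\, f_k(t) + \lambda(t)\, f_{k-1}(t) + \mu(t). \tag{15}$$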
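The recursion can be checked numerically in a simple special case. Reading the three situations above as a balance over a short interval, the rate of change of the top-`k` probability is `-(lam + mu) * f_k + lam * f_{k-1} + mu`, with the convention that the top-0 probability is zero. The sketch below integrates this system with a forward Euler scheme for constant rates, which is an assumption made only for this example (the symbols `mu`, `lam`, the step size, and the zero initial condition are illustrative). For constant rates the stationary solution is `1 - (lam / (lam + mu)) ** k`, i.e., the probability that at least one of the last `k` arrivals came from the broadcaster.

```python
def top_k_probabilities(mu, lam, k_max, t_end, dt=1e-3):
    """Forward Euler integration of
        f_k' = -(lam + mu) * f_k + lam * f_{k-1} + mu,   f_0 = 0,
    for constant rates mu (broadcaster) and lam (other broadcasters),
    starting from f_k(0) = 0 for all k. Returns [f_1(t_end), ..., f_kmax(t_end)].
    """
    f = [0.0] * (k_max + 1)  # f[0] stays at zero throughout
    for _ in range(int(round(t_end / dt))):
        new_f = f[:]
        for k in range(1, k_max + 1):
            new_f[k] = f[k] + dt * (-(lam + mu) * f[k] + lam * f[k - 1] + mu)
        f = new_f
    return f[1:]


mu, lam = 1.0, 4.0
numeric = top_k_probabilities(mu, lam, k_max=3, t_end=20.0)
stationary = [1.0 - (lam / (lam + mu)) ** k for k in (1, 2, 3)]
```

The agreement between `numeric` and `stationary` is only a consistency check of the recursion in the constant-rate case; it does not replace the closed-form expression for time-varying intensities.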
Perhaps surprisingly, we can find a closed form expression for , given by the following Lemma (proven in the Appendix A): Given a broadcaster with message intensity and one of her followers with feed message intensity due to other broadcasters . The probability that at least one message from the broadcaster is among the most recent ones received by the follower at time can be uniquely computed as
(16) 
given the boundary conditions and the incomplete Gamma function defined as .
4 On the Concavity of Visibility
Once we have a formula that allows us to compute the average visibility given any arbitrary intensities for the broadcasters, we will now show that, remarkably, the average visibility is concave in the space of smooth intensity functions. Moreover, we will also show that the average visibility is concave with respect to the parameters of piecewise constant functions, which we will use in our experiments.
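Smooth intensity functions. In this section, we assume that the message intensity of the broadcaster belongs to the space of all smooth functions. Before we proceed, we need the following definition:
Given the space of all smooth functions, a functional $F$ is concave if for every pair of functions $\mu_1$, $\mu_2$ in this space and every $\alpha \in [0, 1]$:
$$F\big(\alpha \mu_1 + (1 - \alpha) \mu_2\big) \ge \alpha F(\mu_1) + (1 - \alpha) F(\mu_2). \tag{17}$$
A functional $F$ is convex if $-F$ is concave.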
It readily follows that the probability given by Eq. 16 is a functional that takes the broadcaster’s message intensity as input. Moreover, the following two theorems, proven in Appendices B and C, establish the concavity of this probability and of the visibility with respect to the broadcaster’s message intensity.

Theorem. Given a broadcaster with a message intensity and one of her followers with a feed message intensity due to other broadcasters, the probability that at least one message from the broadcaster is among the most recent ones received by the follower at a given time, given by Eq. 16, is concave with respect to the broadcaster’s message intensity.

Theorem. Given a broadcaster with a message intensity and one of her followers with a feed message intensity due to other broadcasters, the visibility, given by Eq. 11, is concave with respect to the broadcaster’s message intensity.

Given the above results, one could think of finding the optimal (general) message intensity that maximizes (a function of) the average visibilities across a broadcaster’s followers. However, in practical applications, this may be inefficient and undesirable. Instead, one may focus on a simpler parametrized family of intensities, such as piecewise constant intensity functions, which are easier to optimize and to fit using real data. To this aim, we next prove that the average visibility is also concave in the parameters defining piecewise constant intensity functions.
Piecewise constant intensity functions. In this section, we assume that the message intensity of the broadcaster belongs to the space of piecewise constant functions , denoted by , which we parametrized as follows:
(18) 
where , is the number of pieces, and .
As the reader may have noticed, the results from the previous section are not readily usable since Lemma 3 requires the intensity functions to be smooth. However, we will now show that, for every piecewise constant function, there is a sequence of smooth functions that converges to it, and this will be sufficient to prove concavity. Before we proceed, we need the following definition: a functional $F$ is said to be continuous at $\mu_0$ if for every $\epsilon > 0$, there is a $\delta > 0$ such that
$$|F(\mu) - F(\mu_0)| < \epsilon, \tag{19}$$
provided that $\|\mu - \mu_0\| < \delta$, where $\|\cdot\|$ is a norm in the function space under consideration.
It readily follows that the probability is a continuous functional on . Moreover, we need the following lemma (proven in Appendix D) to prove the concavity: For every , there is a sequence of smooth functions where .
Using Lemma 4, for any , it follows that
(20) 
where is a sequence of smooth functions such that . As a consequence, we can establish the concavity of and with respect to with the following Theorem (proven in Appendix E): and are concave functionals in the space of piecewise constant functions .
If we represent using Eq. 18, and are concave with respect to .
5 Convex Visibility Shaping Framework
Given the concavity of the average visibility, we now propose a convex optimization framework for a variety of visibility shaping tasks. In all these tasks, our goal is to find the optimal message intensity $\mu(t)$ for broadcaster $i$ that maximizes a particular nondecreasing concave utility function $U(\cdot)$ of the average visibility of broadcaster $i$ in all her followers within a time window $[0, T]$, i.e.,
$$\underset{\mu(t)}{\text{maximize}} \;\; U\big(\{V_j(T)\}_{j \in \mathcal{N}(i)}\big) \quad \text{subject to} \quad \mu(t) \ge 0 \;\; \forall t \in [0, T], \quad \int_0^T \mu(t)\, dt \le C, \tag{21}$$
where $\mathcal{N}(i)$ denotes the broadcaster $i$’s followers, $V_j(T)$ denotes the average visibility in follower $j$, the first constraint asserts that the intensity function remains nonnegative, and the second limits the average number of messages broadcasted within $[0, T]$ to be no more than $C$.
We next discuss two instances of the general framework, which achieve different goals (their constraints remain the same and are hence omitted). More generally, the flexibility of our framework allows the use of any nondecreasing concave utility function.
Average Visibility Maximization (AVM). The goal here is to maximize the sum of the visibility for all the broadcaster’s followers, i.e.,
$$\underset{\mu(t)}{\text{maximize}} \;\; \sum_{j \in \mathcal{N}(i)} V_j(T). \tag{22}$$
Minimax Visibility Maximization (MVM). Suppose our goal is instead to keep the visibility in the followers with the smallest visibility value above a certain minimum level, or, alternatively make the average visibility across the followers with the smallest visibility as high as possible. Then, we can perform the following minimax visibility maximization task
(23) 
where denotes the average visibility in the follower with the th smallest visibility among all the broadcaster’s followers.
6 Scalable Algorithm
To solve the visibility shaping problems defined above, we need to be able to (efficiently) evaluate the top probability given by Eq. 16 and the visibility given by Eq. 11. However, a direct evaluation by means of Eqs. 16 and 11 seems difficult. Here, we present an alternative representation of both quantities for piecewise constant intensity functions, which allows us to compute them very efficiently. Based on this result, we present an efficient gradient-based algorithm to find the optimal intensity.
Assume the broadcaster’s message intensity and the follower’s feed message intensity due to other broadcasters adopt the following form:
Then, each piece in the above intensities satisfies the recurrence relation given by Eq. 15, which we rewrite as
and one can easily prove by induction that, in general, the solution of the above differential equation for each time interval is given by
(24) 
where , , and is the probability at the beginning of time interval. Such representation allows for an efficient evaluation of .
Next, we also need to compute the integral of to efficiently compute the visibility . Without loss of generality, we represent the time for each piece in a normalized time window . Then, the integral of can be written as follows:
(25) 
where note that the last term is efficiently computable since, for integer values of , the incomplete Gamma function .
Given Eq. 25, we can now easily compute the gradient of the visibility , which we can then use to design an efficient gradient based algorithm. For brevity, we just show the gradient for . Let , and be the values of at the beginning of each time interval, then,
where we can easily compute recursively as
if , and , if .
Once we have an efficient way to compute the visibility and its gradient, we can readily design a projected gradient descent algorithm to find the optimal message intensity in the visibility shaping problems described in Section 5. Note that, since our optimization problems are convex, every local optimum is a global optimum and convergence is guaranteed. Moreover, for the projection step, we solve a quadratic program, minimizing the distance to the feasible polytope. Algorithm 1 summarizes the overall algorithm.
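The overall structure of such a projected gradient scheme can be sketched as follows. This is a minimal illustration under several assumptions and not the algorithm summarized in Algorithm 1: the objective is a stand-in surrogate (a per-piece steady-state top-`k` probability, `1 - (lam / (lam + mu)) ** k`, which is concave in `mu`) rather than the exact visibility, each piece is assumed to have unit length, and the quadratic program of the projection step is replaced by the well-known sort-based Euclidean projection onto the set of nonnegative vectors whose sum does not exceed the budget. All names are hypothetical.

```python
def project_budget(x, budget):
    """Euclidean projection onto {y : y >= 0, sum(y) <= budget}."""
    clipped = [max(v, 0.0) for v in x]
    if sum(clipped) <= budget:
        return clipped
    # Budget constraint is active: project onto the simplex sum(y) = budget.
    u = sorted(x, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - budget) / j
        if u_j - candidate > 0:
            theta = candidate  # keeps the threshold of the largest valid j
    return [max(v - theta, 0.0) for v in x]


def surrogate_visibility(mu, lam, k):
    # Stand-in concave objective (assumption): per-piece steady-state probability.
    return sum(1.0 - (l / (l + m)) ** k for m, l in zip(mu, lam))


def surrogate_gradient(mu, lam, k):
    return [k * l ** k / (l + m) ** (k + 1) for m, l in zip(mu, lam)]


def projected_gradient_ascent(lam, k, budget, step=0.5, iterations=2000):
    mu = [budget / len(lam)] * len(lam)  # start from the uniform allocation
    for _ in range(iterations):
        grad = surrogate_gradient(mu, lam, k)
        mu = project_budget([m + step * g for m, g in zip(mu, grad)], budget)
    return mu


lam = [1.0, 2.0, 4.0, 2.0]   # hypothetical feed rates of the other broadcasters
budget = 4.0
uniform = [budget / len(lam)] * len(lam)
mu_opt = projected_gradient_ascent(lam, k=1, budget=budget)
gain = surrogate_visibility(mu_opt, lam, 1) - surrogate_visibility(uniform, lam, 1)
```

In this toy instance the optimized allocation shifts budget away from the piece with the heaviest competing traffic, and the surrogate objective improves over the uniform allocation while the budget constraint stays active.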
7 Experiments
Dataset description and experimental setup. We use data gathered from Twitter as reported in previous work [3], which comprises the following three types of information: profiles of million users, billion directed follow links among these users, and billion public tweets posted by the collected users. The follow link information is based on a snapshot taken at the time of data collection, in September 2009. Here, we focus on the tweets published during a six and a half month period, from February 2, 2009 to August 13, 2009. In particular, we sample , users uniformly at random as broadcasters and record all the tweets they posted. Moreover, for each of these broadcasters, we track down all their followers and record all the tweets they posted as well as reconstruct their true timelines by collecting all the tweets published by the people they follow.
In our experiments, we use the first three-and-a-half-month period, from February 2 to May 13, to fit the piecewise constant intensities of the followers’ timelines and the followers’ significance, which we use in our convex visibility shaping framework. Here, the follower’s significance is the probability that she is online, estimated as a piecewise (hourly) constant probability from the tweets and retweets the follower posted: if a follower tweeted or retweeted in an hour, we assume she was online during that hour. Then, we use the last three-month period, from May 14 to August 13, to evaluate our framework. We refer to the former period as the training set and the latter as the test set. We experiment both with hours (, hour) and days (, hour), and set the budget to be equal to the average number of tweets per the broadcaster posted in the training period.
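One simple way to obtain hourly piecewise constant estimates from timestamps is sketched below. It assumes a daily-periodic profile with 24 hourly pieces, timestamps expressed in hours since the start of the training period, an intensity estimate given by the average number of events per hour-of-day, and a significance estimate given by the fraction of training days on which the follower posted at least once in that hour. These estimators and names are assumptions for illustration; the original fitting procedure may differ.

```python
def fit_hourly_intensity(timestamps_hours, n_days):
    """Average number of events per hour for each hour-of-day bin."""
    counts = [0] * 24
    for t in timestamps_hours:
        counts[int(t) % 24] += 1
    return [c / n_days for c in counts]


def fit_hourly_significance(timestamps_hours, n_days):
    """Fraction of days on which at least one event fell in each hour-of-day bin,
    used here as a proxy for the probability of being online in that hour."""
    active_days = [set() for _ in range(24)]
    for t in timestamps_hours:
        active_days[int(t) % 24].add(int(t) // 24)
    return [len(days) / n_days for days in active_days]


# Toy data: three days of activity, timestamps in hours since the start.
timestamps = [0.5, 0.7, 24.2, 49.9]
intensity = fit_hourly_intensity(timestamps, n_days=3)
significance = fit_hourly_significance(timestamps, n_days=3)
```

In the toy data, the first hour of the day contains three events spread over two of the three days, so its estimated intensity is one event per hour while its estimated significance is two thirds.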
Evaluation schemes. Throughout this section, we use three different evaluation schemes, with an increasing resemblance to a real world scenario:
Theoretical objective: We compute the theoretical value of the utility using the broadcaster intensity under study, be it the (optimal) intensity given by our convex visibility shaping framework, the intensity given by an alternative baseline, or the broadcaster’s (true) fitted intensity.
Simulated objective: We simulate events both from the broadcaster intensity under study and from each of the followers’ timeline fitted intensities. Then, we estimate empirically the overall utility based on the simulated events. We perform independent simulation runs and report the average and standard error (or standard deviation) of the utility.
Held-out data: We simulate events from the broadcaster intensity under study, interleave these generated events with the true followers’ timelines recorded as the test set, and compute the corresponding utility. We perform independent simulation runs and report the average and standard error (or standard deviation) of the utility.
Intensities, top probabilities and visibilities. We pay attention to four broadcasters, picked at random, and solve the average visibility maximization task for one of their followers, also picked at random. Our goal here is to shed light on the influence that the follower’s timeline intensity and significance have on the optimized broadcaster’s intensity as well as its corresponding visibility and top probability for different values of . Figure 2 summarizes the results, which show that (i) including the significance in the visibility definition shifts the optimized intensities away from the times in which the followers are not online (first row); (ii) the optimized intensities typically achieve a higher average visibility than the one achieved by the broadcaster’s true posting activity on a heldout set (third row); and (iii) the optimized intensities are more concentrated in time for (first row) and achieve a higher average visibility and top probability for (second and third row).
Solution quality. In this section, we perform a large-scale evaluation of our framework across all the sampled broadcasters in terms of the three evaluation schemes described above and compare its performance against several baselines. Here, we consider the definition of visibility that incorporates significance since, as argued previously, it may lead to more effective broadcasting strategies. (We obtain qualitatively similar results if we omit the significance in the definition of visibility. Actually, in such a case, our framework beats the baselines by a greater margin.)
In the average visibility maximization task, we compare our framework with three heuristics, in which the broadcaster distributes the available budget uniformly at random (RAVM), proportionally to (IAVM) and proportionally to (PAVM), respectively. In the minimax visibility maximization task, we also compare with three heuristics. The first two heuristics are similar to two of the ones just mentioned for AVM, , the broadcaster distributes the available budget uniformly at random (RMVM) and proportionally to (IMVM). In the third heuristic, the broadcaster distributes its budget following a greedy procedure: at each iteration , it first finds the user with the least visibility given and then solves the average visibility maximization for that user given a budget of . Finally, it outputs the intensity . The greedy procedure starts with . Additionally, for the heldout comparison, we also compute the actual average intensity that the broadcaster achieved in reality.
[Figure 3: box plots of normalized utility. Rows: Theoretical, Simulated, Real Held-out. Columns: Average Visibility, Minimax Visibility.]
Figure 3 summarizes the results by means of a box plot, which shows the utilities achieved by our framework and the heuristics normalized with respect to the utility achieved by the broadcasters’ fitted true intensity (by the posts during the test set for the third evaluation scheme). That is, a normalized utility of one means that the optimized intensity achieves the same utility as the broadcaster’s recorded posts. For the average visibility maximization task, the intensities provided by our method achieve a higher theoretical objective and a higher utility on a held-out set, on average (black dashed line), than the broadcaster’s fitted intensities. In contrast, the alternatives fail to provide any gain for half of the broadcasters. Finally, for the minimax visibility maximization task, which is significantly harder, the intensities provided by our method achieve a higher theoretical objective and a higher average utility on a held-out set, on average (black dashed line), than the broadcaster’s fitted intensities. In this case, although our method outperforms the baselines by large margins in terms of theoretical and simulated objectives, the baselines achieve almost the same average utility on the held-out set. The theoretical and simulated objectives are almost equal in all cases, as one may have expected.
Solution quality vs. # of followers. Figure 4(a) shows the average visibilities achieved by our optimized intensities for the AVM task, normalized by the average visibility that the corresponding broadcasters’ fitted intensities achieve, against number of followers for the same , broadcasters as above. Independently of the number of followers, we find that the intensities provided by our method consistently outperform the broadcaster’s fitted intensities.
Visibility vs. . Figure 4(b) shows the average visibility achieved by our optimized intensities for the AVM task against for the four broadcasters from Figure 2.
Scalability. Figure 4(c) shows that our convex optimization framework easily scales to broadcasters with thousands of followers. For example, given a broadcaster with followers, our algorithm takes milliseconds to find the optimal intensity for the average visibility maximization using a single machine with cores and TB RAM.
[Figure 4 panels: a) Followers; b) (panel label missing in source); c) Running time.]
8 Conclusions
In this paper, we developed a novel framework to solve the when-to-post problem, in which we model users’ feeds and posts as discrete events occurring in continuous time. Under such a continuous-time model, choosing a strategy for a broadcaster becomes a problem of designing the conditional intensity of her posting events. The key technical idea that enables our framework is a novel formula that links the conditional intensity of an arbitrary broadcaster with her visibility in her followers’ feeds, defined as the time during which at least one of her posts is among the most recent stories received in her followers’ feeds. In addition to the framework, we developed an efficient gradient-based optimization algorithm, which allows us to find optimal broadcast intensities for a variety of visibility shaping tasks in a matter of milliseconds. Experiments on large real-world data gathered from Twitter revealed that our framework can consistently make broadcasters’ posts more visible than alternatives.
Our work also opens many interesting avenues for future work. For example, we assume that the social network sorts stories in each user’s feed in inverse chronological order. While this is a realistic assumption for some social networks (e.g., Twitter), there are other social networks (e.g., Facebook) where the feed is curated algorithmically. It would be very interesting to extend our framework to such cases. In this work, we model users’ intensities using inhomogeneous Poisson processes, whose intensities are history independent and deterministic. Extending our framework to point processes with stochastic and history-dependent intensity functions, such as Hawkes processes, would most likely provide more effective broadcasting strategies. We also validate our framework on only two visibility shaping tasks, average visibility maximization and minimax visibility maximization. However, there are many other useful tasks one may think of, such as visibility homogenization. Finally, it would be very interesting to investigate the scenario in which there are several smart broadcasters using our algorithm.
References
 [1] O. Aalen, O. Borgan, and H. K. Gjessing. Survival and event history analysis: a process point of view. Springer, 2008.
 [2] L. Backstrom, E. Bakshy, J. M. Kleinberg, T. M. Lento, and I. Rosenn. Center of attention: How Facebook users allocate attention across friends. ICWSM, 2011.
 [3] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. ICWSM, 2010.
 [4] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 2010.
 [5] J. Cheng, L. Adamic, P. Dow, J. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd international conference on World wide web, 2014.
 [6] C. Tan, L. Lee, and B. Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014.
 [7] E. Constantinides. Foundations of social media marketing. Procedia-Social and Behavioral Sciences, 148:40–57, 2014.
 [8] M. B. Crawford. The world beyond your head: On becoming an individual in an age of distraction. Macmillan, 2015.
 [9] A. De, I. Valera, N. Ganguly, S. Bhattacharya, and M. Gomez-Rodriguez. Modeling opinion dynamics in diffusion networks. arXiv preprint arXiv:1506.05474, 2015.
 [10] N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems, 2013.
 [11] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, H. Zha, and L. Song. Shaping social activity by incentivizing users. In Advances in Neural Information Processing Systems, 2014.
 [12] M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song. Coevolve: A joint point process model for information diffusion and network co-evolution. In Advances in Neural Information Processing Systems, pages 1945–1953, 2015.
 [13] M. Gomez-Rodriguez, K. Gummadi, and B. Schölkopf. Quantifying information overload in social media and its impact on social contagions. In 8th International AAAI Conference on Weblogs and Social Media, 2014.
 [14] N. Hodas and K. Lerman. How visibility and divided attention constrain social contagion. SocialCom, 2012.
 [15] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 2003.
 [16] G. Miritello, R. Lara, M. Cebrian, and E. Moro. Limited communication capacity unveils strategies for human interaction. Scientific reports, 3, 2013.
 [17] N. Navaroli and P. Smyth. Modeling response time in digital human communication. In Ninth International AAAI Conference on Web and Social Media, 2015.
 [18] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002.
 [19] N. Spasojevic, Z. Li, A. Rao, and P. Bhattacharyya. When-to-post on social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2127–2136. ACM, 2015.
 [20] I. Valera and M. Gomez-Rodriguez. Modeling adoption and usage of competing products. In IEEE International Conference on Data Mining, 2015.
 [21] D. Vu, D. Hunter, P. Smyth, and A. Asuncion. Continuous-time regression models for longitudinal networks. In Advances in Neural Information Processing Systems, 2011.
 [22] Q. Zhao, M. Erdogdu, H. He, A. Rajaraman, and J. Leskovec. Seismic: A self-exciting point process model for predicting tweet popularity. In 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.
Appendix A Proof of Lemma 1
We will prove this lemma by induction on . For the case , satisfies a first-order linear differential equation,
(26) 
whose unique solution is
(27) 
as long as assuming . Then, using that , we can rewrite the solution as
which proves the lemma for . Now, in the inductive step we assume the hypothesis is true for and we prove it for . We start by rewriting the differential equation given by Equation 15 as
(28) 
where, by assumption, is unique and known. Then, as long as , the above differential equation has a unique solution and thus we only need to find that satisfies it. To do so, we rewrite the right-hand side of the differential equation using the inductive hypothesis as
which, using , can be expressed as
(29) 
Next, we hypothesize that
(30) 
and rewrite Eq. 29 as
(31) 
Then, by the fundamental theorem of calculus,
and thus
(32) 
Finally, using that for differentiable functions and , , we have that
and then we can rewrite Eq. 32 as
which simplifies to
This shows that the hypothesized solution for in Eq. 30 satisfies Eq. 28; hence, it is the unique solution for .
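For reference, the base case above rests on the textbook closed-form solution of a first-order linear differential equation. In generic notation (the symbols $y$, $a$, $b$, $t_0$ and $y_0$ are placeholders chosen here for illustration and are not the paper's own), it reads:

```latex
\frac{dy(t)}{dt} = -a(t)\, y(t) + b(t), \qquad y(t_0) = y_0
\quad \Longrightarrow \quad
y(t) = e^{-\int_{t_0}^{t} a(s)\, ds}
\left[\, y_0 + \int_{t_0}^{t} b(\tau)\, e^{\int_{t_0}^{\tau} a(s)\, ds}\, d\tau \right].
```

The solution is unique whenever $a$ and $b$ are continuous on the interval of interest. The inductive step appears to reuse the same kind of uniqueness argument at each level, with the previously obtained solution playing the role of the known forcing term $b$.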
Appendix B Proof of Theorem 3
From Lemma 1, we know that
Using integration by parts, we can rewrite the above expression as
Lemma B tells us that and are convex with respect to . Moreover, using Lemma B and the fact that , it follows that the function is convex. Finally, given that , we can conclude that is concave with respect to .
Functional is convex with respect to for any constant .
Proof.
We simply verify that satisfies the definition of convexity, as given by Eq. 17:
where the inequality follows from the arithmetic-geometric mean inequality, ,
for all positive , , and . ∎
If the functional is convex with respect to , then, given any arbitrary function , the functional is also convex with respect to .
Proof.
We check that the functional satisfies the definition of convexity, as given by Eq. 17:
where the inequality holds using that, given any two arbitrary functions and such that for all , then given for all . ∎
Appendix C Proof of Theorem 4
Appendix D Proof of Lemma 6
Each piecewise constant function can be represented as a weighted sum of Heaviside step functions, with one step function per discontinuity point. Moreover, each Heaviside function is itself the limit of a sequence of smooth tanh functions. Therefore, the piecewise constant function is the limit of a finite sum of smooth tanh functions.
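The construction in this proof can be checked numerically. The minimal Python sketch below assumes the common smoothing 0.5 * (1 + tanh(k t)) of the Heaviside step, which may differ from the exact family the authors had in mind; the function names, the `sharpness` parameter, and the example breakpoints are hypothetical. Convergence is pointwise away from the discontinuity points; at a breakpoint itself the smooth approximation takes the midpoint of the jump.

```python
import math


def smooth_step(t, sharpness):
    """Smooth surrogate of the Heaviside step; approaches it as sharpness grows."""
    return 0.5 * (1.0 + math.tanh(sharpness * t))


def piecewise_constant(t, base, jumps):
    """Exact piecewise constant function.

    base:  value to the left of the first breakpoint
    jumps: list of (breakpoint, jump_size) pairs, one per discontinuity
    """
    return base + sum(size for point, size in jumps if t >= point)


def smooth_approximation(t, base, jumps, sharpness):
    """Finite sum of scaled, shifted tanh-based steps approximating the function above."""
    return base + sum(size * smooth_step(t - point, sharpness) for point, size in jumps)


# Hypothetical example: value 1.0, jump of +2.0 at t=1, jump of -1.5 at t=3.
base = 1.0
jumps = [(1.0, 2.0), (3.0, -1.5)]
sample_points = [0.0, 0.5, 2.0, 2.5, 4.0]  # all away from the breakpoints

for sharpness in (1.0, 10.0, 100.0):
    worst = max(
        abs(smooth_approximation(t, base, jumps, sharpness) - piecewise_constant(t, base, jumps))
        for t in sample_points
    )
    print(f"sharpness={sharpness:6.1f}  max error on sample points={worst:.3e}")
```

The printed maximum error shrinks as `sharpness` increases, which mirrors the limiting argument used in the proof.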
Appendix E Proof of Theorem 7
Consider two piecewise constant functions . According to Lemma 4 there exist sequences of smooth functions such that and . Because of the concavity of in we know for :
Taking the limit and using the continuity of we get:
(33) 
Together with the convexity of the space, this proves the theorem.