With the rise of the Internet, connecting with people has become easier than ever, and so has access to information. As such, the Internet is also the source of unprecedented collective phenomena, some of which, however, cast shadows on our contemporary society. Indeed, the dissemination of heavily biased or, worse, downright false information, once relatively moderate in size and possibly limited to hoaxes and scams, has exploded in the last decade, creating the broader category of fake news. The urgent need for models that can describe the increasing spread of fake news has been highlighted by the current COVID-19 pandemic. Governments in many countries have found themselves in enormous difficulty because vaccination campaigns slowed down owing to the spread of false information and the inability of individuals to assess its authenticity [12, 24].
Let us first briefly recall some of the main challenges in modeling fake news. First of all, one of the priorities is to introduce a definition of fake news with a consensus wide enough to make research works comparable. In this direction, one of the most widely (though not universally) accepted defining traits of fake news is purpose: fake news is intentionally false news [40, 37, 2, 17]. The concept of purpose seems to be useful when differentiating between theories that are focused on the content rather than on the conveyor. In this sense, a piece of information that is accidentally false (e.g., through inaccuracy) is substantially different, both semantically and stylistically, from a maliciously fabricated one.
Next, the challenge is to detect fake news. Most recent research on automatic detection relies on big data and artificial intelligence tools [34, 11, 33, 39]. An alternative strategy is instead component-based and focuses on the analysis of the multiple parts involved in the diffusion of fake news: the creator and the user, the linguistics, semantics, and style of the actual content, and finally the social context of the information (see  and references therein).
Other approaches are possible, though. Information theory, for instance, has been used to model fake news: in , fake news is defined as a time series with an inherent bias (that is, with nonzero expectation) that is involved, together with noise, in a stochastic process of which the user tries to judge the likelihood of truth.
Finally, epidemiological theory has long proved to be fertile ground for modeling fake news [13, 14], especially in the somewhat broader category of rumor-spreading dynamics. The analogy between rumors and epidemics has often proved fruitful: Daley and Kendall  took inspiration from the classical works of Kermack and McKendrick [20, 18] to propose a SIR-like model involving ignorant, spreader and stifler agents who played the role of the susceptible, infectious and recovered ones in . Since the seminal paper , models of rumor-spreading dynamics have borrowed ideas from epidemiological models to improve their predictive accuracy.
Recently, network theory has moved in this direction, too [10, 7, 41, 38]. Substantial research has merged networks and epidemiology through, for instance, classical compartmental models like the SIR (both epidemiological in  and rumor-oriented ), the SIS  and the SIRS [22, 35]. Moreover, more sophisticated epidemiological models have been developed, like the SEIZ model  describing the evolution in time of the compartments of susceptible, exposed, infectious and skeptic agents, which has been adapted to the analysis of fake news dissemination (see, e.g., [25, 19]). In particular, the tendency seems to be to define the skeptic agents as the ones who are aware of the information but do not actively spread it . Symmetrically, spreaders need not believe a piece of information to be able to spread it (this is especially useful considering that bots are often encountered in social networks , both for legitimate and malicious purposes). These descriptions are also sensible in terms of matching the model with the available data.
In this paper we follow this pathway: borrowing ideas from kinetic theory [15, 27], we combine a classical compartmental approach inspired by epidemiology [18, 20] with a kinetic description of the effects of competence [29, 28]. We refer also to the recent work  concerning evolutionary models for knowledge. In fact, the first wave of initiatives addressing fake news focused on news production by trying to limit citizen exposure to fake news. This can be done by fact-checking, labeling stories as fake, and eliminating them before they spread. Unfortunately, this strategy has already been proven not to work; it is indeed unrealistic to expect that only high-quality, reliable information will survive. As a result, governments, international organizations, and social media companies have turned their attention to digital news consumers, and particularly children and young adults. From national campaigns in several countries to the OECD, there is a wave of action to develop new curricula, online learning tools, and resources that foster the ability to “spot fake news” .
It is therefore of paramount importance to build models capable of describing the interplay between the dissemination of fake news and the creation of competence among the population. To this end, the approach we follow in this paper falls within the recent socio-economic modeling described by kinetic equations (see  for a recent monograph on the subject). More precisely, we adapt the competence model introduced in [29, 28] to a compartmental model describing fake news dissemination. Such a model makes it possible to introduce competence not only as a static feature of the dynamics but also as an evolutionary component, taking into account both learning through interactions between agents and possible interventions aimed at educating individuals to identify fake news. Furthermore, in our modeling approach agents may have memory of fake news and as such be permanently immune to it once it has been detected, or fake news may lack any inherent peculiarity that would make it memorable enough for the population to immunize themselves against it in the future. The approach can be easily adapted to other compartmental models present in the literature, like the ones previously discussed [5, 25, 32].
The rest of the manuscript is organized as follows. In Section 2 we introduce the structured kinetic model describing the spread of fake news in the presence of different competence levels among individuals. The main properties of the resulting kinetic models are also analyzed. Next, Section 3 is devoted to studying the Fokker-Planck approximation of the kinetic model and to deriving the corresponding stationary states in terms of competence. Several numerical results are then presented in Section 4 that illustrate the theoretical findings and the capability of the model to describe transition effects in the spread of fake news due to the interaction between epidemiological and competence parameters. Concluding remarks are reported in the last section, and details on the theoretical results and the numerical methods are given in two separate appendices.
2 Fake news spreading in a socially structured population
In this section, we introduce a structured model for the dissemination of fake news in the presence of different levels of skill among individuals in detecting the actual veracity of information, by combining a compartmental model in epidemiology and rumor-spreading analysis [18, 14] with the kinetic model of competence evolution proposed in .
We consider a population of individuals divided into four classes: the oblivious ones, still not aware of the news; the reflecting ones, who are aware of the news and are evaluating how to act; the spreader ones, who actively disseminate the news; and the silent ones, who have recognized the fake news and do not contribute to its spread. The terminology used to describe these compartmental models is not fully established; however, the dominant one, inspired by epidemiology, refers to the definitions provided by Daley  of a population composed of ignorant, spreader and stifler individuals. The class of reflecting agents can be regarded as a group that experiences a time delay before making a decision and entering an active compartment [5, 25].
Notation, i.e., the choice of letters to represent the compartments, is even more scattered and somewhat confusing. For the reader's convenience, Table 1 summarizes some of the different choices of letters and terminology found in the literature. Given the widespread use of epidemiological models compared to fake news models, and in order to make the analogies easier to understand, we chose to align with the notation conventionally used in epidemiology. Therefore, in the rest of the paper we will describe the population in terms of susceptible agents (S), who are the oblivious ones; exposed agents (E), who are in the time-delay compartment after exposure and before shifting into an active class; infectious agents (I), who are the spreader ones; and finally removed agents (R), who are aware of the news but not actively engaging in its spread.
Note that this subdivision of the population does not take into account the actual beliefs of agents about the truth of the news, so that removed agents, for instance, need not actually be skeptical, nor do the spreaders need to actually believe the news. To simplify the mathematical treatment, as in the original works by Daley and Kendall [13, 14], we ignore the possible ‘active’ effects of removed individuals, who could interact with other compartments and produce immunization among susceptible agents (the role of skeptic individuals in [5, 25]) and remission among spreaders (the role of stiflers in ). Of course, the model easily generalizes to include these additional dynamics.
The main novelty in our approach is to consider an additional structure on the population based on the concept of competence of the agents, here understood as the ability to assess and evaluate information.
|SEIR (this paper)||DK [13, 14]||ISR ||SEIZR ||SEIZ |
Let us suppose that agents in the system are completely characterized by their competence , measured in a suitable unit. We denote by , , , , the competence distribution at time of susceptible, exposed, infectious and removed individuals, respectively. Aside from natality or mortality concerns (i.e., the social network is a closed system—nobody enters or leaves it during the diffusion of the fake news, which is a common assumption, based on the average lifespan of fake news) we therefore have:
which implies that we will refer to
as the fractions of the population that are susceptible, exposed, infected, or recovered respectively. We also denote the relative mean competences as
2.1 A SEIR model describing fake news dynamics
The fake news dynamics proceeds as follows: a susceptible agent learns of the news from a spreader. At this point, the now-exposed agent evaluates the piece of information (the reflecting, or delay, stage) and decides whether to share it with other individuals, thus turning into a spreader, or to keep silent, removing themselves from the dissemination process.
When the dynamics is independent of the knowledge of individuals, the model can be expressed by the following system of ODEs
with and where is the contact rate between the class of the susceptible and the class of infectious, is the rate at which agents make their decision about spreading the news or not, is the portion of agents who become infectious and is the rate at which spreaders remove themselves from the compartment, due, e.g., to loss of interest in sharing the news or forgetfulness. Finally,
is related to the specificity of the fake news and the probability of individuals to remember it. A probability of 0 means that the fake news has no inherent peculiarity (e.g., in terms of content, structure, style, …) that can make it memorable enough for the population to ‘immunize’ against it in the future, while a probability of 1 gives agents the full ability not to fall for that fake news a second time. The various parameters are summarized in Table 2. The diagram of the SEIR model (1) is shown in Figure 1.
|contact rate between susceptible and infected individuals|
|average decision time on whether or not to spread fake news|
|probability of deciding not to spread fake news|
|average duration of a fake news|
|probability of remembering fake news|
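To make the compartmental dynamics concrete, the following minimal sketch integrates a SEIR-type system with a classical fourth-order Runge-Kutta step. The displayed system (1) is not reproduced in this text, so the right-hand side below is an assumed form inferred only from the parameter descriptions in Table 2; the names `beta`, `delta`, `eta`, `gamma`, `alpha` and all function names are hypothetical labels chosen for illustration, and `eta` follows the Table 2 reading (probability of deciding not to spread).

```python
# Sketch of a SEIR-type fake-news model. The right-hand side is an ASSUMED form
# inferred from the parameter descriptions, not a copy of system (1).
# beta: contact rate, delta: decision rate, eta: probability of not spreading,
# gamma: rate at which spreaders stop, alpha: probability of remembering.


def seir_rhs(state, beta, delta, eta, gamma, alpha):
    s, e, i, r = state
    new_exposed = beta * s * i  # susceptible agents meeting spreaders
    decided = delta * e         # exposed agents reaching a decision
    stopped = gamma * i         # spreaders losing interest
    ds = -new_exposed + (1.0 - alpha) * stopped  # forgetful agents become susceptible again
    de = new_exposed - decided
    di = (1.0 - eta) * decided - stopped
    dr = eta * decided + alpha * stopped
    return (ds, de, di, dr)


def rk4_step(state, dt, **params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = seir_rhs(state, **params)
    k2 = seir_rhs(shifted(state, k1, dt / 2.0), **params)
    k3 = seir_rhs(shifted(state, k2, dt / 2.0), **params)
    k4 = seir_rhs(shifted(state, k3, dt), **params)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, t_end, dt=0.01, **params):
    """Integrate from time 0 to t_end and return the final (S, E, I, R) fractions."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, **params)
    return state
```

A call such as `simulate((0.99, 0.0, 0.01, 0.0), t_end=20.0, beta=2.0, delta=1.0, eta=0.3, gamma=0.5, alpha=1.0)` returns the four fractions at the final time. Under the assumed form the four right-hand sides sum to zero, so the total population is conserved, and setting `eta` and `alpha` to zero leaves the removed compartment empty, which is the SEIS-like regime.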
It is straightforward to notice that when and are zero, system (1) reduces to a classic SEIS epidemiological model. This is consistent with treating the dissemination of non-specific fake news in a population as the spread of a disease with multiple strains, for which a durable immunization is never attained. In this case system (1) has two equilibrium states: a disease-free equilibrium state and an endemic equilibrium state where
If instead or , there is also the possibility to permanently immunize against fake news with those traits; moreover, both infectious and exposed agents eventually vanish, leaving only the susceptible and removed compartments populated. In the case of maximum specificity of the fake news, i.e., , the stationary equilibrium state has the form
where is solution of the nonlinear equation
in which is the initial datum .
2.2 The interplay with competence and learning
In the following, we combine the evolution of the densities according to the SEIR model (1) with the competence dynamics proposed in . We refer to the degree of competence that an individual can gain or lose in a single interaction with the background as ; in what follows we denote by the bounded-mean distribution of , satisfying
Assuming a susceptible agent has a competence level and interacts with another one belonging to the various compartments in the population and having a competence level , their levels after the interaction will be given by
where and quantify the amount of competence lost by susceptible individuals by the natural process of forgetfulness and the amount gained by susceptible individuals from the background, respectively. , instead, models the competence gained through the interaction with members of the class , with ; a possible choice for is , where
is the characteristic function and a minimum level of competence required of the agents for increasing their own skills by interactions. Finally, and to consider the non-deterministic nature of the competence acquisition process.
The binary interactions involving the exposed agents can be similarly defined
the same holds for the interactions concerning the infectious fraction of the population
and finally we have the interactions regarding the removed agents
It is reasonable to assume that both the processes of gain and loss of competence from the interaction with other agents or with the background in (5)–(8) are bounded by zero. Therefore we suppose that if , and if , with and , and then
may, for example, be uniformly distributed in.
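As an illustration of how a single competence update of this kind could be coded, the sketch below applies one interaction to an agent. The interaction rules (5)–(8) are not reproduced in this text, so the update is an assumed form in the spirit of the description above: a loss through forgetfulness, a gain from the background, a gain from the partner that is active only above a minimum competence, and a bounded uniform fluctuation. All names (`lam`, `lam_c`, `lam_b`, `x_min`) and the specific bound on the fluctuation are hypothetical.

```python
import random


def interact(x, x_partner, z, lam, lam_c, lam_b, x_min, rng):
    """Return the post-interaction competence of an agent (assumed rule).

    x: competence of the agent, x_partner: competence of the other agent,
    z: competence offered by the background, lam: fraction lost by forgetfulness
    (0 <= lam <= 1), lam_c: fraction gained from the partner,
    lam_b: fraction gained from the background,
    x_min: minimum competence needed to learn from the partner.
    """
    # Uniform fluctuation whose support is bounded so that the update
    # can never produce a negative competence.
    half_width = 1.0 - lam
    noise = rng.uniform(-half_width, half_width)
    # Characteristic-function switch: learning from the partner is active
    # only when the agent's own competence reaches the threshold.
    gain_from_partner = lam_c * x_partner if x >= x_min else 0.0
    return (1.0 - lam) * x + gain_from_partner + lam_b * z + noise * x
```

With this choice the deterministic part and the fluctuation together contribute at least zero, and the two gain terms are nonnegative, so competence stays in the nonnegative half-line; other bounded zero-mean distributions for the fluctuation would serve equally well.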
In order to combine the compartmental model SEIR with the evolution of the competence levels given by equations (5)–(8) we introduce the interaction operator following the standard Boltzmann-type theory . As earlier, we will denote with a suitable compartment of the population, i.e., , and we will use the brackets to indicate the expectation with respect to the random variable . Thus, if is an observable function, then the action of on is given by
with defined by (5)
with defined by (6)
with defined by (7),
with defined by (8). All the above operators preserve the total number of agents as the unique interaction invariant, corresponding to .
The system then reads:
where the function
is responsible for the contagion, being the contact rate between agents with competence levels and . In the above formulation we also assumed , , , and to be functions of . Note that the parameters most influenced by individuals’ competence are the contact rate , since individuals have the highest rates of contact with people belonging to the same social class, and thus with a similar level of competence; the decision rate , as individuals with greater competence invest more time in checking the authenticity of information; and , which characterizes individuals’ decision to spread fake news. On the other hand, the values of and may be assumed to be less influenced by the level of expertise of individuals.
2.3 Properties of the kinetic SEIR model with competence
In this section we analyze some of the properties of the Boltzmann system (13). First let us consider the reproducing ratio in presence of knowledge.
By integrating system (13) against , and considering only the compartments of individuals which may disseminate the fake news we have
In the above derivation we used the fact that the Boltzmann interaction terms describing knowledge evolution among agents preserve the total number of individuals and therefore vanish. Following the analysis in , and omitting the details for brevity, we obtain a reproduction number generalizing the classical one
has a natural connection with the Fourier transform: by choosing its kernel as a test function, we can analyze the system (13) with the Fourier transforms of the densities as unknowns.
Indeed, given a function , its Fourier transform is defined as
The system (13) becomes
where the operators are defined in terms of the Fourier transforms of their arguments for , so that
where , with is defined as
We suppose that the parameters satisfy the condition
which will prove useful in the proof.
which is finite whenever and
have equal moments up to the integer part of or to if is an integer.
We have the following result.
For the details of the proof we refer to Appendix A.
3 Mean-field approximation
Scaling techniques are a highly useful tool for obtaining analytical information on the large-time behavior of Boltzmann-type models; in particular, the so-called quasi-invariant limit  allows one to derive the corresponding mean-field description of the kinetic model (13).
where and the functions involved in the dissemination of the fake news, as well
We denote by the scaled interaction terms. Omitting the dependence on time on mean values and re-scaling time as , we obtain up to
where we used a Taylor expansion for small values of of
3.1 Stationary solutions of Fokker-Planck SEIR models
Let us impose that , following  from the computations of the previous section we formally obtain the Fokker-Planck system
In the case or , we know that , , and due to mass conservation, so that and as well. Thus, adding all the equations together leads us to
which has as solution an inverse Gamma density
It is straightforward to see that the scaled Gamma densities
In Figure 2 we report two examples of the stationary solutions where we chose the competence variable to be uniformly distributed in : in the first case (left) we considered , while in the second case (right) we set and .
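For readers who want to evaluate a stationary profile of this family numerically, the sketch below implements a generic inverse Gamma density and checks its mass and mean by quadrature. The names `shape` and `scale` are hypothetical; the expressions that tie them to the model parameters are not reproduced in this text, so they are left as free inputs here.

```python
import math


def inverse_gamma_pdf(x, shape, scale):
    """Inverse Gamma density scale^shape / Gamma(shape) * x^-(1+shape) * exp(-scale/x)."""
    if x <= 0.0:
        return 0.0
    # Work in logarithms to avoid overflow for small x or large shape.
    log_pdf = (
        shape * math.log(scale)
        - math.lgamma(shape)
        - (1.0 + shape) * math.log(x)
        - scale / x
    )
    return math.exp(log_pdf)


def integrate(func, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def mass_and_mean(shape, scale, upper=50.0):
    """Numerical mass and mean of the density on (0, upper]."""
    mass = integrate(lambda x: inverse_gamma_pdf(x, shape, scale), 0.0, upper)
    mean = integrate(lambda x: x * inverse_gamma_pdf(x, shape, scale), 0.0, upper)
    return mass, mean
```

For a shape larger than one the mean of an inverse Gamma density equals `scale / (shape - 1)`, which gives a quick consistency check on any stationary state computed this way; the polynomial tail also means that only moments of order below the shape parameter are finite.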
4 Numerical examples
In this section we present some numerical tests to show the characteristics of the model in describing the dynamics of fake news dissemination in a population with a competence-based structure.
To begin with, we validate the Fokker-Planck model obtained as the quasi-invariant limit of the Boltzmann equation: we will do so through a Monte Carlo method for the competence distribution (see , Chapter 5 for more details). Next, we approximate the Fokker-Planck systems (22)–(25) by generalizing the structure-preserving numerical scheme  to explore the interplay between competence and disseminating dynamics in the more realistic case of epidemiological parameters dependent on the competence level (see Appendix B). Lastly, we investigate how the fake news’ diffusion would impact differently on different classes of the population defined in terms of their capabilities of interacting with information.
4.1 Test 1: Numerical quasi-invariant limit
In this test we show that the mean-field Fokker-Planck system (22)–(25) obtained under the quasi-invariant scaling (20) and (21) is a good approximation of the Boltzmann models (13) when . We do so by using a Monte Carlo method with particles, starting with a uniform distribution of competence , where is the indicator function, and performing various iterations until the stationary state was reached; next, the distributions were averaged over the next 500 iterations. We considered constant competence-related parameters and as well as a constant variance for the random variables .
In Figure 3, we plotted the results for (circle-solid, teal) and for (square-solid, ochre): those choices correspond to a scaling regime of and , respectively, with . Finally, we assumed that (left) and (right).
Directly comparing the Boltzmann dynamics equilibrium with the explicit analytic solution of the Fokker-Planck regime shows that if is small enough, Fokker-Planck asymptotics provide a consistent approximation of the steady states of the kinetic distributions.
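The Monte Carlo procedure of this test can be sketched in simplified form as follows. This is not the scheme used for Figure 3: it assumes constant parameters, keeps only the interaction with the background, draws the background competence uniformly in [0, 1], and scales the interaction strengths and the variance of the fluctuation proportionally to a small parameter `eps`. All names and default values are hypothetical.

```python
import math
import random


def monte_carlo_mean(eps, lam=0.5, lam_b=0.5, sigma=0.1, n_particles=1000,
                     n_average=500, seed=0):
    """Time-averaged mean competence of a particle system near stationarity."""
    rng = random.Random(seed)
    # Quasi-invariant scaling: strengths and variance are all proportional to eps.
    lam_eps = eps * lam
    lam_b_eps = eps * lam_b
    # A uniform variable on [-a, a] has variance a^2 / 3, so this gives variance eps * sigma.
    half_width = math.sqrt(3.0 * eps * sigma)
    if half_width > 1.0 - lam_eps:
        raise ValueError("fluctuation too large to keep competence nonnegative")
    # Burn-in of about ten relaxation times of the mean competence.
    n_burn_in = int(10.0 / lam_eps)
    # Initial competence uniformly distributed in [0, 1].
    particles = [rng.random() for _ in range(n_particles)]
    running = 0.0
    for iteration in range(n_burn_in + n_average):
        particles = [
            (1.0 - lam_eps) * x
            + lam_b_eps * rng.random()
            + rng.uniform(-half_width, half_width) * x
            for x in particles
        ]
        if iteration >= n_burn_in:
            running += sum(particles) / n_particles
    return running / n_average
```

Under these assumptions the mean competence obeys a linear recursion whose fixed point is `lam_b / lam` times the mean of the background (here 0.5), independently of `eps`, so the returned value can be compared with that number; reproducing the full comparison of distributions would additionally require histogramming the particles during the averaging window.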
4.2 Test 2: Learning dynamics and fake news dissemination
For this test, we applied the structure-preserving scheme to system (22)–(25) in a more realistic scenario featuring an interaction term dependent on the competence level of the agents, as well as a competence-dependent delay during which agents evaluate the information and decide how to act. In this setting, we refer to the recent Survey of Adult Skills (SAS) conducted by the OECD : in particular, we focus on competence understood as a set of information-processing skills, especially through the lens of literacy, defined  as “the ability to understand, evaluate, use and engage with written texts in order to participate in society”. The SAS is an international, multi-year effort carried out by the OECD in the framework of the PIAAC (Programme for the International Assessment of Adult Competencies). One of the peculiarities that makes it interesting in our case is that it was administered digitally to more than 70% of the respondents. Digital devices are arguably the most important vehicle for information diffusion in OECD countries, so this feature helps to keep the setting consistent.
Literacy proficiency was defined through increasing levels; we therefore consider a population partitioned into classes based on the competence level of their members, equated to the normalized score of the literacy proficiency test of the SAS. Thus, we chose a log-normal-like distribution
Initial distributions for the epidemiological compartments were set as
with , and .
The contact rate was set as