It may not be an exaggeration to say that the successful development of various communication systems, including fifth generation (5G) cellular systems, has been based on Shannon's theory, known as information theory [Shannon]. Information theory has influenced the development of communication systems as well as various other fields (e.g., statistics and biology). In his theory, information is characterized as randomness in variables. This allows one to calculate the fundamental limits and performance of communication, and to design efficient compression and transmission schemes for noisy channels. Despite this success in the domain of technical communication (TC) since its introduction in 1948, the fact that Shannon's theory ignores the meaning of information [Weaver53] has long been examined, particularly in the field of the philosophy of information. Meanwhile, overcoming this limitation of Shannon theory has recently been regarded as one of the key enablers for the upcoming sixth generation (6G) communication systems [Strinati21, dde, seo2021semantics].
To fill this void, it is necessary to develop a theory of meaningful information, i.e., semantic information, as well as a novel communication technology based on semantic information, i.e., semantic communication (SC). For SC, existing works can be categorized into model-free methods leveraging machine learning [dde], and model-based approaches that quantify semantic information [Bao11] or specify the emergence of meanings through communication [seo2021semantics]. Our work falls into the latter category, in the hope of unifying our analysis of SC with the existing model-based analysis of TC.
In regard to semantic information, there are two different views in the philosophy of information. One angle focuses on measuring semantic similarity [Floridi05, Floridi08], which often encourages an entirely new way to define meaningful information. For instance, each meaning can be identified as a group that is invariant to various nuisances, or a category [belfiore2021topos], across which semantic similarity can be compared. The other end of the spectrum focuses on quantifying semantic uncertainty [Adriaans10], in a similar way to Shannon theory, where message occurrences are counted to measure semantic-agnostic uncertainty. As an example, Shannon information can be extended to semantic information by leveraging the theory of inductive probability [Carnap50] (see also [Hailperin84, Williamson02]). This allows one to measure the likelihood of a sentence/clause's truth using logical probability, upon which an SC system can be constructed [Bao11]. Our view is aligned with the latter angle (i.e., like [Bao11], a probabilistic logic approach is taken), while we focus on making SC interact with TC under Shannon theory.
In particular, in this paper, we consider an approach to SC based on the theory of probabilistic logic, which assigns probabilities to logical clauses [Carnap50, Nilsson86]. This allows one to make inferences over clauses and to quantify their truthfulness or provability in a probabilistic way. We showcase that the process of inference and its provability analysis can be performed using the probabilistic logic programming language (ProbLog; tools are available at https://dtai.cs.kuleuven.be/problog), a practical logic-based probabilistic programming language that has been widely used in the field of symbolic artificial intelligence (AI).
Furthermore, based on [Bar-Hillel53, Adriaans10], we consider a two-layer SC system comprising: (i) the conventional TC layer, where data symbols can be transmitted without taking into account their meanings; and (ii) an SC layer, where one exploits semantic information that can be obtained from background knowledge or by updating a knowledge base. We demonstrate the interaction between the TC and SC layers with selected examples showing how SC improves the efficiency of TC, i.e., SC for TC, as well as how to design TC to achieve maximal gains in SC under limited communication resources, i.e., TC for SC. For simplicity and consistency throughout the paper, we confine ourselves to a simple scenario where a human user or an intelligent device stores logical clauses in a knowledge base and intends to improve the knowledge by seeking answers to a number of queries.
The main contributions of the paper are as follows.
Based on probabilistic logic, we characterize knowledge bases for semantic information and define various entropy-based measures, which allow us to model semantic compression and security.
For an SC system consisting of SC and TC layers, we address various issues through interactions between SC and TC subject to constraints of physical channels, including a message selection problem. A few numerical examples are studied to illustrate the proposed approaches.
Open issues and challenges are identified for further research in the future.
Note that this paper is an extended version of [CLP_22].
The paper is organized as follows. After providing a background in Section II, in Section III we present various aspects of semantic information and knowledge bases based on probabilistic logic and introduce key measures. With the developed measures, in Section IV, we address key issues in building an SC system consisting of TC and SC layers by explaining how the TC and SC layers interact subject to various constraints of physical channels. Numerical results on two exemplary SC use cases are presented in Section V, and open issues and challenges are discussed in Section VI. We conclude the paper with a few remarks in Section VII.
In this section, we present a background on information theory [CoverBook] and probabilistic logic [Nilsson86] [Williamson02].
II-A Classical and Semantic Information Theory
Although information theory originally started as a mathematical theory for communications, it has been applied in diverse fields ranging from biology to neuroscience. In information theory, random variables are used to represent symbols to be transmitted. The entropy of a random variable $X$, denoted by $H(X)$, is the number of bits required to represent it, which is given by $H(X) = \mathbb{E}[-\log_2 p(X)] = -\sum_x p(x) \log_2 p(x)$ (taking logarithms to base 2 in the rest of the paper) when $X$ is a discrete random variable, where $p(x)$ stands for the probability that $X = x$ and $\mathbb{E}[\cdot]$ represents the statistical expectation. The entropy of $X$ can also be interpreted as the amount of information of $X$.
The joint entropy of $X$ and $Y$ is defined as $H(X, Y) = \mathbb{E}[-\log_2 p(X, Y)]$, and the conditional entropy is given by $H(Y|X) = H(X, Y) - H(X)$. The mutual information between $X$ and $Y$ is defined as $I(X; Y) = H(Y) - H(Y|X)$. It can also be shown that $I(X; Y) = H(X) - H(X|Y)$. If $X$ and $Y$ are assumed to be the transmitted and received signals over a noisy channel, respectively, $I(X; Y)$ can be seen as the number of bits that can be reliably transmitted over this channel. Thus, $C = \max_{p(x)} I(X; Y)$ is called the channel capacity, which is the maximum achievable transmission rate for a given channel characterized by the transition probability $p(y|x)$.
As pointed out in [Bar-Hillel53], information theory is not interested in the content or meaning of the symbols, but in quantifying the amount of information based on the frequency of their occurrence (i.e., the distribution of symbols as random variables). For example, $H(X)$ measures the uncertainty of information, or the number of bits to represent a symbol, regardless of what $X$ means. However, this does not mean that information theory is useless in dealing with the meaning or content of information, as will be discussed in this paper.
To fill this void, it is necessary to develop a theory of meaningful information, i.e., semantic information. In regard to semantic information, there are two different views in the philosophy of information. One angle focuses on measuring semantic similarity [Floridi05, Floridi08], which often encourages an entirely new way to define meaningful information. For instance, each meaning can be identified as a group that is invariant to various nuisances (e.g., a so-called topos in category theory [belfiore2021topos]), across which semantic similarity can be compared. The other end of the spectrum focuses on quantifying semantic uncertainty [Adriaans10], in a similar way to Shannon theory, where message occurrences are counted to measure semantic-agnostic uncertainty. As an example, Shannon information can be extended to semantic information by leveraging the theory of inductive probability [Carnap50] (see also [Hailperin84, Williamson02]). This allows one to measure the likelihood of a sentence/clause's truth using logical probability, upon which an SC system can be constructed [Bao11]. Our view is aligned with the latter angle (i.e., like [Bao11], a probabilistic logic approach is taken), while we focus on making SC interact with TC under Shannon theory.
II-B Deterministic and Probabilistic Logic
Reasoning about the truth of a sentence is the simplest type of logic, called propositional logic. Treating this as the zeroth-order logic, first-order logic can describe ordinary logic by parsing out and dividing each sentence into meaningful clauses [Aho94]. In first-order logic, each clause consists of constant symbols (e.g., alphabets), logical operators (e.g., Boolean operators such as AND ($\wedge$), OR ($\vee$), and NOT ($\neg$)), and non-logical predicates (e.g., "is the father of"). Programming in Logic (Prolog) aims to describe first-order logic using a programming language, and has been widely used for computational linguistics and symbolic artificial intelligence (AI), such as IBM Watson [Clocksin03]. In Prolog, each clause is of the form Head :- Body, which is read as "Head is True if Body is True." However, Prolog can only describe deterministic logic, although the world is full of uncertainty. To overcome this limitation, following the notion of probabilistic logic [Nilsson86] [Williamson02], ProbLog attaches a probability to each clause, which is now of the form p::Head :- Body. This probability can be, for instance, annotated by a programmer, and indicates the programmer's degree of belief in the clause.
In this paper, we focus on exchanging logical clauses and making probabilistic inferences based on the clauses written in ProbLog. In particular, for independent facts $a$ and $b$, where $a$ is assigned probability $p_a$ and $b$ is assigned probability $p_b$, the probability of $a \wedge b$ is computed as the product of the probabilities, i.e., $p_a p_b$, and the probability of $a \vee b$ is computed as $p_a + p_b - p_a p_b$, since $P(a \vee b) = P(a) + P(b) - P(a \wedge b)$. Similar calculations apply to deductive reasoning: e.g., suppose we have a rule of the form $a \to b$ (where "$\to$" is "implies") annotated with probability $p_{a \to b}$, and $a$ with probability $p_a$; then we can infer $b$ with probability $p_a \, p_{a \to b}$. In ProbLog, a clause $a \to b$ with probability $p$ is written as p::b :- a, where ":-" is read as "if".
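These combination rules can be sketched in a few lines (a minimal illustration, not the paper's code; the function names are ours, and independence of the facts is assumed):

```python
# Sketch of the clause-probability calculus for independent annotated facts.
def p_and(p_a, p_b):
    """Probability of (a AND b) for independent facts a and b."""
    return p_a * p_b

def p_or(p_a, p_b):
    """Probability of (a OR b): P(a) + P(b) - P(a AND b)."""
    return p_a + p_b - p_a * p_b

def p_deduce(p_a, p_rule):
    """Probability of b inferred from fact a and a rule p_rule::b :- a."""
    return p_a * p_rule
```

For example, with `p_a = 0.3` and a rule annotated with `0.5`, `p_deduce` yields the same `0.15` as the deductive-reasoning rule above.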
In general, a knowledge base $K$ is regarded as a set of clauses (where a clause is a rule or a fact). Given a rule of the form $h$ :- $b$, the head of the rule is $h$ and the body is $b$. Note that a fact is basically a rule of the form $h$ :- true, which can just be written as $h$. One can make inferences about the truth value of a query $q$, provided that $q$ matches the head of a clause in $K$, with the outcome being the probability of $q$. If $q$ does not match any head of a clause in $K$, $K$ cannot say anything about $q$. We denote by $P_K(q)$ the probability of $q$ computed as the answer when $q$ is posed as a query to the knowledge base $K$. We assume that inferences are made as defined by the semantics of ProbLog.
In addition, for the purposes of the discussion in this paper, we consider mostly the propositional logic fragment of ProbLog for simplicity (and if variables are involved in some examples, we assume that their values range over a finite set, i.e., they are just abbreviations for a finite set of propositional clauses, so that the set of queries that can be answered via a knowledge base is finite).
III Entropy and Knowledge Bases: Communicating Informative Messages
In this section, we discuss various aspects of semantic information (e.g., semantic compression and security) after quantifying the uncertainty of knowledge bases using the entropy of a clause.
III-A Entropy of a Clause
We consider the entropy of a given clause $c$, whose truth value can be considered as a random variable with outcomes "true" with probability $p(c)$ and "false" with probability $1 - p(c)$, as follows:
$$H_{\rm c}(c) = -p(c)\log_2 p(c) - (1 - p(c))\log_2 (1 - p(c)).$$
Here, the subscript ${\rm c}$ is used to differentiate the entropy of a random variable from that of a clause.
When a given query $q$ is posed to the knowledge base $K$, and a probability is computed with respect to $K$, i.e., when $q$ matches a head of a clause in $K$, as in the semantics of ProbLog, then $p(q) = P_K(q)$, and we denote the entropy of $q$ with respect to $K$ as $H_K(q)$, i.e.:
$$H_K(q) = -P_K(q)\log_2 P_K(q) - (1 - P_K(q))\log_2 (1 - P_K(q)).$$
Note that if $q$ does not match the head of any clause in $K$, then the result of the query is undefined; alternatively, for an application, $P_K(q)$ can be set to $1/2$ (i.e., a random guess).
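The clause entropy defined above is just the binary entropy of the clause's probability; a minimal sketch (the function name is ours):

```python
import math

def clause_entropy(p):
    """Binary entropy of a clause true with probability p (base 2).

    By convention, H(0) = H(1) = 0: a certainly true or certainly
    false clause carries no uncertainty.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

The undefined-query convention above corresponds to `clause_entropy(0.5) = 1`, the maximum.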
III-B Uncertainty of a Knowledge Base
Let $Q(K)$ denote the set of the terms which are the heads of all clauses in $K$. We consider the heads of the clauses, as these would correspond to the set of different queries for which the knowledge base can compute a meaningful probability.
Given a knowledge base $K$, we can then define an uncertainty measure of $K$ as follows (which takes into account the entropy of the answers it computes, i.e., the average entropy of queries computable from $K$):
$$H(K) = \frac{1}{|Q(K)|} \sum_{q \in Q(K)} H_K(q).$$
Ideally, if a knowledge base $K$ can answer all its queries with certainty (i.e., true with probability 1 or false with probability 1), then $H(K) = 0$ (assuming that $Q(K) \neq \emptyset$), while it is $1$ in the worst case.
Example 1:
Suppose we have a knowledge base as follows, in ProbLog:
0.2::a. 0.3::b. 0.5::a :- b.
The set of the heads of all clauses in $K$ is $\{a, b\}$; the possible queries $K$ can answer are $a$ and $b$, i.e., $P_K(a) = 0.32$ and $P_K(b) = 0.3$. Thus,
$$H(K) = \frac{1}{2}\left(H_K(a) + H_K(b)\right) \approx 0.893.$$
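The numbers in this example can be checked with a short sketch, assuming ProbLog's usual independence semantics, under which a holds if the fact a holds or the rule fires together with b (combined noisy-or style):

```python
import math

def H(p):
    """Binary entropy, base 2."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# K:  0.2::a.   0.3::b.   0.5::a :- b.
p_b = 0.3
p_a = 1 - (1 - 0.2) * (1 - 0.3 * 0.5)   # noisy-or over the two ways to prove a
H_K = (H(p_a) + H(p_b)) / 2             # average entropy over heads {a, b}
```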
III-C Sender's Message Choice Problem
Consider a network or a multiuser system consisting of multiple users. Each user may wish to improve their knowledge base, and communication (in this section, we assume that TC is ideal; in Section IV, we will consider how SC and TC interact) plays a crucial role in reducing the uncertainty of a knowledge base. To illustrate this, suppose that Alice has a set of clauses $M$ and Bob has a knowledge base $K$. In order to minimize the average entropy of $K$, Alice can choose and send a message $m \in M$ to Bob, i.e.,
$$m^\ast = \operatorname*{arg\,min}_{m \in M} H(K \cup \{m\}).$$
However, this requires Alice to have complete knowledge of $K$. Alternatively, Alice might have a statistical approximation of $K$, in which $K = K_i$ with probability $w_i$. In this case, Alice's choice of $m$ is recast as:
$$m^\ast = \operatorname*{arg\,min}_{m \in M} \sum_i w_i H(K_i \cup \{m\}).$$
To realize this idea, one way is to allow Bob to keep feeding the entropy of $K$ back to Alice. Then, throughout iterative communication, Alice can gradually improve the accuracy of her approximation of $K$.
III-D Receiver's Message Assimilation Problem
In parallel with Alice's choice of communication message $m$, as discussed in Section III-C, Bob is also able to reduce the uncertainty of the knowledge base by adjusting the rule for updating $K$ upon receiving $m$, i.e., the assimilation of $m$. In (2), the assimilation is given by simply adding the received message to $K$, i.e., $K \cup \{m\}$. Generalizing this, Bob's message assimilation problem is cast as: $\circ^\ast = \operatorname*{arg\,min}_{\circ \in O} H(K \circ m)$, where $\circ$ identifies an operator among a set $O$ of assimilation operators.
The aforementioned simple addition can be an assimilation operator, i.e., $K \circ_{\rm a} m = K \cup \{m\}$. Additionally, we introduce an assimilation operator maximizing the freshness of each clause in the following way: on receiving a new message (or clause) $m$ of the form $p$::$c$, if $K$ includes clauses (of the form $p'$::$c$) differing from $m$ in only the associated probability, it replaces all such clauses of $K$ with the newly received $m$, resulting in an updated knowledge base with replacement; otherwise, it follows the simple addition rule. To describe this, we define an assimilation operator $\circ_{\rm f}$ that satisfies:
$$K \circ_{\rm f} m = \begin{cases} \left(K \setminus R_K(m)\right) \cup \{m\}, & \text{if } R_K(m) \neq \emptyset, \\ K \cup \{m\}, & \text{otherwise}, \end{cases}$$
where $R_K(m)$ denotes the set of clauses in $K$ differing from $m$ in only the associated probability.
Furthermore, we introduce another assimilation rule that aims to minimize the entropy of each query to be asked of $K$. To this end, $K$ remains unchanged if the received $m$ does not help decrease the entropy for the query corresponding to the head of $m$, where the clause $m$ is of the form $p$::$h$ :- $b$, i.e., the query is $h$. This rule is described using an assimilation operator $\circ_{\rm e}$ that is defined as:
$$K \circ_{\rm e} m = \begin{cases} K \cup \{m\}, & \text{if } H_{K \cup \{m\}}(h) < H_K(h), \\ K, & \text{otherwise}. \end{cases}$$
Given the assimilation operator $\circ_{\rm a}$, $\circ_{\rm f}$, or $\circ_{\rm e}$, the resultant changes in the average entropy of $K$ will be elaborated on in Section III-E. Furthermore, for simplicity, $\circ$ will be used to represent any of the assimilation operators discussed above (i.e., $\circ_{\rm a}$, $\circ_{\rm f}$, or $\circ_{\rm e}$).
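The three operators can be sketched as follows (an illustration under our own clause encoding, a (probability, head, body) triple; the entropy oracle passed to the third operator is assumed to be supplied by a ProbLog-style inference engine):

```python
def assim_add(K, m):
    """Simple addition: K assimilates m by set union."""
    return K + [m]

def assim_fresh(K, m):
    """Freshness: replace clauses differing from m only in the probability."""
    _, head, body = m
    stale = [c for c in K if (c[1], c[2]) == (head, body)]
    if stale:
        return [c for c in K if (c[1], c[2]) != (head, body)] + [m]
    return K + [m]

def assim_minent(K, m, entropy_of):
    """Entropy-minimizing: keep m only if it lowers the entropy at m's head.

    entropy_of(K, q) is assumed to return H_K(q) via inference.
    """
    _, head, _ = m
    return K + [m] if entropy_of(K + [m], head) < entropy_of(K, head) else K
```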
III-E Semantic Content of a Message
We can define the notion of the semantic content of a message $m$ (where a message in this case is a clause labelled with a probability) with respect to the receiver's background knowledge base $K$ as follows (as the change in the average entropy of a knowledge base with respect to its queries):
$$S(m; K) = H(K) - H(K \circ m).$$
Each message changes $K$, and the receiver wants to decrease the entropy, i.e., wants $S(m; K) \ge 0$, or wants $H(K \circ m)$ to be as low as possible, as the message should decrease the average entropy of computed queries (of course, it could also increase the average entropy!). In the following example, we show why this definition helps.
Example 2:
Suppose Alice has a knowledge base as follows, in ProbLog:
0.3::b. 0.5::a :- b.
Suppose Alice receives the labelled clause 0.2::m, i.e., m labelled with probability $0.2$, forming $K'$ as follows:
0.3::b. 0.5::a :- b. 0.2::m.
Then, $P_{K'}(a) = 0.15$, $P_{K'}(b) = 0.3$, and $P_{K'}(m) = 0.2$, and so $Q(K') = \{a, b, m\}$. We have:
$$H(K') = \frac{1}{3}\left(H_{K'}(a) + H_{K'}(b) + H_{K'}(m)\right) \approx 0.738 < H(K) \approx 0.746.$$
The uncertainty in the knowledge base with respect to the queries it can answer has decreased, which is what we expect when Alice receives a clause with a lower entropy relative to the existing clauses in $K$. Also, if instead Alice received 0.9::b (assimilated with replacement), then Alice's knowledge base becomes:
0.9::b. 0.5::a :- b.
Then $P_{K'}(b) = 0.9$ and $P_{K'}(a) = 0.45$; that is, we have:
$$H(K') = \frac{1}{2}\left(H_{K'}(a) + H_{K'}(b)\right) \approx 0.731 < H(K) \approx 0.746.$$
The uncertainty in the knowledge base with respect to the queries it can answer has decreased, which is what we expect when Alice receives a clause with a lower entropy replacing an existing clause in $K$. By assimilating 0.9::b with simple addition instead, we can have:
0.9::b. 0.3::b. 0.5::a :- b.
where $P_{K'}(b) = 0.93$, $P_{K'}(a) = 0.465$, and
$$H(K') = \frac{1}{2}\left(H_{K'}(a) + H_{K'}(b)\right) \approx 0.681,$$
which also shows a decrease in average entropy.
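The entropy changes in this example can be reproduced numerically (the query probabilities are computed by hand under ProbLog's independence semantics):

```python
import math

def H(p):
    """Binary entropy, base 2."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# K = {0.3::b, 0.5::a :- b}:  P_K(a) = 0.15, P_K(b) = 0.3
H_before = (H(0.15) + H(0.3)) / 2
# After adding 0.2::m:
H_with_m = (H(0.15) + H(0.3) + H(0.2)) / 3
# After replacing 0.3::b with 0.9::b:  P(b) = 0.9, P(a) = 0.45
H_replace = (H(0.45) + H(0.9)) / 2
# After adding 0.9::b alongside 0.3::b:  P(b) = 1 - 0.7*0.1 = 0.93, P(a) = 0.465
H_both = (H(0.465) + H(0.93)) / 2
```

All three assimilations decrease the average entropy relative to `H_before`, as claimed in the example.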
III-F Inference Can Reduce the Need for Communication
Suppose there is no background knowledge, i.e., $K = \emptyset$. Then, the uncertainty of a query $q$ becomes $H_\emptyset(q) = 1$, i.e., the truth or falsity of $q$ is merely a random guess. But with a knowledge base $K$, we expect to have $H_K(q) \le 1$. Furthermore, for two different knowledge bases $K_1$ and $K_2$, if
$$H_{K_1}(q) \le H_{K_2}(q),$$
then we say $K_1$ is less uncertain than $K_2$ with respect to query $q$. For the case that $q \notin Q(K)$, we can easily show that $H_K(q) = H_\emptyset(q) = 1$.
This can lead to a reduction in the need to obtain information about $q$, given that we can make inferences about $q$ with $K$. For example, suppose $H_K(q) \le \epsilon$, where $\epsilon$ is small enough; then there is no need to receive further information about $q$. In fact, with respect to $q$, we want only to receive information that reduces the entropy for $q$; that is, we want only to receive a message $m$ such that:
$$H_{K \circ m}(q) < H_K(q).$$
This can also be generalized if there is a set of available messages, say $M$, as follows:
$$m^\ast = \operatorname*{arg\,min}_{m \in M} H_{K \circ m}(q).$$
Here, $m^\ast$ is the best message among those in $M$ for reducing the entropy for $q$. This implies that one might want to consider the consequences of receiving and assimilating a message (or, from the sender's side, the implications of sending a message) on the uncertainty of a knowledge base (whether it would increase or decrease the entropy with respect to $q$, or with respect to the overall uncertainty of a knowledge base as defined above). We illustrate this idea further later in the paper.
III-G Communicating a Knowledge Base Efficiently: A Notion of Semantic Compression
If the sender has an entire knowledge base to send, then the sender can achieve possible compression by sending the minimum number of clauses (assuming a standard fixed number of bits to send a clause) equivalent to the query-answering capability of the knowledge base.
We used $Q(K)$, the heads of all clauses in $K$, crudely to represent the set of queries answerable by a knowledge base, but more general measures can be defined based on what can be inferred from a knowledge base (e.g., via the immediate consequence operator [10.1145/183432.183528]).
Let $Q(K)$ denote the set of queries answerable using knowledge base $K$; then two knowledge bases $K_1$ and $K_2$ are equivalent provided they can answer exactly the same queries: $Q(K_1) = Q(K_2)$, and for each $q \in Q(K_1)$, both knowledge bases compute the same results, i.e., $P_{K_1}(q) = P_{K_2}(q)$. Denoting by $E(K)$ the set of all knowledge bases equivalent to $K$, clearly, the sender should send $K^\ast$ to the receiver, which is given by
$$K^\ast = \operatorname*{arg\,min}_{K' \in E(K)} |K'|,$$
where $|K'|$ denotes the cardinality of $K'$, i.e., the number of clauses in $K'$. In practice, if this is hard to compute, the sender, wanting to send $K$, can try to perform semantic compression by finding a $K'$ such that $K' \in E(K)$ and $|K'| < |K|$.
In fact, the conditions above can be weakened. If the sender has $K$ and the receiver has some tolerance, then, given some thresholds of tolerance $\delta$ and $\epsilon$, suppose we have a knowledge base $K'$ such that, for a finite $Q(K)$, $|Q(K)| - |Q(K) \cap Q(K')| \le \delta$, and such that $K'$ deviates from $K$ by computing potentially different though similar probabilities for each commonly answerable query; that is, for each $q \in Q(K) \cap Q(K')$:
$$|P_K(q) - P_{K'}(q)| \le \epsilon,$$
which also implies that, for some $\epsilon'$, restricted to commonly answerable queries, $|H(K) - H(K')| \le \epsilon'$. Potentially, $K'$ can be a subset of $K$ obtained by removing clauses, i.e., a "compressed" form of $K$.
We note that one can also define the semantic content of a complex message $M$ comprising, not just a single clause, but a set of clauses (i.e., where a set of clauses is a knowledge base), generalizing from (4):
$$S(M; K) = H(K) - H(K \circ M).$$
We have seen that the sender, knowing that the receiver has knowledge $K$, can exploit this fact to reduce the amount of data that needs to be sent to the receiver, while communicating the same semantic content. In effect, one can compute the following, with respect to the receiver's knowledge $K$ and the target semantic content $s$ that the sender wants to communicate to the receiver:
$$M^\ast = \operatorname*{arg\,min}_{M \in \mathcal{M}(s)} |M|,$$
where $\mathcal{M}(s)$ denotes the set of complex messages having content $s$, i.e., $\mathcal{M}(s) = \{M : S(M; K) = s\}$. This can be viewed as a form of semantic compression that is relative to the semantic content (as defined in (6)) to be communicated.
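As a toy illustration of semantic compression (a sketch, not the paper's algorithm): duplicate probabilistic facts for the same head can be merged into a single fact carrying the combined noisy-or probability, shrinking the knowledge base while preserving every query answer:

```python
# Each fact is a (probability, head) pair; facts are assumed independent.
def p_head(facts, head):
    """Noisy-or probability of a head over all facts asserting it."""
    miss = 1.0
    for p, h in facts:
        if h == head:
            miss *= (1.0 - p)
    return 1.0 - miss

def compress(facts):
    """Merge all facts per head into one equivalent fact each."""
    heads = sorted({h for _, h in facts})
    return [(p_head(facts, h), h) for h in heads]
```

For instance, `{0.9::b, 0.3::b, 0.5::a}` compresses to two clauses while keeping the answer for every head unchanged.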
III-H Improved Security via Semantic Messages
As we have seen, the semantic content of a message helps reduce the receiver's uncertainty about one or more queries. We can then define a notion of semantically secure messages, in that, without the receiver's knowledge base, someone who has gotten hold of the message might not be able to use it to answer a query (or a set of queries).
For example, suppose Eve has knowledge base $K_E$ and Alice sends a message $m$ to Bob, who has knowledge base $K_B$. With respect to a query $q$, we can represent the fact that Eve has little use for the message $m$ given $K_E$ as follows:
$$H_{K_E \circ m}(q) \approx H_{K_E}(q).$$
In other words, suppose Eve managed to intercept the communication and gain the message $m$ (and forwards it to Bob, pretending that nothing has happened, as in a man-in-the-middle attack); combined with her knowledge base $K_E$, Eve is still just as uncertain about $q$ as before.
However, Bob, who receives $m$ and has $K_B$, finds the message meaningful; that is, with respect to $q$:
$$H_{K_B \circ m}(q) < H_{K_B}(q).$$
Hence, as long as Bob and Alice have an a priori shared context, as represented by the knowledge base $K_B$ that Bob has (and Alice knows that Bob has $K_B$), it might be possible for Alice to transmit $m$ so that Eve (who does not know $K_B$), an eavesdropper, will not be able to make much use of it, with respect to some "sought after" answer for $q$.
Note that one can see this as analogous to the typical encryption scenario: the plaintext is encoded as a ciphertext using some key; then Bob, who has knowledge of the key, can decrypt the ciphertext to recover the plaintext, but Eve, after getting hold of the ciphertext, does not have the key and cannot use it to obtain the plaintext. But there are key differences. There could be multiple ways to infer $q$ with different sets of clauses; $K_B$ and $K_E$ may have different clauses, but both could allow some inferences about $q$. Alice needs to ensure that $K_E$ is such that (7) holds and $K_B$ is such that (8) holds before sending $m$.
We can consider semantic information security based on the previous discussion. Conventional information security [Bloch11] [Csiszar11] is based on different channel reliability (e.g., the eavesdropper channel is a degraded channel in wiretap channel models). On the other hand, semantic information security is based on the different reliability of knowledge bases.
Define the semantic mutual information between query $q$ and message $m$ with respect to knowledge base $K$ as
$$I_K(q; m) = H_K(q) - H_{K \circ m}(q).$$
Like the mutual information, this semantic mutual information is non-negative, $I_K(q; m) \ge 0$, and upper-bounded by $H_K(q)$, i.e.,
$$0 \le I_K(q; m) \le H_K(q).$$
We can also see that $I_K(q; m)$ becomes 0 if message $m$ does not help answer query $q$, as happens in (7). Note that if $m = q$, we have $I_K(q; m) = H_K(q)$ (since $H_{K \circ m}(q) = 0$ regardless of $K$). Consequently, we need the additional assumption that $q$ does not belong to $m$ or $K$. In addition, we say that message $m$ is independent of query $q$ (with respect to knowledge base $K$) if $I_K(q; m) = 0$. Clearly, if message $m$ helps answer query $q$ (to some extent), we expect to see that $I_K(q; m) > 0$. Thus, the semantic mutual information can be used to quantify the increase of semantic information that message $m$, together with knowledge base $K$, can provide for query $q$. Then, assuming that Bob is the legitimate receiver and Eve is the eavesdropper, the semantic secrecy rate for a given query $q$ and message $m$ can be defined as
$$R_{\rm s} = \left[ I_{K_B}(q; m) - I_{K_E}(q; m) \right]^+,$$
where $[x]^+ = \max\{0, x\}$. Let
$$\Delta(q) = H_{K_B}(q) - H_{K_E}(q),$$
which is the entropy difference between the knowledge bases $K_B$ and $K_E$ for a given query $q$. If $\Delta(q) > 0$, we can see that $K_B$ has less knowledge than $K_E$ for the given query $q$, and vice versa. Then, we can see that the semantic secrecy rate becomes greater than 0 if
$$H_{K_E \circ m}(q) - H_{K_B \circ m}(q) > H_{K_E}(q) - H_{K_B}(q) = -\Delta(q).$$
The inequality in (12) implies that message $m$ can improve Bob's knowledge base more than Eve's knowledge base when answering query $q$.
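Given the entropies computed by the inference engine, the secrecy quantities themselves are elementary; a sketch (the function names are ours):

```python
def semantic_mi(H_K_q, H_Km_q):
    """Semantic mutual information: entropy reduction for q after
    assimilating m, clamped at 0."""
    return max(0.0, H_K_q - H_Km_q)

def secrecy_rate(H_B, H_Bm, H_E, H_Em):
    """Semantic secrecy rate [I_B - I_E]^+ from Bob's and Eve's
    entropies before and after assimilating m."""
    return max(0.0, semantic_mi(H_B, H_Bm) - semantic_mi(H_E, H_Em))
```

For instance, if a message resolves most of Bob's uncertainty about the query while leaving Eve's unchanged, the rate is positive; if only Eve benefits, it is clamped to zero.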
The notion of the semantic secrecy rate can be extended to the case where message $m$ may not be reliably received due to TC errors over noisy physical channels. This generalization can allow the integration of conventional information security with semantic information security. To this end, we can consider that the original message $m$ is modified due to TC errors and received at Bob and Eve as $\hat m_B$ and $\hat m_E$, respectively. The semantic mutual information at Bob becomes $I_{K_B}(q; \hat m_B)$, and at Eve, $I_{K_E}(q; \hat m_E)$. From these, we could generalize the semantic secrecy rate to encompass the different physical TC channel conditions of Bob and Eve.
IV Issues in Designing SC Systems
In this section, we discuss key issues in designing SC systems with questions and their answers. In particular, we focus on interactions between TC and SC.
IV-A Question: How Can SC Be Structured?
In [Bao11], a model of SC was presented, which is illustrated in Fig. 1. The message generator, which is also called a semantic encoder, produces a message syntax that will be transmitted by a conventional/technical transmitter. As a result, it is possible to design an SC system with two different layers: the TC and SC layers.
In particular, the output of the sender at the SC layer is a message to be transmitted over a conventional physical channel as shown in Fig. 2. The output of the decoder at the TC layer is a decoded message that becomes the input of the SC decoder. From this view, a conventional TC system can be used without any significant changes for SC. However, without any meaningful interactions between TC and SC, there is no way for TC to exploit the background knowledge in SC and use the information obtained from semantic inference.
For interactions between TC and SC, the notion of the conditional entropy $H(X|Y)$ [CoverBook] can be employed. In SC, we can assume that $Y$ is the information that can be obtained from the background knowledge at the receiver. In particular, $Y$ is a clause or an element of clauses in the knowledge base at the receiver. For a clause $X$, the entropy of $X$ becomes the conditional entropy $H(X|Y) \le H(X)$. In this case, the sender only needs to send the information of $X$ at a rate of $H(X|Y)$. In Fig. 3, we illustrate a model for exploiting the external and internal knowledge bases to reduce the number of bits to transmit. For a given query, Bob can extract partial information, $Y$, from his knowledge base, which can be seen as data transmitted through internal communication, and seek additional information, $X$, from others' knowledge bases, e.g., Alice's knowledge base. In this case, the number of bits to be transmitted is $H(X|Y)$, which will be available through external TC.
In general, the notion of Slepian-Wolf coding [SW73] can be employed in order to efficiently exploit the background knowledge in SC. Suppose that there are two sources at two separate senders, denoted by $X_1$ and $X_2$, for distributed source coding. In Slepian-Wolf coding, sender 1, which has $X_1$, can transmit at a rate of $H(X_1)$, while sender 2, which has $X_2$, can transmit at a rate of $H(X_2 | X_1)$, not $H(X_2)$. As a result, the total rate becomes $H(X_1) + H(X_2|X_1) = H(X_1, X_2)$. In the context of SC, $X_1$ can be seen as the information that is available from the background knowledge and through semantic inference.
Example 3:
Suppose that Alice and Bob are the sender and receiver, respectively. In previous conversations, Alice told Bob that "Tom has passed an exam and his score is 75 out of 100," which becomes part of the background knowledge. Then, Bob asked Alice the pass score, which is denoted by $S$. Clearly, based on the knowledge base from the previous conversation, the pass score has to be less than or equal to 75, i.e., $S \le 75$, which can be regarded as side information. Thus, if $S$ is a positive integer and uniformly distributed over $\{1, \ldots, 100\}$, the number of bits to encode $S$ becomes $\log_2 75 \approx 6.23$, not $\log_2 100 \approx 6.64$.
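The bit counts in this example follow directly (a quick check, assuming the uniform prior on the score):

```python
import math

# Without side information: S uniform over {1, ..., 100}
bits_without_side_info = math.log2(100)
# With the knowledge base implying S <= 75: uniform over {1, ..., 75}
bits_with_side_info = math.log2(75)
# Rate saving obtained from background knowledge
saving = bits_without_side_info - bits_with_side_info
```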
Example 4:
Suppose that Eve told Bob that "Tom's score is 75," which is denoted by fact $f_1$. In addition, Alice sends additional information that "The pass score is 70," which is denoted by fact $f_2$. Bob still does not know if Tom has passed, even after knowing the mark. Bob can ask Alice, but does not need to ask Alice or Eve whether or not Tom has passed, because Bob can tell that Tom passes from facts $f_1$ and $f_2$ via inference. If $P(f_1) = 0.8$ and $P(f_2) = 0.9$, the probability that Tom has passed is $0.8 \times 0.9 = 0.72$. Thus, in order to encode the fact that Tom has passed, which is a binary random variable (e.g., $1$ (resp. $0$) represents that Tom passes (resp. fails)), the number of bits becomes $H_{\rm c}(0.72) \approx 0.855$. This demonstrates that the background knowledge in SC can help compress the information in TC. A logic programming perspective on this example can also be considered. Suppose we model the knowledge Bob has with a rule that says that a person passes if the mark is at least a threshold, and also that Bob has been told Tom's score by Eve:
0.8::mark(tom,75). 1.0::pass(X) :- mark(X,M), pass_score(S), M >= S.
But Bob still does not know if Tom has passed. Bob could ask Alice but does not need to if he also knows the passing mark:
0.9::pass_score(70). 0.8::mark(tom,75). 1.0::pass(X) :- mark(X,M), pass_score(S), M >= S.
Bob can then answer the query pass(tom) himself, with computed probability $0.9 \times 0.8 = 0.72$. Now Bob knows not only Tom's mark but also whether Tom has passed, if this probability of $0.72$ is good enough for Bob. With $K$ representing Bob's knowledge base, note that $H_K(\texttt{pass(tom)}) \approx 0.855$. Note that if Charlie later tells Bob that Tom has passed with some probability, then Bob perhaps should discard Charlie's message if assimilating it (resulting in $K'$) would increase Bob's uncertainty about pass(tom), i.e., if $H_{K'}(\texttt{pass(tom)}) > H_K(\texttt{pass(tom)})$. Inference can go far: by inferring about Tom, Bob has reduced the need for communication, and this can be extended to not just Tom but many others, saving a lot of communication. While this example may appear contrived, one can consider a wide range of examples where a similar advantage can be realized. For instance, suppose Bob knows the review scores of 1000 restaurants in his city but does not know the pass score required to qualify as a good restaurant. Bob does not know if any of them passed, but on receiving the one message on the pass score, Bob can now infer which of the 1000 restaurants passed and which did not. Also, rather than sending facts stating which passed and which did not, sending just the pass score is more efficient. Lastly, if Bob is tolerant of uncertainty and guesses the pass score with some probability, then he does not even need to ask for the pass score, and can conclude that a restaurant passes with a probability that might be good enough for tolerant Bob to dine in.
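Numerically, the inference in this example reduces to the following sketch (mirroring the ProbLog program above):

```python
import math

def H(p):
    """Binary entropy, base 2."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# pass_score(70) believed with 0.9, mark(tom,75) believed with 0.8,
# and the rule fires deterministically since 75 >= 70.
p_pass = 0.9 * 0.8
bits_to_encode_pass = H(p_pass)  # bits for the binary pass/fail fact
```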
IV-B Question: What Messages to Send?
As discussed in Subsection III-F, an optimal message can be chosen to minimize the entropy for a given query (see (5)). If a message is to be sent over a TC channel, the length of the message can be regarded as the cost of TC. Let $L(m)$ denote the length of message $m \in M$ at a sender (in bits), while $K$ represents the knowledge base at the receiver that has query $q$. Provided that the maximum length of a message is limited to $L_{\max}$ over a given TC channel, the optimal message for query $q$ can be given by
$$m^\ast = \operatorname*{arg\,min}_{m \in M:\, L(m) \le L_{\max}} H_{K \circ m}(q).$$
While the optimization in (14) would be tractable, it requires the sender to know or estimate the receiver's knowledge base so that it can compute the resulting entropy. Thus, in general, it is expected that the sender has a larger knowledge base than the receiver and knows (or is able to estimate) the receiver's knowledge base. For example, the sender can be a server in the cloud and the receiver can be a mobile user in a cellular system. The server needs to keep all the registered users' knowledge bases updated. In addition, the server is connected to base stations and needs to estimate the length of the message to be transmitted through TC, which may vary depending on the time-varying physical channel condition between the user and the associated base station. In this case, the message length is also a function of the channel condition and the parameters of the physical layer (e.g., modulation order, code rate, and so on).
The message selection problem in (14) can also be generalized to the case of multiple receivers. For example, suppose that there is a common query from all the receivers. For TC, we can consider broadcast channels where there are one sender and multiple receivers, each with its own knowledge base. Then, (14) becomes
where the maximum over all receivers' entropies for the common query is to be minimized. If multiple messages are to be sent, the message selection in (16) can be repeated, or a subset of messages can be chosen.
IV-C Question: What Questions to Ask?
As shown in Example 4, it is important to formulate a question/query carefully in SC for efficient TC. For example, if Bob asks Alice, "Does Tom pass?", Alice can answer yes or no. Thus, a single binary random variable can be considered in TC for Alice's answer. In this regard, inefficient questions might be "What is the pass score?" and "What is Tom's score?". Then, Alice must answer "70" and "75", respectively, each of which requires more than one bit and can be seen as inefficient compared to the pass/fail answer represented by one binary random variable in TC.
In addition, the knowledge base has to be exploited in SC to formulate a TC-efficient question/query through semantic inference as mentioned earlier. By a TC-efficient question/query in SC, we mean a question/query that can be answered with a minimum number of bits in TC. To this end, we can consider the minimum description length (MDL) criterion [Rissanen78].
Recall that the knowledge base determines the set of queries it can answer. Consider the subset of answerable queries whose answers can provide the specific information that Bob wants. Then, the query that minimizes the total length of the query-and-answer pair is given by
where the two terms represent the length functions of the question and of the message (as an answer) for the given question, respectively. Furthermore, the cost function can be weighted so that the length of the answer counts more or less than that of the question. In the conventional MDL criterion, the weight equals 1, while it can be larger than 1 if the length of the answer is more important than the length of the question, and vice versa. For the length function, suppose that a finite set of queries is known. In addition, if the probability of each query is known, then, using entropy coding, the length of a query becomes the negative logarithm of its probability (in bits).
Suppose that a group of students took an exam and their scores (between 1 and 100) were given. In addition, there are 4 grades, {A, B, C, F}, each assigned by fixed score ranges. Alice has a knowledge base of the exam results, and Bob wants to know if Tom has passed; he can consider the following questions to ask Alice: "What is Tom's score?", "What is Tom's grade?", and "Did Tom pass?"
Bob knows the grading table, and the grade F means fail. Provided that Tom's score is 80, the answer "Tom's score is 80" (for the score query) or "Tom's grade is B" (for the grade query) implies that Tom passes. Thus, the answer to any of these queries can directly or indirectly provide the information that Bob wants (i.e., whether or not Tom passes). The number of bits to encode the answer is about log2(100) ≈ 6.64 bits for the score query (if the score is an integer between 1 and 100 uniformly at random), 2 bits for the grade query (as there are 4 equally likely grades), and 1 bit for the pass/fail query. If the question lengths are comparable, the pass/fail query is chosen according to the MDL criterion in (17). Of course, the answer lengths can be shorter than the above values if any prior knowledge of Tom's performance is available (e.g., Tom has been an excellent achiever and hardly fails). This indicates that the receiver's knowledge base can help compress the information to be sent in TC.
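The bit counts in this example can be reproduced with a small sketch; the answer lengths follow the entropy of each uniformly distributed answer, while the question lengths are an assumption (taken equal for all three questions), since the original values are not shown here:

```python
from math import log2

# Answer lengths in bits, one per candidate question
answer_bits = {
    "What is Tom's score?": log2(100),  # 100 equally likely scores
    "What is Tom's grade?": log2(4),    # 4 equally likely grades
    "Did Tom pass?":        log2(2),    # yes/no
}
question_bits = {q: 5.0 for q in answer_bits}  # assumed question lengths

def mdl_pick(weight=1.0):
    # MDL-style criterion: minimize question length + weight * answer length
    return min(answer_bits,
               key=lambda q: question_bits[q] + weight * answer_bits[q])

print(mdl_pick())  # the pass/fail question has the shortest total length
```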
IV-D Question: How Existing Knowledge Is Related?
Let a random variable represent a certain piece of information, and let each user hold its own side information. Each user may then have a different uncertainty about the information, which can be measured by the following conditional entropy:
We can decompose the side information at each user into a component that depends on the information of interest and a component that is independent of it; the conditional entropy is determined only by the dependent component. Then, we have
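As a minimal illustration (with assumed joint distributions, not values from the paper), the conditional entropy shows how two users with different side information face different residual uncertainties about the same information:

```python
from math import log2

def cond_entropy(joint):
    # H(X | Y) from a joint pmf given as {(x, y): probability}
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * log2(p_y[y] / p) for (x, y), p in joint.items() if p > 0)

# User 1's side information determines X; user 2's is independent of X
joint_user1 = {(0, 0): 0.5, (1, 1): 0.5}
joint_user2 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(cond_entropy(joint_user1))  # 0.0 bits: no remaining uncertainty
print(cond_entropy(joint_user2))  # 1.0 bit: full uncertainty remains
```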
Thus, in multiuser SC with a single sender and multiple receivers, it becomes important to account for the differences between the receivers' knowledge bases.
Suppose that a message is given. The meaning of this message can differ depending on the receiver's knowledge. As a result, the meaning of the message is a function of both the message and the receiver's knowledge. Consider 3 parties: Alice, Bob, and Eve. All three parties know a person named Tom who took an examination. Alice has a message, "Tom's score is 75," to deliver to Bob and Eve. Bob knows that another candidate whose score is 70 has passed, which makes Bob deem that Tom has passed. Therefore, to Bob, the meaning of the message is that Tom has passed the examination. On the other hand, Eve knows that a different candidate whose score is 80 has passed. Thus, Eve still does not know if Tom has passed.
In this and the previous subsections, we discussed two separate questions, which can also be considered together. For example, if there are multiple related queries, we may consider an optimal order of queries to minimize the number of bits to be transmitted through interactions between TC and SC. To this end, it is necessary to consider the fact that the receiver can update its knowledge base once the answers to the earlier queries are obtained. Using a concrete example, we will discuss this issue in Section V with numerical results.
IV-E Question: Where To Seek Answers Among Distributed Sources?
In this subsection, we first discuss an approach to efficiently select distributed sources in terms of the entropy difference minimization [Choi_WCNC20]. Then, we extend this approach with respect to semantic context.
Suppose that there are multiple senders and one receiver. Each sender holds its own piece of information, and the receiver has a query whose answer is a function of the variables at the senders. For a large number of senders with limited bandwidth, collecting all the information from the distributed senders may take a long time. Furthermore, if the senders' variables are correlated, it may not be necessary to collect all of them. For efficient data collection from distributed senders/sources (or sensors), the notion of data-aided sensing (DAS) has been considered in [Choi_DAS19, Choi_DAS20]. If only one sender can be chosen in each round, the following selection criterion is proposed in [Choi_WCNC20]:
where the index set collects the senders that have sent their information up to the current iteration, and its complement collects the remaining ones. In (20), the objective represents the total remaining uncertainty of the answer given the information available at the receiver so far. Thus, in the next iteration, the sender that minimizes the remaining uncertainty is chosen.
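A greedy sketch of the selection in (20), under an assumed toy joint distribution in which sender 1 observes the answer exactly and sender 2 is uninformative:

```python
from itertools import product
from math import log2
from collections import defaultdict

# Toy joint pmf over (x, y1, y2): sender 1 observes x exactly, sender 2
# observes an independent coin (all values are illustrative assumptions).
joint = {}
for x, y2 in product([0, 1], repeat=2):
    joint[(x, x, y2)] = 0.25

def cond_entropy(joint, target_idx, given_idx):
    # H(target | given) for a joint pmf over outcome tuples
    pair, given = defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given_idx)
        pair[(outcome[target_idx], g)] += p
        given[g] += p
    return sum(p * log2(given[g] / p) for (t, g), p in pair.items() if p > 0)

# Greedy DAS-style selection: in each round, pick the sender whose variable
# minimizes the remaining uncertainty about the answer (tuple index 0).
remaining, chosen = [1, 2], []
while remaining:
    best = min(remaining,
               key=lambda j: cond_entropy(joint, 0, tuple(chosen) + (j,)))
    chosen.append(best)
    remaining.remove(best)
print(chosen)  # sender 1 first: it removes all uncertainty about x
```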
While no semantic information is taken into account in (20), it is possible to extend the criterion to consider semantic information. Let each node hold a message (for a given set of queries), and let the receiver's knowledge base be updated at every iteration. Then, from (4), the node (or source) selection criterion can be given as follows:
That is, the receiver can actively seek the most effective message among multiple sources and iterate this process to rapidly improve the knowledge base. In addition, as in (14), constraints on TC can be imposed if TC channels are limited (e.g., in terms of capacity and channel resource sharing).
V Numerical Results
In this section, we present numerical results for two examples, illustrating how SC and TC can interact to reduce the communication overhead in terms of the number of bits to transmit or the number of communication rounds. For simplicity, we consider peer-to-peer communication between Alice and Bob, who are the sender and the receiver, respectively, and focus only on the transmission from Alice to Bob. The first example assumes that all the queries from Bob to Alice are reliable, whereas the second example considers unreliable queries, as we shall elaborate next.
V-A Crossword Puzzle Example
Consider a task for Bob to solve the crossword puzzle in Fig. 4 with its three questions. Bob has a knowledge base to solve the puzzle and is able to ask Alice for answers through TC. It is assumed that Alice knows all the answers, which are (1) APPLE, (2) PORK, and (3) ICE. The physical channel of TC is modeled as a discrete memoryless channel (DMC), and the TC unit is an alphabet letter (upper case only). Thus, each symbol has a unit length of log2(26) ≈ 4.7 bits. For the 26-ary DMC of TC, the following transition probability is assumed:

P(y | x) = 1 − ε if y = x, and ε/25 otherwise,

where ε represents the crossover probability or symbol error rate of TC.
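Assuming the standard M-ary symmetric channel model (a correct symbol with probability 1 − ε, otherwise uniform over the remaining letters), the transition probability and a single channel use can be sketched as:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def dmc_transition(x, y, eps, m=26):
    # m-ary symmetric DMC: correct symbol with probability 1 - eps,
    # otherwise uniform over the remaining m - 1 symbols
    return 1.0 - eps if x == y else eps / (m - 1)

def send_letter(x, eps, rng=random):
    # Sample one use of the channel for input letter x
    if rng.random() < 1.0 - eps:
        return x
    return rng.choice([c for c in ALPHABET if c != x])
```

For every input letter, the transition probabilities over the 26 outputs sum to one, as required for a channel law.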
To solve the crossword puzzle in Fig. 4, for problem (1), Bob has a list of possible answers (the names of fruits consisting of 5 letters), written as an annotated disjunction representing a probability distribution:
0.25::word(one,"APPLE"); 0.25::word(one,"PEACH"); 0.25::word(one,"MANGO"); 0.25::word(one,"MELON").
where ";" can be taken as XOR, that is, there are 4 fruits and each one is equally likely. For problem (2), Bob has the possible answers as follows:

0.5::word(two,"PORK"); 0.5::word(two,"BEEF").
Bob does not have any idea about problem (3). Note that any answer to query (1) has probability 0.25 and any answer to query (2) has probability 0.5. However, we can also capture how knowing certain possibilities for one word can help Bob know another word; e.g., with certainty 1.0 (not labelled below), we have the rules:
% (1) helps (2)
word(two,"PORK") :- word(one,"APPLE").
% (3) helps (1)
word(one,"APPLE") :- word(three,X), endswith(X,"E").
Firstly, we assume that the TC channel is error-free. Without any interaction between TC and SC, Alice needs to send all 9 letters. To reduce the number of bits sent from Alice to Bob, Bob can exploit his knowledge base. To this end, different orders of queries can be considered (e.g., (1) → (2) → (3)); in this example, the terms problem and query are interchangeable. When (1) is asked, Alice sends "APPLE". Then, Bob can find the answer to (2) using his knowledge base. Furthermore, Bob can update his knowledge base with "If the meat is from pig, it is PORK" with probability 1, or
word(two,"PORK") :- clue(two,"meat from pig").
Bob can then ask (3), and Alice sends only the first two letters, "IC", as the last letter has already been sent. Note that once (1) is answered, Bob can find the answer to (2). Thus, the two orders of queries, [(1) → (2) → (3)] and [(1) → (3) → (2)], are reduced to [(1) → (3)]. As a result, for the order of queries (1) → (3), a total of 7 letters should be sent from Alice. Different orders require different numbers of letters to be transmitted, as shown in Table I.
| Order | Order of queries | Number of letters |
|---|---|---|
| Order 1 | [(1) → (3)] | 7 |
| Order 2 | [(2) → (3)] | 6 |
| Order 3 | [(3)] | 3 |
Note that when problem (3) is asked as the first query, Bob receives "ICE". Then, he can use his knowledge base to find all the answers. That is, for (1), "APPLE" is the only candidate whose last letter is "E", so Bob can find the answer. Likewise, Bob can also find the answer to query (2). This result provides an insight into the best order for multiple related queries: the first query might be the most uncertain one for Bob. However, this may not hold if TC is no longer error-free (as we will show later). In addition, note that the rule that (1) helps (2) could have been stated more precisely: if the answer to (1) has "P" as its third letter (index 2), then Bob would know word (2):
word(two,"PORK") :- word(one,X), charAt(X,2,"P").
and one could also state that knowing just the third letter of (1) would identify (1) completely:
word(one,"APPLE") :- word(one,X), charAt(X,2,"P").
While this could further reduce the amount of data Alice needs to send (Alice just sends the third letter of (1) to identify both (1) and (2), given that Alice knows Bob's knowledge base), we will not consider these more precise rules further, for simplicity.
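The inference chain when "ICE" is received first can be sketched as a small forward-chaining loop; the tuple encoding of the ProbLog facts is an assumed simplification:

```python
# Forward chaining over the two certain rules from the text:
# (3) helps (1): word one is APPLE if word three ends with "E"
# (1) helps (2): word two is PORK if word one is APPLE
def infer(facts):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        if any(s == "three" and w.endswith("E") for s, w in derived) \
                and ("one", "APPLE") not in derived:
            derived.add(("one", "APPLE"))
            changed = True
        if ("one", "APPLE") in derived and ("two", "PORK") not in derived:
            derived.add(("two", "PORK"))
            changed = True
    return derived

print(sorted(infer({("three", "ICE")})))
```

Receiving the 3 letters of "ICE" thus resolves all 9 letters of the puzzle, which is why Order 3 needs the fewest transmitted letters over an error-free channel.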
We now consider the case where the crossover probability of the TC channel is non-zero. Suppose that majority-logic decoding [LinBook] is employed. For query (1), Bob can successfully decode if any 3 out of the 5 letters of APPLE are correctly received. For query (2), we assume that 2 correctly received letters are sufficient for successful decoding (here, we ignore corrupted receptions such as "POEF" or "BERK"). On the other hand, for query (3), all 3 letters must be correctly received, which happens with probability (1 − ε)^3, where ε is the crossover probability.
In Fig. 5, the decoding error probability is shown when each query is answered from Alice to Bob over the 26-ary DMC. Thanks to the different levels of knowledge at Bob, the decoding error probability varies; since Bob does not have any knowledge about query (3), its decoding error probability is the highest. When the crossover probability ε is non-zero, there can be decoding errors in TC. Thus, Bob may ask Alice to re-transmit. For this, we consider a simple re-transmission scheme, where the average number of re-transmissions can be found using the geometric distribution. For example, letting p_s denote the probability of successful decoding for the transmitted answer, the average number of (re-)transmissions becomes 1/p_s, and the total number of letters to be (re-)transmitted becomes the message length multiplied by 1/p_s for each query. Fig. 6 shows the average number of letters to be (re-)transmitted for each order. It is shown that order 3 is optimal (in terms of the number of letters to be (re-)transmitted) when the crossover probability is sufficiently low (as clearly shown above for the error-free case). However, as the TC channel becomes less reliable (i.e., as ε increases), order 3 is no longer optimal; beyond a certain ε, order 2 becomes optimal.
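The retransmission analysis can be sketched as follows. The per-word (length, required-correct) pairs follow the text; modeling the second step of Orders 1 and 2 as sending "IC" with both letters required is an assumption chosen to match Table I at ε = 0:

```python
from math import comb

def p_success(n, k, eps):
    # Majority-logic decoding succeeds if at least k of n letters arrive
    # intact over the symmetric channel with symbol error rate eps
    p = 1.0 - eps
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def expected_letters(n, k, eps):
    # Geometric retransmissions: average number of rounds = 1 / P(success)
    return n / p_success(n, k, eps)

# (length, required-correct) per transmitted word in each order
orders = {
    "Order 1": [(5, 3), (2, 2)],  # APPLE, then "IC"
    "Order 2": [(4, 2), (2, 2)],  # PORK, then "IC" (assumed)
    "Order 3": [(3, 3)],          # ICE alone resolves the puzzle
}
for eps in (0.0, 0.1, 0.3):
    totals = {o: round(sum(expected_letters(n, k, eps) for n, k in steps), 2)
              for o, steps in orders.items()}
    print(eps, totals)
```

At ε = 0 the totals reproduce Table I (7, 6, and 3 letters), while for sufficiently large ε the expected cost of Order 3 exceeds that of Order 2, mirroring the crossover observed in Fig. 6.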
V-B Clinical Test Example
Consider a clinical test of a new medicine in which Alice and Bob participate as a medical doctor and a medical scientist, respectively. As Fig. 7 shows, Alice in a hospital has a knowledge base storing the causal relationships among a symptom, its treatment, and a patient's recovery, where an edge between two variables is read as the former causing the latter. Unknown relationships are associated with random guesses (e.g., probability 0.5). Meanwhile, Bob in a lab has a knowledge base storing the causal relationships among the age and loss of the patient.
Such a knowledge base coincides with a causal graph, i.e., a structural causal model (SCM) [scholkopf2021toward] or a Bayesian network, which is a directed acyclic graph (DAG) whose nodes are the variables and whose edges are associated with probabilities that identify causal relationships. ProbLog is capable of representing this causal knowledge base in such a way that "x causes y with probability p" is described by the following clause:

p::y :- x.
Given this knowledge base, our focus is Bob's self-asking of a query about a truth probability, which is cast as:

The calculation of this truth probability follows (22) in a recursive manner and consequently reflects all the preceding causal relationships.
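The recursive calculation of a truth probability over a causal chain can be sketched as follows; all edge probabilities are assumed values (not taken from the experiment), with unknown edges defaulting to a random guess of 0.5:

```python
# Illustrative causal chain s -> t -> r (symptom, treatment, recovery)
p_s = 0.6              # P(s), assumed
p_t_given_s = 0.8      # P(t | s), assumed
p_t_given_not_s = 0.5  # unknown edge: random guess
p_r_given_t = 0.7      # P(r | t), assumed
p_r_given_not_t = 0.5  # unknown edge: random guess

# Each node's truth probability is computed from its parent's, recursively,
# so the final answer reflects all preceding causal relationships.
p_t = p_t_given_s * p_s + p_t_given_not_s * (1 - p_s)
p_r = p_r_given_t * p_t + p_r_given_not_t * (1 - p_t)
print(round(p_r, 3))
```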
Suppose that answering each query is followed by improving Bob's knowledge base through the reception of a single clause from Alice. In every communication round, Bob compares the received clause with the corresponding clause stored in his knowledge base, chooses either one of the two, and updates his knowledge base accordingly. Assuming that the received clause is always chosen by Bob, the communicating-clause selection at Alice and the received-clause assimilation at Bob are jointly recast as the problem of Alice's selection of a clause to transmit. Each clause transmission is determined by one of the following rules:
A1. Replacement - A randomly selected clause;
A2. Maximum Edge Probability - The clause associated with the maximal edge probability;
A3. Minimum Edge Entropy - The clause maximally reducing the entropy of an edge in Bob’s knowledge base;
A4. Minimum Knowledge Base Entropy - The clause maximally reducing Bob’s knowledge base entropy;
A5. Maximum Average Answer Probability - The clause maximizing Bob’s average answer probability.
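As a sketch, rule A3 can be illustrated with binary edge entropies; all edge probabilities below are hypothetical, with Bob's unknown edges set to random guesses:

```python
from math import log2

def h(p):
    # Binary entropy (bits) of an edge probability
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Hypothetical edge probabilities in the two knowledge bases
bob = {"s->t": 0.5, "t->r": 0.5}
alice = {"s->t": 0.9, "t->r": 0.8}

def a3_pick(alice, bob):
    # A3: send the clause that maximally reduces the entropy of an edge
    # in Bob's knowledge base (unknown edges count as 0.5)
    return max(alice, key=lambda e: h(bob.get(e, 0.5)) - h(alice[e]))

print(a3_pick(alice, bob))  # "s->t": 0.9 is the more certain edge
```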
With A1, the communication rounds continue until all of Alice's clauses have been sent. With A2-A5, the communication stops when it cannot further improve Bob's knowledge base under the given criterion. This reduces communication costs, which comes at the cost of additional computing overhead at Alice and of Alice having information on Bob's knowledge base.
To measure the accuracy of Bob's reasoning about his query, we define the average error of a query, where the average is taken over the query selection. Each error is measured by the absolute difference between the answer probability under Bob's knowledge base and that under a ground-truth SCM, which can be reconstructed by integrating the knowledge bases of Alice and Bob based on A3.
When Bob's self-asking queries are randomly selected, Fig. 8(a) shows that A3 achieves the lowest average error after 5 communication rounds, in stark contrast to A2 and A5, which focus on the edge/answer probability, corroborating the importance of taking entropy into account. Furthermore, Fig. 8(b) shows that A3 achieves the entropy of the ground-truth SCM, suggesting that the knowledge base entropy is a good indicator of Bob's reasoning capability. Nonetheless, the knowledge base entropy alone is not a proper communication rule for causal reasoning, as it ignores the causal relationships therein, as observed with A4, which is even worse than A1.
Next, we consider the case where Bob's query is always on the same target. Alice can reflect this task-specific information in her default transmission rule A3 by reducing the target clauses from her entire knowledge base to only the clauses having the query target as their head. This new rule and the original A3 can be interpreted as the A3-1 'Within Task' and A3 'Beyond Task' rules, respectively. Fig. 9(a) shows that A3-1 achieves a sufficiently low average error with less communication overhead. Nevertheless, as opposed to A3, A3-1 fails to achieve the minimum average error due to its ignorance of the causal relationships that are not directly associated with the query target. Indeed, the resultant knowledge base entropy under A3-1 differs from that under A3 and from the ground-truth value, as observed in Fig. 9(b).
VI Open Issues and Challenges
In this section, we present open issues and challenges to design SC systems.
VI-A Background Communication for Knowledge Base Updates
In the previous sections, we have studied SC under a scenario where a user (Bob) has a set of queries to send to another user or a server (Alice), who may have a better knowledge base than Bob, to get answers through TC, as shown in Fig. 3. TC may suffer from outages due to fading and interference, and from delays due to limited bandwidth. To avoid these difficulties, Bob may update his knowledge base in advance for anticipated queries whenever the bandwidth of TC is sufficient. From this, we can divide TC into background TC, for updating the knowledge bases of users, and foreground TC, for sending a query and receiving an answer when the user's knowledge base is not sufficient to obtain the answer with a certain reliability.
Given limited bandwidth, it is crucial to optimize the resource allocation and scheduling for the foreground TC and background TC. It is expected that the cost of background TC is lower than that of foreground TC as the background TC can be carried out based on best-effort delivery, while the foreground TC needs reliable and low-latency delivery. In this respect, the problem scenario is similar to radio access network (RAN) slicing between ultra-reliable and low-latency communication (URLLC) and other types of services [popovski20185g, bennis2018ultrareliable, pokhrel2020towards]. One key difference is that the priority of background TC depends on the amount of the accumulated knowledge and query patterns and anticipation. In this sense, edge caching problems are also relevant [ko2017live, elbamby2019wireless]. Nonetheless, background TC is additionally challenged by the logical connections within knowledge bases, as observed by the examples in Section V.
VI-B Pragmatic SC for Memory and Communication Efficiencies
Thus far we have focused mainly on Shannon and Weaver's semantics (Level B) problem, while for the effectiveness (Level C) problem we have presumed that all semantic contents can be useful for some generic tasks. Such SC strategies may not be sustainable under limited memory for storing the ever-growing amount of knowledge, not to mention the redundant communication costs they incur. Alternatively, inspired by pragmatic information theory [gernert2006pragmatic], we can first focus on a given task, and then count the usefulness of semantic contents based on their effectiveness in the task. In pragmatic information theory, there is a novelty-confirmation trade-off stating that neither identical information nor overly novel information contributes to updating knowledge or to influencing decision-making. While the former is trivial, the latter results from the fact that such dissimilar information is barely comprehensible.
Leveraging this idea, consider a remote control scenario where Bob updates his knowledge base only when the received clause is grounded in (i) Bob's prior knowledge and (ii) the physical world, directly or through multiple hops. Condition (i) comes from the novelty-confirmation trade-off, and (ii) is based on the fact that task-effective control actions should be taken in the real world. For brevity, consider only conditional clauses, except for an action, which is a factual clause. Given Bob's knowledge base: if Bob receives a clause satisfying (i) and (ii), Bob updates the knowledge base; if Bob receives a clause violating (i) or (ii), Bob keeps the knowledge base unchanged. By adding this pragmatic rule to the aforementioned SC framework, one can communicate and store only the semantic contents that are effective for a given task. In doing so, Bob can save memory costs by simply discarding less task-effective semantic contents, as we studied in the example in Section V-B. Furthermore, if Alice knows Bob's task effectiveness before transmission, they can save communication costs too. In this respect, it is worth investigating feedback and prediction mechanisms to estimate the semantic content's task effectiveness.
VI-C Compatible SC via Knowledge-Model Conversion
Our proposed SC layer can be seamlessly added on top of the conventional TC layer. How to jointly operate the SC and TC layers has been elaborated in Section IV-A, and how to reduce the additional overhead induced by the SC layer will be discussed in Section VI-D. What makes this challenging is the recently proposed semantics-empowered and goal-oriented SC frameworks that commonly rest on AI-native operations with neural networks [qin2021semantic, tong2021federated, liang2022life], as opposed to our knowledge-based SC layer. We expect that AI-native and knowledge-based SC frameworks are complementary, possibly even creating a synergistic effect. In this respect, it is promising to study the conversion between neural network models and knowledge bases.
Indeed, it is possible to convert the knowledge base in our SC layer into a neural network model. For instance, treating a knowledge base as a labeled dataset, one can directly infuse the knowledge of the dataset into a neural network model by training the model via supervised learning. Similarly, if the knowledge base is graphical, one can first generate a synthetic corpus from the graph [agarwal2020knowledge] and then train the model, yielding a trained neural network that contains the knowledge of the dataset. On the other hand, it is also feasible to transform a neural network model into a knowledge base, in that the model parameters store information about their training dataset [achille2019information]. One possible solution is to leverage model-to-corpus verbalization [west2021symbolic] in natural language processing (NLP), through which a trained model generates synthetic clauses to be stored in a knowledge base. Consequently, an updated knowledge base in our SC layer can improve a neural network model for AI-native SC operations, and vice versa.
VI-D SC Layer Overhead Reduction via Semantics Alignment
Allocating orthogonal communication and separate computing resources to the SC layer imposes additional overhead on the incumbent communication architecture. A naïve solution is superimposing the SC and other layers in the power domain, as in non-orthogonal multiple access [choi2014non, popovski20185g]. Going beyond this, one can partly or entirely integrate the SC layer with the existing TC and/or application layers in the semantics domain. To illustrate, consider integrating the SC layer message into the TC layer message. This requires maximizing the mutual information between the two messages. Since the marginal distributions of the two messages are fixed, such a problem boils down to minimizing their joint entropy, which coincides with the minimum-entropy coupling problem [kocaoglu2017entropic], for which a polynomial-complexity solution is available [cicalese2019minimum]. Similarly, for the application-SC layer integration, by maximizing the mutual information between the action and the SC layer message, one can align the action in a control application with the semantic message, and vice versa.
Accordingly, engineering the semantic representation by modifying the logic-based language or learning a new emergent language could be an interesting research direction. As the mutual information expressions above suggest, the SC-TC layer integration may require more bandwidth due to the increased message entropy, and the application-SC layer integration may incur more uncertain action decision-making due to higher conditional entropy. Furthermore, in different layers, the message sizes can differ, and their communication frequencies can be asynchronous. Reflecting this, reducing the SC layer communication overhead via cross-layer integration could be a challenging yet interesting topic for future research.
VI-E Practical Demonstrators and Practically Establishing Communication Contexts
We have outlined a number of issues in integrating semantics into communication, and noted how the communication context (as constituted by the knowledge held by the communicating parties and by what knowledge one party thinks the other has) can help in compression and security, and improve efficiency beyond traditional communication models. An analogy is this: one can hear every word (or see every symbol) shared in a conversation between two friends and still not understand what actual knowledge has been exchanged; compression, security (in part at least), and efficiency are concurrently achieved once the communication context has been established. A practical demonstration of our approach could be useful: e.g., machine-to-machine (say, robot-to-robot, or among IoT devices) communication within a shared context, exploiting such an SC-based approach, can help shed further light on the quantitative advantages of our approach. One can also investigate how to efficiently establish and maintain such communication contexts before further intensive communication takes place.
VI-F Beyond ProbLog
We have used ProbLog as a concrete illustration of key ideas of what we mean by semantics, as a way to model semantic information and inference, and to demonstrate how TC and SC can interact information-theoretically. However, there are many types of possible inference and other logics that can be used to model semantics. An open research issue is to carry out a similar analysis to the one in this paper but based on a different logical formalism. For example, a generalized version of ProbLog that allows probabilistic argumentation-based reasoning [DBLP:journals/corr/abs-2110-01990] can help deal with the open world of communication, where received messages may support or attack certain other pieces of knowledge, and where the truth and falsity of statements might not be assumed absolute but weighted by evidence and argument.
VII Concluding Remarks
While several approaches exist to study semantic information, in this paper we have considered semantic information and knowledge bases based on probabilistic logic, because probabilistic logic-based approaches allow us to model the interactions between SC and TC and to formulate, in a unified manner, various problems in designing an SC system subject to the constraints of physical channels. In particular, based on probabilistic logic, we have defined various entropy-based measures for knowledge bases and addressed various issues that arise when the SC and TC layers interact. Numerical examples have been presented to demonstrate how the proposed probabilistic logic-based approaches can efficiently utilize TC channels for SC.
Although we have mainly focused on SC between human communicating parties, the proposed approach can be extended to machine-to-machine and human-to-machine SC. For human communicating parties, we have generally assumed in this paper that one party improves his/her knowledge base by receiving answers to a series of queries. For machines (in general, autonomous agents), there might be given goals to achieve, and SC can be carried out to achieve those goals. Thus, together with the open issues in Section VI, it would be interesting to generalize the proposed approach to SC between machines, and between machine and human agents.