Modern wireless communication systems have reshaped how society operates and how people live, becoming an engine of the data economy. Many advances in wireless systems are rooted in Claude Shannon’s locus classicus on information theory. In that work, Shannon defined the communication problem as one of “reproducing at one point either exactly or approximately a message selected at another point”. He argued therein that “semantic aspects of communication should be considered as irrelevant to the engineering problem”. Guided by Shannon’s approach and philosophy, most existing communication systems have been designed around rate-centric metrics such as throughput, spectrum/energy efficiency, and, with the advent of 5G, latency. Nevertheless, there is a growing belief in the community that Shannon’s classic framework needs to be upgraded for the next step in the evolution of communications. Its narrow focus on the reliability of communication is starting to show its limitations in meeting the ambitious goals set for the sixth generation (6G). In particular, the meaning behind the transmitted data, ignored so far, is expected to play an important role in 6G communications, which place an unprecedented emphasis on machine intelligence and its interface with human intelligence. In existing systems, the high-level meaning or relevance of the data content is only loosely coupled with the transmission strategies; an example is packet prioritization based on data content, implemented in the upper networking and application layers [2, 3, 4]. However, separating transmission from the meaning of data and its effectiveness for achieving specific goals inevitably results in redundancy, e.g., transmitting information that lacks relevance or freshness. As a result, existing techniques for information filtering, transmission, and processing struggle to keep pace with the exponential growth of data traffic [5, 6, 7].
The need for highly efficient communication to support machine-intelligence services has triggered a paradigm shift from “semantic neutrality” towards semantic communication (SemCom) [8, 9, 5, 10]. This is the main theme of this article.
The concept of SemCom was introduced by Warren Weaver, a collaborator of Shannon, who defined a communication framework featuring three levels. The first level, which is targeted by Shannon’s information theory, addresses the technical problem: “How accurately can the symbols of communication be transmitted?”. SemCom belongs to the second level, which addresses the semantic problem: “How precisely do the transmitted symbols convey the desired meaning?”. Beyond it, the third level is defined as the effectiveness problem: “How effectively does the received meaning affect conduct in the desired way?”. In Weaver’s time, communication predominantly served the purpose of information exchange among humans. Thus Weaver’s SemCom definition should be interpreted as a concept of Human-to-Human (H2H) communication. In the modern era of machine intelligence, however, the connotation and scope of SemCom have been substantially enriched and broadened to cover all three levels. This necessitates the presentation of a modern view of SemCom.
1.1 The Rise of Machine Intelligence
The recent rapid advancements in Artificial Intelligence (AI) and Internet-of-Things (IoT) are two main factors contributing to the rise of machine intelligence.
Research on AI dates back to the 1950s. The term AI first appeared in a research proposal aimed at creating “the embryo of an electronic computer that will be able to walk, talk, see, write, reproduce itself and become conscious of its own existence”. To materialize the vision, researchers invented neural networks to mimic the mechanism by which brain neurons process information and realize intelligence. Early attempts attained some success in demonstrating the effectiveness of such models, e.g., Frank Rosenblatt’s famous Perceptron. The single-layer linear classifier he used is widely regarded as the distant ancestor of modern Machine Learning (ML) algorithms. The ensuing evolution of AI experienced periodic bouts of enthusiasm interspersed with “AI winters”. In a “winter”, research could stagnate for a decade due to limited computing power, insufficient training data, and the crudity of AI algorithms. These obstacles were finally overcome in the past few years, following decades of rapid development of chips with remarkable number-crunching power and the growing abundance of datasets. Nowadays, we witness the widespread use of powerful large-scale neural-network models featuring billions to tens of billions of parameters organized in hundreds of hierarchical layers, termed the Deep Learning (DL) architecture. Advanced ML algorithms have been designed for various learning paradigms, including supervised learning, unsupervised learning, and reinforcement learning. Via heavy-duty statistical analysis of big data, ML algorithms enable deep neural networks to learn the inherent patterns of physical objects and attain a wide range of human-like capabilities, from recognition to translation.
Another paradigm shift in computing is to embed computers into tens of billions of edge devices (e.g., sensors and wearables) and connect them to mobile networks [13, 14]. The resultant IoT can serve as a large-scale sensor network as well as a massive platform for edge computing. Complex tasks can be executed on the platform to improve the efficiency of businesses and the convenience of consumers. For example, sensors and cameras connected to the IoT can act as a surveillance network or save energy through smart lighting; IoT-connected cows allow the cloud to track their health conditions and eating habits, which provides useful data for smart agriculture. Individual gains may not be dramatic, but they compound as the scale of the IoT grows.
The developments of AI and IoT are intertwined, and their full potential can be unleashed by integration. On one hand, AI endows edge devices with human-like capabilities of decision making, reasoning, and vision, and boosts their communication efficiency and reliability. On the other hand, massive data are being continuously generated by the enormous number of edge devices in the IoT (e.g., more than a hundred trillion gigabytes of data in the next 5 years). Such data are fuel for AI and can be distilled into intelligence to support a wide range of emerging applications and improve the efficiency of data-driven businesses.
1.2 The Context for Defining Semantic Communication
The breathtaking advancements in machine intelligence and the exponential growth of the machine population usher in the new era of machine intelligence. The extensive involvement of machines in modern communication gives rise to three basic types of communication contexts: H2H communications, human-to-machine (H2M) communications, and machine-to-machine (M2M) communications. The classic H2H SemCom considered by Weaver is therefore insufficient for describing the diverse communication tasks of the future. This motivates us to broaden the scope of SemCom by defining three sub-areas matching the mentioned contexts as follows.
H2H SemCom: The definition of H2H SemCom is consistent with the second level of Weaver’s framework and addresses the semantic problem described earlier. To be precise, the communication purpose is to accurately deliver meanings over a channel for message exchange between two human beings. To this end, the system performance is measured by how well the intended meaning of the sender can be interpreted by the receiver.
H2M SemCom: This area concerns communication between a human being and a machine. The distinction of H2M SemCom lies in the interface between human and machine intelligence, which are different in nature, and it involves both the second and third levels of Weaver’s framework. For H2M SemCom to be effective, the transmitted messages have to be understood not only by humans but also by machines. To be more specific, the success of H2M SemCom hinges on two aspects: 1) a message sent by a human being should be correctly interpreted by a machine so as to trigger the desired actions or responses (the effectiveness problem); 2) a message sent in the reverse direction should be meaningful for the human at the receiving end (the semantic problem). Typical applications include human-AI symbiosis systems, recommendation systems, human sensing and care systems, and virtual reality (VR)/augmented reality (AR) systems.
M2M SemCom: In the absence of human involvement, M2M SemCom concerns the connection and coordination of multiple machines to carry out a computing task. Therefore, this area relates less to level-two communication (i.e., the semantic problem) and more to level-three communication (i.e., the effectiveness problem). The latest research on M2M SemCom advocates the approach of integrated communication and computing (IC), which promises more efficient system design under constraints on radio and computing resources. The resultant cross-disciplinary research has led to the emergence of a new class of M2M SemCom techniques to be introduced in the sequel. Typical applications of M2M SemCom are mainly in the areas of distributed sensing, distributed learning, and distributed consensus (e.g., vehicle platooning).
1.3 Motivation and Outline
SemCom has been regarded as a key enabling technology for future networks. Research on SemCom concerns the representation of semantic information, SemCom modeling, enabling techniques, and network design. A number of surveys of the area have appeared in which different SemCom frameworks are proposed. First, system-level issues of SemCom such as network architectures and modeling are discussed in [16, 10]. Specifically, one of these works proposes a semantic-effectiveness (SE) plane, whose functionalities address both the semantic and effectiveness problems, to realize information filtering and direct control of all layers. The new layered architecture is showcased with particular applications including immersive and tactile scenarios, integrated communication and sensing (ISAC), and physical-layer computing. The other survey focuses on SemCom modeling and semantic-aware network architecture. Two SemCom models are introduced based on a shared knowledge graph (KG) and semantic entropy, respectively, each of which comprises a semantic encoder/decoder and semantic noise. Building on these models, semantic networking for federated edge intelligence is then proposed to support semantic information detection and processing, knowledge modeling, and knowledge coordination. On the other hand, efforts have been made to explore SemCom enabling techniques including information representation, data transmission, and reconstruction [5, 17, 18]. In particular, one of these works features the integration of semantic and goal-oriented aspects for 6G networks and KG based techniques for information representation, semantic information exchange measurement, semantic noise, and feedback. That survey also presents the interplay between machine learning and SemCom, identifying their mutual enhancement and cooperation in communication networks. In another, a semantic-aware communication system is discussed from the perspective of data generation/active sampling, information transmission, and signal reconstruction.
To redesign communication networks for SemCom, conventional approaches should be revamped to support new metrics and operations such as semantic metrics, goal-oriented sampling, and semantic-aware data generation, compression, and transmission, as has been suggested in the literature. In view of prior work, existing SemCom frameworks are essentially extensions of Weaver’s classical definitions and do not comprehensively incorporate current advancements in relevant technologies. There is still a lack of a systematic survey article that provides a unified framework of SemCom in the era of machine intelligence; this is precisely the motivation for the current work.
The contributions of this paper can be summarized as follows. First, this paper defines three different areas of SemCom, i.e., H2H SemCom, H2M SemCom, and M2M SemCom, by identifying the involved subjects and objects. The proposed framework can accordingly describe existing technologies, models, and frameworks, providing a comprehensive reference for both researchers and practitioners. Next, with the proposed framework, we summarize current advancements in technologies that are relevant to or beneficial for SemCom, which can help readers interpret their own research in the context of SemCom more easily. Furthermore, we incorporate the KG based SemCom technologies and extend their applications to H2H SemCom, H2M SemCom, and M2M SemCom scenarios. In addition, according to existing 6G visions, potential technologies and use cases that are helpful for SemCom are introduced.
While H2H SemCom is a classic, well-studied area, our discussion focuses on H2M and M2M SemCom as they are new paradigms in the modern era of machine intelligence. Furthermore, we propose a new direction of KG based SemCom that helps accomplish H2H SemCom, H2M SemCom, and M2M SemCom by exploiting the semantic representations of information. An overview of the SemCom techniques and applications covered in this article is provided in Table 1.
The remainder of the paper is organized as follows. In Section 2, we introduce the SemCom principles including semantic and effectiveness encoding, a new layered network architecture, and design approaches. Next, semantic/effectiveness encoding and transmission techniques targeting specific application areas of H2M SemCom and M2M SemCom are presented in the following two sections. Specifically, in Section 3, semantic encoding and H2M SemCom techniques are discussed for the areas of human-machine symbiosis, recommendation, human sensing and care, and VR/AR. Section 4 focuses on M2M SemCom, including effectiveness encoding and SemCom techniques for the areas of distributed learning, split inference, distributed consensus, and machine-vision cameras. Subsequently, we introduce the KG based SemCom approach in Section 5. Finally, in Section 6, a vision of SemCom in the 6G era is proposed.
For ease of reference, we summarize the definitions of the acronyms used in this paper in Table 2.
| Acronym | Definition | Acronym | Definition |
|---|---|---|---|
| 6G | Sixth Generation | LSTM | Long Short-Term Memory |
| AI | Artificial Intelligence | MIMO | Multiple-Input Multiple-Output |
| AirComp | Over-the-Air Computing | ML | Machine Learning |
| AR | Augmented Reality | MLP | Multi-Layer Perceptron |
| BERT | Bidirectional Encoder Representations from Transformers | mMTC | massive Machine Type Communication |
| CDD | Channel Decoded Data | MR | Mixed Reality |
| CNN | Convolutional Neural Network | MSE | Mean Squared Error |
| CRI | Channel Rate Information | NLP | Natural Language Processing |
| CSI | Channel State Information | PAI | Partial Algorithm Information |
| DII | Data Importance Information | PBFT | Practical Byzantine Fault Tolerance |
| DL | Deep Learning | PCA | Principal Component Analysis |
| DNN | Deep Neural Network | QoS | Quality-of-Service |
| DTI | Data Type Information | RNN | Recurrent Neural Network |
| ECG | Electrocardiogram | ROIs | Regions of Interest |
| eMBB | enhanced Mobile Broadband | RRM | Radio-Resource Management |
| FEEL | Federated Edge Learning | SEED | Semantic/Effectiveness Encoded Data |
| FL | Federated Learning | SemCom | Semantic Communication |
| FoV | Field-of-View | SGD | Stochastic Gradient Descent |
| H2H | Human-to-Human | SMCV | Squared Multi-Variate Coefficients of Variation |
| H2M | Human-to-Machine | SplitNet | Split Neural Network |
| IB | Information Bottleneck | SVD | Singular Value Decomposition |
| IC | Integrated Communication and Computing | UAV | Unmanned Aerial Vehicle |
| IoE | Internet-of-Everything | URLLC | Ultra-Reliable Low-Latency Communication |
| ISAC | Integrated Communication and Sensing | XR | Extended Reality |
| KG | Knowledge Graph | SDT | Semantic Difference Transaction |
| LNA | Linear Analog Modulation | MLM | Masked Language Model |
| LSA | Latent Semantic Analysis | | |
2 SemCom Principles: Encoding, Architecture, and Design Approaches
2.1 Encoding for SemCom
In Shannon’s theory, the fundamental problem of communication was described as that of reproducing at one point either exactly or approximately a message selected at another point. The communication-theoretic model of Shannon consists of five parts, as illustrated in Fig. (a) and explained as follows.
An information source produces messages to be transmitted to the receiver.
A transmitter encodes and modulates the messages into a signal for robust transmission over an unreliable channel.
A channel is the medium used to propagate the encoded signal from the transmitter to receiver. In the propagation process, the external, random disturbance to the signal is called channel noise.
A receiver performs demodulation and decoding to reconstruct the transmitted message from the received signal such that errors due to channel distortion are corrected.
A destination is a human being or a machine for whom the message is intended.
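The five parts listed above can be illustrated with a toy simulation. The following is a minimal sketch, assuming a binary source, a repetition code as the transmitter, and a bit-flipping channel; the function names and the `forced_flips` argument are hypothetical and serve only to make the example reproducible.

```python
import random

def transmit(bits, repeat=3):
    """Transmitter: repetition-encode each message bit for robustness."""
    return [b for b in bits for _ in range(repeat)]

def channel(signal, flip_prob=0.0, rng=None, forced_flips=()):
    """Channel: flip each bit with probability flip_prob (channel noise).

    forced_flips lists positions that are always flipped, which keeps
    the demonstration below deterministic.
    """
    rng = rng or random.Random()
    out = []
    for i, b in enumerate(signal):
        flipped = i in forced_flips or rng.random() < flip_prob
        out.append(b ^ 1 if flipped else b)
    return out

def receive(signal, repeat=3):
    """Receiver: majority-vote decoding of each group of repeated bits."""
    return [int(2 * sum(signal[i:i + repeat]) > repeat)
            for i in range(0, len(signal), repeat)]

message = [1, 0, 1, 1]                               # information source
sent = transmit(message)                             # transmitter
received = channel(sent, forced_flips={0, 4, 8})     # one error per codeword
delivered = receive(received)                        # receiver -> destination
```

Here each of the first three codewords suffers a single bit error, which the majority vote corrects, so `delivered` equals `message`. Note that nothing in this pipeline depends on what the bits mean, which is exactly the semantic neutrality discussed in the text.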
Information-theoretic encoding focuses on the statistical properties of messages instead of the content of messages. The transmitted message is one selected from a set of possible messages with a given distribution. Mathematically, information theory simplifies H2H communication to transmission of a finite set of symbols. Nevertheless, in practice, the messages have meaning, relevance, and/or usefulness. To be specific, they refer to or are correlated with certain physical or conceptual entities or are contributing towards the achievement of some goal. This semantic aspect of communication was originally treated in Shannon’s theory as being irrelevant to the engineering problem of information transmission. For example, a tacit assumption in Shannon’s model is that the sender always knows what is relevant for the receiver and the receiver is always interested and ready to receive the data sent by the transmitter.
As mentioned earlier, Weaver proposed a more general communication framework characterized by three levels of problems, namely the technical problem (solved by Shannon’s theory), the semantic problem, and the effectiveness problem. While Weaver’s framework targets H2H communication, we consider modern SemCom in the era of machine intelligence as addressing both the semantic and effectiveness problems. A diagram of the SemCom system is presented in Fig. (b). Accordingly, there are two classes of techniques:
Semantic encoding and transmission: This class of techniques targets scenarios where the destination is a human being (e.g., H2H and M2H SemCom). Their purpose is to convey the meaning of a transmitted message as accurately as possible so that it can be correctly interpreted by a human. Therefore, such techniques are designed to solve the semantic problem in Weaver’s framework.
Effectiveness encoding and transmission: This class of techniques targets scenarios where the destination is a machine (e.g., H2M and M2M). The techniques aim at delivering a message as an instruction or query to the machine such that it can perform what the sender requires or respond appropriately. In this sense, their design focuses on the effectiveness aspect of communication, hence the name.
In the remainder of this sub-section, the principles of semantic and effectiveness encoding are introduced, while application-specific techniques are discussed in the following sections.
2.1.1 Semantic Encoding
Even though Shannon’s theory does not explicitly target SemCom, information-theoretic encoding can be adapted to the latter by extending two key notions, entropy and mutual information, to define semantic entropy and semantic mutual information. The entropy of a discrete source measures the amount of information in each sample and depends on the source’s statistics. Mathematically, the entropy of a message $X$ is defined as $H(X) = -\sum_{i} p(x_i) \log_2 p(x_i)$, where $x_i$ are the possible outcomes of $X$ with probabilities $p(x_i)$. Accordingly, the mutual information is given by $I(X;Y) = H(X) - H(X|Y)$, which indicates how much information about the transmitted message $X$ is obtained after receiving the message $Y$. In other words, the mutual information between the source and destination quantifies the amount of information obtained about the former by observing the latter. The combined use of the two measures allows the study of the maximum data rate under a constraint on “physical distortion” [e.g., mean squared error (MSE)]. The unsuitability of these information-theoretic measures for SemCom, due to their lack of semantic elements, becomes obvious from the following example. A single-letter error causes the transmitted word “big” to be received as “pig”, whereas receiving the word “cattle” when “cow” was transmitted corresponds to errors in multiple letters. The former represents much more reliable information transmission than the latter, but the reverse is true from the perspective of semantic transmission.
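The two information-theoretic measures can be checked numerically. The following is a minimal sketch using the standard definitions $H(X) = -\sum_i p(x_i)\log_2 p(x_i)$ and $I(X;Y) = H(X) + H(Y) - H(X,Y)$ (equivalent to $H(X) - H(X|Y)$); the binary symmetric channel with crossover probability `eps` is an assumed toy example and not a model taken from the text.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]           # marginal of X
    py = [sum(col) for col in zip(*joint)]     # marginal of Y
    pxy = [p for row in joint for p in row]    # flattened joint
    return entropy(px) + entropy(py) - entropy(pxy)

def bsc_joint(eps):
    """Joint law of a uniform bit X and its copy Y flipped with prob. eps."""
    return [[0.5 * (1 - eps), 0.5 * eps],
            [0.5 * eps, 0.5 * (1 - eps)]]

h_fair = entropy([0.5, 0.5])                     # one bit per sample
i_clean = mutual_information(bsc_joint(0.0))     # noiseless channel
i_noisy = mutual_information(bsc_joint(0.1))     # 10% bit flips
i_useless = mutual_information(bsc_joint(0.5))   # output independent of input
```

The mutual information drops from one bit on the noiseless channel to zero when the output is independent of the input. These quantities depend only on the symbol statistics, so they cannot distinguish the "big"/"pig" error from the "cow"/"cattle" substitution in the example above.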
An attempt at defining semantic entropy was made in prior work. Therein, a semantic source is modeled as a tuple $(W_s, K_s, I_s, M_s)$, with $W_s$ modeling the observable world as a set of interpretations, $K_s$ representing the source’s background knowledge, $I_s$ indicating the source’s inference procedure relative to that background knowledge, and $M_s$ denoting the message generator or encoder. Given the probability $\mu(w)$ of each element $w \in W_s$, let $W_x$ denote the subset of $W_s$ in which the message $x$ is justified as “true” by the inference $I_s$, i.e., $W_x = \{w \in W_s : w \models x\}$. The logical probability of $x$ is then defined as $m(x) = \mu(W_x)/\mu(W_s) = \sum_{w \in W_x} \mu(w) / \sum_{w \in W_s} \mu(w)$, and the corresponding semantic entropy is $H_s(x) = -\log_2 m(x)$. These definitions lay a foundation for semantic encoding/decoding and semantic transmission. For instance, recent work argues that the key issue of SemCom is to find a proper semantic interpretation (also termed semantic representation) $W$ and a coding scheme such that the semantic ambiguity $H(W|X)$ of the transmitted message $X$ and the coding redundancy $H(X|W)$ are close to zero. Consequently, the model (semantic) entropy $H(W)$ and the message (syntactic) entropy $H(X)$ are related by $H(X) = H(W) + H(X|W) - H(W|X)$, meaning that the semantic encoder could achieve intentional source compression with an information loss.
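The logical-probability definition of semantic entropy can be made concrete on a toy world model. The following is a minimal sketch, assuming the world consists of all truth assignments to two atomic propositions with made-up probabilities; the names `ATOMS`, `WORLDS`, and `MU` and all numbers are hypothetical.

```python
import math
from itertools import product

# Hypothetical world model: each truth assignment to the atoms is one
# "world", weighted by an assumed probability.
ATOMS = ("rain", "wind")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]
MU = [0.1, 0.2, 0.3, 0.4]   # assumed probabilities, same order as WORLDS

def logical_probability(message):
    """Probability mass of the worlds in which the message is true."""
    true_mass = sum(mu for world, mu in zip(WORLDS, MU) if message(world))
    return true_mass / sum(MU)

def semantic_entropy(message):
    """Semantic entropy in bits: minus log2 of the logical probability."""
    return -math.log2(logical_probability(message))

def rain(world):
    return world["rain"]

def rain_and_wind(world):
    return world["rain"] and world["wind"]

def tautology(world):
    return world["rain"] or not world["rain"]
```

In this sketch, `semantic_entropy(rain_and_wind)` exceeds `semantic_entropy(rain)` because the conjunction is true in fewer worlds, while a tautology is true everywhere and carries zero semantic entropy.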
Departing from the above theoretical abstraction, there are diverse approaches to designing practical semantic encoding. The first approach is KG based semantic encoding, which is decomposed into two stages: 1) finding a proper representation of the common knowledge background of the communicating parties in the form of a KG; 2) encoding data using the KG. A detailed discussion is presented in Section 5. Second, the power of ML gives rise to the learning based approach of integrated semantic and channel (i.e., information-theoretic) encoding. As an example, for text transmission, a joint semantic and channel coding scheme based on deep learning has been proposed, where encoding a sentence $\mathbf{s}$ is represented by $\mathbf{x} = C_{\alpha}(S_{\beta}(\mathbf{s}))$, with $C_{\alpha}(\cdot)$ denoting the channel encoder and $S_{\beta}(\cdot)$ denoting the semantic encoder. It follows that the decoding process is modeled by $\hat{\mathbf{s}} = S_{\chi}^{-1}(C_{\delta}^{-1}(\mathbf{y}))$, with $\mathbf{y}$ denoting the received signal. The encoders and decoders are trained as a single neural network by treating the channel as one layer in the model (similar to SplitNet discussed in the sequel). The training process takes into account both semantic similarity and transmission data rate. Specifically, the sentence similarity between the original sentence $\mathbf{s}$ and the recovered sentence $\hat{\mathbf{s}}$ is given by

$$\mathrm{match}(\hat{\mathbf{s}}, \mathbf{s}) = \frac{B_{\Phi}(\mathbf{s}) \cdot B_{\Phi}(\hat{\mathbf{s}})^{T}}{\|B_{\Phi}(\mathbf{s})\| \, \|B_{\Phi}(\hat{\mathbf{s}})\|}, \qquad (1)$$

where $B_{\Phi}(\cdot)$ represents bidirectional encoder representations from transformers (BERT), a well-known model used for semantic information extraction [138, 139] (more details are presented in Section 3.1.2). Another approach is based on latent semantic analysis (LSA), which compresses text documents by finding their low-dimensional semantic representations. This is achieved by finding a low-dimensional semantic subspace using the singular value decomposition (SVD) of document-term matrices, which indicate the appearances of specific words in the documents, and then projecting these matrices onto the subspace (see more details in Section 3.1.1).
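The LSA idea, and the cosine form of the similarity measure, can be sketched on a toy corpus. The following minimal example assumes a hand-made five-document corpus and a two-dimensional semantic subspace; the corpus, the choice `k = 2`, and the helper names are illustrative assumptions. The cosine is applied here to LSA term vectors rather than BERT embeddings.

```python
import numpy as np

# Hypothetical toy corpus with two topics (farming and radio).
docs = [
    "cow eats grass",
    "cattle eats grass",
    "cow cattle farm",
    "radio channel noise",
    "radio channel signal",
]
vocab = sorted({word for doc in docs for word in doc.split()})

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(term) for doc in docs] for term in vocab],
             dtype=float)

# Truncated SVD: keep the k strongest directions as the semantic subspace.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]      # k-dimensional representation of each term

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def term_similarity(a, b):
    """Similarity of two vocabulary terms in the latent semantic subspace."""
    return cosine(term_vecs[vocab.index(a)], term_vecs[vocab.index(b)])

sim_related = term_similarity("cow", "cattle")
sim_unrelated = term_similarity("cow", "radio")
```

In this sketch, "cow" and "cattle" share no letters beyond the first yet map to nearly the same direction in the latent space because they occur in similar contexts, whereas "cow" and "radio" are close to orthogonal. A learned model such as BERT plays the role of the embedding in Eq. (1); the SVD-based embedding here is only a stand-in.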
2.1.2 Effectiveness Encoding
Effectiveness encoding compresses messages while retaining their effectiveness as instructions and commands for machines. The techniques are system- and application-specific, and thus there exists a wide range of designs. As examples, we discuss effectiveness source encoding targeting two representative tasks: classification and distributed machine learning. More application-specific effectiveness encoding techniques are discussed in Section 4.
Information Bottleneck (IB) for Classification:
Consider source encoding of an information source represented by a random variable $X$. Classic coding schemes based on Shannon’s rate-distortion theory aim at finding a representation $Z$ that is close to $X$ in terms of MSE. In contrast, IB is aware of the computing task (e.g., classification), denoted as $Y$, making it an effectiveness coding scheme. The main feature of IB is to extract from the signal source $X$ as much information as possible that contributes to the effective execution of $Y$. Taking classification as an instance, $Z$ shall represent the most discriminative features of $X$. Mathematically, the IB design aims at finding the optimal tradeoff between the compression ratio and the preserved effectiveness information, corresponding to simultaneously minimizing the mutual information $I(X;Z)$ and maximizing $I(Z;Y)$. This is equivalent to solving the following optimization problem:

$$\min_{p(z|x)} \; I(X;Z) - \beta I(Z;Y), \qquad (2)$$

where the conditional distribution $p(z|x)$ denotes the source encoder and $\beta$ is a combining weight. Its optimal solution is task-specific. For classification, $Y$ is the label predicted by the classifier. A general algorithm constructs the optimal source encoder in (2) via alternating iterations. In each iteration, the probability distributions $p(z|x)$, $p(z)$, and $p(y|z)$ are determined step by step. The IB has attracted attention in the area of machine learning as it contributes much-needed theory for studying deep learning. In particular, training a feature-extraction encoder in a deep neural network (DNN) can be interpreted as solving an IB-like problem where the encoder’s function is to encode an input sample $X$ into a compact feature map $Z$. The encoding operations, e.g., feature compression and filter pruning, regulate the discussed tradeoff in IB.
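The alternating iterations can be sketched for small discrete alphabets. The following is a minimal example of the classical self-consistent IB updates, in which the encoder $p(z|x)$, the marginal $p(z)$, and the decoder $p(y|z)$ are refreshed in turn; the symbols $X$, $Z$, $Y$, $\beta$, the toy joint distribution `p_xy`, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits from a joint probability table p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

def information_bottleneck(p_xy, n_z, beta, n_iter=200, seed=0):
    """Alternating IB updates; returns the encoder p(z|x) as an array [x, z]."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    p_x = p_xy.sum(axis=1)                        # p(x)
    p_y_x = p_xy / p_x[:, None]                   # p(y|x)
    p_z_x = rng.random((len(p_x), n_z))           # random initial encoder
    p_z_x /= p_z_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_z = p_x @ p_z_x                         # p(z)
        # p(y|z) = sum_x p(x, z) p(y|x) / p(z)
        p_y_z = (p_z_x * p_x[:, None]).T @ p_y_x / (p_z[:, None] + eps)
        # KL divergence between p(y|x) and p(y|z) for every pair (x, z)
        log_ratio = np.log((p_y_x[:, None, :] + eps) / (p_y_z[None, :, :] + eps))
        kl = (p_y_x[:, None, :] * log_ratio).sum(axis=2)
        # p(z|x) proportional to p(z) * exp(-beta * KL)
        logits = np.log(p_z + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        p_z_x = np.exp(logits)
        p_z_x /= p_z_x.sum(axis=1, keepdims=True)
    return p_z_x

# Toy task: four source symbols, binary label; symbols 0-1 favor label 0.
p_xy = np.array([[0.225, 0.025],
                 [0.200, 0.050],
                 [0.050, 0.200],
                 [0.025, 0.225]])
encoder = information_bottleneck(p_xy, n_z=2, beta=5.0)
p_x = p_xy.sum(axis=1)
i_xz = mutual_information(encoder * p_x[:, None])   # compression cost I(X;Z)
i_zy = mutual_information(encoder.T @ p_xy)         # preserved relevance I(Z;Y)
i_xy = mutual_information(p_xy)                     # upper bound on I(Z;Y)
```

By the data-processing inequality, `i_zy` can never exceed `i_xy`, and `i_xz` is at most the source entropy of two bits. Varying `beta` moves the encoder along the tradeoff between compression and preserved task-relevant information, although the iterations may settle in a local optimum that depends on the random initialization.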
Stochastic Gradient Quantization: One common method of implementing federated learning (FL), a popular distributed-learning framework based on SGD, requires a device to compute and transmit to a server a stochastic gradient, obtained by taking the derivative of a loss function with respect to the parameters of an AI model under training. A detailed discussion of FL is provided in Section 4.1. A gradient is high-dimensional as its length is equal to the number of model parameters. For instance, the popular ResNet18 model has millions of trainable parameters. As its transmission incurs excessive communication overhead, a gradient should be compressed by quantization at the device. A generic vector quantizer is unsuitable for two reasons. First, its design is based on using the MSE as the distortion metric. This metric is undesirable for the current task since a gradient conveys a descent direction on a (learning) loss function, and the MSE fails to directly reflect the deviation in direction. Second, a conventional vector quantizer can handle only low-dimensional vectors because its complexity grows exponentially with the vector dimension. Tackling the two challenges calls for the design of new effectiveness techniques for source encoding of stochastic gradients. One such design from the literature has two key features. The first is to divide a high-dimensional gradient into many low-dimensional blocks, each of which is quantized using a low-dimensional component quantizer. The results are combined to give the high-dimensional quantized gradient, with combining weights optimized to minimize the descent-direction distortion. The second feature is to design the component quantizer using the method of a Grassmannian manifold. In the current design, the manifold refers to a space of lines where each line (plus the sign of an associated combining weight) suitably represents a particular descent direction. Essentially, the codebook of a Grassmannian quantizer comprises a set of unit-norm vectors that are optimized to minimize the expected directional distortion. The effectiveness source encoder for stochastic gradients, designed to target the task of FL, is shown to achieve close-to-optimal learning performance while substantially reducing the communication overhead with respect to state-of-the-art approaches, such as a binary gradient quantization scheme called signSGD.
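The block-wise, direction-oriented quantization described above can be sketched as follows. This is a simplified illustration and not the cited design: the codebook is drawn at random instead of being optimized on the Grassmannian manifold, and the combining weight of each block is simply the (unquantized) projection coefficient. All function names and parameter values are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def make_codebook(n_codewords, block_dim, rng):
    """Random unit vectors: a stand-in for an optimized line codebook."""
    return [unit([rng.gauss(0.0, 1.0) for _ in range(block_dim)])
            for _ in range(n_codewords)]

def quantize_gradient(grad, codebook, block_dim):
    """Encode each block as (codeword index, signed combining weight)."""
    encoded = []
    for start in range(0, len(grad), block_dim):
        block = grad[start:start + block_dim]
        # Choose the line (codeword up to sign) best aligned with the block.
        idx = max(range(len(codebook)),
                  key=lambda i: abs(dot(block, codebook[i])))
        encoded.append((idx, dot(block, codebook[idx])))
    return encoded

def reconstruct(encoded, codebook):
    """Rebuild the high-dimensional gradient from the block descriptions."""
    grad_hat = []
    for idx, weight in encoded:
        grad_hat.extend(weight * c for c in codebook[idx])
    return grad_hat

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

rng = random.Random(0)
block_dim = 4
grad = [rng.gauss(0.0, 1.0) for _ in range(64)]     # stand-in gradient
codebook = make_codebook(16, block_dim, rng)
encoded = quantize_gradient(grad, codebook, block_dim)
grad_hat = reconstruct(encoded, codebook)
alignment = cosine(grad, grad_hat)                  # directional fidelity
```

The quality measure here is the cosine between the true and the reconstructed gradient rather than the MSE, reflecting the point that what matters for learning is the descent direction. Because each weight is the projection of its block onto the chosen codeword, the reconstructed gradient always has a non-negative inner product with the true gradient.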
2.2 SemCom Architecture and Design Approaches
The protocol stack of a radio access network has been modified in prior work to support SemCom. Its key feature is the addition of an SE plane that interacts with and controls all layers to provide efficient solutions for both the semantic and effectiveness problems in Weaver’s framework. In this subsection, we propose a new, simpler SemCom architecture as shown in Fig. 2. It builds on the conventional protocol stack but adds a Semantic Layer that resides in the Application Layer as a sub-layer. This allows the Semantic Layer to interface with sensors and actuators and to access the algorithms and data content of the specific application. The main function of the Semantic Layer is to perform the semantic/effectiveness encoding/decoding discussed in the preceding subsection. On the other hand, the techniques for the radio access layers (i.e., Physical Layer, Medium Access Layer, and Logical Link Control Layer) are largely derivatives of Shannon’s information theory; their design is focused on improving semantic-agnostic performance such as data rate, reliability, and latency. The Semantic Layer then transmits Semantic/Effectiveness Encoded Data (SEED) to the lower layers and receives Channel Decoded Data (CDD) from them. Based on the proposed architecture, we describe two approaches to SemCom system design: the layer-coupling approach and the split neural network (SplitNet) approach.
2.2.1 Layer-coupling Approach
The first approach, called layer-coupling approach, is to jointly design the Semantic Layer and radio access layers. To this end, we propose the possibility of exchanging control signals between them (see Fig. 2). Among others, a set of basic signals are defined and their functions in layer-coupling design described as follows.
Channel Rate Information (CRI): The information fed back from lower layers enables the semantic/effectiveness (SE) encoders to adapt their coding rates to the wireless channel state.
Data Importance Information (DII): It measures the heterogeneous importance levels of the elements of the SE encoded data. Importance is judged by interpretation for a human receiver and by the effective execution of a task for a machine receiver. Examples include identifying the keywords that best represent the semantic meaning of a sentence, or the discriminant gains of different features of an image for the purpose of classification. Such information facilitates adaptive transmission, multi-access, and resource allocation in the lower layers. For instance, for data uploading to a server for model training, the uncertainty levels of local samples can be used as DII to schedule devices.
Partial Algorithm Information (PAI): It includes essential characteristics of the current algorithms, such as information related to the AI architecture or the target function in distributed computing (e.g., average or maximum). PAI enables the physical layer to deploy effectiveness transmission techniques, such as AirComp discussed in the preceding sub-section.
Data Type Information (DTI): It indicates which category the data belongs to. This includes, for example, images for machine recognition, images for human viewing, and stochastic gradients for machine learning. DTI enables the radio access layers to choose transmission techniques based on a suitable performance metric (e.g., Grassmannian quantization for gradient source encoding discussed in the last sub-section) and to understand the corresponding performance requirements (e.g., an image is more sensitive to noise for human vision than for pattern recognition).
The above control signals can be transmitted over a control channel to the receiver and used by its semantic layer to remove semantic noise from CDD for semantic symbol error correction or to control computing at the Application Layer. The relevance of the above control signals to the techniques discussed in the sequel is summarized in Table 3.
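To make the role of the control signals more concrete, the following minimal sketch groups them into one container and uses DII to decide which encoded elements to send under a limited budget. The class name, field names, and the selection rule are illustrative assumptions and are not defined in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControlSignals:
    """Hypothetical container for the four control signals described above."""
    cri_bits_per_use: Optional[float] = None        # CRI: rate fed back by lower layers
    dii: List[float] = field(default_factory=list)  # DII: one weight per encoded element
    pai: Optional[str] = None                       # PAI: e.g. "average" or "maximum"
    dti: Optional[str] = None                       # DTI: e.g. "image/machine"


def select_by_importance(elements, signals, budget):
    """Keep the `budget` most important elements (by DII), preserving their order."""
    ranked = sorted(range(len(elements)), key=lambda i: signals.dii[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [elements[i] for i in keep]


# Made-up importance weights for a four-word message.
signals = ControlSignals(dii=[0.1, 0.9, 0.4, 0.7], dti="text/machine")
words = ["the", "fire", "is", "spreading"]
print(select_by_importance(words, signals, budget=2))
```

In this toy setting the lower layers would transmit only the two elements with the largest DII, which is one simple form of importance-aware adaptive transmission.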
2.2.2 SplitNet Approach
An extreme form of layer-coupling design is to integrate the Semantic Layer and the Physical Layer into a single end-to-end global DNN. The global DNN is split into two parts, namely an encoder and a decoder, and the communication channel is sandwiched between them. This is termed SplitNet and its architecture is shown in Fig. (b). The encoder model (decoder model) is further divided into two sub-modules, a semantic encoder (decoder) and a channel encoder (decoder), each of which is itself a neural network [143, 144]. This facilitates training in practice (see more details in Section 4.2 about split inference). Note that the new channel encoder (decoder) in the SplitNet replaces the source and channel encoders (decoders) in the conventional digital architecture in Fig. (a). The new encoder directly transmits analog modulated symbols instead of quantizing them into bits and mapping them to predefined modulation symbols.
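The SplitNet signal path described above can be sketched as a forward pass through four sub-networks with a noisy channel in the middle. This is only a structural illustration: the layers use random, untrained weights, the channel is assumed to be real-valued AWGN, and all dimensions and function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Random (untrained) weights standing in for a trained layer."""
    return rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)


def layer(x, weights):
    return np.tanh(x @ weights)


def normalize_power(s):
    """Scale each row so that the average symbol power equals 1."""
    return s * np.sqrt(s.shape[-1]) / np.linalg.norm(s, axis=-1, keepdims=True)


def awgn(s, snr_db):
    """Add Gaussian noise for unit-power symbols at the given SNR."""
    noise_std = np.sqrt(10 ** (-snr_db / 10))
    return s + noise_std * rng.standard_normal(s.shape)


W_sem_enc, W_ch_enc = dense(32, 16), dense(16, 8)
W_ch_dec, W_sem_dec = dense(8, 16), dense(16, 32)


def splitnet_forward(x, snr_db=10.0):
    z = layer(x, W_sem_enc)                     # semantic encoder (transmitter)
    s = normalize_power(layer(z, W_ch_enc))     # channel encoder -> analog symbols
    y = awgn(s, snr_db)                         # channel between the two halves
    z_hat = layer(y, W_ch_dec)                  # channel decoder (receiver)
    return s, layer(z_hat, W_sem_dec)           # semantic decoder


x = rng.standard_normal((4, 32))                # 4 source vectors of dimension 32
symbols, x_hat = splitnet_forward(x)
print(symbols.shape, x_hat.shape)
```

In an actual design all four weight matrices would be trained jointly through the channel, which is where the stochastic perturbation of back-propagation discussed later arises.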
SplitNet is closely related to the area of joint source-channel coding. The optimality of source-channel separation was proved by Shannon for the case of a point-to-point link with asymptotically large code blocklength. This simplifies the design of a communication system, as the source encoder/decoder and channel encoder/decoder can be optimized as separate modules. This has become a feature of classic design approaches and led to the establishment of source and channel coding as separate sub-disciplines. However, source-channel separation is sub-optimal in the regime of finite code length. It is worth mentioning that the sub-optimality has also been shown in the context of SemCom. In practice, given a finite bit-length budget (e.g., short packet transmission), the end-to-end signal distortion, or the reconstruction quality of transmitted information, sees a complex tradeoff between source and channel decoding errors. This has motivated researchers to explore the approach of joint source-channel coding with finite code lengths [147, 148, 149]. The joint design has been shown to be simpler and potentially more effective than its separation counterpart in practical SemCom applications, such as transmission of multimedia content [150, 151, 152, 153, 154, 89], speech, and text [138, 155]. In particular, the notion of deep joint source-channel coding has appeared in [151, 152, 153, 89, 144], where both the source encoder (decoder) and channel encoder (decoder) are implemented by DNNs. For example, the image retrieval problem in the context of wireless transmission for remote inference has been considered. In the corresponding joint source-channel coding approach, the feature vectors are mapped to channel symbols and decoded at the receiver, where the source and channel encoders are integrated into a DNN placed after the feature encoder, while the source and channel decoders are consolidated into a DNN followed by a fully-connected classifier.
Most recently, SplitNet was also adopted in the end-to-end design of SemCom systems. For example, one SplitNet design for a SemCom system is built on deep-learning based natural language processing (NLP). The key component of the design uses a Transformer, which is a well-known language model for NLP and has the advantages of both recurrent neural networks (RNNs) and convolutional neural networks (CNNs), to construct the encoder and decoder. The loss function for training the DNN model comprises two terms: one is the cross entropy, which measures the semantic difference between raw and decoded signals, while the other is the mutual information, which is included to maximize the system capacity. The SplitNet design was demonstrated to outperform a traditional communication system in terms of sentence similarity, which is specified in (1), and robustness against channel variation. In addition, there are other relevant works on SemCom using the SplitNet approach, e.g., a distributed SemCom system for IoT and a SemCom system for speech transmission (see more details in Section IV-C).
2.2.3 Comparison between Two Approaches
The advantages of layer-coupling designs include backward compatibility, simplicity, and flexibility. Since the approach is based on a modified version of the conventional protocol stack, a SemCom system designed using the approach allows the use of existing coding and communication techniques if they are suitably modified to allow some control by the Semantic Layer. Furthermore, by modularizing a SemCom system, individual modules are simpler to design compared with the fully integrated SplitNet. Moreover, given standardized interfaces between modules, their design can be distributed to different parties. Inevitably, the advantages of the layer-coupling approach come at the expense of optimized performance and end-to-end efficiency. Hence, in terms of performance, SplitNet is a better choice.
A SplitNet design of a SemCom system is dedicated to a particular application, somewhat losing the universality of the layered approach. Given a specific task and a radio-propagation environment, the encoder and decoder parts of the neural network are jointly trained to efficiently compress raw data into transmitted symbols while ensuring their robustness against channel fading and noise. This makes it possible to achieve a higher communication efficiency and better task performance than a layer-coupling design. Nevertheless, SplitNet faces its own limitations in three aspects. First, channel fading and noise result in stochastic perturbation to both the forward-propagation and back-propagation of the DNN. This may result in slow model convergence during training. The issue can be alleviated by feedback of channel state information (CSI) to mitigate fading, at the cost of additional latency and overhead. Second, the radio-propagation environment varies over time and sites, especially for high-mobility applications. A pre-trained end-to-end SplitNet model tends to be ineffective in a new environment and re-training is needed, which is time-consuming and may incur excessive communication overhead. Third, analog channel symbols generated by a neural network can be harder to implement in circuits than conventional modulation constellations due to, for example, a larger dynamic range. Nevertheless, research on the SplitNet approach is still in its nascent stage and continued research advancements are expected to yield effective solutions for overcoming the above limitations.
3 Human-to-Machine Semantic Communications
Recall that H2M SemCom features the transmission of messages that can be understood not only by humans but also by machines, such that they can have dialogue or the latter can assist or care for the former. The potential applications of H2M SemCom are illustrated in Fig. 4. In this section, we discuss semantic encoding and other H2M SemCom techniques in four representative areas: human-machine symbiosis, recommendation, human sensing and care, and VR/AR.
3.1 SemCom for Human-Machine Symbiosis
Human-machine symbiosis (also known as man-computer symbiosis) refers to scenarios in which humans and machines establish a complementary and cooperative relationship. On one hand, using their complementary strengths, they can cooperate to carry out a task that is otherwise difficult or even infeasible. Humans can benefit from machines’ assistance to improve their life quality or productivity. On the other hand, humans and machines can teach each other to improve individual capabilities, e.g., AI-powered education or imitation learning. In this sub-section, we discuss SemCom in the scenarios of human-machine symbiosis. A typical system is illustrated in Fig. 5 and its main operations are described as follows. First, human activities are sensed and the sensing results are semantically encoded at edge devices. Then the encoded data are transmitted to an edge server for decoding and subsequent use to train an AI model. Finally, the trained AI model acquires some domain-specific, human-like abilities, which are further used to assist humans. The distinction of human-machine symbiosis lies in the semantic encoding techniques that map human sensing data or knowledge into low-dimensional vectors while capturing their semantic meanings or latent features. For example, the encoded semantic data refers to the embedded knowledge in the case of text, or the boundaries between objects and the background in the case of images [157, 116, 117, 118]. In the remainder of the sub-section, we introduce two representative semantic encoding techniques widely used in human-machine symbiosis, linear LSA models [158, 59, 159] and BERT [160, 144, 138], and discuss their deployment in SemCom systems. Additional techniques are also briefly described, followed by an overview of state-of-the-art applications.
3.1.1 Semantic Encoding by LSA
As a technique in natural language processing, LSA is used to model and extract semantic information from text documents. LSA has diverse applications, ranging from search engines to translation to the study of human memory. A basic LSA technique follows the procedure below. Consider a set of documents. To begin with, each document is expressed as a column vector in which each element is binary, indicating whether the document includes the specific word or term associated with the element’s location. Putting the column vectors together turns the document set into a so-called document-term matrix, denoted as $\mathbf{X}$. From the matrix, semantic information can be extracted via the following steps. First, the principal column subspace of the document-term matrix, called the semantic space, is computed by SVD: $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}$, where $\mathbf{U}$ is the column space of $\mathbf{X}$, $\boldsymbol{\Sigma}$ is the singular value matrix, and $\mathbf{V}$ is the row space. Given the desired number of dimensions $k$, the semantic space is defined as $\mathbf{U}_k$, which is the $k$-dimensional principal column subspace of $\mathbf{X}$. The corresponding $k$-dimensional principal singular-value matrix is denoted as $\boldsymbol{\Sigma}_k$. Second, all document vectors are projected onto the low-dimensional semantic space. Specifically, let $\mathbf{x}_i$ represent the $i$-th column of $\mathbf{X}$, and thus the $i$-th document. Then the extracted semantic vector, denoted as $\hat{\mathbf{x}}_i$, is obtained as $\hat{\mathbf{x}}_i = \boldsymbol{\Sigma}_k^{-1}\mathbf{U}_k^{T}\mathbf{x}_i$. Last, using the reduced-dimension semantic vectors, the similarity level between any two high-dimensional documents, say the $i$-th and $j$-th documents, is measured by efficiently computing the following function:
$$\mathrm{sim}(\mathbf{x}_i, \mathbf{x}_j) = \frac{\hat{\mathbf{x}}_i^{T}\hat{\mathbf{x}}_j}{\|\hat{\mathbf{x}}_i\|\,\|\hat{\mathbf{x}}_j\|}.$$
In the context of SemCom between a human and a machine, the function of LSA is to extract semantic information from human speech or messages and in that way aid the machine’s interpretation. Its deployment in a SemCom system essentially involves the design of an LSA-based semantic encoder and substituting the result into either the layer-coupling architecture (see Fig. 2) or the SplitNet architecture (see Fig. (b)). As an example, one proposed design is based on SplitNet and features integrated semantic/channel coding/decoding implemented by training a split DNN model with the transmitter half performing LSA. In contrast, for a design based on the layer-coupling architecture, LSA resides in the semantic layer to map each human message into the semantic space. The distilled semantic information in a dimension associated with a larger singular value is more important. This suggests the use of the singular values as DII indicators. Then the LSA-encoded information, together with the DII indicators, is passed to the lower layers for transmission. In the upward direction, the CRI feedback enables the semantic layer to adapt the dimensionality of the semantic space to the channel state. Consequently, when the channel supports a high rate, the semantic space can be expanded to yield a better representation of the human message and thus a more accurate understanding by the machine at the receiving end.
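The LSA procedure and its layer-coupling deployment can be sketched numerically as follows. The sketch assumes the common LSA convention of scaling the projected vectors by the inverse singular values and of using cosine similarity; the rule mapping a rate budget (CRI) to the number of dimensions, the 8 bits per coefficient, and the toy matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def lsa_encode(X, k):
    """Project the columns (documents) of X onto a k-dimensional semantic space."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    Z = (U_k.T @ X) / s_k[:, None]   # assumption: scale by inverse singular values
    return s_k, Z


def cosine_similarity(Z, i, j):
    zi, zj = Z[:, i], Z[:, j]
    return float(zi @ zj / (np.linalg.norm(zi) * np.linalg.norm(zj)))


def choose_dimension(max_dim, rate_bits, bits_per_coeff=8):
    """CRI-driven choice of k: as many coefficients as the rate budget allows."""
    return max(1, min(max_dim, int(rate_bits // bits_per_coeff)))


def dii_weights(s_k):
    """Normalised singular values used as importance (DII) indicators."""
    return s_k / s_k.sum()


# Toy binary matrix (made-up data): rows are terms, columns are documents.
X = np.array([
    [1, 1, 0, 0],   # "channel"
    [1, 1, 0, 0],   # "fading"
    [0, 1, 0, 0],   # "noise"
    [0, 0, 1, 1],   # "movie"
    [0, 0, 1, 1],   # "rating"
    [0, 0, 1, 0],   # "genre"
    [0, 0, 0, 1],   # "actor"
], dtype=float)

k = choose_dimension(min(X.shape), rate_bits=16)   # low-rate channel
s_k, Z = lsa_encode(X, k)
print("k =", k, "DII =", dii_weights(s_k))
print(cosine_similarity(Z, 0, 1), cosine_similarity(Z, 0, 2))
```

With a larger rate budget, `choose_dimension` returns a larger k, which corresponds to expanding the semantic space when the channel supports a higher rate.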
3.1.2 Semantic Encoding by BERT
BERT is a well-known language processing approach based on a popular model called the transformer, which is the first transduction model relying entirely on self-attention to compute representations of its input instead of using sequence-aligned RNNs or convolution to generate its output. A transformer comprises an encoding component and a decoding component. Each includes several sequentially connected encoders or decoders. An encoder cascades one self-attention layer with a feed-forward neural network. The former performs feature extraction to find the relation of words in the input sentence; the latter is trained with a suitable objective, such as language translation. A decoder has a structure similar to that of the encoder except for having an extra encoder-decoder attention layer inserted between the self-attention layer and the feed-forward neural network. The additional layer helps the decoder focus on a specific position in the input sequence to handle issues such as the case of one word having multiple meanings.
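As an illustration of the self-attention layer mentioned above, the following sketch implements single-head scaled dot-product attention on random inputs. The projection matrices are random rather than trained, and the dimensions are arbitrary assumptions; multi-head attention, masking, and the feed-forward network are omitted.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows (tokens) of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance between tokens
    weights = softmax(scores, axis=-1)        # each row sums to one
    return weights @ V, weights


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))              # 5 tokens with 16-dim embeddings
W_q, W_k, W_v = (rng.standard_normal((16, 8)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.sum(axis=-1))
```

Each output row is a weighted mixture of all value vectors, which is how the layer relates every word to every other word in the sentence.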
Building on the transformer architecture, the key feature of BERT is a new training strategy, termed masked language model (MLM), that randomly masks some words in a sentence to generate training samples. The training objective is to predict the masked words in the sentence. This training strategy and objective make it possible to generate an enormous number of training samples from unlabelled text. Another key feature of BERT is that a text sentence is input into the transformer as a whole rather than word by word in the natural left-to-right direction. These features endow the trained transformer with the ability to predict missing words based on their context on both sides, giving the technique the name of bidirectional representations. With this ability, BERT outperforms the uni-directional approaches to become the state-of-the-art strategy for natural language processing. The combination with other techniques (e.g., classification) broadens the applications of BERT, e.g., to Q&A and information retrieval.
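The way MLM turns unlabelled text into training samples can be sketched in a few lines. This is a simplified illustration: every selected token is replaced by a mask symbol, whereas practical BERT training uses a more elaborate replacement rule. The function name, the mask symbol string, and the default masking probability are assumptions for the example.

```python
import random

MASK = "[MASK]"


def make_mlm_sample(tokens, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels); labels hold the original word at masked positions."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)     # hide the word from the model
            labels.append(tok)      # the model is trained to recover it
        else:
            masked.append(tok)
            labels.append(None)     # no loss on unmasked positions
    return masked, labels


tokens = "the wireless channel is noisy today".split()
masked, labels = make_mlm_sample(tokens, mask_prob=0.5, rng=random.Random(0))
print(masked)
print(labels)
```

Because the labels come from the sentence itself, any text corpus yields training samples without manual annotation.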
The procedure of deploying BERT in a SemCom system to support human-machine symbiosis is similar to that for LSA. In other words, BERT can be simply used as the semantic encoder. For the SplitNet design, BERT is used as a part of the split DNN. For the layer-coupling approach, BERT is used for extracting the important information from the sensed human activities, with DII indicators showing their importance levels. The exchange of data and control signals between the semantic and lower layers is similar to that for LSA.
3.1.3 Additional Semantic Encoding Techniques
Other techniques that can play an important role in semantic encoding for human-machine symbiosis are the CNN-based approaches of object recognition [163, 164, 165]. Relevant techniques can efficiently compress human sensing data (e.g., facial expressions and behaviours) for efficient transmission to machines for subsequent recognition. A typical object recognition technique detects the boundary between the targeted objects and the background based on contextual features. A representative design of a CNN-based auto-encoder (AE) for segmentation, termed SegNet, comprises an encoding component, a decoding component, and a classifier. The encoding component contains several encoders, each of which is paired with a decoder in the decoding component. Each encoder is a CNN, modified from the well-known VGG-16 network. Each decoder up-samples its input using the pooling indices transferred from its corresponding encoder to produce sparse feature maps. Finally, the output feature maps of the last decoder are fed to a softmax classifier for pixel-wise classification. This generates the segmentation results.
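The index-based up-sampling used by the decoders can be illustrated on a single feature map. The sketch below performs non-overlapping max pooling while recording where each maximum came from, and then places the pooled values back at those positions, giving a sparse map. It covers only this one mechanism, not the convolutional layers of an actual encoder or decoder, and the function names are illustrative.

```python
import numpy as np


def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling that also returns the argmax inside each window."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size).transpose(0, 2, 1, 3)
    flat = blocks.reshape(h // size, w // size, size * size)
    return flat.max(axis=-1), flat.argmax(axis=-1)


def max_unpool(pooled, idx, size=2):
    """Put each pooled value back at its recorded position; all other entries are zero."""
    ph, pw = pooled.shape
    flat = np.zeros((ph, pw, size * size), dtype=pooled.dtype)
    np.put_along_axis(flat, idx[..., None], pooled[..., None], axis=-1)
    blocks = flat.reshape(ph, pw, size, size).transpose(0, 2, 1, 3)
    return blocks.reshape(ph * size, pw * size)


x = np.array([
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 7, 0],
    [6, 0, 0, 0],
], dtype=float)

pooled, idx = max_pool_with_indices(x)
sparse = max_unpool(pooled, idx)
print(pooled)
print(sparse)
```

Only the indices need to be passed from encoder to decoder, which is far cheaper than transferring full feature maps.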
3.1.4 Choice of the Connectivity Type
In 5G systems, there exist three generic connectivity types: enhanced Mobile Broadband (eMBB), Ultra-Reliable Low-Latency Communication (URLLC), and massive Machine Type Communication (mMTC). They are defined to support a wide range of services with heterogeneous quality-of-service (QoS) requirements. The choice of connectivity type for human-machine symbiosis is application dependent. Many relevant applications do not require high transmission rates, ultra-high reliability, or massive connections. Examples include AI-assisted learning, coding, and debugging. For such applications, normal radio access suffices. On the other hand, there exists a class of symbiosis applications that involve tactile interaction, thereby requiring low latency and reliable transmission. For instance, AI-assisted driving and remote surgery require the response latency between robots and humans (i.e., drivers or surgeons) to be less than ms and ms, respectively. For this class of applications, the provisioning of URLLC connectivity is crucial.
3.1.5 State-of-the-Art Applications
Applications related to human-machine symbiosis can be separated into three main classes. The first class of applications is AI-assisted systems. AI technologies provide ways for machines to acquire human-like skills and abilities by learning from the experiences of human experts (e.g., doctors and drivers), which, in turn, makes the machines useful assistants for humans. Using LSA, AI-powered chatbots have started to replace humans in FAQs/customer services in places such as universities and over social media. Machines can also play an important role in AI-assisted healthcare by using machine learning algorithms to extract key information from patients’ records to help doctors with diagnosis and prediction of the risks of diseases. Machine assistance has also been applied to other professional areas, such as video game debugging, automatic programming, driving assistance, and second language learning and teaching.
The second class of applications is interactive machine learning that includes humans in the loop to leverage the generalized problem-solving abilities of human minds [26, 27, 28, 29, 30]. This is particularly useful in cases lacking the training samples for rare events that are needed for automatic machine learning to work. Moreover, machines and humans can combine their complementary strengths to tackle grand challenges such as protein folding and $k$-anonymization of health data. In such collaboration, human experts use their experience to guide machines to reduce the search space.
The third class of applications is worker-AI collaboration where both human and machine workers cooperate as peers to finish real-time tasks (e.g., moderating content and data deduplication [31, 32]). In particular, relying on tactile communication, robots can imitate the actions of remote surgeons in minimally invasive surgery [33, 34]. Such cooperative surgeries can benefit from the machine involvement to improve the accuracy and dexterity of a surgeon and minimize traumas induced on patients. One important design issue for worker-AI collaboration is to prevent machines from telling lies or making mistakes.
3.2 SemCom for Recommendation
A recommendation system predicts user preferences in terms of ratings of a set of items such as songs, movies, and products. Recommendation has become a popular tool for making machines intelligent assistants and improving user experience. Examples include playlist generation for multimedia streaming services, product recommendation for online shopping, marketing on social media, Internet search, and online dating [36, 38, 39, 61]. A SemCom system aims at supporting recommendation in a wireless network, as shown in Fig. 4. In the system, an edge device semantically encodes and transmits the user’s personal data to an edge server for generating recommendations for the user. The purpose of semantic encoding is to infer user ratings from the user data. Given a rating database of a large number of users, the server generates recommendations for a target user using a filtering technique. Among the most popular is collaborative filtering, discussed in the sequel. We will also introduce other techniques including content-based, demographic, and hybrid filtering. The choice of connectivity type for SemCom systems to support recommendation will also be discussed, followed by an overview of the state-of-the-art applications of recommendation systems.
3.2.1 Collaborative Filtering
Collaborative filtering finds users with similar preferences using their historical ratings. The purpose of semantic encoding is then to distill the rating information from the historical data recording a user’s daily activities such as shopping, entertainment, reading, and multimedia streaming. This removes redundant information to compress the data for efficient transmission. The design of the semantic encoder can be based on LSA described in Section III-A by modifying the document-item matrix to count the user’s access/purchase frequencies of different items. In the case where such explicit information is unavailable, an AI model can be trained to infer the user’s preferences from sensing data recording their behaviour and emotions either in the physical world or on social-media platforms.
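Next, after receiving rating data from multiple users, the server compiles them into a user-item matrix. Each column, storing the ratings by one specific user, is called an item vector. Let $\mathbf{r}_u$ denote the $u$-th item vector, with its $i$-th element $r_{u,i}$ representing the rating of the $i$-th item in a set of interest. Moreover, $\bar{r}_u$ denotes the average rating of all items of user $u$. Using the Pearson correlation as a metric, the similarity in preference between users $u$ and $v$ can be computed as
$$\mathrm{sim}(u, v) = \frac{\sum_{i \in \mathcal{I}_{u,v}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in \mathcal{I}_{u,v}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in \mathcal{I}_{u,v}} (r_{v,i} - \bar{r}_v)^2}},$$
where $\mathcal{I}_{u,v}$ represents the set of items rated by both users $u$ and $v$. The pairwise similarity measures allow the server to recommend items preferred by some users to others sharing similar interests.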
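A Pearson-correlation similarity between two users can be sketched as below. Ratings are stored as dictionaries so that unrated items are simply absent; following the description above, each user's mean is taken over all items that the user rated, while the sums run over co-rated items only. The function name, the fallback value of zero for degenerate cases, and the sample ratings are assumptions for the example.

```python
from math import sqrt


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users; each argument maps item -> rating."""
    common = ratings_u.keys() & ratings_v.keys()      # items rated by both users
    if not common:
        return 0.0                                    # assumption: no overlap -> 0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0                                    # assumption: flat ratings -> 0
    return num / (den_u * den_v)


# Made-up ratings on a 1-to-5 scale.
alice = {"song_a": 5, "song_b": 3, "song_c": 1}
bob = {"song_a": 4, "song_b": 3, "song_c": 2}
carol = {"song_a": 1, "song_b": 3, "song_c": 5}

print(pearson_similarity(alice, bob), pearson_similarity(alice, carol))
```

Here the first pair has aligned tastes and the second pair opposite tastes, so the server would recommend to one user the items liked by the user with the higher similarity.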
However, as social-media applications are fast growing in number and type, the user-item matrices become increasingly sparse. The insufficient rating data causes difficulty in clustering similar users. Researchers have developed solutions for this problem by applying techniques from data mining and machine learning, including SVD, non-negative matrix factorization, clustering, and probability matrix factorization.
As in the scenario of human-machine symbiosis, SemCom systems for recommendation can be based on either the layer-coupling or SplitNet architectures. When explicit rating information is available at a device, its uploading is infrequent and may even require only a single upload, as user preferences usually do not change rapidly over time. On the other hand, when such explicit information is unavailable, a large amount of user sensing data may need to be transmitted from the device to the server for preference inference. One way to address this issue is to design semantic encoders that can locate a low-dimensional item-rating subspace without compromising the recommendation accuracy. The other way is to utilize high-rate access (eMBB) whenever it is available or to deploy a targeted high-rate technology, such as mmWave.
3.2.2 Other Filtering Techniques
Other available filtering techniques for recommendation include content-based filtering, demographic filtering, and hybrid filtering. The content-based approach utilizes the users’ historical data for recommendation. Specifically, the recordings of, e.g., habits or interests, are useful for creating a user profile characterized by a set of features. Then, an item aligned with the features of a profile is likely to interest the associated user and thus can be recommended. Next, the demographic filtering approach classifies users according to their demographic information such as nationality, age, and gender. Items preferred by one user are recommended to the users in the same demographic class. Last, the hybrid filtering approach combines several of the aforementioned approaches and has been shown to boost the recommendation accuracy.
3.2.3 State-of-the-Art Applications
Recommendation systems are deployed in many areas. The most popular venue is social networks, where recommendation is applied to emotional health monitoring by detecting abnormality, partner recommendation in online dating, and emoji usage suggestions. Other applications include travel recommendation systems for mobile tourists, remote healthcare (e.g., cloud-assisted drug prescription and cloud-based mobile health information), TV channel recommendation, video recommendation, and music recommendation. Traditionally, to offload the high computation load, recommendation systems are hosted in the cloud server with unlimited computation resources [42, 43, 176]. Nevertheless, the traditional approach can lead to excessive communication latency and overhead, as the personal data to upload is known to grow at an exponential rate. Recent years have seen the increasing popularity of the split-computing approach that spreads a recommendation system across the cloud and the network edge, leveraging the edge computing platform. In addition, researchers have proposed unmanned aerial vehicle (UAV) assisted recommendation systems for location-based social networks as well as distributed recommendation systems featuring data privacy [62, 63].
3.3 SemCom for Human Sensing and Care
Human sensing-and-care refers to real-time tracking and monitoring of humans’ health conditions and movements by machines, such that the machines can offer proper care to the humans. The human monitoring relies on sensors (e.g., temperature and positioning) on or around humans. The sensing data are then transmitted to a server for analysis and decision making. To facilitate the discussion, let us consider the concrete example of biomedical sensors, which are wearable or implantable and perform transduction of biomedical signals (e.g., electrocardiogram (ECG) signals) into electric signals. In this context, the purpose of designing a SemCom system is to extract useful features from biomedical sensing data and transmit them to a server for diagnostics or medical image analysis. An example of a feature is the “main-spike” interval of ECG signals, termed the QRS interval. In the sequel, we discuss the techniques for biomedical semantic encoding and the associated SemCom system design.
3.3.1 Biomedical Semantic Encoding
Typical biomedical signals include ECG signals for detecting heart activity and electromyography (EMG) signals for detecting, e.g., skeletal activity. Such signals are characterized by a certain level of periodicity and predictability, making it possible to estimate the signal statistics within a short time frame. In the current context, semantic coding is particularized to techniques for estimating the statistics that contain useful information about the biological activities of a human. Relevant techniques are based on either time-domain or frequency-domain approaches. As an example, consider the R-peak detection of an ECG signal, where R refers to a point corresponding to a peak of the ECG wave. The detection of the R-peak helps heart-rate characterization. A more elaborate analysis of an ECG wave decomposes a main spike into three successive upward/downward deflections, termed the Q wave, R wave, and S wave. A time-domain method for their detection mainly uses the shape characteristics, such as finding the largest first-order and second-order derivatives. On the other hand, a frequency-domain method first transforms the signal into the frequency domain using, e.g., wavelet transformation, and then applies filtering with a suitable passband to extract the desired information.
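A very simple time-domain R-peak detector can be sketched as follows. It is a simplified stand-in for the derivative-based methods mentioned above: a sample is declared an R-peak when the first-order difference changes sign there and the amplitude exceeds a threshold, with a refractory gap to avoid double detection. The threshold, the refractory period, the sampling rate, and the synthetic spike train are assumptions made for the example; real ECG traces would need filtering first.

```python
def detect_r_peaks(signal, fs, threshold, refractory_s=0.2):
    """Indices of local maxima above `threshold`, at least `refractory_s` apart."""
    min_gap = int(refractory_s * fs)
    peaks = []
    for n in range(1, len(signal) - 1):
        rising = signal[n] - signal[n - 1] > 0       # positive slope before the sample
        falling = signal[n + 1] - signal[n] <= 0     # non-positive slope after it
        if rising and falling and signal[n] > threshold:
            if not peaks or n - peaks[-1] >= min_gap:
                peaks.append(n)
    return peaks


def heart_rate_bpm(peaks, fs):
    """Average heart rate from the mean interval between successive R-peaks."""
    if len(peaks) < 2:
        raise ValueError("need at least two peaks")
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


# Synthetic trace: flat baseline with a narrow spike every 0.8 s, sampled at 250 Hz.
fs = 250
signal = [0.0] * (3 * fs)
for beat in (100, 300, 500, 700):
    signal[beat - 1], signal[beat], signal[beat + 1] = 0.4, 1.0, 0.4

peaks = detect_r_peaks(signal, fs, threshold=0.5)
print(peaks, heart_rate_bpm(peaks, fs))
```

Transmitting only the peak positions, or just the resulting heart rate, instead of the raw waveform is one instance of the statistics-oriented semantic coding described above.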
In the SemCom system for human sensing-and-care, the biomedical signals are semantically encoded and transmitted by a small-size edge device. Upon detecting abnormality, the transmissions from this device should be real-time and very reliable, so as to call for urgent medical care. In view of these requirements, it is preferable to design the SemCom system on the layer-coupling architecture rather than a DNN model. This is because the data to be processed are not complex, so a complex DNN model may be overkill and its long computation time unacceptable. The semantic encoding and transmission can be controlled using DII generated as follows. The duration, amplitude, morphology, and frequencies of the Q/R/S waves are all useful for heart-related diagnostics ranging from detecting conduction abnormalities to diagnosing ventricular hypertrophy. But they are of different importance levels that are also disease-dependent. In other words, the features of biomedical signals can be assigned different importance levels, resulting in the DII. At the lower layers, the adopted radio-access technologies depend on the application. For indoor applications, sensors are usually linked to a local hub (e.g., a smartphone) using short-range and low-latency technologies such as Zigbee, Bluetooth, and WiFi. For the hubs to access the cloud or for outdoor applications, cellular communication is the preferred choice. In regions with no or poor cellular coverage, satellite communications can be used instead, while GPS helps human positioning and tracking. A large-scale network that connects a massive number of sensors can rely on the mMTC service supported within the 5G architecture.
3.3.2 State-of-the-Art Applications
A common application of human sensing-and-care is elderly monitoring [44, 45]. In one system, a wireless sensor network is deployed to monitor the well-being of the elderly. Specifically, multiple types of sensors are used to monitor their activities such as cooking, dining, and sleeping. A similar system targeting the elderly with dementia has also been reported. Another type of application is a super soldier system, which monitors and analyzes the health status and fatigue levels of soldiers by sensing their temperatures, gestures, blood glucose levels, and ECG. The third type of application is the set of general human activity recognition systems [47, 48]. In one design, head-mounted smartphones are designed to have situation awareness, e.g., awareness of user behaviors and environmental conditions. To this end, the data collected from smartphone sensors (e.g., accelerometers, gyroscopes, and cameras) are transmitted to a server for feature extraction and situation inference. In another design, a wearable magnetic induction device is used for sensing and wirelessly transmitting the magnetic induction signals to a server for activity detection using an RNN-based algorithm. Other applications of human sensing-and-care include remote healthcare systems [49, 50, 51] and smart-home monitoring systems.
3.4 SemCom for VR/AR
VR and AR are two H2M technologies. VR essentially involves the use of mobile devices (e.g., smartphones, glasses, or headsets) to create new human experiences by replacing the physical world with a virtual one. On the other hand, AR devices alter human experiences by augmenting real objects with computer-generated perceptual information across different senses (e.g., vision, hearing, haptics, pressure, and smell). VR/AR provide a way of seamlessly merging the physical and virtual worlds. The resulting immersive human experience can give rise to a plethora of future services, such as entertainment, virtual meetings, or remote education. Offloading computation and caching to edge servers makes it possible to implement latency-sensitive VR/AR applications on resource-limited devices. VR/AR data processing and SemCom between devices and servers are discussed in the following sub-section, followed by a summary of state-of-the-art SemCom for AR/VR.
3.4.1 VR/AR Semantic Encoding and Transmission
The procedures of semantic encoding and transmission in AR and VR systems are illustrated in Fig. 7. Consider AR semantic encoding, whose purpose is to recognize and track physical objects of interest to the user and then project icons, characters, and information onto them. Its implementation requires cooperation between a device and a mobile edge computing (MEC) server. First, raw video data recorded locally using on-device cameras are uploaded to the server for processing. At the MEC server, three algorithms, namely a mapper, a tracker, and an object recognizer, are executed. The function of the tracker is to detect the object’s position based on the input raw data and to proactively adjust a rendering focal area. Based on the tracking results, the mapper distills features (e.g., virtual coordinates) of objects embedded in the raw data using image processing techniques. In parallel, the object recognizer leverages both the object features and the video stream to produce the desired rendering data (e.g., cartoon icons and explanatory text) according to the application requirements. Such data are downloaded onto the device, where they are superimposed onto the actual scenes by a local renderer, and the edited AR videos are displayed to the human user. Next, the function of VR semantic encoding is to select only the part of the video depicting the virtual world that needs to be downloaded and displayed to the user, such that the heavy burden of downlink transmission is alleviated. To this end, the user’s kinesthetic information (e.g., location, angle of view, and head movements) is collected by multiple on-device sensors and efficiently transmitted to the server. The information is processed by a tracker and a mapper operating at the server to detect the field-of-view (FoV) and to extract from the cached 360-degree video stream the video output that best fits the user’s movements. Then the video output is downloaded onto a pair of VR glasses or a VR headset for constructing the virtual world. Such semantic encoding dramatically reduces the required downlink data rate compared with full 360-degree video streaming. It also makes it possible to meet the stringent latency requirement for an immersive user experience. The communication efficiency can be further improved by deploying advanced semantic encoding techniques such as video segmentation and compression guided by head-movement prediction and eye-gaze tracking.
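The FoV-based selection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cached 360-degree video is stored as a regular grid of tiles in an equirectangular layout and that the tracker reports the viewing direction as a yaw/pitch pair in degrees; the function name `select_tiles` and the grid dimensions are hypothetical and are not taken from the systems surveyed here.

```python
def select_tiles(yaw_deg, pitch_deg, h_fov_deg, v_fov_deg, cols=12, rows=6):
    """Return (row, col) indices of the tiles overlapping the user's FoV.

    Assumed layout: yaw spans [-180, 180) across `cols` tiles and pitch spans
    [-90, 90] across `rows` tiles. A tile is selected when its angular extent
    overlaps the FoV window centred on (yaw_deg, pitch_deg).
    """
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    selected = []
    for r in range(rows):
        for c in range(cols):
            center_yaw = -180.0 + (c + 0.5) * tile_w
            center_pitch = -90.0 + (r + 0.5) * tile_h
            # Yaw wraps around at +/-180 degrees, pitch does not.
            d_yaw = abs((center_yaw - yaw_deg + 180.0) % 360.0 - 180.0)
            d_pitch = abs(center_pitch - pitch_deg)
            # Two intervals overlap when their centres are closer than the
            # sum of their half-widths.
            if d_yaw < (h_fov_deg + tile_w) / 2 and d_pitch < (v_fov_deg + tile_h) / 2:
                selected.append((r, c))
    return selected


front = select_tiles(yaw_deg=0, pitch_deg=0, h_fov_deg=90, v_fov_deg=60)
back = select_tiles(yaw_deg=180, pitch_deg=0, h_fov_deg=90, v_fov_deg=60)
print(len(front), "of", 12 * 6, "tiles are downloaded")
```

In this toy setting only a small fraction of the tiles needs to be sent on the downlink, which is the source of the rate reduction mentioned above. Head-movement prediction would amount to enlarging or shifting the FoV window before the selection.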
The connectivity requirements for VR/AR SemCom are discussed as follows. In general, VR/AR systems need to collect and process real-time multimedia data from the physical world and generate/transmit high-resolution visual and auditory data. Therefore, the required connectivity is characterized by high rate and low latency [53, 54, 55]. As an example, a human FoV covers horizontal and vertical ranges of 150 and 120 degrees, respectively. The simulation of a realistic FoV generally requires frames per second with each frame consisting of million pixels (60 pixels/degree). Given standard video quantization ( bit/pixel) and H. encoding (with compression rate), the required transmission rate is at least Gb/s [53, 6]. On the other hand, the real-time interaction needed for immersive human experiences requires the motion-to-photon latency to be lower than ms . Such requirements place VR/AR connectivity at the intersection between eMBB and URLLC. A solution that addresses these issues is the MEC platform in 5G, which offloads computation-intensive tasks (e.g., tracking, mapping, and recognition) and caches storage-demanding multimedia content at edge servers in the proximity of users, as shown in Fig. 7. This reduces the burden on devices, which become merely responsible for data collection and video display.
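The rate calculation outlined above follows a simple chain: pixels per frame, times frame rate, times bits per pixel, divided by the compression ratio. The sketch below makes this chain explicit. Only the FoV angles and the pixel density (150 and 120 degrees, 60 pixels/degree) are taken from the text; the frame rate, bit depth, and compression ratio passed in the example call are placeholder assumptions chosen for illustration, and the function names are hypothetical.

```python
def fov_pixels(h_deg, v_deg, pixels_per_degree):
    """Pixels per frame needed to cover an h_deg x v_deg field of view."""
    return (h_deg * pixels_per_degree) * (v_deg * pixels_per_degree)


def required_rate_bps(h_deg, v_deg, pixels_per_degree, fps,
                      bits_per_pixel, compression_ratio):
    """Compressed video bit rate in bit/s for the given FoV and video format."""
    raw_bps = fov_pixels(h_deg, v_deg, pixels_per_degree) * fps * bits_per_pixel
    return raw_bps / compression_ratio


# FoV angles and pixel density as quoted in the text.
pixels = fov_pixels(150, 120, 60)

# Placeholder assumptions (not from the source): 120 frames/s, 36 bit/pixel,
# and a 600:1 compression ratio.
rate = required_rate_bps(150, 120, 60, fps=120, bits_per_pixel=36,
                         compression_ratio=600)
print(f"{pixels / 1e6:.1f} Mpixel per frame, {rate / 1e9:.2f} Gb/s after compression")
```

Changing any of the placeholder values rescales the result linearly, which makes the sensitivity of the rate requirement to the assumed video format easy to explore.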
Building on the MEC platform, a VR/AR system can be designed based on either the layer-coupling or the SplitNet architecture. Consider the former. Different types of human kinesthetic information are of heterogeneous importance for a specific application and can thus be assigned different DIIs to facilitate importance-aware adaptive transmission. Moreover, for collaborative VR/AR involving multiple devices and servers, the SemCom system design can benefit from exploiting the PAI of AI models (e.g., classification) and other data processing algorithms (e.g., compression and filtering), as well as the DTI and DII of raw data (e.g., voice and images), to optimize the operations of data aggregation and rendering-data feedback for boosting the communication efficiency. On the other hand, the scene-data collection, local rendering, and global data processing at servers can be integrated in an end-to-end design using the SplitNet approach. Then the trained neural network is split for partial implementation at a device and a server according to the application requirements and the device’s resource constraints.
3.4.2 State-of-the-Art SemCom for VR/AR
There exists a wide range of VR/AR applications with a vast literature (see, e.g., the surveys in [57, 58] and references therein). However, the area of SemCom for VR/AR is relatively new and still largely uncharted. Some recent advancements are highlighted as follows. The challenges and enablers for URLLC to implement VR/AR are discussed in . Furthermore, a case study of deploying VR in wireless networks is also presented, which integrates millimeter-wave communication, edge computing, and proactive caching. Another design of a wireless VR network is proposed in . Therein, small base stations first collect and track information on a VR user and then send the generated 3-D images to the user device. The resource management issue for such a system is also investigated, accounting for VR metrics such as tracking accuracy, processing delay, and transmission delay. In addition, a new type of VR/AR system enhanced by skin-integrated haptic sensing is proposed in . Such a special wireless sensor can be softly laminated onto curved skin surfaces to wirelessly transmit haptic information conveying the spatio-temporal patterns of localized mechanical vibration.
4 Machine-to-machine Semantic Communication
Recall that the objective of M2M SemCom is to efficiently connect multiple machines and enable them to effectively execute a specific task in a wireless network. It usually targets IoT applications, as illustrated in Fig. 8. The typical tasks in M2M SemCom span the areas of sensing, data analytics, learning, reasoning, decision making, and actuation . In this section, we discuss effectiveness encoding and transmission techniques in four representative types of applications, namely distributed learning, split inference, distributed consensus, and machine-vision cameras.
4.1 Distributed Learning
The main theme of distributed machine learning is to train an AI model using distributed data at many mobile devices as well as their computation resources.
FL, mentioned in Section 2.1.2, stands out as arguably the most popular distributed-learning framework [180, 181]. Its popularity is mainly due to its feature of protecting the ownership of mobile data by avoiding their direct uploading to a server. Instead, based on the classic stochastic gradient descent (SGD) algorithm, FL requires each device to compute a local model updated using local data, or a stochastic gradient representing the update, as illustrated in Fig. 9. Then the local model updates are transmitted to the server for aggregation before updating the global model. The aggregation operation suppresses the noise in local updates arising from the limited size of local data. As a result, the noise diminishes as the number of devices grows. Subsequently, the server broadcasts the updated global model to all devices to repeat the above process, and the iteration continues until the global model converges. While SplitNet targets inference using a trained model, the layer-coupling approach is more suitable for designing an FL system. In an FL system, the uploading of high-dimensional model updates by many devices poses a communication bottleneck. For instance, the popular ResNet-50 model comprises million parameters or equivalently million bits in the “float64” format. Relevant SemCom techniques for tackling this bottleneck, including effectiveness encoding, modulation, multi-access, and radio-resource management (RRM), are discussed separately in the following sub-sections.
4.1.1 Effectiveness Encoding
Consider a FL system with one server and devices, called workers, and an arbitrary communication round (i.e., iteration) in the FL algorithm, say the -th round, that comprises several sequential phases: model broadcast, local effectiveness encoding, model-update uploading, and global-model updating. The focus of this subsection is the effectiveness encoding at a device. Its goal is to convert local training data into a local model by updating the broadcast global model, or a local (stochastic) gradient representing the update. At the beginning of the -th round, the server broadcasts the global-model parameters to all workers. Its local gradient as computed at worker- and the worker’s local dataset are denoted as and , respectively, where is the local-dataset size, the -th sample, and its associated label. The effectiveness encoding involves multi-step (say -step) local gradient descent. To this end, let be partitioned into mini-batches with the -th mini-batch denoted as . The local gradient in step is computed as
where denotes the loss function pre-defined for the learning task. The above computation can be implemented using the well-known back-propagation algorithm. Essentially, implementing the differential operator in (5) involves computing the gradient w.r.t. the model parameters from the last layer to the first layer backwardly (see details in ). Using the local gradient, the step- gradient descent refers to updating the local-model parameters as
where is the step size and . After the last mini-batch is processed, worker- obtains the local model or the corresponding local gradient . This completes the effectiveness-encoding process. The uploading of the local model or local gradient ends the current communication round.
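The round structure described in this subsection (model broadcast, multi-step local gradient descent over mini-batches, uploading, and aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the formulation of any cited work: the model is a linear regressor with squared loss, aggregation is plain averaging of local models, and all names (`minibatch_gradient`, `local_update`, `aggregate`, `run_fl`) as well as the step size and dataset sizes are hypothetical.

```python
import random


def minibatch_gradient(w, batch):
    """Average gradient of the squared loss 0.5 * (w.x - y)^2 over a mini-batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return grad


def local_update(w_global, dataset, num_batches, step_size):
    """Effectiveness encoding at one worker: one gradient step per mini-batch."""
    w = list(w_global)
    size = len(dataset) // num_batches
    for t in range(num_batches):
        batch = dataset[t * size:(t + 1) * size]
        g = minibatch_gradient(w, batch)
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


def aggregate(local_models):
    """Server side: average the uploaded local models."""
    k = len(local_models)
    return [sum(ws) / k for ws in zip(*local_models)]


def run_fl(datasets, dim, rounds=50, num_batches=4, step_size=0.5):
    w = [0.0] * dim  # initial global model
    for _ in range(rounds):
        # Broadcast w, let every worker encode locally, then aggregate.
        w = aggregate([local_update(w, d, num_batches, step_size) for d in datasets])
    return w


# Synthetic data: 5 workers, 40 samples each, generated from a known model.
random.seed(0)
true_w = [2.0, -1.0]
datasets = []
for _ in range(5):
    data = []
    for _ in range(40):
        x = [random.uniform(-1, 1), random.uniform(-1, 1)]
        y = sum(a * b for a, b in zip(true_w, x)) + random.gauss(0, 0.01)
        data.append((x, y))
    datasets.append(data)

w_hat = run_fl(datasets, dim=2)
print("estimated model:", [round(v, 3) for v in w_hat])
```

Uploading the difference between the local and the broadcast model instead of the local model itself would give the gradient-uploading variant mentioned in the text.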
The effectiveness encoding can include the additional operation of local model/gradient compression, described as follows. Consider the case of gradient uploading. A gradient tends to be sparse in the sense that a large number of its elements are much smaller in magnitude than the others. A simple method of gradient compression is to keep a fixed number of elements with the largest magnitudes and set the remaining ones to zero, thereby substantially reducing the communication overhead [64, 65]. Next, consider the case of model uploading. Local models also exhibit sparsity. Parameter (or neuron) pruning can be performed progressively during training using a suitable metric, for example, variance or magnitude. A much simpler method, called Dropout, randomly samples parameters for deletion . Besides reducing the communication overhead, such model pruning is also effective in avoiding model over-fitting.
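The magnitude-based gradient compression described above can be sketched in a few lines. The function names and the (index, value) message format are assumptions made for illustration; practical schemes typically add refinements such as accumulating the discarded residual locally, which are not shown here.

```python
def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries of the gradient and zero the rest."""
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gradient)]


def to_sparse_message(sparse_gradient):
    """Encode the surviving entries as (index, value) pairs for uploading."""
    return [(i, g) for i, g in enumerate(sparse_gradient) if g != 0.0]


gradient = [0.1, -3.0, 0.02, 2.5, -0.3]
message = to_sparse_message(top_k_sparsify(gradient, k=2))
print(message)
```

Only `k` index-value pairs are uploaded instead of the full vector, so the overhead is set by `k` rather than by the model dimension.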
4.1.2 Effectiveness Modulation and Multi-access
This section aims at overcoming the communication bottleneck in an FL system from the perspective of effectiveness modulation and multi-access.
Linear analog modulation (LNA) supports fast transmission by avoiding the computation-intensive processes of digital modulation, channel encoding, and decoding . Though the lack of protection by coding limits its application to reliable communication, LNA has recently been gaining popularity in SemCom, especially in fast multimedia transmission and machine learning [185, 69, 143]. The reason is that human quality-of-experience, machine inference, and learning are robust against noise if it is properly controlled by, for example, power control and scheduling. In the context of learning, it is even possible to exploit channel noise to accelerate the learning process by escaping from saddle and local-optimal points .
LNA is known to be optimal for the task of distributed sensing in a sensor network, as illustrated in Fig. 10. The task is to compute an aggregation function (e.g., averaging) of distributed sensor observations so as to suppress the observation noise. To efficiently carry out the task, a technique called over-the-air computing (AirComp) exploits the waveform superposition property of a wireless channel to perform over-the-air aggregation of simultaneously transmitted, LNA-modulated sensing data . Let denote a noisy observation at sensor of a common source : where represents the sensing noise (see Fig. 10). All observations are transmitted at the same time over Gaussian channels to a server (fusion center) using uncoded LNA. This results in the following received signal:
where results from modulating and is the Gaussian channel noise. The modulated symbol is the scaled version of under the power constraint . The server produces an estimate of the source , denoted as , that minimizes the distortion , where represents the symbol index. In the absence of channel noise, the server receives the desired average of distributed observations. It is proved in  that in the case of Gaussian sources and noise, AirComp achieves the optimal rate-distortion tradeoff for a large number of sensors, making AirComp an effective multi-access technique for the task. On the other hand, it is sub-optimal for the current task to rely on classic information-theoretic encoding that first quantizes the observations into bits and then channel-encodes the bits. The main reason is the mismatch between the task and the objective of the classic scheme, which aims at reliable decoding of the data symbols transmitted by individual sensors. A more vivid interpretation is that AirComp treats interference as a friend rather than a foe.
Most recently, AirComp discussed above has been applied to realize “over-the-air aggregation” for fast FL, termed over-the-air FL [69, 65, 64]. Given its awareness of the FL algorithm (especially the aggregation operation), AirComp enabled by LNA represents a joint effectiveness design of modulation and multi-access targeting FL. Consider the uploading phase of a communication round and the FL implementation based on local-gradient uploading. The discussion can be extended straightforwardly to the other case of local-model uploading. Each worker modulates its local gradient using LNA and transmits the result over the same frequency band simultaneously with the other workers. By supporting such simultaneous access, the communication latency is reined in and avoids the linear scaling with the number of devices that occurs in conventional orthogonal-access schemes. Over-the-air aggregation requires the magnitudes of the received signals to be aligned. To this end, each worker performs channel-inversion power control. By synchronizing workers’ transmissions (by, for example, timing advance) and exploiting the waveform superposition property of a multi-access channel, the server receives the desired average of local gradients. Mathematically, each received symbol is denoted as and given as
where is the number of devices and for worker , represents a transmitted symbol modulating a single gradient coefficient, the channel gain, and channel-inversion power control with being a given constant, and finally the channel noise. Over-the-air FL is designed for a broadband system in  and a multi-antenna system in [71, 72].
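The over-the-air aggregation with channel-inversion power control can be simulated with a minimal sketch. Several simplifying assumptions are made and should be read as such: channels are real-valued and perfectly known at the workers, transmissions are perfectly synchronized, the power-control constant is set to one, and no transmit-power cap (and hence no truncation of workers in deep fade) is modelled. The function name `aircomp_aggregate` is hypothetical.

```python
import random


def aircomp_aggregate(local_values, channel_gains, noise_std, rng):
    """Simulate over-the-air averaging of the workers' vectors.

    For each coefficient, worker k pre-scales its symbol by 1 / h_k (channel
    inversion), the channel multiplies it by h_k, and the multi-access channel
    adds up all workers' signals plus Gaussian noise. The server divides the
    received symbol by the number of workers to obtain a noisy average.
    """
    num_workers = len(local_values)
    dim = len(local_values[0])
    received = []
    for d in range(dim):
        superposed = 0.0
        for k in range(num_workers):
            tx_symbol = local_values[k][d] / channel_gains[k]  # channel inversion
            superposed += channel_gains[k] * tx_symbol         # channel + superposition
        superposed += rng.gauss(0.0, noise_std)                # receiver noise
        received.append(superposed / num_workers)
    return received


rng = random.Random(0)
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
gains = [0.5, 1.2, 2.0]
noise_free = aircomp_aggregate(gradients, gains, noise_std=0.0, rng=rng)
noisy = aircomp_aggregate(gradients, gains, noise_std=0.1, rng=rng)
print(noise_free, noisy)
```

All workers occupy the same channel use for a given coefficient, so the number of channel uses equals the gradient dimension regardless of the number of workers, whereas orthogonal access would require a number of channel uses that grows with the number of workers.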
Last, designing over-the-air FL can be based on the proposed SemCom architecture with the layer-coupling approach. In this case, the PAI passed from the Semantic Layer to the Physical Layer is the specifics of the aggregation operation (e.g., aggregation weights, selected devices and their uploading frequencies) in the FL algorithm.
4.1.3 Effectiveness Radio Resource Management
In this subsection, we address the effectiveness problem for FL from the perspective of RRM. To overcome the communication bottleneck, RRM should be guided by the principle of allocating more resources to the transmission of data that has a higher importance for the model training, while preventing unimportant data from occupying channels. This leads to a new class of effectiveness techniques called (data) importance-aware RRM [73, 74, 75, 76].
In an FL system, there exist multiple types of data, including training samples, local gradients, and local models. Regarding training samples for a classifier model, their importance is measured by data uncertainty, a popular concept in the area of active learning. It reflects how little confidence an AI model has in its prediction for a data sample [187, 73]. Consider a neural-network-based classifier model. A common metric for measuring the uncertainty of a sample is the entropy of the posteriors of labels as computed using the model,
where is the posterior of label- given input and model parameters . On the other hand, the importance of gradients can be measured by gradient divergence  or squared multivariate coefficients of variation (SMCV) . For a local gradient , its gradient divergence is measured by its variance with respect to the global gradient, given by where is the probability that this local gradient is selected and is the vectorizing operator. The SMCV of an aggregated global gradient vector is given by the sum of means of each entry divided by the sum of variances of each entry with randomness due to channel noise. In addition, the importance of a local model can be measured by its variance with respect to the current global model (see, e.g.,  for an overview).
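The entropy-based uncertainty measure for training samples can be illustrated as follows. The sketch assumes that the classifier outputs one logit per label and that posteriors are obtained by a softmax; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert classifier logits into label posteriors."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_uncertainty(posteriors):
    """Entropy (in nats) of the label posteriors; larger means more uncertain."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


confident = entropy_uncertainty(softmax([8.0, 0.5, 0.1]))
ambiguous = entropy_uncertainty(softmax([1.0, 1.0, 1.0]))
print(f"confident sample: {confident:.4f}, ambiguous sample: {ambiguous:.4f}")
```

A sample on which the model is nearly certain has an entropy close to zero and would be considered unimportant, while a sample with near-uniform posteriors attains the maximum entropy, the logarithm of the number of labels.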
These schemes feature both channel and importance awareness and aim at striking a balance between scheduling devices with strong channels for the objective of rate maximization and those with important data for the objective of accelerating model convergence. As a result, the schemes favour devices with either very important data, very strong channels, or satisfactory levels in both aspects. It should be emphasized that in the context of FL, the two objectives mentioned earlier are not entirely in conflict from the perspective of latency minimization. The former reduces the latency per round while the latter reduces the number of rounds required for model convergence. To minimize the total latency (in seconds), the above tradeoff should be optimized. A common design approach is to derive a DII, for implementation using the layer-coupling approach, which accounts for both the data importance and the channel state. Then the criterion for importance-aware scheduling is simply to maximize the DII. Consider an edge learning system (e.g., a closed system without the data-privacy issue) that directly uploads data from devices to a server for model training. The DII is a linear combination of a channel quality indicator and the maximum sample uncertainty of a local dataset . Next, consider scheduling for an FL system. Probabilistic scheduling is adopted to avoid a bias of the trained model towards a particular local dataset. To be specific, in each round, each device is scheduled with a given probability. The optimal probability of a device is shown to be proportional to the local-gradient variance and a monotone decreasing function of the communication latency .
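The DII-maximizing scheduling rule for the edge learning case can be sketched as below. The text only states that the DII is a linear combination of a channel quality indicator and the maximum sample uncertainty; the particular convex-combination form with a single `weight`, the device names, and the numerical values are assumptions introduced for illustration, and both quantities are taken to be pre-normalized to comparable scales.

```python
def data_importance_indicator(channel_quality, sample_uncertainties, weight):
    """Linear combination of channel quality and the maximum sample uncertainty."""
    return weight * channel_quality + (1.0 - weight) * max(sample_uncertainties)


def schedule(devices, weight):
    """Pick the device with the largest DII.

    `devices` maps a device name to (channel_quality, sample_uncertainties).
    """
    return max(devices, key=lambda name: data_importance_indicator(*devices[name], weight))


devices = {
    "A": (0.9, [0.1, 0.2]),   # strong channel, unimportant data
    "B": (0.3, [1.2, 0.4]),   # weak channel, one highly uncertain sample
    "C": (0.6, [0.7]),        # moderate in both aspects
}
print(schedule(devices, weight=0.5), schedule(devices, weight=1.0))
```

Setting `weight` to one recovers purely channel-aware (rate-maximizing) scheduling, while smaller values shift the selection towards devices holding more uncertain samples.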
4.2 Split Inference
While the preceding sub-section focuses on model training, the theme of this sub-section is the other facet of machine learning, namely inference using a trained model. In this area, split inference is an emerging paradigm for 5G-and-beyond to offload a large part of the inference task from a mobile device to an edge server hosting a large-scale model . The remaining task executed on-device is to extract useful features from raw data for transmission to the server. The task splitting gives the name of split inference. This mitigates the impact of the resource limitation on the device and enriches its capacity via access to a server model, much more powerful and complex than the one that can be afforded as an on-device counterpart. For instance, classifiers in the Google Cloud can recognize thousands of object classes and that in Alibaba Cloud hundreds of waste classes for litter classification. In the remainder of the subsection, we discuss effectiveness coding and communication separately for the layer-coupling and SplitNet architectures.
4.2.1 Effectiveness Encoding and Transmission for Layer-Coupling Approach
In the context of split inference, effectiveness encoding refers to feature extraction, namely the process in which a device encodes high-dimensional raw data into reduced-dimension features or feature maps . Features represent the information essential for inference, while raw data contain a large amount of redundant information (e.g., background objects and noise, known as spatial redundancy in raw images ). Stripping away the redundancy substantially reduces the communication overhead without compromising inference performance. Our discussion focuses on feature extraction (i.e., effectiveness encoding), while details on inference using features (i.e., effectiveness decoding) can be found in a standard machine-learning textbook (see, e.g., ). A classic, simple technique is principal component analysis (PCA) . PCA uses SVD to identify the most informative low-dimensional linear subspace (feature space) embedded in a large high-dimensional dataset; the basis vectors of this subspace are called principal components. Then the projection of a data sample onto the feature space yields its features. Modern feature extraction exploits the powerful representation capability of neural networks and rich training data. Such a feature-extraction model can be implemented using multi-layer perceptrons (MLPs) for general purposes, CNNs for visual data [80, 81], and RNNs for time-series data , or can leverage the emerging graph neural networks to improve inference performance with point-cloud and non-Euclidean data .
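PCA-based feature extraction can be illustrated with a small dependency-free sketch. Note the substitution: instead of a full SVD, the sketch uses power iteration on the sample covariance matrix, which recovers only the leading principal component (the first right singular vector of the centered data). All function names and the toy dataset are hypothetical.

```python
import math


def center(data):
    """Subtract the per-dimension mean; return the centered data and the mean."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in data], mean


def covariance(centered):
    """Sample covariance matrix of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iters=200):
    """Power iteration: unit vector along the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def extract_feature(sample, mean, component):
    """Project a raw sample onto the principal component to get its feature."""
    return sum((s - m) * c for s, m, c in zip(sample, mean, component))


# Toy 2-D data lying close to the line y = x.
data = [[float(t), float(t) + (0.1 if t % 2 else -0.1)] for t in range(10)]
centered, mean = center(data)
component = leading_component(covariance(centered))
feature = extract_feature(data[-1], mean, component)
print(component, feature)
```

Each two-dimensional sample is reduced to a single scalar feature, halving what has to be transmitted in this toy case while retaining almost all of the variance.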
A SemCom system designed for efficient feature transmission is characterized by its feature-importance awareness. As widely reported in the deep learning literature, features do not contribute evenly to inference performance and thus have heterogeneous importance levels . Available importance measures include divergence for data statistical models (e.g., discriminant gains of specific feature dimensions)  and other classification-loss related metrics for DNN models . Consider an importance-aware SemCom system designed using the layer-coupling approach. The CRI passed to the discussed effectiveness encoder controls the number of features to extract. Given the number, features are selected based on their importance levels (DII) to be transmitted in the radio-access layers . There exist numerous algorithms for feature pruning for neural networks (see, e.g., ). Some designs support channel adaptation of encoding under given requirements on latency and inference performance . The DII is also passed to the radio-access layers for importance-aware quantization (e.g., more important features have higher resolutions) and RRM (e.g., more bandwidth/time-slots for more important features) [80, 87, 88]. Moreover, the DII also determines the transmission sequence (i.e., more important features are transmitted first) so that transmission can be stopped early once an inference-uncertainty requirement is met . On the other hand, PAI providing some information on the effectiveness encoder (e.g., its type or architecture) can be useful for the choice of a matched classifier model at the server and is thus passed to the latter.
4.2.2 Effectiveness Encoding and Transmission for SplitNet
Consider the implementation of split inference on the SplitNet architecture in Fig. (b) with the semantic encoder/decoder replaced by their effectiveness counterparts targeting the task of inference. The function of the effectiveness encoder is to extract features based on the designs discussed in the preceding sub-section. On the other hand, the effectiveness decoder is a neural network performing inference. The popular approach to designing the pair of channel encoder/decoder is to use an AE . An AE comprises an encoder and a decoder. Generally, the AE’s encoder compresses high-dimensional inputs into reduced-dimension outputs; using them as inputs, the decoder attempts to reconstruct the encoder’s inputs. In a split-inference system, the two AE components interface with a wireless channel (see Fig. (b)). Then the AE-based channel encoder directly maps features to analog-modulated channel symbols, and the channel decoder decodes the received symbols into features that serve as the input to the subsequent effectiveness decoder for generating inference results . The design of the effectiveness and channel encoders is subject to two constraints. First, given complex channel symbols, the number of extracted features (real scalars) should be . Second, a normalization layer is required in the channel encoder such that the channel symbols satisfy the transmit power constraints. The end-to-end training of the encoders/decoders in SplitNet is difficult due to the large number of layers in the combined global model and also the channel hostility (i.e., fading and noise) embedded in it. This difficulty is overcome by training the two AE components separately from the effectiveness encoder/decoder. Specifically, the effectiveness encoder and decoder are pre-trained since they are independent of the channel and remain unchanged even if the channel statistics vary . On the other hand, the AE-based channel encoder and decoder can be quickly retrained using transfer learning as the radio-propagation environment changes. This gives the components the capability to cope with channel noise. As a final step, end-to-end training of all neural networks is conducted so that they can be further adjusted to achieve optimal end-to-end inference performance in the presence of channel hostility.
Split inference involves a computation-communication tradeoff, described as follows. The effectiveness encoder and decoder in the SplitNet architecture (see Fig. (b)) can be generated by splitting a single AI model (i.e., a neural network) into two parts with unequal numbers of layers. Shifting the split point to the left (i.e., towards the input layer) results in simpler on-device effectiveness encoding and higher complexity for effectiveness decoding at the server, and vice versa. Intuitively, as the device is resource constrained, it is desirable to push the split point as close to the input layer of the AI model as possible. The intuition is correct from the perspective of computational load but overlooks the other perspective of communication overhead. Specifically, in a large class of popular AI models in practice, the size of features output by “shallow” feature-extraction layers is large and can even be much larger than that of the raw data at the input, which is known as the “data amplification effect” . Consequently, a shallow split point may result in unacceptably large communication overhead and energy consumption, defeating the original purpose of split inference. This motivates researchers to adjust the split point with the aim of optimizing the communication-and-computation tradeoff [91, 92]. Relevant algorithms rely on profiling the operational statistics of individual model layers, including feature size, latency, energy consumption, and required memory size. Then the profiles are applied to design algorithms for adapting the split point to the time-varying communication rate under latency requirements and devices’ resource constraints.
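The profile-driven adaptation of the split point can be sketched as an exhaustive search over candidate split points that minimizes the end-to-end latency. The layer profile below is entirely made up for illustration (including a first layer whose output is larger than the raw input, mimicking the data amplification effect), only latency is considered, and the function name `best_split` is hypothetical; energy and memory constraints would enter as additional feasibility checks.

```python
def best_split(layers, input_bits, rate_bps):
    """Return (split_index, total_latency_s) minimizing end-to-end latency.

    `layers` is a list of per-layer profiles
    (device_latency_s, server_latency_s, output_bits).
    Layers [0, split) run on the device and layers [split, end) on the server.
    split = 0 uploads the raw input; split = len(layers) runs fully on-device
    and uploads only the final output.
    """
    best = None
    for split in range(len(layers) + 1):
        device = sum(layer[0] for layer in layers[:split])
        server = sum(layer[1] for layer in layers[split:])
        tx_bits = input_bits if split == 0 else layers[split - 1][2]
        total = device + tx_bits / rate_bps + server
        if best is None or total < best[1]:
            best = (split, total)
    return best


# Hypothetical profile: (device latency, server latency, output size in bits).
profile = [
    (0.02, 0.002, 8e6),   # shallow layer amplifies the 4e6-bit input
    (0.03, 0.003, 2e6),
    (0.05, 0.005, 1e5),
    (0.10, 0.010, 1e3),
]
for rate in (1e8, 1e7, 1e6):
    print(f"rate {rate:.0e} bit/s -> split after layer {best_split(profile, 4e6, rate)[0]}")
```

With this profile, a fast link favours uploading raw data, whereas a slow link pushes the split point deeper into the model, and splitting right after the amplifying first layer is never selected.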
Last, the required connectivity type for split inference depends on the specific application. For the family of mission-critical applications (e.g., finance, auto-driving, and automated factories), URLLC connectivity is required. For instance, remote inference for autonomous driving is expected to have ms latency and near reliability in communication [195, 196]. For other applications that are not latency sensitive but require close-to-human machine vision (e.g., recognition of hundreds of object classes for a high-end surveillance camera), eMBB will be needed to transmit high-dimensional features extracted from high-definition images.
4.3 Distributed Consensus
Distributed consensus refers to the process in which agents in a distributed network act together to reach an agreement by message exchange. A typical algorithm involves each agent iteratively updating its own state based on received information on peers’ states . When there are many agents, the convergence could be slow and, as a result, the iterative process could incur excessive communication overhead, for example, in the specific scenarios of vehicle platooning  and blockchains . To address this issue, the criterion for designing SemCom for efficient distributed consensus is to reduce the overhead without significantly decreasing the convergence speed. The key component is the design of effectiveness encoding that is aware of the algorithm and its objective and, based on this knowledge, extracts and transmits semantic information from an agent’s state to others. In the remainder of this subsection, we introduce two representative scenarios of distributed consensus, namely vehicle platooning and blockchains, and discuss matching effectiveness coding techniques.
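The iterative state update underlying distributed consensus can be illustrated with the textbook linear averaging rule, in which every agent moves towards the states received from its neighbours. This rule is used here only as a generic example and is not the algorithm of any specific cited work; the ring topology, the step size, and the function names are assumptions.

```python
def consensus_round(states, neighbours, step):
    """One synchronous round: each agent moves towards its neighbours' states."""
    return [
        x + step * sum(states[j] - x for j in neighbours[i])
        for i, x in enumerate(states)
    ]


def run_consensus(states, neighbours, step, tol=1e-6, max_rounds=1000):
    """Iterate until all agents agree within `tol`; also count the rounds used."""
    rounds = 0
    while max(states) - min(states) > tol and rounds < max_rounds:
        states = consensus_round(states, neighbours, step)
        rounds += 1
    return states, rounds


# Four agents on a ring; each exchanges its state with two neighbours per round.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
final_states, rounds_used = run_consensus([0.0, 4.0, 8.0, 12.0], ring, step=0.25)
print(final_states, rounds_used)
```

Every round costs one message per link direction, so the total overhead is the number of rounds times the number of links. This is the quantity that effectiveness encoding tries to reduce without slowing the convergence.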
4.3.1 Vehicle Platooning
Vehicle platooning is a highway automatic transportation method for driving a cluster of connected vehicles in a formation (e.g., a line) to achieve higher road capacity and greater fuel economy . This requires the member vehicles to brake and accelerate together based on the system state, which represents the consensus. Maintaining the system state requires vehicles to continuously share and update their local states, such as vehicle parameters (e.g., positions, accelerations, and velocities), sensing data (e.g., traffic lights, pedestrians, obstacles, road conditions, and LIDAR-imaged point clouds), and even individual auto-pilot models. Transmitting all the raw state data is impractical. For instance, a typical autonomous vehicle collects from its sensors up to several gigabytes of data per second. Thus, it is essential to design effectiveness encoding to extract from the raw state data the information essential for convergence to a consensus. To better understand the principle of a vehicle-platooning algorithm, consider a simple scenario of driving a platoon along a straight highway in a line formation. In this case, the effectiveness encoder of each vehicle, say vehicle , outputs its distance to its predecessor, , and its own speed, , which define the local state, while its control variable is its acceleration, . The local states are assumed to be exchanged continuously between vehicles over wireless links. Let denote the cost at time for the front situation (i.e., the relationship between vehicle- and its predecessor, vehicle-). Typically, accounts for all or some of the following aspects, namely safety cost, efficiency cost, and comfort cost, each of which can be defined as a function of the states of vehicle and those of its neighbours (see examples in [93, 94]). Hence represents the behind situation of vehicle . Then the local control problem at the vehicle over a duration and with the objective of behind-and-front cost minimization can be formulated as
where denotes the current time instance. Iteratively solving the problem, applying the computed acceleration, and broadcasting the local state by all vehicles will eventually lead them to a consensus on the platoon’s optimal speed and inter-vehicle separation gaps. There exists a tradeoff between: 1) the complexity of effectiveness encoding and the amount of its output information (which determines the communication overhead), and 2) the sophistication of the platooning algorithm. For example, a vehicle’s predicted trajectories can be shared with others in the platoon, requiring effectiveness encoders to compress the trajectories. Most recently, deep learning has been adopted to empower platooning. Essentially, CNN-based effectiveness encoders are designed to intelligently extract information, such as traffic lights, lanes, and obstacles, from real-time videos captured by on-board cameras . Exchanging such sensing data and using them for consensus on complex manoeuvres gives the platoon collective intelligence for auto-driving.
Other SemCom techniques for vehicle platooning have also been extensively studied in the literature. First, URLLC connectivity is required in this mission-critical application to avoid collisions . Latency for vehicle platooning should be measured and minimized in terms of information latency rather than the conventional over-the-air latency, as the former directly relates to coordinated control performance . To overcome the limit of radio resources, effectiveness resource allocation for vehicle platooning should be importance aware by identifying critical and less critical information in vehicles’ state data, allowing them to be compressed according to the specific driving algorithm . On the other hand, it is proposed in  that effectiveness RRM and multi-access should also have situation awareness and be optimized for a specific vehicular network topology represented using a graph. Based on these principles, SemCom techniques are designed based on multi-agent RL to integrate the operations in the semantic layer and physical layer (e.g., transmission stopping), thereby reducing the intensity of communication.
4.3.2 Blockchains
A blockchain is a growing chain of blocks, each of which contains a time stamp (when the block was published), transaction data of the blockchain, and cryptographic information of the previous block. In this way, the chain is robust against any alteration of the transaction data in an individual block, as such an alteration requires changes to all subsequent blocks too. As they can implement public distributed ledgers, blockchains find a broad range of applications ranging from cryptocurrencies to gaming to financial services . In a distributed network containing a blockchain, the devices are nodes within that blockchain. Nodes can propose changes to the blockchain by submitting transactions via broadcasting to all other nodes. One distinctive feature of blockchain protocols is that a transaction should be broadcast to all member nodes in order to reach consensus. For this reason, transaction approval relies on frequent communication to exchange information and reach a consensus across a large number of nodes. This motivates the design of effectiveness encoding for blockchains.
Let us take the example of the application of blockchains to building construction. A large-scale project involves a large team and many contractors/sub-contractors that perform distributed field works. A blockchain can be used as a secure distributed ledger to facilitate cooperation and ensure construction quality. Based on this platform, the physical and functional features of building components are stored in blocks and validated by all parties. During the construction, a change made on a particular component (e.g., a new design or construction progress) by a stakeholder will trigger updating of all nodes in the blockchain. For this to happen, the change in question has to be submitted as an updating proposal (i.e., a transaction) and approved by other nodes upon validation before it is made on the blockchain. The detailed digital format of a transaction depends on the choice of data model structure for that blockchain, such as the Industry Foundation Classes schema for civil engineering, and on the transaction algorithm. One design of effectiveness encoding for communicating transactions is based on transmitting differential states, called semantic difference transaction (SDT). Specifically, the SDT-based encoder compares the objects in the new schema with the validated ones recorded in the blockchain, aiming to identify the objects that require updating. Then the encoder generates the required changes of only the identified objects, which are broadcast to all other nodes for validation. Compared with the case in which the whole schema is broadcast, SDT can substantially reduce the communication overhead, especially given that the changes are usually minor.
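The differential encoding described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a schema is a dictionary mapping object identifiers to attribute dictionaries; the function names, the object identifiers, and the diff layout ("upsert"/"remove") are hypothetical and are not taken from the SDT design itself.

```python
from typing import Any, Dict

# Hypothetical schema representation: object ID -> attribute dictionary.
Schema = Dict[str, Dict[str, Any]]


def semantic_difference(validated: Schema, proposed: Schema) -> Dict[str, Any]:
    """Return only the objects that differ between the two schemas."""
    changed = {
        obj_id: attrs
        for obj_id, attrs in proposed.items()
        if validated.get(obj_id) != attrs  # new or modified object
    }
    removed = sorted(obj_id for obj_id in validated if obj_id not in proposed)
    return {"upsert": changed, "remove": removed}


def apply_difference(validated: Schema, diff: Dict[str, Any]) -> Schema:
    """Rebuild the proposed schema at a validating node from the diff."""
    updated = {k: dict(v) for k, v in validated.items() if k not in diff["remove"]}
    for obj_id, attrs in diff["upsert"].items():
        updated[obj_id] = dict(attrs)
    return updated


validated = {
    "wall-01": {"material": "concrete", "height_m": 3.0},
    "door-07": {"material": "steel", "width_m": 0.9},
    "beam-12": {"material": "steel", "length_m": 6.0},
}
proposed = {
    "wall-01": {"material": "concrete", "height_m": 3.0},
    "door-07": {"material": "timber", "width_m": 0.9},  # redesigned component
    "beam-12": {"material": "steel", "length_m": 6.0},
}

diff = semantic_difference(validated, proposed)
print(diff)  # only door-07 is broadcast, not the whole schema
```

In this toy case only one of three objects is transmitted, and a receiving node can recover the full proposed schema by applying the difference to its validated copy.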
Moreover, there exist fault-tolerant consensus protocols for further reduction of the communication overhead via effectiveness-based resource allocation. Within the framework of practical Byzantine fault tolerance (PBFT) consensus, an effectiveness-based resource allocation mechanism groups nodes into layers and executes only inner-layer communication of transactions for convergence of consensus with a given security threshold.
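To give a rough feel for why confining communication to groups helps, the sketch below counts the messages of a single all-to-all exchange, first across all nodes and then within each group only. This is a back-of-the-envelope illustration under simplifying assumptions (equal treatment of all nodes, one exchange phase, no inter-group traffic); it is not an implementation of PBFT or of the layered mechanism, and the function names are hypothetical.

```python
from typing import List


def all_to_all_messages(num_nodes: int) -> int:
    """Messages in one exchange where every node sends to every other node."""
    return num_nodes * (num_nodes - 1)


def grouped_messages(group_sizes: List[int]) -> int:
    """Messages when the same exchange is confined to each group."""
    return sum(all_to_all_messages(size) for size in group_sizes)


flat = all_to_all_messages(100)          # one flat network of 100 nodes
grouped = grouped_messages([10] * 10)    # ten groups of ten nodes each
print(flat, grouped)
```

Under these assumptions the message count grows quadratically with the size of the largest group rather than with the size of the whole network.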
4.4 Machine-vision Cameras
Machine-vision cameras, which are connected by IoT and rely on servers for sensing-data analysis, are capable of identifying labels of interest in recorded images and videos, such as time, location, and objects. They are commonly used as standalone cameras or as a surveillance network deployed in homes, factories, and cities to detect human gestures and activities, to support security management, or to identify defective products on a production line. At a larger scale, machine-vision cameras are merged into aerial and space sensing networks to form a universal network. The communication bottleneck of a machine-vision camera network arises from the large-size raw data generated by each camera and the enormous camera population (e.g., millions of connected surveillance cameras in a metropolitan city). A single frame in P videos consists of two million pixels while there can be up to frames per second, generating data at a rate of Mbps . In the sequel, we discuss effectiveness encoding and RRM for efficient SemCom in such a network.
One key feature of effectiveness encoding is to detect regions of interest (ROIs) in a set of visual data that contain interesting labels, thereby facilitating the trimming of videos or images for efficient streaming to edge or cloud servers for analysis. A CNN model is commonly deployed as an effectiveness encoder to detect ROIs. Specifically, consider a set of frames (or images), each of which comprises multiple regions. A lightweight on-camera CNN detector scans each region of every frame to search for interesting objects. For each frame, the indices of the regions found to contain such objects, i.e., the spatial ROIs, are grouped into a per-frame index set, whose cardinality gives the number of spatial ROIs in that frame. In the temporal dimension, the indices of the frames comprising interesting objects are included in a temporal index set. Consider security management as an example. A region of a frame containing dangerous objects such as knives and guns will be tagged as a spatial ROI. The frames in the temporal ROI set are then encoded and transmitted to servers for further analysis, while the remaining frames can be coarsely compressed or even discarded.
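The construction of the spatial and temporal ROI index sets can be sketched as follows. The detector here is a stand-in (a simple mean-intensity threshold) for the on-camera CNN, and the data layout, function names, and threshold are illustrative assumptions only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Region = Sequence[float]   # placeholder for the pixel values of one region
Frame = Sequence[Region]   # a frame is a sequence of regions


def roi_index_sets(
    frames: Sequence[Frame],
    is_interesting: Callable[[Region], bool],
) -> Tuple[Dict[int, List[int]], List[int]]:
    """Return per-frame spatial ROI index sets and the temporal ROI index set."""
    spatial: Dict[int, List[int]] = {}
    for n, frame in enumerate(frames):
        # Indices of the regions of frame n flagged by the detector.
        spatial[n] = [k for k, region in enumerate(frame) if is_interesting(region)]
    # A frame is a temporal ROI if it has at least one spatial ROI.
    temporal = [n for n, indices in spatial.items() if indices]
    return spatial, temporal


def toy_detector(region: Region) -> bool:
    """Stand-in for a CNN detector: flags regions with high mean intensity."""
    return sum(region) / len(region) > 0.5


frames = [
    [[0.1, 0.2], [0.9, 0.8]],   # frame 0: second region is interesting
    [[0.0, 0.1], [0.2, 0.3]],   # frame 1: nothing of interest
    [[0.7, 0.9], [0.6, 0.6]],   # frame 2: both regions are interesting
]
spatial, temporal = roi_index_sets(frames, toy_detector)
print(spatial, temporal)
```

Only the frames listed in `temporal` would be encoded at full quality; frame 1 in this toy input could be coarsely compressed or dropped.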
While conventional RRM schemes deliver video bits indiscriminately, effectiveness designs targeting machine-vision cameras differentiate the importance level of sensing data given their relevance to ROIs, which can be used as DII in the layer-coupling approach. In terms of quantization, more bits can be allocated to high-resolution quantization of the pixel regions within the spatial ROIs and fewer bits to quantizing background pixels. In the presence of multiple cameras, the quality of contents from each camera should be assessed in terms of how critical they are for executing a given task. Cameras capturing critical ROIs should be given a higher priority in RRM. Besides ROI detection, it is possible for cameras with increasing computation capacity to perform part of the data analysis and extract features from multimedia sensing data using a DNN model and a knowledge base. Last, it should be mentioned that, given the above operations, IoT-connected computer-vision cameras can be implemented using either the layer-coupling or the SplitNet approach, both discussed earlier.
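As a simple illustration of ROI-aware quantization, the sketch below splits a per-frame bit budget across regions so that ROI regions receive a larger share. The proportional-weight rule, the weight value, and the function name are assumptions made for this example; practical encoders would use rate-distortion criteria instead.

```python
from typing import List, Set


def allocate_bits(
    num_regions: int,
    roi: Set[int],
    bit_budget: int,
    roi_weight: int = 4,   # hypothetical: an ROI region gets 4x a background region
) -> List[int]:
    """Split a bit budget across regions, favoring regions in the ROI set."""
    weights = [roi_weight if k in roi else 1 for k in range(num_regions)]
    total = sum(weights)
    bits = [bit_budget * w // total for w in weights]
    # Hand out the bits lost to integer rounding, ROI regions first.
    order = sorted(range(num_regions), key=lambda k: (k not in roi, k))
    for k in order[: bit_budget - sum(bits)]:
        bits[k] += 1
    return bits


print(allocate_bits(4, {1}, 700))
print(allocate_bits(4, {1}, 1000))
```

The same weighting idea could be applied one level up, with cameras instead of regions, to prioritize cameras that capture critical ROIs in RRM.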
5 KG based Semantic Communications
A KG is composed of the representations of many entities in a semantic space and the relations among them. KGs have become a powerful tool for interpretation and inference over facts [200, 201]. Many massive KGs have been constructed, including Wikidata, Google KG, WordNet, Cyc, and YAGO. They have formed a foundation for the Internet and a knowledge base for understanding how the world works. In particular, large-scale KGs are used by search engines such as Google, chatbot services such as Apple’s Siri, and social networks such as Facebook. In this section, we introduce a paradigm of SemCom featuring the use of KGs as a tool to improve communication efficiency and effectiveness. In this context, the key function of a KG is to provide a semantic representation of information such that semantic encoding is not only efficient but also robust against communication errors. For H2H communication in the presence of errors, a KG based decoder can correct the errors by decoding the received erroneous message as the correct one with the largest similarity on the graph [116, 117]. For H2M symbiosis, a KG can function as a set of human behavior rules to exclude unreasonable results due to faulty sensing results [207, 208, 209]. Furthermore, for M2M SemCom, KGs are useful in knowledge sharing between different types of machines, thereby serving as machine interfaces in heterogeneous networks. In the remainder of the section, we will provide a preliminary on KG theory and then discuss KG based SemCom techniques, applications, and architectures.
5.1 Preliminary on KG Theory
KG refers to the broad area of graph representation of knowledge without a unified definition [119, 210]. For concrete discussion, we consider a definition where nodes are nouns related to real-world objects/names/concepts and edges specify their relations. One example is illustrated in Fig. 11. A fact, a basic element of knowledge, can be represented by a so-called factual triple (head node, relation, tail node) or mathematically $(h, r, t)$, e.g., (Albert Einstein, Graduated From, University of Zurich). A node (e.g., $h$ or $t$) is a vector, say a $d$-dimensional vector, storing relevant information, creating a $d$-dimensional semantic space. For instance, if Einstein is the head node $h$, the tail node $t$ can be “Theory of Relativity”, “The Nobel Prize”, “University of Zurich”, “Hans Einstein” (his son), and so on. The relation $r$ is either a vector if the mapping is distance based, i.e., $h + r \approx t$, or a matrix (re-denoted as $M_r$) if the mapping is semantic-similarity based, i.e., $h^{\top} M_r \approx t^{\top}$. The knowledge relevant to an object, such as a human, is potentially infinite. A KG reduces the infinite knowledge to a finite-dimensional semantic space to enable practical knowledge processing and transmission.
One potential issue that can arise from the finite dimensionality of a KG is that a fact involving two nodes, say a head node $h$ and a tail node $t$, is still plausible even if it is not captured by the graph due to a missing edge/relation connecting the nodes. The issue can be addressed by introducing a scoring function measuring plausibility. Two typical functions, namely a distance-based function and a semantic-similarity-based function, are given as follows [211, 210]
where and are two relation matrices (edges) of the KG.
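For readers who want a concrete handle on the two families of scoring functions, the sketch below gives one common instantiation of each: a translation-based distance score (in the style of TransE, where a head embedding plus a relation vector should land near the tail embedding) and a bilinear similarity score (head embedding, relation matrix, tail embedding). These are well-known choices in the KG embedding literature, used here as assumptions; other variants exist, for instance with two projection matrices per relation. Higher scores mean more plausible facts in both functions. The embeddings in the usage example are toy values.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def distance_score(h: Vector, r: Vector, t: Vector) -> float:
    """Translation-based score: negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def similarity_score(h: Vector, m_r: Matrix, t: Vector) -> float:
    """Bilinear score: h^T M_r t for a relation matrix M_r."""
    return sum(
        h[i] * m_r[i][j] * t[j]
        for i in range(len(h))
        for j in range(len(t))
    )


# Toy 2-dimensional embeddings (hypothetical values).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_valid = [1.0, 1.0]      # exactly h + r, so the fact is maximally plausible
t_invalid = [-1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(distance_score(h, r, t_valid), distance_score(h, r, t_invalid))
print(similarity_score(h, identity, t_valid), similarity_score(h, identity, t_invalid))
```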
Provisioned with sets of valid and invalid facts, a KG can be constructed using either the rule-based or the data-driven approach. Either approach requires the definition of a suitable loss function. One typical choice is the margin-based function given as 
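A margin-based loss of this kind can be sketched as a pairwise hinge loss, which is a standard form in KG embedding training. The sketch assumes that each valid fact is paired with one invalid fact and that a higher score means a more plausible fact; the function name, the pairing, and the default margin are illustrative assumptions.

```python
from typing import Sequence


def margin_loss(
    valid_scores: Sequence[float],
    invalid_scores: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Pairwise hinge loss over (valid, invalid) fact pairs.

    A pair contributes zero once the valid fact outscores the invalid
    one by at least `margin`; otherwise it contributes the shortfall.
    """
    return sum(
        max(0.0, margin - valid + invalid)
        for valid, invalid in zip(valid_scores, invalid_scores)
    )


# First pair is already separated by more than the margin; second is not.
print(margin_loss([2.0, 0.5], [0.0, 0.25]))
```

Minimizing such a loss over the embeddings pushes valid facts to score above invalid ones, which is what makes the plausibility score usable for inferring missing edges.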
KGs are useful for training AI models, especially those with semantic requirements such as linguistic applications and those involving human-machine interaction. The structured knowledge in a KG reduces the search complexity in training and helps improve the accuracy of a trained model. Success has been demonstrated in the areas of question answering [108, 109], virtual assistants [110, 111], dialogue, and recommendation systems [113, 114, 115]. Some KG based techniques and their use in SemCom are discussed in the sequel.
5.2 KG based H2H SemCom
For H2H SemCom, a KG representing knowledge on the background of the parties or the domain of their conversation can be injected into a semantic encoder to boost SemCom efficiency and robustness [116, 117, 118]. As a concrete example, we discuss the use of one such design to encode a simple sentence “Albert Einstein won the Nobel Prize for physics in 1921”. The most important component of the KG assisted encoder is a knowledge encoder. Consider input tokens representing individual words in the input sentence/message. Define an entity embedding as a node of the KG to which some tokens of the input sentence can be mapped. For instance, the words/tokens “Albert” and “Einstein” can be mapped to the node “Albert Einstein” of the KG in Fig. 11 and hence share the same entity embedding. Similarly, the words “Nobel”, “Prize”, and “physics” are mapped to corresponding nodes of the KG. The distinctive feature of a knowledge encoder is to fuse the tokens of the original message with their entity embeddings to generate output tokens as well as their embeddings targeting a specific task. The output tokens carry not only the information of the input tokens but also that of others mapping to the same entities. For example, an input token “Albert” would generate the output “Albert” and “Einstein”. The encoder is made of stacked aggregators, each further consisting of two multi-head self-attention modules. Note that each such module is designed to concatenate multiple self-attention modules, each of which relates different positions of the single input sequence to compute a representation of the sequence. The use of a knowledge encoder at a semantic receiver can exploit a KG to correct inaccuracies in the semantic meaning and fill in some missing tokens of a received message caused by channel errors during the transmission.
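The token-to-entity mapping and the receiver-side filling of missing tokens can be illustrated with a deliberately simplified sketch. It replaces the attention-based knowledge encoder with a plain lookup table, so it only conveys the idea that tokens sharing an entity can restore one another. The alias table, function names, and the `<unk>` placeholder are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: KG entity -> ordered tokens that mention it.
ENTITY_TOKENS: Dict[str, Tuple[str, ...]] = {
    "Albert Einstein": ("Albert", "Einstein"),
    "Nobel Prize": ("Nobel", "Prize"),
}


def link_tokens(tokens: List[str]) -> List[Optional[str]]:
    """Map each token to the KG entity it belongs to, or None if unlinked."""
    token_to_entity = {
        tok: entity for entity, toks in ENTITY_TOKENS.items() for tok in toks
    }
    return [token_to_entity.get(tok) for tok in tokens]


def fill_missing(tokens: List[Optional[str]]) -> List[str]:
    """Fill erased tokens (None) when a neighboring token pins down the entity."""
    filled: List[Optional[str]] = list(tokens)
    for i in range(len(filled)):
        if filled[i] is not None:
            continue
        for ent_tokens in ENTITY_TOKENS.values():
            for j, candidate in enumerate(ent_tokens):
                # The left neighbor is the entity token preceding the candidate.
                left_ok = j > 0 and i > 0 and filled[i - 1] == ent_tokens[j - 1]
                # The right neighbor is the entity token following the candidate.
                right_ok = (
                    j + 1 < len(ent_tokens)
                    and i + 1 < len(filled)
                    and filled[i + 1] == ent_tokens[j + 1]
                )
                if left_ok or right_ok:
                    filled[i] = candidate
        if filled[i] is None:
            filled[i] = "<unk>"  # nothing in the KG explains this gap
    return [tok for tok in filled if tok is not None]


received = ["Albert", None, "won", "the", None, "Prize"]
print(link_tokens(["Albert", "Einstein", "won"]))
print(fill_missing(received))
```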
5.3 KG based H2M SemCom
For H2M SemCom, a KG helps a machine to understand the current context and the semantic information embedded in the messages received from human beings, and to react intelligently [122, 123]. To be more specific, training a model underpinning the machine using structured knowledge gives it the ability to recognize the entities embedded in the received messages and their relations to other entities, which helps the generation of logical reactions. Such an approach has found applications in question answering, dialogue, and recommendation systems [115, 112]. In the area of robotics, imitation learning has been designed based on a knowledge-driven approach where robotic assistants imitate humans by inferring semantic meanings in the observed human actions [124, 125]. Furthermore, in human sensing applications, KGs can be used to define a set of human behavior rules.
For concrete discussion, the remainder of the sub-section focuses on the use of KGs in utterance generation, a basic topic in conversational AI assistants. The task is to generate relevant utterances (sentences or phrases) from a knowledge base. In this area, a long short-term memory (LSTM) network is widely used. It refers to a specific RNN that keeps long-term memory for important and consistent information and short-term memory for unimportant information. The effectiveness of LSTM networks has been proven in enabling a machine to generate utterances based on the received message, the knowledge base, as well as its dialogue history with the human partner [108, 112]. On the other hand, feeding the DBpedia KG corresponding to the Wikipedia database into an LSTM network makes it capable of interpreting questions and producing reasonable answers. There exist other designs. In one of them, a KG is provided to a CNN to extract semantic features of an input question for subsequent answer searching. In another, a knowledge-driven multi-modal dialogue system is capable of gesture recognition, image/video recognition, and speech recognition, providing multi-modal human-like abilities to virtual assistants. Furthermore, KGs can also render the operations of recommendation systems more explainable [113, 114, 115].
5.4 KG and M2M SemCom
KGs are related to M2M SemCom in several ways. First, KGs can provide a platform for implementing large-scale IoT networks such as smart cities, logistics networks, and vehicular networks. Consider a vehicular network as an example. A large-scale dynamic KG can be constructed and periodically updated to represent the states of connected vehicles (e.g., locations, velocities, acceleration, routes, and destinations) and their relationships (e.g., chances of collision and whether they form platoons). Such a KG lays a foundation for facilitating vehicle-to-vehicle communication (e.g., exchange of state information) to avoid accidents or facilitate platooning, and serves as a platform for traffic management and for operating ride-sharing or car-hailing services. Second, KGs provide a tool for managing SemCom or other types of networks to facilitate resource allocation, work-flow recommendation, and service selection [132, 133, 134]. The machine intelligence needed for efficient network management can be powered by structured knowledge embedded in a network KG. Such a KG can be constructed to contain the network topology, requirements of different applications, expert knowledge from community data, product documents, engineer experience reports, user feedback, etc. Third, an M2M SemCom system can be deployed to support the extension and updating of a KG. In particular, SemCom between a large number of edge devices enables distributed knowledge extraction, storage, and fusion [126, 127]. To this end, each device obtains up-to-date local knowledge via interaction with its environment and uploads the real-time knowledge to servers for fusion and updating of the global KG [128, 129, 130, 131]. The efficient knowledge transmission can rely on some of the efficient SemCom techniques discussed in preceding sections (e.g., importance-aware transmission).
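The dynamic vehicular KG mentioned above can be pictured as a set of factual triples that is rebuilt whenever new state reports arrive. The sketch below assumes vehicles on a single lane described by a position and a speed; the relation names ("follows", "collision_risk_with"), the gap threshold, and the risk rule are illustrative assumptions rather than part of any cited design.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]                 # (head, relation, tail)
VehicleStates = Dict[str, Dict[str, float]]   # vehicle ID -> state fields


def build_vehicle_kg(states: VehicleStates, min_gap_m: float = 10.0) -> Set[Triple]:
    """Derive relationship triples between neighboring vehicles on one lane."""
    triples: Set[Triple] = set()
    ordered = sorted(states, key=lambda v: states[v]["position_m"])
    for follower, leader in zip(ordered, ordered[1:]):
        gap = states[leader]["position_m"] - states[follower]["position_m"]
        closing = states[follower]["speed_mps"] > states[leader]["speed_mps"]
        triples.add((follower, "follows", leader))
        if gap < min_gap_m and closing:
            # Small gap and the follower is faster: flag a collision risk.
            triples.add((follower, "collision_risk_with", leader))
    return triples


states = {
    "car_a": {"position_m": 0.0, "speed_mps": 30.0},
    "car_b": {"position_m": 8.0, "speed_mps": 25.0},
    "car_c": {"position_m": 60.0, "speed_mps": 25.0},
}
kg = build_vehicle_kg(states)
print(sorted(kg))
```

Calling `build_vehicle_kg` again with fresh state reports yields the periodically updated graph; only the triples that changed would need to be communicated.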
Last, KGs can provide a tool for enabling inter-operability, which is necessary for M2M SemCom in cross-domain applications where the knowledge and information of devices of heterogeneous types have to be shared or aggregated [135, 136, 137]. One particular architecture for such a purpose uses a server as a semantic core to exchange the messages sent by heterogeneous devices by serving as both a relay and a semantic encoder that translates a message from one machine language into another.
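The translation role of such a semantic core can be sketched by routing each message field through a shared KG concept. The device types, field names, and concept labels below are hypothetical and serve only to illustrate the relay-and-translate idea.

```python
from typing import Any, Dict

# Hypothetical vocabularies: device type -> {local field name: shared KG concept}.
VOCABULARIES: Dict[str, Dict[str, str]] = {
    "thermostat": {"temp_c": "Temperature", "hum": "Humidity"},
    "hvac": {"T": "Temperature", "RH": "Humidity"},
}


def translate(message: Dict[str, Any], source: str, target: str) -> Dict[str, Any]:
    """Translate a message between device vocabularies via shared concepts."""
    to_concept = VOCABULARIES[source]
    from_concept = {concept: field for field, concept in VOCABULARIES[target].items()}
    return {
        from_concept[to_concept[field]]: value
        for field, value in message.items()
        # Fields without a shared concept on both sides are dropped.
        if field in to_concept and to_concept[field] in from_concept
    }


print(translate({"temp_c": 21.5, "hum": 40}, "thermostat", "hvac"))
```

With a shared concept layer, adding a new device type requires one new vocabulary rather than a pairwise translator for every existing type.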
5.5 KG based SemCom Architectures
First, consider the SplitNet architecture discussed in Section 2.2.2. The use of KGs in training AE and auto-decoders has been demonstrated to improve their capabilities to decode the correct semantic meanings from the received messages despite their distortion by communication channels. A related but different approach combines source information and its corresponding representation in a KG as inputs to the AE, which is shown to enhance the SemCom performance. Next, consider an architecture featuring KG server assisted SemCom. The server, located at the network edge, relies on a KG to interpret the semantic meaning of the messages sent by a source device, efficiently encode/translate the messages, and then relay the results to the destination device. As a comprehensive KG can have an enormous size, its storage and inference complexity far exceed the capacities of devices. Offloading the KG to a server overcomes the limitations of devices and allows them to exploit the KG for reducing the SemCom overhead.
6 Towards 6G Semantic Communication
While the 5th generation of mobile networks is being rolled out around the world, the global research on 6G has started at an accelerating pace so that the technologies can be ready for commercialization in 2030 [216, 217, 218]. Compared with preceding generations, 6G will achieve limitless connectivity that will scale up IoT to become Internet-of-Everything (IoE) and revolutionize networks by connecting human beings to intelligent machines in such a synergistic way as to create a cyber-physical world. Realizing the vision will require ubiquitous space-air-ground-sea coverage, very low latency, extremely broad terahertz bands, and an AI-native network architecture. Furthermore, it calls for seamless integration of communications, sensing, control, and computing. As a result, SemCom has the potential to play a pivotal role in 6G. The realization of SemCom will power several new types of 6G services, aiming at creating truly immersive experiences for humans, such as extended reality (XR), high-fidelity holographic communications, and all-sense communications. In the sequel, we will discuss the 6G services, their requirements for SemCom, and how they can be met by the development of 6G core technologies.
6G will feature a broad set of exciting new services and applications that extend human senses in a fusion of the virtual and physical worlds. They include ubiquitous wireless intelligence, data teleportation, immersive XR, digital replication, holographic communications, telepresence, wearable networks, and sustainable cities. Several of them that are closely related to SemCom are described as follows, along with the new challenges they pose to SemCom.
Immersive XR: XR is an umbrella term encompassing VR, AR, mixed reality (MR), and the intersections between them. Boundless XR technologies will be integrated with networking, cloud/edge computing, and AI to offer truly immersive experiences for humans, applicable in a wide range of areas such as industrial production, entertainment, education, and healthcare. Its implementation requires the collection and processing of data that reflect or describe human movements and surroundings to generate key features that guide system operations, e.g., shifting rendered targets and displaying particular videos. Smooth human experience relies on intentions and preferences being interpreted properly by devices and machines, so that they can produce and display desired contents. The continuous human-machine interaction places XR in the domain of H2M SemCom discussed in Section 3. Relevant designs are similar to SemCom for VR/AR discussed therein, but their requirements are more stringent in terms of accuracy and diversity of sensing human characteristics (e.g., head movement, arm swing, gestures, speech), data rates (e.g., 1 Gbps for 16K VR [53, 6]), and latency (e.g., motion-to-photon delay below 15-30 ms). Moreover, the increased reliance of immersive XR on AI calls for SemCom designs that are capable of more efficient support of training and inference using large-scale AI models (i.e., scaling up SplitNet).
High-fidelity Holographic Communication: Holographic communication involves transmission of 3D holograms of human beings or physical objects. Based on high-resolution rendering, wearable displays, and AI, mobile devices will be able to render 3D holograms that create a realistic local presence of a remote human being, machine, or physical object. Scenarios such as remote repair, remote surgery, and remote education can all benefit from this new form of communication. This new form of SemCom aims at enhancing the visual perception of users to improve the effectiveness of virtual interaction. This requires high-resolution encoding of haptic information, colors, positions, and tilts of a human/object. Displaying interactive high-fidelity holograms requires extremely high data rates (up to Tbps) and stringent latency constraints (possibly sub-milliseconds). Such requirements make it crucial to boost the efficiency and speed of semantic/effectiveness encoding and transmission techniques to unprecedented levels. Moreover, since holographic communication can potentially involve both human users and machines, their real-time holographic interaction will require seamless integration of H2M and M2M SemCom techniques.
All-Sense Communication: All five senses, namely sight, hearing, touch, smell, and taste, will be included in 6G communications using an ensemble of sensors that are wearable or mounted on each device. Combined with holographic communication, the all-sense information will be efficiently integrated to realize close-to-real feelings of remote environments [219, 216]. Such technologies will facilitate tactile communications and haptic control. In all-sense communication, the diversified types of sensing signals create different new dimensions of information, resulting in exponential growth of the complexity of semantic information representation.
The aforementioned future services present formidable tasks for developing next-generation SemCom technologies. On the other hand, breakthroughs in the area are made possible by leveraging the revolution of 6G technologies. Some key aspects are described as follows.
Almost Limitless Connectivity: While 5G realizes ubiquitous connectivity, 6G will strive to achieve almost limitless connectivity. Specifically, in the 6G era, we expect to experience enormously high bit rates of up to Tbps, low end-to-end latency of less than 100 microseconds or high reliability with properly relaxed latency (e.g. with ms in new radio vehicle), high spectral efficiency of about bps/Hz, massive connections reaching at least , and ultra-wide and multi-frequency bands of up to THz with air, space, earth, and sea coverage [222, 220]. As a result, all machines and human beings will not only be connected but will be so in such a profound, instantaneous way as to enable in-depth knowledge sharing and interaction, large-scale collaboration, and extensive mutual care. Naturally, the advanced forms of SemCom techniques discussed in this article (e.g., for human-machine symbiosis and dialogues, human sensing and care, learning, inference, etc.) will benefit from almost-limitless connectivity and at the same time bring to it unprecedented end-to-end performance.
Comprehensive AI: AI has been established as a tool for solving problems originally intractable due to either prohibitive complexity or the lack of models and algorithms. 6G systems are being designed to be comprehensive AI systems where AI will be extensively used for optimizing the overall system performance and network operations [6, 223]. At the physical layer, AI provides a data-driven approach for optimizing modulation and channel coding. At the system level, AI models can automate the collaboration between devices and base stations. It is even possible to apply large-scale AI to optimize the end-to-end performance of a network by enabling, for example, network self-recovery and self-organization. A comprehensive AI system, which comprises a large number of wirelessly connected nodes/entities, intertwines machine learning, inference, and SemCom. For efficient implementation of such systems, it is essential to have a rich library of advanced SemCom techniques from which highly efficient effectiveness coding and transmission techniques can be retrieved and used to support any of a wide range of specific optimization tasks and network/system configurations with heterogeneous models and complexity. On the other hand, leveraging the omnipresence of AI, more complex and intelligent SemCom operations can be realized to improve semantic and effectiveness encoding, thereby deepening the level of H2H and H2M conversations and narrowing the quality gap between machine and human assistance and care.
Integrated Communication, Sensing, Control, and Computing: The realization of 6G applications (such as immersive XR and mobile holograms mentioned earlier) requires resolving the conflict between the required extensive computation capabilities and their reliance on many specialized low-cost, low-power edge devices. One mainstream approach is to jointly design communication, sensing, control, and computing so as to improve the overall system performance under the devices’ constraints. Another relevant approach is to split computing-intensive tasks and offload parts from devices to edge servers, which provide an edge computing platform, for execution (which is aligned with the SplitNet approach discussed in this article). These approaches reflect the main theme of 6G innovation, namely the tight integration of different aspects of data processing and transportation. The deep application and semantic awareness required of future wireless techniques will likely place SemCom at the center stage of 6G development.
There is no doubt that SemCom will continue its growth, potentially becoming a primary area for technology innovation and breakthroughs in the 6G era. Coupling advanced SemCom and 6G technologies paves the way towards the disappearance of the boundary between the physical and virtual worlds.
-  C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, 1948.
-  B. Ahlgren, C. Dannewitz, C. Imbrenda, D. Kutscher, and B. Ohlman, “A survey of information-centric networking,” IEEE Commun. Mag., vol. 50, no. 7, pp. 26–36, 2012.
-  3GPP, “5G System: Technical realization of service based architecture,” TS 29.500., [Online] https://www.3gpp.org/ftp/Specs/archive/29_series/29.500/, 2021.
-  G. Brown, “Serviced-based architecture for 5G core network,” White Paper, Huawei Technology Co. Ltd., [Online] https://www.3gpp.org/ftp/Specs/archive/29_series/29.500/, 2017.
-  E. C. Strinati and S. Barbarossa, “6G networks: Beyond Shannon towards semantic and goal-oriented communications,” [Online] https://arxiv.org/pdf/2011.14844.pdf/, 2021.
-  Samsung, “The next hyper connected experience for all,” [Online] https://cdn.codeground.org/nsr/downloads/researchareas/20201201_6G_Vision_web.pdf, 2020.
-  C. Wang, M. Renzo, S. Stanczak, S. Wang, and E. G. Larsson, “Artificial intelligence enabled wireless networking for 5G and beyond: Recent advances and future challenges,” IEEE Wireless Commun., vol. 21, no. 1, pp. 16–23, 2020.
-  J. Bao, P. Basu, M. Dean, C. Partridge, A. Swami, W. Leland, and J. A. Hendler, “Towards a theory of semantic communication,” in Proc. IEEE Netw. Sci. Workshop, West Point, NY, Jun 22-24, 2011.
-  O. Goldreich, B. Juba, and M. Sudan, “A theory of goal-oriented communication,” J. ACM (JACM), vol. 59, no. 2, pp. 1–65, 2012.
-  G. Shi, Y. Xiao, Y. Li, and X. Xie, “From semantic communication to semantic-aware networking: Model, architecture, and open problems,” [Online] http://arxiv.org/pdf/2012.15405.pdf, 2020.
-  W. Weaver and C. Shannon, “Recent contributions to the mathematical theory of communication,” Mathematical Theory Commun., University of Illinois Press, 1949.
-  New York Times, “New navy device learns by doing: Psychologist shows embryo of computer designed to read and grow wiser,” [Online] http://www.nytimes.com/1958/07/08/archives/new-navy-device-learns-by-doing-psychologist-shows-embryo-of.html, Jul. 1958.
-  Leader, “Chips with everything: How the world will change as computers spread into everyday objects,” The Economist, vol. 2, Sep. 2019.
-  M. Stoyanova, Y. Nikoloudakis, S. Panagiotakis, E. Pallis, and E. K. Markakis, “A survey on the internet of things (IoT) forensics: Challenges, approaches, and open issues,” IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 1191–1221, 2020.
-  J. Simpson, “The past, present and future of big data in marketing,” Forbes, [Online] https://www.forbes.com/sites/forbesagencycouncil/2020/01/17/the-past-present-and-future-of-big-data-in-marketing/?sh=42a277145da9, Jan. 2020.
-  P. Popovski, O. Simeone, F. Boccardi, D. Gündüz, and O. Sahin, “Semantic-effectiveness filtering and control for post-5G wireless connectivity,” J. Indian Inst. Sci., vol. 100, no. 2, pp. 435–443, 2020.
-  M. Kountouris and N. Pappas, “Semantics-empowered communication for networked intelligent systems,” [Online] https://arxiv.org/abs/2007.11579, 2021.
-  E. Uysal, O. Kaya, A. Ephremides, J. Gross, M. Codreanu, P. Popovski, M. Assaad, G. Liva, A. Munari, T. Soleymani, B. Soret, and K. H. Johansson, “Semantic communications in networked systems,” [Online] https://arxiv.org/ftp/arxiv/papers/2103/2103.05391.pdf, 2021.
-  A. Xu, Z. Liu, Y. Guo, V. Sinha, and R. Akkiraju, “A new chatbot for customer service on social media,” in Proc. CHI Conf. Hum. Factors Comput. Syst., Denver, CO, USA, May 6-11, 2017.
-  B. R. Ranoliya, N. Raghuwanshi, and S. Singh, “Chatbot for university related FAQs,” in Proc. Int. Conf. Adv. Comput. Commun. Inform. (ICACCI), Sep 13-16, 2017, pp. 1525–1530.
-  T. Lysaght, H. Y. Lim, V. Xafis, and K. Y. Ngiam, “AI-assisted decision-making in healthcare,” Asian Bioeth. Rev., vol. 11, no. 3, pp. 299–314, 2019.
-  T. Machado, D. Gopstein, A. Nealen, O. Nov, and J. Togelius, “AI-assisted game debugging with Cicero,” in Proc. IEEE Congr. Evol. Comput. (CEC), Rio de Janeiro, Brazil, July 8-13, 2018.
-  A. Svyatkovskiy, Y. Zhao, S. Fu, and N. Sundaresan, “Pythia: AI-assisted code completion system,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., Anchorage AK USA, Aug 4-8, 2019.
-  A. Ziebinski, R. Cupek, D. Grzechca, and L. Chruszczyk, “Review of advanced driver assistance systems (ADAS),” in Proc. Int. Conf. Comput. Methods Sci. Eng. (ICCMSE), Thessaloniki, Greece, Apr. 21-25, 2017.
-  J. Kannan and P. Munday, “New trends in second language learning and teaching through the lens of ICT, networked learning, and artificial intelligence,” Círculo de Lingüística Aplicada a la Comunicación, vol. 76, pp. 13–30, 2018.
-  A. Holzinger, “Human-computer interaction and knowledge discovery (HCI-KDD): What is the benefit of bringing those two fields to work together?” in Proc. Int. Conf. Availability, Rel. Secur. (ARES), Regensburg, Germany, Sep 2-6, 2013.
-  A. Holzinger, “Interactive machine learning for health informatics: When do we need the human-in-the-loop?” Springer Brain Informat., vol. 3, no. 2, pp. 119–131, 2016.
-  S. Berg, D. Kutra, T. Kroeger, C. N. Straehle, B. X. Kausler, C. Haubold, M. Schiegg, J. Ales, T. Beier, M. Rudy et al., “Ilastik: Interactive machine learning for (bio)image analysis,” Nature Methods, vol. 16, no. 12, pp. 1226–1232, 2019.
-  J. Schneider, “Human-to-AI coach: Improving human inputs to AI systems,” in Proc. Int. Symp. Intell. Data Analysis (ISIDA), Bodenseeforum, Lake Constance, Germany, Apr. 27-29, 2020.
-  M. M. De Graaf and S. B. Allouch, “Exploring influencing variables for the acceptance of social robots,” Robot. Auton. Syst., vol. 61, no. 12, pp. 1476–1486, 2013.
-  A. Sriraman, J. Bragg, and A. Kulkarni, “Worker-owned cooperative models for training artificial intelligence,” in Proc. ACM Conf. Comput. Support. Coop. Work Soc. Comput. (CSCW), Portland, Oregon, USA, Feb 25 – Mar 01, 2017.
-  Amazon, “Amazon Mechanical Turk,” [Online] https://www.mturk.com.
-  A. M. Okamura, “Haptic feedback in robot-assisted minimally invasive surgery,” Curr. Opin. Urol., vol. 19, no. 1, p. 102, 2009.
-  N. Sornkarn and T. Nanayakkara, “Can a soft robotic probe use stiffness control like a human finger to improve efficacy of haptic perception?” IEEE Trans. Haptics, vol. 10, no. 2, pp. 183–195, 2016.
-  T. Chakraborti and S. Kambhampati, “(When) Can AI bots lie?” in Proc. AAAI/ACM Conf. AI Ethics Soc., Honolulu, HI, USA, Jan. 27 - 28, 2019.
-  R. L. Rosa, G. M. Schwartz, W. V. Ruggiero, and D. Z. Rodríguez, “A knowledge-based recommendation system that includes sentiment analysis and deep learning,” IEEE Trans. Ind. Inform., vol. 15, no. 4, pp. 2124–2135, 2018.
-  D. Gavalas and M. Kenteris, “A web-based pervasive recommendation system for mobile tourist guides,” Pers. Ubiquitous Comput., vol. 15, no. 7, pp. 759–770, 2011.
-  P. Xia, B. Liu, Y. Sun, and C. Chen, “Reciprocal recommendation system for online dating,” in IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Paris, France, Aug. 25 - 28, 2015.
-  P. Zhao, J. Jia, Y. An, J. Liang, L. Xie, and J. Luo, “Analyzing and predicting emoji usages in social media,” in Companion Proc. Web Conf., Lyon, France, Apr. 23 - 27, 2018.
-  L. Chin-Feng, C. Jui-Hung, H. Chia-Cheng, H. Yueh-Min, and C. Han-Chieh, “CPRS: A cloud-based program recommendation system for digital TV platforms,” in Int. Conf. Grid Pervasive Comput., Hualien, Taiwan, May 10-14, 2010.
-  J. Davidson, B. Liebald, J. Liu, P. Nandy, T. Van Vleet, U. Gargi, S. Gupta, Y. He, M. Lambert, B. Livingston et al., “The YouTube video recommendation system,” in Proc. ACM Conf. Recommender Syst., Barcelona, Spain, Sep. 26-30, 2010.
-  Y. Zhang, D. Zhang, M. M. Hassan, A. Alamri, and L. Peng, “CADRE: Cloud-assisted drug recommendation service for online pharmacies,” Mobile Netw. Appl., vol. 20, no. 3, pp. 348–355, 2015.
-  S.-L. Wang, Y. L. Chen, A. M.-H. Kuo, H.-M. Chen, and Y. S. Shiu, “Design and evaluation of a cloud-based mobile health information recommendation system on wireless sensor networks,” Comput. Elect. Eng., vol. 49, pp. 221–235, 2016.
-  N. K. Suryadevara and S. C. Mukhopadhyay, “Wireless sensor network based home monitoring system for wellness determination of elderly,” IEEE Sensors J., vol. 12, no. 6, pp. 1965–1972, 2012.
-  C.-C. Lin, M.-J. Chiu, C.-C. Hsiao, R.-G. Lee, and Y.-S. Tsai, “Wireless health care service system for elderly with dementia,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 4, pp. 696–704, 2006.
-  S. D. T. Kelly, N. K. Suryadevara, and S. C. Mukhopadhyay, “Towards the implementation of IoT for environmental condition monitoring in homes,” IEEE Sensors J., vol. 13, no. 10, pp. 3846–3853, 2013.
-  J. Windau and L. Itti, “Situation awareness via sensor-equipped eyeglasses,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Tokyo, Japan, Nov. 3-7, 2013.
-  N. Golestani and M. Moghaddam, “Human activity recognition using magnetic induction-based motion signals and deep recurrent neural networks,” Nature Commun., vol. 11, no. 1, pp. 1–11, 2020.
-  K. J. Liszka, M. A. Mackin, M. J. Lichter, D. W. York, D. Pillai, and D. S. Rosenbaum, “Keeping a beat on the heart,” IEEE Pervasive Comput., vol. 3, no. 4, pp. 42–49, 2004.
-  S. Vishwanath, K. Vaidya, R. Nawal, A. Kumar, S. Parthasarathy, and S. Verma, “Touching lives through mobile health: Assessment of the global market opportunity,” GSMA, Tech. Rep., Feb. 2012.
-  Deloitte, “Digital health in the UK: An industry study for the Office of Life Sciences,” Office for Life Sciences, Tech. Rep., Sep. 2015.
-  J. Ren, Y. He, G. Huang, G. Yu, Y. Cai, and Z. Zhang, “An edge-computing based architecture for mobile augmented reality,” IEEE Netw., vol. 33, no. 4, pp. 162–169, 2019.
-  M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Toward low-latency and ultra-reliable virtual reality,” IEEE Netw., vol. 32, no. 2, pp. 78–84, 2018.
-  M. Chen, W. Saad, and C. Yin, “Virtual reality over wireless networks: Quality-of-service model and learning-based resource management,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5621–5635, 2018.
-  X. Yu, Z. Xie, Y. Yu, J. Lee, A. Vazquez Guardado, H. Luan, J. Ruban, X. Ning, A. Akhtar, D. Li et al., “Skin-integrated wireless haptic interfaces for virtual and augmented reality,” Nature, vol. 575, no. 7783, pp. 473–479, 2019.
-  C. Qvarfordt, H. Lundqvist, and G. P. Koudouridis, “High quality mobile XR: Requirements and feasibility,” in Proc. IEEE Int. Workshop Comput. Aided Model. Design Commun. Links Netw. (CAMAD), Barcelona, Spain, Sep. 17-19, 2018.
-  D. Chatzopoulos, C. Bermejo, Z. Huang, and P. Hui, “Mobile augmented reality survey: From where we are to where we go,” IEEE Access, vol. 5, pp. 6917–6950, 2017.
-  J. M. Davila Delgado, L. Oyedele, P. Demian, and T. Beach, “A research agenda for augmented and virtual reality in architecture, engineering and construction,” Adv. Eng. Informat., vol. 45, p. 101122, 2020.
-  S. T. Dumais, “Latent semantic analysis,” Annu. Rev. Inf. Sci. Technol., vol. 38, no. 1, pp. 188–230, 2004.
-  R. Wang, Y. Liu, P. Zhang, X. Li, and X. Kang, “Edge and cloud collaborative entity recommendation method towards the IoT search,” Sensors, vol. 20, no. 7, p. 1918, 2020.
-  F. Tang, Z. M. Fadlullah, B. Mao, N. Kato, F. Ono, and R. Miura, “On a novel adaptive UAV-mounted cloudlet-aided recommendation system for LBSNs,” IEEE Trans. Emerg. Topics Comput., vol. 7, no. 4, pp. 565–577, 2018.
-  J. Corchado, “A distributed recommendation system ASSOS,” in Proc. IEEE Colloq. Knowl. Discovery, London, UK, 1995.
-  F. Armknecht and T. Strufe, “An efficient distributed privacy-preserving recommendation system,” in Proc. IFIP Ann. Mediterranean Ad Hoc Netw. Workshop, Favignana Island, Italy, Jun. 12-15, 2011.
-  M. Mohammadi Amiri and D. Gündüz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” IEEE Trans. Signal Process., vol. 68, pp. 2155–2169, 2020.
-  M. M. Amiri and D. Gündüz, “Federated learning over wireless fading channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3546–3557, 2020.
-  Y. Jiang, S. Wang, V. Valls, B. J. Ko, W.-H. Lee, K. K. Leung, and L. Tassiulas, “Model pruning enables efficient federated learning on edge devices,” [Online] https://arxiv.org/pdf/1909.12326.pdf, 2019.
-  N. Bouacida, J. Hou, H. Zang, and X. Liu, “Adaptive federated dropout: Improving communication efficiency and generalization for federated learning,” in Proc. IEEE Conf. Comput. Commun. Wkshps. (INFOCOM WKSHPS), Vancouver, Canada, May 10-13, 2021.
-  Z. Zhang, G. Zhu, R. Wang, V. K. N. Lau, and K. Huang, “Turning channel noise into an accelerator for over-the-air principal component analysis,” [Online] https://arxiv.org/pdf/2104.10095.pdf, 2021.
-  G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 491–506, 2020.
-  G. Zhu, J. Xu, K. Huang, and S. Cui, “Over-the-air computing for wireless data aggregation in massive IoT,” [Online] https://arxiv.org/abs/2009.02181.pdf, 2020.
-  X. Chen, A. Liu, and M.-J. Zhao, “High-mobility multi-modal sensing for IoT network via MIMO AirComp: A mixed-timescale optimization approach,” IEEE Commun. Lett., vol. 24, no. 10, pp. 2295–2299, 2020.
-  K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2022–2035, 2020.
-  D. Liu, G. Zhu, Q. Zeng, J. Zhang, and K. Huang, “Wireless data acquisition for edge learning: Data-importance aware retransmission,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 406–420, 2021.
-  J. Ren, Y. He, D. Wen, G. Yu, K. Huang, and D. Guo, “Scheduling for cellular federated edge learning with importance and channel awareness,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7690–7703, 2020.
-  N. Zhang and M. Tao, “Gradient statistics aware power control for over-the-air federated learning,” IEEE Trans. Wireless Commun., vol. 20, no. 8, pp. 5115–5128, 2021.
-  D. Liu, G. Zhu, J. Zhang, and K. Huang, “Data-importance aware user scheduling for communication-efficient edge machine learning,” IEEE Trans. Cogn. Commun. Netw., vol. 7, no. 1, pp. 265–278, 2021.
-  M. Seif, R. Tandon, and M. Li, “Wireless federated learning with local differential privacy,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Los Angeles, CA, USA, Jun. 21-26, 2020.
-  D. Liu and O. Simeone, “Privacy for free: Wireless federated learning via uncoded transmission with adaptive power control,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 170–185, 2021.
-  J. Shao and J. Zhang, “Communication-computation trade-off in resource-constrained edge inference,” IEEE Commun. Mag., vol. 58, no. 12, pp. 20–26, 2020.
-  S. R. Alvar and I. V. Bajić, “Pareto-optimal bit allocation for collaborative intelligence,” IEEE Trans. Image Process., vol. 30, pp. 3348–3361, 2021.
-  J. H. Ko, T. Na, M. F. Amir, and S. Mukhopadhyay, “Edge-host partitioning of deep neural networks with feature space encoding for resource-constrained Internet-of-Things platforms,” in Proc. IEEE Int. Conf. Adv. Video Signal Surveillance (AVSS), Auckland, New Zealand, Nov. 27-30, 2018.
-  D. Jahier Pagliari, R. Chiaro, E. Macii, and M. Poncino, “CRIME: Input-dependent collaborative inference for recurrent neural networks,” to appear in IEEE Trans. Comput., 2020.
-  J. Shao, H. Zhang, Y. Mao, and J. Zhang, “Branchy-GNN: A device-edge co-inference framework for efficient point cloud processing,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Toronto, Canada, Jun. 6-11, 2021.
-  Q. Lan, Q. Zeng, P. Popovski, D. Gündüz, and K. Huang, “Progressive feature transmission for edge inference,” The University of Hong Kong, Tech. Rep., 2021.
-  Z. Zhuang, M. Tan, B. Zhuang, J. Liu, Y. Guo, Q. Wu, J. Huang, and J. Zhu, “Discrimination-aware channel pruning for deep neural networks,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), Montréal, Canada, Dec. 2018.
-  W. Shi, Y. Hou, S. Zhou, Z. Niu, Y. Zhang, and L. Geng, “Improving device-edge cooperative inference of deep learning via 2-step pruning,” in Proc. IEEE Conf. Comput. Commun. Workshops (INFOCOM WKSHPS), Paris, France, Apr. 29 - May 2, 2019.
-  J. Park, J. Kim, and J. H. Ko, “Auto-tiler: Variable-dimension autoencoder with tiling for compressing intermediate feature space of deep neural networks for Internet of Things,” Sensors, vol. 21, no. 3, 2021.
-  J. Choi, H. J. Chang, T. Fischer, S. Yun, K. Lee, J. Jeong, Y. Demiris, and J. Y. Choi, “Context-aware deep feature compression for high-speed visual tracking,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (CVPR), Salt Lake City, UT, USA, Jun. 18-22, 2018.
-  M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Wireless image retrieval at the edge,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 89–100, 2021.
-  C.-H. Lee, J.-W. Lin, P.-H. Chen, and Y.-C. Chang, “Deep learning-constructed joint transmission-recognition for internet of things,” IEEE Access, vol. 7, pp. 76 547–76 561, 2019.
-  E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge AI: On-demand accelerating deep neural network inference via edge computing,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 447–457, 2020.
-  M. Krouka, A. Elgabli, C. B. Issaid, and M. Bennis, “Energy-efficient model compression and splitting for collaborative inference over time-varying channels,” [Online] https://arxiv.org/abs/2106.00995, 2021.
-  M. Wang, W. Daamen, S. P. Hoogendoorn, and B. van Arem, “Cooperative car-following control: Distributed algorithm and impact on moving jam features,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 5, pp. 1459–1471, 2016.
-  M. Saeednia and M. Menendez, “A consensus-based algorithm for truck platooning,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 2, pp. 404–415, 2017.
-  Z. Zhou, Z. Akhtar, K. L. Man, and K. Siddique, “A deep learning platooning-based video information-sharing internet of things framework for autonomous driving systems,” Int. J. Distrib. Sensor Netw., vol. 15, no. 11, 2019.
-  F. Xue and W. Lu, “A semantic differential transaction approach to minimizing information redundancy for BIM and blockchain integration,” Autom. Construction, vol. 118, pp. 1–13, 2020.
-  W. Li, C. Feng, L. Zhang, H. Xu, B. Cao, and M. A. Imran, “A scalable multi-layer PBFT consensus for blockchain,” IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 5, pp. 1146–1160, 2021.
-  R. Hult, G. R. Campos, E. Steinmetz, L. Hammarstrand, P. Falcone, and H. Wymeersch, “Coordination of cooperative autonomous vehicles: Toward safer and more efficient road transportation,” IEEE Signal Process. Mag., vol. 33, no. 6, pp. 74–84, 2016.
-  W. J. Yun, B. Lim, S. Jung, Y.-C. Ko, J. Park, J. Kim, and M. Bennis, “Attention-based reinforcement learning for real-time UAV semantic communication,” [Online] https://arxiv.org/abs/2105.10716, 2021.
-  Z. Jiang, S. Fu, S. Zhou, Z. Niu, S. Zhang, and S. Xu, “AI-assisted low information latency wireless networking,” IEEE Wireless Commun., vol. 27, no. 1, pp. 108–115, 2020.
-  P. Danzi, A. E. Kalor, C. Stefanovic, and P. Popovski, “Delay and communication tradeoffs for blockchain systems with lightweight IoT clients,” IEEE Internet Things J., vol. 6, no. 2, pp. 2354–2365, 2019.
-  T. Sultana and K. A. Wahid, “IoT-guard: Event-driven fog-based video surveillance system for real-time security management,” IEEE Access, vol. 7, pp. 134 881–134 894, 2019.
-  J. Ren, Y. Guo, D. Zhang, Q. Liu, and Y. Zhang, “Distributed and efficient object detection in edge computing: Challenges and solutions,” IEEE Netw., vol. 32, no. 6, pp. 137–143, 2018.
-  Z. Chen, K. Fan, S. Wang, L. Duan, W. Lin, and A. C. Kot, “Toward intelligent sensing: Intermediate deep feature compression,” IEEE Trans. Image Process., vol. 29, pp. 2230–2243, 2020.
-  L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: Efficient manufacture inspection system with fog computing,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4665–4673, 2018.
-  X. Chen, J.-N. Hwang, D. Meng, K.-H. Lee, R. L. de Queiroz, and F.-M. Yeh, “A quality-of-content-based joint source and channel coding for human detections in a mobile surveillance cloud,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 1, pp. 19–31, 2017.
-  H. Skinnemoen, “UAV satellite communications live mission-critical visual data,” in Proc. IEEE Int. Conf. Aerosp. Electron. Remote Sens. Technol., Yogyakarta, Indonesia, Nov. 13-14, 2014.
-  Q. Wu, P. Wang, C. Shen, A. Dick, and A. Van Den Hengel, “Ask me anything: Free-form visual question answering based on knowledge from external sources,” in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Jun. 2016, pp. 4622–4630.
-  S. W.-T. Yih, M.-W. Chang, X. He, and J. Gao, “Semantic parsing via staged query graph generation: Question answering with knowledge base,” in Proc. Intl. Joint Conf. on Natural Language Process. (ACL), Jul. 2015.
-  V. Këpuska and G. Bohouta, “Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home),” in Proc. IEEE Annual Comput. Commun. Workshop and Conf. (CCWC), Jan. 2018, pp. 99–103.
-  R. Socher, D. Chen, C. D. Manning, and A. Y. Ng, “Reasoning with neural tensor networks for knowledge base completion,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2013, pp. 926–934.
-  H. He, A. Balakrishnan, M. Eric, and P. Liang, “Learning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings,” in Proc. Annual Meeting Association Comput. Linguistics (Volume 1: Long Papers), Jul. 2017, pp. 1766–1776.
-  H. Wang, F. Zhang, X. Xie, and M. Guo, “DKN: Deep knowledge-aware network for news recommendation,” in Proc. World Wide Web Conf. (WWW), 2018, pp. 1835–1844.
-  H. Wang, F. Zhang, M. Zhao, W. Li, X. Xie, and M. Guo, “Multi-task feature learning for knowledge graph enhanced recommendation,” in Proc. World Wide Web Conf. (WWW), 2019, pp. 2000–2010.
-  X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T.-S. Chua, “Explainable reasoning over knowledge graphs for recommendation,” in Proc. AAAI Conf. Artificial Intelligence, Hawaii, USA, Jan. 2019.
-  Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “ERNIE: enhanced language representation with informative entities,” [Online] http://arxiv.org/pdf/1905.07129.pdf, 2019.
-  Y. Sun, S. Wang, Y. Li, S. Feng, X. Chen, H. Zhang, X. Tian, D. Zhu, H. Tian, and H. Wu, “ERNIE: enhanced representation through knowledge integration,” [Online] http://arxiv.org/pdf/1904.09223.pdf, 2019.
-  Z. Qiao, Y. Zhou, D. Yang, Y. Zhou, and W. Wang, “SEED: Semantics enhanced encoder-decoder framework for scene text recognition,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020.
-  S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu, “A survey on knowledge graphs: Representation, acquisition, and applications,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–21, 2021.
-  B. Butterfield and J. A. Mangels, “Neural correlates of error detection and correction in a semantic retrieval task,” Cogn. Brain Res., vol. 17, no. 3, pp. 793–817, 2003.
-  M. Jeong, B. Kim, and G. Lee, “Semantic-oriented error correction for spoken query processing,” in Proc. IEEE Autom. Speech Recognit. Underst. Workshop (ASRU), Virgin Islands, USA, Nov. 2003.
-  A. Vinciarelli, A. Esposito, E. André, F. Bonin, M. Chetouani, J. F. Cohn, M. Cristani, F. Fuhrmann, E. Gilmartin, Z. Hammal, D. Heylen, R. Kaiser, M. Koutsombogera, A. Potamianos, S. Renals, G. Riccardi, and A. A. Salah, “Open challenges in modelling, analysis and synthesis of human behaviour in human & human and human & machine interactions,” Cognitive Comput., vol. 7, no. 4, pp. 397–413, Aug. 2015.
-  P. Brézillon, “Context in problem solving: a survey,” The Knowl. Eng. Rev., vol. 14, no. 1, pp. 47–80, 1999.
-  Y. Li, J. Song, and S. Ermon, “InfoGAIL: Interpretable imitation learning from visual demonstrations,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Long Beach, CA, USA, Dec. 2017.
-  Y. Zhu, D. Gordon, E. Kolve, D. Fox, L. Fei-Fei, A. Gupta, R. Mottaghi, and A. Farhadi, “Visual semantic planning using deep successor representations,” in Proc. IEEE Intl. Conf. Comput. Vision (ICCV), Oct. 2017.
-  Y. Wei, J. Luo, and H. Xie, “KGRL: An OWL2 RL reasoning system for large scale knowledge graph,” in Proc. Intl. Conf. Semantics, Knowl. Grids (SKG), Aug. 2016, pp. 83–89.
-  D. Zheng, X. Song, C. Ma, Z. Tan, Z. Ye, J. Dong, H. Xiong, Z. Zhang, and G. Karypis, “DGL-KE: Training knowledge graph embeddings at scale,” in Proc. Intl. ACM SIGIR Conf. Res. Develop. Inf. Retrieval (SIGIR), 2020, pp. 739–748.
-  B. Swartout, R. Patil, K. Knight, and T. Russ, “Toward distributed use of large-scale ontologies,” Proc. Banff Knowl. Acquisition Workshop, 1996.
-  J. S. Mertoguno, “Distributed knowledge-base: Adaptive multi-agents approach,” Intl. J. AI Tools, vol. 07, no. 01, pp. 59–70, 1998.
-  H. Zhu, X. Wang, Y. Jiang, H. Fan, B. Du, and Q. Liu, “FTRLIM: Distributed instance matching framework for large-scale knowledge graph fusion,” Entropy, vol. 23, no. 5, 2021.
-  F. Zhang and D. Xue, “Distributed database and knowledge base modeling for concurrent design,” Computer-Aided Design, vol. 34, no. 1, pp. 27–40, 2002.
-  E. Aumayr, M. Wang, and A.-M. Bosneag, “Probabilistic knowledge-graph based workflow recommender for network management automation,” in Proc. IEEE Intl. Symp. “A World of Wireless, Mobile Multimedia and Network” (WoWMoM), Jun. 2019, pp. 1–7.
-  E. Niemela, J. Kalaoja, and P. Lago, “Toward an architectural knowledge base for wireless service engineering,” IEEE Trans. Softw. Eng., vol. 31, no. 5, pp. 361–379, May 2005.
-  A. Mudassir, S. Akhtar, H. Kamel, and N. Javaid, “A survey on fuzzy logic applications in wireless and mobile communication for LTE networks,” in Proc. Intl. Conf. Complex, Intell., Softw. Intensive Syst. (CISIS), Jul. 2016, pp. 76–82.
-  D. Schachinger and W. Kastner, “Semantic interface for machine-to-machine communication in building automation,” in Proc. Intl. Workshop Factory Commun. Syst. (WFCS), May 2017, pp. 1–9.
-  A. Gyrard, C. Bonnet, and K. Boudaoud, “Enrich machine-to-machine data with semantic web technologies for cross-domain applications,” in Proc. IEEE World Forum Internet of Things (WF-IoT), Mar. 2014, pp. 559–564.
-  A. Bröring, J. Echterhoff, S. Jirka, I. Simonis, T. Everding, C. Stasch, S. Liang, and R. Lemmens, “New generation sensor web enablement,” Sensors, vol. 11, no. 3, pp. 2652–2699, 2011.
-  H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
-  K. Lu, R. Li, X. Chen, Z. Zhao, and H. Zhang, “Reinforcement learning-powered semantic communication via semantic similarity,” [Online] https://arxiv.org/pdf/2108.12121.pdf, 2021.
-  N. Slonim and N. Tishby, “Agglomerative information bottleneck,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Denver, CO, USA, Nov. 29 - Dec. 4, 1999.
-  N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in IEEE Inf. Theory Workshop (ITW), vol. 1-5, Jeju Island, Korea, Oct. 2015.
-  Y. Du, S. Yang, and K. Huang, “High-dimensional stochastic gradient quantization for communication-efficient edge learning,” IEEE Trans. Signal Process., vol. 68, pp. 2128–2142, Mar. 2020.
-  H. Xie and Z. Qin, “A lite distributed semantic communication system for internet of things,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 142–153, 2021.
-  Z. Weng and Z. Qin, “Semantic communication systems for speech transmission,” IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2434–2444, 2021.
-  T. M. Cover, Elements of Information Theory. John Wiley & Sons, 1999.
-  M. Fresia, F. Pérez-Cruz, H. V. Poor, and S. Verdú, “Joint source and channel coding,” IEEE Signal Process. Mag., vol. 27, no. 6, pp. 104–113, 2010.
-  K. Choi, K. Tatwawadi, A. Grover, T. Weissman, and S. Ermon, “Neural joint source-channel coding,” in Proc. Int. Conf. Mach. Learn. (ICML), Long Beach, California, USA, June 2019.
-  I. Csiszar, “Linear codes for sources and source networks: Error exponents, universal coding,” IEEE Trans. Inf. Theory, vol. 28, no. 4, pp. 585–592, 1982.
-  V. Kostina and S. Verdú, “Lossy joint source-channel coding in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2545–2575, 2013.
-  O. Y. Bursalioglu, G. Caire, and D. Divsalar, “Joint source-channel coding for deep-space image transmission using rateless codes,” IEEE Trans. Commun., vol. 61, no. 8, pp. 3448–3461, 2013.
-  D. B. Kurka and D. Gündüz, “DeepJSCC-f: Deep joint source-channel coding of images with feedback,” IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 178–193, 2020.
-  M. Yang, C. Bian, and H.-S. Kim, “Deep joint source channel coding for wireless image transmission with OFDM,” [Online] https://arxiv.org/pdf/2101.03909.pdf, 2021.
-  E. Bourtsoulatze, D. Burth Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, 2019.
-  A. Katsaggelos, Joint Source-Channel Coding for Video Communications. Elsevier, Academic Press, 2005, pp. 1065–1082.
-  N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source-channel coding of text,” in IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Calgary, AB, Canada, Apr. 2018.
-  L. Floridi, “Is semantic information meaningful data?” Philosophy and Phenomenological Research, vol. 70, no. 2, pp. 351–370, 2005.
-  V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, Dec 2017.
-  S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” J. Am. Soc. Inf. Sci. Technol., vol. 41, no. 6, pp. 391–407, 1990.
-  E. Kodirov, T. Xiang, and S. Gong, “Semantic autoencoder for zero-shot learning,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Hawaii, USA, July 2017.
-  J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding.” [Online]. Available: http://arxiv.org/pdf/1810.04805.pdf
-  P. Jiang, C.-K. Wen, S. Jin, and G. Y. Li, “Deep source-channel coding for sentence semantic transmission with HARQ.” [Online]. Available: https://arxiv.org/pdf/2106.03009.pdf
-  A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, Long Beach, California, USA, 2017, pp. 5998–6008.
-  R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille, “The role of context for object detection and semantic segmentation in the wild,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Columbus, Ohio, USA, June 2014.
-  H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, “ICNet for real-time semantic segmentation on high-resolution images,” in Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, September 2018.
-  L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” 2017. [Online]. Available: http://arxiv.org/pdf/1706.05587.pdf
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, California, USA, 2015.
-  Z. Jiang, S. Fu, S. Zhou, Z. Niu, S. Zhang, and S. Xu, “AI-assisted low information latency wireless networking,” IEEE Wireless Communications, vol. 27, no. 1, pp. 108–115, 2020.
-  Z. Yang, B. Wu, K. Zheng, X. Wang, and L. Lei, “A survey of collaborative filtering-based recommender systems for mobile internet applications,” IEEE Access, vol. 4, pp. 3273–3287, 2016.
-  H. Wang, N. Wang, and D. Yeung, “Collaborative deep learning for recommender systems,” in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 2015, pp. 1235–1244.
-  X. Zhou, J. He, G. Huang, and Y. Zhang, “SVD-based incremental approaches for recommender systems,” Journal of Computer and System Sciences, vol. 81, no. 4, pp. 717–733, 2015.
-  B. Huang, X. Yan, and J. Lin, “Collaborative filtering recommendation algorithm based on joint nonnegative matrix factorization,” Pattern Recognition & Artificial Intelligence, vol. 29, no. 8, pp. 725–734, 2016.
-  S. Zahra, M. A. Ghazanfar, A. Khalid, M. A. Azam, U. Naeem, and A. Prugel-Bennett, “Novel centroid selection approaches for kmeans-clustering based recommender systems,” Information Sciences, vol. 320, pp. 156–189, 2015.
-  A. J. Chaney, D. M. Blei, and T. Eliassi-Rad, “A probabilistic model for using social networks in personalized item recommendation,” in Proceedings of the 9th ACM Conference on Recommender Systems, 2015, pp. 43–50.
-  I. Ryngksai and L. Chameikho, “Recommender systems: Types of filtering techniques,” International Journal of Engineering Research & Technology, Gujarat, vol. 3, no. 2278-0181, pp. 251–254, 2014.
-  D. Ayata, Y. Yaslan, and M. E. Kamasak, “Emotion based music recommendation system using wearable physiological sensors,” IEEE Trans. Consumer Electron., vol. 64, no. 2, pp. 196–203, 2018.
-  Y. Mo, J. Chen, X. Xie, C. Luo, and L. T. Yang, “Cloud-based mobile multimedia recommendation system with user behavior information,” IEEE Systems Journal, vol. 8, no. 1, pp. 184–193, 2014.
-  R. J. Oweis and B. O. Al-Tabbaa, “QRS detection and heart rate variability analysis: A survey,” Biomedical Science and Engineering, vol. 2, no. 1, pp. 13–34, 2014.
-  J. Pan and W. J. Tompkins, “A real-time QRS detection algorithm,” IEEE Transactions on Biomedical Engineering, no. 3, pp. 230–236, 1985.
-  N. Javaid, S. Faisal, Z. A. Khan, D. Nayab, and M. Zahid, “Measuring fatigue of soldiers in wireless body area sensor networks,” in 2013 Eighth International Conference on Broadband and Wireless Computing, Communication and Applications, Compiegne, France, Oct. 2013, pp. 227–231.
-  X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan, and X. Chen, “Convergence of edge computing and deep learning: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 869–904, 2020.
-  W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, “Federated learning in mobile edge networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 2031–2063, 2020.
-  I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
-  T. Marzetta and B. Hochwald, “Fast transfer of channel state information in wireless systems,” IEEE Trans. Signal Process., vol. 54, no. 4, pp. 1268–1278, 2006.
-  T. T. Nu, T. Fujihashi, and T. Watanabe, “Power-efficient video uploading for crowdsourced multi-view video streaming,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Abu Dhabi, United Arab Emirates, Dec. 9-13, 2018.
-  Y. Du and K. Huang, “Fast analog transmission for high-mobility wireless data acquisition in edge learning,” IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 468–471, 2019.
-  M. Gastpar, “Uncoded transmission is exactly optimal for a simple Gaussian ‘sensor’ network,” IEEE Trans. Inf. Theory, vol. 54, no. 11, pp. 5247–5251, Nov. 2008.
-  B. Settles, M. Craven, and S. Ray, “Multiple-instance active learning,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Vancouver, Canada, Dec. 8-11, 2008.
-  D. Wen, X. Li, Q. Zeng, J. Ren, and K. Huang, “An overview of data-importance aware radio resource management for edge machine learning,” J. Commun. Inf. Netw., vol. 4, no. 4, pp. 1–14, 2019.
-  Y. Wang, K. Lv, R. Huang, S. Song, L. Yang, and G. Huang, “Glance and focus: a dynamic approach to reducing spatial redundancy in image classification,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Vancouver, Canada, Dec. 6-12, 2020.
-  T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin, Germany: Springer Science & Business Media, 2009.
-  J. Guo, W. Ouyang, and D. Xu, “Channel pruning guided by classification loss and feature importance,” in Proc. AAAI Conf. Artificial Intell. (AAAI), New York, NY, USA, Feb. 7-12, 2020.
-  G. Saon and M. Padmanabhan, “Minimum Bayes error feature selection for continuous speech recognition,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Denver, CO, USA, Nov. 27 - Dec. 2, 2000.
-  H. Li, C. Hu, J. Jiang, Z. Wang, Y. Wen, and W. Zhu, “JALAD: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,” in Proc. IEEE Int. Conf. Parallel Distrib. Syst. (ICPADS), Singapore, Dec. 11-13, 2018.
-  B. Yang, X. Cao, K. Xiong, C. Yuen, Y. L. Guan, S. Leng, L. Qian, and Z. Han, “Edge intelligence for autonomous driving in 6G wireless system: Design challenges and solutions,” IEEE Wireless Commun., vol. 28, no. 2, pp. 40–47, 2021.
-  3GPP, “Technical specification group services and system aspects; feasibility study on new services and markets technology enablers stage 1,” 3GPP, Tech. Rep. TR 22.891 (Release 14), 2016.
-  C. Campolo, A. Molinaro, A. Iera, and F. Menichella, “5G network slicing for vehicle-to-everything services,” IEEE Wireless Commun., vol. 24, no. 6, pp. 38–45, 2017.
-  S. L. Ricker and K. Rudie, “Knowledge is a terrible thing to waste: Using inference in discrete-event control problems,” IEEE Trans. Autom. Control, vol. 52, no. 3, pp. 428–441, 2007.
-  X. Xie and K.-H. Kim, “Source compression with bounded DNN perception loss for IoT edge computer vision,” in Proc. Annual Int. Conf. Mobile Comput. Netw. (Mobicom), Los Cabos, Mexico, Oct. 21-25, 2019.
-  Y. Li, S. Zhang, H. Jia, X. Xie, and W. Gao, “A high-throughput low-latency arithmetic encoder design for HDTV,” in Proc. IEEE Int. Symp. Circuits and Syst. (ISCAS), Beijing, China, May 19-23, 2013.
-  R. Das, A. Neelakantan, D. Belanger, and A. McCallum, “Chains of reasoning over entities, relations, and text using recurrent neural networks,” in Proc. European Chapter the Association Comput. Linguistics (EACL), 2017, pp. 132–141.
-  P. G. Omran, K. Wang, and Z. Wang, “An embedding-based approach to rule learning in knowledge graphs,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 4, pp. 1348–1359, Apr. 2021.
-  D. Vrandecic and M. Krotzsch, “Wikidata: a free collaborative knowledge base,” Commun. ACM, vol. 57, no. 10, pp. 78–85, 2014.
-  K. J. Vang, “Ethics of Google’s knowledge graph: Some considerations,” J. Inf., Commun. Ethics Soc., vol. 11, no. 4, pp. 245–260, 2013.
-  G. A. Miller, “WordNet: a lexical database for English,” Commun. ACM, vol. 38, no. 11, pp. 39–41, 1995.
-  C. Matuszek, M. Witbrock, J. Cabral, and J. DeOliveira, “An introduction to the syntax and content of Cyc,” in Proc. AAAI Spring Symp. Formalizing Compiling Background Knowl. Its Appl. Knowl. Representation Question Answering, 2006, pp. 1–6.
-  F. M. Suchanek, G. Kasneci, and G. Weikum, “Yago: A core of semantic knowledge,” in Proc. Int. World Wide Web Conf. (WWW), 2007, pp. 697–706.
-  T. Hachaj and M. R. Ogiela, “Rule-based approach to recognizing human body poses and gestures in real time,” Multimedia Syst., vol. 20, no. 1, pp. 81–99, 2014.
-  W. Zhao, M. A. Reinthal, D. D. Espy, and X. Luo, “Rule-based human motion tracking for rehabilitation exercises: Real-time assessment, feedback, and guidance,” IEEE Access, vol. 5, pp. 21382–21394, 2017.
-  W. Zhao, D. D. Espy, M. A. Reinthal, and H. Feng, “A feasibility study of using a single kinect sensor for rehabilitation exercises monitoring: A rule-based approach,” in Proc. IEEE Symp. Comput. Intell. Healthcare e-health (CICARE), Dec. 2014, pp. 1–8.
-  A. Bordes, J. Weston, R. Collobert, and Y. Bengio, “Learning structured embeddings of knowledge bases,” in Proc. AAAI, 2011.
-  A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), South Lake Tahoe, USA, Dec. 2013, pp. 1–9.
-  L. Cai and W. Y. Wang, “KBGAN: Adversarial learning for knowledge graph embeddings,” in Proc. Conf. the North American Chapter the Association Comput. Linguistics: Human Language Technol., Volume 1 (Long Papers), New Orleans, Louisiana, Jun. 2018, pp. 1470–1480.
-  T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel, “Convolutional 2D knowledge graph embeddings,” in Proc. AAAI Conf. Artificial Intell. (AAAI), vol. 32, no. 1, Apr. 2018.
-  I. Balazevic, C. Allen, and T. Hospedales, “TuckER: Tensor factorization for knowledge graph completion,” in Proc. Conf. Empirical Methods Natural Language Process. Int. Joint Conf. Natural Language Process. (EMNLP-IJCNLP), Hong Kong, China, Nov. 2019.
-  X. Zheng, S. Zhou, and Z. Niu, “Urgency of information for context-aware timely status updates in remote control systems,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7237–7250, Nov. 2020.
-  H. Tataria, M. Shafi, A. F. Molisch, M. Dohler, H. Sjoland, and F. Tufvesson, “6G wireless systems: Vision, requirements, challenges, insights, and opportunities,” Proc. IEEE, vol. 109, no. 7, pp. 1166–1199, Mar. 2021.
-  N. H. Mahmood, H. Alves, O. A. López, M. Shehab, D. P. M. Osorio, and M. Latva-Aho, “Six key features of machine type communication in 6G,” in Proc. 6G Wireless Summit (6G SUMMIT), Levi, Finland, May 2020.
-  W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, 2020.
-  IMT-2030, “White paper on 6G vision and candidate technologies,” [Online] http://www.caict.ac.cn/english/news/202106/P020210608349616163475.pdf, Jun. 2021.
-  X. You, C. X. Wang, J. Huang et al., “Towards 6G wireless communication networks: Vision, enabling technologies, and new paradigm shifts,” Sci. China Inf. Sci., vol. 64, no. 110301, pp. 1–74, Jan. 2021.
-  ITU-T, “Network 2030: A blueprint of technology, applications, and market drivers toward the year 2030,” [Online] https://www.itu.int/en/ITU-T/focusgroups/net2030/Documents/White_Paper.pdf, Nov. 2019.
-  F. Tariq, M. R. A. Khandaker, K.-K. Wong, M. A. Imran, M. Bennis, and M. Debbah, “A speculative study on 6G,” IEEE Wireless Commun., vol. 27, no. 4, pp. 118–125, 2020.
-  3GPP, “Study of enablers for network automation for 5G,” 3GPP, Tech. Rep. TR 23.791 (Release 16), 2019.