Fully probabilistic design for knowledge fusion between Bayesian filters under uniform disturbances

09/22/2021
by Lenka Kuklišová Pavelková, et al.

This paper considers the problem of Bayesian transfer learning-based knowledge fusion between linear state-space processes driven by uniform state and observation noise processes. The target task conditions on probabilistic state predictor(s) supplied by the source filtering task(s) to improve its own state estimate. A joint model of the target and source(s) is not required and is not elicited. The resulting decision-making problem for choosing the optimal conditional target filtering distribution under incomplete modelling is solved via fully probabilistic design (FPD), i.e. via appropriate minimization of Kullback-Leibler divergence (KLD). The resulting FPD-optimal target learner is robust, in the sense that it can reject poor-quality source knowledge. In addition, the fact that this Bayesian transfer learning (BTL) scheme does not depend on a model of interaction between the source and target tasks ensures robustness to the misspecification of such a model. The latter is a problem that affects conventional transfer learning methods. The properties of the proposed BTL scheme are demonstrated via extensive simulations, and in comparison with two contemporary alternatives.



1 Introduction

Methods of data and information fusion are receiving much attention at present, because of their range of applications in Industry 4.0, in navigation and localization problems, in sensor networks, in robotics, and so on Fou:17, AlaHan:20.

The terms data fusion and information fusion are often used as synonyms. However, in some scenarios, the term data fusion is used for raw data obtained directly from the sensors, while the term information fusion concerns processed or transformed data. Other terms associated with data fusion include data combination, data aggregation, multi-sensor data fusion, and sensor fusion Cas:13.

Data fusion techniques combine multiple sources in order to obtain improved (less expensive, higher quality, or more relevant) inferences and decisions compared to a single source. These techniques can be classified into three non-exclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion Cas:13. In this paper, we focus on state estimation methods.

Conventional data fusion methods work with multiple data channels from one common domain, originating from the same source. In contrast, cross-domain fusion methods work with data in different domains that are related by a common latent object Zhe:2015. Data from different domains cannot be merged directly. Instead, knowledge—or “information” above—has to be extracted from these data and only then fused. One method of knowledge fusion is transfer learning, also known as knowledge transfer. This framework aims to extract knowledge from a source domain via a source learning task, and to use it in the target domain with a given target learning task. The domains or tasks may differ between the source and target PanYan:10. Examples of successful deployment of transfer learning in data fusion are found in OuyLow:20, LinHuXiaAlhPir:20. In accordance with the DIKW classification scheme proposed in Bed:20, we will refer to transfer learning-based fusion as knowledge fusion.

The performance of transfer learning methods can be improved using computational intelligence Leetal:15. Bayesian inference provides a consistent approach to building in computational intelligence. It does so via probabilistic uncertainty quantification in decision-making, taking into consideration the uncertainty associated with model parameters, as well as the uncertainty associated with combining multiple sources of data. In the Bayesian transfer learning (BTL) framework—to be championed in this paper—the source and target can be related through a joint prior distribution, as in KarQiaDou:18, WanTsu:20. BTL usually adopts a complete stochastic modelling framework, such as Bayesian networks LiWanLiWan:20, Bayesian neural networks ChaKap:19, or hierarchical Bayesian approaches WilFerTad:12. As already noted, these methods require a complete model of source-target interaction. In contrast, in PapQui:21, BTL is defined as the task of conditioning a target probability distribution on a transferred source distribution. A dual-modeller framework is adopted, where the target modeller conditions on a probabilistic data predictor provided by an independent local source modeller. No joint interaction model between the source and target is specified, and so the source-conditional target distribution is non-unique and can be optimized in this incomplete modelling scenario. The target undertakes this distributional decision-making task optimally, by minimizing an approximate Kullback-Leibler divergence KulLei:51. This generalized approach to Bayesian conditioning in incomplete modelling scenarios is known as fully probabilistic design QuiKarGuy:16.

Our aim in this paper is to derive a BTL algorithm for knowledge fusion that uses knowledge from several source state-space filters to improve state estimation in a single target state-space filter. All (observational and modelling) uncertainties are assumed to be bounded. State estimation under bounded noises represents a significant focus for state filtering methods since, in practice, the statistical properties of noises are rarely known, with only their bounds being available. Bounded-noise methods also avoid the adoption of unbounded noises, which can lead to over-conservative design Ono:13. To the best of our knowledge, the topic of BTL-based multi-task/filter state estimation with bounded noises has not yet been addressed in the literature, except in the authors' previous publications JirPavQui:19a, JirPavQui:20. In those papers, BTL between a pair of filters affected by bounded noises is presented. The source knowledge is represented by a bounded output (i.e. data) predictor. The optimal target state filtering distribution is then designed via FPD. In JirPavQui:19a, the support of the state inference is an orthotope, while in JirPavQui:20, it is relaxed to a parallelotope.

There are fusion techniques for state estimation with bounded noises, but these are conventional fusion methods as defined above. Data fusion methods using set membership estimation are addressed, for instance, in WanSheXiaZhu:19, XiaYanQin:18. In HanHor:00, set membership and stochastic estimation are combined. In CheHoYu:17, local Kalman-like estimates are computed in the presence of bounded noises. Particle filters Lietal:16 can also effectively solve the Bayesian estimation problem with bounded noises; however, they are computationally demanding. When used in a data fusion context, reduced computational complexity is obtained in HoaDenHarSlo:15. In WanSheZhuPan:16 and BalCaiCri:06, particle filtering techniques and set membership approaches are combined.

The current paper significantly extends and formalizes the results on BTL reported in the above-mentioned authors' papers, JirPavQui:19a and JirPavQui:20. Both of those papers report an improvement in target performance in the case of concentrated source knowledge (positive transfer), and rejection of diffuse source knowledge (robust transfer). However, the improvement was only minor compared to the performance of the isolated target, whereas a variant proposed ad hoc exhibited significantly improved positive transfer. In the current paper, we formalize this previously informal variant, showing it to be FPD-optimal. The task of transfer learning-based knowledge fusion with bounded noises is solved in the case where the transferred knowledge is the source's probabilistic state predictor. An extension to multiple sources is also provided in this paper.

The paper is organized as follows: this section ends with a brief summary of the notation used throughout the paper. Section 2 presents the general problem of FPD-optimal Bayesian state inference and estimation in the target, conditioning on transferred knowledge from a source in the form of a probabilistic state predictor. In Section 3, these general results are specialized to source and target state-space models with uniform noises, and are finally extended to the case of multiple sources in Section 3.3. Section 4 provides extensive simulation evidence to illustrate the performance of our FPD-optimal BTL scheme. Comparison with a contemporary (non-Bayesian) fusion method for uniformly driven state-space models is also provided, as well as comparison with a completely modelled Bayesian network approach. Section 5 concludes the paper. The proofs of all the theorems are provided in Appendix A.

Notation: Matrices are denoted by capital letters (e.g. $A$), and vectors and scalars by lowercase letters (e.g. $a$). $A_{ij}$ is the $(i,j)$-th element of a matrix $A$, and $A_{i}$ denotes the $i$-th row of $A$. $\mathring{a}$ denotes the length of a (column) vector $a$, and $a^{*}$ denotes the set of $a$ values. Vector inequalities (e.g. $a \le b$), as well as vector maximum and minimum operators (e.g. $\max(a,b)$), are meant entry-wise. $I$ is the identity matrix. $\chi_{X}(x)$ is the set indicator, equalling $1$ if $x \in X$ and $0$ otherwise. $x_t$ is the value of a time-variant column vector, $x$, at a discrete time instant, $t \in \{1,\dots,T\}$; $x_{i;t}$ is the $i$-th entry of $x_t$. $\|x\|$ denotes the Euclidean norm of $x$. Note that no notational distinction is made between a random variable and its realisation; the context will make clear which is meant.

2 FPD-optimal Bayesian transfer learning (FPD-BTL)

Assume two stochastically independent modellers, the source (with subscript S) and the target (without subscript), each modelling their local environment. Here, we will formulate the task of FPD-optimal Bayesian transfer learning (FPD-BTL) between this source and target, the aim being to improve the target’s model of its local environment via transfer of probabilistic knowledge from the source’s local environment, as depicted in Figure 1.

Before addressing the two-task context, let us recall the state estimation problem (filtering) for an isolated target, i.e. in the absence of knowledge transfer from a source.

In the Bayesian filtering framework Karat:05, a system of interest is described by the following probability density functions (pdfs):

$f(y_t \mid x_t), \qquad f(x_t \mid u_t, x_{t-1}), \qquad t \in \{1,\dots,T\}.$   (1)

Here, $y_t$ is the (vector) observable output, $u_t$ is an optional known (exogenous) system input, and $x_t$ is the unobservable (hidden) system state. We assume that (i) the hidden state process satisfies the Markov property; (ii) no direct relationship between input and output exists in the observation model; and (iii) the optional inputs constitute a known sequence, as already stated.

Figure 1: Bayesian transfer learning (BTL), involving probabilistic knowledge transfer from the source to the target Bayesian filter. The source filter transfers its state predictor, $f_S(x_t)$, statically at each time, $t$, to the target filter, to improve the target's filtering performance. The source data are unobserved by the target, which computes an optimal conditional model of its local observations at each time, $t$. Probabilistic knowledge flow is depicted by dashed lines above.

Bayesian filtering—i.e. the inference task of learning the unknown state process given the data history, $d(t) \equiv \{d_1,\dots,d_t\}$, $d_t \equiv (y_t, u_t)$—involves sequential computation of the posterior pdf, $f(x_t \mid d(t))$. Evolution of this pdf is described by a two-step recursion (the data update and the time update), initialized with the prior pdf and ending with a data update at the final time, $T$.

The data update (Bayes' rule) processes the latest datum, $d_t$:

$f(x_t \mid d(t)) = \dfrac{f(y_t \mid x_t)\, f(x_t \mid d(t-1))}{\int f(y_t \mid x_t)\, f(x_t \mid d(t-1))\, \mathrm{d}x_t}.$   (2)

The time update (marginalization) infers the evolution of the state at the next time:

$f(x_{t+1} \mid d(t)) = \int f(x_{t+1} \mid u_{t+1}, x_t)\, f(x_t \mid d(t))\, \mathrm{d}x_t.$   (3)
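To make the recursion (2)-(3) concrete, the following minimal sketch runs it for a scalar state discretized on a grid. The Gaussian-shaped observation and transition models used here are purely illustrative stand-ins for (1), not the uniform-noise models developed later in the paper.

```python
import numpy as np

# Hypothetical 1-D example: the state lives on a fixed grid.
grid = np.linspace(-5.0, 5.0, 201)

def data_update(prior, obs_lik):
    """Bayes' rule (2): multiply the state predictor by the likelihood
    of the latest datum and renormalize on the grid."""
    post = prior * obs_lik
    return post / post.sum()

def time_update(post, trans):
    """Marginalization (3): trans[i, j] ~ f(x_{t+1} = grid[i] | x_t = grid[j])."""
    pred = trans @ post
    return pred / pred.sum()

trans = np.exp(-0.5 * (grid[:, None] - 0.9 * grid[None, :]) ** 2)
prior = np.ones_like(grid) / grid.size           # flat prior f(x_1 | d(0))

for y in [0.3, -0.1, 0.5]:                       # observed data y_1, y_2, y_3
    obs_lik = np.exp(-0.5 * (y - grid) ** 2)     # e.g. y_t ~ N(x_t, 1)
    posterior = data_update(prior, obs_lik)      # filtering pdf f(x_t | d(t))
    prior = time_update(posterior, trans)        # predictor f(x_{t+1} | d(t))
```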

Next, we return to the two stochastically independent modellers, i.e. the source and the target (Figure 1). Each filter models its own local system. The target has access only to the (probabilistic) state predictor of the source, $f_S(x_t)$, but not to the actual data or states of the source (Figure 1).

In the isolated target task, the modeller's complete knowledge about the evolution of its local state and output is expressed uniquely by the joint pdf (i.e. the numerator in (2)):

$f(y_t, x_t \mid d(t-1)) = f(y_t \mid x_t)\, f(x_t \mid d(t-1)),$   (4)

i.e. the product of the observation model and the state predictor.

Now, performing knowledge transfer as depicted in Figure 1, the target joint pdf (4) must be conditioned on the transferred source state predictor, $f_S$, and so the target's knowledge-conditional joint pdf takes the form $f(y_t, x_t \mid f_S, d(t-1))$. Since no joint model of the source and target relationship is assumed, this pdf is non-unique, and unknown. Specifically, it is a variational quantity in a set of possible candidates, factorized via the chain rule as follows:

$f(y_t, x_t \mid f_S, d(t-1)) = f(y_t \mid x_t, f_S, d(t-1))\, f(x_t \mid f_S, d(t-1)).$   (5)

We now separately examine the two factors on the right-hand side of (5):

1. The factor $f(x_t \mid f_S, d(t-1))$ represents the target's knowledge about its (local) state, $x_t$, after transfer of the source's state predictor, $f_S$, to the target. The target chooses to accept the source's predictor as its own state model, with full acceptance. The consequences of this definition will be discussed in Section 4.7. Based on this full acceptance, the target accepts that its state and the source's state are equal in distribution:

    $f(x_t \mid f_S, d(t-1)) \equiv f_S(x_t).$   (6)

    In consequence, the factor (6) is fixed in (5).

2. The factor $f(y_t \mid x_t, f_S, d(t-1))$ now remains as the only variational factor, a consequence of the target's choice not to elicit an interaction model between the source and target (Figure 1). According to (2), the observation model is conditionally independent of the data history, given $x_t$, and this conditional independence is preserved by the knowledge transfer. Therefore,

    $f(y_t \mid x_t, f_S, d(t-1)) = f(y_t \mid x_t, f_S).$   (7)

    The main design—i.e. decision—problem for the target is now to choose an optimal form of (7).

Inserting (6) and (7) into (5):

$f(y_t, x_t \mid f_S, d(t-1)) = f(y_t \mid x_t, f_S)\, f_S(x_t).$   (8)

The set, $\mathbb{F}$, of the target's admissible joint models (8), following knowledge transfer from the source, is therefore

$\mathbb{F} \equiv \left\{ f(y_t \mid x_t, f_S)\, f_S(x_t) \,:\, f(y_t \mid x_t, f_S) \text{ is a pdf} \right\}.$   (9)

The optimal pdf, $f^{o}$, respecting both the transferred knowledge and the target filter behaviour, is sought using fully probabilistic design (FPD), which is an axiomatically justified procedure for distributional decision-making KarKro:12, Ber:79. It seeks the joint pdf (8) that minimizes the Kullback-Leibler divergence (KLD) (below) KulLei:51 to the target's fixed ideal, $f^{I}$. This ideal is defined as (4), i.e. the joint pdf of the isolated target filter, modelling its behaviour prior to (i.e. without) the transfer of source knowledge. To summarize, the ideal pdf and the knowledge-conditional pdf to be designed by the target are, according to (4), (8) and (9):

$f^{I}(y_t, x_t \mid d(t-1)) = f(y_t \mid x_t)\, f(x_t \mid d(t-1)),$   (10)
$f(y_t, x_t \mid f_S, d(t-1)) = f(y_t \mid x_t, f_S)\, f_S(x_t) \in \mathbb{F}.$   (11)

Recall that the KLD KulLei:51 from $f$ to $f^{I}$ is defined as

$\mathsf{D}(f \,\|\, f^{I}) \equiv \mathsf{E}_{f}\!\left[\ln \frac{f}{f^{I}}\right],$   (12)

where $\mathsf{E}_{f}$ denotes expectation with respect to $f$. FPD consists in minimizing this KLD objective (12) as a function of $f \in \mathbb{F}$ in (11), for the fixed ideal (10), i.e.

$f^{o} \equiv \operatorname*{arg\,min}_{f \in \mathbb{F}} \mathsf{D}(f \,\|\, f^{I}).$   (13)

(13) conditions the target's knowledge about $y_t$ and $x_t$ on the transferred $f_S$ in an FPD-optimal manner. For simplicity, the superscript $o$ will be omitted in the resulting FPD-optimal pdf.
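For uniform pdfs—the class used throughout this paper—the KLD (12) takes a simple closed form that already suggests why the FPD-optimal design below concentrates on intersected supports. A minimal sketch, assuming orthotopic supports described by bound vectors (the function is ours, for illustration only):

```python
import numpy as np

def kld_uniform(lo_f, hi_f, lo_g, hi_g):
    """D(f || g) for f, g uniform on the orthotopes [lo_f, hi_f], [lo_g, hi_g].
    Finite iff supp(f) is contained in supp(g), in which case it equals
    log(vol(g) / vol(f)); otherwise it is infinite."""
    lo_f, hi_f, lo_g, hi_g = map(np.asarray, (lo_f, hi_f, lo_g, hi_g))
    if np.any(lo_f < lo_g) or np.any(hi_f > hi_g):
        return np.inf
    return float(np.sum(np.log(hi_g - lo_g)) - np.sum(np.log(hi_f - lo_f)))

print(kld_uniform([0, 0], [1, 1], [-1, -1], [2, 2]))  # log(9) ~ 2.197
print(kld_uniform([0, 0], [2, 1], [-1, -1], [1, 2]))  # inf: support not contained
```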

We note the following:

  • The transferred source knowledge, $f_S$, can be elicited in various ways that are unknown to the target; e.g. as an empirical distribution, or some unspecified distributional approximation, etc. QuiKarGuy:17. In this paper, involving multiple state filtering tasks, we will assume that $f_S$ is the output of the source's synchronized time update (3).

  • In the authors' previous publications JirPavQui:19a, JirPavQui:20, FolQui:18, it was the source data predictor that was transferred. Here, for the first time, it is the source state predictor, $f_S(x_t)$, that is transferred. As we will see later, this setting ensures robust knowledge transfer.

Recall that our aim is to specialize the FPD-optimal Bayesian transfer learning (FPD-BTL) framework defined in (10), (11) and (13) to a pair of Bayesian filters under bounded observational and state noises. We now address this aim.

3 FPD-BTL between LSU-UOS filtering tasks

As noted in Section 1, we are specifically interested in knowledge processing among interacting Bayesian state-space filters with uniform noises (LSU models, see below). We therefore instantiate the FPD-optimal scheme (13) for conditioning the target's observation model on the source's transferred state predictor in this specific context. Firstly, in Section 3.1, we review the isolated LSU-UOS filter, and derive the approximate solution to the related state estimation problem. Then, the required instantiation of FPD-BTL to a pair of these LSU-UOS filters is presented in Section 3.2. In Section 3.3, the framework is extended to multiple LSU-UOS source filters, transferring probabilistic state knowledge to a single target.

3.1 LSU-UOS filtering task for the isolated target

The general stochastic system description in (1) is now instantiated as a linear state-space model Sod:02:

$y_t = C x_t + v_t,$   (14)
$x_t = A x_{t-1} + B u_t + w_t,$   (15)

where $A$, $B$ and $C$ are known model matrices of appropriate dimensions, and $v_t$ and $w_t$ are additive random processes expressing observational and modelling uncertainties, respectively; their stochastic model must now be specified. We assume that $v_t$ and $w_t$ are mutually independent white noise processes, uniformly distributed on known supports of finite measure:

$f(v_t) = \mathcal{U}(-\rho, \rho), \qquad f(w_t) = \mathcal{U}(-r, r),$   (16)

where the bound vectors, $\rho$ and $r$, have finite positive entries, and $\mathcal{U}$ denotes the uniform pdf on an orthotopic support (UOS), as now defined.

Remark 1.

Consider a finite-dimensional vector random variable, $x$, with realisations in the following bounded subset of $\mathbb{R}^{\mathring{x}}$:

$X \equiv \left\{ x : \underline{x} \le x \le \overline{x} \right\},$   (17)

where $\underline{x} \le \overline{x}$ are finite bound vectors. This convex polytope, $X$, is called an orthotope.

The uniform pdf of $x$ on the orthotopic support (17), called the UOS pdf, is defined as

$\mathcal{U}_{x}(\underline{x}, \overline{x}) \equiv \frac{\chi_{X}(x)}{\prod_{i=1}^{\mathring{x}} \left( \overline{x}_{i} - \underline{x}_{i} \right)},$   (18)

where $\chi_{X}(x)$ is the set indicator of $X$.
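In code, the UOS class (17)-(18) reduces to a pair of bound vectors. The following minimal container is reused in later sketches; the class and method names are illustrative, not taken from the authors' implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Orthotope:
    lo: np.ndarray   # entry-wise lower bounds (underline x)
    hi: np.ndarray   # entry-wise upper bounds (overline x)

    def volume(self) -> float:
        return float(np.prod(self.hi - self.lo))

    def pdf(self, x: np.ndarray) -> float:
        """UOS pdf (18): indicator of the orthotope divided by its volume."""
        inside = np.all(self.lo <= x) and np.all(x <= self.hi)
        return (1.0 / self.volume()) if inside else 0.0

    def is_empty(self) -> bool:
        return bool(np.any(self.hi < self.lo))
```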

Model (14), (15), (16), together with (18), defines the linear state-space model with uniform additive noises on orthotopic supports, denoted the LSU-UOS model. Its observation and state evolution models (1) are equivalently specified as

$f(y_t \mid x_t) = \mathcal{U}(C x_t - \rho,\; C x_t + \rho),$   (19)
$f(x_t \mid u_t, x_{t-1}) = \mathcal{U}(A x_{t-1} + B u_t - r,\; A x_{t-1} + B u_t + r).$   (20)

Exact Bayesian filtering for the LSU model (19) and (20)—i.e. computation of $f(x_t \mid d(t))$ following (2) and (3)—is intractable, since the UOS class of pdfs (Remark 1) is not closed under those filtering operations. One consequence is that the dimension of the sufficient statistic of the filtering pdf (2) is unbounded as $t$ grows—i.e. at an infinite filtering horizon—and so the filter cannot be implemented (the curse of dimensionality Karat:05). In JirPavQui:19b, PavJir:18, approximate Bayesian filtering with the LSU model (19) and (20), closed within the UOS class (18), is proposed. This involves a local approximation after each data update (2) and time update (3), as recalled below. This tractable but approximate Bayesian filtering procedure will be called LSU-UOS Bayesian filtering.

3.1.1 LSU-UOS data update

Define a strip, $S$, as a set in $\mathbb{R}^{\mathring{x}}$ bounded by two parallel hyperplanes, as follows:

$S \equiv \left\{ x : \underline{b} \le c^{\top} x \le \overline{b} \right\}.$   (21)

Here, $\underline{b} \le \overline{b}$ are scalars, and $c \in \mathbb{R}^{\mathring{x}}$.

In the data update (2), the prior UOS pdf is processed together with the observation model (19) and the latest observation, $y_t$, via Bayes' rule, starting at $t = 1$ with the prior pdf. The resulting filtering pdf is uniformly distributed on a polytopic support that results from the intersection of the orthotopic support of the prior and the strips (21) induced by the latest observation, $y_t$:

$f(x_t \mid d(t)) = \mathcal{U}_{P_t}(x_t), \qquad P_t \equiv X_{t \mid t-1} \cap \bigcap_{i=1}^{\mathring{y}} \left\{ x : y_{i;t} - \rho_i \le C_i x \le y_{i;t} + \rho_i \right\}.$   (22)

In JirPavQui:19b, a local approximation is proposed, in which the resulting polytopic support, $P_t$, of (22) is circumscribed by an orthotope, giving

$f(x_t \mid d(t)) \approx \mathcal{U}(\underline{x}_{t}, \overline{x}_{t}).$   (23)

The approximate Bayesian sufficient statistics, $\underline{x}_{t}$ and $\overline{x}_{t}$, are processed tractably, yielding an implementable algorithm. The details are provided in JirPavQui:19b.
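One way to realize the circumscription step (23) is coordinate-wise linear programming: each state coordinate is minimized and maximized over the polytope (22), i.e. subject to the prior orthotope bounds and the observation strips. A sketch assuming scipy, the Orthotope container above, and the observation model $y_t = C x_t + v_t$ with entry-wise noise bound rho; the exact construction of JirPavQui:19b may differ.

```python
import numpy as np
from scipy.optimize import linprog

def data_update_uos(prior, C, y, rho):
    """Circumscribe the polytope (22) -- the prior orthotope intersected with
    the strips y - rho <= C x <= y + rho -- by the tightest axis-aligned
    orthotope, solving one LP per coordinate and optimization sense."""
    n = prior.lo.size
    A_ub = np.vstack([C, -C])                 # C x <= y + rho, -C x <= rho - y
    b_ub = np.concatenate([y + rho, rho - y])
    bounds = list(zip(prior.lo, prior.hi))    # prior orthotope bounds
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        res_min = linprog(e, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        res_max = linprog(-e, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        if not (res_min.success and res_max.success):
            return None                       # empty intersection: no update
        lo[i], hi[i] = res_min.fun, -res_max.fun
    return Orthotope(lo, hi)
```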

3.1.2 LSU-UOS time update

It now remains to ensure that each data update (above) is, indeed, presented with a UOS output from the preceding time update, as presumed. In each time update, the UOS posterior (23) is processed together with the state model (20)—uniform on $x_t$-dependent strips—via the marginalization operator in (3). The resulting pdf does have an orthotopic support, but is not uniform on it. In JirPavQui:19b, the following local approximation projects back into the UOS class:

$f(x_{t+1} \mid d(t)) \approx \mathcal{U}(\underline{x}_{t+1 \mid t}, \overline{x}_{t+1 \mid t}),$   (24)

where the bounds, $\underline{x}_{t+1 \mid t}$ and $\overline{x}_{t+1 \mid t}$, are computed from the data-updated statistics (23), the model matrices and the state noise bounds (16).   (25)
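One plausible orthotopic over-bounding for this time update propagates the posterior orthotope through the assumed linear dynamics $x_{t+1} = A x_t + B u_t + w_t$ by interval arithmetic. This sketches the projection idea only; it is not claimed to reproduce the exact statistics (25) of JirPavQui:19b.

```python
import numpy as np

def time_update_uos(post, A, b_u, w_lo, w_hi):
    """Over-bound the support of A x + b_u + w (x in the posterior orthotope,
    w in [w_lo, w_hi]) by an orthotope, via interval arithmetic on A x."""
    A_pos, A_neg = np.maximum(A, 0.0), np.minimum(A, 0.0)
    lo = A_pos @ post.lo + A_neg @ post.hi + b_u + w_lo
    hi = A_pos @ post.hi + A_neg @ post.lo + b_u + w_hi
    return Orthotope(lo, hi)
```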

3.2 FPD-BTL between a pair of LSU-UOS filtering tasks

We now return to the central concern of this paper: the static FPD-optimal transfer of the state predictor from the source LSU-UOS filter (“the source task”) to the target LSU-UOS filter (“the target task”). The transfers occur statically, meaning that the marginal state predictor, $f_S(x_t)$, is transferred in each step of FPD-BTL. (For a derivation of joint source knowledge transfer—i.e. dynamic transfer—in Kalman filters, see PapQui:18.)

Although there exists an explicit functional minimizer of (13) (see QuiKarGuy:17), our specific purpose here is to instantiate this FPD-optimal solution for UOS-closed filtering in the source and target tasks, as defined in Section 3.1.

We propose that the FPD-optimal target knowledge-constrained observation model (13) (i.e. after transfer) be uniform, with its support bounded (our set notation emphasizes the fact that the support is a function of $x_t$). We now prove that this choice is closed under the FPD optimization (13). While the following theorem is formulated for uniform pdfs on general bounded sets, it is applied to our UOS class in the sequel.

Theorem 2.

Let the target's ideal pdf in FPD (13) be its isolated joint predictor (10). Assume that the target's (pre-transfer) state predictor, $f(x_t \mid d(t-1))$, is uniform on a bounded support, $\mathbb{X}$; its observation model is defined in (19). The transferred source state predictor, $f_S(x_t)$, is also uniform, with bounded support, $\mathbb{X}_S$. Define the bounded intersection (Figure 2):

$\mathbb{X}^{o} \equiv \mathbb{X} \cap \mathbb{X}_{S}.$   (26)

Assume that the (unoptimized) variational target observation model (11) is also uniform with bounded support.

If $\mathbb{X}^{o} \neq \emptyset$, then the optimal choice of the variational observation model minimizing the FPD objective (13) is the isolated target's observation model (19) with its state argument restricted to $\mathbb{X}^{o}$,   (27)

where the FPD-optimal set of states after transfer of the source knowledge is deduced to be $\mathbb{X}^{o}$.

If $\mathbb{X}^{o} = \emptyset$—a testable condition before transfer—then knowledge transfer is stopped¹ and the optimal target conditional observation model is defined to be that of the isolated target (19).

¹ This decision is consistent with the definition of conditional probability.

Proof.

See A.1. ∎

Figure 2: The mutual positions of the supports, $\mathbb{X}$ of $f(x_t \mid d(t-1))$ and $\mathbb{X}_S$ of $f_S(x_t)$ (panels: case 1, case 2, case 3). The cases 1, 2 and 3 are separately considered in the proof of Theorem 2.

The sets $\mathbb{X}$ and $\mathbb{X}_S$ are functions only of the target's and the source's local data, respectively, i.e. they are local statistics of the target or source tasks, respectively. In this way, FPD-BTL effects transfer of optimal statistics (knowledge) from source to target, in the spirit of knowledge fusion (Section 1). This is in contrast to any requirement to transfer raw data from the source for processing in the target, as occurs in conventional multi-task inference (see Section 4). This property of transfer of source-optimal statistics to the target is a defining characteristic of FPD-BTL.

Corollary 3 (Specialization to the UOS case).

Orthotopic sets (17) are closed under the intersection operator (26) (when the intersection is non-empty). Specifically, if $\mathbb{X} = \{x : \underline{x} \le x \le \overline{x}\}$ and $\mathbb{X}_S = \{x : \underline{x}_S \le x \le \overline{x}_S\}$, then the FPD-optimal set of states after transfer (27) is

$\mathbb{X}^{o} = \left\{ x : \max(\underline{x}, \underline{x}_S) \le x \le \min(\overline{x}, \overline{x}_S) \right\}.$   (28)
Corollary 4.

(27) constrains the allowed set of states in the target's subsequent (i.e. post-transfer) processing of the local datum, $y_t$, via the data update (2). The latter can be written as

(29)

Effectively, then, the FPD-optimal transfer restricts the support of the target's (prior, isolated) state predictor to $\mathbb{X}^{o}$ (26), and this restricted predictor then forms the prior for the subsequent processing of the target's local datum via a conventional data update (29).
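In code, the FPD-optimal restriction of Corollaries 3 and 4 is just an entry-wise intersection of bound vectors, with the stopping rule of Theorem 2 as the empty-intersection fallback (a sketch using the Orthotope container above):

```python
import numpy as np

def transfer_intersection(target_pred, source_pred):
    """FPD-optimal support (26)/(28): entry-wise max of lower bounds and min
    of upper bounds. Returns None if the intersection is empty, i.e. the
    transfer is stopped and the target keeps its isolated predictor."""
    box = Orthotope(np.maximum(target_pred.lo, source_pred.lo),
                    np.minimum(target_pred.hi, source_pred.hi))
    return None if box.is_empty() else box
```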

Additional notes:

  • The knowledge is processed sequentially in the target: firstly, the target processes the transferred source predictor, yielding (27); secondly, the target filter processes the local datum, $y_t$ (data update (29)); thirdly, the target predicts via the local time update (3), making available to the next (3-part) step of FPD-BTL its knowledge-conditional state predictor. Knowledge transfer is therefore interleaved between the time and data updates.

  • The FPD-optimal intersection (26), (27) is a concentration operator in the inference scheme (see Figure 2), ensuring entropy reduction and consistency properties Vaa:98 which—though evident—are not proven here.

  • Recall the full acceptance of the source's state predictor by the target; this induces a discontinuity between the cases $\mathbb{X}^{o} \neq \emptyset$ and $\mathbb{X}^{o} = \emptyset$ in Theorem 2 (see Figure 2). This artefact will be discussed further in Section 4.7.

The implied algorithmic sequence for FPD-BTL between a pair of LSU-UOS filters is provided in Algorithm 1.

Initialization:
  • set the initial time, $t = 1$, and the final time, $T$
  • set the prior statistics, $\underline{x}_{1}$ and $\overline{x}_{1}$, for the target, and likewise for the source
  • set the noise bounds, $\rho$ and $r$ (16)

Recursion: for $t = 1, \dots, T-1$ do
  • Knowledge transfer: transfer the orthotopic source state predictor (6), and compute (29) via (26) and (28)
  • Data update: process the local target datum, $y_t$, into (23) via the orthotopic approximation of (22), specified in JirPavQui:19b
  • Time update: compute (24) via (25)
end for

Termination: set $t = T$
  • Knowledge transfer: transfer the final orthotopic source state predictor (6), and compute (29) via (26) and (28)
  • Data update: process the final local target datum, $y_T$, into (23) via the orthotopic approximation of (22) JirPavQui:19b

Algorithm 1: FPD-BTL between two LSU-UOS filtering tasks
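Putting the pieces together, below is one possible rendering of Algorithm 1, reusing the sketches above (transfer_intersection, data_update_uos, time_update_uos). It assumes, purely for brevity, that both tasks share the model matrices and noise bounds of (19)-(20) and that the known input term is absorbed into b_u; neither is required by the algorithm itself.

```python
def fpd_btl(y_seq, y_S_seq, prior, prior_S, A, C, rho, w_lo, w_hi, b_u):
    """FPD-BTL between two LSU-UOS filters (Algorithm 1), returning centre
    point estimates of the target state at each time."""
    estimates = []
    for y, y_S in zip(y_seq, y_S_seq):
        # knowledge transfer: restrict the target predictor to the FPD-optimal
        # intersection (26)/(28); fall back to the isolated predictor if empty
        restricted = transfer_intersection(prior, prior_S) or prior
        # target data update (29) on the restricted prior
        # (the `or` is a degenerate-case guard if the datum is inconsistent)
        post = data_update_uos(restricted, C, y, rho) or restricted
        estimates.append(0.5 * (post.lo + post.hi))
        # the source runs its own isolated data update on its local datum
        post_S = data_update_uos(prior_S, C, y_S, rho) or prior_S
        # time updates make both predictors available for the next step
        prior = time_update_uos(post, A, b_u, w_lo, w_hi)
        prior_S = time_update_uos(post_S, A, b_u, w_lo, w_hi)
    return estimates
```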

3.3 FPD-BTL for multiple LSU-UOS sources and a single LSU-UOS target

Here, we extend FPD-BTL (Section 3.2) to the case of multiple bounded-support sources, which can be specialized to the case of multiple interacting LSU-UOS tasks, again via Corollaries 3 and 4. Assume the same scenario as in Figure 1, with one target but, now, $K$ sources (i.e. $K+1$ interacting LSU-UOS tasks in total). Once again, the elicitation of a joint model of the tasks is avoided (i.e. incomplete modelling). Each source provides its state predictor statically to the target, in the same way as in the single-source setting.

Theorem 5.

Let there be $K+1$ state-space filters having bounded supports of their state predictors, $\mathbb{X}, \mathbb{X}_{S_1}, \dots, \mathbb{X}_{S_K}$, respectively. Assume the first is the target filter, and the others are the source filters. Then the FPD-optimal target observation model after transfer of the source state predictors is the isolated target's observation model with its state argument restricted to $\mathbb{X}^{o}$, where

$\mathbb{X}^{o} \equiv \mathbb{X} \cap \mathbb{X}_{S_1} \cap \dots \cap \mathbb{X}_{S_K}.$   (30)
    Proof.

    See A.2. ∎
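The multiple-source case only changes the intersection step: the target's support is intersected with each transferred source support in turn. How an empty intersection with an individual source should be treated is a design choice; skipping that source, as below, is one plausible handling consistent with the stopping rule of Theorem 2.

```python
def multi_source_transfer(target_pred, source_preds):
    """Theorem 5 / (30): intersect the target predictor's support with every
    source predictor's support; a source yielding an empty intersection is
    rejected rather than being allowed to empty the support."""
    box = target_pred
    for s in source_preds:
        cand = transfer_intersection(box, s)
        if cand is not None:
            box = cand
    return box
```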

4 Simulation studies

    In this section, we provide a detailed study of the performance of the proposed Bayesian transfer learning algorithm (FPD-BTL) between LSU-UOS filtering tasks. We compare it to Bayesian complete (network) modelling (BCM, to be defined below) for the UOS class and to the distributed set-membership fusion algorithm (DSMF) for ellipsoidal sets WanSheXiaZhu:19 , which also involves complete modelling of the networked LSU filters.

    In the design of these comparative experiments, our principal concerns are the following:

    1. To study the influence of the number of sources on the performance of the target filter in FPD-BTL (experiment #1).

    2. To compare FPD-BTL to complete modelling alternatives (BCM and DSMF, experiment #2).

3. To study the robustness of FPD-BTL—which does not require a model of task interaction (i.e. it is incompletely modelled)—to the model mismatches that inevitably occur between source and target tasks in the complete modelling approaches (BCM and DSMF) (experiments #3–#5).

4. To assess the computational demands of the proposed FPD-BTL algorithm in comparison to the competing methods (BCM and DSMF).

Section 4.1 explains the necessary background, emphasizing the important distinction between the synthetic and analytic models in these simulation studies. Then, model mismatch and its types are specified (Sections 4.1.1 and 4.1.2). The specific LSU-UOS systems (19), (20) used in our studies are described in Section 4.2. The completely modelled alternatives (BCM and DSMF) are reviewed in Section 4.3, and the evaluation criteria are defined in Section 4.4. Then, the experimental results are presented and discussed in Section 4.5, before overall findings are collected and interpreted in Section 4.7.

    4.1 Synthetic vs. analytic models

    In computer-based simulations—such as those which follow—we explicitly distinguish between the synthetic model, used for data generation, and the analytic model on which the derived state estimation algorithm depends. The synthetic model can be understood as an abstraction of a natural (physical) data-generating process, while the analytic model is a subjective (i.e. epistemic Jay:03 )—and inevitably approximate—description of this process adopted by the inference task (here, the LSU-UOS filters).

Figure 3: Models (synthetic and/or analytic) for a pair of state-space observation processes. a) V-shaped graph, single modeller; b) U-shaped graph, single modeller; c) multiple modellers. A frame denotes a modeller that observes data (shaded nodes), and for which stochastic dependencies within the frame are known. Conditional and joint probability models of these dependencies are represented by arrows (directed edges) and by lines (undirected edges), respectively, as usual KolFri:09. The schemes show the marginal relationship between states and observations within one filtering step of dynamic modelling, but not the (temporal) dynamics themselves.

Figure 3 shows three models for a pair of state-space filters adopted in this paper, as either synthetic or analytic models (or both). If the V-shaped graph (Figure 3a) is used as the synthetic model, the state sequence is realized commonly for all the filters via the Markov state process (15), and then locally corrupted via independent, additive, white UOS observation noise processes (14). If the analytic model is also the V-shaped graph (Figure 3a) with known parameters, then we refer to this as complete modelling, as adopted in BCM and DSMF.

The U-shaped graph (Figure 3b) is adopted as the synthetic model in some of the experiments below. Here, the target state sequence and the source state sequence are synthesized as distinct—but mutually correlated—processes, with an appropriate, fully specified interaction model between them. The U-shaped graph is not used as an analytic model in this paper.

As already explained in Sections 2 and 3, the multiple modeller approach (Figure 3c) is adopted as the analytic model only in our proposed FPD-BTL approach, expressing the fact that the target elicits no model of the source process or of its relationship to it. The source and target analytic models are therefore stochastically independent, and can be interpreted as independent 2-node marginals of an (unspecified) 4-node complete model. This arrangement respects the key notion of local expertise, i.e. the commonly encountered situation in distributed inference where the source is a better local analytic modeller (i.e. expert) of its local data than the remote target modeller can ever be.

In the simulation studies most frequently encountered in the literature, the synthetic and analytic models are implicitly assumed to be identical. In the computer-based synthetic-data experiments below, modelling mismatch can be explored, and is, indeed, our priority. However, in real-data studies, the notion of a synthetic model is inadmissible Jay:03. There is therefore almost-sure mismatch between the (typically unknowable) “truth theory” of synthesis Jay:03 and the analysis model which prescribes the adopted algorithm. It is for this reason that a study of analysis-synthesis modelling mismatch—as provided below—is of key importance, particularly in assessing the robustness or fragility of the algorithm to such mismatches.

In the forthcoming simulations, modelling mismatch will be arranged at the level of the state sequence(s), either via mismatches in the process noise (16), or via mismatches in the state matrix of (15). These are detailed in the next two subsections.

    4.1.1 State noise mismatch

The target filter's state at time $t$ is synthesized according to (15) with a uniform noise (16). If the state sequence is common to both filters, data synthesis is described by the V-shaped graph (Figure 3a), as already noted in Section 4.1. Synthesis via the U-shaped graph (Figure 3b) realizes distinct state processes for the target and the source, via an operating parameter, $\epsilon$, which controls the interaction (i.e. correlation) between them:

(31)

Here, the two driving terms in (31) are mutually independent, white UOS state processes. The source's analytic model is also given by (15) with perfectly matched parameters. However, we will assume that the target modeller is unaware of the mismatching noise process, and so the target's analytic model of its local state process is also (15). This enforces a mismatch between the target's synthetic model (31) and its analysis model (15), via the state noise mismatch process in (31). Note that if $\epsilon = 0$, the source and target synthesized states are identical (Figure 3a) and matched to the source and target analysis model(s). In contrast, if $\epsilon \neq 0$, then the marginal synthesis model (i.e. pdf) of the target's state noise is trapezoidal (being the convolution of two uniform pdfs) with increased variance, while the target's mismatched local analytic state model is uniform (20).

    4.1.2 State matrix mismatch in the analytic models

In this section, we distinguish between the state matrix (15) used in synthesis of the common state process (Figure 3a) and the state matrix/matrices used in analysis. Specifically, we impose no synthesis-analysis mismatch in the source, but a mismatch in the target. There are several ways to achieve this mismatch; a code sketch follows the list below:

1. Modification of the eigenvalues of an invertible state matrix. In the target analytic model, we modify the eigenvalues geometrically in one of two ways:

  1. Radial shift: a selected eigenvalue is multiplied by a real scalar operating parameter, while maintaining Hermitian symmetry.

  2. Rotation: here, the selected eigenvalue is multiplied by a factor, $e^{i\phi}$, where the angle of rotation, $\phi$, is the operating parameter. Once again, Hermitian symmetry is maintained.

2. Multiplication of the state matrix by a scalar. In this case, all eigenvalues experience the same radial shift.
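A sketch of the eigenvalue manipulations above, assuming numpy and a diagonalizable state matrix; the function and argument names are ours. A real scalar factor gives the radial shift, and factor = np.exp(1j * phi) gives the rotation (intended for complex-conjugate eigenvalue pairs); the conjugate partner of a selected complex eigenvalue is modified conjugately, so that the perturbed matrix remains real.

```python
import numpy as np

def modify_eigenvalues(A, select, factor):
    """Rebuild A with the selected eigenvalues multiplied by `factor`.
    Conjugate pairs are perturbed conjugately, keeping the matrix real."""
    lam, V = np.linalg.eig(A)
    lam_new = lam.astype(complex)
    for i in select:
        lam_new[i] = lam[i] * factor
        if np.iscomplex(lam[i]):
            # find and fix the conjugate partner to maintain Hermitian symmetry
            j = int(np.argmin(np.abs(lam - np.conj(lam[i]))))
            lam_new[j] = np.conj(lam_new[i])
    A_new = V @ np.diag(lam_new) @ np.linalg.inv(V)
    return np.real_if_close(A_new, tol=1e6)

# e.g. rotate one complex pole of a state matrix A by 0.1 rad:
# A_mismatched = modify_eigenvalues(A, select=[0], factor=np.exp(1j * 0.1))
```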

    4.2 The synthesis models

The following specific LSU systems (14), (15) are simulated in the upcoming experiments, i.e. they specify the synthesis model for both the target and the source(s) in the V-shaped graph (Figure 3a) or in the U-shaped graph (Figure 3b), as specified in (31). The uncertainty parameters (16) are specified in each experiment. (A simple synthesis routine is sketched after this list.)

    • A second-order system with two complex conjugate poles, described by (14) and (15), with model matrices given in

      (32)

      This system is studied in Fri:12, being the discretization and randomization of a continuous-time system with a given sampling period, and with added random processes representing observational and modelling (i.e. state) uncertainties, respectively.

    • A third-order system with 3 distinct real poles, described by (14) and (15), with model matrices given in

      (33)
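For reference, data synthesis from an LSU system such as (32) or (33) is straightforward; a sketch assuming numpy, with the model matrices and bound vectors supplied per experiment (the input term is omitted for brevity):

```python
import numpy as np

def synthesize_lsu(A, C, w_bound, v_bound, x0, T, seed=0):
    """Simulate (15) then (14) for T steps, with white uniform noises on
    [-w_bound, w_bound] and [-v_bound, v_bound] (entry-wise)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], C.shape[0]
    x = np.asarray(x0, dtype=float)
    xs, ys = [], []
    for _ in range(T):
        x = A @ x + rng.uniform(-w_bound, w_bound, n)   # state evolution (15)
        y = C @ x + rng.uniform(-v_bound, v_bound, m)   # observation (14)
        xs.append(x); ys.append(y)
    return np.array(xs), np.array(ys)
```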

    4.3 Alternative multivariate inference algorithms

The key distinguishing attribute of our FPD-BTL algorithm is its multiple modeller approach, with incomplete modelling of the interaction between the tasks. Its defining characteristic—the transfer of source sufficient statistics, rather than raw data, for processing at the target—distinguishes it from methods that adopt a complete model of the networked tasks, often involving joint processing—at the target or another fusion centre—of the multiple raw data channels. We will reserve the term transfer learning (TL) for the former (FPD-BTL in the case of our FPD-optimal Bayesian TL scheme), and refer to the latter as multivariate inference schemes. We will compare FPD-BTL against two approaches of the latter kind: (i) Bayesian multivariate inference (Section 4.3.1), consistent with a complete analysis model (i.e. the V-shaped network graph in Figure 3a); and (ii) distributed set-membership fusion (DSMF) (Section 4.3.2), a state-of-the-art, non-probabilistic, fusion-based state estimation algorithm WanSheXiaZhu:19.

    4.3.1 Bayesian complete modelling (BCM)

Here, it is assumed that the LSU filters consist of conditionally independent observation models with a common Markov state evolution model (15) (i.e. the V-shaped graph as the analytic model, Figure 3a). The