1 Introduction
The digitisation of the economy and society has had a profound impact on how researchers study economic interactions. The availability of enormous granular datasets has led to innovations in data-driven machine learning approaches, which excel at forecasting. However, the opacity of these approaches limits their usefulness to policymakers. We therefore develop a scalable inference model which combines innovations of popular machine learning methods, such as variational approaches and nonlinear forecasting methods, with interpretable parametric models to study economic interactions at scale. We apply the model to study an emerging part of this new digital economy.
A prime example of this digitisation is the introduction of decentralised digital currencies, which have had a major impact on the global financial system. Blockchain, with its decentralised architecture, has been a disruptive innovation for global transactions. With increasing interest in cryptocurrency trading, especially in ERC-20 tokens, and a growing number of exchanges providing platforms for users to trade with ease, cryptocurrency trading has become a complex economic system with complicated dynamics between price, trading volume and token flow.
Because of the large number of transactions, with every transaction being accessible online and traceable to some extent (Böhme et al. [2015]), we are left with granular data from which valuable new information about this ecosystem can be extracted, such as the interactions between the on-chain (transaction amounts and token inflows) and off-chain (trade data such as closing prices) ecosystems. These complementary features create a complex dynamical system similar to what we see in traditional financial markets. This granular dataset, in combination with our model, provides a data-driven approach to inform policymakers and can provide valuable information when creating new regulations to protect investors in this still largely unregulated market.
Previous works have successfully modelled cryptocurrencies as a dynamical system using Time-Varying Parameter Vector Autoregression (TVP-VAR) (Hotz-Behofsits et al. [2018b], Nadler et al. [2019b]). They note that a time-varying regression coefficient can model this ecosystem well. As noted by Nadler et al. [2019b] and Haldane [2018], economic models need to scale appropriately to deal with big data. We build on this previous research and improve the Bayesian inference so that it scales to big data. We introduce a novel algorithm, Time-Varying Parameter Vector Autoregression with Variational Inference (TVP-VAR-VI), to tackle this problem. We test the proposed approach on synthetic data to validate it and perform benchmarking, and then extend it to real-world blockchain data. We use this inference technique to interpret an otherwise unobservable parameter representing the dynamics of the blockchain ecosystem. The resulting parameter vector in latent space is used to visualise and interpret the influence of on-chain transactions on trading data, pricing and market dynamics. We use Bayesian inference as opposed to a neural network, as a simplistic NN forecasting architecture would yield an uninterpretable weight matrix in latent space. The parameter estimated using our technique can be interpreted visually, as discussed further in Section 5. We also challenge the simplistic state-space model for the evolution of the latent parameter: we propose a new technique to improve the state-space modelling of the ecosystem by incorporating nonlinearity through data-driven approaches, and develop a neural network architecture called TVP-VAR-Net.
In short, the Bayesian inference model helps us obtain the latent dynamic parameter that describes the interrelationship of economic variables dynamically over time, whose structure economists can exploit for policy analysis. Our methodology is especially relevant for very large and granular datasets that would be computationally prohibitive for many established econometric models, since our model is a scalable algorithm for big data. Extending TVP-VAR-VI, we develop the TVP-VAR-Net model, which overcomes the shortcomings of the simplistic state-space model when forecasting this latent parameter for out-of-sample n-step predictions.
2 Background
In the following, we denote by x_t the time-evolving latent variables and by y_t the observations. M_t denotes the operator used to evolve the latent variables in time,

x_t = M_t(x_{t-1}) + η_t, (1)

where η_t is a Gaussian background error with covariance matrix Q. Finally, H_t : X → Y will denote the operator which maps the state variables to the observations,

y_t = H_t(x_t) + ε_t, (2)

where ε_t is a Gaussian model error with covariance matrix R.
2.1 Bayesian Inference
The objective of Bayesian inference is to update the belief about the latent space (the analysis, x^a) by ingesting the observation (y) and prior/model information (x^b) at each time step t. The prior distribution here is the PDF of the latent state when no observations are available, i.e. the output of the model (1). The likelihood is the PDF of the observations obtained from the latent state via (2), while the posterior represents the final updated latent state after considering observations (Bannister). Modelling these PDFs as Gaussian, with prior mean x^b and covariance Q, and likelihood mean H(x) and covariance R, we get

p(x | y) ∝ exp(-½ (y - H(x))^T R^{-1} (y - H(x))) · exp(-½ (x - x^b)^T Q^{-1} (x - x^b)). (3)

We seek a solution maximising the posterior probability, which means minimising the negative log-likelihood, i.e. the cost function

J(x) = ½ (x - x^b)^T Q^{-1} (x - x^b) + ½ (y - H(x))^T R^{-1} (y - H(x)). (4)
Proven techniques for such inference schemes are the Kalman filter and variational inference. The Kalman filter (Kalman [1960]) approaches the problem sequentially, solving the cost function analytically and computing a weighted average known as the Kalman gain. In contrast, the variational inference method minimises the same cost function, reducing the analysis gap between time-distributed observations and the model solution (Asch et al. [2016]), thus working in a continuous way. In the later sections, we report results of both Kalman and variational inference methods, modified and implemented to suit TVP-VAR modelling. This is then extended to the blockchain data to understand the latent state dynamics.
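To make the sequential approach concrete, a minimal linear-Gaussian Kalman predict/update cycle for the state space (1)–(2) can be sketched as follows; the operators, dimensions and noise levels are illustrative, not those used later in the paper.

```python
import numpy as np

def kalman_step(x, P, y, M, H, Q, R):
    """One predict/update cycle for the linear-Gaussian state space (1)-(2)."""
    # Predict: evolve the prior mean and covariance with the model operator M
    x_f = M @ x
    P_f = M @ P @ M.T + Q
    # Update: weight model forecast against the observation via the Kalman gain
    S = H @ P_f @ H.T + R                    # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_a = x_f + K @ (y - H @ x_f)            # analysis mean
    P_a = (np.eye(len(x)) - K @ H) @ P_f     # analysis covariance
    return x_a, P_a

# Illustrative 2-state system with a single noisy observation of the first state
M = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
x, P = np.zeros(2), np.eye(2)
x, P = kalman_step(x, P, np.array([1.0]), M, H, Q, R)  # analysis pulled towards y
```

The analysis mean lands between the forecast and the observation, with the weighting set by the relative sizes of the forecast and observation error covariances.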
2.2 Ethereum and ERC20 tokens
Ethereum is a specialised decentralised network that, along with recording transactions on blockchains, allows the creation of smart contracts. This provides an environment for creating decentralised applications (DApps). A detailed description of Ethereum's architecture is available in Wood and others.
ERC-20 Tokens An Ethereum token is a digital, blockchain-based asset created on top of the Ethereum network using a smart contract. Such tokens can serve, for example, as proof of ownership of an amount of gold or of a house.
Figure 1 reveals that the market capital of ERC-20 tokens now represents 49% of total assets on Ethereum. This increased share incentivises economic activity using ERC-20 tokens, providing excellent avenues for researching the trading activities and dynamics within this ecosystem.
Exchanges Cryptocurrency exchanges are online platforms where customers can trade one kind of digital asset or currency for another based on the market value of the given assets. These exchanges are intermediaries between buyers and sellers of cryptocurrency, similar to traditional stock exchange platforms. The most popular exchanges are currently Binance, GDAX and Poloniex (Frankenfield [2020]). These exchanges thus form an integral part of the trading ecosystem in blockchain: together they process nearly $10 billion (Blockgeeks, Blockgeeks [2020]), playing a major role in driving this ecosystem.
3 Blockchain Ecosystem: Econophysical Analysis
3.1 Dynamic system formulation
We model the on-chain (transaction flows and amounts) and off-chain trading activities on blockchain exchanges as a Time-Varying Parameter Vector Autoregressive model (TVP-VAR).
TVP-VAR embodies a system with a set of vector autoregressive coefficients of a time-varying nature. TVP-VAR has also been used to model price dynamics of various cryptocurrencies in other literature (Hotz-Behofsits et al. [2018a]). The dynamic relations in the blockchain ecosystem can be time-varying, and TVP-VAR accommodates this shift by allowing the model parameters to vary across time. The multivariate lagged VAR is
y_t = c_t + B_{1,t} y_{t-1} + … + B_{p,t} y_{t-p} + ε_t,

where y_t is an M × 1 vector of observations, c_t a vector of means, B_{i,t} a coefficient matrix and p the lag length. The model can be expressed in compact notation using X_t = [1, y'_{t-1}, …, y'_{t-p}] and B_t = [c_t, B_{1,t}, …, B_{p,t}]. With K = 1 + Mp, X_t is of dimension 1 × K and B_t is M × K. We further define Z_t = I_M ⊗ X_t using the Kronecker product, with I_M the M × M identity matrix, and define β_t = vec(B'_t), stacking the columns of the matrix, with dim(β_t) = MK × 1. This allows us to rewrite the TVP-VAR in compact notation; further details can be found in Nakajima and others [2011], Nadler et al. [2019a]:

y_t = Z_t β_t + ε_t, (5)
β_t = F β_{t-1} + η_t, (6)

where ε_t and η_t are zero-mean errors with covariances R and Q respectively. β_t is the time-varying coefficient, the latent parameter that defines the dynamics of the economic system. F is the forward model, similar to M_t in (1), evolving β_t in time.

3.2 Link to Bayesian inference
Often the state representing an interrelationship evolves with time; however, this latent state can be unknown and not directly observable. It has to be inferred from the measurements/observations, which may be very sparse and noisy. Bayesian inference can help in analysing this latent space.
TVP-VAR with Bayesian inference is used to estimate the latent blockchain dynamics β_t in (5). This method enables the effective use of both sparse and highly aggregated data to infer dynamics, to analyse the effect of trading actions on blockchain on-chain activities or how token flow affects pricing on exchanges.
We can see that (5) and (6) relate to the state-space equations (1) and (2) discussed above. Tweaking the state-space equations gives a TVP-VAR formulation of our econophysical system, and hence modified inference approaches can help us determine the unobserved state variable, i.e. give us insight into the interrelationship between economic variables such as prices and token flows.
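The compact Kronecker/vec construction of Section 3.1 can be checked numerically; the dimensions below (M = 2 variables, lag p = 1) are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

M, p = 2, 1                                   # number of variables, lag length
K = 1 + M * p                                 # regressors per equation
X = np.concatenate(([1.0], rng.standard_normal(M * p)))  # [1, y'_{t-1}]
B = rng.standard_normal((M, K))               # [c_t, B_{1,t}] coefficient matrix

Z = np.kron(np.eye(M), X)                     # Z_t = I_M kron X_t, shape (M, M*K)
beta = B.reshape(-1)                          # vec(B'_t): rows of B stacked

# The compact form Z_t beta_t reproduces c_t + B_{1,t} y_{t-1}
assert np.allclose(Z @ beta, B @ X)
```

Stacking the coefficient matrix into a single vector in this way is what lets the TVP-VAR be treated as a standard state-space model with latent state β_t.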
4 Our proposed TVP-VAR-VI: Variational Inference
Native variational data inference needs to be adapted to model the state-space economic time-varying model, to incorporate observations and interpret the unobserved model parameter β_t. We propose a new methodology combining TVP-VAR with variational inference, formulating a novel algorithm called TVP-VAR-VI. Rewriting (4) compactly and adapting it to be in line with the TVP-VAR equations (5), (6), we get a cost function that reduces the analysis gap between time-distributed observations and the model solution:
J(β_{t_0}) = ½ (β_{t_0} - β^b)^T Q^{-1} (β_{t_0} - β^b) + ½ Σ_{i=0}^{N_w} (y_{t_i} - Z_{t_i} β_{t_i})^T R^{-1} (y_{t_i} - Z_{t_i} β_{t_i}),  with β_{t_i} = F^i β_{t_0}. (7)
We minimise the cost function in (7) so as to incorporate the observations y_t and the exogenous input Z_t to infer the latent parameter β_t. Further details can be found in previous work by Nadler et al. [2019b].

In (7), the following notation is used:

- N_w is the window size; forecasts from this time window are used for optimising the initial value.
- The 4D variational approach is performed over a time window [t, t + N_w], where β_{t_0} is the optimised initial beta value for the start of the window at t. Optimising (7) gives us β every N_w time steps.
- F, the model forecast operator, is the identity, so β is evolved as a random walk process. We later try to learn the F operator, i.e. the evolution of beta, using deep learning.
- β^b is the background/prior beta, updated to the analysis β^a as observations are ingested.
When N_w = 1, the cost function becomes that of 3D-Var (as the time window, the fourth dimension in 4D, no longer exists), and it minimises the function considering only the observations at that time. We propose the following algorithm combining TVP-VAR with the variational approach.
Note that there is no update step analogous to the Kalman filter's (no calculation of a Kalman gain matrix); beta is inferred by iterative minimisation of the cost. It is important to update the background beta β^b to the optimal β^a obtained from the minimisation and to use this updated value as the starting prior for the optimiser; convergence is slower if a random starting point is used.
4.1 Implementation details
A challenging part of implementing TVP-VAR-VI was finding a way to computationally minimise the function in (7). We used automatic differentiation provided by TensorFlow: by calculating the function gradient from the computational graph, it frees us from deriving the gradient by hand, which is often complex and infeasible.
We empirically observed that the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimiser (Nocedal and Wright [2006]) worked best for us: L-BFGS converged much faster (on average a 14% reduction in execution time) and outperformed prevalent first-order optimisers such as Adam and SGD, reducing the mean squared error (MSE) by five orders of magnitude. L-BFGS, being an effective quasi-Newton method, incorporates second-order information (by estimating the Hessian), which may explain its better convergence and its ability to represent complex structures effectively.
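As an illustration of the optimisation step (not the authors' TensorFlow implementation), the 3D-Var-style cost for a single window with a static beta can be minimised with SciPy's L-BFGS, supplying the analytic gradient; all dimensions, noise levels and the identity error covariances are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

n, T = 4, 30                                 # latent dimension, window length
Z = rng.standard_normal((T, n))              # regressors (exogenous input)
beta_true = rng.standard_normal(n)
y = Z @ beta_true + 0.1 * rng.standard_normal(T)

beta_b = np.zeros(n)                         # background / prior beta
Q_inv = np.eye(n)                            # prior precision (identity here)
# Observation error covariance R is also taken as the identity for simplicity.

def cost(beta):
    # Background misfit + observation misfit, as in the variational cost (7)
    db = beta - beta_b
    dy = y - Z @ beta
    return 0.5 * db @ Q_inv @ db + 0.5 * dy @ dy

def grad(beta):
    return Q_inv @ (beta - beta_b) - Z.T @ (y - Z @ beta)

res = minimize(cost, beta_b, jac=grad, method="L-BFGS-B")
beta_a = res.x                               # analysis beta after minimisation
```

In the paper's setting the gradient would come from automatic differentiation rather than the hand-derived expression used here.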
5 Experiments
Synthetic data
We start with well-behaved synthetic TVP-VAR data to evaluate the accuracy and scalability of the two approaches for big data, as it gives us ground truth for the latent parameters against which to validate our algorithms. We then use variational inference on the off-chain and on-chain blockchain dataset after performing benchmarking. The data generation is similar to that in Nadler et al. [2019a],
y_t = X_t β_t + ε_t,   β_t = β_{t-1} + η_t, (8)

where β_t represents the time-varying state variable, and the errors ε_t and η_t are assumed to be Gaussian white noise. The data matrix X_t is generated as a standardised i.i.d. process. Note that the forward model operator for β is the identity here. Figure 2 shows sample beta and forecast plots estimated by TVP-VAR-VI, i.e. our variational approach. We use the mean squared forecast error as our assessment metric. We implemented both algorithms, TVP-VAR-Kalman and our novel TVP-VAR-VI, using tensorflow-gpu, to ensure that any time difference observed was not due to differences in implementation details and technology used. Table 1 shows the results for both approaches.
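A minimal sketch of this data-generating process (8), with illustrative dimensions and noise levels:

```python
import numpy as np

def simulate_tvp_var(T=200, n=3, sigma_eta=0.05, sigma_eps=0.1, seed=0):
    """Simulate y_t = X_t beta_t + eps_t with random-walk beta_t, as in (8)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, n))          # standardised i.i.d. data matrix
    beta = np.zeros((T, n))
    y = np.zeros(T)
    b = rng.standard_normal(n)               # initial latent state
    for t in range(T):
        b = b + sigma_eta * rng.standard_normal(n)  # identity forward model
        beta[t] = b
        y[t] = X[t] @ b + sigma_eps * rng.standard_normal()
    return X, beta, y

X, beta, y = simulate_tvp_var()              # ground-truth beta for validation
```

Because the simulator returns the true beta path, any inference scheme can be scored directly against it, which is exactly what makes synthetic data useful for benchmarking here.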
We observe that as the problem size (dimension n) increases, the execution time for the Kalman approach increases by 10–40%. In contrast, for the variational approach, the increase is hardly 1% even for large problem sizes. With increasing dimension we see an increase in MSE for the variational approach, but the order of magnitude remains the same, justifying the slight MSE trade-off for the lower execution time.
This can also be explained by looking at worst-case time complexities. The time complexity of the matrix inversion in the Kalman gain is about O(n³) (where n is the observation dimension), whereas each function-minimisation step of L-BFGS costs O(mn), where m is the size of the Hessian history stored in memory, a constant hyperparameter for L-BFGS.
Another observation is that, comparing the experiments run for 4D-Var (window > 1; represented by the entry with a missing value for Kalman, window = 2) and 3D-Var (window = 1), 4D-Var has a longer execution time, suggesting greater difficulty in reaching the minimum. We believe the reason could be that adding future observations (as part of the summation over the window) to the cost function can lead to a non-convex function with many local minima.
5.1 Blockchain data
After experimenting with well-behaved synthetic TVP-VAR data, we empirically observed that as the dimension of the data increases, the 3D-Var algorithm becomes more scalable; hence we use the TVP-VAR-VI algorithm to infer the latent parameters β_t for the actual blockchain datasets. The dataset of interest is the ERC-20 tokens being traded on Ethereum. We deal with two main data sources: ERC-20 token on-chain data, and off-chain data from exchanges.
On-Chain Data: We identify on-chain data as trades on Ethereum: the amount of ERC-20 tokens traded between blockchain addresses, the timestamp of each transaction, the number of transactions, etc.
Off-Chain Data: We refer to off-chain data as opening/closing prices and trading volumes, obtained per exchange and ERC-20 token using https://minapi.cryptocompare.com.
Applying this methodology to economic data can help us infer the state of the system and how some features affect or drive others. The main features used for the blockchain dataset analysis were the amount sent to/received by exchanges, the number of transactions sent or received, the closing price of the token being analysed, and the volume from or to the token being traded.
For example, we visually interpret the effect of on-chain token inflow on returns, and vice versa, for the BNB token on the Binance exchange.
As Figure 4 shows, the dynamic of the on-chain effect on trading prices is erratic and chaotic, leading to the suspicion that the interaction is not structural. Prominent theories for traditional exchanges describe inflow as driving the price of the asset; our interpretation of the dynamic suggests this is very likely not the case for the Ethereum ecosystem.
However, we see a more prominent indication of the dynamic representing the effect of returns on the on-chain outflow/inflow of tokens when looking at Figure 4.
This implies a one-sided relationship: token inflows have an erratic and weak relation to returns, whereas returns have a structural impact on inflow. These results were also corroborated by experiments on other ERC-20 tokens such as OMG and TRX.
This is possible evidence that price and trading action attract attention and drive the inflow of tokens towards a particular exchange, or it could be due to arbitrageurs who benefit from price differences and cause the fluctuation of token flow (Nadler et al.). However, on-chain flows do not drive prices. It could be that over time the impact of on-chain activity has become more decoupled from price movements on exchanges. This can serve as an important indicator to policymakers, helping them create and shape future regulations concerning ERC-20 tokens. It shows, for example, that price manipulation is unlikely to be driven by activities on the chain; thus, regulation can focus on exchange-specific risk factors such as order-book fraud or sentiment-driven actions.

6 Improving state-space modelling
The state-space model stated in (6) assumes that an approximation of the beta evolution function F is known. For the complex dynamics inherent in these economic systems it is hard to know F, and with increasing dimensions the nonlinearities become stronger; thus the linear assumption of the forward model fails (Pawar et al. [2020]).
TVP-VAR-VI can update the prior estimate of the latent parameter when observations are available; however, the state-space model fails for out-of-sample predictions.
As seen in the plots above, there are fluctuations in the values of β that might be driven not only by the previous value and an innovation, but also by a nonlinear combination of some lagged values of β.
Running TVP-VAR-VI provides us with a complete time series of the latent parameter β_t, from which we can obtain a surrogate forward model using machine learning tools.
Learning the evolution of the unobserved parameter is challenging: we are trying to predict a time series with high volatility and chaotic behaviour. Moreover, multivariate time-series forecasting is an ongoing research challenge, as it needs to appropriately leverage the dynamics of multiple variables (Lai et al. [2018]). Hence, different architectures were tried before settling on the final model.
6.1 TVP-VAR-Net Model
Figure 5 presents an overview of our proposed architecture, TVP-VAR-Net, for modelling the latent parameter dynamics of blockchain data.
The architecture is an ensemble of two neural networks, an LSTM and an autoregressive dense network, with their outputs combined. LSTMs have been successful due to their ability to capture and effectively exploit temporal dependencies, taking history into account for future state prediction. However, as noted by Lai et al. [2018], an LSTM can suffer from the scale of its outputs not being sensitive to the scale of its inputs. In the evolution of our unobserved parameter we see constant, non-periodic changes, which result in poor forecasts from a model with only an LSTM. To address this, similar in spirit to LSTNet (Lai et al. [2018]), we combine the final prediction of the LSTM with a linear part; the model thus benefits from a nonlinear component capturing recurrent patterns through the LSTM and a linear autoregressive component. By plotting the predictions on the training and test sets using only the LSTM (see Figures 6–7) and using TVP-VAR-Net (see Figures 8–9), we can see how crucial the autoregressive component is.
TVP-VAR-Net generalises well, working appropriately for both kinds of evolution, chaotic and structured. Our architecture was able to adapt and produce satisfactory forecasts for two-step-ahead predictions.
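The additive design can be sketched as follows; for brevity a fixed random-feature map stands in for the LSTM, so this is only an illustration of combining a nonlinear head with a linear autoregressive head, not the TVP-VAR-Net architecture itself. The series, lag order and feature width are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a latent-parameter series recovered by TVP-VAR-VI
T, p = 300, 5                                # series length, autoregressive lags
t = np.arange(T)
series = np.sin(0.1 * t) + 0.1 * rng.standard_normal(T)

# Lagged design matrix: predict series[t] from the previous p values
X = np.stack([series[i:T - p + i] for i in range(p)], axis=1)
y = series[p:]

# Linear autoregressive head, fitted by least squares
w_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
linear_pred = X @ w_lin

# Nonlinear head fitted on the residual (random tanh features, not an LSTM)
W1 = rng.standard_normal((p, 32))
H = np.tanh(X @ W1)
w_nl, *_ = np.linalg.lstsq(H, y - linear_pred, rcond=None)

# Combined prediction: linear component plus nonlinear component
pred = linear_pred + H @ w_nl
mse = np.mean((pred - y) ** 2)
```

The point of the decomposition is that the linear head anchors the output scale while the nonlinear head picks up whatever structure the linear part misses, mirroring the motivation for the LSTM-plus-autoregressive ensemble.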
It is important to note that the combination of TVP-VAR-VI and TVP-VAR-Net is crucial. TVP-VAR-VI provides the set of latent parameters used to train the model, whereas TVP-VAR-Net provides the out-of-sample forecast. As new observations become available, TVP-VAR-VI can be re-run to obtain updated values to fine-tune TVP-VAR-Net, giving us the ability to apply continuous corrections to our machine learning model.
7 Conclusion, future work and social impact
The findings of this study can be understood as further validation of treating the cryptocurrency system as an econophysical system. By ingesting data from various on-chain transactions and trading information, we perform Bayesian inference to infer the latent time series, which can significantly help in analysing the dynamics of the economic system. We built a scalable novel TVP-VAR-VI algorithm; compared with the computationally expensive Kalman filter approach, we were able to reduce execution time in the range of 10–40% as dimensions increased.
Our interpretation and results for the unobserved state are broadly consistent with conclusions drawn by Nadler et al. The most prominent finding was that the direction of interaction between on-chain token transactions/flows and off-chain trading action is one-sided: token inflow's effect on trading actions is weak and chaotic, whereas the reverse effect is highly persistent. Trading price action has a considerable impact in driving the inflow of tokens to an exchange. We also observed that changes in trading volume had a structural effect on returns.
Another significant contribution is challenging the latent state-space evolution and replacing it with a more nuanced, nonlinear neural network model. The TVP-VAR-Net architecture models both kinds of latent dynamics, chaotic and structural. The relevance of a surrogate forward model for the latent parameter evolution is supported by the fact that out-of-sample forecasting faces many hurdles with the simplistic state-space model; TVP-VAR-Net makes it possible to forecast the latent parameters effectively for several future time steps.
One direction for future work is further analysis of these dynamics on highly granular order-book data. Ideally, these experiments can be replicated for any system that can be modelled as a TVP-VAR model, to understand its latent dynamics in a scalable fashion.
Social Impact Insight into the time-varying dynamical interrelationship between token flow, price movements and volume activity can give much-needed information about price and market dynamics in the blockchain ecosystem, which has been largely unexplored and unregulated. We believe that traditional financial market rules do not directly apply to a peer-to-peer financial market; our work can thus pave the way for blockchain analysts and policymakers to understand which factors might or might not affect such movements. This in turn can help in avoiding fraudulent activities or arbitrage. Our work can be extended to any econophysical system modelled using TVP-VAR equations.
References
Asch et al. [2016]. Data Assimilation: Methods, Algorithms, and Applications. Society for Industrial and Applied Mathematics.
Bannister. Variational data assimilation background and methods. University of Reading.
Blockgeeks. What is the 0x project? The most comprehensive guide ever written.
Böhme et al. [2015]. Bitcoin: economics, technology, and governance. Journal of Economic Perspectives 29(2), pp. 213–238.
Blockgeeks [2020]. Crypto news, pricing, and research.
Frankenfield [2020]. Bitcoin exchange definition. Investopedia.
Haldane [2018]. Will big data keep its promise? Speech at the Bank of England Data Analytics for Finance and Macro Research Centre, King's Business School.
Hotz-Behofsits et al. [2018a]. Predicting cryptocurrencies using sparse non-Gaussian state space models. Journal of Forecasting 37(6), pp. 627–640.
Hotz-Behofsits et al. [2018b]. Predicting cryptocurrencies using sparse non-Gaussian state space models. Journal of Forecasting 37(6), pp. 627–640.
Kalman [1960]. On the general theory of control systems. In Proceedings of the First International Conference on Automatic Control, Moscow, USSR, pp. 481–492.
Lai et al. [2018]. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 95–104.
Nadler et al. [2019a]. A Scalable Approach to Econometric Inference.
Nadler et al. [2019b]. Data assimilation for parameter estimation in economic modelling. In 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 649–656.
Nadler et al. An Econophysical Analysis of the Blockchain Ecosystem.
Nakajima and others [2011]. Time-varying parameter VAR model with stochastic volatility: an overview of methodology and empirical applications. Technical report, Institute for Monetary and Economic Studies, Bank of Japan.
Nocedal and Wright [2006]. Numerical Optimization, second edition.
Pawar et al. [2020]. Long short-term memory embedded nudging schemes for nonlinear data assimilation of geophysical flows. arXiv preprint arXiv:2005.11296.
Wood and others. Ethereum: a secure decentralised generalised transaction ledger.