Stochastic Variational Methods in Generalized Hidden Semi-Markov Models to Characterize Functionality in Random Heteropolymers
Recent years have seen substantial advances in the development of biofunctional materials using synthetic polymers. Because sequence-functionality relations remain elusive for most biomaterials, researchers have sought more effective tools and analysis methods. In this study, statistical models are used to study sequence features of the recently reported random heteropolymers (RHPs), which transport protons across lipid bilayers selectively and rapidly, like natural proton channels. We used the probabilistic graphical model framework to develop a generalized hidden semi-Markov model (GHSMM-RHP) that extracts the function-determining sequence features, including the transmembrane segments within a chain and the sequence heterogeneity among different chains. We developed stochastic variational methods for efficient parameter estimation and prediction, and empirically compared the computational performance of a Bayesian framework (stochastic variational Bayes) with that of a frequentist framework (stochastic variational expectation-maximization), two approaches that have previously been studied separately. The results on real data agree well with the laboratory experiments and suggest GHSMM-RHP's potential for predicting protein-like behavior at the polymer-chain level.
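As a rough illustration of the hidden semi-Markov idea mentioned in the abstract, and not the GHSMM-RHP model itself, the following Python sketch samples a chain whose hidden segments have explicit lengths. The two states ("membrane" and "solvent"), the two-letter monomer alphabet ("H" and "P"), and all probabilities are hypothetical placeholders chosen for illustration; none of them come from the study.

```python
import random

# Hypothetical toy parameters (assumptions for illustration only).
STATES = ["membrane", "solvent"]
# Transition probabilities between segments; no self-transitions,
# because segment length is modeled by the duration distribution.
TRANS = {"membrane": {"solvent": 1.0}, "solvent": {"membrane": 1.0}}
# Explicit duration distribution per state: {segment length: probability}.
DUR = {"membrane": {4: 0.5, 6: 0.5}, "solvent": {2: 0.5, 3: 0.5}}
# Emission distribution per state over a toy monomer alphabet.
EMIT = {"membrane": {"H": 0.8, "P": 0.2}, "solvent": {"H": 0.2, "P": 0.8}}


def sample_hsmm(length, states, trans, dur, emit, seed=0):
    """Sample (labels, sequence) of the given length from a toy HSMM."""
    rng = random.Random(seed)
    labels, seq = [], []
    state = rng.choice(states)  # uniform initial state (an assumption)
    while len(seq) < length:
        # Draw an explicit segment duration; this step is what
        # distinguishes a semi-Markov model from an ordinary HMM.
        d_values, d_weights = zip(*dur[state].items())
        d = rng.choices(d_values, weights=d_weights)[0]
        d = min(d, length - len(seq))  # truncate the final segment
        # Emit d monomers independently from the state's distribution.
        symbols, e_weights = zip(*emit[state].items())
        seq.extend(rng.choices(symbols, weights=e_weights, k=d))
        labels.extend([state] * d)
        # Move to the next segment's state.
        nxt, t_weights = zip(*trans[state].items())
        state = rng.choices(nxt, weights=t_weights)[0]
    return labels, seq


if __name__ == "__main__":
    labels, seq = sample_hsmm(20, STATES, TRANS, DUR, EMIT, seed=1)
    print("".join(seq))
    print("".join("M" if s == "membrane" else "-" for s in labels))
```

In this sketch, inference would run in the opposite direction: given only the monomer sequence, estimate the segment labels and the parameters. That is the task the abstract assigns to its stochastic variational methods, which are not implemented here.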