
Bayesian dynamic modeling and monitoring of network flows

Published online by Cambridge University Press: 23 September 2019

Xi Chen*
Affiliation:
LinkedIn Corporation, Sunnyvale, CA 94085, USA
David Banks
Affiliation:
Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA (e-mails: David.Banks@duke.edu, Mike.West@duke.edu)
Mike West
Affiliation:
Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA (e-mails: David.Banks@duke.edu, Mike.West@duke.edu)
*Corresponding author. Email: chenxi199008@gmail.com

Abstract

In the context of a motivating study of dynamic network flow data on a large-scale e-commerce website, we develop Bayesian models for online/sequential analysis for monitoring and adapting to changes reflected in node–node traffic. For large-scale networks, we customize core Bayesian time series analysis methods using dynamic generalized linear models (DGLMs). These are integrated into the context of multivariate networks using the concept of decouple/recouple that was recently introduced in multivariate time series. This method enables flexible dynamic modeling of flows on large-scale networks and exploitation of partial parallelization of analysis while maintaining coherence with an over-arching multivariate dynamic flow model. This approach is anchored in a case study on Internet data, with flows of visitors to a commercial news website defining a long time series of node–node counts on over 56,000 node pairs. Central questions include characterizing inherent stochasticity in traffic patterns, understanding node–node interactions, adapting to dynamic changes in flows and allowing for sensitive monitoring to flag anomalies. The methodology of dynamic network DGLMs applies to many dynamic network flow studies.
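As a rough illustration of the sequential monitoring idea for a single node–node count series, the sketch below uses a conjugate gamma-Poisson model with a discount factor. This is a deliberately simplified stand-in, not the dynamic network DGLM of the article: the function name `discount_poisson_filter`, the discount `delta`, the prior parameters `a0` and `b0`, and the `z_threshold` flagging rule are all assumptions introduced here for illustration.

```python
def discount_poisson_filter(counts, delta=0.9, a0=1.0, b0=1.0, z_threshold=3.0):
    """Sequentially filter a count series with a discounted gamma-Poisson model.

    Hypothetical helper for illustration only. The rate has a Gamma(a, b)
    distribution (shape a, rate b). Returns one tuple per observation:
    (one-step forecast mean, forecast standard deviation, flagged).
    """
    a, b = a0, b0
    results = []
    for y in counts:
        # Evolution step: discounting keeps the prior mean a/b unchanged
        # but inflates its variance, so the model can adapt to change.
        a_prior, b_prior = delta * a, delta * b

        # One-step-ahead forecast is negative binomial with
        # mean a/b and variance a(b + 1)/b^2.
        mean = a_prior / b_prior
        var = a_prior * (b_prior + 1.0) / b_prior ** 2
        sd = var ** 0.5

        # Simple monitor (an assumption): flag counts far from the forecast.
        flagged = abs(y - mean) > z_threshold * sd
        results.append((mean, sd, flagged))

        # Conjugate update with the observed count.
        a, b = a_prior + y, b_prior + 1.0
    return results


if __name__ == "__main__":
    # Made-up counts for one node pair, with a burst in the sixth period.
    series = [10, 12, 9, 11, 10, 60, 11]
    for t, (m, s, flag) in enumerate(
        discount_poisson_filter(series, delta=0.9, a0=10.0, b0=1.0), start=1
    ):
        print(f"t={t} forecast={m:.2f} sd={s:.2f} flagged={flag}")
```

Because each node pair is filtered independently here, many such series could be processed in parallel; any coupling across node pairs, which the article's decouple/recouple strategy addresses, is outside the scope of this sketch.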

Type
Original Article
Copyright
© Cambridge University Press 2019 
