
9 - Universal Clustering

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, University College London
Yonina C. Eldar, Weizmann Institute of Science, Israel

Summary

Clustering is a general term for techniques that, given a set of objects, aim to group those that are closer to one another than to the rest, according to a chosen notion of closeness. It is an unsupervised-learning problem, since the objects are not externally labeled by category. Much effort has been expended on finding natural mathematical definitions of closeness and on developing and evaluating algorithms in those terms. Many have argued that there is no domain-independent mathematical notion of similarity, only context-dependent ones; categories are perhaps natural in the sense that people can evaluate them when they see them. Some have dismissed unsupervised learning in favor of supervised learning, arguing that it is not a powerful natural phenomenon. Yet most learning is unsupervised: we largely learn how to think in categories by observing the world in its unlabeled state. Drawing on universal information theory, we ask whether there are universal approaches to unsupervised clustering. In particular, we consider instances in which the ground-truth clusters are defined by the unknown statistics governing the data to be clustered.
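As a concrete illustration of clusters being defined by unknown statistics, here is a minimal, hypothetical Python sketch (not the chapter's algorithm): it groups symbol sequences by comparing their empirical distributions with a plug-in Jensen-Shannon divergence and a single-linkage threshold. The function names, the choice of divergence, and the threshold value are assumptions made purely for illustration.

# Hypothetical toy sketch: cluster symbol sequences whose ground-truth groups
# are determined by the unknown source distributions, using only the samples.
import itertools
import math
import random
from collections import Counter

def empirical_pmf(seq, alphabet):
    # Plug-in estimate of the source distribution from one observed sequence.
    counts = Counter(seq)
    n = len(seq)
    return [counts.get(a, 0) / n for a in alphabet]

def js_divergence(p, q):
    # Jensen-Shannon divergence (base 2): a bounded, symmetric proxy for
    # statistical closeness between two estimated pmfs.
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cluster_by_statistics(sequences, alphabet, threshold=0.1):
    # Single-linkage grouping: sequences whose empirical pmfs are within
    # `threshold` (an assumed tuning parameter) receive the same label.
    pmfs = [empirical_pmf(s, alphabet) for s in sequences]
    labels = list(range(len(sequences)))
    for i, j in itertools.combinations(range(len(sequences)), 2):
        if js_divergence(pmfs[i], pmfs[j]) < threshold:
            old, new = labels[j], labels[i]
            labels = [new if lab == old else lab for lab in labels]
    return labels

if __name__ == "__main__":
    random.seed(0)
    fair_a = [random.choice("01") for _ in range(2000)]
    fair_b = [random.choice("01") for _ in range(2000)]
    biased = [random.choices("01", weights=[0.9, 0.1])[0] for _ in range(2000)]
    # With these sample sizes the two fair-coin sequences should share a label,
    # while the biased sequence should be assigned its own cluster.
    print(cluster_by_statistics([fair_a, fair_b, biased], alphabet="01"))

The point of the toy example is only that cluster membership is a property of the generating distributions, which the procedure never observes directly; it must be inferred from the samples, which is the universal setting this chapter studies.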

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021

