Strategies for EELS Data Analysis. Introducing UMAP and HDBSCAN for Dimensionality Reduction and Clustering

Published online by Cambridge University Press: 22 November 2021

Javier Blanco-Portals*
Affiliation:
LENS-MIND, Department of Electronics and Biomedical Engineering, Universitat de Barcelona, 08028 Barcelona, Spain; Institute of Nanoscience and Nanotechnology (IN2UB), Universitat de Barcelona, 08028 Barcelona, Spain
Francesca Peiró
Affiliation:
LENS-MIND, Department of Electronics and Biomedical Engineering, Universitat de Barcelona, 08028 Barcelona, Spain; Institute of Nanoscience and Nanotechnology (IN2UB), Universitat de Barcelona, 08028 Barcelona, Spain
Sònia Estradé
Affiliation:
LENS-MIND, Department of Electronics and Biomedical Engineering, Universitat de Barcelona, 08028 Barcelona, Spain; Institute of Nanoscience and Nanotechnology (IN2UB), Universitat de Barcelona, 08028 Barcelona, Spain
*Corresponding author: Javier Blanco-Portals, E-mail: [email protected]

Abstract

Hierarchical density-based spatial clustering of applications with noise (HDBSCAN) and uniform manifold approximation and projection (UMAP), two new state-of-the-art algorithms for clustering analysis and dimensionality reduction, respectively, are proposed for the segmentation of core-loss electron energy loss spectroscopy (EELS) spectrum images. Using a known synthetic dataset, the performance of UMAP and HDBSCAN is systematically compared with that of the other clustering analysis approaches applied to EELS in the literature, and the new approaches give better results. UMAP and HDBSCAN, as well as the triple combination nonnegative matrix factorization (NMF)–UMAP–HDBSCAN, are then showcased on a real experimental dataset from a core–shell nanoparticle of iron and manganese oxides. The results indicate that the complementary use of different combinations may be beneficial in a real-case scenario to obtain a complete picture, because different algorithms highlight different aspects of the dataset studied.
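
The segmentation workflow described in the abstract (flatten the spectrum image, reduce its dimensionality, cluster the embedding, map the labels back onto the image) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, all parameter values, and the synthetic two-phase image are assumptions made for the example. `make_umap_hdbscan_steps` relies on the `umap-learn`, `hdbscan`, and `scikit-learn` packages and imports them lazily; the runnable demo instead uses simple NumPy stand-ins (PCA and a sign threshold), which are not UMAP or HDBSCAN and only exercise the surrounding plumbing.

```python
import numpy as np


def segment_spectrum_image(si, reduce_fn, cluster_fn):
    """Segment a spectrum image of shape (ny, nx, n_channels).

    reduce_fn maps (n_pixels, n_channels) -> (n_pixels, n_dims);
    cluster_fn maps that embedding -> one integer label per pixel.
    """
    ny, nx, n_channels = si.shape
    spectra = si.reshape(ny * nx, n_channels)  # one spectrum per row
    embedding = reduce_fn(spectra)
    labels = np.asarray(cluster_fn(embedding))
    return labels.reshape(ny, nx)  # back to image coordinates


def make_umap_hdbscan_steps(n_neighbors=15, min_cluster_size=20, nmf_components=None):
    """Build the two steps from umap-learn and hdbscan (imported lazily).

    All parameter values are illustrative assumptions, not the paper's settings.
    """
    import umap  # provided by the umap-learn package
    import hdbscan

    def reduce_fn(spectra):
        if nmf_components is not None:
            # Optional NMF pre-step; requires non-negative spectra.
            from sklearn.decomposition import NMF
            spectra = NMF(n_components=nmf_components, max_iter=500).fit_transform(spectra)
        return umap.UMAP(n_components=2, n_neighbors=n_neighbors).fit_transform(spectra)

    def cluster_fn(embedding):
        # HDBSCAN labels points it treats as noise with -1.
        return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)

    return reduce_fn, cluster_fn


def pca_reduce(spectra, n_dims=2):
    """NumPy-only stand-in for the reduction step (PCA via SVD)."""
    centered = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_dims] * s[:n_dims]


def sign_cluster(embedding):
    """Toy stand-in for the clustering step: split on the first component."""
    return (embedding[:, 0] > 0).astype(int)


def demo(seed=0):
    """Segment a hypothetical 16x16 core-shell image with 64 energy channels."""
    rng = np.random.default_rng(seed)
    energy = np.arange(64)
    core = np.exp(-0.5 * ((energy - 20) / 3.0) ** 2)   # made-up edge position
    shell = np.exp(-0.5 * ((energy - 40) / 3.0) ** 2)  # made-up edge position
    truth = np.zeros((16, 16), dtype=int)
    truth[4:12, 4:12] = 1  # square "core" surrounded by "shell"
    si = np.where(truth[..., None] == 1, core, shell)
    si = si + rng.normal(scale=0.05, size=si.shape)
    label_map = segment_spectrum_image(si, pca_reduce, sign_cluster)
    return label_map, truth
```

With the optional packages installed, `segment_spectrum_image(si, *make_umap_hdbscan_steps())` would run the UMAP and HDBSCAN combination, and passing `nmf_components=k` would prepend an NMF step, mirroring the triple combination mentioned above. Cluster labels are arbitrary integers, so any comparison with a ground truth has to allow for label permutation.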

Type
Software and Instrumentation
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press on behalf of the Microscopy Society of America

Supplementary material: Blanco-Portals et al. supplementary material (File, 26.9 MB)