From smartphones to tablets to laptops and even to supercomputers, data is being collected and produced. With so many bits and bytes, data analytics and data mining play unprecedented roles in computing. Linear algebra is an important tool in this field. In this chapter, we touch on some tools in data mining that use linear algebra, many built on ideas presented earlier in the book.
Before we start, how much data is a lot of data? Let's look to Facebook. What were you doing 15 minutes ago? In that time, the number of photos uploaded to Facebook is greater than the number of photographs stored in the New York Public Library photo archives. Think about the amount of data produced in the past two hours or since yesterday or last week. Even more impressive is how Facebook can organize the data so it can appear quickly in your news feed.
Slice and Dice
In Section 8.3, we looked at clustering and saw how to break data into two groups using an eigenvector. As we saw in that section, it can be helpful, and sometimes necessary for larger networks, to plot the adjacency matrix of a graph. In Figure 11.1, we see an example where a black square is placed where there is a nonzero entry in the matrix and a white square is placed otherwise.
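To make the idea of plotting an adjacency matrix concrete, here is a minimal sketch in Python. It prints `#` where the matrix has a nonzero entry and `.` otherwise, a text stand-in for the black and white squares of the figure. The small graph and the helper names `adjacency_matrix` and `render` are hypothetical, chosen only for illustration.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def render(A, filled="#", empty="."):
    """Return a text picture: `filled` for nonzero entries, `empty` otherwise."""
    return "\n".join(
        "".join(filled if x else empty for x in row) for row in A
    )


# Hypothetical 5-node graph: a triangle 0-1-2 plus a separate edge 3-4.
edges = [(0, 1), (0, 2), (1, 2), (3, 4)]
picture = render(adjacency_matrix(5, edges))
print(picture)
```

Even in this tiny example, the triangle shows up as a dense block in the upper left of the picture and the lone edge as a small block in the lower right.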
The goal of clustering is to find maximally intraconnected components and minimally interconnected components. In a plot of a matrix, this results in darker square regions. We saw this for a network of about fifty Facebook friends in Figure 8.5 (b). Now, let's turn to an even larger network. We'll analyze the graph of approximately 500 of my friends on Facebook. We see the adjacency matrix visualized in Figure 11.2 (a). Here we see little organization or pattern of connectivity among my friends. If we partition the group into two clusters using the Fiedler method outlined in Section 8.3, and then reorder the rows and columns so that the members of each cluster appear together, we see the matrix in Figure 11.2 (b).
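The partition-and-reorder step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the computation behind the figures. We assume the Fiedler method splits vertices by the sign of their entry in the Fiedler vector, the eigenvector for the second smallest eigenvalue of the Laplacian L = D - A, and that the graph is connected. To keep the sketch free of external libraries, the eigenvector is approximated by power iteration on a shifted Laplacian; in practice a library eigensolver would be the usual choice. The function names and the 8-vertex graph are hypothetical.

```python
import random


def fiedler_vector(adj, iters=500, seed=0):
    """Approximate the Fiedler vector of a connected, undirected graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2 * max(deg)  # at least as large as any Laplacian eigenvalue
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        # w = (shift*I - L) v, where L = D - A
        w = [(shift - deg[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        # Remove the all-ones direction (eigenvalue 0 of L), then normalize.
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def fiedler_reorder(adj):
    """Split vertices by sign of the Fiedler vector and regroup the matrix."""
    v = fiedler_vector(adj)
    neg = [i for i, x in enumerate(v) if x < 0]
    pos = [i for i, x in enumerate(v) if x >= 0]
    order = neg + pos
    reordered = [[adj[i][j] for j in order] for i in order]
    return neg, pos, reordered


# Hypothetical graph: two groups of four mutual friends with scrambled
# labels, joined by a single friendship between vertices 6 and 1.
group_a, group_b = [0, 3, 4, 6], [1, 2, 5, 7]
edges = [(u, v) for g in (group_a, group_b) for u in g for v in g if u < v]
edges.append((6, 1))
adj = [[0] * 8 for _ in range(8)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

neg, pos, reordered = fiedler_reorder(adj)
print(sorted(neg), sorted(pos))
```

Which group lands first depends on the arbitrary sign of the eigenvector, but the two clusters themselves do not. After reordering, each group of friends fills a dense diagonal block, and the single cross-group friendship is the only entry in the off-diagonal blocks.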