Published online by Cambridge University Press: 09 November 2021
Cleansing personal data
We have focused a lot on classification and normalisation, but what about other data cleansing such as supplier or customer addresses? I have seen first hand how many duplicates can occur if you’re working with multiple systems.
I had a client with nine different sources of data, which when merged totalled 2.8 million rows of customer information. When I cleansed this, it reduced to 1.3 million rows of data. I certainly didn't do this in Excel, and there are a number of tools available that can help, but if you’re on a budget or need to get something done relatively quickly, I can show you how to achieve this using Excel.
It would be hard to cover all the different formats that addresses can come in. Quite often you will see a mixture of formats, such as everything in one cell or spread across many cells or with the information in the wrong column. I’ll be focusing on names, addresses and e-mails for this exercise, but you will most likely have telephone numbers as well.
I’ll share as an example what I’m going to call my sample customer list (Figure 5.1 on the next page). It's got a name and address and an e-mail column and there is a mixture of upper and lower case data, as well as sentence case. This is not unusual and I see this in a lot of files. Again, it's down to creating some standards and making sure everyone applies them.
As all the formulas used for this process are in previous chapters, there will be less illustrations in this chapter, but feel free to refer back to Chapters 1, 2 and 4 if you’d like some visual support.
Cleansing names in Excel
There are a number of different ways to approach this and it very much depends on what you need for your business. In my sample file, the names are all in one column. Is that how you want them to stay? Or do you want to split them out into first name and last name? This is most likely something that is decided elsewhere in the business, but it doesn't mean you can't use their standards and formatting and be consistent.
To save this book to your Kindle, first ensure [email protected] is added to your Approved Personal Document E-mail List under your Personal Document Settings on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part of your Kindle email address below. Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations. ‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi. ‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
Find out more about the Kindle Personal Document Service.
To save content items to your account, please confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account. Find out more about saving content to Dropbox.
To save content items to your account, please confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account. Find out more about saving content to Google Drive.