This post expands on an excellent piece titled “Data Cleaning IS Analysis, Not Grunt Work” by Randy Au, here in the context of marketing analytics. In his article, Randy talks about an often undervalued aspect of any analytics journey: data cleaning. In most literature on this topic, data cleaning is seen as a necessary but menial step required to enable more valuable analysis down the line. What this fails to capture is that every decision made about data along the pipeline affects the kind of analysis you will be able to do, and the results you will get, in later stages, whether you are mining for patterns in the data or refining your queries. In this blog post, we’ll expand on this topic in the context of marketing data and look at how you can best set yourself up for success when analyzing data in later stages.
What is data cleaning?
The purpose of data cleaning is to ensure that the data set you are reporting on is of high integrity. This means that your data sets are properly mapped, standardized and normalized, deduplicated, and quality checked on a regular basis. As you can see, many (if not all) of the tasks involved in data cleaning require the user to understand the data set that they are working with, to be able to perform the necessary tasks.
Understanding and working with the data in this way is a gigantic task for any marketer or BI specialist, even more so when you are creating unified data sets based on different marketing and analytics platforms.
For most organizations then, data cleaning should be a collaborative task, where all transformation decisions are readily available for anyone to quality check and amend as needed.
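To make the cleaning tasks above concrete, here is a minimal Python sketch of standardizing and deduplicating report rows. The field names (`date`, `campaign`, `cost`) and the choice of deduplication key are assumptions made for illustration, not a description of any particular platform's export.

```python
def standardize(row):
    """Trim whitespace, lowercase the campaign name, and parse cost as a number."""
    return {
        "date": row["date"].strip(),
        "campaign": row["campaign"].strip().lower(),
        "cost": round(float(row["cost"]), 2),
    }

def deduplicate(rows):
    """Keep the first row seen for each (date, campaign) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["date"], row["campaign"])  # assumed definition of "the same row"
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical rows: the first two only differ in formatting.
rows = [
    {"date": "2024-01-01", "campaign": "SE_Brand ", "cost": "10"},
    {"date": "2024-01-01", "campaign": "se_brand", "cost": "10.0"},
    {"date": "2024-01-02", "campaign": "se_brand", "cost": "7.5"},
]
clean = deduplicate([standardize(r) for r in rows])
```

Note that deciding which fields make two rows "the same" is already an analytical choice: a different key would keep or drop different rows.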
Manual errors play a role as well
Another aspect of data cleaning relevant to marketing analytics is that, even in a time of increased automation on the buying side, many reporting values are still generated from manual input. Take UTM parameters, for instance. For most ad platforms, these are still inserted manually or applied automatically based on a template that the buyer has supplied. But what if someone inserts these values incorrectly, or your organization decides to change how many variables it includes in the campaign name?
Any change in the top layer has trickle-down effects further down in your reporting pipeline, and it’s critical to be prepared for potential changes by having a robust, and easily adjusted, logic in your transformation layer.
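One way to prepare for manual input errors is to check names against the agreed convention and flag anything that does not parse. The sketch below assumes a hypothetical campaign naming convention of `<country>_<objective>_<season>`; your own convention and field names will differ.

```python
import re
from typing import Optional

# Hypothetical convention: two-letter country, objective, season, joined by underscores.
CAMPAIGN_PATTERN = re.compile(
    r"^(?P<country>[a-z]{2})_(?P<objective>[a-z]+)_(?P<season>[a-z]+)$"
)

def parse_campaign(name: str) -> Optional[dict]:
    """Return the parsed fields, or None if the name breaks the convention."""
    match = CAMPAIGN_PATTERN.match(name.strip().lower())
    return match.groupdict() if match else None

names = ["SE_conversion_summer", "se-conversion-summer", " de_awareness_winter "]

# Names that fail to parse are surfaced for review instead of silently dropped.
unparsed = [n for n in names if parse_campaign(n) is None]
```

Here the second name uses hyphens instead of underscores, so it ends up in `unparsed` and can be corrected or handled with an additional rule.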
Country, Country or Country?
One example of how data cleaning is a form of analysis can be found when you are mapping your advertising spend and subsequent results to a specific country. At a glance, this might not seem like a task that should require some in-depth knowledge of the individual platforms or how your organization structures its marketing efforts. However, there are many ways to attribute a value for “country,” and whatever decision(s) you make should be clear to the person who consumes the data further down the analysis pipeline.
- When an advertising platform supplies a country-dimension in its reporting, it usually refers to where the user that saw the ad was located.
- As an advertiser, you might also indicate in a campaign or ad name which country it targeted and extract this value from there.
- Alternatively, you might have an advertising account dedicated to a specific country which also lets you derive a country definition from the reporting data that originates from a particular ad account.
As you can see, you might well end up with 3 different values for something like “country” for a single row of reporting data, and they would all potentially be accurate depending on what you are trying to convey. This example highlights the importance of proper attention to data cleaning and making these choices clear to whoever consumes the data in later steps.
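The three definitions above can be kept side by side as separately named fields, so the consumer of the data sees which one they are using. This is a rough Python sketch; the account mapping, the field names, and the assumption that the campaign name starts with a two-letter country code are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReportRow:
    platform_country: Optional[str]  # where the platform reports the viewer was located
    campaign_name: str
    ad_account: str

# Hypothetical mapping from ad account to the market it is dedicated to.
ACCOUNT_COUNTRY = {"acct-nordics-se": "SE", "acct-dach-de": "DE"}

def country_views(row: ReportRow) -> dict:
    """Return all three country definitions under explicit, distinct names."""
    prefix = row.campaign_name.split("_")[0]
    targeted = prefix.upper() if len(prefix) == 2 else None
    return {
        "country_user_location": row.platform_country,
        "country_targeted": targeted,
        "country_account": ACCOUNT_COUNTRY.get(row.ad_account),
    }

row = ReportRow(
    platform_country="NO",
    campaign_name="se_conversion_summer",
    ad_account="acct-nordics-se",
)
views = country_views(row)
```

In this example a campaign targeted at Sweden was seen by a user in Norway, so the user-location value differs from the other two while all three remain accurate.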
Funnel’s approach to data cleaning
Now that we have established that data cleaning is an essential part of any data pipeline, let’s look at some of the steps Funnel takes to facilitate this very important task:
Storing the original data
When you create new dimensions and metrics in Funnel, you never overwrite the original underlying data sets. This means that if your definitions change, or errors are detected, you are always able to edit existing fields, or create new ones, without having to re-download the entire data set. Re-downloading is sometimes not even possible because of the original platform's data retention rules.
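The general principle can be illustrated with a short Python sketch. This is not Funnel's implementation, only an illustration under assumed field names: the raw rows are read-only, and every cleaned version is derived from them by applying a set of rules.

```python
from types import MappingProxyType

# Raw rows are wrapped in read-only views so that rules cannot modify them.
raw_rows = (
    MappingProxyType({"campaign": " SE_Conversion_Summer ", "cost": "12.50"}),
)

def derive(row, rules):
    """Copy the raw row and add one derived field per rule."""
    derived = dict(row)
    for field, rule in rules.items():
        derived[field] = rule(row)
    return derived

# Hypothetical rule sets: the second adds a field without touching the raw data.
rules_v1 = {"country": lambda r: r["campaign"].strip()[:2].upper()}
rules_v2 = {**rules_v1, "cost_eur": lambda r: float(r["cost"])}

cleaned_v1 = [derive(r, rules_v1) for r in raw_rows]
cleaned_v2 = [derive(r, rules_v2) for r in raw_rows]
```

Because both cleaned versions are computed from the same untouched `raw_rows`, changing a definition only means re-running the rules, not fetching the data again.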
Business user-friendly UI
Data cleaning involves changes both big (removing bad observations, fixing errors, filling in missing values) and small (removing trailing whitespace, normalizing date formats). Funnel’s proprietary data model takes care of many of these tasks and more right “out of the box.” On top of this, you have a powerful transformation layer where users are able to apply additional business logic without having to write any code.
Ability to easily explore your data
The mapping, standardization, and normalization of large data sets is not an easy task, so the ability to test your transformation rules prior to exporting the data is very important. With Funnel’s Data Explorer, you can easily query your custom metrics and dimensions to make sure that they represent what you intended before exporting them onward.
Different dates - Different rules
As discussed earlier, many fields used for marketing reporting are based on manual input (such as campaign names). If you use campaign names to signal characteristics of the campaign, such as targeted region, season, objective etc., you want to have the ability to isolate these characteristics in your reporting. That way, you can harmonize these across different campaigns and advertising platforms. What tends to happen, and rightfully so, is that these structures change over time as organizations include more and more characteristics. In Funnel, you are able to set rules for your custom fields that are unique to different date ranges, so that you are not limiting your ability to innovate on something like a campaign name structure for fear that it will be detrimental to your analysis down the line.
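The idea of date-specific rules can be sketched in Python as follows. This only illustrates the concept and is not how Funnel implements it; the two naming conventions and the cutover date are made up for the example.

```python
from datetime import date

# Hypothetical history of the naming convention:
#   through 2022-12-31: "<country>_<objective>"
#   from 2023-01-01:    "<objective>_<country>_<season>"
RULES = [
    (date(2000, 1, 1), date(2022, 12, 31), lambda name: name.split("_")[0]),
    (date(2023, 1, 1), date.max, lambda name: name.split("_")[1]),
]

def country_for(name, day):
    """Extract the country using whichever rule was in force on the given day."""
    for start, end, extract in RULES:
        if start <= day <= end:
            return extract(name).upper()
    return None  # no rule covers this date

old = country_for("se_conversion", date(2022, 6, 1))
new = country_for("conversion_se_summer", date(2023, 6, 1))
```

Both calls yield the same harmonized country value even though the campaign name structure changed between them, so older and newer campaigns can be reported together.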
Setting your organization up for success
I hope that this article has conveyed that data cleaning is not merely a menial task that should be reduced at all costs, but rather a critical step in any analytics journey that deserves more care and attention. Every decision we make about data (such as what to include, what to exclude, or how we define it) inevitably affects what we are able to use the data for in later steps.
If you already have a setup in place for cleaning your marketing data, make sure that it enables you and your team to have an overview of the transformation logic applied and that you have a process for recurring quality checks. Always keep the original data set stored separately and intact so that you have a chance to apply new transformation logic without polluting the original data set. This way, you can continue improving your efforts without having to re-download the data set from the platform each time, which often isn’t possible.
If you would like to get in touch with us to learn more about how we work to help marketers and BI professionals with the data cleaning step of marketing analytics, feel free to reach out to us via our contact forms for a conversation or a demo.