Data discrepancy analysis workflow

We have discussed the possible sources of data discrepancies. Here, you will find an example of how stark the differences between analytics tools can look when you compare the data they collected. Often, however, things are not nearly as obvious to the naked eye. In those cases, you need to take a few deliberate steps to uncover the systematic inconsistencies you may be facing. Let us share the workflow we use for this purpose.

Below the PDF, you will find a short description of each phase:

    [Open PDF]

Exploration:

There are many lessons to be learned from the traditional research process. If you are reading this article, you are probably already wondering how the data collected across your analytics tools compares. That is your problem statement. A simple problem statement, however, is never enough. Instead, you need to form concrete research questions that can be verified through statistical methods. Usually, a good research question is one that can be answered with a single word: yes or no. At the same time, it should be relevant to your day-to-day activities, because you need to know which data sources hold the information required to answer it.

For example, let’s say that you store information about client orders in Salesforce. At the same time, you use these orders as conversions to optimize your paid campaigns in Google Ads. Then you can pose the following question: Is there a significant difference between the number of client orders registered in Salesforce and the number registered in Google Ads?

Implementation:

Having formed a few such hypotheses, you can shift your focus to the actual data. It is best to have at least one data source that you can trust without reservations. To give a simple example, there are many ways in which a form submission may never reach Google's servers to be registered in Google Analytics 4. On the other hand, you can be certain that every form submission that you care about actually reaches your own systems. In other words, you have a source of true data against which you should benchmark all of your other tracking tools.
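To make the benchmarking idea concrete, here is a minimal sketch that expresses each tool's count as a share of the trusted count. The function name `capture_rate`, the tool labels, and all numbers are hypothetical and serve only as an illustration.

```python
def capture_rate(trusted_total: int, tool_total: int) -> float:
    """Share of trusted-source events that a tracking tool also recorded."""
    if trusted_total <= 0:
        raise ValueError("trusted_total must be positive")
    return tool_total / trusted_total


# Hypothetical totals for one month of form submissions.
backend_submissions = 400  # counted by your own systems (the trusted source)
tool_counts = {"ga4": 340, "linkedin_ads": 300}  # counted by each tracking tool

rates = {
    tool: capture_rate(backend_submissions, count)
    for tool, count in tool_counts.items()
}

for tool, rate in rates.items():
    print(f"{tool}: {rate:.0%} of trusted submissions captured")
```

A rate well below 100% does not say why events are missing, only that the tool deserves a closer look. A rate above 100% can be just as informative, since it may point to duplicate or bot-generated events.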

Very often, the source of true data will be your CRM system. However, usually only final conversions that require a follow-up action are registered there. What if you simply want to check that you have implemented your LinkedIn Ads tracking properly? At that point, it is usually best to take a step back and consider implementing another tool that can help you obtain at least a sample of unbiased data. Consent management is a serious topic that can often lead to data discrepancies if mishandled. We therefore consider a fully anonymized tracking solution the best option: one that works as a set of counters and actively detects and excludes web scrapers and other undesirable actors from your datasets. PostHog and Plausible Analytics are our preferred options in these scenarios. Once the setup is ready, all you need to do is start collecting data and, after a reasonable period of time, export all of it into a unified dataset.
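One possible way to build such a unified dataset is to join the daily counts from each export by date. The sketch below assumes that every tool can export a CSV with `date` and `count` columns; the column names, the source labels, and the sample rows are all hypothetical, and real exports will likely need extra cleaning (time zones, date formats, deduplication).

```python
import csv
import io


def read_daily_counts(csv_text: str) -> dict[str, int]:
    """Parse a CSV export with 'date' and 'count' columns into {date: count}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["date"]: int(row["count"]) for row in reader}


def unify(sources: dict[str, dict[str, int]]) -> list[dict]:
    """Outer-join several {date: count} mappings into one row per date.

    A date missing from a source is kept as None rather than 0,
    so that gaps in data collection stay visible.
    """
    all_dates = sorted(set().union(*(counts.keys() for counts in sources.values())))
    return [
        {"date": day, **{name: counts.get(day) for name, counts in sources.items()}}
        for day in all_dates
    ]


# Hypothetical exports; ISO dates sort correctly as plain strings.
crm_csv = "date,count\n2024-03-01,12\n2024-03-02,9\n2024-03-03,15\n"
ads_csv = "date,count\n2024-03-01,10\n2024-03-03,11\n"

unified = unify({
    "crm": read_daily_counts(crm_csv),
    "google_ads": read_daily_counts(ads_csv),
})

for row in unified:
    print(row)
```

Keeping missing days as `None` is a deliberate choice in this sketch: a day on which a tool recorded nothing at all is a different finding from a day on which it recorded zero conversions.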

Analysis and reporting:

Do you have all your data ready in a single place? Then you should be all set to start employing statistical tests to verify your hypotheses! However, the process rarely stops there. Along the way, you may come across unforeseen inconsistencies. You might notice that a significant shift in data collection occurred at the end of last year. But where did it come from? Or you might find that suspiciously few conversions are being associated with your mobile app. So what if the issues you are facing are device-specific? Questions like these may push you back to square one, as you will often need to collect an entirely new set of data; otherwise, you might never really understand what is happening under the hood.
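As one illustration of such a test, the sketch below applies an exact two-sided sign test to paired daily counts. It asks whether one source reports the higher count on more days than chance alone would explain. The sign test is only one of several reasonable choices (a paired t-test or a Wilcoxon signed-rank test are common alternatives), and the function name and sample numbers are hypothetical.

```python
from math import comb


def sign_test(reference: list[int], tool: list[int]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired daily counts.

    Returns (days where reference is higher, days where tool is higher, p-value).
    Days with identical counts are dropped, as is standard for the sign test.
    """
    if len(reference) != len(tool):
        raise ValueError("both series must cover the same days")
    diffs = [r - t for r, t in zip(reference, tool)]
    higher = sum(d > 0 for d in diffs)
    lower = sum(d < 0 for d in diffs)
    n = higher + lower
    if n == 0:
        return higher, lower, 1.0  # no differing days, nothing to test
    k = min(higher, lower)
    # Probability of a split at least this lopsided under a fair coin, both tails.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return higher, lower, min(1.0, 2 * tail)


# Hypothetical daily order counts over ten days.
crm_orders = [12, 9, 15, 11, 14, 10, 13, 12, 16, 9]
ads_conversions = [10, 9, 11, 9, 12, 11, 10, 10, 13, 8]

higher, lower, p_value = sign_test(crm_orders, ads_conversions)
print(f"CRM higher on {higher} days, Google Ads higher on {lower} days, p = {p_value:.4f}")
```

With this sample data, the CRM count is higher on 8 days and lower on 1, which gives p ≈ 0.039. If you suspect a device-specific issue, the same function can be run separately on the counts for each device segment, provided your exports contain that breakdown.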

But, of course, we can also help you with all of that and guide you through the process. Don’t hesitate to contact us; we will be happy to take a look!