In a rush? Check out the TL;DR with the most common causes of data discrepancies
First, a bit of context
Over the past two decades, digital marketing has steadily risen through the ranks to become a pillar of every business venture. While seldom receiving as much attention, web traffic data collection and processing are closely intertwined with any ambition to promote products and services. Ask yourself: could you ever constructively evaluate the successes and failures of your web advertising campaigns without knowing where your users come from and what they are doing on your website?
Even just a few years ago, marketers didn’t have to think twice about their data collection efforts. While there were always some stumbling blocks to avoid, they mostly pertained to individual tools and services. Did you remember to set UTM tags when choosing the landing page your advertisement would direct your future customers to? Is your data collection script initialized correctly on every page of your website? Without investing an overwhelming amount of time, advertising managers could go through an established checklist of general issues, and their worries would end when the last item was ticked off.
Unfortunately, the internet has become a whole lot more complicated since then. These days, websites commonly combine a whole collection of different packages, many of which interact with and directly affect each other’s functionality. As a result, misinformed decisions and mishandled implementations may cause massive disparities between what users do on the website and what information is collected about these interactions. On top of that, browsers themselves have become much smarter: they now evaluate the content they load and restrict (or even actively block) anything that might be perceived as insecure or privacy-invading.
For instance, consider the case of a consent banner. Without giving it a second thought, you might view it as a simple form with three buttons that shows up at the beginning of a user’s journey and may be forgotten thereafter. With dozens of ready-made solutions flooding the market, however, it is tough to know exactly what is happening under the hood. Does your consent banner interact with your GTM container in any way? Does it block the execution of all scripts when a user declines to give their data collection consent? Or does it block outgoing requests to common data collection endpoints instead? Does it rely on Google Consent Mode commands being added to the data layer? Does it purge all cookies and other persistent browser storage available to the website at every load? Or is it possible that it does nothing at all? What if the only functionality of your consent banner is to save preferences in a cookie, expecting the web developer to ensure that the website complies with the law based on the value stored in this cookie?
These and many other questions arise not only for consent banners, but also for other essential parts of every website out there, including the set-ups of GTM containers, content security policies, and even content delivery services. Over some integral parts of this ecosystem, such as ad blocking extensions and browser privacy settings, you might have no control at all. Still, some ways of dealing with their impact are more effective than others, depending on your tracking goals.
What follows are three hypothetical (but not uncommon) situations where traffic attribution gets severely impacted, thus heavily limiting one’s ability to properly allocate advertising budget.
Scenario #1
Imagine that you work for a marketing agency partnering with major businesses to help them manage paid advertising campaigns. Recently, a new client dropped into your lap, hoping to collaborate with you on a series of articles and interviews they would like to see published across a selection of outlets that you have partnered with in the past. As this is your first time working for the company, trust is still being established and the client is not ready to provide you with direct access to their data. Instead, you receive view-only access to a preconfigured PowerBI dashboard which should contain all the data needed for you to evaluate the successes and failures of your articles. Most importantly, the client clearly specifies which tracking parameter-value combinations must be included in the links directing to their website from your articles to ensure tracking.
The cooperation between you and your new client starts out smoothly. As you roll out the articles, however, you become quite disenchanted with the traffic coming to your client’s website through your campaigns. While you would expect to see thousands of people flowing in, only hundreds of sessions with the parameters designated for your campaign appear in the provided dashboard. Unsurprisingly, the relationship turns sour very quickly, with everyone trying to shift the blame. Based on your publishers’ claims, you have a hard time believing the numbers you are seeing in the dashboard. On the other hand, your client makes a good point that tracking clearly works, otherwise there would be no data at all.
Meanwhile, simply noticing that no new users at all appear as part of the campaigns would have told most of the story. To strengthen user privacy, the implemented consent banner detects and blocks all outgoing HTTP requests to known tracking servers as long as the user has not given consent. When a user agrees to cookies, the consent banner reloads the web page in order to remove the tracking blockers it created. Unfortunately, this reload also strips the query parameters from the URL, thus preventing tracking tools from registering the session source for users who visit the website for the first time.
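One way to spot this kind of loss is to compare the URL a visitor lands on with the URL the page ends up on after the consent reload. The snippet below is a minimal sketch of such a check, not a description of any particular consent tool. The function names, the example URLs, and the choice to treat every `utm_`-prefixed parameter as a tracking parameter are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def tracking_params(url: str) -> dict[str, str]:
    """Return the utm_* query parameters found in a URL.

    Assumption: campaign tracking relies on utm_-prefixed parameters.
    Adjust the filter if your client prescribes other parameter names.
    """
    query = urlsplit(url).query
    return {
        key: value
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key.startswith("utm_")
    }


def lost_on_reload(landing_url: str, reloaded_url: str) -> dict[str, str]:
    """Return the tracking parameters present before the reload but not after it."""
    before = tracking_params(landing_url)
    after = tracking_params(reloaded_url)
    return {key: value for key, value in before.items() if after.get(key) != value}


# Hypothetical URLs: the landing URL from an article link and the URL
# the browser shows after the consent banner has reloaded the page.
landing = "https://example.com/article?utm_source=outlet&utm_campaign=spring"
reloaded = "https://example.com/article"
print(lost_on_reload(landing, reloaded))
```

An empty result means the reload kept the campaign parameters. Anything else lists exactly what the tracking tool can no longer see once it is finally allowed to send data.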
Scenario #2
Now, let’s try on the boots of an experienced marketing manager at a large governmental organization. In order to set an example for others, the organization mandated a privacy-first tracking set-up across all of its domains. It goes as far as to require a complete absence of personally identifiable information in the collection process. Although this approach greatly improves the user experience, it makes your job a lot more difficult in the process. To collect the data, the websites use a shared self-hosted instance of Piwik PRO, with settings tweaked such that nothing beyond session-level data is ever recorded. Without your knowledge, your servers are also equipped with a redirection hub, through which all traffic passes before being allowed to retrieve any content. Each incoming session is evaluated based on where it came from, and any suspicious activity is blocked from accessing your websites’ contents.
However, the IT department made a critical oversight during the proxy server configuration. While successfully implementing the proxy redirection, they failed to preserve the query parameters within the URLs. Naturally, the redirection page itself, designed solely for proxy routing, lacks any Piwik PRO tracking code. Thus, upon redirection, the UTM parameters are stripped from the URL and the attribution data is lost. When users finally reach the web pages that your campaigns lead to, Piwik PRO records their visits without any information on the original marketing source. Moreover, the referrer field always stores the corporate proxy server’s domain, providing little to no insight into the true origin of the traffic.
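The missing step in the redirection can be illustrated with a few lines of URL handling. This is a minimal sketch under stated assumptions rather than the configuration of any specific proxy: the function name and the example domains are hypothetical, and a real redirection hub would apply the same idea in its own configuration language.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(incoming_url: str, destination: str) -> str:
    """Build the redirect URL while carrying over the incoming query string.

    Any query string already present on the destination is kept and the
    incoming parameters (UTM tags included) are appended after it.
    """
    incoming = urlsplit(incoming_url)
    dest = urlsplit(destination)
    # Join only the non-empty query strings to avoid a stray "&".
    query = "&".join(part for part in (dest.query, incoming.query) if part)
    return urlunsplit((dest.scheme, dest.netloc, dest.path, query, dest.fragment))


# Hypothetical hub and destination addresses.
incoming = "https://hub.example.gov/check?utm_source=newsletter&utm_medium=email"
print(redirect_target(incoming, "https://www.example.gov/services"))
```

Carrying the query string over addresses only the stripped UTM parameters. The referrer still points at the proxy, so it has to be dealt with separately.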
Scenario #3
Finally, try to picture yourself in the position of a freelancer who is just starting out. Due to limited resources, you have to rely on your own skills to take care of everything. From what you have learned, there is nothing more straightforward than choosing from the wide variety of templates available for setting up a simple WordPress site. You also find that the Google Marketing Platform provides the most cost-effective tracking and advertising set-up. With that in mind, you choose a template and let it guide you every step of the way. To your pleasant surprise, it even instructs you on how to set up accounts in Google Ads and Google Analytics 4. Without breaking a sweat, you enter the provided tracking identifiers into a pre-installed WordPress plugin to set everything up. The template employs yet another plugin to equip your website with an up-to-standard consent banner. According to the template, it fits your tracking set-up perfectly, as it uses the most up-to-date version of Google Consent Mode to handle consent management, as recommended by Google.
Having set aside some money for spreading the word about your endeavor through Google Ads, you soon become interested in how well Google spends it. At that point, you notice that the clicks reported by Google Ads massively outnumber the sessions associated with your campaign in Google Analytics 4. This makes you wonder whether Google inflates the number of reported clicks, since you are paying for every click on your advertisement. Should you just blindly trust Google? That being said, you also see that the total number of visits to your website pretty much matches the number of clicks reported in Google Ads, and you are happy with the traffic, so you decide not to invest more of your precious time in your little investigation.
What you may never find out is that the issue lies in the way your consent banner communicates your users’ consent. When a user arrives at your website, a tracking request is sent immediately to Google Analytics 4. Among other things, the request carries information on what brought the user to your website and whether they consented to analytical and advertising data collection. This happens before the user even gets a chance to express their consent by interacting with the consent banner. With this piece of information missing, the tracking request assumes that all consents are rejected by default. As a result, Google’s servers drop the incoming request upon receiving it. Unfortunately, nothing causes a replacement tracking request to be sent automatically once the user confirms they are willing to share their data. Consequently, Google Analytics 4 starts tracking such sessions on the following page instead, where all information about the user clicking on your advertisement is already lost.
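The sequence of events is easier to follow as a small simulation. The sketch below is a deliberately simplified model of the behaviour described in this scenario, not Google’s actual processing logic. The `Hit` class, the `recorded_hits` function, the page paths, and the campaign name are all hypothetical. The `replay_on_consent` switch models a set-up in which the landing-page request is held back and sent again once the user accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    page: str
    campaign: Optional[str]  # attribution carried by the request, if any
    consent: bool            # consent state attached to the request


def recorded_hits(replay_on_consent: bool) -> list[Hit]:
    """Simulate a two-page visit where consent is granted on the landing page."""
    recorded: list[Hit] = []
    held_back: list[Hit] = []

    def send(hit: Hit) -> None:
        # Simplified model: a hit sent without consent is not recorded.
        if hit.consent:
            recorded.append(hit)
        elif replay_on_consent:
            held_back.append(hit)

    # 1. Landing page: the request fires before the banner is answered.
    send(Hit("/landing", "spring_sale", consent=False))

    # 2. The user accepts the banner; held-back hits are sent again.
    for hit in held_back:
        send(Hit(hit.page, hit.campaign, consent=True))
    held_back.clear()

    # 3. Next page: consent is known, but the campaign information is gone.
    send(Hit("/pricing", None, consent=True))
    return recorded


print([hit.campaign for hit in recorded_hits(replay_on_consent=False)])
print([hit.campaign for hit in recorded_hits(replay_on_consent=True)])
```

Without the replay, the only recorded hit has no campaign attached, which matches the gap between Google Ads clicks and attributed sessions. In a real set-up, the equivalent fix would most likely involve the banner issuing a Consent Mode update as soon as the user makes a choice, so that the landing page itself can still be attributed.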
Conclusion
Remember the simple checklist that advertising managers could compile and follow in the past? It never really went anywhere. Instead, it expanded rapidly with the introduction of new technologies and regulatory requirements. In fact, it grew so much that uncovering data discrepancies and their causes is nearly impossible without an expert on the topic by your side. Luckily, we can be that expert! Reach out if you would like to discuss how we can help you optimize your data collection practices and improve your insights.
Questions?
Don’t hesitate to contact us if you have any questions or remarks. You can use the contact form below to get in touch with our engineers: