I will show you two approaches for avoiding sampled data in your Google Analytics reports, even if you are using the free version of Analytics.
Long time period + lots of dimensions = sampled data
If we want to report over a long time period across many dimensions, we normally run into sampled data. In some cases, e.g. when you look at conversion data, this will simply give you wrong numbers and lead to wrong decisions. For that reason I highly recommend always checking the “containsSampledData” field in the API response when you use the Reporting API to fetch data. But which workarounds do we have if we get sampled data?
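As a minimal sketch of that check, the helper below inspects a response that has already been parsed into a Python dict. It assumes the response shape of the Core Reporting API (v3), where “containsSampledData” is a boolean and the fields “sampleSize” and “sampleSpace” are returned as strings; the function names are made up for this example.

```python
from typing import Optional


def is_sampled(response: dict) -> bool:
    """Return True if the parsed API response is flagged as sampled."""
    return bool(response.get("containsSampledData", False))


def sampling_ratio(response: dict) -> Optional[float]:
    """Return the share of sessions the sample is based on, or None if unsampled.

    Assumes "sampleSize" and "sampleSpace" are present on sampled responses
    and arrive as numeric strings.
    """
    if not is_sampled(response):
        return None
    return int(response["sampleSize"]) / int(response["sampleSpace"])
```

A ratio well below 1 means the numbers were extrapolated from only a fraction of the sessions, so conversion figures in particular should not be trusted.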
Use partitions over time to avoid sampling
Let’s say we want to look at a full year of reporting data, and the API response tells us that the data is sampled. The solution is to make more requests over smaller time periods and sum up all the results at the end. In our example we would split the single request into 365 requests, one per day, and sum up the results. Job done!
This works fine for the free version of Google Analytics and also for the premium version. Yes, you can access raw data in BigQuery with Analytics 360, but sometimes the query statements become quite tricky on nested fields and it is more convenient to just use the API.
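The daily partitioning described above could look roughly like this. The sketch leaves out the actual API call: fetch_day is a hypothetical callable that you would implement yourself to request one day of data and return the parsed response. It is assumed to return a dict with “containsSampledData” and a “totalsForAllResults” map of metric name to numeric string, as in the Core Reporting API (v3), and the metrics are assumed to be integer counts.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator


def daily_partitions(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, both inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def fetch_unsampled_totals(
    start: date,
    end: date,
    fetch_day: Callable[[date], dict],  # hypothetical: performs one API request per day
) -> Dict[str, int]:
    """Request each day separately and sum the metric totals."""
    totals: Dict[str, int] = {}
    for day in daily_partitions(start, end):
        response = fetch_day(day)
        # Keep checking the flag: a very busy day can still be sampled.
        if response.get("containsSampledData"):
            raise RuntimeError(f"{day.isoformat()} is still sampled")
        for metric, value in response["totalsForAllResults"].items():
            totals[metric] = totals.get(metric, 0) + int(value)
    return totals
```

Summing only makes sense for additive metrics such as sessions or pageviews; user counts and ratios cannot simply be added across days.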
Expert tip for the free version of Google Analytics: daily loading strategy for accessing raw data in BigQuery
OK, we have already learned that daily partitions give us unsampled data. But when using the API we are still limited to 7 dimensions per request.
To get real raw data across all dimensions we have to use another trick. We split the dimensions we need across several requests. To every request we add the client ID and a custom dimension filled with the timestamp of the hit. On those two keys we can join the resulting tables in BigQuery, and we have raw data for all dimensions. Pretty cool, isn’t it?
Hint: In the past you also had to write the client ID to a custom dimension, but it is now available in reports as “ga:clientId”. Currently the API documentation does not mention this.
This approach is very good for power users who run many reports over long time periods and different dimensions.
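To illustrate the join on the two keys, here is a small in-memory sketch in Python; in BigQuery the same idea would be a SQL join on both columns. Each report is a list of rows, one dict per row. The custom dimension index “ga:dimension1” for the hit timestamp is an assumption, so use whatever index you configured; join_reports is a name made up for this example.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]

# Join keys: the client ID plus the custom dimension holding the hit timestamp.
# "ga:dimension1" is an assumed index, not something fixed by Google Analytics.
JOIN_KEYS = ("ga:clientId", "ga:dimension1")


def join_reports(reports: Iterable[List[Row]]) -> List[Row]:
    """Merge rows from several reports that share the same join keys.

    Behaves like a full outer join: a hit that is missing from one report
    is kept, just without that report's columns.
    """
    merged: Dict[Tuple[str, ...], Row] = {}
    for rows in reports:
        for row in rows:
            key = tuple(row[k] for k in JOIN_KEYS)
            merged.setdefault(key, {}).update(row)
    # Sort by key so the output order is deterministic.
    return [merged[key] for key in sorted(merged)]
```

Each request then only needs to carry the two key dimensions plus up to five further dimensions, and the merged rows contain all of them side by side.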
Use it as a service with SEAlyzer
Maybe you do not have a developer background and still want unsampled data without doing all of this manually. Then we may have a solution for you: at SEAlyzer we do this part for you, and you can easily configure your data loading jobs or daily audit jobs based on that data. Interested? Then just get in contact with me: firstname.lastname@example.org