- Posted by neefischer
- On 10. September 2018
If you have just one account to manage the most common way is to set up some basic reports that show the daily development for your most important KPIs and your most important dimensions.
Let’s assume we have the following AdWords KPIs:
To keep it simple we just look at one dimension:
- Device (Mobile | Desktop | Tablet)
So in total we have 10*3 = 30 views on your data we have to monitor somehow – in other words: you have daily 30 cases where you have to decide whether it is an anomaly or not. In most cases this is done with a big dashboard (e.g. in google data studio) or in a spreadsheet with filterable pivot tables.
Manageable for 1 account. But what if you have 100? The number of daily cases increases to 100*30 = 3000 Cases. This number isn’t manageable anymore. To handle this problem somehow the account manager will start to reduce the number of KPIs and dimension and will prioritize on the most important accounts. Maybe the account manager also has to switch to a weekly or monthly outlier review process as a result of the high number of cases.
How to automate that process of anomaly detection?
To following tasks have to be solved somehow:
- We need a data pipeline with all relevant AdWords Reports that should be monitored
- We have to find a way to define a ruleset of items that are marked as “anomaly”
- Better than looking daily at a report is to push alerts by mail or Slack notifications if they really occur
Point 1.) and 3.) should be part of your BI infrastructure or you use SEAlyzer SaaS so let us focus on the approach for anomaly detection. For my approach I made the decision to an statistical approach instead of a machine learning solution mainly because of the easy of use:
- Better than fixed thresholds (e.g. CPO > 1.5*AvgCPO => In other words: if the CPO is 50% higher than average report an anomaly) is to use distributions or the standard deviation to calculate anomalies dynamically.
- In most accounts there is a different seasonality within a week – so we shouldn’t compare mondays with sundays but instead mondays with mondays
- I prefered IQR (Interquartile Range) method over standard distribution outlier detection methods because it’s more robust to outlier values that affect the global average. IQR explained in simple words: we are ordering all observations per KPI and build quartiles. We calculate the IQR = Quartile 3 – Quartile 1.
All values that are lower than the lower bound or greater than the upper bound are marked as outliers
Lowerbound = Q1– 1.5 ×IQR
Upperbound = Q3 – 1.5 ×IQR
Let’s build it all together to a running application
I used the SEAlyzer Data Fetch Service to store the Account Performance Report with a defined set of KPIs and Dimensions to Google Cloud Storage.
On file changes in the cloud storage bucket a google function is triggered that calculates the outliers and stores the result back to google cloud storage. In the past we had to use amazon lambda to write such python serverless microservices but since July 2018 google finally offers python support for google functions. 🙂
Things are getting more easy now:
- Load data
- Make some crazy stuff with scikit-learn, pandas, numpy, nltk, etc.
- Write the result back to cloud storage
- Show the data in SEAlyzer modules – and depending on the use case – publish actions via AdWords API to the accounts
A well known problem is brand bidding that causes a high average CPC – when you realize that someone is doing it you write mails to the competitor who is doing this and hopefully an agreement is found. That’s my example in the screenshot: Outliers in AverageCPC for a brand account
By drilling down to the outlier data you see the exact value – in this case the AverageCPC – for the outlier date but also:
- Standard Deviation
for the last 365 days. So you know exactly what the historical values looked like and why the anomaly detection was triggered.
Note: In the example I wanted to show an concrete use case and filtered by accounts and KPIs. By default the aggregation of outliers for all accounts, KPIs and dimensions is shown. One number tells you if something was wrong and you can drill down in details to identify actions.
If you want to see the solution in action with your own adwords data please write to me by mail firstname.lastname@example.org