- Posted by neefischer
- On 15. September 2018
What do I mean with “outlier segments”? Think of very large pivot table and you start for your most important KPIs to look at every possible pivot view to identify segments where e.g. the CPO is higher than normal.
Standard dimensions in the keyword performance report where SEA managers look at would be:
The set of KPIs to analyze performance outliers in segments might be:
Ok, let’s look at the possible combinations:
All combinations do not fit on this screenshot – but i can tell you the total: 119
For 4 KPIs this means 4*119 = 476 possible pivot views – for one account! Seems like automating this task is a good idea 🙂
Some more things to consider:
- If the sample size per segment is too low the number of outlier segments will be very high – so it should be possible to define a threshold for a minimal samplesize
- The performance data date range should also be flexible – short ranges will show possible outliers in big segments. If going up with the date rage also patterns in smaller segments can be identified.
- Standard dimensions are ok but but we should also use some derived features that google isn’t providing in their report. An example of a derived feature is word count of the keyword: wordcount(“hotels in berlin”) = 3
With that additional attribute we can generate new insights. In the past I found one example where the combination of a short wordcount + broad match + mobile device was very poor in performance. The reason was that the mobile page wasn’t that good – if the user had to do lots of navigation steps the conversion rate was going down. For the segment combination the traffic was very very generic. If the wordcount in the keywords went up the delivered landingpages were more specific and the mobile performance was ok.
How should an automated solution look like?
- It should be possible to provide parameters for:
- input_source: the raw report – e.g. daily fetched keyword performance report
- data_transformations: feature (dimension) generation: simple functions like wordcount in keywords but also fancy stuff like e.g. kmeans
- min_samplesize: segments are only shown in the output if they have more observations than this parameter
- max_depth: maybe we have to limit the depth of combinations because the number of combinations is exploding (this means the computation time is very high) – the likelihood of finding such “deep” segments with enough sample size is also very low.
- dimensions: the input set of dimensions where the combinations are made with
- kpis: the set of KPIs that should be aggregated for each segment, it should also be possible to add calculated KPIs
Ok – enough to build a first version. Let’s call this Function “PowerMiner” 🙂
- I decided to use an approach build on top of python pandas – in my case it’s no problem to load the report data of one adwords account into memory. For each account a new script instance is doing the calculation. In the end all is merged together…
- With combinations from the itertools module it’s easy to create the possible combinations
- The possible combinations are the input for groupings on the report data frame
- Results where the sample size is higher then the defined limit are stored to a new dataframe – an earlier approach with using dictionaries, where all KPIs where mapped to the key:value list, was very RAM intensive because there are so many key combinations possible with a high number of input dimensions – but most of them are far away form reaching the minimum sample size.
- KEYVALUE_LIST is the concatenated key of the grouping, KEY_LIST the column where the grouping was made with (I thought this would be usefull for filter possibilities in an UI-Interface for browsing the data). NOTE: The reason for this is that we have to merge all the results into one structure.
I had to XXX-out the sensitive CPO (Cost per order), CR (Conversion rate) and VPC (Value per Click) columns, but i think you can imagine that you see some very interesting results when sorting all the segment combinations by those columns.
And now think of doing this not only with the keyword performance report but also other reports like age, gender or location performance report. We can also add them to the result list when we have the same KPI structure. This is pretty awesome! Instead of going manually through every report we get one aggregated view that shows the most important segments.