Today I want to share an approach that groups display placements together based on their website content. The initial idea was that there would be some very well-performing clusters but also some poor ones.

The problem an approach like this solves is that we can also handle and judge elements with a low sample size. By grouping elements of the same nature together we suddenly have large numbers, and we can take actions like blocking placements or adding positive patterns to our managed placement list.

For our solution we need the following (each step is sketched in code right after the list):

  • A large list of placement URLs exported from Google Ads
  • Python code for scraping all of the websites, putting the extracted words/n-grams into a vector space and building a TF-IDF matrix. Instead of text we then have a vector that describes each webpage, and we can easily run some computations on it.
  • Run k-means clustering on the TF-IDF matrix
  • Visualize the clusters with word clouds
  • Join performance data to every cluster
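
Here is a minimal sketch of the scraping and vectorization step. The URLs, the `fetch_page_text` helper and the vectorizer settings (`ngram_range`, `min_df`, `max_df`) are illustrative placeholders, not the exact values I used:

```python
import requests
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import TfidfVectorizer

def fetch_page_text(url, timeout=10):
    """Download one placement page and return its visible text ('' on failure)."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return ""
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style"]):  # keep only visible words
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

# Placeholder; in practice this is the URL column of a Google Ads
# placement report with hundreds or thousands of entries.
placement_urls = ["http://example-placement-a.com", "http://example-placement-b.com"]
documents = [fetch_page_text(url) for url in placement_urls]

# Unigrams and bigrams; min_df/max_df prune very rare and very common terms.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2, max_df=0.8,
                             stop_words="english")
tfidf_matrix = vectorizer.fit_transform(documents)
```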
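
Clustering the TF-IDF matrix is then one call to scikit-learn's KMeans. The cluster count of 30 below is just a placeholder; the right number depends on how many placements you feed in:

```python
from sklearn.cluster import KMeans

n_clusters = 30  # placeholder; tune this by inspecting the resulting clusters
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(tfidf_matrix)

# Quick sanity check: the highest-weighted terms in each cluster centroid.
terms = vectorizer.get_feature_names_out()
order = kmeans.cluster_centers_.argsort()[:, ::-1]
for i in range(n_clusters):
    print(f"Cluster {i}:", ", ".join(terms[j] for j in order[i, :10]))
```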
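
If you don't want to pick the cluster count purely by eye, a silhouette scan is one common heuristic (not part of my original setup, just an option):

```python
from sklearn.metrics import silhouette_score

# Compare a few cluster counts; a higher score means better-separated clusters.
for k in range(10, 60, 10):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(tfidf_matrix)
    print(k, round(silhouette_score(tfidf_matrix, labels), 3))
```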
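
For the word clouds, one option is the wordcloud package: feed it the centroid weights so that the terms typical for a cluster dominate the image. Again a sketch, reusing the variables from the snippets above:

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

for i in range(n_clusters):
    # Use the centroid weight of the top terms as the "frequency" of each word.
    freqs = {terms[j]: float(kmeans.cluster_centers_[i, j])
             for j in order[i, :100] if kmeans.cluster_centers_[i, j] > 0}
    wc = WordCloud(width=400, height=300, background_color="white")
    wc.generate_from_frequencies(freqs)
    plt.figure()
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"Cluster {i}")
plt.show()
```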
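
Finally, joining the Google Ads performance data is a pandas merge on the placement URL. The file name and column names (clicks, cost, conv_value) are assumptions about how your placement report export looks:

```python
import pandas as pd

# Assumed export with columns: url, clicks, cost, conv_value
performance = pd.read_csv("placement_performance.csv")

assignments = pd.DataFrame({"url": placement_urls, "cluster": cluster_labels})
merged = assignments.merge(performance, on="url", how="left")

# Aggregate to cluster level; value per click becomes meaningful once clicks add up.
cluster_stats = merged.groupby("cluster").agg(
    clicks=("clicks", "sum"),
    cost=("cost", "sum"),
    conv_value=("conv_value", "sum"),
)
cluster_stats["value_per_click"] = cluster_stats["conv_value"] / cluster_stats["clicks"]
print(cluster_stats.sort_values("value_per_click"))
```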

I had to play around with the number of clusters that make sense (the silhouette scan above is one way to shortcut that) and with some other settings in the TF-IDF vectorizer, but after a short time I got this:

These are just four example clusters that performed poorly after joining the Google Ads data. With this approach it is possible to easily scan hundreds of word clouds and perhaps block the bad ones, or, if low sample sizes are a problem for your placements, you can also use the cluster performance (value per click) to estimate a good bid for your managed placements (a toy example below).
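
As a toy example of that bid estimation, assuming the `cluster_stats` frame from above and a made-up target ROAS (return on ad spend) of 4:

```python
# Hypothetical heuristic: bid a fixed fraction of the cluster's value per click.
target_roas = 4.0  # assumed target; pick whatever fits your margin
cluster_stats["suggested_max_cpc"] = cluster_stats["value_per_click"] / target_roas
print(cluster_stats[["value_per_click", "suggested_max_cpc"]])
```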