Clustering Google Display Placements Using Tf-Idf + KMeans

Stefan Neefischer

Posted on 18 May 2020

You can group display placements together based on their website content and judge them even when sample sizes are low. This helps you detect good- and bad-performing placements and make adjustments.

Grouping display placements by their website content uncovers very well-performing clusters along with some poor ones. The problem this approach solves is judging elements that have only a small sample size each. By grouping together elements of the same nature, we suddenly have big numbers, and we can take actions like blocking placements or adding positive patterns to our managed placement list.
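To illustrate why grouping helps with low sample sizes, here is a small sketch that compares the uncertainty of a conversion rate for a single placement with the pooled rate of a cluster. The numbers (1 conversion from 8 clicks, 50 similar placements) are purely hypothetical, and the normal-approximation standard error is just one simple way to express that uncertainty.

```python
import math


def conversion_rate_se(conversions, clicks):
    """Return the conversion rate and its normal-approximation standard error."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    p = conversions / clicks
    return p, math.sqrt(p * (1 - p) / clicks)


def pool(placements):
    """Sum (conversions, clicks) pairs of placements that ended up in one cluster."""
    conversions = sum(conv for conv, _ in placements)
    clicks = sum(clk for _, clk in placements)
    return conversions, clicks


# Hypothetical data: one placement vs. a cluster of 50 placements like it.
single_rate, single_se = conversion_rate_se(1, 8)
cluster_rate, cluster_se = conversion_rate_se(*pool([(1, 8)] * 50))
print(f"single placement: {single_rate:.3f} +/- {single_se:.3f}")
print(f"pooled cluster:   {cluster_rate:.3f} +/- {cluster_se:.3f}")
```

The estimated rate is the same, but the pooled standard error is smaller by a factor of sqrt(400 / 8), which is what makes decisions at cluster level defensible.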


Our solution consists of these steps:

Export a large list of placement URLs from Google Ads.
Scrape all of these websites with Python, put the extracted words/n-grams into a vector space and build a Tf-Idf matrix. Instead of text, each web page is now described by a vector, so we can easily run computations on it.
Run KMeans clustering on the Tf-Idf matrix.
Visualize the clusters with word clouds.
Join performance data to every cluster.

We had to experiment to find a number of clusters that makes sense (and tune some other settings of the Tf-Idf vectorizer), but after a short while we got the following clusters:
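One way to make the search for a sensible number of clusters less manual is to scan several values of k and compare silhouette scores. This is a sketch under our own assumptions: the silhouette criterion with cosine distance is our choice for illustration, not necessarily what was used for this post, and `scan_cluster_counts` is a hypothetical helper.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def scan_cluster_counts(texts, k_values, random_state=0):
    """Return {k: silhouette score} for every valid k in k_values."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    n_samples = X.shape[0]
    scores = {}
    for k in k_values:
        # The silhouette score is only defined for 2 <= k <= n_samples - 1.
        if not 2 <= k < n_samples:
            continue
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        if len(set(labels)) < 2:
            continue  # degenerate result, e.g. many duplicate pages
        scores[k] = silhouette_score(X, labels, metric="cosine")
    return scores
```

Higher scores mean tighter, better-separated clusters, but the score is only a guide: for placement analysis, clusters that are easy to read as word clouds matter just as much.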

These are just four example clusters that performed poorly after joining the Google Ads data. With this approach, you can easily scan hundreds of word clouds and decide whether to block the placements behind them. If your placements suffer from low sample sizes, you can also use the cluster performance (value per click) to estimate a good bid for your managed placements.
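A minimal sketch of the join and the bid estimate, assuming the cluster assignment is a `{placement: cluster_id}` mapping and the Google Ads report has been exported as rows with `placement`, `clicks` and `conv_value` fields (hypothetical names). The bid rule `value_per_click / target_roas` is our own illustrative assumption, not a rule stated in this post.

```python
from collections import defaultdict


def cluster_performance(placement_clusters, performance):
    """Aggregate clicks and conversion value per cluster and derive value per click."""
    totals = defaultdict(lambda: {"clicks": 0, "conv_value": 0.0, "placements": 0})
    for row in performance:
        cluster = placement_clusters.get(row["placement"])
        if cluster is None:
            continue  # placement was not scraped or clustered
        t = totals[cluster]
        t["clicks"] += row["clicks"]
        t["conv_value"] += row["conv_value"]
        t["placements"] += 1
    for t in totals.values():
        t["value_per_click"] = t["conv_value"] / t["clicks"] if t["clicks"] else 0.0
    return dict(totals)


def suggest_bid(cluster_stats, cluster_id, target_roas):
    """Hypothetical rule: max CPC = cluster value per click / target ROAS."""
    return round(cluster_stats[cluster_id]["value_per_click"] / target_roas, 2)
```

A placement with only a handful of clicks inherits the value per click of its whole cluster, which is the point of clustering in the first place.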


Data Science in Digital Marketing, Google Display


More Similar Posts

Analyzing Google Trends Data to get Covid insights for Germany:

Why is IP score important for B2B?

Import offline conversions to Bing Ads automatically (Full script solution)

Account-based marketing (ABM): How targeting high-value accounts can boost ROI

B2B programmatic advertising: Improve your business with efficient targeting

N-gram analysis in PPC

Negativation strategies for bad traffic segments in Google Ads

Getting started with Simple ML for Sheets
