Clustering Google Display Placements Using Tf-Idf + KMeans

Stefan Neefischer

Posted on 18 May 2020

You can group display placements together based on their website content and judge placements even when each one has a low sample size. This will help you detect the good and bad performing ones and make adjustments.

Grouping display placements by their website content uncovers clusters that perform very well along with some that perform poorly. The problem this approach solves is judging placements that have too little data to evaluate individually. By grouping placements of the same nature, we suddenly have big numbers, and we can act on them, for example by blocking placements or adding positive patterns to our managed placement list.

For our solution we need this:

A large list of placement URLs out of Google Ads.
Python code that scrapes all websites, puts the extracted words and n-grams into a vector space, and builds a Tf-Idf matrix. Instead of text, we now have a vector that describes each web page, so we can easily run computations on it.
Run KMeans clustering on the Tf-Idf matrix.
Visualize clusters with word clouds.
Join performance data to every cluster.

We had to play around with the number of clusters that makes sense (and some other settings in the Tf-Idf Vectorizer), but after a short while we got this:

[Figure: word clouds of four example clusters]

These are just four example clusters that performed poorly after we joined the Google Ads data. With this approach, it is possible to scan hundreds of word clouds quickly and block the placements in the clusters that perform badly. If your placements suffer from low sample sizes, you can also use the cluster performance (value per click) to estimate a good bid for your managed placements.
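
Joining performance data to the clusters could look something like the sketch below. It assumes pandas, and every column name, number, and threshold in it is invented for illustration; a real Google Ads export will have different fields. The idea is only to show how value per click is aggregated on cluster level and then mapped back to each placement.

```python
import pandas as pd

# Hypothetical cluster assignments, e.g. the output of a KMeans run.
clusters = pd.DataFrame({
    "placement": ["a.example", "b.example", "c.example", "d.example"],
    "cluster": [0, 0, 1, 1],
})

# Hypothetical performance export; column names are assumptions.
performance = pd.DataFrame({
    "placement": ["a.example", "b.example", "c.example", "d.example"],
    "clicks": [12, 8, 15, 5],
    "conversion_value": [30.0, 20.0, 2.0, 0.0],
})

# Join on the placement, then sum the metrics per cluster so that
# small placements add up to a usable sample size.
joined = clusters.merge(performance, on="placement", how="inner")
per_cluster = joined.groupby("cluster")[["clicks", "conversion_value"]].sum()
per_cluster["value_per_click"] = (
    per_cluster["conversion_value"] / per_cluster["clicks"]
)

# Assumption: clusters below this value per click are candidates for blocking.
MIN_VALUE_PER_CLICK = 0.5
bad_clusters = per_cluster.index[
    per_cluster["value_per_click"] < MIN_VALUE_PER_CLICK
]
block_candidates = joined.loc[
    joined["cluster"].isin(bad_clusters), "placement"
].tolist()

# Map the cluster-level value per click back to every placement as a
# starting point for bid estimation.
joined["cluster_value_per_click"] = joined["cluster"].map(
    per_cluster["value_per_click"]
)

print(per_cluster)
print("Block candidates:", block_candidates)
```

Before blocking anything, it makes sense to check the word cloud of each flagged cluster by eye, since a low value per click on a handful of clicks can still be noise.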

Data Science in Digital Marketing, Google Display
