Entity recognition in Google Ads

Today I want to talk about entity recognition on search terms in Google Ads. There are several great use cases for how the results can help you optimize your Google Ads accounts.

What does “entity” mean?

Depending on your business you will have different entities – let’s say you sell products. For each product you normally have a brand name, specific product identifiers, a color, a size, etc.
I’ll bet you are already using entities to derive keywords from product feed data by concatenating different columns. Now think of inverting the whole process of keyword generation: you start with all kinds of user queries and extract their components to get structured queries.
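
To make that inversion concrete, here is a tiny Python sketch (the column names and the mini entity lookup are made-up examples, not a real feed): keyword generation concatenates feed columns, entity extraction goes the other way around.

    product = {"brand": "acme", "product_type": "running shoes", "color": "blue"}

    # forward direction: feed columns -> keyword
    keyword = " ".join([product["brand"], product["color"], product["product_type"]])
    print(keyword)  # "acme blue running shoes"

    # inverted direction: raw user query -> structured entities
    entities = {"brand": {"acme"}, "color": {"blue", "red"}, "product_type": {"running shoes"}}
    query = "acme running shoes blue 42"
    found = {name: [v for v in values if v in query] for name, values in entities.items()}
    print(found)  # {'brand': ['acme'], 'color': ['blue'], 'product_type': ['running shoes']}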

How to use discovered entity patterns?

Google Shopping campaigns or Dynamic Search Ads are a great way to discover new relevant search queries; of course there are also a lot of bad ones that should be blocked. N-gram analysis gives great insights, but sometimes it is not enough. Mapping all performance data to extracted entities will give you new insights:

  • There will be bad patterns that should be blocked. One frequent entity pattern in queries is this (for bigger companies):
    [%yourBrand%] [%first name or surname%]
    What is happening here? Users are not searching for your brand because they want to buy something – they are searching for people working in your company.
    => There are thousands of names that can be looked up in databases – each name on its own often has too few clicks to be discovered in an n-gram analysis – with the entity aggregation we are able to see these patterns! A common action would be to add those words as negatives (see the sketch after this list).
  • For a second great use case, think of a user-driven account structure. Based on well-performing entity patterns you can easily derive a logically grouped account structure in a granular way.
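
Here is the sketch mentioned above – a rough Python illustration of how such a brand-plus-person pattern could be caught, assuming you have a list of first names and surnames from some public database (the brand and names below are placeholders):

    brand = "acme"
    known_names = {"anna", "peter", "meier", "schmidt"}  # e.g. from a name database

    def is_brand_plus_person(query: str) -> bool:
        tokens = query.lower().split()
        return brand in tokens and any(t in known_names for t in tokens if t != brand)

    queries = ["acme peter schmidt", "acme running shoes blue"]
    negative_candidates = [q for q in queries if is_brand_plus_person(q)]
    print(negative_candidates)  # ['acme peter schmidt']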

How to build a custom entity database? A step by step guide

  1. A great starting point is your product master data – a lot of attributes can be easily accessed by using existing product feeds (e.g. the ones you use for Google Shopping)
  2. Use your domain knowledge to add entities: competitor names, cities, transactional keywords, …
    There are a lot of lists out there that can be used for this.
  3. Enrich your lists from steps 1) and 2) with automatically detected close variants and similar entities. I do this using stemming algorithms, distance algorithms and neural networks (a word2vec implementation).
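
As a minimal sketch of step 3, close-variant detection can be illustrated with a simple distance measure from the Python standard library (difflib here is just a stand-in for a proper distance algorithm, and the vocabulary is made up):

    import difflib

    known_values = ["windows", "linux"]
    query_vocabulary = ["windows", "widows", "linuks", "ubuntu", "shoes"]

    # words from the query vocabulary that look like a known entity value
    variants = {}
    for value in known_values:
        matches = difflib.get_close_matches(value, query_vocabulary, n=10, cutoff=0.7)
        variants[value] = [w for w in matches if w != value]

    print(variants)  # close variants / likely misspellings per known value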

Phew – reading this, it sounds like boring theory – so let’s look at a real-world example for one of our customers, who sells software.

I feed the system a first entity, “operating system”, and assign two values: “windows” and “linux”. That’s it.
This is what I get when I query our neural net:


Without any knowledge of Linux we get a list of Linux/Unix distributions and misspellings out of the neural network, which was trained on a full year of customers’ search queries.
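
For illustration, such a query could look roughly like this, assuming a gensim word2vec model was trained on the tokenized search queries beforehand (the file name is a placeholder):

    from gensim.models import Word2Vec

    model = Word2Vec.load("search_query_word2vec.model")

    # terms that appear in similar query contexts as the seed values
    for seed in ["windows", "linux"]:
        if seed in model.wv:
            print(seed, model.wv.most_similar(seed, topn=20))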

As mentioned, I’m no expert in Linux distributions; a few of them I know, like “debian” or “ubuntu”. After a quick Google search for the unknown words I added everything except “game” and “bash” to my list of operating systems. Pretty nice, isn’t it?

I know this is some initial work, but it is worth it! And remember: the better your initial input (e.g. your product attributes), the better the entity recognition will be. In the end we have a fully tailored entity lookup list that can be used to label every search query with the entities found in it.

In our simple example I’m looping over all search queries – whenever I find a term that is contained in our entity database, I tag the full query with “Operating System”. Of course multiple tags per query will be very common.
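
A minimal version of that tagging loop could look like this (the small lookup dictionary stands in for the full entity database built above):

    entity_db = {
        "Operating System": {"windows", "linux", "ubuntu", "debian"},
        "Transactional": {"buy", "download", "price"},
    }

    def tag_query(query: str) -> list[str]:
        tokens = set(query.lower().split())
        # a query gets every entity tag whose values appear in it, so multiple tags are common
        return [entity for entity, values in entity_db.items() if tokens & values]

    print(tag_query("buy windows license"))  # ['Operating System', 'Transactional']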

New insights for “low sample size” elements

What is the benefit of using this approach? I want to give you an example of identifying negative keywords, where one bad pattern is that an online, desktop or mobile game is part of the search query:

  • Filter for performance outliers on query level with enough sample data => 0 results
  • Filter on 1-grams with enough sample data => 1 result: “fortnite”
  • Query the neural net with “fortnite”:
With “fortnite” as input we were able to identify > 100 other online games with low sample size that were hidden before.
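
A sketch of how that expansion could feed an early negative-keyword list, again assuming a gensim word2vec model; the similarity threshold and the mini performance report are purely illustrative:

    from gensim.models import Word2Vec

    model = Word2Vec.load("search_query_word2vec.model")

    # expand the single confirmed bad 1-gram into related low-volume game titles
    game_terms = {t for t, score in model.wv.most_similar("fortnite", topn=200) if score > 0.6}
    game_terms.add("fortnite")

    # (query, cost, conversions) – stand-in for a real search term report
    report = [("fortnite dance", 12.40, 0), ("minecraft skins", 3.10, 0)]
    wasted = sum(cost for query, cost, conv in report
                 if conv == 0 and game_terms & set(query.split()))
    print(f"cost of game-related queries without conversions: {wasted:.2f}")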

In total the savings from the “hidden” games were many times higher than those from our input “fortnite”. Of course these games would pop up after some months as bad 1-grams – but by then the budget is already lost.
With this approach we are able to set negatives at a very early stage.

Currently we run Python scripts for these analyses for some of our big customers. If you are interested in a web-based application doing the job, please tell me – we are currently thinking about releasing a beta for this.