- Posted by neefischer
- On 12. September 2018
Making use of the search queries that appear in your AdWords auctions offers high potential for optimizing your business. If you have millions of them, you can't do it manually. In this post I will talk about doing this job in an automated way that also scales well to gigabytes of search queries.
How do we start? Here are some points that make a good starting point:
- Normally, ready-to-use lookup lists are available for every business, for example brand names, product categories, etc. Without much effort we can already use them as a lookup base. If one or more matches are found, the tags are added to the search query. For example, for the query “hugo boss parfum” we could look up “hugo boss” in our brand list and add the tag brand:hugo boss. With that key:value tag structure we can later analyze the search queries at the key level or, in more detail, on the specific values.
- To further improve the tagging coverage, we have to build up our own lists. A good strategy is to count the words and n-grams found in the queries. If we start by tagging the most frequent phrases, the number of tags will grow very quickly. Let's say the color “red” appeared in 1000 different search terms. Adding the tag color:red, or the more generic productattribute:red, results in 1000 new tags in our search term list.
- The result is already good, but we can do better! Let's use the existing classifications and query a word2vec model to find and tag semantically similar words. With that approach we cover synonyms and also misspellings very well.
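To make the lookup step more concrete, here is a minimal sketch in plain Python. It assumes the lookup lists are merged into one dict that maps a lowercase phrase to its key:value tag; `tag_query`, `LOOKUP` and the sample entries are hypothetical. It tries the longest phrase first, similar in spirit to the whole-word matching flashtext does, but it is not flashtext's API.

```python
from typing import Dict, List

# Hypothetical lookup base: lowercase phrase -> key:value tag.
LOOKUP: Dict[str, str] = {
    "hugo boss": "brand:hugo boss",
    "parfum": "category:parfum",
}

def tag_query(query: str, lookup: Dict[str, str], max_ngram: int = 3) -> List[str]:
    """Greedy longest-match lookup of word n-grams against a phrase -> tag dict."""
    words = query.lower().split()
    tags: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest phrase first, so "hugo boss" wins over a shorter match.
        for n in range(min(max_ngram, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lookup:
                tags.append(lookup[phrase])
                i += n
                break
        else:
            i += 1  # no known phrase starts at this word
    return tags

print(tag_query("hugo boss parfum", LOOKUP))  # ['brand:hugo boss', 'category:parfum']
```

Matching on whole words instead of substrings avoids false hits inside longer words, and a plain dict lookup per n-gram stays cheap even for many queries.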
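Counting words and n-grams to find the most frequent candidates for new tags could look like the following sketch. `count_ngrams` is a hypothetical helper; it counts each n-gram at most once per query, so a count reads as "appeared in N different search terms". The `min_count` cut-off is my assumption for keeping the resulting dictionary from growing too large.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def count_ngrams(queries: Iterable[str], max_n: int = 3,
                 min_count: int = 2) -> List[Tuple[str, int]]:
    """Count in how many queries each word n-gram (n = 1..max_n) occurs."""
    counts: Counter = Counter()
    for query in queries:
        words = query.lower().split()
        # A set, so an n-gram repeated within one query is counted only once.
        grams = {
            " ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)
        }
        counts.update(grams)
    # Most frequent first; rare n-grams are dropped to keep the list small.
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]
```

The top of this list is where new tags such as color:red pay off most, because each one classifies many queries at once.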
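The word2vec step can be sketched as expanding the lookup dict with the nearest neighbours of each known phrase. To keep the example independent of a specific library, the hypothetical `expand_lookup` takes any function that returns (word, similarity) pairs, which is the shape gensim's `KeyedVectors.most_similar` returns; the 0.7 threshold is an arbitrary assumption.

```python
from typing import Callable, Dict, List, Tuple

# Any function returning (word, similarity) pairs for a given word.
SimilarFn = Callable[[str], List[Tuple[str, float]]]

def expand_lookup(lookup: Dict[str, str], similar: SimilarFn,
                  threshold: float = 0.7) -> Dict[str, str]:
    """Add similar words to a copy of the lookup, tagged like their seed phrase."""
    expanded = dict(lookup)
    for phrase, tag in lookup.items():
        for candidate, score in similar(phrase):
            # Keep existing entries; only add confident, unseen candidates.
            if score >= threshold and candidate not in expanded:
                expanded[candidate] = tag
    return expanded
```

With a trained gensim model, the similarity function might be `lambda w: model.wv.most_similar(w, topn=10) if w in model.wv else []`. Since word2vec neighbours are often related rather than equivalent words (another color instead of a synonym for “red”), it is probably worth reviewing the added entries before using them.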
Some thoughts on the implementation:
- If the number of queries is very high, performance might be an issue. For this reason I preferred the Python flashtext module over a regex approach.
- Depending on the number of queries, the n-gram dictionaries might get very big. This is a problem when hosting the solution on AWS Lambda or Google Cloud Functions, because of the memory and deployment size limits of these platforms.
What should the result look like?
- All relevant KPIs that are available at the search term level (e.g. costs, clicks, conversions) should be grouped by your tags: our example “hugo boss parfum” will then be shown as brand | category. The same group will also include search queries like “chanel lipstick”.
- Unclassified words should be shown as ???. This means that tag patterns like brand | category | ??? are also possible.
- If we later group on the resulting tags, it makes sense to sort the tag list alphabetically; otherwise the same combination of keys would end up in more than one line of the grouping (e.g. brand | category and category | brand).
- There has to be some information about the completeness of the tagging. If, for example, we are able to cover 80% of the clicks with a full tagging, this may be good enough, and we can stop extending our tag lookup list.
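A minimal sketch of this grouping: each query is turned into a pattern of tag keys, unmatched words become ???, and clicks are summed per pattern. `tag_pattern`, `group_clicks` and the (query, clicks) input rows are hypothetical, and keeping ??? at the end of the sorted keys is my assumption, so that patterns read like brand | category | ???.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

UNKNOWN = "???"

def tag_pattern(query: str, lookup: Dict[str, str], max_ngram: int = 3) -> str:
    """Map a query to a sorted key pattern such as 'brand | category | ???'."""
    words = query.lower().split()
    keys: List[str] = []
    unknown = 0
    i = 0
    while i < len(words):
        for n in range(min(max_ngram, len(words) - i), 0, -1):
            tag = lookup.get(" ".join(words[i:i + n]))
            if tag is not None:
                keys.append(tag.split(":", 1)[0])  # keep the key, drop the value
                i += n
                break
        else:
            unknown += 1  # one ??? per unclassified word
            i += 1
    # Sorted keys make 'category | brand' and 'brand | category' the same row.
    return " | ".join(sorted(keys) + [UNKNOWN] * unknown)

def group_clicks(rows: Iterable[Tuple[str, int]],
                 lookup: Dict[str, str]) -> Dict[str, int]:
    """Sum clicks (or costs, conversions, ...) per tag pattern."""
    totals = defaultdict(int)
    for query, clicks in rows:
        totals[tag_pattern(query, lookup)] += clicks
    return dict(totals)
```

Grouping on keys gives the overview; grouping on the full key:value tags instead would give the more detailed view on specific brands or categories.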
If there are no tags at all, the output would look like this in the beginning:
- ??? => Search queries with one word
- ??? | ??? => Search queries with two words
- ??? | ??? | ??? => Search queries with three words
We know the total clicks (you can also use costs or word counts) of all search queries and can compare them with the clicks of the tag patterns that contain only ???. In other words, this ratio is the percentage of clicks for which we have no coverage at all.
After adding more and more tags, there will be two more states:
- Partial coverage: brand | ???
- Full coverage: brand | category
In total we have 3 KPIs that show the level of completeness of our tagging. Based on these numbers, everybody can decide for themselves whether to add more tags or to stop.
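The three completeness KPIs can then be computed from the grouped clicks. This sketch assumes the input is a dict from tag pattern to clicks (costs or word counts work the same way) with ??? as the placeholder; `coverage_kpis` is a hypothetical helper.

```python
from typing import Dict

UNKNOWN = "???"

def coverage_kpis(clicks_by_pattern: Dict[str, int]) -> Dict[str, float]:
    """Share of clicks with no, partial and full tag coverage."""
    totals = {"none": 0, "partial": 0, "full": 0}
    for pattern, clicks in clicks_by_pattern.items():
        parts = pattern.split(" | ")
        unknown = sum(1 for part in parts if part == UNKNOWN)
        if unknown == len(parts):
            totals["none"] += clicks      # e.g. ??? | ???
        elif unknown > 0:
            totals["partial"] += clicks   # e.g. brand | ???
        else:
            totals["full"] += clicks      # e.g. brand | category
    all_clicks = sum(totals.values())
    if all_clicks == 0:
        return {key: 0.0 for key in totals}
    return {key: value / all_clicks for key, value in totals.items()}
```

The three shares add up to 1, so a full-coverage share of 0.8 would correspond to covering 80% of the clicks with a full tagging.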
Let’s start building the application now… 🙂