Expert Insights

How to distill behavioral data to support marketing KPIs

Robert Børlum-Bach, Head of Data & Analytics, Artefact Nordics, shares his insights on automating audience-scoring prediction through Google Cloud Platform and CRMint

Robert Børlum-Bach
November 11 · 5 min read

Do you want to tie your business objectives more closely to your Google Analytics data? Robert Børlum-Bach, Artefact's Head of Data & Analytics Nordics, shares how to use machine learning models to distill behavioral dimensions and support marketing KPIs.


This article describes an example of activating web analytics data based on one or more specific KPIs. The metaphor of distilling, concentrating, refining is introduced to describe the process of taking multiple data dimensions and boiling them down into a concentrate.

The example case is the modeling of a propensity (likelihood) score for adding a product to the cart. This score of 0-100 is added back as a custom user dimension, enabling the creation of segments and audiences based on their propensity to add a product to the cart. This helps not only with targeting and bidding through ad servers, but also with content optimization and personalization.

Furthermore, the created dimension distillate can be used for training the next model for another KPI - thus both sharpening the media activation part and augmenting the data and understanding itself.

Objective: Distilling data to optimize against KPIs

In my personal experience, the connection between the business objectives and what is measured through web and app analytics tools has often been a bit unclear, or an obscured measure of apples and oranges.

The measurement plan or solution design requirement is essential for making the connection between the objective and the data measured. In a classic maturity model, this is the foundation - the first steps that have to be set before collecting, enriching or activating the data.

Sidenote: Google Analytics is a marketing tool, based on a classic funnel approach. And having events or goals specified as KPIs is essential.

The step after the KPI and planning phase is to collect data and to report on it, reactively. The buck often stops here. Why not use the data collected to actively improve the business objective behind the indicator?


Many business objectives are measured and supported by multiple data dimensions that are often peripheral to the objective itself. Using machine learning models, we can distill these dimensions based on their importance to a KPI metric.

The add-to-cart propensity

In this case example, the business objective is to more intelligently use the marketing budget against a lower-funnel activity - the add-to-cart event.

We work with the concept of creating a custom data dimension, which gives us a score of how likely the user is to add a product to the cart (propensity). The data dimension’s scope is the user. Specifically, each clientId will be attributed a score of 0-100 on how likely they are to add a product to a cart.

Four generic audiences can be made for a start. Subsequent granular audiences can be made by mixing the created score with other relevant dimensions.

Users with scores of 0-25 have a very low propensity; this audience segment can either be excluded or analyzed more thoroughly. A score of 26-50 would be a low-to-mid segment, while the audience segments of 51-75 and 76-100 are the focus of the marketing activities. This enables adjusting bids to the expected value.
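As a minimal sketch, the four generic audiences could be derived from the score like this. The segment names are hypothetical labels chosen for illustration; only the score ranges come from the description above.

```python
def audience_segment(score: int) -> str:
    """Map a 0-100 add-to-cart propensity score to one of four generic audiences.

    The segment names are illustrative assumptions, not names used in the
    actual setup; only the ranges follow the article.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 25:
        return "very_low"      # candidates for exclusion or deeper analysis
    if score <= 50:
        return "low_to_mid"
    if score <= 75:
        return "mid_to_high"   # in focus for marketing activities
    return "high"              # in focus for marketing activities


# Example: group a few (hypothetical) clientIds by segment.
scores = {"111.222": 12, "333.444": 51, "555.666": 98}
segments = {client_id: audience_segment(s) for client_id, s in scores.items()}
```

More granular audiences would then combine the segment with other dimensions, such as device category or region.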

The technical bits

The setup is built almost exclusively on Google Cloud Platform. The exceptions are CRMint, which is not an official Google product, and the third-party data, which comes from a non-Google database.

The ingested data comes from Google Analytics 360, joined with some user data from an external CRM system. The datasets are updated intraday in BigQuery.

A specific data schema is created in BigQuery with the dimensions hypothesised to have the biggest impact as attributes when creating the machine learning model. These dimensions include, but are not limited to:

  • clientId (!)
  • visitNumber
  • totals.timeOnSite
  • trafficSource.source
  • device.browser
  • device.deviceCategory
  • geoNetwork.region
  • hits.eCommerceAction.action_type
  • hits.customDimensions.index
  • hits.eventInfo.eventAction
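To make the schema more concrete, here is a hedged sketch of a query builder for the training data, written against the standard Google Analytics 360 BigQuery export. The table name, the column aliases and the choice to label a session by whether it contains an add-to-cart hit (`action_type = '3'` in the export) are assumptions for illustration, not the author's actual query.

```python
from datetime import datetime


def build_training_query(table: str, start_date: str, end_date: str) -> str:
    """Build a BigQuery Standard SQL query over GA360 export tables.

    `table` is a hypothetical wildcard table such as
    "my-project.my_dataset.ga_sessions_*"; dates use the YYYYMMDD table suffix.
    """
    for value in (start_date, end_date):
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"expected a YYYYMMDD date, got {value!r}")
        datetime.strptime(value, "%Y%m%d")  # rejects impossible dates

    return f"""
SELECT
  clientId,
  visitNumber,
  totals.timeOnSite AS time_on_site,
  trafficSource.source AS traffic_source,
  device.browser AS browser,
  device.deviceCategory AS device_category,
  geoNetwork.region AS region,
  -- Label: 1 if any hit in the session was an add-to-cart action.
  MAX(IF(hits.eCommerceAction.action_type = '3', 1, 0)) AS added_to_cart
FROM `{table}`, UNNEST(hits) AS hits
WHERE _TABLE_SUFFIX BETWEEN '{start_date}' AND '{end_date}'
GROUP BY 1, 2, 3, 4, 5, 6, 7
"""


query = build_training_query(
    "my-project.my_dataset.ga_sessions_*", "20200101", "20200131"
)
```

The resulting string could be run from a scheduled job or a pipeline step; custom dimensions and event actions would be added in the same way, by selecting from the unnested hits.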

The schema is used to train the model (built with the TensorFlow framework) - an iterative process of usually 2-3 cycles. The trained scoring model is then deployed on AI Platform.
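The scoring idea can be illustrated without any cloud services. The sketch below swaps the TensorFlow model for a plain logistic regression in standard-library Python, trained on made-up, pre-scaled toy features; it only shows how a predicted probability becomes a 0-100 propensity score and is not the model used in the actual setup.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, learning_rate=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def propensity_score(weights, bias, features) -> int:
    """Turn the predicted add-to-cart probability into a 0-100 score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return round(100 * sigmoid(z))


# Toy data (assumption): [scaled visitNumber, scaled timeOnSite] per session,
# labelled 1 if the session contained an add-to-cart event.
ROWS = [[0.1, 0.05], [0.2, 0.1], [0.8, 0.6], [1.0, 0.9]]
LABELS = [0, 0, 1, 1]

weights, bias = train_logistic(ROWS, LABELS)
engaged_score = propensity_score(weights, bias, [0.9, 0.8])
casual_score = propensity_score(weights, bias, [0.1, 0.1])
```

A production model would be trained on the full BigQuery schema and served from the deployed endpoint, but the output contract stays the same: one integer score per clientId.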

The facilitator for connecting the services is CRMint: A data pipeline tool that integrates and automates the flow - from Google Analytics 360 through GCP and to Google's advertising products (DV360, Google Ads, etc).

The unique selling point of CRMint is its reusable “workers” and a graphical user interface that makes it easier to design and understand the pipelines, steps and job functions. A personal favourite is the integrated worker for importing the created dimensions back into the Google Analytics interface, so you don’t have to develop a Measurement Protocol or API service. Amazing stuff.


In the example, four pipelines have been created, each consisting of different steps or jobs. The first queries, prepares and exports the training data schema. The second “activates” the TensorFlow model on AI Platform (formerly ML Engine), exporting and formatting the model predictions in a data import schema that Google Analytics understands. The third pipeline imports these scores as user-scoped custom dimensions in Google Analytics (at query time), while the fourth pipeline updates the audience scores.
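The formatting step of the second pipeline can be sketched as follows. Google Analytics Data Import expects a CSV whose header row names the dimensions, for example `ga:dimension1`; the specific dimension indexes used here (a key dimension holding the clientId and a second one holding the score) are assumptions and depend on how the property is configured.

```python
import csv
import io


def scores_to_import_csv(
    scores: dict,
    key_dimension: str = "ga:dimension1",    # assumed: dimension storing clientId
    score_dimension: str = "ga:dimension2",  # assumed: dimension for the score
) -> str:
    """Format clientId -> score pairs as a Data Import style CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([key_dimension, score_dimension])
    for client_id, score in sorted(scores.items()):
        writer.writerow([client_id, score])
    return buffer.getvalue()


csv_payload = scores_to_import_csv({"333.444": 12, "111.222": 87})
print(csv_payload)
```

In the setup described here, the upload itself is handled by CRMint's import worker, so only the file format needs to match.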

The marketing benefit

For marketers, the deliverables are clear: new audiences are available in the Google Analytics interface, and they can be pushed to the advertising products used. In this example, the full suite of Google marketing products is used. This enables better bidding in Google Ads, more specific targeting in DV360 and checkout flow tests in Google Optimize.

Let the data circulate

An additional value of using a customized setup like this one, and not “just” using built-in smart bidding algorithms or similar, is that the created dimension can be used to further improve a new data scheme and model, and so on, and so forth. This creates a data augmentation loop. The distillates created will often be strong inputs for other classification models. In a cooking analogy, the distillate is the fond, the stock - which defines the dish.

In the example, a natural next step is to use the propensity score as an input for predicting the product type or product category users are most interested in.


Does it work? Yes. With a well-documented setup, reusable code and repositories on GitHub utilizing CRMint, the development time for data engineering and data science can be kept to a minimum - breaking even within only a few months thanks to more intelligent marketing spend. The additional value created, such as personalising content in Google Optimize using the created dimension, is ready to be harvested.