Learn how to build a powerful multi-touch attribution model in GA4 using Google BigQuery. Uncover the true value of each marketing channel.
In the world of digital marketing, understanding the true value of each marketing channel is crucial for optimizing campaigns and maximizing return on investment (ROI). Multi-touch attribution is a powerful technique that allows marketers to assign credit to various touchpoints along the customer journey, providing a more comprehensive view of how different channels contribute to conversions.
While ad networks offer default attribution models, these often have limitations. They typically show the effectiveness of each channel in isolation, without considering how visitors interact with multiple channels and touchpoints before converting. This can lead to an incomplete picture of channel performance and suboptimal marketing decisions.
Enter Google Analytics 4 (GA4) and BigQuery. By combining the robust data collection capabilities of GA4 with the power of BigQuery's cloud-based data warehousing and querying, marketers can build sophisticated multi-touch attribution models that provide a more accurate and nuanced understanding of channel performance.
In this blog post, we'll explore how to unlock the power of multi-touch attribution in GA4 using BigQuery. We'll delve into the benefits of custom attribution models, walk through the process of setting up your data pipeline, and share practical examples of how to implement multi-touch attribution logic using SQL queries. By the end, you'll have the tools and knowledge needed to create your own multi-touch attribution model and uncover valuable insights to optimize your marketing efforts.
To build a multi-touch attribution model in GA4 using BigQuery, the first step is to ensure that your data pipeline is set up correctly. This involves capturing website visitor interactions in BigQuery and then stitching together these interactions to create a complete picture of each user's journey.
GA4 offers a native integration with BigQuery, allowing you to easily export your website visitor data into a BigQuery dataset. To set this up, follow these steps:
Once the integration is set up, GA4 will automatically export your website visitor data to BigQuery on a daily basis. This data will include information such as page views, events, user properties, and more.
To transform and analyze your GA4 data in BigQuery, you can leverage powerful tools like dbt (data build tool), MetricMaven, and Looker Studio (formerly Google Data Studio).
To build a multi-touch attribution model, you need to stitch together user interactions across multiple sessions and channels. This can be accomplished using SQL queries in BigQuery.
Here's a high-level overview of the steps involved:
By following these steps and leveraging the power of BigQuery and SQL, you can create a robust multi-touch attribution model that provides valuable insights into the performance of your marketing channels.
In the next section, we'll dive deeper into the specific SQL queries and techniques used to implement multi-touch attribution logic in BigQuery.
Copy and paste the query below in the Google BigQuery editor and click "Format Query" in the top right.
WITH stitch AS (SELECT ga_session_id as session_id,lower(source) utm_source,lower(medium) utm_medium,lower(campaign) utm_campaign,FIRST_VALUE(event_timestamp) OVER(PARTITION BY ga_session_id ORDER BY event_timestamp) session_started_at,LAST_VALUE(event_timestamp) OVER(PARTITION BY ga_session_id ORDER BY event_timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) session_ended_at,COUNT(DISTINCT ga_session_id) OVER(PARTITION BY user_pseudo_id) AS sessions_count,user_pseudo_id,case when REGEXP_CONTAINS(event_name, 'your converion name') THEN 1 END conversion_session,CASE WHEN event_name = 'page_view' then 1 end AS page_views FROM `your-project-id.your-dataset-id.events_*`
),sessions_attrib AS (SELECT session_id,user_pseudo_id,COALESCE(utm_source, '(not set)') utm_source,COALESCE(utm_medium, '(not set)') utm_medium,COALESCE(utm_campaign, '(not set)') utm_campaign,session_started_at,session_ended_at,conversion_session,SUM(page_views) pageviews,MIN(CASE WHEN conversion_session = 1 THEN session_started_at END) OVER(PARTITION BY user_pseudo_id) AS first_conversion_time FROM stitch GROUP BY 1, 2, 3, 4, 5, 6, 7, 8),conversion_sessions AS (SELECT sa.*,CASE WHEN first_conversion_time BETWEEN session_started_at AND session_ended_at THEN TRUE ELSE FALSE END AS conversion_session,CASE WHEN first_conversion_time > session_started_at THEN TRUE ELSE FALSE END AS prospect_session FROM sessions_attrib sa),data AS (SELECT ,CASE WHEN session_id IS NULL THEN 1 WHEN session_id = LAST_VALUE(session_id) OVER (PARTITION BY user_pseudo_id ORDER BY session_started_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) THEN 1 ELSE 0 END AS LAST_click_attrib_pct,CASE WHEN session_id IS NULL THEN 1 WHEN session_id = FIRST_VALUE(session_id) OVER (PARTITION BY user_pseudo_id ORDER BY session_started_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) THEN 1 ELSE 0 END AS first_click_attrib_pct,CASE WHEN session_id IS NULL THEN 1 ELSE 1/COUNT(session_id) OVER (PARTITION BY user_pseudo_id) END AS even_click_attrib_pct,CASE WHEN session_id IS NULL THEN 1 WHEN timestamp_micros(session_started_at) <= TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 7 day) THEN 1 WHEN timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 14 day) THEN .5 WHEN timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 14 day) AND timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 21 day) THEN .25 WHEN timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 21 day) AND timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 28 day) THEN .125 ELSE 0 END AS time_decay_attrib_pct FROM conversion_sessions),AggregatedPaths AS (SELECT user_pseudo_id,array_to_string(ARRAY_AGG(utm_source ORDER BY session_started_at DESC), ', ') AS utm_source_path,array_to_string(ARRAY_AGG(utm_medium ORDER BY session_started_at DESC), ', ') AS utm_medium_path FROM data GROUP BY user_pseudo_id)SELECT d.,ap.utm_source_path,ap.utm_medium_path FROM data d JOIN AggregatedPaths ap ON d.user_pseudo_id = ap.user_pseudo_id
Now that we've set up our data pipeline and have our GA4 data flowing into BigQuery, let's dive into the SQL queries and techniques used to implement multi-touch attribution logic.
The SQL query shared in the previous section is a comprehensive example of how to build a multi-touch attribution model in BigQuery. Let's break it down step by step:
The first common table expression (CTE) named stitch
extracts relevant session data from the GA4 BigQuery export. It selects the following fields:
ga_session_id
as session_id
source
, medium
, and campaign
(in lowercase) as utm_source
, utm_medium
, and utm_campaign
respectivelyFIRST_VALUE
and LAST_VALUE
of event_timestamp
to determine the session start and end timesCOUNT
of distinct ga_session_id
to calculate the total number of sessions per useruser_pseudo_id
to identify unique usersconversion_session
flag based on specific event names (using REGEXP_CONTAINS)page_views
flag based on the 'page_view' eventThe sessions_attrib
CTE builds upon the stitch
CTE and performs session-level attribution. It:
stitch
CTEutm_source
, utm_medium
, and utm_campaign
with '(not set)' using COALESCEpageviews
per session using SUMfirst_conversion_time
for each user using MIN and CASE statementsThe apps_sessions
CTE further refines the session data by identifying prospect and application sessions based on the first_conversion_time
. It:
sessions_attrib
CTEapplication_session
if the first_conversion_time
falls between the session start and end timesprospect_session
if the first_conversion_time
is greater than the session start timeThe data
CTE is where the actual attribution logic is implemented. It calculates the attribution percentages for each session based on various attribution models:
LAST_click_attrib_pct
: Assigns 100% credit to the last session before conversionfirst_click_attrib_pct
: Assigns 100% credit to the first sessioneven_click_attrib_pct
: Distributes credit evenly among all sessionstime_decay_attrib_pct
: Assigns credit based on the time delay between the session and conversion (using predefined time intervals and weights)The AggregatedPaths
CTE aggregates the user paths by concatenating the utm_source
and utm_medium
values in the order of session start time (descending) for each user.
The final SELECT statement joins the data
and AggregatedPaths
CTEs based on the user_pseudo_id
and selects all fields from data
along with the aggregated user paths (utm_source_path
and utm_medium_path
).
This SQL query provides a solid foundation for building a multi-touch attribution model in BigQuery. It demonstrates how to extract relevant data from GA4, attribute sessions based on conversion events, calculate attribution percentages using different models, and aggregate user paths for analysis.
You can further customize and extend this query based on your specific attribution requirements, such as incorporating additional data sources, refining the attribution logic, or calculating monetary values associated with each conversion.
By leveraging the power of BigQuery and SQL, you can gain valuable insights into the effectiveness of your marketing channels and make data-driven decisions to optimize your campaigns.
Once you have implemented your multi-touch attribution model in BigQuery using SQL, the next step is to visualize and share the insights with stakeholders. Looker Studio (formerly Google Data Studio) is a powerful tool that allows you to create interactive dashboards and reports based on your BigQuery data.
To visualize your attribution results in Looker Studio, you need to connect your BigQuery dataset to Looker Studio and create a data source. Here's how:
With your data source set up, you can now create reports and visualizations to showcase your attribution insights. Here are a few examples of reports you can create:
To make your attribution insights more interactive and accessible, you can create a Looker Studio dashboard that allows users to select different attribution models on the fly. Here's how:
By creating interactive dashboards in Looker Studio, you can empower stakeholders to explore attribution insights on their own and make data-driven decisions based on their preferred attribution methodology.
Remember to keep your reports and dashboards clear, concise, and visually appealing. Use appropriate charts and graphs to convey insights effectively, and provide context and explanations where necessary.
In the next section, we'll discuss how to leverage the insights from your multi-touch attribution model to optimize your marketing campaigns and drive better business outcomes.
Multi-touch attribution provides a more comprehensive view of channel performance by considering the entire customer journey, from the initial touchpoint to the final conversion. By assigning credit to each touchpoint based on its contribution to the conversion, you can gain a deeper understanding of how different channels and campaigns work together to drive results.
Armed with these insights, you can make data-driven decisions to optimize your marketing spend and tactics. For example, if you find that certain channels consistently drive high-quality leads that convert well, you may want to allocate more budget to those channels. Conversely, if a particular campaign or channel is not delivering the desired results, you can adjust your strategy or reallocate resources to more effective initiatives.
It's important to remember that there is no one-size-fits-all attribution model. The best model for your business will depend on factors such as your industry, customer behavior, and marketing goals. Therefore, it's essential to experiment with different attribution models and compare the insights they provide. By testing and refining your approach over time, you can develop a customized attribution model that accurately reflects the unique dynamics of your business.
Building a multi-touch attribution model in Google Analytics 4 using BigQuery offers numerous benefits for data-driven marketers. By leveraging the power of GA4's enhanced data collection and BigQuery's robust querying capabilities, you can create a customized attribution model that provides a more accurate and comprehensive view of your marketing performance.
One of the key advantages of this approach is the level of control and flexibility it offers. By building your own attribution model, you can tailor it to your specific business needs and goals, rather than relying on the default models provided by ad networks. Additionally, by connecting your attribution model to downstream systems, such as your CRM or marketing automation platform, you can gain a more holistic view of your customer journey and make more informed decisions.
If you're looking to take your marketing analytics to the next level, consider exploring Rittman Analytics' services for data analytics and centralization. Our team of experts can help you build a customized attribution model, integrate your data sources, and provide actionable insights to drive your business forward.
By embracing multi-touch attribution and leveraging the power of GA4 and BigQuery, you can unlock valuable insights, optimize your marketing campaigns, and ultimately achieve better business outcomes.