Digital Analytics
May 13, 2024

Unlocking Insights with Multi-Touch Attribution in Google Analytics 4 using BigQuery

Learn how to build a powerful multi-touch attribution model in GA4 using Google BigQuery. Uncover the true value of each marketing channel.

Unlocking Insights with Multi-Touch Attribution in Google Analytics 4 using BigQuery

Unlock the Power of Multi-Touch Attribution in GA4 with BigQuery

In the world of digital marketing, understanding the true value of each marketing channel is crucial for optimizing campaigns and maximizing return on investment (ROI). Multi-touch attribution is a powerful technique that allows marketers to assign credit to various touchpoints along the customer journey, providing a more comprehensive view of how different channels contribute to conversions.

While ad networks offer default attribution models, these often have limitations. They typically show the effectiveness of each channel in isolation, without considering how visitors interact with multiple channels and touchpoints before converting. This can lead to an incomplete picture of channel performance and suboptimal marketing decisions.

Enter Google Analytics 4 (GA4) and BigQuery. By combining the robust data collection capabilities of GA4 with the power of BigQuery's cloud-based data warehousing and querying, marketers can build sophisticated multi-touch attribution models that provide a more accurate and nuanced understanding of channel performance.

In this blog post, we'll explore how to unlock the power of multi-touch attribution in GA4 using BigQuery. We'll delve into the benefits of custom attribution models, walk through the process of setting up your data pipeline, and share practical examples of how to implement multi-touch attribution logic using SQL queries. By the end, you'll have the tools and knowledge needed to create your own multi-touch attribution model and uncover valuable insights to optimize your marketing efforts.

Setting Up Your Data Pipeline for Multi-Touch Attribution

To build a multi-touch attribution model in GA4 using BigQuery, the first step is to ensure that your data pipeline is set up correctly. This involves capturing website visitor interactions in BigQuery and then stitching together these interactions to create a complete picture of each user's journey.

Linking GA4 to BigQuery

GA4 offers a native integration with BigQuery, allowing you to easily export your website visitor data into a BigQuery dataset. To set this up, follow these steps:

  1. In your GA4 property, navigate to the "Admin" section.
  2. Click on the "BigQuery Linking" option under the "Property" column.
  3. Follow the prompts to create a new BigQuery project or select an existing one.
  4. Choose the dataset where you want to store your GA4 data.
  5. Enable the BigQuery integration and wait for the data to start flowing into your dataset.

Once the integration is set up, GA4 will automatically export your website visitor data to BigQuery on a daily basis. This data will include information such as page views, events, user properties, and more.

Leveraging dbt, MetricMaven, and Looker Studio

To transform and analyze your GA4 data in BigQuery, you can leverage powerful tools like dbt (data build tool), MetricMaven, and Looker Studio (formerly Google Data Studio).

  • dbt: dbt is a command-line tool that allows you to write and execute SQL transformations on your BigQuery data. With dbt, you can define custom models, perform data cleansing, and create derived tables that are optimized for analysis. You can use dbt to stitch together user interactions and create a unified view of each customer's journey.
  • MetricMaven: MetricMaven is a data integration platform that enables you to easily move data from various sources into BigQuery. If you have additional data sources beyond GA4, such as a CRM or an e-commerce platform, you can use MetricMaven to consolidate all your data into a single BigQuery dataset. This allows you to enrich your attribution model with more comprehensive customer data.
  • Looker Studio: Looker Studio is a powerful data visualization and reporting tool that integrates seamlessly with BigQuery. Once you have your GA4 data in BigQuery and have performed the necessary transformations using dbt, you can use Looker Studio to create interactive dashboards and reports. These dashboards can display your multi-touch attribution insights, allowing stakeholders to explore the data and gain valuable insights.

Stitching Together User Interactions with SQL

To build a multi-touch attribution model, you need to stitch together user interactions across multiple sessions and channels. This can be accomplished using SQL queries in BigQuery.

Here's a high-level overview of the steps involved:

  1. Identify unique users: Use user identifiers, such as user_pseudo_id in GA4, to identify unique users across sessions.
  2. Aggregate sessions: Group together sessions belonging to each user and order them chronologically.
  3. Identify touchpoints: Extract the relevant marketing touchpoints from each session, such as the UTM parameters (source, medium, campaign) or the referring URL.
  4. Assign attribution: Apply your desired attribution logic (e.g., first-touch, last-touch, linear) to assign credit to each touchpoint based on its position in the user's journey.
  5. Aggregate results: Aggregate the attribution results to calculate the total contribution of each channel or campaign.

By following these steps and leveraging the power of BigQuery and SQL, you can create a robust multi-touch attribution model that provides valuable insights into the performance of your marketing channels.

In the next section, we'll dive deeper into the specific SQL queries and techniques used to implement multi-touch attribution logic in BigQuery.

Implementing Multi-Touch Attribution Logic in BigQuery

Copy and paste the query below in the Google BigQuery editor and click "Format Query" in the top right.

WITH stitch AS (SELECT ga_session_id as session_id,lower(source) utm_source,lower(medium) utm_medium,lower(campaign) utm_campaign,FIRST_VALUE(event_timestamp) OVER(PARTITION BY ga_session_id ORDER BY event_timestamp) session_started_at,LAST_VALUE(event_timestamp) OVER(PARTITION BY ga_session_id ORDER BY event_timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) session_ended_at,COUNT(DISTINCT ga_session_id) OVER(PARTITION BY user_pseudo_id) AS sessions_count,user_pseudo_id,case when REGEXP_CONTAINS(event_name, 'your converion name') THEN 1 END conversion_session,CASE WHEN event_name = 'page_view' then 1 end AS page_views FROM `your-project-id.your-dataset-id.events_*`),sessions_attrib AS (SELECT session_id,user_pseudo_id,COALESCE(utm_source, '(not set)') utm_source,COALESCE(utm_medium, '(not set)') utm_medium,COALESCE(utm_campaign, '(not set)') utm_campaign,session_started_at,session_ended_at,conversion_session,SUM(page_views) pageviews,MIN(CASE WHEN conversion_session = 1 THEN session_started_at END) OVER(PARTITION BY user_pseudo_id) AS first_conversion_time FROM stitch GROUP BY 1, 2, 3, 4, 5, 6, 7, 8),conversion_sessions AS (SELECT sa.*,CASE WHEN first_conversion_time BETWEEN session_started_at AND session_ended_at THEN TRUE ELSE FALSE END AS conversion_session,CASE WHEN first_conversion_time > session_started_at THEN TRUE ELSE FALSE END AS prospect_session FROM sessions_attrib sa),data AS (SELECT ,CASE WHEN session_id IS NULL THEN 1 WHEN session_id = LAST_VALUE(session_id) OVER (PARTITION BY user_pseudo_id ORDER BY session_started_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) THEN 1 ELSE 0 END AS LAST_click_attrib_pct,CASE WHEN session_id IS NULL THEN 1 WHEN session_id = FIRST_VALUE(session_id) OVER (PARTITION BY user_pseudo_id ORDER BY session_started_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) THEN 1 ELSE 0 END AS first_click_attrib_pct,CASE WHEN session_id IS NULL THEN 1 ELSE 1/COUNT(session_id) OVER (PARTITION BY user_pseudo_id) END AS even_click_attrib_pct,CASE WHEN session_id IS NULL THEN 1 WHEN timestamp_micros(session_started_at) <= TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 7 day) THEN 1 WHEN timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 14 day) THEN .5 WHEN timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 14 day) AND timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 21 day) THEN .25 WHEN timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 21 day) AND timestamp_micros(session_started_at) > TIMESTAMP_SUB(timestamp_micros(first_conversion_time), INTERVAL 28 day) THEN .125 ELSE 0 END AS time_decay_attrib_pct FROM conversion_sessions),AggregatedPaths AS (SELECT user_pseudo_id,array_to_string(ARRAY_AGG(utm_source ORDER BY session_started_at DESC), ', ') AS utm_source_path,array_to_string(ARRAY_AGG(utm_medium ORDER BY session_started_at DESC), ', ') AS utm_medium_path FROM data GROUP BY user_pseudo_id)SELECT d.,ap.utm_source_path,ap.utm_medium_path FROM data d JOIN AggregatedPaths ap ON d.user_pseudo_id = ap.user_pseudo_id

Now that we've set up our data pipeline and have our GA4 data flowing into BigQuery, let's dive into the SQL queries and techniques used to implement multi-touch attribution logic.

The SQL query shared in the previous section is a comprehensive example of how to build a multi-touch attribution model in BigQuery. Let's break it down step by step:

1. Extracting Session Data (stitch CTE)

The first common table expression (CTE) named stitch extracts relevant session data from the GA4 BigQuery export. It selects the following fields:

  • ga_session_id as session_id
  • source, medium, and campaign (in lowercase) as utm_source, utm_medium, and utm_campaign respectively
  • FIRST_VALUE and LAST_VALUE of event_timestamp to determine the session start and end times
  • COUNT of distinct ga_session_id to calculate the total number of sessions per user
  • user_pseudo_id to identify unique users
  • conversion_session flag based on specific event names (using REGEXP_CONTAINS)
  • page_views flag based on the 'page_view' event

2. Attributing Sessions (sessions_attrib CTE)

The sessions_attrib CTE builds upon the stitch CTE and performs session-level attribution. It:

  • Selects relevant fields from the stitch CTE
  • Replaces NULL values in utm_source, utm_medium, and utm_campaign with '(not set)' using COALESCE
  • Calculates the total pageviews per session using SUM
  • Determines the first_conversion_time for each user using MIN and CASE statements

3. Identifying Prospect and Conversion sessions (conversion_sessions CTE)

The apps_sessions CTE further refines the session data by identifying prospect and application sessions based on the first_conversion_time. It:

  • Selects all fields from the sessions_attrib CTE
  • Flags application_session if the first_conversion_time falls between the session start and end times
  • Flags prospect_session if the first_conversion_time is greater than the session start time

4. Calculating Attribution Percentages (data CTE)

The data CTE is where the actual attribution logic is implemented. It calculates the attribution percentages for each session based on various attribution models:

  • LAST_click_attrib_pct: Assigns 100% credit to the last session before conversion
  • first_click_attrib_pct: Assigns 100% credit to the first session
  • even_click_attrib_pct: Distributes credit evenly among all sessions
  • time_decay_attrib_pct: Assigns credit based on the time delay between the session and conversion (using predefined time intervals and weights)

5. Aggregating User Paths (AggregatedPaths CTE)

The AggregatedPaths CTE aggregates the user paths by concatenating the utm_source and utm_medium values in the order of session start time (descending) for each user.

6. Final SELECT Statement

The final SELECT statement joins the data and AggregatedPaths CTEs based on the user_pseudo_id and selects all fields from data along with the aggregated user paths (utm_source_path and utm_medium_path).

This SQL query provides a solid foundation for building a multi-touch attribution model in BigQuery. It demonstrates how to extract relevant data from GA4, attribute sessions based on conversion events, calculate attribution percentages using different models, and aggregate user paths for analysis.

You can further customize and extend this query based on your specific attribution requirements, such as incorporating additional data sources, refining the attribution logic, or calculating monetary values associated with each conversion.

By leveraging the power of BigQuery and SQL, you can gain valuable insights into the effectiveness of your marketing channels and make data-driven decisions to optimize your campaigns.

Visualizing Your Multi-Touch Attribution Results in Looker Studio

Once you have implemented your multi-touch attribution model in BigQuery using SQL, the next step is to visualize and share the insights with stakeholders. Looker Studio (formerly Google Data Studio) is a powerful tool that allows you to create interactive dashboards and reports based on your BigQuery data.

Connecting Your Attribution Model to Looker Studio

To visualize your attribution results in Looker Studio, you need to connect your BigQuery dataset to Looker Studio and create a data source. Here's how:

  1. Open Looker Studio and click on the "+" button to create a new data source.
  2. Select "BigQuery" as the connector type and authenticate your Google account.
  3. Choose the BigQuery project, dataset, and table that contains your attribution model results.
  4. If your attribution model query is complex or involves multiple tables, you can create a SQL-derived table in Looker Studio. This allows you to write a custom SQL query to fetch the data you need. Simply click on "Custom Query" and enter your SQL query.
  5. Click "Connect" to create the data source.

Creating Reports and Visualizations

With your data source set up, you can now create reports and visualizations to showcase your attribution insights. Here are a few examples of reports you can create:

  1. Registrations by Week and Channel: Create a stacked column chart that shows the number of registrations (or conversions) broken down by week and marketing channel. This report helps you understand the weekly performance of each channel and identify trends over time.
  2. Attribution Model Comparison: Create a table or bar chart that compares the results of different attribution models side by side. For example, you can show the number of registrations attributed to each channel using first-touch, last-touch, linear, and time-decay models. This report allows stakeholders to see how different attribution methodologies impact the perceived value of each channel.
  3. Channel Performance Breakdown: Create a pie chart or table that shows the percentage contribution of each channel to overall registrations or revenue. This report provides a high-level overview of the relative performance of each channel.
  4. Campaign Performance: If you have campaign-level data, create a table that shows the performance of individual campaigns within each channel. This report helps you identify top-performing campaigns and optimize your marketing efforts accordingly.

Creating a Dashboard with Selectable Attribution Models

To make your attribution insights more interactive and accessible, you can create a Looker Studio dashboard that allows users to select different attribution models on the fly. Here's how:

  1. Create a new dashboard in Looker Studio and add your attribution reports and visualizations.
  2. Add a dropdown filter control to the dashboard and configure it to allow users to select from a list of attribution models (e.g., first-touch, last-touch, linear, time-decay).
  3. Link the dropdown filter to the relevant dimensions or metrics in your reports. For example, if you have separate columns for each attribution model in your data source, link the filter to those columns.
  4. Users can now interact with the dashboard by selecting different attribution models from the dropdown, and the reports will update accordingly.

By creating interactive dashboards in Looker Studio, you can empower stakeholders to explore attribution insights on their own and make data-driven decisions based on their preferred attribution methodology.

Remember to keep your reports and dashboards clear, concise, and visually appealing. Use appropriate charts and graphs to convey insights effectively, and provide context and explanations where necessary.

In the next section, we'll discuss how to leverage the insights from your multi-touch attribution model to optimize your marketing campaigns and drive better business outcomes.

Unlocking Insights and Optimizing Your Marketing Campaigns

Multi-touch attribution provides a more comprehensive view of channel performance by considering the entire customer journey, from the initial touchpoint to the final conversion. By assigning credit to each touchpoint based on its contribution to the conversion, you can gain a deeper understanding of how different channels and campaigns work together to drive results.

Armed with these insights, you can make data-driven decisions to optimize your marketing spend and tactics. For example, if you find that certain channels consistently drive high-quality leads that convert well, you may want to allocate more budget to those channels. Conversely, if a particular campaign or channel is not delivering the desired results, you can adjust your strategy or reallocate resources to more effective initiatives.

It's important to remember that there is no one-size-fits-all attribution model. The best model for your business will depend on factors such as your industry, customer behavior, and marketing goals. Therefore, it's essential to experiment with different attribution models and compare the insights they provide. By testing and refining your approach over time, you can develop a customized attribution model that accurately reflects the unique dynamics of your business.

Conclusion

Building a multi-touch attribution model in Google Analytics 4 using BigQuery offers numerous benefits for data-driven marketers. By leveraging the power of GA4's enhanced data collection and BigQuery's robust querying capabilities, you can create a customized attribution model that provides a more accurate and comprehensive view of your marketing performance.

One of the key advantages of this approach is the level of control and flexibility it offers. By building your own attribution model, you can tailor it to your specific business needs and goals, rather than relying on the default models provided by ad networks. Additionally, by connecting your attribution model to downstream systems, such as your CRM or marketing automation platform, you can gain a more holistic view of your customer journey and make more informed decisions.

If you're looking to take your marketing analytics to the next level, consider exploring Rittman Analytics' services for data analytics and centralization. Our team of experts can help you build a customized attribution model, integrate your data sources, and provide actionable insights to drive your business forward.

By embracing multi-touch attribution and leveraging the power of GA4 and BigQuery, you can unlock valuable insights, optimize your marketing campaigns, and ultimately achieve better business outcomes.

Want help with attribution modeling in GA4 for your business? Get in touch with us here to get started.