How to prepare data for overlap analysis with Venn diagram inside Google Data Studio
I always prefer preparing data outside Data Studio and I always prefer using builtin chart types when it is possible.
But sometimes, we need something fast, more or less disposable, addressing temporarily visualization needs, to get a first level of insights.
This article describes how to create quickly an overlap report, loading raw data from several data sets, preparing them to respect the required format of Venn diagram input and display them.
1. Concepts and requirements about overlap & Venn diagram
Overlap and Venn diagram considerations
Overlap report cannot be made with aggregated metrics. It requires an identifier, common to each data set, to allow to scan if every value is present (or not) in each data set.
On the other hand, to display disks and related intersections, the viz engine expects a metric value of each data set (disks), and in addition, metric value of each intersection. In other wording, preparing data, we must create also redundancy inside output data set for intersections purposes.
Queries overlap of organic results & Google Ads impressions
For this tutorial, I decide to use Google Search Console (GSC – site data) & Google Ads data of a website to quantify the number terms with impressions:
- Specific to organic results
- Specific to Ads
- Common to both elements, present on the search engine result page (SERP) of Google…so the intersection of Ads disk and GSC disk
I keep it simple using just 2 data sources, but the way it is designed allows to include more sets of data (see the 2nd example at the end of the article).
So regarding what we said about Venn diagram considerations, in our example:
- Our identifier is Search Term (And query from GSC)
- Our metric is the count of Search Terms and queries
Of courses, this is just an example and the recipe can be applied to a lot of use cases requiring overlap visualization for clusters, audiences, segments…
2. The result
3. Ingredients for the overlap report
First, let’s discover which elements we will build through this Google Data Studio tutorial:
- Your 2 data sets from 2 segments or data sources. For the example :
- Impressions from Google Ads account broken down by Search term
- Impressions from Google Search Console, for site data, broken down by Query
- Blended data tables with mentioned data to join common identifiers and also identify the ones specific to each data set
- A custom visualization chart to draw Venn diagram table chart
- Chart calculated fields, to calculate required dimension and metric, specific to Venn diagram data need
4. Blended data
Them, we blend data and name them “BD – Ads ans organic impressions by query” as described below for our example:
- Left table – GSC site data
- Table name: “GSC”
- Dimension: “Query”
- Metric: “Impressions”
- Right table – Google Ads data
- Table name: “Ads”
- Dimension: “Search term”
- Metric: “Impressions”
- Join configuration
- Operator: Full outer join
- join keys: Query for GSC table and Search term for Ads table
5. Custom visualization for Venn diagram
Venn diagram is an efficient visualization to depict overlap. For our tutorial, I propose to test a free community visualization generously cooked by First but you can build your own custom visualization & maintain it, taking advantage of visualization libraries like D3 or Chart.js.
The more important is the graphic rendering of Venn diagram, easy & fast to understand.
Then, you must to allow the execution of custom visualization ressources for the active report and turn on community visualization access on your data sources.
But first, If you add a partner visualization for the first time, here, how to do it.
6. Chart configuration
Let’s configure the Venn diagram:
- Char type: Our custom viz Venn diagram, freshly added on report page thanks to the previous step
- Setup tab
- Data source:
- Your blended data “BD – Ads ans organic impressions by query”
- Dimension:
- A chart calculated field named “Zone” corresponding to formula below
- Metric:
- A chart calculated field named “Queries” corresponding to formula below
- Select Count (CT) aggregation instead of Count distinct
(theoretically, we don’t need count distinct for these data sets & it is more resource consuming)
- Sort:
- Queries (metric) descending
(I’ve noticed bugs or difficulty to draw the viz if the sorting was different)
- Queries (metric) descending
- Data source:
- Style tab
- Select the wished color for each disk
- Hide text and put logos, or just show dimensions and / or metric values
Zone
REGEXP_EXTRACT(
CONCAT(
IF(Query IS NOT NULL,",GSC","")
,
IF(Search term IS NOT NULL,",Google Ads","")
)
,"^.(.*)$")
Queries
COALESCE(Query,Search term)
7. Configuration review
As usual, just to be sure for having configured correctly the chart, let’s make a quick review.
8. Other example with Google Analytics
As already said, this tutorial can be applied for various use cases, including more data sets, the limit is just related to the number of tables & about data blending to join tables.
Here, another example with Google Analytics data, It is made thanks to a website measurement plan, collecting account identifiers for each interaction when the user is logged (user id). For this case, overlap chart with Venn diagram provides a picture of cross device usages with 3 disks, and data blending is a little bit different (same data source filtered with a star data model for table joins purposes).