Google Data Studio – Dynamic cluster of dimension values for interactive statistics grouping
Of course, nothing is better than getting cleaned, organized and grouped data to build beautiful dashboards with Google Data Studio and other data viz tools.
But messy data is also our daily lives. At some point (the best reason), because we are in charge actually to clean it or to create an efficient measurement plan.
Data Studio can also help you – without coding skills – to explore and test any data, at any stage of data analysis. And data exploration can significantly be accelerated with home made tools.
As example, the capability to cluster dynamically existing data is possible with Data Studio. It will help you to identify trends and apply the better options for the future.
How many times you read a table chart, saw the high number of values (with the related long tail of descending metric values) and suspect that a cluster of values with a common denominator could be significant, driving your analysis or your way to organize your data? 🧐
Let’s build a simple & interactive report that help you to cluster your data and create soft grouping inside Data Studio!
Ingredients
For this post, I handle data from demo account of Google Analytics, using Landing Page dimension with Sessions metric, but you can do exactly the same thing with any other data set. Just take care of the consistency of the applied aggregation on the metric.
So, to build this exploration report, we will combine the following elements:
- Your main data source providing dimension values and related metrics you need to explore and cluster
- Some parameters to fill grouping rules and labels (3 groups for the example)
- A calculated field to take in account dynamic grouping inside our charts
- A main chart to see the evolution of statistics after your dynamic grouping, with the related
- A bar chart as a secondary chart to see the impact on the number of values and the long tail
The result
Like keywords, URLs is a good example of dimensions with a high cardinality of values, requiring to create groups to understand if some suspected clusters can get significant proportion of stats, changing the trends, and so the analysis.
Let’s see the result.
1. Parameters
Let’s start the configuration!
First, add the following 5 text parameters – permitting any value – to your main data source. Here their names and IDs:
- Group #1 label / group_1_label
- Group #1 rule / group_1_rule
- Group #2 label / group_2_label
- Group #2 rule / group_2_rule
- Group #3 – auto extract / group_3
2. Calculated fields
Then, we create a calculated field on our data source to use it as the main dimension of your charts, overwriting the original value of the selected dimension (here Landing page):
- For the Group #1 and #2, we check if the related parameters are not empty and if the rule corresponds to a regular expression, included in the dimension value, if it is the case, we overwrite the dimension value with the filled label
- For the Group #3, we auto extract, inside the value, what is filled as a regular expression and create one group by different extraction.
Warning! To keep simple the use of REGEXP_EXTRACT function, the related parameter should be, at least, filled with a string between parenthesis, or with a real regular expression with capturing group
Grouped landing page
CASE
WHEN Group #1 rule != "" AND Group #1 label != "" AND REGEXP_CONTAINS(Landing Page,Group #1 rule) THEN CONCAT("# Group: ",Group #1 label)
WHEN Group #2 rule != "" AND Group #2 label != "" AND REGEXP_CONTAINS(Landing Page,Group #2 rule) THEN CONCAT("# Group: ",Group #2 label)
WHEN Group #3 - auto extract != "" AND REGEXP_CONTAINS(Landing Page, Group #3 - auto extract) THEN CONCAT("# Group: ", REGEXP_EXTRACT(Landing Page,Group #3 - auto extract))
ELSE Landing Page
END
3. Settings panel of the report
Now our fields are ready, let’s build the settings panel we will use to create our clusters dynamically regarding explored data.
We need to insert in the report page our 5 parameters through input box controls, decorated with some section labels and graphic elements representing the cascade of the 3 available rule slots.
4. Charts
At last, we create 2 charts.
The first one is our main chart for exploration works, here a table chart with main metric displayed with bars:
- Setup tab
- Data source: Your main data source (here GA demo account)
- Dimension: Grouped landing page
- Metric: Sessions
(selecting in my example percent of total as comparison calculation) - Sort by Sessions descending
- Style tab
- Show row numbers (Table Body section)
- Show pagination (Table Footer section)
- Display bar and show number for the first metric
- (optional) To highlight the groups, in my case, I also applied a conditional formatting for rows with clusters with yellow background
Then, we create a small vertical bar chart to follow the trend of our initial long tail, applying progressively some clusters.
- Setup tab
- Data source: Your main data source (here GA demo account)
- Dimension: Grouped landing page
- Metric: Sessions
- Sort by Sessions descending
- Style tab
- Vertical bars
- High number of bars
- A flashy color
- Axes: Do not show axes
- Legend: Do not show legend
Configuration review
Not sure about a step described above, don’t worry, it’s time to make a full review of the configuration with the video below.
That’s all. Now, it is your turn to apply this report for your specific context. Custom adaptations are endless. Make this report yours.