How to split comma-separated values to single values with Looker Studio
Let’s describe step by step, inside Looker Studio (formerly known as Google Data Studio) – how to split delimiter-separated values – as example with comma – from a text dimension to get the same related metrics, drilled down by real single values.
If your system of storage or your data collection – like Google Analytics – doesn’t allow to manage list dimension, and you don’t wish to send several tracking hits – one by separate value – to optimize the payload of your measurement device, this trick offers a solution to do it quickly & without a layer of data preparation.
But be careful, it is not perfect, just quite nice for simple & small data sets.
Here some examples of use cases, regardless of the data source & related tools:
- Blog: Article tags
- SEO: List of targeted & optimized keywords
- E-commerce: Personalized payment methods of a checkout funnel
- Attribution: Channel touches for a linear attribution model of conversions
Ingredients
For this article, I apply the method with an hit-scoped custom dimension, collecting articles tags on Google Analytics, but as usual, you can do exactly the same for any other context with a dimension collecting delimiter-separated values.
To build an example of bar chart, let’s use the following elements:
- A Google Sheets data source playing the role of list splitter
- Blended data to mix data & metrics from Google Analytics (pageviews by article tags), with Sheets data source
- A table chart with bars to visualize our data
- A calculated dimension to split items to single values
- A calculated metric to keep values only for the right rows
So, another new & simple roundabout way to use data blending, like in a previous article to customize labels for pie & donut charts.
0. Maximum number of values inside the list dimension
First, we need to know our data and especially the max number of values stored in our raw dimensions. It will be important for the consistency of your report in the future.
To get it on GA as example, you can use the following calculated field, applying as the metric of a scorecard on a wide period. In my example, values are delimited with comma separator. Feel free to adapt the formula with the delimiter character of your data.
For most of cases, each single value is unique as part of the limiter-separated values of the raw dimension, so this number cannot be higher than the number of distinct value.
Max number of tags
MAX(
LENGTH(
REGEXP_REPLACE(Article tags,"([^,])","")
)
)+1
Explanation of the formula:
We remove all characters except commas to count them, get the higher value with max function, and increase by one to get the max number of tags.
If you are not familiar with regular expressions, in the formula, the delimiter corresponds to the comma just after the ^.
For my tiny data set, the formula returns 4. So with a minimum of allowance, I consider that my list dimension cannot contain more that 10 values to separate.
1. Google Sheets data source
I create a simple Google Sheets with one tab & one column “index” of 10 values, from 1 to 10. Our static splitter: Inside Data Studio, we cannot generate lists except from a data source. That’s a limitation we have to deal with for this trick.
Then, we add the Google Sheets tab as a Data Studio data source.
2. Blended data
Now, we configure blended data “BD for list dimension” as described below for our example:
- Left table
- Data source: Google Sheet data source with indexes
- Dimension : Index
- Right table
- Data source: Main data source, here Google Analytics data source
- Dimensions: Article tags, Nb tags as a calculated dimension (so on blended data scope) with the formula below
- Metric: Pageviews
- Join configuration
- Operator: Cross join
- No key
Nb tags
LENGTH(REGEXP_REPLACE(Article tags,"[^,]",""))+1
Explanation of the formula: We remove all characters except commas to count them and increase by one to get the number of tags. The delimiter corresponds to the comma just after the ^.
3. Chart configuration
At last, we create the chart, here a table chart:
- Setup tab
- Data source: Your blended data “BD for list dimension”
- Dimension: A chart calculated field “Tag” with the formula below
- Metric: A chart calculated field “Page views” with the formula below and default aggregation (SUM)
- Sort by Page views (so the “New Field” metric) descending
- Style tab
- Show metric values as bars with numbers (metrics selection)
Calculated field to use as chart dimension
Tag
CASE
WHEN Nb tags >= Index AND Index = 1 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){1}'),",","")
WHEN Nb tags >= Index AND Index = 2 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){2}'),",","")
WHEN Nb tags >= Index AND Index = 3 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){3}'),",","")
WHEN Nb tags >= Index AND Index = 4 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){4}'),",","")
WHEN Nb tags >= Index AND Index = 5 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){5}'),",","")
WHEN Nb tags >= Index AND Index = 6 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){6}'),",","")
WHEN Nb tags >= Index AND Index = 7 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){7}'),",","")
WHEN Nb tags >= Index AND Index = 8 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){8}'),",","")
WHEN Nb tags >= Index AND Index = 9 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){9}'),",","")
WHEN Nb tags >= Index AND Index = 10 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){10}'),",","")
ELSE REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){1}'),",","")
END
Explanation of the formula:
Keep in mind that data set rows have been replicated 10 times thanks to the left join with Google Sheets data source.
We extract the single value from the comma-separated values only if the list contains enough values and only for the specific index.
Here in red the three occurrences of delimiter to replace for each WHEN output, if you don’t use comma character:
REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){10}’),”,“,””)
PS: Unfortunately, REGEXP_EXTRACT function accepts only literal expression (no possible dynamic injection of the index to simplify the formula)…
Calculated field to use as chart metric
Page views
CASE
WHEN Nb tags >= Index THEN Pageviews
ELSE 0
END
Explanation of the formula:
We count pageviews only if the list contains enough values. Crossing this selection, with the dimension above, duplicates the metric value for each single split value. That’s exactly what we need.
It means also that any pre calculated or pre aggregated metric needs to be rebuilt with raw metrics after the split or needs the right aggregation.
Configuration review
At last, let’s see how everything is configured inslide Looker Studio and allow us to do our nice chart, based on comma-separated values, becoming now a real list dimension. Voilà.