How to explore schemas of BigQuery tables with Looker Studio
Today, I propose you to build a tool for BigQuery.
Google Cloud BigQuery is the best friend of Looker Studio. We will not state today all the benefits to consider BigQuery, at least, as the preparation layer of Looker Studio reports.
The more you use BigQuery for your data warehouse or just for your visualisation layer, the more you multiply/optimize/factorize your tables for transformation and visualisation purposes.
And often, facing a collection of tables, you need to know quickly (without a complicated graph with tangled relation links) where is a specific data, is it already available and computed as expected, do I need to compute&store it in a lower business layer before taking care of a new table for my visualisation need, etc.
In this kind of situation, the tool described in this article could help. I built it also because I think it is a nice communication support between data engineers, as something you use during a meeting or you change, capture and send in a mail as an element of briefing (“here are the data we need to clean and merge”).
This tool offers to display efficiently BigQuery meta data information of your tables using a specific view of INFORMATION_SCHEMA.
This time, Looker Studio will help BigQuery management, providing clear & simple visualisations of your datasets states.
Example of use case: The result
For this tutorial, I selected a BigQuery public dataset exported from Google Trends. Obviously, it is not the most relevant example because there are only 4 tables. The more you have generated tables, the more you would need this kind of tool. Transformation, reconciliation, and tables for visualisation. Modern digital analytics or BI devices are nice playgrounds.
Before jumping into the configuration steps of the tutorial, let’s see the result with the screenshot below.
Ingredients to build this tool
For this tiny tool, we will need the following components:
- A BigQuery custom query as the data source to get the meta data of your data warehouse tables
- Some controls to filter the displayed data set
- 2 table charts with all the information of your tables and the related fields, in one place
- 4 repeated groups allowing to highlight specific tables and to filter fields
1. Data source: The custom query
First, let’s create a new report and a BigQuery data source with the custom query below. In this example, “bigquery-public-data” is the project id and “google_trends” is the dataset id. Change it with the corresponding information in your context.
For our dataset, we select the table name, the column name, the field type, if it is nullable and if it is a partitioning column. This query will process around 10MB of data for each execution.
Custom query for your BigQuery data source
SELECT
table_name,
column_name as field_name,
DATA_TYPE as type,
IF(IS_NULLABLE = 'YES','Y','N') as is_nullable,
IF(IS_PARTITIONING_COLUMN = 'YES','Y','') as partitioning_column
FROM `bigquery-public-data`.google_trends.INFORMATION_SCHEMA.COLUMNS
Then, give a nice name to your freshly created data source. You can also change the 5 fields with interface-friendly names.
2. Controls to filter displayed meta data
Now, let’s start to fill our report page with some controls to filter the displayed tables information. In the example, I’ve chosen to include:
- 2 drop-down controls to filter by Table name and Fiel name
- 2 advanced filters to be able to filter by regex or any operator on the same data source fields
3. Tables & fields section
The “Tables & Fields” section is a recap of filtered meta data. A way to get an overview, check the fields common to several tables, etc. It included 2 table charts:
- A list of tables using “Table name” as the unique chart field (Dimension section of chart setup)
- A table displayed all the data source fields as dimension: Table name, Field name, Type, Is nullable, Partitioning col.
4. Tables highlights section
At last, let’s add a way to select specific tables and display it like in the schema tab of tables in BigQuery console or like in any software of data base design.
So we will create 4 repeated groups respecting the same pattern of elements:
- A drop-down control to select a table (“Table name” data source field)
- with a specific table name as default selection (setup tab)
- turning on “single-select” option (style tab)
- turning off “Allow Select all” (style tab).
Be sure to fill a default selection to be able to use this option.
- An input box to filter field names
- Search type: Regex or Contains
- Label position: Left
- A table chart with “Field name”, “Type”, “Is nullable” an “Partitioning col.” as dimension
Create of first set of elements described above, adapt the design and the element positions like in the screenshot below. Then, duplicate the set 3 times, and apply a specific color for each.
I’ve chosen to use as specific color for each table to ease the discussion during a meeting. When the good tables are selected, it is visually faster to talk about the green table than mentioning its name.
At last, group the elements of each set to isolate the filtering at a group scope. I always apply elements grouping at the end of the design. Otherwise, it is really complicated to select a specific element.
I also recommend to refresh your browser page. Grouping elements doesn’t refresh data to apply the new filtering scopes in the page.
Voilà! It’s built.
Final thoughts
It is just one example of tool usage of data from BigQuery INFORMATION_SCHEMA, inside Looker Studio UI capabilities. The opportunities or relevant use cases are infinite. We could imagine to:
- Extend this example of report with conditional formatting to non respected convention naming, required type/partitioning or any schema rules of the project…for quality assurance
- Use Looker Studio to document each table and the layers of transformation of your pipeline
- Log & follow the size of tables and trigger alerts with the Looker Studio Pro freshly released feature
- …
At last and without Looker Studio, here a nice inspiring usage of BigQuery INFORMATION_SCHEMA for Google Analytics 4 data, written by Johan van de Werken: GA4 BigQuery export schema change log
Happy meta data querying.