Connect GX Cloud to BigQuery
To connect GX Cloud to data stored in BigQuery, use the GX Cloud API.
Prerequisites
- A GX Cloud account with Workspace Editor permissions or greater.
- A GCP project with a BigQuery dataset that has a table or view.
- BigQuery credentials stored securely in a `credentials.json` file outside of version control.
- Python version 3.10 to 3.13.
- Recommended: A Python virtual environment.
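Before connecting, it can help to sanity-check the credentials file referenced above. The sketch below is a hypothetical helper, not part of GX; it assumes a GCP service-account key, which typically includes `project_id` and `private_key` fields.

```python
import json
from pathlib import Path


def check_credentials_file(path: str) -> bool:
    """Return True if the file exists and looks like a GCP service-account key."""
    p = Path(path)
    if not p.is_file():
        return False
    try:
        creds = json.loads(p.read_text())
    except json.JSONDecodeError:
        return False
    # Typical fields in a GCP service-account key file (assumption; adjust for your key)
    return "project_id" in creds and "private_key" in creds
```

If this returns `False`, fix the key file path or contents before continuing.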
Install GX Cloud
Run the following terminal command to install the GX Cloud library with support for BigQuery dependencies:
```bash
pip install 'great_expectations[bigquery]'
```
Get your credentials
You'll need your user access token, organization ID, and workspace ID to set your environment variables. Don't commit your access token to your version control software.
1. In GX Cloud, click Tokens.

2. In the User access tokens pane, click Create user access token.

3. In the Token name field, enter a name for the token that will help you quickly identify it.

4. Click Create.

5. Copy and then paste the user access token into a temporary file. The token can't be retrieved after you close the dialog.

6. Click Close.

7. Copy the value in the Organization ID field into the temporary file with your user access token.

8. In the Workspace ID pane, find the relevant Workspace name, then copy the associated ID into the temporary file with your other credentials and save the file.
GX recommends deleting the temporary file after you set the environment variables.
Set your credentials as environment variables
Environment variables securely store your GX Cloud credentials.
1. Save your GX Cloud credentials as environment variables by entering `export ENV_VAR_NAME=env_var_value` in the terminal or adding the command to your `~/.bashrc` or `~/.zshrc` file. For example:

   ```bash
   export GX_CLOUD_ACCESS_TOKEN=<user_access_token>
   export GX_CLOUD_ORGANIZATION_ID=<organization_id>
   export GX_CLOUD_WORKSPACE_ID=<workspace_id>
   ```

2. Optional: If you created a temporary file to record your credentials, delete it.
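As a quick check that the variables are visible to Python before you create a Data Context, you can look them up with `os.environ`. This is a minimal sketch; `missing_gx_cloud_vars` is a hypothetical helper, and the variable names are the three set in the example above.

```python
import os

# The three variables set in the previous step
REQUIRED_VARS = (
    "GX_CLOUD_ACCESS_TOKEN",
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
)


def missing_gx_cloud_vars() -> list[str]:
    """Return the names of any required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]
```

An empty result means all three credentials are available to the current process.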
Connect a BigQuery Data Source and add a Data Asset
1. Run the following Python code to create a Data Context object:

   ```python
   import great_expectations as gx

   context = gx.get_context(mode="cloud")
   ```

   The Data Context will detect the previously set environment variables and connect to your GX Cloud account.

2. Define the Data Source's parameters.

   The following information is required when you create a BigQuery Data Source:

   - `name`: A descriptive name used to reference the Data Source. This should be unique within your workspace.
   - `connection_string`: The connection string used to connect to the database. The format for this is `bigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json`.

   Replace the variable values with your own and run the following Python code:

   ```python
   data_source_name = "my_bigquery_datasource"
   connection_string = (
       "bigquery://my_project/my_dataset?credentials_path=/my/credentials.json"
   )
   ```

3. Add a BigQuery Data Source to your Data Context by executing the following code:

   ```python
   data_source = context.data_sources.add_bigquery(
       name=data_source_name, connection_string=connection_string
   )
   ```

4. Decide whether you want to validate the records in a single table or the records returned by a SQL query.

   - To validate the records in a single table, create a Table Data Asset.
   - To validate the records returned by a SQL query, create a Query Data Asset. Note that Query Data Assets have some limitations compared to Table Data Assets.
Table Data Asset

1. Define your Table Data Asset's parameters.

   The following information is required when you create a Table Data Asset:

   - `name`: A name by which you can reference the Data Asset in the future. This should be unique within the Data Source.
   - `table_name`: The name of the SQL table that the Table Data Asset will retrieve records from.

   ```python
   data_asset_name = "my_table_asset"
   table_name = "my_table"
   ```

2. Add the Data Asset to your Data Source. The new Data Asset is created and added to the Data Source in a single step:

   ```python
   table_data_asset = data_source.add_table_asset(
       table_name=table_name, name=data_asset_name
   )
   ```
Query Data Asset

1. Define your Query Data Asset's parameters.

   The following information is required when you create a Query Data Asset:

   - `name`: A name by which you can reference the Data Asset in the future. This should be unique within the Data Source.
   - `query`: The SQL query that the Data Asset will use to retrieve records.

   ```python
   data_asset_name = "my_query_asset"
   query = "SELECT * FROM my_table WHERE column = 'value'"
   ```

2. Add the Data Asset to your Data Source. The new Data Asset is created and added to the Data Source in a single step:

   ```python
   query_data_asset = data_source.add_query_asset(query=query, name=data_asset_name)
   ```
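The BigQuery connection string used in the steps above follows a fixed pattern, so it can also be assembled from its parts. The sketch below is a hypothetical helper, not part of the GX API, and the example values are placeholders you would replace with your own project, dataset, and key path.

```python
def bigquery_connection_string(project: str, dataset: str, credentials_path: str) -> str:
    """Build a connection string in the format GX expects for BigQuery."""
    return f"bigquery://{project}/{dataset}?credentials_path={credentials_path}"


# Placeholder values; substitute your own GCP project, dataset, and key path
connection_string = bigquery_connection_string(
    "my_project", "my_dataset", "/my/credentials.json"
)
```

This keeps the project, dataset, and credentials path as separate values, which is convenient when they come from configuration rather than being hard-coded.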
Sample code

```python
import great_expectations as gx

context = gx.get_context(mode="cloud")

# Add a BigQuery Data Source
data_source_name = "my_bigquery_datasource"
connection_string = (
    "bigquery://my_project/my_dataset?credentials_path=/my/credentials.json"
)
data_source = context.data_sources.add_bigquery(
    name=data_source_name, connection_string=connection_string
)

# Add a Table Data Asset
data_asset_name = "my_table_asset"
table_name = "my_table"
table_data_asset = data_source.add_table_asset(
    table_name=table_name, name=data_asset_name
)

# Get the updated Data Source
data_source = context.data_sources.get(data_source_name)

# Add a Query Data Asset
data_asset_name = "my_query_asset"
query = "SELECT * FROM my_table WHERE column = 'value'"
query_data_asset = data_source.add_query_asset(query=query, name=data_asset_name)
```
Limitations
Keep the following limitations in mind when working with BigQuery Data Sources.
- BigQuery Data Source connections cannot be edited in the GX Cloud UI. Use the GX Cloud API if you need to edit the connection.
- BigQuery Data Assets cannot be added through the GX Cloud UI. Use the GX Cloud API to add more Data Assets from your BigQuery Data Source.
- ExpectAI is not supported.
- Data Asset metrics are not supported.
- The Data Health dashboard entity filter cannot detect the Data Asset’s columns.
- Expectations for Anomaly Detection cannot be automatically generated. You can manually configure Anomaly Detection by adding Expectations with Dynamic Parameters or forecasted ranges.
- Ad hoc Validations cannot be triggered through the GX Cloud UI. Use the API to run an ad hoc Validation.
- Recurring Validations cannot be scheduled in GX Cloud. Use an orchestrator to run recurring Validations.