How to configure a BigQuery Datasource

This guide will help you add a BigQuery project (or a dataset) as a Datasource. This will allow you to validate tables and queries within this project. When you use a BigQuery Datasource, the validation is done in BigQuery itself. Your data is not downloaded.

Prerequisites: This how-to guide assumes you have already:

Steps

  1. Run the following CLI command to begin the interactive Datasource creation process:

    great_expectations datasource new

  2. Choose “Big Query” from the list of database engines when prompted.

  3. Identify the connection string you would like Great Expectations to use to connect to BigQuery, using the examples below and the PyBigQuery documentation.

    If you want Great Expectations to connect to your BigQuery project (without specifying a particular dataset), the URL should be:

    bigquery://project-name

    If you want Great Expectations to connect to a particular dataset inside your BigQuery project, the URL should be:

    bigquery://project-name/dataset-name

    If you want Great Expectations to connect to one of Google’s public datasets, the URL should be:

    bigquery://project-name/bigquery-public-data

  4. Enter the connection string when prompted (and press Enter when asked “Would you like to proceed? [Y/n]:”).

  5. Should you need to modify your connection string, you can manually edit the great_expectations/uncommitted/config_variables.yml file.
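
The connection string formats listed in the steps above can be sketched in Python as follows. This is only an illustration: the helper name `bigquery_url` is hypothetical and not part of Great Expectations or PyBigQuery, and `project-name` and `dataset-name` are placeholders for your own values.

```python
from typing import Optional


def bigquery_url(project: str, dataset: Optional[str] = None) -> str:
    """Build a BigQuery connection string (hypothetical helper, for illustration).

    With only a project, the URL points at the whole project.
    With a dataset, the dataset name is appended after a slash.
    """
    if not project:
        raise ValueError("A BigQuery project name is required")
    url = f"bigquery://{project}"
    if dataset:
        url = f"{url}/{dataset}"
    return url


# Project-level connection string
project_url = bigquery_url("project-name")

# Dataset-level connection string
dataset_url = bigquery_url("project-name", "dataset-name")

print(project_url)
print(dataset_url)
```

Either resulting string is what you would paste at the connection string prompt.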

Additional notes

Environment variables can be used instead of the config_variables.yml file to store the SQLAlchemy URL, if preferred. Search the documentation for “Managing Environment and Secrets”.
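
As a rough sketch of the idea, the snippet below resolves a `${...}` placeholder from an environment variable using Python's standard `string.Template`. It illustrates the general substitution pattern only and is not Great Expectations' own implementation; the variable name `GE_BIGQUERY_URL` is an assumption chosen for this example. Refer to “Managing Environment and Secrets” for the authoritative behavior.

```python
import os
from string import Template

# Hypothetical variable name; in practice this would be set in your shell
# or deployment environment rather than in code.
os.environ["GE_BIGQUERY_URL"] = "bigquery://project-name/dataset-name"

# A configuration value that refers to the environment variable by name
# instead of containing the connection string itself.
config_value = "${GE_BIGQUERY_URL}"

# Replace the placeholder with the value found in the environment.
# substitute() raises KeyError if the variable is not set.
resolved_url = Template(config_value).substitute(os.environ)

print(resolved_url)
```

Keeping the URL in the environment means the connection string does not have to be written to a file on disk.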

Additional resources