Skip to main content
Version: 0.18.9

Connect to SQL database Data Assets

Use the information provided here to connect to Data Assets stored in SQL databases. Great Expectations (GX) uses SQLAlchemy to connect to SQL Data Assets, and most of the SQL dialects supported by SQLAlchemy are also supported by GX. For more information about the SQL dialects supported by SQLAlchemy, see Dialects.

SQL

Connect GX to a SQL database to access Data Assets.

Prerequisites

Import GX and instantiate a Data Context

Run the following Python code to import GX and instantiate a Data Context:

Python
import great_expectations as gx

context = gx.get_context()

Determine your connection string

GX supports numerous SQL Data Sources. However, most SQL dialects have their own specifications for defining a connection string. See the dialect documentation to determine the connection string for your SQL database.

Some examples of different connection strings:

The following are some of the connection strings that are available for different SQL dialects:

  • AWS Athena: awsathena+rest://@athena.<REGION>.amazonaws.com/<DATABASE>?s3_staging_dir=<S3_PATH>
  • BigQuery: bigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json
  • MSSQL: mssql+pyodbc://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?driver=<DRIVER>&charset=utf&autocommit=true
  • MySQL: mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
  • PostgreSQL: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
  • Redshift: postgresql+psycopg2://<USER_NAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?sslmode=<SSLMODE>
  • Snowflake: snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
  • SQLite: sqlite:///<PATH_TO_DB_FILE>
  • Trino: trino://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<CATALOG>/<SCHEMA>

Run one of the connection strings in your preferred SQL dialect to store the connection string in the connection_string variable with plain text credentials. The following code is an example of the PostgreSQL connection string format:

Python
connection_string = "postgresql+psycopg2://username:my_password@localhost/test"
Is there a more secure way to include my credentials?

You can use environment variables or a key in config_variables.yml to store connection string passwords. After you define your password, you reference it in your connection string similar to this example:

Python
connection_string = (
"postgresql+psycopg2://<username>:${MY_PASSWORD}@<host>:<port>/<database>"
)

In the previous example MY_PASSWORD is the name of the environment variable, or the key to the value in config_variables.yml that corresponds to your password.

If you include a password as plain text in your connection string when you define your Data Source, GX automatically removes it, adds it to config_variables.yml, and substitutes it in the Data Source saved configuration with a variable.

Create a SQL Data Source

Run the following Python code to create a SQL Data Source:

Python
datasource = context.sources.add_sql(
name="my_datasource", connection_string=connection_string
)

Connect to the data in a table (Optional)

  1. Run the following Python code to set the asset_name and asset_table_name variables:

    Python
    asset_name = "my_asset"
    asset_table_name = my_table_name
  2. Run the following Python code to create the Data Asset:

    Python
    table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)

Connect to the data in a query (Optional)

  1. Run the following Python code to define a Query Data Asset:

    Python
    asset_name = "my_query_asset"
    query = "SELECT * from yellow_tripdata_sample_2019_01"
  2. Run the following Python code to create the Data Asset:

    Python
    query_asset = datasource.add_query_asset(name=asset_name, query=query)

Add additional tables or queries (Optional)

Repeat the previous steps to add additional Data Assets.