Skip to main content

How to connect to a Snowflake database

This guide will help you connect to data in a Snowflake database. This will allow you to ValidateThe act of applying an Expectation Suite to a Batch. and explore your data.

Prerequisites: This how-to guide assumes you have:
  • Completed the Getting Started Tutorial
  • A working installation of Great Expectations
  • Have access to data in a Snowflake database

Steps​

1. Choose how to run the code in this guide​

Get an environment to run the code in this guide. Please choose an option below.

If you use the Great Expectations CLICommand Line Interface, run this command to automatically generate a pre-configured Jupyter Notebook. Then you can follow along in the YAML-based workflow below:

great_expectations datasource new

2. Install required dependencies​

First, install the necessary dependencies for Great Expectations to connect to your Snowflake database by running the following in your terminal:

pip install sqlalchemy
pip install snowflake-connector-python
pip install snowflake-sqlalchemy

3. Add credentials​

Great Expectations provides multiple methods of using credentials for accessing databases. Options include using a file not checked into source control, environment variables, and using a cloud secret manager. Please read the article How to Configure Credentials for instructions on alternatives.

For this guide we will use a connection_string like this:

snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
note

Please note application=great_expectations_oss connection string parameter above is simply so that Snowflake can identify connection source. It has no bearing on anything else.

4. Instantiate your project's DataContext​

Import these necessary packages and modules.

from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

Load your DataContext into memory using the get_context() method.

context = ge.get_context()

5. Configure your Datasource​

Put your connection string in this template:

datasource_yaml = f"""
name: my_snowflake_datasource
class_name: Datasource
execution_engine:
class_name: SqlAlchemyExecutionEngine
connection_string: snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
data_connectors:
default_runtime_data_connector_name:
class_name: RuntimeDataConnector
batch_identifiers:
- default_identifier_name
default_inferred_data_connector_name:
class_name: InferredAssetSqlDataConnector
include_schema_name: true
"""

Run this code to test your configuration.

context.test_yaml_config(datasource_yaml)

You will see your database tables listed as Available data_asset_names in the output of test_yaml_config().

Feel free to adjust your configuration and re-run test_yaml_config as needed.

6. Save the Datasource configuration to your DataContext​

Save the configuration for your DatasourceProvides a standard API for accessing and interacting with data from a wide variety of source systems. into your DataContext by using the add_datasource() function.

context.add_datasource(**yaml.load(datasource_yaml))

7. Test your new Datasource​

Verify your new Datasource by loading data from it into a ValidatorUsed to run an Expectation Suite against data. using a BatchRequest.

Here is an example of loading data by specifying a SQL query.

batch_request = RuntimeBatchRequest(
datasource_name="my_snowflake_datasource",
data_connector_name="default_runtime_data_connector_name",
data_asset_name="default_name", # this can be anything that identifies this data
runtime_parameters={
"query": f"SELECT * from {sfSchema.lower()}.taxi_data LIMIT 10"
},
batch_identifiers={"default_identifier_name": "default_identifier"},
)

context.create_expectation_suite(
expectation_suite_name="test_suite", overwrite_existing=True
)
validator = context.get_validator(
batch_request=batch_request, expectation_suite_name="test_suite"

πŸš€πŸš€ Congratulations! πŸš€πŸš€ You successfully connected Great Expectations with your data.

Additional Notes​

To view the full scripts used in this page, see them on GitHub:

Next Steps​

Now that you've connected to your data, you'll want to work on these core skills: