
How to instantiate a Data Context without a yml file

This guide will help you instantiate a Data Context without a yml file, that is, configure a Data Context in code. If you are working in an environment without easy access to a local filesystem (e.g. AWS Spark EMR, Databricks), you may wish to configure your Data Context in code, within your notebook or workflow tool (e.g. an Airflow DAG node).

Prerequisites: This how-to guide assumes you have:

Steps

1. Create a DataContextConfig

The DataContextConfig holds all of the configuration parameters needed to build a Data Context. Defaults are set to minimize configuration in typical cases, but every parameter is configurable and every default can be overridden. DatasourceConfig likewise has defaults that can be overridden.

The examples below show a few common configurations that use the store_backend_defaults parameter. You can also use the API without defaults by omitting that parameter, and you can override any parameter explicitly, as the last example shows. If a parameter is set both directly in DataContextConfig and through store_backend_defaults, the value set in DataContextConfig takes precedence.
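The precedence rule can be modelled in a few lines of plain Python. This is a simplified sketch, not Great Expectations code: `apply_store_backend_defaults` is a hypothetical helper, and it assumes that precedence is resolved parameter by parameter, with `None` meaning "not set".

```python
from typing import Any, Dict


def apply_store_backend_defaults(
    explicit: Dict[str, Any], defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """Return effective parameters: explicitly set values win over defaults.

    Hypothetical helper modelling the documented precedence rule; a value of
    None is treated as "not set" and falls back to the default.
    """
    effective = dict(defaults)
    for key, value in explicit.items():
        if value is not None:
            effective[key] = value
    return effective


# Stand-ins for values a store_backend_defaults object might supply (hypothetical names).
defaults = {
    "expectations_store_name": "default_expectations_store",
    "validations_store_name": "default_validations_store",
}
# Parameters passed directly to DataContextConfig; only one is actually set.
explicit = {
    "expectations_store_name": "my_custom_expectations_store",
    "validations_store_name": None,
}

effective = apply_store_backend_defaults(explicit, defaults)
print(effective)
```

The explicitly set Expectations Store name replaces the default, while the unset Validations Store name keeps the default value.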

The following store_backend_defaults are currently available:

  • S3StoreBackendDefaults
  • GCSStoreBackendDefaults
  • DatabaseStoreBackendDefaults
  • FilesystemStoreBackendDefaults

The following example shows a Data Context configuration with an SQLAlchemy Datasource and an AWS S3 bucket for all metadata Stores, using default prefixes. As with YAML-based configuration, you can still substitute environment variables to keep sensitive credentials out of your code.

```python
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, S3StoreBackendDefaults

data_context_config = DataContextConfig(
    datasources={
        "sql_warehouse": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "SqlAlchemyExecutionEngine",
                "credentials": {
                    "drivername": "postgresql+psycopg2",
                    "host": "localhost",
                    "port": "5432",
                    "username": "postgres",
                    "password": "postgres",
                    "database": "postgres",
                },
            },
            data_connectors={
                "default_runtime_data_connector_name": {
                    "class_name": "RuntimeDataConnector",
                    "batch_identifiers": ["default_identifier_name"],
                },
                "default_inferred_data_connector_name": {
                    "class_name": "InferredAssetSqlDataConnector",
                    "name": "whole_table",
                },
            },
        )
    },
    store_backend_defaults=S3StoreBackendDefaults(default_bucket_name="my_default_bucket"),
)
```
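Because the configuration above is ordinary Python, another option is to read secrets from the environment yourself before building the DatasourceConfig. The sketch below does a plain `os.environ` lookup. It is not Great Expectations' own variable-substitution mechanism, and the `WAREHOUSE_*` variable names and the `postgres_credentials_from_env` helper are hypothetical.

```python
import os
from typing import Dict


def postgres_credentials_from_env(prefix: str = "WAREHOUSE_") -> Dict[str, str]:
    """Build a SQLAlchemy credentials dict from environment variables.

    Hypothetical helper: the variable names (WAREHOUSE_HOST, WAREHOUSE_PASSWORD,
    ...) are illustrative; use whatever your deployment provides.
    """
    return {
        "drivername": "postgresql+psycopg2",
        # Non-secret settings fall back to local defaults.
        "host": os.environ.get(f"{prefix}HOST", "localhost"),
        "port": os.environ.get(f"{prefix}PORT", "5432"),
        "database": os.environ.get(f"{prefix}DATABASE", "postgres"),
        # Secrets have no fallback: a missing variable raises KeyError immediately.
        "username": os.environ[f"{prefix}USERNAME"],
        "password": os.environ[f"{prefix}PASSWORD"],
    }
```

You would then pass `"credentials": postgres_credentials_from_env()` inside `execution_engine` in place of the hardcoded dictionary. Indexing `os.environ` directly for the username and password means a missing secret fails when the config is built rather than at connection time.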

The following example shows a Data Context configuration with a Pandas Datasource and local filesystem defaults for metadata Stores. You can pass the optional root_directory parameter to FilesystemStoreBackendDefaults to set the base location for the Store backends.

```python
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, FilesystemStoreBackendDefaults

data_context_config = DataContextConfig(
    datasources={
        "pandas": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "PandasExecutionEngine"
            },
            data_connectors={
                "tripdata_monthly_configured": {
                    "class_name": "ConfiguredAssetFilesystemDataConnector",
                    "base_directory": "/path/to/trip_data",
                    "assets": {
                        "yellow": {
                            "pattern": r"yellow_tripdata_(\d{4})-(\d{2})\.csv$",
                            "group_names": ["year", "month"],
                        }
                    },
                }
            },
        )
    },
    store_backend_defaults=FilesystemStoreBackendDefaults(root_directory="/path/to/store/location"),
)
```
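The `assets` entry pairs a regular expression with `group_names`: each capture group in `pattern` is labelled, in order, with the corresponding name. The sketch below approximates that mapping with Python's `re` module so you can check a pattern against sample filenames before deploying. `batch_identifiers_for` is a hypothetical helper, and the connector's own matching logic may differ in details.

```python
import re
from typing import Dict, Optional

# Same pattern and group_names as the "yellow" asset in the example above.
PATTERN = re.compile(r"yellow_tripdata_(\d{4})-(\d{2})\.csv$")
GROUP_NAMES = ["year", "month"]


def batch_identifiers_for(filename: str) -> Optional[Dict[str, str]]:
    """Map capture groups to group names for a matching file, else return None."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    # Pair each group name with its captured value, in order.
    return dict(zip(GROUP_NAMES, match.groups()))


for name in [
    "yellow_tripdata_2019-01.csv",
    "yellow_tripdata_2019-02.csv",
    "green_tripdata_2019-01.csv",
]:
    print(name, "->", batch_identifiers_for(name))
```

Files that do not match, such as the green taxi data, yield `None`. In this approximation they would simply not become batches of the `yellow` asset.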

The following example shows a Data Context configuration with an SQLAlchemy Datasource and two GCS buckets for metadata Stores, using a mix of custom and default prefixes. As with YAML-based configuration, you can still substitute environment variables to keep sensitive credentials out of your code. The default_bucket_name and default_project_name parameters set the default bucket and project for every Store that is not configured individually.

In the resulting DataContextConfig, the Expectations Store and Data Docs use my_default_bucket and my_default_project, because no bucket or project is specified for them explicitly. The Validation Results Store uses the explicitly specified my_validations_bucket and my_validations_project. Custom prefixes are set for the Expectations Store and the Validation Results Store, while Data Docs use the default data_docs prefix.

```python
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, GCSStoreBackendDefaults

data_context_config = DataContextConfig(
    datasources={
        "sql_warehouse": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "SqlAlchemyExecutionEngine",
                "credentials": {
                    "drivername": "postgresql+psycopg2",
                    "host": "localhost",
                    "port": "5432",
                    "username": "postgres",
                    "password": "postgres",
                    "database": "postgres",
                },
            },
            data_connectors={
                "default_runtime_data_connector_name": {
                    "class_name": "RuntimeDataConnector",
                    "batch_identifiers": ["default_identifier_name"],
                },
                "default_inferred_data_connector_name": {
                    "class_name": "InferredAssetSqlDataConnector",
                    "name": "whole_table",
                },
            },
        )
    },
    store_backend_defaults=GCSStoreBackendDefaults(
        # Used by every Store that does not set its own bucket/project.
        default_bucket_name="my_default_bucket",
        default_project_name="my_default_project",
        # Explicit overrides for the Validation Results Store only.
        validations_store_bucket_name="my_validations_bucket",
        validations_store_project_name="my_validations_project",
        validations_store_prefix="my_validations_store_prefix",
        expectations_store_prefix="my_expectations_store_prefix",
    ),
)
```
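The fallback behaviour described above can be traced by hand. In this sketch, `resolve_gcs_location` and `StoreLocation` are hypothetical and not part of GCSStoreBackendDefaults. The fallback prefixes `"expectations"` and `"validations"` are assumed placeholders, and this example overrides both anyway. `"data_docs"` is the default Data Docs prefix named above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoreLocation:
    bucket: str
    project: str
    prefix: str


def resolve_gcs_location(
    default_bucket: str,
    default_project: str,
    default_prefix: str,
    bucket: Optional[str] = None,
    project: Optional[str] = None,
    prefix: Optional[str] = None,
) -> StoreLocation:
    """Use each explicit value if given (non-empty), otherwise the default."""
    return StoreLocation(
        bucket=bucket or default_bucket,
        project=project or default_project,
        prefix=prefix or default_prefix,
    )


DEFAULT_BUCKET = "my_default_bucket"
DEFAULT_PROJECT = "my_default_project"

locations: Dict[str, StoreLocation] = {
    # Only the prefix is overridden; bucket and project fall back to the defaults.
    "expectations_store": resolve_gcs_location(
        DEFAULT_BUCKET, DEFAULT_PROJECT, "expectations",
        prefix="my_expectations_store_prefix",
    ),
    # Bucket, project, and prefix are all set explicitly.
    "validations_store": resolve_gcs_location(
        DEFAULT_BUCKET, DEFAULT_PROJECT, "validations",
        bucket="my_validations_bucket",
        project="my_validations_project",
        prefix="my_validations_store_prefix",
    ),
    # Nothing is overridden, so everything comes from the defaults.
    "data_docs": resolve_gcs_location(DEFAULT_BUCKET, DEFAULT_PROJECT, "data_docs"),
}

for store, location in locations.items():
    print(store, location)
```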

The following example sets overrides for many of the parameters available to you when creating a DataContextConfig and a Datasource.

```python
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig

data_context_config = DataContextConfig(
    config_version=2,
    plugins_directory=None,
    config_variables_file_path=None,
    datasources={
        "my_spark_datasource": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "SparkDFExecutionEngine"
            },
            data_connectors={
                "tripdata_monthly_configured": {
                    "class_name": "ConfiguredAssetFilesystemDataConnector",
                    "base_directory": "/path/to/trip_data",
                    "assets": {
                        "yellow": {
                            "pattern": r"yellow_tripdata_(\d{4})-(\d{2})\.csv$",
                            "group_names": ["year", "month"],
                        }
                    },
                }
            },
        )
    },
    stores={
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "my_expectations_store_bucket",
                "prefix": "my_expectations_store_prefix",
            },
        },
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "my_validations_store_bucket",
                "prefix": "my_validations_store_prefix",
            },
        },
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    },
    # These names must match keys in the stores dict above.
    expectations_store_name="expectations_S3_store",
    validations_store_name="validations_S3_store",
    evaluation_parameter_store_name="evaluation_parameter_store",
    data_docs_sites={
        "s3_site": {
            "class_name": "SiteBuilder",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "my_data_docs_bucket",
                "prefix": "my_optional_data_docs_prefix",
            },
            "site_index_builder": {
                "class_name": "DefaultSiteIndexBuilder",
                "show_cta_footer": True,
            },
        }
    },
    validation_operators={
        "action_list_operator": {
            "class_name": "ActionListValidationOperator",
            "action_list": [
                {
                    "name": "store_validation_result",
                    "action": {"class_name": "StoreValidationResultAction"},
                },
                {
                    "name": "store_evaluation_params",
                    "action": {"class_name": "StoreEvaluationParametersAction"},
                },
                {
                    "name": "update_data_docs",
                    "action": {"class_name": "UpdateDataDocsAction"},
                },
            ],
        }
    },
    anonymous_usage_statistics={
        "enabled": True
    },
)
```
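With this many hand-written names, a typo in `expectations_store_name` or `validations_store_name` is easy to make. The hypothetical pre-flight check below is written against a plain dict that mirrors the keyword arguments above, not against the DataContextConfig object itself. It flags any `*_store_name` value that does not match a key in `stores`.

```python
from typing import Any, Dict, List

STORE_NAME_KEYS = (
    "expectations_store_name",
    "validations_store_name",
    "evaluation_parameter_store_name",
)


def missing_store_references(config: Dict[str, Any]) -> List[str]:
    """Return a message for each *_store_name that is not a key in config["stores"]."""
    defined = set(config.get("stores") or {})
    problems = []
    for key in STORE_NAME_KEYS:
        name = config.get(key)
        if name is not None and name not in defined:
            problems.append(f"{key}={name!r} is not defined in stores")
    return problems


# Plain-dict mirror of the relevant keyword arguments from the example above.
config_kwargs = {
    "stores": {
        "expectations_S3_store": {"class_name": "ExpectationsStore"},
        "validations_S3_store": {"class_name": "ValidationsStore"},
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    },
    "expectations_store_name": "expectations_S3_store",
    "validations_store_name": "validations_S3_store",
    "evaluation_parameter_store_name": "evaluation_parameter_store",
}

print(missing_store_references(config_kwargs))
```

One way to use this is to assemble the keyword arguments as a dict, run the check, and only then call `DataContextConfig(**config_kwargs)`.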

2. Pass this DataContextConfig as a project_config to BaseDataContext

```python
from great_expectations.data_context import BaseDataContext

context = BaseDataContext(project_config=data_context_config)
```

3. Use this BaseDataContext instance as your DataContext

If you are using Airflow, you may wish to pass this Data Context to your GreatExpectationsOperator as a parameter. See the following guide for more details:
