great_expectations.cli.datasource

Module Contents

Classes

DatasourceTypes()

Generic enumeration.

SupportedDatabases()

Generic enumeration.

Functions

datasource()

Datasource operations

datasource_new(directory)

Add a new datasource to the data context.

delete_datasource(directory, datasource)

Delete the datasource specified as an argument

datasource_list(directory)

List known datasources.

_build_datasource_intro_string(datasource_count)

datasource_profile(datasource, batch_kwargs_generator_name, data_assets, profile_all_data_assets, directory, view, additional_batch_kwargs, assume_yes)

Profile a datasource (Experimental)

add_datasource(context, choose_one_data_asset=False)

Interactive flow for adding a datasource to an existing context.

_add_pandas_datasource(context, passthrough_generator_only=True, prompt_for_datasource_name=True)

_add_sqlalchemy_datasource(context, prompt_for_datasource_name=True)

_should_hide_input()

This is a workaround to help identify Windows and adjust the prompts accordingly

_collect_postgres_credentials(default_credentials=None)

_collect_snowflake_credentials(default_credentials=None)

_collect_snowflake_credentials_user_password()

_collect_snowflake_credentials_sso()

_collect_snowflake_credentials_key_pair()

_collect_bigquery_credentials(default_credentials=None)

_collect_mysql_credentials(default_credentials=None)

_collect_redshift_credentials(default_credentials=None)

_add_spark_datasource(context, passthrough_generator_only=True, prompt_for_datasource_name=True)

select_batch_kwargs_generator(context, datasource_name, available_data_assets_dict=None)

get_batch_kwargs(context, datasource_name=None, batch_kwargs_generator_name=None, data_asset_name=None, additional_batch_kwargs=None)

This method manages the user interaction needed to obtain batch_kwargs for a batch of a data asset.

_get_batch_kwargs_from_generator_or_from_file_path(context, datasource_name, batch_kwargs_generator_name=None, additional_batch_kwargs=None)

_get_batch_kwargs_for_sqlalchemy_datasource(context, datasource_name, additional_batch_kwargs=None)

_verify_sqlalchemy_dependent_modules()

_verify_mysql_dependent_modules()

_verify_postgresql_dependent_modules()

_verify_redshift_dependent_modules()

_verify_snowflake_dependent_modules()

_verify_bigquery_dependent_modules()

_verify_pyspark_dependent_modules()

skip_prompt_message(skip_flag, prompt_message_text)

profile_datasource(context, datasource_name, batch_kwargs_generator_name=None, data_assets=None, profile_all_data_assets=False, max_data_assets=20, additional_batch_kwargs=None, open_docs=False, skip_prompt_flag=False)

Profile a named datasource using the specified context.

great_expectations.cli.datasource.logger
class great_expectations.cli.datasource.DatasourceTypes

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

PANDAS = pandas
SQL = sql
SPARK = spark
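
As a brief illustration of how an enumeration like this behaves, the sketch below redefines a local stand-in with the three members listed above. It does not import the real class; the lookup-by-value usage is an assumption about how such an enum might be consumed, not a description of the CLI's internals.

```python
from enum import Enum


class DatasourceTypes(Enum):
    """Local stand-in that mirrors the members listed above."""

    PANDAS = "pandas"
    SQL = "sql"
    SPARK = "spark"


# Look up a member by its value, e.g. from a string chosen at a prompt.
chosen = DatasourceTypes("sql")

# Iterating an Enum yields its members in definition order.
member_names = [member.name for member in DatasourceTypes]
```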
great_expectations.cli.datasource.MANUAL_GENERATOR_CLASSES
class great_expectations.cli.datasource.SupportedDatabases

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

MYSQL = MySQL
POSTGRES = Postgres
REDSHIFT = Redshift
SNOWFLAKE = Snowflake
BIGQUERY = BigQuery
OTHER = other - Do you have a working SQLAlchemy connection string?
great_expectations.cli.datasource.datasource()

Datasource operations

great_expectations.cli.datasource.datasource_new(directory)

Add a new datasource to the data context.

great_expectations.cli.datasource.delete_datasource(directory, datasource)

Delete the datasource specified as an argument

great_expectations.cli.datasource.datasource_list(directory)

List known datasources.

great_expectations.cli.datasource._build_datasource_intro_string(datasource_count)
great_expectations.cli.datasource.datasource_profile(datasource, batch_kwargs_generator_name, data_assets, profile_all_data_assets, directory, view, additional_batch_kwargs, assume_yes)

Profile a datasource (Experimental)

If the optional data_assets and profile_all_data_assets arguments are not specified, the profiler will check if the number of data assets in the datasource exceeds the internally defined limit. If it does, it will prompt the user to either specify the list of data assets to profile or to profile all. If the limit is not exceeded, the profiler will profile all data assets in the datasource.
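
The decision described above can be sketched as a small pure function. This is a minimal sketch, not the library's implementation: `choose_assets_to_profile` is a hypothetical helper, and the default limit of 20 is borrowed from the `max_data_assets` default in the `profile_datasource` signature on the assumption that it is the limit referred to here.

```python
from typing import List, Optional


def choose_assets_to_profile(
    available: List[str],
    data_assets: Optional[List[str]] = None,
    profile_all_data_assets: bool = False,
    max_data_assets: int = 20,  # assumed limit, see the note above
) -> Optional[List[str]]:
    """Return the asset names to profile, or None if the user must be asked."""
    if data_assets:
        # An explicit list was given, so use it as-is.
        return list(data_assets)
    if profile_all_data_assets:
        # The caller asked for everything, regardless of the limit.
        return list(available)
    if len(available) > max_data_assets:
        # Too many assets: a real CLI would prompt the user here.
        return None
    # Within the limit: profile everything.
    return list(available)
```

Returning `None` stands in for the interactive prompt, which keeps the sketch free of I/O.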

great_expectations.cli.datasource.add_datasource(context, choose_one_data_asset=False)

Interactive flow for adding a datasource to an existing context.

Parameters
  • context

  • choose_one_data_asset – optional. If True, signals that the user intends to choose just one data asset (e.g., a file), so there is no need to configure a batch kwargs generator that comprehensively scans the datasource for data assets.

Returns

a tuple: datasource_name, data_source_type

great_expectations.cli.datasource._add_pandas_datasource(context, passthrough_generator_only=True, prompt_for_datasource_name=True)
great_expectations.cli.datasource._add_sqlalchemy_datasource(context, prompt_for_datasource_name=True)
great_expectations.cli.datasource._should_hide_input()

Workaround that detects Windows and adjusts the prompts accordingly, since hidden prompts may freeze in certain Windows terminals.
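
A platform check of this kind might look like the sketch below. It is an assumption about the approach, not the module's actual code: `should_hide_input` is a hypothetical public stand-in, and the optional `platform_name` parameter exists only so the behaviour can be exercised without running on Windows.

```python
import platform


def should_hide_input(platform_name: str = "") -> bool:
    """Return False on Windows, where hidden prompts may freeze; True elsewhere."""
    # Fall back to the real platform string when no override is supplied.
    name = platform_name or platform.platform()
    return "windows" not in name.lower()
```

The result could then be passed to a prompt helper that accepts a hide-input flag, such as the `hide_input` argument of `click.prompt`.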

great_expectations.cli.datasource._collect_postgres_credentials(default_credentials=None)
great_expectations.cli.datasource._collect_snowflake_credentials(default_credentials=None)
great_expectations.cli.datasource._collect_snowflake_credentials_user_password()
great_expectations.cli.datasource._collect_snowflake_credentials_sso()
great_expectations.cli.datasource._collect_snowflake_credentials_key_pair()
great_expectations.cli.datasource._collect_bigquery_credentials(default_credentials=None)
great_expectations.cli.datasource._collect_mysql_credentials(default_credentials=None)
great_expectations.cli.datasource._collect_redshift_credentials(default_credentials=None)
great_expectations.cli.datasource._add_spark_datasource(context, passthrough_generator_only=True, prompt_for_datasource_name=True)
great_expectations.cli.datasource.select_batch_kwargs_generator(context, datasource_name, available_data_assets_dict=None)
great_expectations.cli.datasource.get_batch_kwargs(context, datasource_name=None, batch_kwargs_generator_name=None, data_asset_name=None, additional_batch_kwargs=None)

This method manages the user interaction needed to obtain batch_kwargs for a batch of a data asset.

To get batch_kwargs, this method needs datasource_name, batch_kwargs_generator_name and data_asset_name, which it combines into a fully qualified data asset identifier (datasource_name/batch_kwargs_generator_name/data_asset_name). All three arguments are optional. If they are present, the method uses their values; otherwise, it prompts the user to enter them interactively. Because any of these three components may be passed in empty and filled in during the interaction, the method returns their values in case they changed.

If the datasource has batch_kwargs_generators that can list available data asset names, the method lets the user choose a name from that list (note: if there are multiple batch_kwargs_generators, the user has to choose one first). If a name known to the chosen batch_kwargs_generator is selected, that batch_kwargs_generator will be able to yield batch_kwargs. As an alternative to selecting a data asset name from the batch_kwargs_generator’s list, the user can type in a name for their data asset. In this case a passthrough batch_kwargs_generator will be used to construct a fully qualified data asset identifier (note: if the datasource has no passthrough batch_kwargs_generator configured, the method will exit with a failure). Since no batch_kwargs_generator can yield batch_kwargs for this data asset name, the method prompts the user to specify batch_kwargs by choosing a file (if the datasource is pandas or spark) or by writing a SQL query (if the datasource points to a database).

Parameters
  • context

  • datasource_name

  • batch_kwargs_generator_name

  • data_asset_name

  • additional_batch_kwargs

Returns

a tuple: (datasource_name, batch_kwargs_generator_name, data_asset_name, batch_kwargs). The first three components were passed into the method as optional arguments, but their values might have changed during this method’s execution. If the returned batch_kwargs is None, it means that the batch_kwargs_generator will know to yield batch_kwargs when called.
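
The fully qualified identifier format described above (datasource_name/batch_kwargs_generator_name/data_asset_name) can be illustrated with a small helper. This is a sketch only: `qualified_data_asset_identifier` is a hypothetical function, the example names are made up, and returning `None` stands in for the interactive prompting that the real method performs when a component is missing.

```python
from typing import Optional


def qualified_data_asset_identifier(
    datasource_name: Optional[str],
    batch_kwargs_generator_name: Optional[str],
    data_asset_name: Optional[str],
) -> Optional[str]:
    """Join the three components with '/', or return None if any is missing."""
    parts = [datasource_name, batch_kwargs_generator_name, data_asset_name]
    if not all(parts):
        # A missing or empty component: the real flow would prompt the user.
        return None
    return "/".join(parts)


# Hypothetical names, used purely for illustration.
identifier = qualified_data_asset_identifier("my_db", "default", "orders")
```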

great_expectations.cli.datasource._get_batch_kwargs_from_generator_or_from_file_path(context, datasource_name, batch_kwargs_generator_name=None, additional_batch_kwargs=None)
great_expectations.cli.datasource._get_batch_kwargs_for_sqlalchemy_datasource(context, datasource_name, additional_batch_kwargs=None)
great_expectations.cli.datasource._verify_sqlalchemy_dependent_modules() → bool
great_expectations.cli.datasource._verify_mysql_dependent_modules() → bool
great_expectations.cli.datasource._verify_postgresql_dependent_modules() → bool
great_expectations.cli.datasource._verify_redshift_dependent_modules() → bool
great_expectations.cli.datasource._verify_snowflake_dependent_modules() → bool
great_expectations.cli.datasource._verify_bigquery_dependent_modules() → bool
great_expectations.cli.datasource._verify_pyspark_dependent_modules() → bool
great_expectations.cli.datasource.skip_prompt_message(skip_flag, prompt_message_text) → bool
great_expectations.cli.datasource.profile_datasource(context, datasource_name, batch_kwargs_generator_name=None, data_assets=None, profile_all_data_assets=False, max_data_assets=20, additional_batch_kwargs=None, open_docs=False, skip_prompt_flag=False)

Profile a named datasource using the specified context.

great_expectations.cli.datasource.msg_prompt_choose_datasource =

Configure a datasource:
  1. Pandas DataFrame
  2. Relational database (SQL)
  3. Spark DataFrame
  4. Skip datasource configuration

great_expectations.cli.datasource.msg_prompt_choose_database
great_expectations.cli.datasource.msg_prompt_filesys_enter_base_path =

Enter the path (relative or absolute) of the root directory where the data files are stored.

great_expectations.cli.datasource.msg_prompt_datasource_name =

Give your new Datasource a short name.

great_expectations.cli.datasource.msg_db_config =

Next, we will configure database credentials and store them in the {0:s} section of this config file: great_expectations/uncommitted/config_variables.yml:
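
The `{0:s}` placeholder in this message is a standard Python format field that takes the first positional argument and formats it as a string. The sketch below copies the message text from above into a local variable and fills it in; the section name `my_postgres_db` is a made-up value, and how the CLI itself calls `format` is not shown in this reference.

```python
# Local copy of the message text shown above.
msg_db_config = (
    "Next, we will configure database credentials and store them in the "
    "{0:s} section of this config file: "
    "great_expectations/uncommitted/config_variables.yml:"
)

# Fill the placeholder with a hypothetical section name.
rendered = msg_db_config.format("my_postgres_db")
```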

great_expectations.cli.datasource.msg_unknown_data_source =
Do we not have the type of data source you want?
  • Please create a GitHub issue here so we can discuss it!

  • <blue>https://github.com/great-expectations/great_expectations/issues/new</blue>