Generator Module

class great_expectations.datasource.generator.batch_generator.BatchGenerator(name, type_, datasource=None)

Generators produce identifying information, called “batch_kwargs,” that datasources can use to get individual batches of data. They add flexibility in how to obtain data, supporting techniques such as time-based partitioning, downsampling, or other approaches appropriate for the datasource.

For example, a generator could produce a SQL query that logically represents “rows in the Events table with a timestamp on February 7, 2012,” which a SqlAlchemyDatasource could use to materialize a SqlAlchemyDataset corresponding to that batch of data and ready for validation.

A batch is a sample from a data asset, sliced according to a particular rule. For example, an hourly slice of the Events table or the “most recent user records.”

A Batch is the primary unit of validation in the Great Expectations DataContext. Batches include metadata that identifies how they were constructed: the same “batch_kwargs” assembled by the generator. While not every datasource will enable re-fetching a specific batch of data, GE can store snapshots of batches or store metadata from an external data version control system.
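The “rows in the Events table with a timestamp on February 7, 2012” example above might translate into batch_kwargs along these lines. This is a hedged sketch: the exact keys a given datasource expects are not specified in this reference, and the "partition_id" and "timestamp" fields shown here are illustrative assumptions.

```python
from datetime import datetime

# Illustrative only: the exact batch_kwargs keys depend on the datasource.
# A query-style generator could emit a SQL string plus partition metadata.
batch_kwargs = {
    "query": (
        "SELECT * FROM Events "
        "WHERE timestamp >= '2012-02-07' AND timestamp < '2012-02-08'"
    ),
    "partition_id": "2012-02-07",  # hypothetical key naming the slice
    "timestamp": datetime(2012, 2, 7).isoformat(),
}

print(batch_kwargs["partition_id"])
```

A SqlAlchemyDatasource receiving kwargs of this shape could run the query to materialize the corresponding batch for validation.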

get_available_data_asset_names()
get_config()
reset_iterator(data_asset_name)
get_iterator(data_asset_name)
yield_batch_kwargs(data_asset_name)
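The iterator protocol above (reset, get, yield) can be illustrated with a minimal stand-in class. This is not the library implementation; the "events" asset and its partitions are invented for illustration, and only the shape of the interface mirrors the methods listed above.

```python
class ToyBatchGenerator:
    """Minimal stand-in mimicking the BatchGenerator iterator protocol."""

    def __init__(self, name="default", datasource=None):
        self.name = name
        self._datasource = datasource
        self._iterators = {}
        # Hypothetical partitions for a hypothetical "events" asset.
        self._partitions = {"events": ["2012-02-06", "2012-02-07"]}

    def get_available_data_asset_names(self):
        return set(self._partitions)

    def reset_iterator(self, data_asset_name):
        # Build a fresh iterator of batch_kwargs for the asset.
        self._iterators[data_asset_name] = (
            {"partition_id": p} for p in self._partitions[data_asset_name]
        )

    def get_iterator(self, data_asset_name):
        if data_asset_name not in self._iterators:
            self.reset_iterator(data_asset_name)
        return self._iterators[data_asset_name]

    def yield_batch_kwargs(self, data_asset_name):
        # Return the next batch_kwargs, resetting when exhausted.
        try:
            return next(self.get_iterator(data_asset_name))
        except StopIteration:
            self.reset_iterator(data_asset_name)
            return next(self._iterators[data_asset_name])


gen = ToyBatchGenerator()
print(gen.yield_batch_kwargs("events"))
```

Datasources would consume this interface by repeatedly calling yield_batch_kwargs and handing each result to their batch-materialization logic.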

great_expectations.datasource.generator.in_memory_generator.InMemoryGenerator

class great_expectations.datasource.generator.in_memory_generator.InMemoryGenerator(name='default', datasource=None)

Bases: great_expectations.datasource.generator.batch_generator.BatchGenerator

A basic generator that simply captures an existing object.

get_available_data_asset_names()

great_expectations.datasource.generator.query_generator.QueryGenerator

class great_expectations.datasource.generator.query_generator.QueryGenerator(datasource, name='default')

Bases: great_expectations.datasource.generator.batch_generator.BatchGenerator

Produces query-style batch_kwargs from SQL files stored on disk.

add_query(data_asset_name, query)
get_available_data_asset_names()
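The query-style pattern can be sketched as follows. This is an approximation, not the library code: the on-disk .sql file storage is replaced with an in-memory dict, and the batch_kwargs method name is invented for illustration, though add_query and get_available_data_asset_names mirror the interface listed above.

```python
class ToyQueryStore:
    """Maps data asset names to SQL strings and emits query-style
    batch_kwargs, loosely mirroring QueryGenerator's interface."""

    def __init__(self):
        self._queries = {}

    def add_query(self, data_asset_name, query):
        # Register a named query for a data asset.
        self._queries[data_asset_name] = query

    def get_available_data_asset_names(self):
        return set(self._queries)

    def batch_kwargs(self, data_asset_name):
        # Query-style batch_kwargs carry the SQL to materialize the batch.
        return {"query": self._queries[data_asset_name]}


store = ToyQueryStore()
store.add_query("events", "SELECT * FROM Events WHERE ts >= '2012-02-07'")
print(store.batch_kwargs("events"))
```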

great_expectations.datasource.generator.filesystem_path_generator.SubdirReaderGenerator

class great_expectations.datasource.generator.filesystem_path_generator.SubdirReaderGenerator(name='default', datasource=None, base_directory='/data', reader_options=None)

Bases: great_expectations.datasource.generator.batch_generator.BatchGenerator

The SubdirReaderGenerator inspects a filesystem and produces batch_kwargs with a path and timestamp.

SubdirReaderGenerator recognizes generator_asset using two criteria:
  • for files directly in ‘base_directory’ with recognized extensions (.csv, .tsv, .parquet, .xls, .xlsx, .json), it uses the name of the file without the extension

  • for other files or directories in ‘base_directory’, it uses the file or directory name

SubdirReaderGenerator treats all files inside a single subdirectory of base_directory as batches of the same data asset.

SubdirReaderGenerator can also include configured reader_options, which will be added to the batch_kwargs it generates.

property reader_options
property base_directory
get_available_data_asset_names()
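The two naming criteria above can be sketched as a standalone function. This approximates the documented behavior only; it is not the library code, and infer_asset_name is a name invented here.

```python
import os

# Extensions recognized per the documented criteria.
KNOWN_EXTENSIONS = {".csv", ".tsv", ".parquet", ".xls", ".xlsx", ".json"}

def infer_asset_name(entry):
    """Approximate the documented naming rules for one directory entry."""
    root, ext = os.path.splitext(entry)
    if ext.lower() in KNOWN_EXTENSIONS:
        return root   # file with a recognized extension: strip the extension
    return entry      # any other file or directory: use the full name

print(infer_asset_name("users.csv"))   # a recognized file -> "users"
print(infer_asset_name("events"))      # a subdirectory -> "events"
```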

great_expectations.datasource.generator.filesystem_path_generator.GlobReaderGenerator

class great_expectations.datasource.generator.filesystem_path_generator.GlobReaderGenerator(name='default', datasource=None, base_directory='/data', reader_options=None, asset_globs=None)

Bases: great_expectations.datasource.generator.batch_generator.BatchGenerator

property reader_options
property asset_globs
property base_directory
get_available_data_asset_names()
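The asset_globs property suggests a mapping from asset names to glob patterns used to locate batch files. The reference does not show the configuration format, so the following is a speculative sketch of that idea using stdlib fnmatch; the mapping, file list, and batches_for helper are all invented for illustration.

```python
import fnmatch

# Hypothetical mapping of asset names to glob patterns, in the spirit
# of asset_globs; the real configuration format is not shown here.
asset_globs = {"events": "events/*.csv"}

# A pretend directory listing to match against.
files = ["events/2012-02-06.csv", "events/2012-02-07.csv", "notes/readme.txt"]

def batches_for(asset_name):
    """Return path-style batch_kwargs for files matching the asset's glob."""
    pattern = asset_globs[asset_name]
    return [{"path": f} for f in files if fnmatch.fnmatch(f, pattern)]

print(len(batches_for("events")))
```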

great_expectations.datasource.generator.databricks_generator.DatabricksTableGenerator

class great_expectations.datasource.generator.databricks_generator.DatabricksTableGenerator(name='default', datasource=None, database='default')

Bases: great_expectations.datasource.generator.batch_generator.BatchGenerator

Intended for use in a Databricks notebook.

get_available_data_asset_names()