Version: 0.18.9

Batch

A Batch is a selection of records from a Data Asset.

A Batch provides an interface for describing specific data from any Data Source and supports the creation of Metrics and Validations.

Batches are designed to be "MECE": mutually exclusive and collectively exhaustive partitions of Data Assets. In practice, however, the same underlying data can appear in multiple Batches. For example, if an analyst runs an analysis against an entire table every day and only a fraction of the records are new each day, consecutive daily Batches share most of their records.
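The following minimal sketch, in plain Python rather than Great Expectations code, checks whether a set of Batches (modeled here as sets of record IDs) forms a MECE partition of a Data Asset. The `is_mece` helper and the sample data are hypothetical; the two slicing schemes mirror the daily-analysis example above.

```python
from collections import Counter


def is_mece(asset_ids, batches):
    """Return True if the batches are mutually exclusive and collectively exhaustive."""
    counts = Counter(rid for batch in batches for rid in batch)
    # Mutually exclusive: no record ID appears in more than one batch.
    exclusive = all(n == 1 for n in counts.values())
    # Collectively exhaustive: every record in the asset is in some batch,
    # and no batch contains records from outside the asset.
    exhaustive = set(counts) == set(asset_ids)
    return exclusive and exhaustive


# Hypothetical table: record IDs 1-6, loaded over three days (two new records per day).
asset_ids = [1, 2, 3, 4, 5, 6]

# Slicing by load date: each batch holds only the records added that day.
daily_increments = [{1, 2}, {3, 4}, {5, 6}]

# Analyzing the whole table each day: each batch is the cumulative table so far.
daily_snapshots = [{1, 2}, {1, 2, 3, 4}, {1, 2, 3, 4, 5, 6}]

print(is_mece(asset_ids, daily_increments))  # True
print(is_mece(asset_ids, daily_snapshots))   # False: records 1-4 appear in several batches
```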

Consequently, what "makes a Batch a Batch" is best understood as the act of attending to it. Once you have defined how a Data Source's data should be sliced (even if you define a single slice containing all of the data in the Data Asset), you have determined what makes those particular Batches "a Batch." The Batch is the fundamental unit that Great Expectations validates and about which it collects Metrics.
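To illustrate that the slicing definition, not the data itself, determines the Batches, this hypothetical sketch groups the same records three ways, including a single slice that covers the whole Data Asset. `slice_records` and the record fields are illustrative assumptions, not part of the Great Expectations API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records from one Data Asset.
records = [
    {"id": 1, "day": date(2024, 1, 30), "region": "east"},
    {"id": 2, "day": date(2024, 1, 31), "region": "west"},
    {"id": 3, "day": date(2024, 2, 1), "region": "east"},
]


def slice_records(rows, key_fn):
    """Group rows into batches keyed by key_fn(row); the key identifies each batch."""
    batches = defaultdict(list)
    for row in rows:
        batches[key_fn(row)].append(row["id"])
    return dict(batches)


by_month = slice_records(records, lambda r: (r["day"].year, r["day"].month))
by_region = slice_records(records, lambda r: r["region"])
whole_asset = slice_records(records, lambda r: "all")  # one slice with all the data

print(by_month)     # {(2024, 1): [1, 2], (2024, 2): [3]}
print(by_region)    # {'east': [1, 3], 'west': [2]}
print(whole_asset)  # {'all': [1, 2, 3]}
```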

Relationship to other objects

A Batch is generated by providing a Batch Request to a Data Asset. The Batch holds a reference for interacting with the data through the Data Asset, along with metadata that precisely identifies the specific data it includes.

Metrics are always associated with a Batch of data. The identifier for the Batch is the primary way that Great Expectations identifies what data to use when computing a Metric and how to store that Metric.
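As a rough illustration of why the Batch identifier matters, here is a hypothetical in-memory metric store keyed by `(batch identifier, metric name)`. It is not how Great Expectations stores Metrics internally; the store, `compute_metric`, the batch identifiers, and the `fare.mean` metric name are assumptions made for the example.

```python
from statistics import mean

# Hypothetical in-memory metric store keyed by (batch identifier, metric name).
metric_store = {}


def compute_metric(batch_id, rows, metric_name, fn):
    """Compute a metric for a batch once, then reuse the stored value."""
    key = (batch_id, metric_name)
    if key not in metric_store:
        metric_store[key] = fn(rows)
    return metric_store[key]


batches = {
    "taxi-2024-01": [12.5, 8.0, 20.0],
    "taxi-2024-02": [10.0, 14.0],
}

# The same metric name yields a separate stored value for each batch.
for batch_id, fares in batches.items():
    compute_metric(batch_id, fares, "fare.mean", mean)

print(metric_store[("taxi-2024-01", "fare.mean")])  # 13.5
print(metric_store[("taxi-2024-02", "fare.mean")])  # 12.0
```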

Batches are also used by Validators when they run an Expectation Suite against data.

Use Cases

When creating Expectations interactively, a Validator needs access to a specific Batch of data against which to check Expectations. The how-to guide on interactively creating Expectations covers using a Batch in this use case.

During Validation, a Checkpoint checks a Batch of data against Expectations from an Expectation Suite. You must specify a Batch Request for the Checkpoint to run.
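The following toy sketch shows the shape of that workflow: a Checkpoint-like function that refuses to run without a Batch Request, resolves the request to a Batch, and applies each Expectation in a suite. Everything here (`run_checkpoint`, the Expectation functions, the request format, and the sample data) is hypothetical and far simpler than the real Checkpoint API.

```python
def expect_no_nulls(rows):
    return all(r is not None for r in rows)


def expect_all_positive(rows):
    # Check for None first so the comparison never sees a missing value.
    return all(r is not None and r > 0 for r in rows)


# Hypothetical Data Asset: batch identifiers mapped to their rows.
DATA = {
    "2024-01": [3, 5, 8],
    "2024-02": [4, None, 7],
}


def run_checkpoint(batch_request, suite):
    """Toy checkpoint: resolve the Batch Request to a Batch, then apply every Expectation."""
    if batch_request is None:
        raise ValueError("A Batch Request is required to run the checkpoint")
    rows = DATA[batch_request["month"]]
    results = {exp.__name__: exp(rows) for exp in suite}
    return {"success": all(results.values()), "results": results}


suite = [expect_no_nulls, expect_all_positive]
print(run_checkpoint({"month": "2024-01"}, suite)["success"])  # True
print(run_checkpoint({"month": "2024-02"}, suite)["success"])  # False
```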

Consistency

A Batch is always part of a Data Asset. A Data Asset can be configured to slice its data into Batches in many ways. For example, the slicing can be based on an arbitrary field in the data, including datetime fields.

A Batch is always built using a Batch Request. See Batch Request or Connect to a Data Source.

Once a Data Asset identifies the specific data that will be included in a Batch based on the Batch Request, it creates a reference to the data and adds metadata, including the parameters used in the Batch Request.
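The sketch below models that process with a hypothetical Data Asset that slices rows by the year and month of a datetime column and attaches the request parameters to the resulting Batch as metadata. `ToyAsset`, `ToyBatch`, `get_batch`, and the `pickup_date` column are illustrative names, not Great Expectations classes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ToyBatch:
    rows: list
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyAsset:
    """Hypothetical Data Asset sliced by the year and month of a datetime column."""

    name: str
    rows: list
    date_column: str = "pickup_date"

    def get_batch(self, options):
        year, month = options["year"], options["month"]
        selected = [
            r for r in self.rows
            if r[self.date_column].year == year and r[self.date_column].month == month
        ]
        # The metadata records which asset and which request parameters produced the batch.
        return ToyBatch(rows=selected, metadata={"asset": self.name, **options})


asset = ToyAsset(
    name="taxi",
    rows=[
        {"pickup_date": date(2024, 1, 3), "fare": 12.5},
        {"pickup_date": date(2024, 1, 17), "fare": 8.0},
        {"pickup_date": date(2024, 2, 2), "fare": 20.0},
    ],
)

batch = asset.get_batch({"year": 2024, "month": 1})
print(len(batch.rows))  # 2
print(batch.metadata)   # {'asset': 'taxi', 'year': 2024, 'month': 1}
```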

Access

You can use the Data Asset's get_batch_list_from_batch_request method to access Batches directly, but you typically don't need to. Instead, you pass a Batch Request to a Validator or a Checkpoint.

Create

The BatchRequest object is the primary API used to construct Batches. To construct a Batch Request that corresponds to a Batch, call the Data Asset's build_batch_request method.
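A hypothetical stand-in for this flow is sketched below. The toy class borrows the method names `build_batch_request` and `get_batch_list_from_batch_request` from the text, but its signatures and behavior are simplified assumptions, not the real Great Expectations implementation. In particular, it assumes (as the "list" in the method name suggests) that a request that leaves some options unspecified can match several Batches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBatchRequest:
    asset_name: str
    options: tuple  # sorted (key, value) pairs so the request is hashable


class ToyAsset:
    """Hypothetical asset whose batches are keyed by (year, month)."""

    def __init__(self, name, batches):
        self.name = name
        self.batches = batches  # {(year, month): rows}
        self.partition_keys = ("year", "month")

    def build_batch_request(self, options=None):
        options = options or {}
        unknown = set(options) - set(self.partition_keys)
        if unknown:
            raise ValueError(f"Unknown batch request options: {sorted(unknown)}")
        return ToyBatchRequest(self.name, tuple(sorted(options.items())))

    def get_batch_list_from_batch_request(self, request):
        wanted = dict(request.options)
        matches = []
        for (year, month), rows in sorted(self.batches.items()):
            key = {"year": year, "month": month}
            # A batch matches if it agrees with every option the request specifies.
            if all(key[k] == v for k, v in wanted.items()):
                matches.append({"year": year, "month": month, "rows": rows})
        return matches


asset = ToyAsset("taxi", {(2023, 12): [1], (2024, 1): [2, 3], (2024, 2): [4]})

one_month = asset.build_batch_request({"year": 2024, "month": 1})
whole_year = asset.build_batch_request({"year": 2024})
everything = asset.build_batch_request()

print(len(asset.get_batch_list_from_batch_request(one_month)))   # 1
print(len(asset.get_batch_list_from_batch_request(whole_year)))  # 2
print(len(asset.get_batch_list_from_batch_request(everything)))  # 3
```

Validating option names when the request is built, rather than when it is resolved, surfaces a typo such as `{"yeer": 2024}` before any data is touched.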

note

Instantiating a Batch does not necessarily “fetch” the data by immediately running a query or pulling data into memory. Instead, think of a Batch as a wrapper that includes the information that you will need to fetch the right data when it’s time to Validate.
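A minimal sketch of that idea, assuming a Batch that stores a loader callable instead of rows. `LazyBatch` and `load_january` are hypothetical and only illustrate deferred fetching, not Great Expectations internals.

```python
class LazyBatch:
    """Hypothetical Batch wrapper: stores how to fetch the data, not the data itself."""

    def __init__(self, description, loader):
        self.description = description
        self._loader = loader
        self._rows = None
        self._loaded = False

    @property
    def rows(self):
        # The loader runs only on first access, e.g. when validation starts.
        if not self._loaded:
            self._rows = self._loader()
            self._loaded = True
        return self._rows


calls = []


def load_january():
    calls.append("query")  # stands in for running a SQL query or reading a file
    return [12.5, 8.0]


batch = LazyBatch({"year": 2024, "month": 1}, load_january)
print(len(calls))  # 0: creating the batch fetched nothing
print(batch.rows)  # [12.5, 8.0]
print(batch.rows)  # [12.5, 8.0] (cached; the loader is not called again)
print(len(calls))  # 1
```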