Version: 0.18.9

Batch

class great_expectations.core.batch.Batch(data: Optional[Union[Type[great_expectations.core.batch.BatchData], Type[pandas.core.frame.DataFrame], Type[pyspark.sql.dataframe.DataFrame]]] = None, batch_request: Optional[Union[great_expectations.core.batch.BatchRequestBase, dict]] = None, batch_definition: Optional[great_expectations.core.batch.BatchDefinition] = None, batch_spec: Optional[great_expectations.core.id_dict.BatchSpec] = None, batch_markers: Optional[great_expectations.core.batch.BatchMarkers] = None, data_context=None, datasource_name=None, batch_parameters=None, batch_kwargs=None)

A Batch is a selection of records from a Data Asset.

A Datasource produces Batch objects to interact directly with data. Creating a Batch does NOT require moving data; the Batch facilitates access to the data and maintains metadata.
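To illustrate the point that constructing a Batch does not copy records, here is a minimal, hypothetical Python sketch. LazyBatchSketch, loader, and records() are illustrative names, not Great Expectations APIs: the object stores a callable that fetches data on demand plus a metadata dict, and nothing is read until records() is called.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class LazyBatchSketch:
    """Hypothetical stand-in for a Batch; not the Great Expectations class."""

    # Hypothetical: a callable that reads records from the underlying source on demand.
    loader: Callable[[], List[Record]]
    # Descriptive metadata kept alongside the data reference (e.g. which asset it came from).
    metadata: Dict[str, Any] = field(default_factory=dict)

    def records(self) -> List[Record]:
        # Records are fetched only here, never at construction time.
        return self.loader()
```

The design choice mirrors the description above: the object is cheap to create because it holds a way to reach the data and some metadata, not the data itself.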

Parameters
  • data – A BatchDataType object which interacts directly with the ExecutionEngine.

  • batch_request – BatchRequest that was used to obtain the data.

  • batch_definition – Complete BatchDefinition that describes the data.

  • batch_spec – Complete BatchSpec that describes the data.

  • batch_markers – Additional metadata that may be useful for understanding the batch.

  • data_context – DataContext connected to the batch. Deprecated since version 0.14.0.

  • datasource_name – Name of the Datasource used to obtain the batch. Deprecated since version 0.14.0.

  • batch_parameters – Keyword arguments describing the batch data. Deprecated since version 0.14.0.

  • batch_kwargs – Keyword arguments used to request a batch from a Datasource. Deprecated since version 0.14.0.

Returns

Batch instance created.
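The four keyword arguments documented as deprecated since 0.14.0 can be caught before they reach the constructor. A minimal caller-side sketch, assuming you collect constructor arguments in a dict first; deprecated_batch_args is a hypothetical helper, not part of Great Expectations, and it treats a value of None as "not passed" because None is each parameter's default in the signature above.

```python
from typing import Any, Dict, List

# Parameter names documented above as "Deprecated since version 0.14.0".
DEPRECATED_BATCH_ARGS = (
    "data_context",
    "datasource_name",
    "batch_parameters",
    "batch_kwargs",
)


def deprecated_batch_args(kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list the deprecated Batch arguments that are actually set.

    A value of None is ignored, since None is the documented default for
    every parameter in the Batch signature.
    """
    return [name for name in DEPRECATED_BATCH_ARGS if kwargs.get(name) is not None]
```

For example, a dict that contains a non-None batch_kwargs entry yields ["batch_kwargs"], flagging that call site for migration; an empty result means only non-deprecated parameters are in use.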

head(n_rows: int = 5, fetch_all: bool = False) → pandas.core.frame.DataFrame

Return the first n rows from the Batch.

This function returns the first n_rows rows, which is useful for quickly checking whether the Batch contains the data you expect.

It always obtains the data from the Datasource and returns it as a local pandas DataFrame.

Parameters
  • n_rows – The number of rows to return.

  • fetch_all – Whether to fetch all rows; overrides n_rows if set to True.

Returns

A pandas DataFrame.
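The interplay between n_rows and fetch_all can be modelled on an in-memory list of rows. This is a hypothetical, pure-Python sketch of the documented contract only (head_rows is an illustrative name); the real head() obtains the data from the Datasource and returns a pandas DataFrame rather than a list.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def head_rows(rows: Sequence[Row], n_rows: int = 5, fetch_all: bool = False) -> List[Row]:
    """Hypothetical model of Batch.head(): the first n_rows rows, or all rows if fetch_all."""
    if fetch_all:
        # fetch_all=True overrides n_rows entirely.
        return list(rows)
    # Slicing does not raise when n_rows exceeds the number of available rows.
    return list(rows[:n_rows])
```

With eight rows, head_rows(rows) returns five of them, matching the default n_rows=5, while head_rows(rows, n_rows=2, fetch_all=True) returns all eight because fetch_all takes precedence.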

to_json_dict() → dict[str, JSONValues]

Returns a JSON-serializable dict representation of this Batch.

Returns

A JSON-serializable dict representation of this Batch.
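One way to confirm that a dict is JSON-serializable in the sense described here is to pass it through the standard json module. A minimal sketch: is_json_serializable is a hypothetical helper, and the sample dicts below are placeholders, not the fields Batch.to_json_dict() actually returns.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Hypothetical helper: True if json.dumps accepts value with default settings."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. set, datetime); ValueError: circular reference.
        return False
    return True
```

With a real Batch you would expect is_json_serializable(batch.to_json_dict()) to be True. A placeholder such as {"rows": [1, 2.5, None]} passes, while {"ids": {1, 2}} fails because sets are not JSON types.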