Profile Module

class great_expectations.profile.base.ProfilerDataType

Bases: enum.Enum

An enumeration.

INT = 'int'
FLOAT = 'float'
STRING = 'string'
BOOLEAN = 'boolean'
DATETIME = 'datetime'
UNKNOWN = 'unknown'
class great_expectations.profile.base.ProfilerCardinality

Bases: enum.Enum

An enumeration.

NONE = 'none'
ONE = 'one'
TWO = 'two'
FEW = 'few'
VERY_FEW = 'very few'
MANY = 'many'
VERY_MANY = 'very many'
UNIQUE = 'unique'
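
Because the class above derives from enum.Enum, a stored label such as 'very few' can be turned back into a member with the standard lookup-by-value call. The sketch below uses a local stand-in that copies the documented members so it runs without the library installed; the helper name cardinality_from_label is hypothetical and not part of great_expectations.

```python
from enum import Enum


# Local stand-in mirroring the documented members. In a real project this
# class would be imported from great_expectations.profile.base instead.
class ProfilerCardinality(Enum):
    NONE = "none"
    ONE = "one"
    TWO = "two"
    FEW = "few"
    VERY_FEW = "very few"
    MANY = "many"
    VERY_MANY = "very many"
    UNIQUE = "unique"


def cardinality_from_label(label: str) -> ProfilerCardinality:
    """Hypothetical helper: return the member whose value equals `label`.

    Raises ValueError (standard Enum behaviour) for an unknown label.
    """
    return ProfilerCardinality(label)
```

The same lookup pattern applies to ProfilerDataType, for example ProfilerDataType('datetime').
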
class great_expectations.profile.base.DataAssetProfiler

Bases: object

classmethod validate(data_asset)
class great_expectations.profile.base.DatasetProfiler

Bases: great_expectations.profile.base.DataAssetProfiler

classmethod validate(dataset)
classmethod add_expectation_meta(expectation)
classmethod add_meta(expectation_suite, batch_kwargs=None)
classmethod profile(data_asset, run_id=None, profiler_configuration=None, run_name=None, run_time=None)
class great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase

Bases: great_expectations.profile.base.DatasetProfiler

BasicDatasetProfilerBase provides the basic logic for inferring the type and cardinality of columns. The dataset profiler classes that extend this class share that logic.

INT_TYPE_NAMES = {'BIGINT', 'BYTEINT', 'DECIMAL', 'INT', 'INTEGER', 'IntegerType', 'LongType', 'SMALLINT', 'TINYINT', 'int'}
FLOAT_TYPE_NAMES = {'DOUBLE_PRECISION', 'DoubleType', 'FLOAT', 'FLOAT4', 'FLOAT8', 'FloatType', 'NUMERIC', 'float'}
STRING_TYPE_NAMES = {'CHAR', 'STRING', 'StringType', 'TEXT', 'VARCHAR', 'str', 'string'}
BOOLEAN_TYPE_NAMES = {'BOOL', 'BOOLEAN', 'BooleanType', 'bool'}
DATETIME_TYPE_NAMES = {'DATE', 'DATETIME', 'DateType', 'TIME', 'TIMESTAMP', 'Timestamp', 'TimestampType', 'datetime64'}
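
One way to read the five sets above is as a lookup from a backend type name to a ProfilerDataType member. The sketch below is not the library's implementation, which may determine a column's type differently. It copies the documented sets into a self-contained script, and the function classify_type_name, the fixed checking order, and the exact case-sensitive match are all assumptions made for illustration.

```python
from enum import Enum


# Local stand-in mirroring great_expectations.profile.base.ProfilerDataType.
class ProfilerDataType(Enum):
    INT = "int"
    FLOAT = "float"
    STRING = "string"
    BOOLEAN = "boolean"
    DATETIME = "datetime"
    UNKNOWN = "unknown"


# Copies of the documented class attributes.
INT_TYPE_NAMES = {"BIGINT", "BYTEINT", "DECIMAL", "INT", "INTEGER",
                  "IntegerType", "LongType", "SMALLINT", "TINYINT", "int"}
FLOAT_TYPE_NAMES = {"DOUBLE_PRECISION", "DoubleType", "FLOAT", "FLOAT4",
                    "FLOAT8", "FloatType", "NUMERIC", "float"}
STRING_TYPE_NAMES = {"CHAR", "STRING", "StringType", "TEXT", "VARCHAR",
                     "str", "string"}
BOOLEAN_TYPE_NAMES = {"BOOL", "BOOLEAN", "BooleanType", "bool"}
DATETIME_TYPE_NAMES = {"DATE", "DATETIME", "DateType", "TIME", "TIMESTAMP",
                       "Timestamp", "TimestampType", "datetime64"}

# Assumed checking order; the sets do not overlap, so order does not
# change the result here.
_TYPE_NAME_SETS = [
    (ProfilerDataType.INT, INT_TYPE_NAMES),
    (ProfilerDataType.FLOAT, FLOAT_TYPE_NAMES),
    (ProfilerDataType.STRING, STRING_TYPE_NAMES),
    (ProfilerDataType.BOOLEAN, BOOLEAN_TYPE_NAMES),
    (ProfilerDataType.DATETIME, DATETIME_TYPE_NAMES),
]


def classify_type_name(type_name: str) -> ProfilerDataType:
    """Hypothetical helper: map a type name to a ProfilerDataType member."""
    for data_type, names in _TYPE_NAME_SETS:
        if type_name in names:  # exact, case-sensitive match (assumption)
            return data_type
    return ProfilerDataType.UNKNOWN
```

Note that, per the documented sets, 'DECIMAL' falls under the integer names while 'NUMERIC' falls under the float names.
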
class great_expectations.profile.basic_dataset_profiler.BasicDatasetProfiler

Bases: great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase

BasicDatasetProfiler is inspired by the beloved pandas_profiling project.

The profiler examines a batch of data and creates a report that answers the basic questions most data practitioners would ask about a dataset during exploratory data analysis. For each column, the profiler reports how unique its values are and what percentage of them are empty. Based on the column’s type, it then describes the column by computing statistics such as min, max, mean and median for numeric columns and, when appropriate, the distribution of values.
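
To make the kinds of statistics described above concrete, here is a minimal standard-library sketch that computes them for a single column held as a Python list. It does not use or reproduce the profiler, which works through expectations on a batch of data. The function summarize_column and its output keys are hypothetical names chosen for this illustration.

```python
import statistics
from typing import Any, Dict, List, Optional


def summarize_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Hypothetical helper: summarize one column, treating None as empty."""
    non_null = [v for v in values if v is not None]

    summary: Dict[str, Any] = {
        # Share of empty values, as a percentage of all rows.
        "percent_missing": (
            100.0 * (len(values) - len(non_null)) / len(values) if values else 0.0
        ),
        # Distinct non-empty values divided by all non-empty values.
        "unique_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }

    # Numeric statistics only make sense for numeric columns; bool is
    # excluded explicitly because it is a subclass of int in Python.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if is_numeric:
        summary.update(
            min=min(non_null),
            max=max(non_null),
            mean=statistics.mean(non_null),
            median=statistics.median(non_null),
        )
    return summary
```

For a string column the result contains only percent_missing and unique_ratio, mirroring the idea that the description depends on the column's type.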

class great_expectations.profile.basic_suite_builder_profiler.BasicSuiteBuilderProfiler

Bases: great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase

This profiler helps build coarse expectations for columns you care about.

The goal of this profiler is to expedite the process of authoring an expectation suite by building possibly relevant expectations for columns that you care about. You can then easily edit the suite and adjust or delete these expectations to hone your new suite.

Ranges of acceptable values in the expectations created by this profiler (for example, the min/max of the value in expect_column_values_to_be_between) are created only to demonstrate the functionality and should not be taken as the actual ranges. You should definitely edit this coarse suite.

Configuration is optional, and if not provided, this profiler will create expectations for all columns.

Configuration is a dictionary with a columns key containing a list of the column names you want coarse expectations created for. This dictionary can also contain an excluded_expectations key with a list of expectation names you do not want created or an included_expectations key with a list of expectation names you want created (if applicable).
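
The effect of the three configuration keys described above can be sketched as two small filters, one over column names and one over expectation names. This is an illustration under assumptions, not the profiler's code: the helpers select_columns and select_expectations are hypothetical. This section does not say how the library handles a configuration that contains both included_expectations and excluded_expectations, so the sketch simply applies both filters in turn.

```python
from typing import Dict, List, Optional

Configuration = Optional[Dict[str, List[str]]]


def select_columns(all_columns: List[str], configuration: Configuration = None) -> List[str]:
    """Hypothetical helper: columns that would get coarse expectations."""
    configuration = configuration or {}
    wanted = configuration.get("columns")
    if wanted is None:
        # No configuration (or no "columns" key): use every column.
        return list(all_columns)
    return [c for c in all_columns if c in wanted]


def select_expectations(candidates: List[str], configuration: Configuration = None) -> List[str]:
    """Hypothetical helper: expectation names left after include/exclude."""
    configuration = configuration or {}
    included = configuration.get("included_expectations")
    excluded = configuration.get("excluded_expectations") or []

    selected = list(candidates)
    if included is not None:
        # Keep only the expectation types that were explicitly requested.
        selected = [name for name in selected if name in included]
    # Drop anything explicitly excluded (assumption: applied after includes).
    return [name for name in selected if name not in excluded]
```

With no configuration both helpers return their input unchanged, which matches the statement that the profiler covers all columns when no configuration is provided.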

For example, if you had a wide patients table and you want expectations on three columns, you’d do this:

    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset, profiler_configuration={"columns": ["id", "username", "address"]}
    )

For example, if you had a wide patients table and you want expectations on all columns, excluding three statistical expectations, you’d do this:

    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset,
        profiler_configuration={
            "excluded_expectations": [
                "expect_column_mean_to_be_between",
                "expect_column_median_to_be_between",
                "expect_column_quantile_values_to_be_between",
            ],
        },
    )

For example, if you had a wide patients table and you want only two types of expectations on all applicable columns, you’d do this:

    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset,
        profiler_configuration={
            "included_expectations": [
                "expect_column_values_to_not_be_null",
                "expect_column_values_to_be_in_set",
            ],
        },
    )

It can also be used to generate an expectation suite that contains one instance of every interesting expectation type.

When used in this “demo” mode, the suite is intended to demonstrate the expressive power of expectations and to provide a service similar to the expectations glossary documentation page, but on a user’s own data.

    suite, validation_result = BasicSuiteBuilderProfiler().profile(dataset, profiler_configuration="demo")

class great_expectations.profile.columns_exist.ColumnsExistProfiler

Bases: great_expectations.profile.base.DatasetProfiler