Version: 0.18.9

UserConfigurableProfiler

class great_expectations.profile.user_configurable_profiler.UserConfigurableProfiler(profile_dataset: Union[great_expectations.core.batch.Batch, great_expectations.dataset.dataset.Dataset, great_expectations.validator.validator.Validator], excluded_expectations: Optional[List[str]] = None, ignored_columns: Optional[List[str]] = None, not_null_only: bool = False, primary_or_compound_key: Optional[List[str]] = None, semantic_types_dict: Optional[Dict[str, List[str]]] = None, table_expectations_only: bool = False, value_set_threshold: str = 'MANY')

Build an Expectation Suite from a dataset.

The Expectations built are strict - they can be used to determine whether two tables are the same.

The Profiler can be instantiated with or without configuration arguments. The configuration is fixed at instantiation: if any of these arguments need to change, create a new Profiler.

Build a suite without a config as follows:

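from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler

# dataset is an existing Batch, Dataset, or Validator object
profiler = UserConfigurableProfiler(dataset)
suite = profiler.build_suite()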

Use a Profiler to build a suite with a semantic types dict, as follows:

semantic_types_dict = {
    "numeric": ["c_acctbal"],
    "string": ["c_address", "c_custkey"],
    "value_set": ["c_nationkey", "c_mktsegment", "c_custkey", "c_name", "c_address", "c_phone"],
}
profiler = UserConfigurableProfiler(dataset, semantic_types_dict=semantic_types_dict)
suite = profiler.build_suite()

Parameters
  • profile_dataset – A Great Expectations Batch, Dataset, or Validator object.

  • excluded_expectations – A list of Expectations to omit from the suite.

  • ignored_columns – A list of columns for which you would like to NOT create Expectations.

  • not_null_only – By default, each column is evaluated for nullity. If the column values contain fewer than 50% null values, then the Profiler will add expect_column_values_to_not_be_null; if greater than 50% it will add expect_column_values_to_be_null. If not_null_only is set to True, the Profiler will add a not_null Expectation irrespective of the percent nullity (and therefore will not add expect_column_values_to_be_null).

  • primary_or_compound_key – A list containing one or more columns which are a dataset's primary or compound key. This will create an expect_column_values_to_be_unique or expect_compound_columns_to_be_unique Expectation. This will occur even if one or more of the primary_or_compound_key columns are specified in ignored_columns.

  • semantic_types_dict – A dict where the keys are available semantic types (see profiler.base.ProfilerSemanticTypes) and the values are lists of columns for which you would like to create semantic-type-specific Expectations, e.g.:

                            {"value_set": ["state", "country"], "numeric": ["age", "amount_due"]}

  • table_expectations_only – If True, this will only create the two table-level Expectations available to this Profiler (expect_table_columns_to_match_ordered_list and expect_table_row_count_to_be_between). If primary_or_compound_key is specified, it will create a uniqueness Expectation for that column as well.

  • value_set_threshold – Must be one of: "none", "one", "two", "very_few", "few", "many", "very_many", "unique". When the Profiler runs without a semantic_types_dict, each column is profiled for cardinality. This threshold determines the greatest cardinality for which to add expect_column_values_to_be_in_set. For example, if value_set_threshold is set to "unique", it will add a value_set Expectation for every included column. If set to "few", it will add a value_set Expectation for columns whose cardinality is one of "one", "two", "very_few", or "few". For the purposes of comparing whether two tables are identical, it might make the most sense to set this to "unique".
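
The threshold rule for value_set_threshold can be sketched in plain Python as follows. This is only an illustration of the description above, not the library's implementation: the helper name gets_value_set_expectation and the column cardinalities in the usage lines are hypothetical, and treating a cardinality of "none" as never qualifying is an assumption drawn from the "few" example.

```python
# Cardinality labels in increasing order, as listed for value_set_threshold.
CARDINALITY_ORDER = ["none", "one", "two", "very_few", "few", "many", "very_many", "unique"]


def gets_value_set_expectation(column_cardinality: str, value_set_threshold: str) -> bool:
    """Return True if a column of this cardinality falls at or below the threshold.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    column_rank = CARDINALITY_ORDER.index(column_cardinality.lower())
    threshold_rank = CARDINALITY_ORDER.index(value_set_threshold.lower())
    # Assumption: a cardinality of "none" never receives a value_set Expectation.
    return 0 < column_rank <= threshold_rank


# Made-up cardinalities for three columns, filtered with a threshold of "few".
columns = {"c_nationkey": "few", "c_custkey": "unique", "c_mktsegment": "very_few"}
selected = [name for name, card in columns.items() if gets_value_set_expectation(card, "few")]
print(selected)
```

With a threshold of "unique", every column in this made-up mapping would be selected, which matches the recommendation above for comparing two tables.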

Raises

ValueError – If an invalid primary_or_compound_key is provided.
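
The choice between the two uniqueness Expectations for primary_or_compound_key might be sketched as below. The helper uniqueness_expectation_for_key is hypothetical and not part of Great Expectations. Two points are assumptions rather than documented behavior: that a single key column maps to expect_column_values_to_be_unique and several columns map to expect_compound_columns_to_be_unique, and that an "invalid" key means one naming a column that is absent from the table.

```python
from typing import List


def uniqueness_expectation_for_key(key_columns: List[str], table_columns: List[str]) -> str:
    """Pick the uniqueness Expectation name for a primary or compound key.

    Hypothetical helper for illustration; assumes key_columns is non-empty.
    """
    missing = [col for col in key_columns if col not in table_columns]
    if missing:
        # Assumption: an invalid key is one that names a column not in the table.
        raise ValueError(f"Key column(s) not found in table: {missing}")
    if len(key_columns) == 1:
        return "expect_column_values_to_be_unique"
    return "expect_compound_columns_to_be_unique"


table_columns = ["c_custkey", "c_name", "c_nationkey"]
print(uniqueness_expectation_for_key(["c_custkey"], table_columns))
print(uniqueness_expectation_for_key(["c_custkey", "c_nationkey"], table_columns))
```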

build_suite() → great_expectations.core.expectation_suite.ExpectationSuite

Build an Expectation Suite based on the semantic_types_dict if one is provided.

Otherwise, profile the dataset and build the suite based on the results.

Returns

An Expectation Suite.
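
As a rough illustration of one decision made while profiling, the nullity rule described under not_null_only can be sketched as follows. The function nullity_expectation is hypothetical and is not the library's code. The description does not say what happens at exactly 50% nulls, so this sketch adds no Expectation in that case, which is an assumption.

```python
from typing import Optional


def nullity_expectation(null_percent: float, not_null_only: bool = False) -> Optional[str]:
    """Return the nullity Expectation name for a column, or None.

    Hypothetical helper for illustration; null_percent is in the range 0 to 100.
    """
    if not_null_only or null_percent < 50.0:
        return "expect_column_values_to_not_be_null"
    if null_percent > 50.0:
        return "expect_column_values_to_be_null"
    # Assumption: exactly 50% nulls is not covered by the description, so add nothing.
    return None


print(nullity_expectation(10.0))
print(nullity_expectation(80.0))
print(nullity_expectation(80.0, not_null_only=True))
```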