How to create a new Expectation Suite using suite scaffold

great_expectations suite scaffold helps you quickly create Expectation Suites through an interactive development loop that combines Profilers and Data Docs.

Prerequisites: This how-to guide assumes you have already:

Steps

  1. Run `suite scaffold`

    great_expectations suite scaffold my_suite
    

    As with the suite new and suite edit commands, the CLI prompts you interactively to choose some data from one of your datasources.

    great_expectations suite scaffold npi_distributions
    Heads up! This feature is Experimental. It may change. Please give us your feedback!
    
    Enter the path of a data file (relative or absolute, s3a:// and gs:// paths are ok too)
    : npi.csv
    

    Important

    This command generates a disposable jupyter notebook in your great_expectations/uncommitted directory. The goal is to save you time by writing boilerplate code for you.

    Because these notebooks can be regenerated at any time from the Expectation Suites (stored as JSON), you should treat the notebook as a throwaway artifact that is safe to delete.

    Once you choose a data asset, the CLI will open up a jupyter notebook.

    You can also run this command without opening the notebook by using the `--no-jupyter` flag and then starting up Jupyter separately:

    great_expectations suite scaffold npi_distributions --no-jupyter
    Heads up! This feature is Experimental. It may change. Please give us your feedback!
    
    Enter the path of a data file (relative or absolute, s3a:// and gs:// paths are ok too)
    : npi.csv
    To continue scaffolding this suite, run `jupyter notebook uncommitted/scaffold_npi_distributions.ipynb`
    
  2. Within the notebook, choose columns and Expectations to scaffold.

    Run the first cell in the notebook that loads the data. You don’t need to worry about what’s happening there.

    The next code cell in the notebook presents you with a list of all the columns found in your selected data:

    included_columns = [
        'crim',
        'zn',
        'indus',
        'chas',
        'nox',
        'rm',
        'age',
        # 'dis',
        'rad',
        # 'tax',
        'ptratio',
        # 'b',
        # 'lstat',
        # 'medv'
    ]
    

    To choose the columns to scaffold Expectations on, uncomment the ones you want to include and leave the rest commented out.

    The next code cell shows the scaffold config, which contains the columns you included in the previous step and the Expectations you want to generate (or not generate) for those columns. Included and excluded Expectations are each given as a list of Expectation name strings. The config looks like this:

    scaffold_config = {
        "included_columns": included_columns,
        # "excluded_columns": [],
        # "included_expectations": [],
        # "excluded_expectations": [],
    }
    
  3. Generate Data Docs and review the results there

    Run the next few code cells to see the scaffolded suite in Data Docs.

    Because the scaffolder is not very smart, you will want to edit this suite to tune the parameters and make adjustments, such as removing Expectations that don’t make sense for your use case. You can iterate on included and excluded columns and Expectations to get closer to the Suite you want.
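
As an illustration of that iteration, the sketch below shows what a narrowed scaffold config might look like after one pass. The column names are the ones from the example in step 2. The two Expectation names are standard Great Expectations names chosen only for illustration; whether the scaffolder would have generated them for your data is an assumption.

```python
# Columns left uncommented in the earlier notebook cell (taken from the example in step 2).
included_columns = [
    "crim", "zn", "indus", "chas", "nox", "rm", "age", "rad", "ptratio",
]

# Assumption: illustrative Expectation names to drop after reviewing Data Docs.
# Replace them with the names you actually see in your scaffolded suite.
excluded_expectations = [
    "expect_column_values_to_be_unique",
    "expect_column_mean_to_be_between",
]

scaffold_config = {
    "included_columns": included_columns,
    # "excluded_columns": [],
    # "included_expectations": [],
    "excluded_expectations": excluded_expectations,
}
```

After changing the config, re-run the notebook cells that build the suite and Data Docs to review the effect.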

Additional notes

Important

The Suites generated by the scaffold command are not meant to be production suites - they are scaffolds to build upon.

Great Expectations will choose which expected values for Expectations might make sense for a column based on the type and cardinality of the data in each selected column.
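
To make the idea of choosing Expectations from type and cardinality concrete, here is a minimal, hypothetical sketch. It is not the scaffolder's actual logic: the function name `suggest_expectations`, the threshold of 10 distinct values, and the mapping from column shape to Expectation names are all assumptions made for illustration.

```python
from typing import List, Sequence


def suggest_expectations(values: Sequence[object]) -> List[str]:
    """Hypothetical heuristic: suggest Expectation names from a column's values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return []

    suggestions = []
    # No missing values observed: a not-null Expectation is plausible.
    if len(non_null) == len(values):
        suggestions.append("expect_column_values_to_not_be_null")

    distinct = set(non_null)
    if len(distinct) == len(non_null):
        # Every value is different: the column may be an identifier.
        suggestions.append("expect_column_values_to_be_unique")
    elif len(distinct) <= 10:  # arbitrary cutoff for "low cardinality"
        suggestions.append("expect_column_values_to_be_in_set")

    # Numeric columns can additionally be range-checked.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        suggestions.append("expect_column_values_to_be_between")
    return suggestions


print(suggest_expectations([0, 1, 0, 1]))
print(suggest_expectations(["a", "b", None]))
```

A heuristic like this only sees the sample it was given, which is one reason the generated values need review.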

You will definitely want to edit the Suite to fine-tune it after scaffolding.