Version: 1.0.5

Configure Data Docs

Data Docs translate Expectations, Validation Results, and other metadata into human-readable documentation that is saved as static web pages. Because Data Docs are compiled automatically from your data tests, your documentation stays current. This guide covers how to configure additional locations where Data Docs should be created.

Prerequisites:

To host Data Docs in an environment other than a local or networked filesystem, you will also need to install the appropriate dependencies and configure access credentials accordingly.

Procedure

  1. Define a configuration dictionary for your new Data Docs site.

    The main component that requires customization in a Data Docs site configuration is its store_backend. The store_backend is a dictionary that tells GX where the Data Docs site will be hosted and how to access that location when the site is updated.

    The specifics of the store_backend depend on the environment in which the Data Docs will be created. GX Core supports generating Data Docs in local or networked filesystems, Amazon S3, Google Cloud Storage, and Azure Blob Storage.

    To create a Data Docs site configuration, select one of the following environments and follow the corresponding instructions.

    A local or networked filesystem Data Docs site requires the following store_backend information:

    • class_name: The name of the class to implement for accessing the target environment. For a local or networked filesystem this will be TupleFilesystemStoreBackend.
    • base_directory: A path to the folder where the static sites should be created. This can be an absolute path, or a path relative to the root folder of the Data Context.
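
    The absolute-versus-relative rule for base_directory can be sketched in plain Python. This is only an illustration of the rule described above, not GX's own code: resolve_base_directory is a hypothetical helper, the project paths are made up, and PurePosixPath is used so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath


def resolve_base_directory(context_root: str, base_directory: str) -> PurePosixPath:
    """Return the folder a base_directory value points to.

    Illustrative helper (not a GX function): absolute paths are used as
    given, relative paths are joined onto the Data Context root folder.
    """
    candidate = PurePosixPath(base_directory)
    if candidate.is_absolute():
        return candidate
    return PurePosixPath(context_root) / candidate


# Hypothetical project layout, used only for illustration.
relative_site = resolve_base_directory(
    "/projects/demo/gx", "uncommitted/data_docs/local_site/"
)
absolute_site = resolve_base_directory(
    "/projects/demo/gx", "/srv/data_docs/shared_site"
)
print(relative_site)
print(absolute_site)
```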

    To define a Data Docs site configuration for a local or networked filesystem environment, update the value of base_directory in the following code and execute it:

    Python
    # Default path, relative to the root folder of the Data Context; change as required.
    base_directory = "uncommitted/data_docs/local_site/"
    site_config = {
        "class_name": "SiteBuilder",
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": base_directory,
        },
    }
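
    If you need more than one filesystem site, a small factory function avoids copying the dictionary by hand. The sketch below assumes only the configuration keys shown above; filesystem_site_config is a hypothetical helper and both directories are placeholders.

```python
def filesystem_site_config(base_directory: str) -> dict:
    """Build a filesystem-backed Data Docs site configuration.

    Hypothetical convenience helper; it only assembles the dictionary
    structure described in this step.
    """
    return {
        "class_name": "SiteBuilder",
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": base_directory,
        },
    }


# Example directories are placeholders; substitute your own paths.
site_configs = {
    "local_site": filesystem_site_config("uncommitted/data_docs/local_site/"),
    "shared_site": filesystem_site_config("/mnt/shared/data_docs/"),
}
```

    Each call returns a fresh dictionary, so changing one site's configuration later does not affect the other.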
  2. Add your configuration to your Data Context.

    All Data Docs sites have a unique name within a Data Context. Once your Data Docs site configuration has been defined, add it to the Data Context by updating the value of site_name in the following code to something more descriptive and then executing it:

    Python
    # Assumes `context` is your existing Data Context and `site_config` is defined as in step 1.
    site_name = "my_data_docs_site"
    context.add_data_docs_site(site_name=site_name, site_config=site_config)
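
    Because site names must be unique within a Data Context, it can help to check for an existing site before adding one, for example when a setup script may run more than once. The following is a minimal sketch: add_site_if_missing is a hypothetical helper, and it assumes that context.list_data_docs_sites() returns the configured sites keyed by name. Verify that behavior against your GX version before relying on it.

```python
from typing import Any


def add_site_if_missing(context: Any, site_name: str, site_config: dict) -> bool:
    """Add the site only when its name is not already registered.

    Returns True if the site was added, False if the name was already taken.
    Assumes context.list_data_docs_sites() yields the configured site names
    (for example, as the keys of a dictionary).
    """
    if site_name in context.list_data_docs_sites():
        return False
    context.add_data_docs_site(site_name=site_name, site_config=site_config)
    return True
```

    Calling add_site_if_missing(context, site_name, site_config) a second time with the same name leaves the existing site untouched and returns False.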
  3. Optional. Build your Data Docs sites manually.

    You can manually build a Data Docs site by executing the following code:

    Python
    # site_names takes a list of site names.
    context.build_data_docs(site_names=[site_name])
  4. Optional. Automate Data Docs site updates with Checkpoint Actions.

    You can automate the creation and update of Data Docs sites by including the UpdateDataDocsAction in your Checkpoints. This Action will automatically trigger a Data Docs site build whenever the Checkpoint it is included in completes its run() method.

    Python
    import great_expectations as gx

    checkpoint_name = "my_checkpoint"
    validation_definition_name = "my_validation_definition"
    validation_definition = context.validation_definitions.get(validation_definition_name)
    actions = [
        gx.checkpoint.actions.UpdateDataDocsAction(
            name="update_my_site", site_names=[site_name]
        )
    ]
    checkpoint = context.checkpoints.add(
        gx.Checkpoint(
            name=checkpoint_name,
            validation_definitions=[validation_definition],
            actions=actions,
        )
    )

    result = checkpoint.run()
  5. Optional. View your Data Docs.

    Once your Data Docs have been created, you can view them with:

    Python
    context.open_data_docs()
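
    On a machine without a browser, such as a remote server, it may be more useful to print where each site lives than to open it. The sketch below assumes that context.get_docs_sites_urls() returns a list of dictionaries with site_name and site_url keys; site_urls_by_name is a hypothetical helper, and the return shape should be checked against your GX version.

```python
from typing import Any, Dict


def site_urls_by_name(context: Any) -> Dict[str, str]:
    """Map each Data Docs site name to the URL of its index page.

    Assumes context.get_docs_sites_urls() returns a list of dictionaries,
    each with "site_name" and "site_url" keys.
    """
    return {
        entry["site_name"]: entry["site_url"]
        for entry in context.get_docs_sites_urls()
    }
```

    For example, iterating over site_urls_by_name(context).items() and printing each pair lists every configured site alongside its location.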