Version: 1.7.0

Manage Data Sources

A Data Source is an object that tells GX Cloud how to connect to a specific location of data and provides an entry point for organizing that data into Data Assets, which can be validated. Visit the compatibility reference for a full list of supported Data Sources. Contact us to request support for additional sources.

Data Source limitations

To connect to the following data locations, you must use the GX Cloud API. These are not available for connection in the GX Cloud UI:

All of these Data Sources have the following limitations, regardless of your GX Cloud deployment pattern:

Azure Blob Storage, BigQuery, Google Cloud Storage, Pandas, and Spark have the following additional limitations:

  • Data Asset metrics are not supported.
  • You cannot define a Batch in the GX Cloud UI. Use the GX Cloud API to create a Batch Definition instead.
  • When you add an Expectation, you cannot generate Expectations for Anomaly Detection. You can manually configure Anomaly Detection by adding Expectations with Dynamic Parameters or forecasted ranges.
  • Ad hoc Validations cannot be triggered through the GX Cloud UI. Use the UI to generate a Validation code snippet that you can use to run an ad hoc Validation through the GX Cloud API.
  • Recurring Validations cannot be scheduled in GX Cloud. Use an orchestrator to run recurring Validations.
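
As a rough illustration of the last point, an orchestrator (cron, Airflow, or similar) could invoke a small script like the one below on a schedule. This is a minimal sketch, not an official recipe: it assumes the GX Core 1.x Python client, a Checkpoint that already exists in your GX Cloud organization, and GX Cloud credentials available in the environment. The helper names `run_checkpoint` and `exit_code` and the Checkpoint name `my_checkpoint` are hypothetical.

```python
def run_checkpoint(context, checkpoint_name):
    """Run a named Checkpoint and return True when every Validation passed.

    `context` is expected to be a GX Data Context, for example the object
    returned by gx.get_context(mode="cloud") (assumption: GX Core 1.x).
    """
    checkpoint = context.checkpoints.get(checkpoint_name)
    result = checkpoint.run()
    return bool(result.success)


def exit_code(passed):
    """Map a pass/fail result to a process exit code an orchestrator can act on."""
    return 0 if passed else 1


# Usage sketch (requires the great_expectations package and GX Cloud
# credentials in the environment; "my_checkpoint" is a hypothetical name):
#
#   import sys
#   import great_expectations as gx
#
#   context = gx.get_context(mode="cloud")
#   sys.exit(exit_code(run_checkpoint(context, "my_checkpoint")))
```

Returning a nonzero exit code on failure lets most schedulers mark the run as failed without any GX-specific integration.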

Edit Data Source settings

You can use the GX Cloud UI to edit settings for Databricks SQL, PostgreSQL, Redshift, and Snowflake Data Sources. You can use the GX Cloud API to edit settings for any Data Source.

When editing a Data Source in the GX Cloud UI, you can change the name and connection details.

Prerequisites

Procedure

  1. In GX Cloud, select the relevant Workspace and then click Data Assets.

  2. Click Manage Data Sources.

  3. Click Edit Data Source for the Data Source you want to edit.

  4. Edit the configuration as needed. Available fields vary by source type. For details, refer to the instructions for connecting GX Cloud to your source type.

  5. Click Save.

Data Source credential management

Options for managing credentials depend on whether you are connecting a Data Source in the GX Cloud UI or through the GX Cloud API.

Depending on your deployment pattern, you have the following options for managing credentials for Data Sources connected through the GX Cloud UI.

  • Direct input is supported for all GX Cloud deployment patterns. You can input credentials directly into the GX Cloud UI. These credentials are stored in GX Cloud and securely encrypted at rest and in transit.

  • Environment variable substitution is supported for agent-enabled and read-only deployments. To enhance security, you can use environment variables to manage sensitive connection parameters or connection strings. For example, instead of including your database password directly in the configuration settings, you can use a variable reference such as ${MY_DATABASE_PASSWORD}. When you use environment variable substitution, your credentials are neither stored in nor transmitted to GX Cloud.

To use environment variable substitution, do the following:

  1. Inject the variable into your GX Agent container or environment.

    When running the GX Agent Docker container, include the environment variable in the command. For example:

    Terminal input
    docker run -it \
      -e MY_DATABASE_PASSWORD=<YOUR_DATABASE_PASSWORD> \
      -e GX_CLOUD_ACCESS_TOKEN=<YOUR_ACCESS_TOKEN> \
      -e GX_CLOUD_ORGANIZATION_ID=<YOUR_ORGANIZATION_ID> \
      greatexpectations/agent:stable

    When running the GX Agent in another container-based service, such as Kubernetes, ECS, ACI, or GCE, follow that service's instructions for setting environment variables and providing them to the running container.

    When using environment variable substitution in a read-only deployment, set the environment variable in the environment where the GX Cloud API Python client is running.

  2. In the Data Source setup form in the GX Cloud UI, enter the name of your environment variable, enclosed in ${}. For example, ${MY_DATABASE_PASSWORD}.
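
To make the substitution step concrete, the following sketch shows how a `${NAME}` reference can be resolved from the local environment, and how to check beforehand that every referenced variable is actually set. It is an illustration only and is not GX Cloud's or the GX Agent's implementation; the function names, the variable-name pattern, and the example connection string are assumptions.

```python
import os
import re

# Matches references of the form ${NAME}. The allowed name characters are an
# assumption chosen for this illustration.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_variables(value, env=None):
    """Return the names referenced as ${NAME} in `value` that are not set in `env`."""
    env = os.environ if env is None else env
    return [name for name in _VAR_PATTERN.findall(value) if name not in env]


def substitute(value, env=None):
    """Replace each ${NAME} in `value` with env[NAME]; raises KeyError if unset."""
    env = os.environ if env is None else env
    return _VAR_PATTERN.sub(lambda match: env[match.group(1)], value)
```

Running `missing_variables` against the value you plan to enter in the setup form, in the same environment where the GX Agent or Python client runs, is a quick way to catch a variable that was never injected.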