Version: 1.5.5

GX Cloud overview

GX Cloud is a fully managed SaaS platform that simplifies data quality management and monitoring. With GX Cloud, you and your organization can work collaboratively to define and maintain a shared understanding of your data.

GX Cloud in your environment

You can integrate GX Cloud at any point in your data pipeline to manage and monitor data quality. Common integration points include, but are not limited to, the following:

  • Ingestion: validate raw data before writing it to your data warehouse so that you can quarantine bad records and identify bugs in your source system.

  • Transformation: check the results of transformations in your warehouse and condition pipeline steps based on validation success or failure.

  • Delivery: validate finalized data before it is served to consumers so that unexpected patterns reflect business insights rather than data quality issues.

Here’s an example of where these three common integration points fit in a generic data pipeline:

Incoming data from Square, Mailchimp, and Salesforce are validated by GX Cloud before being written as raw data in a Snowflake data warehouse. Transformations are validated within the Snowflake data pipeline. Finalized data is validated before being served by BI tools such as Tableau, Power BI, and Looker.
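The ingestion and transformation patterns described above can be illustrated with a minimal, library-free sketch. It does not use the GX Cloud API. The function `split_valid_and_quarantined`, the two check functions, and the record fields are hypothetical names chosen for illustration only. The sketch shows the general idea of quarantining bad records before they are written and of conditioning a later pipeline step on validation success.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Check = Callable[[Record], bool]


def split_valid_and_quarantined(
    records: Iterable[Record], checks: List[Check]
) -> Tuple[List[Record], List[Record]]:
    """Return (valid, quarantined); a record is valid only if every check passes."""
    valid: List[Record] = []
    quarantined: List[Record] = []
    for record in records:
        if all(check(record) for check in checks):
            valid.append(record)
        else:
            quarantined.append(record)
    return valid, quarantined


# Hypothetical checks; real criteria depend on your data.
def has_customer_id(record: Record) -> bool:
    return record.get("customer_id") is not None


def amount_is_non_negative(record: Record) -> bool:
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


incoming = [
    {"customer_id": 1, "amount": 25.0},
    {"customer_id": None, "amount": 10.0},  # missing customer_id
    {"customer_id": 3, "amount": -4.0},     # negative amount
]

valid, quarantined = split_valid_and_quarantined(
    incoming, [has_customer_id, amount_is_non_negative]
)

# Condition the next pipeline step on validation success.
run_transformations = len(quarantined) == 0
```

In this sketch only the valid records would be written to the warehouse, and the quarantined records remain available for debugging the source system.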

You can also integrate GX Cloud with version control systems and with data at rest. Common workflows that validate data outside the data pipeline include:

  • CI/CD: test changes to your transformation code before merging it to production so that code changes don’t have negative downstream impacts on data.

  • Exploration: enable your stakeholders to create and run ad hoc tests to get a better understanding of the data they’re consuming.

For a full list of data sources and other tools supported by GX Cloud, visit the compatibility reference.

GX Cloud concepts

The key GX Cloud concepts described below provide a data validation vocabulary that represents your data, your data validation criteria, and the results of data validation. Within GX Cloud, these concepts are applied to define components and implement data validation workflows.

Data Source
A Data Source is the GX representation of a database or data store.
Data Asset
A Data Asset is a collection of records within a Data Source.
Expectation
An Expectation is a declarative, verifiable assumption about your data. Expectations serve as unit tests for your data.
Validation
A Validation runs selected Expectations against a Data Asset to validate the data defined by that Data Asset.
Validation Result
A Validation Result captures the outcome of a Validation and related metadata that describes passing and failing data.
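To show how these concepts relate to one another, here is a toy model written with plain Python dataclasses. It is an illustration of the vocabulary only, not the GX Cloud API or the GX library's classes. Every class, field, and method name below is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataAsset:
    """A collection of records within a Data Source."""
    name: str
    records: List[dict]


@dataclass
class DataSource:
    """Stand-in for a database or data store that holds Data Assets."""
    name: str
    assets: Dict[str, DataAsset] = field(default_factory=dict)


@dataclass
class Expectation:
    """A verifiable assumption, modeled here as a per-record predicate."""
    description: str
    predicate: Callable[[dict], bool]


@dataclass
class ValidationResult:
    """Outcome of a Validation plus the records that failed each Expectation."""
    success: bool
    failures: Dict[str, List[dict]]


@dataclass
class Validation:
    """Runs selected Expectations against one Data Asset."""
    asset: DataAsset
    expectations: List[Expectation]

    def run(self) -> ValidationResult:
        failures: Dict[str, List[dict]] = {}
        for expectation in self.expectations:
            failing = [r for r in self.asset.records if not expectation.predicate(r)]
            if failing:
                failures[expectation.description] = failing
        return ValidationResult(success=not failures, failures=failures)


orders = DataAsset("orders", [{"order_id": 1, "total": 20.0},
                              {"order_id": 2, "total": None}])
warehouse = DataSource("warehouse", {"orders": orders})

validation = Validation(
    asset=warehouse.assets["orders"],
    expectations=[Expectation("total is not null", lambda r: r["total"] is not None)],
)
result = validation.run()
```

Here `result.success` is `False` because one record has a null `total`, and `result.failures` records which Expectation that record failed.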

GX Cloud workflow

GX Cloud data validation workflows are built from GX Cloud components, which are entities that represent GX Cloud data validation concepts.

Standard data validation workflow

A GX Cloud data validation workflow can be implemented using the following steps:

Standard GX Cloud workflow

  1. Connect to your data.
  2. Create a Data Asset.
  3. Define Expectations.
  4. Validate your data.
  5. Review and share your Validation Results with your organization.

Additional workflow features

Several GX Cloud features extend the standard data validation workflow.

GX Cloud workflow enhanced with product features

  • GX Cloud user management. GX Cloud functions as a shared portal to manage and monitor your organization's data quality. Users can be invited to your GX Cloud organization and assigned a role that governs their ability to view and edit components and workflows in GX Cloud. See Manage users and access tokens for more details.

  • Data Asset profiling. GX Cloud introspects your data schema by default on Data Asset creation, and also offers one-click fetching of additional descriptive metrics including column type and statistical summaries. Data profiling results are used to suggest parameters for Expectations that you create.

  • Automate rules for Anomaly Detection. GX Cloud can automatically generate Expectations that detect column changes, volume changes that deviate from historical patterns, and changes to the proportion of null values in each column. This option is available when you create new Data Assets or add Expectations for an existing Data Asset.

  • Personalize rules with ExpectAI (BETA). GX Cloud can generate AI-recommended Expectations for a Data Asset. These recommendations are based on an analysis of a sample of your data.

  • Generate code for custom SQL Expectations with ExpectAI (BETA). To simplify working with custom SQL Expectations, you can use ExpectAI to generate a SQL query based on a natural language prompt you provide and a data profile GX Cloud automatically provides.

  • Schedule Validations. GX Cloud enables you to schedule Validations so that you can test and assess your data on a regular cadence and monitor data quality over time. See Manage schedules for more detail.

  • Alerting. GX Cloud can send alerts when Validations fail, so your organization stays aware of the health of your Data Assets. See Manage alerts for more detail.

  • Monitor Data Health. GX Cloud provides metric summaries and trends to help you understand and improve test coverage and success across your business. See Data Health for more detail.
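The Anomaly Detection feature above mentions volume changes that deviate from historical patterns and changes to the proportion of null values. The source does not describe how GX Cloud computes these, so the following is only a conceptual sketch under an assumed approach: a simple z-score test for row counts and a direct null ratio. The function names, the `max_z` threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev
from typing import List


def null_proportion(records: List[dict], column: str) -> float:
    """Fraction of records where the column is missing or None."""
    if not records:
        return 0.0
    nulls = sum(1 for record in records if record.get(column) is None)
    return nulls / len(records)


def volume_is_anomalous(history: List[int], current: int, max_z: float = 3.0) -> bool:
    """Flag the current row count if it lies more than max_z sample standard
    deviations from the historical mean (an assumed rule, not GX Cloud's)."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > max_z


daily_row_counts = [1000, 1020, 980, 1010, 990]  # hypothetical history
typical_day = volume_is_anomalous(daily_row_counts, 1005)
sudden_drop = volume_is_anomalous(daily_row_counts, 400)

sample = [{"email": "a@example.com"}, {"email": None}, {"email": "b@example.com"}, {}]
email_nulls = null_proportion(sample, "email")
```

A scheduled job could run checks like these on a regular cadence and raise an alert when one of them flags a change, which mirrors the scheduling and alerting features listed above.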

GX Cloud architecture

GX Cloud architecture comprises a frontend web UI, storage for entity configuration and metadata, a backend application, and a Python API.

You can interact with GX Cloud using the UI, the API, or both. How GX Cloud connects to your data depends on your deployment pattern.

  • GX Cloud frontend web UI. Enables you to manage and validate your organization's data quality without running code and provides shared visibility into your organization's Validation Results history.

  • GX Cloud data storage. Stores the configurations for your organization's Data Sources, Data Assets, Expectations, and Validations alongside your organization's Validation Result histories and Data Asset descriptive metrics.

  • GX Cloud backend application. Contains the necessary logic and compute to connect to data and run queries. The specifics of how the GX Cloud backend connects to your data are described in Deployment patterns.

  • GX Cloud API. Enables you to interact programmatically with GX Cloud entities and workflows using Python scripts.
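As a rough sketch of programmatic access, the script below checks for credentials and then requests a Cloud-backed context from the `great_expectations` Python package. It assumes the package is installed, that `gx.get_context(mode="cloud")` is available in your version, and that credentials are supplied through the `GX_CLOUD_ACCESS_TOKEN` and `GX_CLOUD_ORGANIZATION_ID` environment variables. Confirm these details against the documentation for your GX version. The helper names `missing_credentials` and `connect_to_gx_cloud` are hypothetical.

```python
import os
from typing import List, Mapping

# Assumed environment variable names; verify them for your GX version.
REQUIRED_ENV_VARS = ("GX_CLOUD_ACCESS_TOKEN", "GX_CLOUD_ORGANIZATION_ID")


def missing_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def connect_to_gx_cloud():
    """Return a GX Cloud-backed context, failing early if credentials are absent."""
    missing = missing_credentials(os.environ)
    if missing:
        raise RuntimeError("Missing GX Cloud credentials: " + ", ".join(missing))
    # Imported lazily so this module loads even without the package installed.
    import great_expectations as gx

    return gx.get_context(mode="cloud")
```

Calling `connect_to_gx_cloud()` from a script or CI job would then give you a context object for working with GX Cloud entities. Checking credentials first produces a clearer error than a failed connection attempt.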