Manage Data Assets
A Data Asset is a collection of records within a Data Source. When you connect to a Data Source, you define at least one Data Asset. You use Data Assets to create the Batch Requests that select the data provided to your Expectations.
Create a Data Asset
Create a Data Asset to define the data you want GX Cloud to access. To connect to Data Assets for a Data Source not currently available in GX Cloud, see Connect to data in the GX Core documentation.
- Snowflake
- PostgreSQL
Define the data you want GX Cloud to access within Snowflake.
Prerequisites
-
You have a GX Cloud account.
-
You have a Snowflake database, schema, and table.
-
You have a Snowflake account with USAGE privileges on the database and schema you are validating and SELECT privileges on the table you are validating. To improve data security, GX recommends using a separate Snowflake service account to connect to GX Cloud.
-
You know your Snowflake password.
Connect to a Snowflake Data Asset
-
In GX Cloud, click Data Assets > New Data Asset.
-
Click the New Data Source tab and then select Snowflake.
-
Select one of the following options:
-
If you're connecting to an org-hosted Snowflake Data Asset for the first time, copy the code and see Connect GX Cloud to Snowflake.
-
If you're testing GX Cloud features and functionality in a self-hosted environment, click I have created a GX Cloud user with valid permissions and then click Continue.
-
-
Enter a meaningful name for the Data Source in the Data Source name field.
-
Optional. To use a connection string to connect to a Data Source, click the Use connection string selector, enter a connection string, and then go directly to the Click Connect step. The connection string format is:
snowflake://<user_login_name>:<password>@<accountname>
-
Complete the following fields:
-
Account identifier: Enter your Snowflake organization name and account name separated by a hyphen (organizationname-accountname), or your legacy account locator and its region separated by a period (accountlocator.region). The legacy account locator value must include the geographical region, for example, us-east-1. To locate your Snowflake organization name, account name, or legacy account locator, see Finding the Organization and Account Name for an Account or Using an Account Locator as an Identifier.
-
Username: Enter the username you use to access Snowflake.
-
Password: Enter a Snowflake password. To improve data security, GX recommends using a Snowflake service account to connect to GX Cloud.
-
Database: Enter the name of the Snowflake database where the data you want to validate is stored. To find database names in Snowsight, click Data > Databases. In the Snowflake Classic Console, click Databases.
-
Schema: Enter the name of the Snowflake schema where the data you want to validate is stored.
-
Warehouse: Enter the name of your Snowflake warehouse. To find warehouse names in Snowsight, click Admin > Warehouses. In the Snowflake Classic Console, click Warehouses.
-
Role: Enter your Snowflake role.
-
-
Click Connect.
-
Select tables to import as Data Assets:
-
Check the box next to a table name to add that table as an asset.
-
At least one table must be added.
-
To search for a specific table, type the table's name in the Search box above the list of tables.
-
To add all of the available tables, check the box for All Tables.
-
-
Click Add Asset.
-
Create an Expectation. See Create an Expectation.
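If you prefer the Use connection string option, the string has to be assembled with care because a password containing characters such as @ or : breaks the URL. The following is a minimal sketch, not a GX Cloud API: the helper name build_snowflake_connection_string is hypothetical, and the optional /database/schema path and warehouse/role query parameters are an assumption based on the snowflake-sqlalchemy URL convention rather than something GX Cloud documents for this field.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_snowflake_connection_string(
    user: str,
    password: str,
    account: str,
    database: Optional[str] = None,
    schema: Optional[str] = None,
    warehouse: Optional[str] = None,
    role: Optional[str] = None,
) -> str:
    """Assemble a snowflake:// connection string (hypothetical helper)."""
    # Percent-encode the credentials so characters such as '@' or ':' in a
    # password are not read as URL delimiters.
    url = f"snowflake://{quote(user, safe='')}:{quote(password, safe='')}@{account}"
    # Optional /<database>/<schema> path; a schema is only added after a database.
    if database:
        url += f"/{database}"
        if schema:
            url += f"/{schema}"
    # Warehouse and role travel as query parameters when provided (assumed convention).
    params = {key: value for key, value in (("warehouse", warehouse), ("role", role)) if value}
    if params:
        url += "?" + urlencode(params)
    return url


# Minimal form documented above: user, password, and account identifier only.
print(build_snowflake_connection_string("gx_service", "p@ss:word", "myorg-myaccount"))
# snowflake://gx_service:p%40ss%3Aword@myorg-myaccount
```

Encoding the credentials in the helper, rather than asking users to escape them by hand, keeps the documented snowflake://<user_login_name>:<password>@<accountname> shape intact for any password.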
Define the data you want GX Cloud to access within PostgreSQL.
Prerequisites
-
You have a GX Cloud account.
-
You have a PostgreSQL database, schema, and table.
-
You have a PostgreSQL instance. To improve data security, GX recommends using a separate user service account to connect to GX Cloud.
-
You know your PostgreSQL access credentials.
Connect to a PostgreSQL Data Asset
-
In GX Cloud, click Data Assets > New Data Asset.
-
Click the New Data Source tab and then select PostgreSQL.
-
Select one of the following options:
-
If you're connecting to an org-hosted PostgreSQL Data Asset for the first time, copy the code and see Connect GX Cloud to PostgreSQL.
-
If you're testing GX Cloud features and functionality in a self-hosted environment, click I have created a GX Cloud user with valid permissions and then click Continue.
-
-
Enter a meaningful name for the Data Source in the Data Source name field.
-
Enter a connection string in the Connection string field. The connection string format is:
postgresql+psycopg2://YourUserName:YourPassword@YourHostname:5432/YourDatabaseName
-
Click Connect.
-
Select tables to import as Data Assets:
-
Check the box next to a table name to add that table as an asset.
-
At least one table must be added.
-
To search for a specific table, type the table's name in the Search box above the list of tables.
-
To add all of the available tables, check the box for All Tables.
-
-
Click Add Asset.
-
Create an Expectation. See Create an Expectation.
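A frequent mistake in the PostgreSQL connection string is a password with reserved URL characters, or a scheme missing its colon (postgresql+psycopg2// instead of postgresql+psycopg2://). The sketch below builds and checks the string in the documented format; both function names are hypothetical and are not part of GX Cloud or GX Core, and port 5432 is simply the default shown in the format above.

```python
from urllib.parse import quote

PREFIX = "postgresql+psycopg2://"


def build_postgresql_connection_string(
    user: str, password: str, host: str, database: str, port: int = 5432
) -> str:
    """Assemble a postgresql+psycopg2:// connection string (hypothetical helper)."""
    # Percent-encode the credentials so reserved characters such as '@', ':'
    # or '/' in a password do not break the URL structure.
    return f"{PREFIX}{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"


def check_postgresql_connection_string(url: str) -> None:
    """Reject strings that lack the scheme separator, e.g. 'postgresql+psycopg2//...'."""
    if not url.startswith(PREFIX):
        raise ValueError(f"connection string must start with {PREFIX!r}")


url = build_postgresql_connection_string("gx_user", "s3cr/t@1", "db.example.com", "analytics")
check_postgresql_connection_string(url)
print(url)
# postgresql+psycopg2://gx_user:s3cr%2Ft%401@db.example.com:5432/analytics
```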
View Data Asset metrics
Data Asset metrics provide you with insight into the data you can use for your data validations.
-
In GX Cloud, click Data Assets and then select a Data Asset in the Data Assets list.
-
Click the Overview tab.
When you select a new Data Asset, schema data is automatically fetched.
-
Optional. Select one of the following options:
-
Click Profile Data if you have not previously returned all available metrics for a Data Asset.
-
Click Refresh to refresh the Data Asset metrics.
-
Available Data Asset metrics
The following table lists the available Data Asset metrics.
Metric | Description |
---|---|
Row Count | The number of rows in the Data Asset. |
Column | A column in the Data Asset. |
Type | The data type of the column. |
Min | For numeric columns, the lowest value in the column. |
Max | For numeric columns, the highest value in the column. |
Mean | For numeric columns, the average value in the column: the sum of all values in the column divided by the number of values. |
Median | For numeric columns, the middle value in the column. 50% of the values in the column are less than or equal to the median, and 50% are greater than or equal to it. |
Null % | The percentage of missing (null) values in the column. |
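To make the metric definitions concrete, the following sketch computes Row Count, Min, Max, Mean, Median, and Null % for a single numeric column. It assumes null values are excluded from Min, Max, Mean, and Median and counted only in Null %; GX Cloud's exact computation may differ, and column_metrics is a hypothetical helper, not a GX function.

```python
import statistics
from typing import Optional, Sequence


def column_metrics(values: Sequence[Optional[float]]) -> dict:
    """Compute Min, Max, Mean, Median, and Null % for one numeric column."""
    # None stands in for a missing (null) value.
    present = [v for v in values if v is not None]
    null_pct = 100 * (len(values) - len(present)) / len(values) if values else 0.0
    if not present:
        return {"min": None, "max": None, "mean": None, "median": None, "null_pct": null_pct}
    return {
        "min": min(present),
        "max": max(present),
        # Mean: the sum of the non-null values divided by their count.
        "mean": statistics.fmean(present),
        # Median: the middle value; with an even count, the average of the two middle values.
        "median": statistics.median(present),
        "null_pct": null_pct,
    }


rows = [4, 1, None, 7, 3]
print(len(rows), column_metrics(rows))
# 5 {'min': 1, 'max': 7, 'mean': 3.75, 'median': 3.5, 'null_pct': 20.0}
```

Here the row count is 5, one of the five values is null (20%), and the mean and median are taken over the four remaining values.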
Add an Expectation to a Data Asset column
When you create an Expectation after fetching metrics for a Data Asset, GX Cloud autopopulates the column names and some values, which simplifies creating new Expectations. Data Asset metrics can also help you decide which Expectations are useful and how to configure them. You can add the new Expectations to an existing Expectation Suite, or you can create a new Expectation Suite and add the Expectations to it.
-
In GX Cloud, click Data Assets and then select a Data Asset in the Data Assets list.
-
Click the Overview tab.
When you select a new Data Asset, schema data is automatically fetched.
-
Optional. Select one of the following options:
-
Click Profile Data if you have not previously returned all available metrics for a Data Asset.
-
Click Refresh to refresh the Data Asset metrics.
-
-
Click New Expectation.
-
Select one of the following options:
-
To add an Expectation to a new Expectation Suite, click the New Suite tab and then enter a name for the new Expectation Suite.
-
To add an Expectation to an existing Expectation Suite, click the Existing Suite tab and then select an existing Expectation Suite.
-
-
Select an Expectation type. See Available Expectation types.
-
Complete the fields in the Create Expectation pane.
-
Click Save to add the Expectation, or click Save & Add More to add additional Expectations.
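As an illustration of how metrics can guide Expectation choices, the sketch below turns profiled metrics into candidate Expectation configurations. The Expectation type names are existing GX Expectations, but the mapping heuristic and the dictionary shape are assumptions for illustration only; this is not how GX Cloud autopopulates the Create Expectation pane, and suggest_expectations is a hypothetical helper.

```python
def suggest_expectations(column: str, metrics: dict) -> list:
    """Map profiled metrics to candidate Expectation configurations (hypothetical)."""
    suggestions = []
    # A column with no nulls in the profile is a candidate for a not-null check.
    if metrics.get("null_pct") == 0:
        suggestions.append({"type": "expect_column_values_to_not_be_null", "column": column})
    # The observed Min and Max give a starting range; review it before saving,
    # because future data may legitimately fall outside the profiled values.
    if metrics.get("min") is not None and metrics.get("max") is not None:
        suggestions.append(
            {
                "type": "expect_column_values_to_be_between",
                "column": column,
                "min_value": metrics["min"],
                "max_value": metrics["max"],
            }
        )
    return suggestions


for suggestion in suggest_expectations("amount", {"min": 1, "max": 7, "null_pct": 0.0}):
    print(suggestion)
```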
Add a Data Asset to an Existing Data Source
Additional Data Assets can only be added to Data Sources created in GX Cloud.
-
In GX Cloud, click Data Assets and then select New Data Asset.
-
Click the Existing Data Source tab and then select a Data Source.
-
Click Add Data Asset.
-
Select tables to import as Data Assets:
-
Check the box next to a table name to add that table as an asset.
-
At least one table must be added.
-
To search for a specific table, type the table's name in the Search box above the list of tables.
-
To add all of the available tables, check the box for All Tables.
-
-
Click Add Asset.
Edit Data Source settings
Edit Data Source settings to update Data Source connection information or access credentials. You can only edit the settings of Data Sources created in GX Cloud.
- Snowflake
- PostgreSQL
-
In GX Cloud, click Data Assets.
-
Click Manage Data Sources.
-
Click Edit Data Source for the Snowflake Data Source you want to edit.
-
Optional. Edit the Data Source name.
-
Optional. If you used a connection string to connect to the Data Source, click the Connection string tab and edit the Data Source connection string.
-
Optional. If you're not using a connection string, edit the following fields:
-
Account identifier: Enter new Snowflake account or locator information. The locator value must include the geographical region, for example, us-east-1. To locate these values, see Account Identifiers.
-
Username: Enter a new Snowflake username.
-
Password: Enter the password for the Snowflake user you're connecting to GX Cloud. To improve data security, GX recommends using a Snowflake service account to connect to GX Cloud.
-
Database: Enter a new Snowflake database name.
-
Schema: Enter a new schema name.
-
Warehouse: Enter a new Snowflake warehouse name.
-
Role: Enter a new Snowflake role.
-
-
Click Save.
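Because a legacy locator without a region is a common source of connection errors, you might check an account identifier before saving it. The sketch below is a heuristic, not Snowflake's full identifier grammar: the two regular expressions and classify_account_identifier are assumptions that accept organizationname-accountname or accountlocator.region (with an optional cloud suffix) and reject anything else, including a bare locator.

```python
import re

# Heuristic patterns (assumptions, not Snowflake's complete grammar):
#   org-account:    "<organization>-<account>", e.g. "myorg-myaccount"
#   legacy locator: "<locator>.<region>[.<cloud>]", e.g. "xy12345.us-east-1"
ORG_ACCOUNT = re.compile(r"^[A-Za-z0-9]+-[A-Za-z0-9_]+$")
LOCATOR_REGION = re.compile(r"^[A-Za-z0-9]+\.[a-z0-9-]+(\.[a-z]+)?$")


def classify_account_identifier(identifier: str) -> str:
    """Return 'legacy_locator' or 'org_account', or raise ValueError."""
    if LOCATOR_REGION.match(identifier):
        return "legacy_locator"
    if ORG_ACCOUNT.match(identifier):
        return "org_account"
    # A bare locator such as "xy12345" lands here because the region is missing.
    raise ValueError(f"{identifier!r} is neither <org>-<account> nor <locator>.<region>")


print(classify_account_identifier("myorg-myaccount"))    # org_account
print(classify_account_identifier("xy12345.us-east-1"))  # legacy_locator
```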
-
In GX Cloud, click Data Assets.
-
Click Manage Data Sources.
-
Click Edit Data Source for the PostgreSQL Data Source you want to edit.
-
Optional. Edit the Data Source name.
-
Optional. Click Show in the Connection string field and then edit the Data Source connection string.
-
Click Save.
Edit a Data Asset
You can only edit the settings of Data Assets created in GX Cloud.
-
In GX Cloud, click Data Assets and in the Data Assets list click Edit Data Asset for the Data Asset you want to edit.
-
Edit the following fields:
-
Table name: Enter a new name for the Data Asset table.
-
Data Asset name: Enter a new name for the Data Asset. If you use the same name for multiple Data Assets, each Data Asset must be associated with a unique Data Source.
-
-
Click Save.
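The naming rule above means a Data Asset name only has to be unique within its Data Source, so the effective key is the pair of Data Source and Data Asset name. The sketch below checks a list of such pairs for conflicts; duplicate_asset_names and the sample Data Source names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicate_asset_names(assets: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (data_source, asset_name) pairs that appear more than once.

    Reusing a Data Asset name is allowed across different Data Sources,
    so uniqueness is checked on the pair rather than on the name alone.
    """
    counts = Counter(assets)
    return sorted(pair for pair, n in counts.items() if n > 1)


assets = [("snowflake_prod", "orders"), ("postgres_dev", "orders"), ("snowflake_prod", "customers")]
print(duplicate_asset_names(assets))                                  # []
print(duplicate_asset_names(assets + [("snowflake_prod", "orders")]))  # [('snowflake_prod', 'orders')]
```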
Data Source credential management
GX Cloud supports two methods for managing the credentials used to connect to a Data Source:
-
Direct input: You can input credentials directly into GX Cloud. These credentials are stored in GX Cloud and securely encrypted at rest and in transit. When Data Source credentials have been directly provided, they can be used to connect to a Data Source in any GX Cloud deployment pattern.
-
Environment variable substitution: To enhance security, you can use environment variables to manage sensitive connection parameters or strings. For example, instead of including your database password directly in configuration settings, you can use a variable reference such as ${MY_DATABASE_PASSWORD}. When you use environment variable substitution, your password is not stored in or transmitted to GX Cloud.
Environment variable substitution is not supported in fully hosted deployments.
-
Configure the environment variable: In the Data Source setup form, enter the name of your environment variable enclosed in ${}. For example, ${MY_DATABASE_PASSWORD}.
-
Inject the variable into your GX Agent container or environment: When running the GX Agent Docker container, include the environment variable in the command. For example:
docker run -it -e MY_DATABASE_PASSWORD=<YOUR_DATABASE_PASSWORD> -e GX_CLOUD_ACCESS_TOKEN=<YOUR_ACCESS_TOKEN> -e GX_CLOUD_ORGANIZATION_ID=<YOUR_ORGANIZATION_ID> greatexpectations/agent:stable
When running the GX Agent in another Docker-based service, such as Kubernetes, ECS, ACI, or GCE, follow that service's instructions to set environment variables and provide them to the running container.
When using environment variable substitution in a read-only deployment, set the environment variable in the environment where the GX Core Python client is running.
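The ${VAR} reference is resolved from the environment where the GX Agent or GX Core client runs, which is why the secret never reaches GX Cloud. The sketch below illustrates that idea with Python's string.Template, which uses the same ${VAR} syntax; it is an assumption-labeled illustration, not the GX Agent's actual implementation, and resolve_connection_value is a hypothetical helper.

```python
import os
from string import Template
from typing import Mapping, Optional


def resolve_connection_value(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} references with values from the local environment.

    Illustrative only: mimics the ${VAR} syntax with string.Template; this is
    not how the GX Agent performs substitution internally.
    """
    env = os.environ if env is None else env
    # substitute() raises KeyError when a referenced variable is not set, so a
    # missing secret fails early instead of leaving a literal "${...}" behind.
    # Note: a literal '$' elsewhere in the value must be written as '$$'.
    return Template(value).substitute(env)


template = "postgresql+psycopg2://gx_user:${MY_DATABASE_PASSWORD}@db.example.com:5432/analytics"
print(resolve_connection_value(template, {"MY_DATABASE_PASSWORD": "s3cret"}))
# postgresql+psycopg2://gx_user:s3cret@db.example.com:5432/analytics
```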
Delete a Data Asset
-
In GX Cloud, click Settings > Datasources.
-
Click Delete for the Data Source and the associated Data Assets you want to delete.
-
Click Delete.