Version: 1.7.1

Manage Data Assets

A Data Asset is a collection of records from a Data Source. You can validate the whole Data Asset or a time-based subset of it. When you first connect to a Data Source, you define at least one Data Asset. You can add more Data Assets from the same Data Source later.

Add a Data Asset from a new Data Source

To add a Data Asset from a new Data Source, refer to Connect GX Cloud.

Add a Data Asset from an existing Data Source

You can use the GX Cloud UI to add a Data Asset from an existing Databricks SQL, PostgreSQL, Redshift, or Snowflake Data Source. You can use the GX Cloud API to add a Data Asset from any Data Source.

Prerequisites

Procedure

To add a Data Asset from an existing Data Source using the GX Cloud UI, complete the following steps:

  1. In GX Cloud, select the relevant Workspace and then click Data Assets > New Data Asset.

  2. In the Existing Data Source tab, select the relevant Data Source.

  3. Select one or more tables or views to import as Data Assets.

  4. Click Add x Asset(s).

  5. Decide which Anomaly Detection options to enable. By default, GX Cloud adds warning-severity Expectations to detect Schema and Volume anomalies; deselect any recommendation you want to opt out of. Optionally, you can also generate Expectations to detect Completeness anomalies.

  6. Click Start monitoring or Finish.

Now you can add an Expectation for your new Data Asset.
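The API route mentioned at the start of this section can be sketched in Python. This is a minimal sketch, not the documented procedure: it assumes the Data Source object exposes `add_table_asset(name=..., table_name=...)`, as SQL Data Sources in the GX Python library do. The helper `add_table_assets`, the stand-in class `FakeDataSource`, and the Data Source and table names are hypothetical, introduced only so the example runs without a GX Cloud connection.

```python
from typing import Any, Dict, Iterable, List


def add_table_assets(data_source: Any, table_names: Iterable[str]) -> List[Any]:
    """Register one table Data Asset per table name on an existing Data Source.

    Assumes `data_source` has an `add_table_asset(name=..., table_name=...)` method.
    """
    assets = []
    for table_name in table_names:
        # Reusing the table name as the Data Asset name is an illustrative choice.
        assets.append(data_source.add_table_asset(name=table_name, table_name=table_name))
    return assets


class FakeDataSource:
    """Hypothetical stand-in so the sketch runs without credentials."""

    def __init__(self) -> None:
        self.assets: Dict[str, Dict[str, str]] = {}

    def add_table_asset(self, name: str, table_name: str) -> Dict[str, str]:
        if name in self.assets:
            raise ValueError(f"Data Asset {name!r} already exists")
        self.assets[name] = {"name": name, "table_name": table_name}
        return self.assets[name]


# With a real connection (assumes GX Cloud credentials are configured):
#   import great_expectations as gx
#   context = gx.get_context(mode="cloud")
#   data_source = context.data_sources.get("my_postgres_source")  # hypothetical name
#   add_table_assets(data_source, ["orders", "customers"])

demo_source = FakeDataSource()
created = add_table_assets(demo_source, ["orders", "customers"])
print([asset["name"] for asset in created])
```

Passing the Data Source in as an argument keeps the helper independent of how the connection is obtained, so the same function can be exercised against a stand-in during testing.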

View Data Asset metrics

Data Asset metrics provide insights into your data that you can use to inform your Expectations. Schema data is automatically fetched when you create a new Databricks SQL, PostgreSQL, Redshift, or Snowflake Data Asset. For Amazon S3 Data Assets, you can manually fetch metrics.

To view Data Asset metrics, do the following:

  1. In GX Cloud, select the relevant Workspace, click Data Assets, and then select a Data Asset in the Data Assets list.

  2. Click the Metrics tab.

  3. Optional. Select one of the following options:

    • Click Profile Data if you have not yet retrieved all available metrics for the Data Asset.

    • Click Refresh to refresh the Data Asset metrics.

Available Data Asset metrics

The following table lists the available Data Asset metrics.

Column | Description
Row Count | The number of rows within a Data Asset.
Column | A column within your Data Asset.
Type | The data storage type of the Data Asset column.
Min | For numeric columns, the lowest value in the column.
Max | For numeric columns, the highest value in the column.
Mean | For numeric columns, the average value in the column. This is determined by dividing the sum of all values in the column by the number of values.
Median | For numeric columns, the middle value in the column. 50% of the values are less than or equal to the median, and 50% are greater than or equal to it.
Null % | The percentage of missing values in a column.
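The metric definitions above can be illustrated with a small Python sketch that uses only the standard library. It is an illustration of the arithmetic, not GX Cloud's implementation: the function name `column_metrics` is hypothetical, and the sketch assumes that missing values are excluded from Min, Max, Mean, and Median and that Null % is computed against the total row count.

```python
from statistics import mean, median
from typing import Dict, List, Optional


def column_metrics(values: List[Optional[float]]) -> Dict[str, float]:
    """Compute the table's metrics for one numeric column.

    Assumes the column has at least one non-missing value.
    """
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "min": min(non_null),
        "max": max(non_null),
        # Mean: sum of the values divided by the number of values.
        "mean": mean(non_null),
        # Median: middle value; for an even count, the average of the two middle values.
        "median": median(non_null),
        # Null %: share of rows whose value is missing.
        "null_pct": 100 * (len(values) - len(non_null)) / len(values),
    }


metrics = column_metrics([3, 1, 4, None, 5])
print(metrics)
```

For the sample column, four of the five rows hold a value, so the mean is 13 / 4 = 3.25, the median is the average of 3 and 4 (3.5), and Null % is 20.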

Delete a Data Asset

  1. In GX Cloud, select the relevant Workspace and then click Data Assets.
  2. In the Data Assets list, click Delete Data Asset for the Data Asset you want to remove.
  3. Review the warning and click Delete to confirm.

View GX Cloud logs

If you encounter an issue performing a GX Cloud task, review log information to troubleshoot the cause and determine a fix.

  1. In GX Cloud, select the relevant Workspace and then click Logs.

  2. Click Show log next to a log entry to display additional log details.

  3. Optional. Click Hide log to close the log details view.