Manage Data Assets
A Data Asset is a collection of records from a Data Source. You can validate the whole Data Asset or a time-based subset of it. When you first connect to a Data Source, you define a minimum of one Data Asset. You can add more Data Assets from that same Data Source later.
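Conceptually, a time-based subset is just the records whose timestamp falls in a chosen window. The following plain-Python sketch illustrates the difference between validating the whole Data Asset and validating one month of it. It is not how GX Cloud selects data. The record layout, the `pickup_date` field, and the `monthly_subset` helper are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical taxi-trip records; a real Data Asset would be a table or a set of files.
records: List[Dict[str, Any]] = [
    {"trip_id": 1, "pickup_date": date(2024, 1, 15)},
    {"trip_id": 2, "pickup_date": date(2024, 2, 3)},
    {"trip_id": 3, "pickup_date": date(2024, 2, 20)},
    {"trip_id": 4, "pickup_date": date(2024, 3, 1)},
]


def monthly_subset(
    rows: List[Dict[str, Any]], date_field: str, year: int, month: int
) -> List[Dict[str, Any]]:
    """Return only the rows whose date_field falls in the given year and month."""
    return [
        row
        for row in rows
        if row[date_field].year == year and row[date_field].month == month
    ]


whole_asset = records  # validate every record
february = monthly_subset(records, "pickup_date", 2024, 2)  # validate one month
print([row["trip_id"] for row in february])  # → [2, 3]
```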
Add a Data Asset from a new Data Source
To add a Data Asset from a new Data Source, refer to Connect GX Cloud.
Add a Data Asset from an existing Data Source
You can use the GX Cloud UI to add a Data Asset from an existing Databricks SQL, PostgreSQL, Redshift, or Snowflake Data Source. You can use the GX Cloud API to add a Data Asset from any Data Source.
Using the GX Cloud UI
Prerequisites
- A GX Cloud account with Workspace Editor permissions or greater.
- A Databricks SQL, PostgreSQL, Redshift, or Snowflake Data Source.
Procedure
To add a Data Asset from an existing Data Source using the GX Cloud UI, complete the following steps:
- In GX Cloud, select the relevant Workspace and then click Data Assets > New Data Asset.
- In the Existing Data Source tab, select the relevant Data Source.
- Select one or more tables or views to import as Data Assets.
- Click Add x Asset(s), where x is the number of tables or views you selected.
- Decide which Anomaly Detection options to enable. By default, GX Cloud adds warning-severity Expectations that detect Schema and Volume anomalies. Clear any of these recommendations that you want to opt out of. Optionally, you can also generate Expectations that detect Completeness anomalies.
- Click Start monitoring or Finish.
Using the GX Cloud API
Prerequisites
- A GX Cloud account with Workspace Editor permissions or greater.
- Your Cloud credentials saved in your environment variables.
- A Data Source.
- Python version 3.9 to 3.13.
- An installation of the Great Expectations Python library.
- Recommended. A Python virtual environment.
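Because the API procedure reads your GX Cloud credentials from the environment, a quick pre-flight check can save a confusing failure later. The following sketch reports which expected variables are missing or empty. It assumes the variable names `GX_CLOUD_ORGANIZATION_ID` and `GX_CLOUD_ACCESS_TOKEN`, so confirm them against your GX Cloud setup. The `missing_gx_cloud_vars` helper is hypothetical and is not part of the Great Expectations library.

```python
import os
from typing import List, Mapping, Optional

# Assumed names of the GX Cloud credential environment variables.
REQUIRED_GX_CLOUD_VARS = ("GX_CLOUD_ORGANIZATION_ID", "GX_CLOUD_ACCESS_TOKEN")


def missing_gx_cloud_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required GX Cloud variable names that are unset or empty.

    With no argument, checks the real process environment (os.environ).
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_GX_CLOUD_VARS if not env.get(name)]


# Check a hypothetical environment that only sets the organization ID.
print(missing_gx_cloud_vars({"GX_CLOUD_ORGANIZATION_ID": "my-org-id"}))
# → ['GX_CLOUD_ACCESS_TOKEN']
```

To check your actual environment, call `missing_gx_cloud_vars()` with no argument before you create the Data Context.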
Procedure
To add a Data Asset from an existing Data Source using the GX Cloud API, complete the following steps:
- Create a Data Context object.

```python
import great_expectations as gx

# Connect to GX Cloud using the credentials in your environment variables.
context = gx.get_context(mode="cloud")
```

- Fetch the Data Source.

```python
# Replace "my_data_source" with the name of your existing Data Source.
data_source = context.data_sources.get("my_data_source")
```

- Define your Data Asset's parameters. Refer to the API reference for your Data Source for details on required and optional parameters. Here's an example for a Data Asset on an S3 Data Source.

```python
asset_name = "s3_taxi_csv_file_asset"
s3_prefix = "data/taxi_yellow_tripdata/"
```

- Add the Data Asset to your Data Source. Refer to the API reference for your Data Source for method details. Here's an example for a .csv file Data Asset on an S3 Data Source.

```python
s3_file_data_asset = data_source.add_csv_asset(name=asset_name, s3_prefix=s3_prefix)
```
The following sample code combines the preceding steps into a single script:

```python
# Create a Data Context object.
import great_expectations as gx

context = gx.get_context(mode="cloud")

# Fetch the Data Source.
data_source = context.data_sources.get("my_data_source")

# Define your Data Asset's parameters.
asset_name = "s3_taxi_csv_file_asset"
s3_prefix = "data/taxi_yellow_tripdata/"

# Add the Data Asset to your Data Source.
s3_file_data_asset = data_source.add_csv_asset(name=asset_name, s3_prefix=s3_prefix)
```
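Assuming `s3_prefix` limits the Data Asset to objects whose keys begin with that prefix (the way S3 key prefixes work), you can preview which files a prefix would cover. The following plain-Python sketch runs that filtering on a hypothetical list of object keys. It does not call S3 or Great Expectations, and both the `keys_under_prefix` helper and its `.csv` suffix filter are illustrative assumptions, not GX behavior.

```python
from typing import Iterable, List

# Hypothetical object keys in an S3 bucket.
bucket_keys = [
    "data/taxi_yellow_tripdata/yellow_tripdata_2019-01.csv",
    "data/taxi_yellow_tripdata/yellow_tripdata_2019-02.csv",
    "data/taxi_yellow_tripdata/README.md",
    "data/taxi_green_tripdata/green_tripdata_2019-01.csv",
]


def keys_under_prefix(keys: Iterable[str], prefix: str, suffix: str = ".csv") -> List[str]:
    """Return the keys that start with prefix and end with suffix."""
    return [key for key in keys if key.startswith(prefix) and key.endswith(suffix)]


matched = keys_under_prefix(bucket_keys, "data/taxi_yellow_tripdata/")
print(matched)
```

With the example prefix, only the two yellow-taxi `.csv` keys match. The README and the green-taxi file are excluded.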
Now you can add an Expectation for your new Data Asset.
View Data Asset metrics
Data Asset metrics provide insights into your data that you can use to inform your Expectations. Schema data is automatically fetched when you create a new Databricks SQL, PostgreSQL, Redshift, or Snowflake Data Asset. For Amazon S3 Data Assets, you can manually fetch metrics.
To view Data Asset metrics, do the following:
- In GX Cloud, select the relevant Workspace, click Data Assets, and then select a Data Asset in the Data Assets list.
- Click the Metrics tab.
- Optional. Select one of the following options:
  - Click Profile Data if you have not previously fetched all available metrics for the Data Asset.
  - Click Refresh to fetch the latest Data Asset metrics.
Available Data Asset metrics
The following table lists the available Data Asset metrics.
Metric | Description |
---|---|
Row Count | The number of rows in the Data Asset. |
Column | The name of a column in the Data Asset. |
Type | The data type of the column. |
Min | For numeric columns, the lowest value in the column. |
Max | For numeric columns, the highest value in the column. |
Mean | For numeric columns, the average value: the sum of all values in the column divided by the number of values. |
Median | For numeric columns, the middle value when the column's values are sorted. Half of the values are less than or equal to the median, and half are greater than or equal to it. |
Null % | The percentage of missing (null) values in the column. |
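To make the numeric metric definitions concrete, the following sketch computes Row Count, Min, Max, Mean, Median, and Null % for a single column using Python's `statistics` module. It is an illustration of the definitions in the table, not GX Cloud's implementation. In particular, excluding nulls from Min, Max, Mean, and Median while counting them toward Row Count and Null % is an assumption made here. The `column_metrics` helper and the sample values are hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def column_metrics(values: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Compute table-style metrics for one numeric column.

    None stands in for a null value. Nulls count toward the row count and
    Null %, but are excluded from Min, Max, Mean, and Median.
    """
    non_null = [v for v in values if v is not None]
    row_count = len(values)
    null_pct = 100.0 * (row_count - len(non_null)) / row_count if row_count else 0.0
    return {
        "row_count": row_count,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": statistics.mean(non_null) if non_null else None,  # sum / count
        "median": statistics.median(non_null) if non_null else None,  # middle of sorted values
        "null_pct": null_pct,
    }


# Hypothetical column with five rows, one of them null.
metrics = column_metrics([3.0, 1.0, None, 4.0, 2.0])
print(metrics)
```

With four non-null values (1, 2, 3, 4), there is no single middle value, so the median is the average of the two middle values, 2.5. That happens to equal the mean here.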
Delete a Data Asset
- In GX Cloud, select the relevant Workspace and then click Data Assets.
- In the Data Assets list, click Delete Data Asset for the Data Asset you want to remove.
- Review the warning and click Delete to confirm.
View GX Cloud logs
If you encounter an issue performing a GX Cloud task, review log information to troubleshoot the cause and determine a fix.
- In GX Cloud, select the relevant Workspace and then click Logs.
- Click Show log next to a log entry to display additional log details.
- Optional. Click Hide log to close the log details view.