10.4. Advanced Topics

10.4.1. Deleting RDS data

IMPORTANT: Delete Remote Data Subscription Data Sets with care, and only for testing purposes.

Data Sets exist in three different components: the data is published by the Data Ingestion Engine indexing component, served by the Remote Data Subscription Publisher, and retrieved by the Remote Data Subscription Subscriber. Each of these components provides an option to delete the Data Set from that component’s own data.

Deleting data sets before publishing

When running a pipeline, the pipeline run dialog offers the option Clear before publishing under ‘Remote Data Subscription’. This flag removes all Data Ingestion Engine data for previously published Data Sets of the current pipeline. The data served by the RDS Publisher is left untouched until it is overwritten at the very end of the pipeline run.
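The ordering described above can be pictured with a small toy model. This is only an illustrative sketch and not the product's API: the class `ToyPipeline`, its fields `indexed` and `served`, and the stage names are hypothetical, invented here to show that the flag clears the Data Ingestion Engine data at the start of a run while the Publisher keeps serving the old data until the run finishes.

```python
class ToyPipeline:
    """Hypothetical model of a pipeline run; not a real product API."""

    def __init__(self):
        self.indexed = {}  # stands in for Data Ingestion Engine data
        self.served = {}   # stands in for data served by the RDS Publisher

    def run(self, new_data, clear_before_publishing=False):
        """Yield (stage, indexed, served) snapshots for one pipeline run."""
        if clear_before_publishing:
            # Only the indexing data is cleared; served data is untouched.
            self.indexed.clear()
        yield "start", dict(self.indexed), dict(self.served)

        self.indexed.update(new_data)
        yield "indexed", dict(self.indexed), dict(self.served)

        # Served data is overwritten only at the very end of the run.
        self.served = dict(self.indexed)
        yield "published", dict(self.indexed), dict(self.served)


pipeline = ToyPipeline()
list(pipeline.run({"orders": "v1"}))  # first run publishes v1
stages = list(pipeline.run({"orders": "v2"}, clear_before_publishing=True))
```

In this sketch, the snapshots in `stages` show the old version still being served during the "start" and "indexed" stages of the second run, and replaced only at "published".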

Deleting data sets on the Publisher, after publishing

The Publisher data can be deleted under ‘Published data catalog’ by clicking the ui-trash icon. Deleting a Data Set here will prevent any Subscriber from downloading this Data Set until it is recreated by the Data Ingestion Engine pipeline. It is strongly recommended to execute a full pipeline run with the Clear before publishing flag enabled after deleting a Data Set on the Remote Data Subscription Publisher.

Restriction: A Data Set cannot be deleted while a pipeline is active.

Deleting data sets on the Subscriber, after downloading

The Subscriber data can be deleted under ‘Remote Data Subscriptions’ by clicking the ui-trash icon. This removes all previously downloaded data for the Data Set. The next data retrieval will download the data again to match the data on the Publisher side. If the data is still required by a pipeline, it is strongly recommended to run a ‘Full’ schedule instead of deleting the Data Set.

Deleting a Data Set on the Subscriber will cause the “Import Remote Data Source” component to consider the previous data as ‘removed’ until the next Remote Data Subscription schedule runs.

Restriction: A Data Set cannot be deleted while a Remote Data Subscription schedule is active.
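The Subscriber-side behavior can be summarized with a similar toy model. Again, this is a sketch under stated assumptions rather than the product's implementation: `ToySubscriber`, `run_schedule`, `delete`, and `import_view` are hypothetical names used only to illustrate that a deleted Data Set appears as ‘removed’ to the import step until the next schedule runs, and that deletion is refused while a schedule is active.

```python
class ToySubscriber:
    """Hypothetical model of the Subscriber; not a real product API."""

    def __init__(self, publisher_data):
        self.publisher_data = publisher_data  # data offered by the Publisher
        self.local = {}                       # previously downloaded data
        self.schedule_active = False

    def run_schedule(self):
        """Download everything again so local data matches the Publisher."""
        self.schedule_active = True
        try:
            self.local = dict(self.publisher_data)
        finally:
            self.schedule_active = False

    def delete(self, name):
        """Remove downloaded data, unless a schedule is currently active."""
        if self.schedule_active:
            raise RuntimeError("cannot delete while a schedule is active")
        self.local.pop(name, None)

    def import_view(self, name):
        """What an import step would see for this Data Set."""
        return self.local.get(name, "removed")


subscriber = ToySubscriber({"orders": "v1"})
subscriber.run_schedule()
subscriber.delete("orders")
before = subscriber.import_view("orders")  # seen as removed
subscriber.run_schedule()
after = subscriber.import_view("orders")   # downloaded again
```

The gap between `before` and `after` is the window in which a pipeline would treat the data as removed, which is why running a ‘Full’ schedule is the safer choice when the data is still needed.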