Create a connection in Anaplan Data Orchestrator to import data from Google Cloud Storage (GCS). Then use the connection to extract data and create a source dataset.
Before you create a connection, you need:
- A GCS bucket hosted on Google Cloud Platform.
- A user with:
- Google Cloud Platform (GCP) service account email.
- GCP private key.
- Bucket name. The service account needs at least read permissions for the bucket and its objects.
- Folder path to a subset of the bucket.
- Access to the Data Orchestrator application.
View the Google Cloud documentation for more information.
Create a connection to Google Cloud Storage
To create a connection:
- Select Data Orchestrator from the top-left navigation menu.
- Select Connections from the left-side panel.
- Select Create connection.
- On the Create connection page, select Google Cloud Storage, and then select Next.
If you don't find the connector, enter a search term in the Find... field.
- On the Connection details page, enter these details and select Next:
- Name: Create a name for your connection. The name can contain alphanumeric characters and underscores.
- Description: Optionally, enter a description about your connection.
- On the Connection credentials page, enter your Google credentials and select Next:
- GCP Service Account Email: Enter your GCP service account email (for example, XXXXXXXXXXX-compute@developer.gserviceaccount.com).
- GCP Private Key: Enter your GCP private key.
For example:
-----BEGIN PRIVATE KEY-----
************************************************************
... (Your private key content in Base64) ...
... multiple lines ...
-----END PRIVATE KEY-----
- Bucket name: The GCS bucket where your data is stored (for example, GCS-EXAMPLE-BUCKET).
- After the connection test is complete, select Done.
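As a rough pre-flight check before you paste values into the form, the field constraints above can be sketched in Python. This is only an illustration: the function name `check_connection_inputs` is hypothetical, and the patterns are assumptions drawn from the examples on this page (ASCII letters, digits, and underscores for the name; an address ending in `gserviceaccount.com`; a PEM block with BEGIN and END lines). Data Orchestrator runs its own validation and connection test.

```python
import re

# Assumption: "alphanumeric characters and underscores" means ASCII letters, digits, and "_".
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
PEM_HEADER = "-----BEGIN PRIVATE KEY-----"
PEM_FOOTER = "-----END PRIVATE KEY-----"


def check_connection_inputs(name: str, email: str, private_key: str) -> list:
    """Return a list of problems found; an empty list means the inputs look plausible."""
    problems = []

    if not NAME_PATTERN.fullmatch(name):
        problems.append("name: use only letters, digits, and underscores")

    # Assumption: service account emails end in gserviceaccount.com, as in the example above.
    if "@" not in email or not email.endswith("gserviceaccount.com"):
        problems.append("email: expected a GCP service account address")

    # The key should be pasted whole, including the BEGIN and END lines.
    key = private_key.strip()
    if not (key.startswith(PEM_HEADER) and key.endswith(PEM_FOOTER)):
        problems.append("private key: expected the full PEM block")

    return problems
```

An empty list from `check_connection_inputs` only means the values have the expected shape. It does not confirm that the key is valid or that the account can read the bucket.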
Extract data from the Google Cloud Storage connection
You can extract data from the Google Cloud Storage connection to add source data to Data Orchestrator. The data extract creates a source dataset.
To extract data:
- Select Data Orchestrator from the top-left navigation menu.
- Select Source data from the left-side panel.
- Select Add data > From connection.
- On the Dataset details page, enter these details and select Next:
- Connection: Select the GCS connection you created.
- Dataset Name: Enter a name for the dataset.
- Description: Optionally, provide a description for the dataset.
- Path Name: Specify the file path pattern (default pattern: /**/*.csv).
- Column Separator: Choose a separator from the options: Tab, Comma, Semicolon, or Other. If you select Other, specify the custom separator.
- Text Delimiter: Specify the character used to enclose text in your CSV (for example, ").
- Header Row: Specify the row number where the header is located (for example, 1).
- First Data Row: Specify the row number where the actual data starts (for example, 2).
- On the Choose an upload type page, enter these details and select Next:
- Select the Load type:
- Full replace: Replaces the entire dataset with new data.
- Append: Adds records from new or updated files to the existing dataset without deleting or updating previous records.
- Incremental: Captures records from new or updated files and updates existing records in the dataset based on a matching primary key and appends records with unmatched primary keys.
- Select the columns to import.
- If you choose Incremental as the load type, a Primary Key (PK) is required. The system uses the PK to identify and update existing records. The Cursor Field is selected automatically and is the last updated date of the files that match the pattern. This ensures that only new or updated files are included in the sync.
- If you choose Append as the load type, the system selects the Cursor Field automatically, so you don't need to set it up. New data is appended on each sync and the dataset stays synchronized.
- Select Create in the confirmation dialog.
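To see how the Column Separator, Text Delimiter, Header Row, and First Data Row settings interact, here is a minimal sketch using Python's standard `csv` module. The function name `read_rows` and the sample file contents are hypothetical, and the sketch assumes row numbers are 1-based and count parsed records. It is not a description of how Data Orchestrator parses files internally.

```python
import csv
import io


def read_rows(text, separator=",", text_delimiter='"', header_row=1, first_data_row=2):
    """Parse CSV text into dictionaries keyed by the header row.

    header_row and first_data_row are 1-based, matching the form fields.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=text_delimiter)
    rows = list(reader)
    header = rows[header_row - 1]
    # Skip blank records; pair each remaining record with the header names.
    return [dict(zip(header, row)) for row in rows[first_data_row - 1:] if row]


# Hypothetical file: a title line, then a semicolon-separated header and data.
sample = 'Exported by finance\nid;name;note\n1;"Smith; J";ok\n2;Lee;"on hold"\n'
records = read_rows(sample, separator=";", header_row=2, first_data_row=3)
print(records[0]["name"])  # Smith; J
```

Because the name is enclosed in the text delimiter, the semicolon inside `"Smith; J"` is kept as data rather than treated as a column break.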
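The three load types described above differ only in how incoming records are combined with the existing dataset. The following sketch illustrates that behavior under simple assumptions: records are dictionaries, the load type names are hypothetical string labels, and the cursor is modeled as a comparable last-updated value per file. The functions `apply_load` and `files_to_sync` are illustrative and are not part of any Anaplan API.

```python
def files_to_sync(files, last_sync):
    """Return paths whose last-updated value is newer than the previous sync.

    files maps a path to a comparable last-updated value (the cursor).
    """
    return [path for path, updated in files.items() if updated > last_sync]


def apply_load(existing, incoming, load_type, primary_key=None):
    """Return the dataset that results from loading incoming into existing."""
    if load_type == "full_replace":
        # Replace the entire dataset with the new data.
        return list(incoming)
    if load_type == "append":
        # Add new records without deleting or updating previous ones.
        return list(existing) + list(incoming)
    if load_type == "incremental":
        if primary_key is None:
            raise ValueError("Incremental loads require a primary key")
        merged = {row[primary_key]: row for row in existing}
        for row in incoming:
            # Matching key: update in place. Unmatched key: append.
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError("Unknown load type: " + str(load_type))


existing = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
incoming = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
print(apply_load(existing, incoming, "incremental", primary_key="id"))
# [{'id': 1, 'qty': 5}, {'id': 2, 'qty': 9}, {'id': 3, 'qty': 1}]
```

With the same inputs, Append would keep both versions of the record with `id` 2, which is why Incremental needs a primary key and Append does not.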