Create a connection in Anaplan Data Orchestrator to import data from Azure Blob Storage. Then use the connection to extract data and create a source dataset.
Prerequisites
Before you create a connection, you need:
- A storage account hosted on Azure Blob Storage.
- A user with the Azure Blob Storage account name, container name, and endpoint URL (optional), with at least read permissions for the container and its blobs.
- An authentication method. You can use either:
  - Storage Account Key: An Azure secret key is required to connect with this method.
  - OAuth: A client ID, client secret, Azure tenant ID, and refresh token are required to connect with this method.
- A folder path to a subset of the container.
CSV files loaded into Data Orchestrator must be UTF-8 encoded.
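A quick way to confirm the encoding before you load a file is to try decoding it as UTF-8. Here's a minimal sketch in Python; the file name is a placeholder for illustration.

```python
# Minimal sketch: verify that a CSV file decodes cleanly as UTF-8
# before loading it into Data Orchestrator.
def is_utf8(path: str) -> bool:
    try:
        with open(path, "rb") as f:
            f.read().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Placeholder file name from the example later in this article.
print(is_utf8("SALES_wk01.CSV"))  # True if the file is valid UTF-8
```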
Create a connection to Azure Blob Storage
You need your Azure Blob Storage credentials to connect Azure Blob Storage to Data Orchestrator. See the Azure Blob Storage documentation for more information about your credentials.
To create a connection:
- Select Data Orchestrator from the top-left navigation menu.
- Select Connections from the left-side panel.
- Select Create connection.
- Select Azure Blob Storage and then select Next.
If you can't find the connector, enter a search term in the Find... field.
- On the Connection details page, enter these details and select Next:
- Name: Create a name for your connection. The name can contain alphanumeric characters and underscores.
- Description: Enter a description about your connection.
- On the Connection credentials page, enter your Azure Blob credentials and select Next:
For information about the fields on this page, see Authentication options for Azure Blob Storage connections.
- After the connection test is complete, select Done.
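For orientation, this sketch shows how the two authentication options map onto client construction with Microsoft's azure-storage-blob and azure-identity Python packages. The account name, container name, and secrets are placeholders, and the client-credentials flow shown for OAuth is a simplified stand-in for the refresh-token flow the connector uses.

```python
# Sketch of the two authentication options, using placeholders throughout.
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

ACCOUNT_URL = "https://<storage-account-name>.blob.core.windows.net"

# Option 1: Storage Account Key -- pass the key directly as the credential.
key_client = BlobServiceClient(
    account_url=ACCOUNT_URL,
    credential="<azure-secret-key>",
)

# Option 2: OAuth -- exchange tenant ID, client ID, and client secret for
# a token. (Data Orchestrator also uses a refresh token; this simplified
# flow is for illustration only.)
oauth_client = BlobServiceClient(
    account_url=ACCOUNT_URL,
    credential=ClientSecretCredential(
        tenant_id="<azure-tenant-id>",
        client_id="<client-id>",
        client_secret="<client-secret>",
    ),
)

# Read-only smoke test, analogous to the connection test: list a blob
# from the container to confirm read access.
container = key_client.get_container_client("<container-name>")
for blob in container.list_blobs():
    print(blob.name)
    break
```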
Extract data from the Azure Blob Storage connection
You can extract data from the Azure Blob Storage connection to add source data to Data Orchestrator. The data extract creates a source dataset.
To extract data:
- Select Data Orchestrator from the top-left navigation menu.
- Select Source data from the left-side panel.
- Select Add data > From connection.
- On the Dataset details page, enter these details and select Next:
- Connection: Select your existing connection.
- Dataset name: Enter a name for the dataset.
- Description: Optionally, provide a description for the dataset.
- Path name: Specify the file path pattern (default: /**/*.csv). For more information, see the Path name for Azure Blob data extracts section below.
- Column separator: Choose a separator from the options: Tab, Comma, Semicolon, or Other. If you select Other, specify the custom separator.
- Text Delimiter: Specify the character used to enclose text in your CSV (for example, ").
- Header Row: Specify the row number where the header is located (for example, 1).
- First Data Row: Specify the row number where the data starts (for example, 2). The sketch after these steps illustrates how these parse settings apply.
- On the Choose an upload type page, enter these details and select Next:
- Select the Load type (illustrated in the sketch after these steps):
  - Full replace: Completely replaces the currently loaded data with the new data.
  - Append: Adds the new data to the end of the current table.
  - Incremental: Updates the previously loaded data with new or changed records.
- Select the columns to import.
- If you choose Incremental as the load type, a Primary Key (PK) is required. The system uses the PK to identify and update existing records. The Cursor Field is selected automatically and is the last updated date of the files that match the pattern. This ensures that only new or updated files are included in the sync.
- If you choose Append as the load type, the system selects the Cursor Field automatically, so you don't need to set it up. This enables seamless appending of new data and maintains synchronization.
- Select Create in the confirmation dialog.
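To make the parse settings and load types concrete, here's a sketch in Python with pandas. The read_csv arguments mirror the Column separator, Text Delimiter, Header Row, and First Data Row fields, and the three DataFrame operations approximate the load-type semantics. File and column names are placeholders, and the incremental merge is an illustration keyed on a primary key, not Data Orchestrator's internal implementation.

```python
import pandas as pd

# Dataset details: Comma separator, '"' text delimiter, header on row 1,
# data starting on row 2 (pandas rows are zero-based, hence header=0).
new_data = pd.read_csv(
    "SALES_wk02.CSV",   # placeholder file name
    sep=",",            # Column separator
    quotechar='"',      # Text Delimiter
    header=0,           # Header Row 1
    skiprows=0,         # First Data Row 2: no rows skipped between header and data
)

existing = pd.read_csv("previously_loaded.csv", sep=",", quotechar='"', header=0)

# Full replace: the new data becomes the whole table.
full_replace = new_data

# Append: new rows are added to the end of the current table.
append = pd.concat([existing, new_data], ignore_index=True)

# Incremental: update rows whose primary key already exists, insert the rest.
PK = "order_id"  # hypothetical primary key column
incremental = (
    pd.concat([existing, new_data])
    .drop_duplicates(subset=PK, keep="last")
    .reset_index(drop=True)
)
```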
Path name for Azure Blob data extracts
When you extract data from an Azure Blob Storage connection, you're asked to enter the Path name. If the container includes files that share a file name pattern, you can enter a wildcard pattern such as *.CSV to upload all files that match it.
Example
Your container is called SALES_DATA, and it contains files called SALES_wk01.CSV, SALES_wk02.CSV, and SALES_wk03.CSV. If you enter SALES_*.CSV for the Path name, all three files are uploaded to Data Orchestrator.
If you later add more files with the same file name pattern to your container, you can sync the data to upload the new files.
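Path name patterns behave like glob-style wildcards. The sketch below uses Python's fnmatch module to show which of the example file names the pattern from above selects; the exact matching rules (such as case sensitivity) may differ in Data Orchestrator.

```python
# Illustrative glob-style matching against the example file names.
import fnmatch

blobs = ["SALES_wk01.CSV", "SALES_wk02.CSV", "SALES_wk03.CSV", "notes.txt"]

pattern = "SALES_*.CSV"
matched = [name for name in blobs if fnmatch.fnmatch(name, pattern)]
print(matched)  # ['SALES_wk01.CSV', 'SALES_wk02.CSV', 'SALES_wk03.CSV']
```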