Tips for importing large volumes of data
Is it taking a long time to import data into your Anaplan models? Take a look at these suggestions for working with the data you import. Not all of the suggestions will work in every situation, but they will help you to structure and format your data for a smooth import.
These suggestions apply to models with grids that contain 100 million cells or more.
Setting up the import
- Consider creating an intermediate model — a data hub — to stage the import and reduce any load on the destination model. If you’re not sure how to create a data hub, visit Learning Center and register for the 305: Hub Model Hierarchy Management course, which will help you get started.
- Imports can be scheduled to reduce the impact on users of the source and/or destination model. If you think the import will take longer than five minutes, it’s a good idea to set up data integration and schedule some tasks to run at the most convenient time.
- Create your import in Anaplan so you can reference it for data integration, using the API, Anaplan Connect, or any other ETL (Extract, Transform, Load) tool.
- Large volume data loads should be brought into Anaplan as flat transactional loads and then summarized in a module with dimensions.
- If an import is slow, don’t assume that the data itself is corrupted. Check for special characters in the names of your lists or data elements and remove them where you can. Anaplan treats some special characters as corrupt data and can take considerable time to process them before moving on with the import.
- If you’re importing from one module to another, make sure that both modules have a similar structure. When mapping the import, set the fields to ignore any values that do not need to be imported for faster performance.
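The special-character check above can be run on a source extract before it reaches Anaplan. The following is a minimal sketch, not an Anaplan feature: the function name `find_suspect_names` and the definition of a "safe" character (letters, digits, spaces, underscores, hyphens, and periods) are assumptions made for illustration, so adjust the pattern to suit your own data.

```python
import re

# Assumption: "safe" means letters, digits, underscores, spaces, periods, and hyphens.
# This pattern is illustrative only; it is not a list published by Anaplan.
SAFE_NAME = re.compile(r"[\w .\-]+")

def find_suspect_names(names):
    """Return the names that contain characters outside the assumed safe set."""
    return [name for name in names if not SAFE_NAME.fullmatch(name)]

# Hypothetical list and data element names taken from a source file.
names = ["Region_North", "Product#42", "Cost Centre 7", "Sales\u00a0EMEA", "Q1-Plan"]
suspects = find_suspect_names(names)
print(suspects)  # the "#" and the non-breaking space are flagged
```

Names that are flagged can then be reviewed and cleaned in the source system before the import is run.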
Working with more than 200 million cells? Breaking down the data is key. Ideally, aim to import 100 million cells or fewer at any one time.
- Break the data into smaller components. You can use Time as a basis for breaking the data down, or break hierarchies into smaller sections.
- Importing into a list usually takes longer than importing into a module because Anaplan has to calculate the structure of the module. For data with more than 100,000 records, the fastest option is to load directly into a module. Otherwise, error generation will slow the import significantly.
- Use saved views to limit the data being transferred in an import.
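One way to break a flat extract down by Time is to group the rows by period and pack whole periods into batches that stay under a cell budget. The sketch below assumes the source data is available as a list of rows outside Anaplan; the function name `batch_by_period`, the `period` field, and the `cells_per_row` estimate are all hypothetical.

```python
from collections import defaultdict

def batch_by_period(rows, period_key, max_cells, cells_per_row=1):
    """Group rows by period, then pack whole periods into batches under max_cells.

    A single period that is larger than max_cells still becomes one batch,
    so it would need to be split further (for example, by hierarchy).
    """
    by_period = defaultdict(list)
    for row in rows:
        by_period[row[period_key]].append(row)

    batches, current, current_cells = [], [], 0
    for period in sorted(by_period):
        period_rows = by_period[period]
        period_cells = len(period_rows) * cells_per_row
        # Start a new batch if adding this period would exceed the budget.
        if current and current_cells + period_cells > max_cells:
            batches.append(current)
            current, current_cells = [], 0
        current.extend(period_rows)
        current_cells += period_cells
    if current:
        batches.append(current)
    return batches

# Tiny illustrative extract; a real budget would be closer to 100 million cells.
rows = [
    {"period": "2024-01", "sku": "A", "amount": 10},
    {"period": "2024-01", "sku": "B", "amount": 20},
    {"period": "2024-02", "sku": "A", "amount": 30},
    {"period": "2024-03", "sku": "A", "amount": 40},
]
batches = batch_by_period(rows, "period", max_cells=2)
print([len(batch) for batch in batches])
```

Each batch can then be written to its own file and loaded as a separate import.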
Importing the data
When importing, Anaplan processes every cell presented to it, including summarized or aggregated cells that could be calculated later. Those cells are processed unnecessarily and the import will take much longer to complete. You will also encounter some large error log files, as each cell that is ignored will present an error.
- Hide the parent levels in every hierarchy and save the result as a view. Parent levels in hierarchies don’t contain raw data; their values are aggregated, so they can be ignored at import and reinstated later. Use the Select Levels to Show option to quickly identify and hide the parent levels, leaving you with only child items for the import.
- Hide time-based summaries such as quarters and fiscal years. Attach a dynamic filter to show only months or your most detailed time scale, so that the summary items are hidden.
- Limit the number of versions you are importing; two is optimal. Hide any versions you don’t need to import.
- Clean up the source data to only include the values that you need. Don’t waste time importing data that hasn’t got a location in the destination dataset. Use Views to identify the data you require.
- Use Boolean markers to flag the items you want to import (True) and the ones you don’t (False). Then use a saved view to import only the records with a value of True.
- Use Boolean values to mark the lowest level of data for import and ignore summarized values. The Boolean value will be used as a filter to only show lowest level items in your saved view.
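The Boolean-marker approach can be illustrated on a flat extract outside Anaplan. This is only an analogue of a saved view with a Boolean filter, not the Anaplan feature itself: the function name `mark_for_import`, the `import_flag` field, and the sample items and periods are hypothetical.

```python
def mark_for_import(rows, parent_items, summary_periods):
    """Add a Boolean 'import_flag' that is True only for leaf items in detailed periods."""
    marked = []
    for row in rows:
        is_leaf = row["item"] not in parent_items
        is_detail = row["period"] not in summary_periods
        marked.append({**row, "import_flag": is_leaf and is_detail})
    return marked

# Hypothetical extract that mixes raw data with aggregated rows.
rows = [
    {"item": "Total Company", "period": "Jan 24", "amount": 300},
    {"item": "North", "period": "Jan 24", "amount": 100},
    {"item": "South", "period": "Jan 24", "amount": 200},
    {"item": "North", "period": "Q1 FY24", "amount": 100},
]
marked = mark_for_import(
    rows,
    parent_items={"Total Company"},
    summary_periods={"Q1 FY24"},
)

# Keep only the rows flagged True, as a filtered saved view would.
to_import = [row for row in marked if row["import_flag"]]
print([(row["item"], row["period"]) for row in to_import])
```

In this sample, the parent item and the quarterly summary are excluded, which halves the number of rows presented to the import.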
Keep these recommendations in mind for easier, faster imports.