Differences between ER Modeling and Dimensional Modeling
May 24, 2016
Mapplet
Suggestions for Using Mapplets
June 15, 2016
Show all

Basic Performance Tips for ETL/Informatica mappings – Part 1

  • When your source is large, cache lookup table columns for those lookup tables of 500,000 rows or less. This typically improves performance by 10 to 20 percent.
  •  The rule of thumb is not to cache any table over 500,000 rows. This is only true if the standard row byte count is 1,024 or less. If the row byte count is more than 1,024, then the 500k rows will have to be adjusted down as the number of bytes increase (i.e., a 2,048 byte row can drop the cache row count to between 250K and 300K, so the lookup table should not be cached in this case). This is just a general rule though. Try running the session with a large lookup cached and not cached. Caching is often still faster on very large lookup tables.
  • When using a Lookup Table Transformation, improve lookup performance by placing all conditions that use the equality operator = first in the list of conditions under the condition tab.
  • Cache only lookup tables if the number of lookup calls is more than 10 to 20 percent of the lookup table rows. For fewer number of lookup calls, do not cache if the number of lookup table rows is large. For small lookup tables (i.e., less than 5,000 rows), cache for more than 5 to 10 lookup calls.
  • Replace lookup with decode or IIF (for small sets of values).
  • If caching lookups and performance is poor, consider replacing with an unconnected, uncached lookup.
  •  For overly large lookup tables, use dynamic caching along with a persistent cache. Cache the entire table to a persistent file on the first run, enable the update else insert option on the dynamic cache and the engine will never have to go back to the database to read data from this table. You can also partition this persistent cache at run time for further performance gains.
  • Review complex expressions and simplify them using variable ports as much as possible.
  • Examine mappings via Repository Reporting and Dependency Reporting within the mapping.
  • Minimize aggregate function calls when using along with the Lookup Transformation.
  • Replace Aggregate Transformation object with an Expression Transformation object and an Update Strategy Transformation for certain types of Aggregations.
  • Optimize char-varchar comparisons (i.e., trim spaces before comparing).
  • Operators are faster than functions (i.e., || vs. CONCAT).
  • Optimize IIF expressions.
  • Avoid date comparisons in lookup; replace with string.
  • Test expression timing by replacing with constant.
  • Using flat files located on the server machine loads faster than a database located in the server machine.
  • Fixed-width files are faster to load than delimited files because delimited files require extra parsing.
  • If processing intricate transformations, consider loading first to a source flat file into a relational database, which allows the PowerCenter mappings to access the data in an optimized fashion by using filters and custom SQL Selects where appropriate.
  • If working with data that is not able to return sorted data (e.g., Web Logs), consider using the Sorter Advanced External Procedure.
  • Use a Router Transformation to separate data flows instead of multiple Filter Transformations.
  • Use a Sorter Transformation or hash-auto keys partitioning before an Aggregator Transformation to optimize the aggregate. With a Sorter Transformation, the Sorted Ports option can be used, even if the original source cannot be ordered.
  • Use a Normalizer Transformation to pivot rows rather than multiple instances of the same target.
  • Rejected rows from an update strategy are logged to the bad file. Consider filtering before the update strategy if retaining these rows is not critical because logging causes extra overhead on the engine. Choose the option in the update strategy to discard rejected rows.
  • When using a Joiner Transformation, be sure to make the source with the smallest amount of data the Master source.
  • If an update override is necessary in a load, consider using a Lookup transformation just in front of the target to retrieve the primary key. The primary key update will be much faster than the non-indexed lookup override.

Leave a Reply

Your email address will not be published. Required fields are marked *

Share
+1
Tweet
Pin
Share