Explain Live Data and Staged Data.
When you create a profile definition, you can select one of two options for the data drill-down: Live Data or Staged Data.
- Live Data: the default drill-down type for any profile definition. Live data reads the rows you drill down on directly from the source.
- Staged Data: all the rows are staged in the profiling warehouse, and drill-down reads from there.
Please note that for some sources, such as mainframes and big data systems, it is not possible to drill down on live data because accessing these systems is expensive, so the default drill-down for these sources is staged data.
Explain Compare Profiles.
Compare Profiles enables us to compare the output from two transformations. It is very useful for providing a before-and-after picture of the data. It can be used with any transformation, mapplet, or source object that has output ports.
What is Join Analysis Profiling?
Join analysis describes the degree of potential joins between two data columns. Use a join profile to analyze column joins in a data source or across multiple data sources.
A join profile displays results as a Venn diagram and as numerical and percentage values. You create and run a join profile from an enterprise discovery profile.
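The counts and percentages behind that Venn diagram can be sketched in a few lines of Python. This is an illustration of the idea only, not Informatica's implementation; the column values are invented:

```python
# Sketch of a join profile: for two key columns, report how many distinct
# values are unique to each side and how many would join.
def join_analysis(left, right):
    l, r = set(left), set(right)
    both = l & r
    return {
        "left_only": len(l - r),          # left circle of the Venn diagram
        "right_only": len(r - l),         # right circle
        "join": len(both),                # overlap
        "join_pct_left": round(100 * len(both) / len(l), 1),
        "join_pct_right": round(100 * len(both) / len(r), 1),
    }

orders = ["C1", "C2", "C3", "C5"]      # e.g. customer ids on an orders table
customers = ["C1", "C2", "C3", "C4"]   # e.g. ids on the customer master
print(join_analysis(orders, customers))
```

A real join profile computes this per row (so duplicate keys also affect the counts), but the distinct-value overlap above is the core of what the Venn diagram shows.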
What is Multiple Profile?
If you have to create profiles on multiple tables at once, use the Multiple Profiles option in the profile wizard window. All the profiles created this way can be given a default prefix and suffix, which are added to each profile name.
To create multiple profiles, select all the objects, right-click, and choose Profile > Multiple Profiles.
Explain Logical Data Objects (LDOs).
Logical Data Objects (LDOs) are virtual mappings that help you integrate data from multiple sources and formats into a standardized view. Consider a scenario where you have to join data from multiple tables and include them in one profile, exclude columns from the profile, filter out records, or rename columns without changing the actual physical object. LDOs are similar to relational views in a database.
In the example below, data is read from two different sources and loaded into the Customer target table after the transformation rules are applied.
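As a rough illustration of what an LDO achieves, the following Python sketch joins two sources, filters records, excludes a column, and renames another, all without modifying the underlying data. The table and column names are invented for the example:

```python
# Two "physical" sources, left untouched by the view below.
crm = [
    {"cust_id": 1, "full_name": "Ann Lee", "ssn": "111-22-3333"},
    {"cust_id": 2, "full_name": "Bob Roy", "ssn": "444-55-6666"},
]
billing = [
    {"cust_id": 1, "balance": 120.0},
    {"cust_id": 2, "balance": 0.0},
]

def customer_view():
    # Join the two sources on cust_id, like an LDO joining multiple tables.
    balances = {b["cust_id"]: b["balance"] for b in billing}
    for c in crm:
        if c["cust_id"] in balances and balances[c["cust_id"]] > 0:  # filter records
            yield {
                "CustomerName": c["full_name"],   # renamed column
                "Balance": balances[c["cust_id"]],
                # "ssn" is excluded from the view entirely
            }

print(list(customer_view()))
```

Like a relational view, the logic lives in the "view" definition, so every consumer (a profile, for instance) sees the standardized shape while the physical objects stay unchanged.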
What is the difference between a classification model and a probabilistic model?
A probabilistic model is used to identify the type of information in a string. When used in a Parser or Labeler transformation, the probabilistic model detects the relevant terms in the input string and disregards terms that are not relevant.
A classification model is used to identify the type of information in a data record. When used in a Classifier transformation, the transformation uses common values to classify the information in each record and determines which of a set of categories or sub-populations the record belongs to.
Probabilistic models are used in the Parser and Labeler transformations.
Classification models are used in the Classifier transformation only.
Explain the Named Entity Recognition (NER) model.
NER is a tool used to label and parse named entities from text, using a statistical approach to analyze data patterns. NER can be applied to various types of text.
In the example below, based on the source data patterns, the model is trained to label and parse the data as per the requirements.
- Using data (decision makers)
- Generating data (users using systems or people in different roles running the business)
- Maintaining data (owners of the business units)
- Providing a mechanism to capture data (IT)
IT and Business should work together as a team to implement best practices and process improvements at the source.
For each probabilistic model, Informatica creates a .ner file on the server.
Explain the Discrete Address Validation Input Template.
The Discrete Address Validation Input Template accepts a single address element per input port. Use it when each source attribute represents a unique address element, such as house/apartment number, street name, city, state, or ZIP code.

| Street Number | Street Name | Street Type | House/Apartment | City | State | Zip |
| --- | --- | --- | --- | --- | --- | --- |
Explain the Address Doctor subscription files.
Address Doctor supplies the subscription files below:
- Batch Interactive (.MD): used in Batch, Interactive, and Certified modes
- Fast Completion (.MD): used in Suggestion List mode
- Certified (xxx5cnn.MD): used in Certified mode (typically multiple files)
- Geocoding (.MD)
- Cameo (.MD)
- Extended/Supplementary (.MD): applies to some countries only

Notes:
- Informatica Address Doctor refers to Suggestion List mode as Fast Completion mode.
- xxx in the file names indicates the ISO country code.
- n indicates a number.
Can we use reference tables in the Case Converter transformation?
Reference tables can be used only when the case conversion type is Title Case or Sentence Case.
When should you use Character execution mode in the Labeler transformation?
Use Character mode if the source data has to be labeled at the character level, as alphabetic, digit, space, or symbol characters.
For example, the Labeler transformation can label the phone number M-999-999-9999 as "XSNNNSNNNSNNNN", where S stands for a symbol, N stands for a numeric character, and X stands for an alphabetic character.
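The character-level labeling described above can be sketched as follows. This is a simplified illustration, not the transformation itself; here spaces fall into the symbol category along with punctuation:

```python
# Label each character: X = alphabetic, N = digit, S = everything else
# (symbols, punctuation, and, in this simplified sketch, spaces).
def label_chars(value):
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("X")
        elif ch.isdigit():
            out.append("N")
        else:
            out.append("S")
    return "".join(out)

print(label_chars("M-999-999-9999"))  # XSNNNSNNNSNNNN
```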
How does the Score output port in the Labeler generate score values?
When the Labeler transformation is configured to parse data against a probabilistic model, it parses the input data against the data available in the model and generates a score based on the similarity between the input data and the patterns defined in the model.
In the example below, for each source record the Labeler has generated a score based on the source data and the defined probabilistic data patterns.
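Informatica does not publish the model's internal scoring, but the idea (a similarity value between the input and the trained patterns) can be illustrated with a simple string-similarity sketch. The training examples are invented:

```python
from difflib import SequenceMatcher

# Hypothetical scoring sketch: the "model" is just a list of trained examples,
# and the score is the best 0-1 similarity between the input and any of them.
def score(value, model_examples):
    return max(SequenceMatcher(None, value, ex).ratio() for ex in model_examples)

trained = ["New York", "Los Angeles", "Chicago"]   # invented training data
print(round(score("New Yrok", trained), 2))        # close to, but below, 1.0
```

An exact match scores 1.0; near-misses such as the transposition above score high but not perfectly, which mirrors the behavior described for the Score port.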
Explain the pattern-based parser data mode in the Parser transformation.
Pattern-based parsing is used to parse patterns made up of multiple strings, when the data from multiple strings has to be parsed apart or sorted. The patterns must be recognizable from the upstream LabelData and TokenizedData ports of a Labeler transformation. For example, if the source contains a multi-string value with SSN, EMPNO, and DateOfBirth, and these values have to be parsed into three separate attributes, each multi-string can be separated based on the system token sets.
The pattern-based parser provides full flexibility to customize each individual pattern output.
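A hypothetical sketch of the routing idea follows: recognize each token by its pattern, then send it to the matching output attribute. The token patterns below are invented for illustration and are not Informatica's system token sets:

```python
import re

# Invented token patterns standing in for system token sets.
TOKEN_PATTERNS = {
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMPNO": re.compile(r"^E\d{5}$"),
    "DOB": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
}

def parse_record(value):
    # Tokenize the multi-string value, then route each token to the
    # first output attribute whose pattern it matches.
    out = {}
    for token in value.split():
        for name, pattern in TOKEN_PATTERNS.items():
            if pattern.match(token):
                out[name] = token
                break
    return out

print(parse_record("123-45-6789 E00042 01/02/1990"))
```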
How do you choose the right group key?
A group key improves the performance of the Match transformation, and Informatica recommends passing a group key to the Match transformation.
The Match transformation can be configured without the group key from an upstream Key Generator transformation.
Choosing the right group key will improve the matching success rate and is mostly useful in classic match. Identity match can be configured without a group key, because identity matching generates its own match groups (or keys) within which records are compared to find matches. These groups are based on the population used and the input field selected as the Key Field on the Match Type tab.
Follow the steps below to select the right group key:
- Know the data: profile the data to ensure the key attributes are unique, complete, and contain accurate data.
- Pre-processing: use standardization to remove noise words, symbols, and punctuation, and apply other cleansing rules based on the profiling output. Standardize the data in such a way that when the strategies are applied in the Key Generator, group sizes do not exceed 5000 records.
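The group-size check from the steps above can be sketched as follows. The key strategy here (first three letters of the surname) is just an invented example; real Key Generator strategies are configured in the tool:

```python
from collections import Counter

# Count how many records fall into each candidate group key.
def group_sizes(records, key_fn):
    return Counter(key_fn(r) for r in records)

surnames = ["Smith", "Smythe", "Smitt", "Jones", "Johnson"]
sizes = group_sizes(surnames, lambda s: s[:3].upper())

# Flag any group larger than the recommended ~5000 records; the quadratic
# cost of matching within a group is why oversized groups hurt performance.
oversized = {key: n for key, n in sizes.items() if n > 5000}
print(sizes, oversized)
```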
Is SequenceId mandatory for the match process?
Yes, SequenceId is mandatory for the match process. Every record in an input data set must include a unique sequence identifier. If a data set contains duplicate sequence identifiers, the Match transformation cannot identify duplicate records correctly. Use the Key Generator transformation to create unique identifiers if none exist in the data.
When you create an index data store for identity data, the Match transformation adds the sequence identifier for each record to the data store. When you configure the transformation to compare a data source with the index data store, the transformation might find a common sequence identifier in both data sets. The transformation can analyze the sequence identifiers if they are unique in the respective data sets.
Note: the Match mapping will be invalid if the SequenceId port from the upstream mapping is not connected.
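The uniqueness requirement can be illustrated with a small sketch: detect duplicate sequence identifiers and, if any exist, assign fresh surrogate ids, which is the service a Key Generator transformation provides. The field name and data are invented for the example:

```python
# Find sequence identifiers that appear more than once in the data set.
def find_duplicate_ids(records, id_field="SequenceId"):
    seen, dupes = set(), []
    for r in records:
        rid = r[id_field]
        if rid in seen:
            dupes.append(rid)
        seen.add(rid)
    return dupes

# Replace the ids with guaranteed-unique surrogates, as a Key Generator would.
def assign_surrogate_ids(records, id_field="SequenceId"):
    for i, r in enumerate(records, start=1):
        r[id_field] = i
    return records

rows = [{"SequenceId": 10}, {"SequenceId": 11}, {"SequenceId": 10}]
print(find_duplicate_ids(rows))  # [10]
```

With duplicate ids present, a match engine cannot tell two distinct records apart from one record seen twice, which is why the check above must pass before matching.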