Informatica Big Data Management (BDM)

1 - About

BDM is a graphical language for developing jobs in a Hadoop environment.

BDM is delivered as a set of application services available on a domain (see 3.2 - Services).

3 - Components

Big Data Management can connect to Hadoop as a data source and push job processing to the Hadoop cluster.

If you run the mapping in a Hadoop environment, the mapping may run on one of several engines.

The Data Integration Service:

  • determines the best engine to run the mapping,
  • then pushes the mapping and profiling jobs to the Hadoop cluster.

BDM connections:

  • HBase (a NoSQL key-value database on Hadoop that performs operations in real time)
  • Hadoop
  • HDFS
  • Hive
  • JDBC

YARN ? (Hadoop cluster manager, job templates and executions)

Sqoop
The Model Repository Service uses JDBC to import metadata. The Data Integration Service runs the mapping in the Hadoop run-time environment and pushes the job processing to Sqoop. Sqoop then creates MapReduce jobs in the Hadoop cluster, which perform the import and export jobs in parallel.
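
To give an idea of what a parallel Sqoop import looks like on the command line, here is a minimal Python sketch that only builds the argument list of a standalone `sqoop import` call. It is not how BDM invokes Sqoop internally. The function name `build_sqoop_import_args`, the JDBC URL, the table name and the target directory are hypothetical and chosen for illustration; the flags `--connect`, `--table`, `--target-dir` and `--num-mappers` are standard Sqoop import arguments.

```python
from typing import List


def build_sqoop_import_args(
    jdbc_url: str,
    table: str,
    target_dir: str,
    num_mappers: int = 4,
) -> List[str]:
    """Build the argument list for a standalone `sqoop import` call.

    num_mappers is the number of parallel map tasks Sqoop is asked to use
    for the import.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string of the source database
        "--table", table,               # source table to import
        "--target-dir", target_dir,     # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    args = build_sqoop_import_args(
        "jdbc:mysql://db.example.com/sales", "orders", "/data/orders", 8
    )
    print(" ".join(args))
```

The list can be passed to `subprocess.run` on a machine where the Sqoop client is installed. Raising `--num-mappers` increases the number of parallel map tasks, which is the parallelism mentioned above.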

3.1 - Client

  • Informatica Developer (the Developer tool): create and run profiles against big data sources, and run mappings and workflows.
  • Informatica Administrator (the Administrator tool): monitor the status of profile, mapping, and MDM Big Data Relationship Management jobs on the Monitoring tab.
  • Informatica Analyst (the Analyst tool): create and run profiles on big data sources.
  • infacmd
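
As an illustration of the command-line client, the following Python sketch builds the argument list for running a deployed mapping with `infacmd ms RunMapping`. It is a sketch under assumptions, not a reference: the executable name `infacmd.sh` assumes a Unix installation, the helper name `build_run_mapping_args` and every value (domain, service, user, application, mapping) are hypothetical, and the short options used (`-dn`, `-sn`, `-un`, `-pd`, `-a`, `-m`) should be checked against the command reference of your version.

```python
from typing import List


def build_run_mapping_args(
    domain: str,
    service: str,
    user: str,
    password: str,
    application: str,
    mapping: str,
) -> List[str]:
    """Build the argument list for `infacmd ms RunMapping`.

    service is the name of the Data Integration Service that runs the
    mapping; application is the deployed application containing it.
    """
    return [
        "infacmd.sh", "ms", "RunMapping",  # assumed Unix launcher name
        "-dn", domain,        # domain name
        "-sn", service,       # Data Integration Service name
        "-un", user,          # user name
        "-pd", password,      # password
        "-a", application,    # deployed application name
        "-m", mapping,        # mapping name
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    args = build_run_mapping_args(
        "Domain_dev", "DIS_dev", "etl_user", "secret", "app_sales", "m_load_orders"
    )
    # Mask the value that follows -pd before printing the command.
    pd_index = args.index("-pd") + 1
    printable = args[:pd_index] + ["****"] + args[pd_index + 1:]
    print(" ".join(printable))
```

Building the command as a list rather than a single string avoids shell quoting problems when the list is handed to `subprocess.run`.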

3.2 - Services

  • Analyst Service: The Analyst Service runs the Analyst tool in the Informatica domain. The Analyst Service manages the connections between service components and the users that have access to the Analyst tool.
  • Data Integration Service: The Data Integration Service can process mappings in the native environment or push the mapping for processing to the Hadoop cluster in the Hadoop environment. The Data Integration Service also retrieves metadata from the Model repository when you run a Developer tool mapping or workflow. The Analyst tool and Developer tool connect to the Data Integration Service to run profile jobs and store profile results in the profiling warehouse.
  • Model Repository Service: The Model Repository Service manages the Model repository. The Model Repository Service connects to the Model repository when you run a mapping, mapping specification, profile, or workflow.

3.2.1 - Repository

  • Model repository: The Model repository stores profiles, data domains, mappings, and workflows that you manage in the Developer tool. The Model repository also stores profiles, data domains, and mapping specifications that you manage in the Analyst tool.
  • Profiling warehouse: The Data Integration Service runs profiles and stores profile results in the profiling warehouse.

3.2.2 - Grammar (Function)

See the user guide (Appendix C: Function Reference).

4 - Documentation

dit/powercenter/bdm.txt · Last modified: 2018/01/17 17:20 by gerardnico