Enterprise Information Catalog (EIC) Installation

1 - About

Enterprise Information Catalog (EIC) uses the Catalog Service and other application services to bring together configured data assets in an enterprise and present a comprehensive view of the data assets and data asset relationships.

This installation occurs:

3 - Minimal topology information

3.1 - Cluster

You can deploy Enterprise Information Catalog either:

  • on an internal Hadoop cluster (Hortonworks, packaged with the installer) on the same machine
  • or on an external Hadoop cluster running Cloudera, Hortonworks, or Azure HDInsight

3.1.1 - Internal

  • The Enterprise Information Catalog installer creates an Informatica Cluster Service as an ISP service.
  • Enterprise Information Catalog uses Apache Ambari to manage and monitor the internal Hadoop cluster.

3.1.2 - External

Preparation:

Create the following directories in HDFS before creating the Catalog Service:

/Informatica/LDM/<ServiceClusterName>
/user/<username>

where:

  • <username> : the Informatica domain user
  • <ServiceClusterName> : the name of the service cluster (you enter it when you create the Catalog Service)

Make <username> the owner of the /Informatica/LDM/<ServiceClusterName> and /user/<username> directories.
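
The preparation above can be sketched as a small shell helper. This is only an illustration: the function name hdfs_prep_commands and the sample values infa_user and MyServiceCluster are hypothetical, and the helper prints the hadoop fs commands instead of running them so that they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) the HDFS commands that create the two directories
# and hand them over to the Informatica domain user.
# $1 = Informatica domain user, $2 = service cluster name (both hypothetical here).
hdfs_prep_commands() {
    user=$1
    cluster=$2
    for dir in "/Informatica/LDM/${cluster}" "/user/${user}"; do
        echo "hadoop fs -mkdir -p ${dir}"
        echo "hadoop fs -chown ${user} ${dir}"
    done
}

hdfs_prep_commands infa_user MyServiceCluster
```

Once the printed commands look right, they can be piped to sh on a cluster node, as a user that is allowed to change ownership in HDFS.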

3.2 - Services

3.3 - Domain

  • The Informatica domain is the administrative unit.
  • Enterprise Information Catalog requires a dedicated domain.
  • EIC is installed within the Informatica domain.

3.4 - Data Set

  • Data set sizes are Small, Medium, Large, Default, and Demo (configured in Informatica Administrator using custom properties). You cannot change the data set size if you selected the Demo size, or if the new data set size is smaller.
  • Data set sizes are classified based on the amount of metadata to process and the number of nodes used to process metadata.

3.5 - Client

Enterprise Information Catalog contains the following client applications:

  • Informatica Administrator
  • Informatica Catalog Administrator
  • Enterprise Information Catalog search tool

3.6 - Repository

The repository types are based on the type of data and metadata that each repository stores.

  • Domain configuration repository: A relational database that stores domain configuration and user information.
  • Model repository: A relational database that stores metadata created by Enterprise Information Catalog and application services to enable collaboration between the clients and services. Model repository also stores the resource configuration and data domain information.
  • Profiling warehouse: A relational database that stores profile results. Profile statistics form one part of the comprehensive metadata view that Enterprise Information Catalog provides.
  • Reference data warehouse: A relational database that stores data values for the reference table objects that you define in the Model repository. When you add data to a reference table, the Content Management Service writes the data values to a table in the reference data warehouse.

4 - Prerequisites

4.1 - SQL Server

  • The database
CREATE DATABASE EIC;
ALTER DATABASE EIC SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE EIC SET READ_COMMITTED_SNAPSHOT ON;
USE EIC;
  • the user account name for the domain configuration repository
-- the database user account name for the domain configuration repository
CREATE LOGIN eic_dom WITH PASSWORD = 'pwd', DEFAULT_DATABASE = EIC;
CREATE USER eic_dom FOR LOGIN eic_dom;
CREATE SCHEMA eic_dom AUTHORIZATION eic_dom;
ALTER USER eic_dom WITH DEFAULT_SCHEMA = eic_dom;
 
-- Permission
EXEC sp_addrolemember 'db_ddladmin', 'eic_dom';
EXEC sp_addrolemember 'db_datawriter', 'eic_dom';
EXEC sp_addrolemember 'db_datareader', 'eic_dom';
 
-- Explicit grants (possibly redundant with the role memberships above)
GRANT CONNECT TO eic_dom;
GRANT CREATE TABLE TO eic_dom;
GRANT CREATE VIEW TO eic_dom;
  • The mrs database account name
CREATE LOGIN eic_mrs WITH PASSWORD = 'pwd', DEFAULT_DATABASE = EIC;
CREATE USER eic_mrs FOR LOGIN eic_mrs;
CREATE SCHEMA eic_mrs AUTHORIZATION eic_mrs;
GRANT CONNECT TO eic_mrs;
GRANT CREATE TABLE TO eic_mrs;
GRANT CREATE VIEW TO eic_mrs;
ALTER USER eic_mrs WITH DEFAULT_SCHEMA = eic_mrs;
EXEC sp_addrolemember 'db_ddladmin', 'eic_mrs';
EXEC sp_addrolemember 'db_datawriter', 'eic_mrs';
EXEC sp_addrolemember 'db_datareader', 'eic_mrs';
  • The profiling warehouse (pwh) database account name
CREATE LOGIN eic_pwh WITH PASSWORD = 'pwd', DEFAULT_DATABASE = EIC;
CREATE USER eic_pwh FOR LOGIN eic_pwh;
CREATE SCHEMA eic_pwh AUTHORIZATION eic_pwh;
GRANT CONNECT TO eic_pwh;
GRANT CREATE TABLE TO eic_pwh;
GRANT CREATE VIEW TO eic_pwh;
ALTER USER eic_pwh WITH DEFAULT_SCHEMA = eic_pwh;
EXEC sp_addrolemember 'db_ddladmin', 'eic_pwh';
EXEC sp_addrolemember 'db_datawriter', 'eic_pwh';
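EXEC sp_addrolemember 'db_datareader', 'eic_pwh';
  • The reference data warehouse account (eic_cms, referenced by CMS_DB_UNAME in the silent installation properties) is not created above. Create it the same way as eic_pwh.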

4.2 - Hardware Minimum

For a simple topology (p25 and 28 of the installation doc):

The minimum system requirements for the Informatica Domain and Hadoop cluster on the same machine:

  • Disk Space: 75 GB
  • Memory (RAM): 32 GB
  • Number of CPU cores: 16

The minimum system requirements for the Informatica domain if the Hadoop cluster is not on the Informatica domain machine:

  • Disk Space: 40 GB
  • Memory (RAM): 16 GB
  • Number of CPU cores: 8

Temp: 8 GB of temporary disk space.
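
The two sets of minimums above can be expressed as a small check. The sketch below is an assumption-laden illustration, not part of the Informatica tooling: the function name meets_minimum and its argument order are hypothetical, and the sample figures are the Standard_F8s values quoted on this page (8 cores, 16 GB, 128 GB).

```shell
#!/bin/sh
# meets_minimum <cores> <ram_gb> <disk_gb> <hadoop_on_same_machine: yes|no>
# Succeeds when the machine meets the minimums listed above.
meets_minimum() {
    cores=$1
    ram_gb=$2
    disk_gb=$3
    if [ "$4" = "yes" ]; then
        # Informatica domain and Hadoop cluster on the same machine
        min_cores=16; min_ram=32; min_disk=75
    else
        # Informatica domain only
        min_cores=8; min_ram=16; min_disk=40
    fi
    [ "$cores" -ge "$min_cores" ] &&
        [ "$ram_gb" -ge "$min_ram" ] &&
        [ "$disk_gb" -ge "$min_disk" ]
}

if meets_minimum 8 16 128 no; then
    echo "large enough for the Informatica domain alone"
fi
if ! meets_minimum 8 16 128 yes; then
    echo "too small to host the Hadoop cluster as well"
fi
```

On a real host the three figures could come from nproc, /proc/meminfo and df. Note that the kernel usually reports slightly less memory than the nominal size, so a strict comparison may need rounding.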

Example on Azure with a machine size of Standard_F8s (8 cores, 16 GB memory, 128 GB disk, 284 Euro/month):

az.cmd vm create ^
    --resource-group myGroup ^
    --name HI-INFA-EIC-01 ^
    --image RedHat:RHEL:7.3:latest ^
    --size Standard_F8s ^
    --authentication-type password ^
    --admin-username hi-adm ^
    --admin-password BC896a9fc39a! ^
    --location westeurope

4.3 - HDFS

Create the directory /Informatica/LDM/<ServiceClusterName>

If you do not specify a service cluster name, Enterprise Information Catalog considers DomainName_CatalogServiceName as the default value. You must then have the /Informatica/LDM/<DomainName>_<CatalogServiceName> directory in HDFS.
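
The defaulting rule above can be sketched as a shell function. The function name ldm_dir and the sample cluster name MyCluster are hypothetical. The domain and Catalog Service names are the ones used on this page.

```shell
#!/bin/sh
# Resolve the HDFS directory that the Catalog Service expects.
# $1 = domain name, $2 = Catalog Service name, $3 = optional service cluster name.
# Without $3, the name defaults to <DomainName>_<CatalogServiceName>.
ldm_dir() {
    domain=$1
    catalog_service=$2
    cluster=${3:-${domain}_${catalog_service}}
    echo "/Informatica/LDM/${cluster}"
}

ldm_dir DOMAIN_EIC_01 CS_EIC_01            # prints /Informatica/LDM/DOMAIN_EIC_01_CS_EIC_01
ldm_dir DOMAIN_EIC_01 CS_EIC_01 MyCluster  # prints /Informatica/LDM/MyCluster
```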

Create the directory

/Informatica/LDM/DOMAIN_EIC_01_CS_EIC_01

where:

  • DOMAIN_EIC_01 will be the DomainName
  • and CS_EIC_01 will be the catalog service name
hadoop fs -mkdir /Informatica
hadoop fs -mkdir /Informatica/LDM
hadoop fs -mkdir /Informatica/LDM/DOMAIN_EIC_01_CS_EIC_01
hadoop fs -chmod -R 777 /Informatica/LDM/DOMAIN_EIC_01_CS_EIC_01

5 - Installation

5.1 - Installation

  • Installer: the Enterprise Information Catalog installer installs the services.
  • When you install the Enterprise Information Catalog services on a machine, you install all the files for all services.
  • The first time you run the installer, you must create the domain. During the installation on the additional machines, you create worker nodes that you join to the domain.

5.1.1 - As machine admin

mkdir /tmp/infa
cd /tmp/infa
wget https://containerName.blob.core.windows.net/install/informatica_1020_server_linux-x64.tar
# Extract with sudo, otherwise you don't have the permissions
sudo tar -xvf informatica_1020_server_linux-x64.tar
 
# Other installation files
wget https://containerName.blob.core.windows.net/install/ScannerBinaries.zip
mv ScannerBinaries.zip /tmp/infa/source
 
# Installation user
sudo useradd powercenter
sudo passwd powercenter
 
sudo chown -R powercenter:powercenter .
 
# place the key in the home (create the target directory first)
sudo -u powercenter mkdir -p /home/powercenter/informatica
sudo mv license.key /home/powercenter/informatica/license.key
sudo chown powercenter:powercenter /home/powercenter/informatica/license.key
 
# resource limits: add the two lines below to /etc/security/limits.conf
sudo vi  /etc/security/limits.conf
powercenter    hard   nofile    32000
powercenter    soft   nofile    3000

5.1.1.1 - Firewall

Open the ports in both the Azure network security group and firewalld:

  • Azure Firewall
az.cmd network nsg rule create ^
    --resource-group myGroup ^
    --nsg-name INFA-BDM-01NSG ^
    --name allow-infa-admin-hi ^
    --protocol tcp ^
    --priority 1021 ^
    --destination-port-range 6008 ^
    --source-address-prefixes publicIP
az.cmd network nsg rule create ^
    --resource-group myGroup ^
    --nsg-name INFA-BDM-01NSG ^
    --name allow-infa-domain-hi ^
    --protocol tcp ^
    --priority 1022 ^
    --destination-port-range 6005 ^
    --source-address-prefixes publicIP
  • As root, Red Hat Firewall
sudo firewall-cmd --zone=public --add-port=8443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6005-6009/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6014-6114/tcp --permanent
sudo firewall-cmd --reload

5.1.2 - As powercenter

The install.sh and silentinstall.sh programs are wrappers around the real installer (install.bin). They validate the environment before they launch it.

  • install.sh runs the command below. To create a response file, add the -r option (see the install.bin annex).
./Server/install.bin -DINSTALL_MODE=CONSOLE -DINSTALL_TYPE=0
  • silentinstall.sh runs the command below:
./Server/install.bin -i silent -DINSTALL_MODE=SILENT

A silent installation:

cd /tmp/infa
 
# installation directory, including the keystore directory
mkdir -p ~/informatica/10.2/isp/config/keys
 
 
cp SilentInput.properties SilentInputBackup.properties

Example of the properties modified in SilentInput.properties for an HDInsight cluster (the diff was generated with WinMerge, Tools > Generate Patch):

LICENSE_KEY_LOC=/home/powercenter/informatica/license.key
USER_INSTALL_DIR=/home/powercenter/informatica/10.2
INSTALL_LDM=1
ACCEPT_ORACLE_LICENSE=1
HTTPS_ENABLED=1
KEY_DEST_LOCATION=/home/powercenter/informatica/10.2/isp/config/keys
PASS_PHRASE_PASSWD=pwd
SSL_ENABLED=true
DB_TYPE=MSSQLServer
DB_UNAME=eic_dom
DB_PASSWD=pwd
SQLSERVER_SCHEMA_NAME=eic_dom
DB_SERVICENAME=EIC
DB_ADDRESS=msft-db-01:1433
DOMAIN_NAME=DOMAIN_EIC_01
DOMAIN_HOST_NAME=INFA-EIC-01
NODE_NAME=NodeEic01
DOMAIN_USER=Administrator
DOMAIN_PSSWD=pwd
DOMAIN_CNFRM_PSSWD=pwd
CREATE_SERVICES=1
MRS_DB_TYPE=MSSQLServer
MRS_DB_UNAME=eic_mrs
MRS_DB_PASSWD=pwd
MRS_SQLSERVER_SCHEMA_NAME=eic_mrs
MRS_DB_SERVICENAME=EIC
MRS_DB_ADDRESS=msft-db-01:1433
MRS_SERVICE_NAME=MRS_EIC_01
DIS_SERVICE_NAME=DIS_EIC_01
DIS_PROTOCOL_TYPE=both
DIS_HTTP_PORT=8095
DIS_HTTPS_PORT=8096
ASSOCIATE_PROFILE_CONNECTION=1
PWH_DB_TYPE=SQLServer
PWH_DB_UNAME=eic_pwh
PWH_DB_PASSWD=pwd
PWH_SQLSERVER_SCHEMA_NAME=eic_pwh
PWH_DB_SERVICENAME=EIC
PWH_DB_ADDRESS=msft-db-01:1433
LOAD_DATA_DOMAIN=1
CMS_SERVICE_NAME=CMS_EIC_01
CMS_PROTOCOL_TYPE=https
CMS_HTTPS_PORT=8106
CMS_DB_TYPE=SQLServer
CMS_DB_UNAME=eic_cms
CMS_DB_PASSWD=pwd
CMS_SQLSERVER_SCHEMA_NAME=eic_cms
CMS_DB_SERVICENAME=EIC
CMS_DB_ADDRESS=msft-db-01:1433
CLUSTER_HADOOP_DISTRIBUTION_TYPE=HortonWorks
IS_CLUSTER_SSL_ENABLE=true
CATALOGUE_SERVICE_NAME=CS_EIC_01
CATALOGUE_SERVICE_TLS_HTTPS_PORT=8086
CLUSTER_HADOOP_DISTRIBUTION_URL=https://clus-spark-01.azurehdinsight.net
CLUSTER_HADOOP_DISTRIBUTION_URL_USER=adm
CLUSTER_HADOOP_DISTRIBUTION_URL_PASSWD=pwd
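
Because a silent installation gives little feedback on a forgotten property, it can help to check the file before launching the installer. The sketch below is a hypothetical helper (missing_keys is not an Informatica tool) that lists the keys that are absent or empty. The key names passed to it are taken from the listing above, and which of them are truly mandatory is an assumption.

```shell
#!/bin/sh
# Print every key from the argument list that is absent or empty in the file.
# $1 = properties file, remaining arguments = keys to check.
missing_keys() {
    file=$1
    shift
    for key in "$@"; do
        # A key counts as set only if at least one character follows the '='
        grep -Eq "^${key}=.+" "$file" || echo "$key"
    done
}

if [ -f SilentInput.properties ]; then
    missing_keys SilentInput.properties \
        LICENSE_KEY_LOC USER_INSTALL_DIR DB_TYPE DB_UNAME DB_ADDRESS \
        DOMAIN_NAME NODE_NAME CATALOGUE_SERVICE_NAME
fi
```

An empty output means that every listed key has a value. It does not validate the values themselves.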

5.1.2.1 - Check

By starting the installer in console mode, you can check the prerequisites:

./install.sh
******************************************************************************************************
System Check Summary - Step 4 of 4
******************************************************************************************************
[ Type 'back' to go to the previous panel or 'help' to check the help contents for this panel or 'quit' to cancel the installation at any time. ]

Informatica Pre-Installation (i10PreInstallChecker) System Check Tool Results
[Pass] Disk Space: Available disk space is 15,548 MB. Sufficient for the Informatica installation.
[Pass] Processors: Available number of processors is 2. Sufficient for the Informatica installation.
[Pass] Physical Memory: Available physical memory is 16,416 MB. Sufficient for the Informatica installation.
[Pass] Temporary Space: Available temporary disk space is 15,548 MB. Sufficient for the Informatica installation.
[Pass] Ports: Port range is 6,005 - 6,009. All port numbers within the port range are available for the Informatica installation.
[Pass] Locale Environment Variable: The LANG environment variable is set to language en_US.UTF-8. The LC_ALL environment variable is set to language null. Sufficient for the Informatica installation.
[Pass] JRE_HOME Environment Variable: The JRE_HOME environment variable does not contain a value. Sufficient for the Informatica installation.
[Pass] File Descriptor Limits: The file descriptor limits per process is 32000. Sufficient for the Informatica installation.
[Pass] CREATE TABLE Privilege: The database user account has the CREATE TABLE privilege. The installer successfully created a database table.
[Pass] INSERT RECORD Privilage: The installer successfully inserted a record into database table.
[Pass] DELETE RECORD Privilage: The installer successfully deleted a record from database table.
[Pass] CREATE VIEW Privilege: The database user account has the CREATE VIEW privilege. The installer successfully created a database view.
[Pass] DROP VIEW Privilege: The database user account has the DROP VIEW privilege. The installer successfully dropped a database view.
[Pass] DROP TABLE Privilege: The database user account has the DROP TABLE privilege. The installer successfully dropped a database table.
[Pass] SQL Server READ COMMITTED Isolation Level: The SQL Server READ COMMITTED isolation level for the database is set to ON. Sufficient for the Informatica installation.
[Pass] SQL Server Case Sensitivity: The SQL Server instance is not case-sensitive. Sufficient for the Informatica installation.
[Information] Informatica Installation Directory: /home/powercenter
[Information] Informatica Starting Port Number: 6005
[Information] Database Type: SQLServer
[Information] Database User ID: hi_eic_dom
[Information] Database Host Name: hi-msft-db-01
[Information] Database Port Number: 1433
[Information] Database Service Name: hi_eic
[Information] Operating System: Operating system is Linux. Operating system version is 3.10.0-514.28.1.el7.x86_64.
[Information] RAM: The memory module size is 16,416 MB.
[Information] Virtual Memory: Virtual Memory is set to unlimited.

5.1.2.2 - Silent installation

/tmp/infa/silentinstall.sh

The installation log is written to the root of the installation directory, for instance /home/powercenter/informatica/10.2/Informatica_10.2.0_InstallLog.log

Console output:

OS detected is Linux

***************************************************************************
* Welcome to the Informatica 10.2.0 Server Installer.  *
***************************************************************************



Configure the LANG and LC_ALL variables to generate the appropriate code pages and
create and connect to repositories and Repository Services.
Before you continue, read the following documents:
* Informatica 10.2.0 Installation Guide, Informatica Release Guide and Informatica Release Notes.
* B2B Data Transformation 10.2.0 Installation, Configuration Guide and Release Notes.

You can find the 10.2.0 documentation in the Product Documentation section at https://network.informatica.com/.
The installer requires Linux version 2.6.32-431 or later versions of the 2.6.32 series or version 3.10.0-0 or later versions of the 3.10.0 series.
The current operating system Linux version 3.10.0-514.
Current operating system meets minimum requirements.
-----------------------------------------------------------
Checking for an Informatica 10.2.0 installation.
Launching installer in silent mode ...
Installation Completed.

5.2 - Post Installation

5.2.1 - Environment variable

~/.bash_profile
export INFA_HOME=/home/powercenter/informatica/10.2
 
# Add the Java and Tomcat bin directories to the PATH
export PATH=$PATH:$INFA_HOME/java/jre/bin
export PATH=$PATH:$INFA_HOME/tomcat/bin

5.2.2 - Start and Stop

  • Upload the file below to /tmp/infainit
#!/bin/sh
# chkconfig: 345 99 10
# description: Informatica auto start-stop script for the init system
 
INFA_OWNER=powercenter
export INFA_HOME=/home/powercenter/informatica/10.2
 
case "$1" in
    'start')
        # Start Informatica
        su - $INFA_OWNER -c "${INFA_HOME}/tomcat/bin/infaservice.sh startup"
        ;;
    'stop')
        # Stop Informatica
        su - $INFA_OWNER -c "${INFA_HOME}/tomcat/bin/infaservice.sh shutdown"
        ;;
esac
 
#
exit
  • As root
sudo su -
mv /tmp/infainit /etc/init.d/infa
chgrp powercenter /etc/init.d/infa
chmod 750 /etc/init.d/infa
chown powercenter /etc/init.d/infa
  • Symlinks that set the runlevels at which the script runs (K = stop at shutdown, S = start at boot)
ln -s /etc/init.d/infa /etc/rc.d/rc0.d/K01infa
ln -s /etc/init.d/infa /etc/rc.d/rc3.d/S99infa
ln -s /etc/init.d/infa /etc/rc.d/rc5.d/S99infa

Then:

service infa start
service infa stop

Output example after startup:

Starting Informatica services on node 'NodeEic01'
Using CURRENT_DIR:     /home/powercenter/informatica/10.2/tomcat/bin
Using INFA_HOME:       /home/powercenter/informatica/10.2
Using System log directory :   /home/powercenter/informatica/10.2/logs/NodeEic01

5.2.3 - Log

  • Admin: ${INFA_HOME}/logs/NodeEic01/services/AdministratorConsole/_AdminConsole_jsf.log
  • MRS: ${INFA_HOME}/logs/NodeEic01/services/ModelRepositoryService
  • DIS: ${INFA_HOME}/logs/NodeEic01/services/DataIntegrationService/DIS_EIC_01_jsf.log
  • CMS: ${INFA_HOME}/logs/NodeEic01/services/ContentManagementService/CMS_EIC_01_jsf.log

5.2.4 - Configuring the Catalog Service for Azure HDInsight

(p123 of the installation doc)

After you create the Catalog Service, configure the following custom properties for it in Informatica Administrator:

  • LdmCustomOptions.deployment.azure.account.key: The key that authenticates the Catalog Service to the Azure storage account. The value may be encrypted or unencrypted. You can retrieve it from the fs.azure.account.key.<storage account name> property in the core-site.xml file of the Azure HDInsight cluster (under /etc/hadoop/).
  • LdmCustomOptions.deployment.azure.key.decryption.script.path: The location of the decrypt shell script, for example /usr/lib/hdinsight-common/scripts/decrypt.sh. Set it only if the key specified in LdmCustomOptions.deployment.azure.account.key is encrypted. The script decrypts the key using the key certificate. Before enabling the Catalog Service, copy the decrypt shell script and the key certificate file to the domain machine, at the same paths as on the Azure HDInsight cluster machine. The key certificate file, key_decryption_cert.prv, is in the /usr/lib/hdinsight-common/certs/ directory of the Azure HDInsight cluster.
  • LdmCustomOptions.deployment.hdfs.default.fs: The complete WASB address of the storage that the Catalog Service must connect to, including the storage container name and the storage account name. You can retrieve the value from the fs.defaultFS property in the core-site.xml file of the Azure HDInsight cluster. Format: wasb://<container name>@<storage account name>.blob.core.windows.net
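
Two of the values above are read from core-site.xml. A minimal sketch of that lookup follows, assuming that each <name> and <value> element sits on a single line, as Hadoop configuration files are usually written. The function name hadoop_property is hypothetical, and the sample file and its values (mycontainer, myaccount) are made up for the demonstration.

```shell
#!/bin/sh
# hadoop_property <path to core-site.xml> <property name>
# Prints the <value> that follows the matching <name> element.
hadoop_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Demonstration on a made-up core-site.xml fragment (hypothetical values)
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://mycontainer@myaccount.blob.core.windows.net</value>
  </property>
</configuration>
EOF
hadoop_property "$sample" fs.defaultFS
rm -f "$sample"
```

On the cluster, the same function could be pointed at the real core-site.xml and called with fs.azure.account.key.<storage account name> to read the account key. A real XML parser would be more robust than this line-based match.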

6 - Annexes

6.1 - Uninstall

  • Uninstall Informatica
sudo service infa stop
cd ~/informatica/10.2/Uninstaller_Server/
./uninstaller
  • Drop the database
DROP DATABASE EIC;

6.2 - install.bin

INSTALL_HOME/Server/install.bin -?
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
Usage: install [-f <path_to_installer_properties_file> | -options]
            (to execute the installer)

where options include:
    -?          show this help text
    -h          show this help text
    -help       show this help text
    --help      show this help text
    -i [swing | console | silent]
            specify the user interface mode for the installer
    -D<name>=<value>
            specify installer properties
    -r <path_to_generate_response_file>
            Generates response file.
JVM heap size options are only applicable to Installers
    -jvmxms <size>
            Specify JVM initial heap size.
    -jvmxmx <size>
            Specify JVM maximum heap size.
The options field may also include the following in case of uninstaller
if it is enabled for Maintenance Mode
    -add <feature_name_1> [<feature_name_2 ...]
            Add Specified Features
    -remove <feature_name_1> [<feature_name_2 ...]
            Remove Specified Features
    -repair
            Repair Installation
    -uninstall
            Uninstall

notes:
    1. the path to the installer properties file may be either absolute,
       or relative to the directory in which the installer resides.
    2. if an installer properties file is specified and exists, all other
       command line options will be ignored.
    3. if a properties file named either 'installer.properties' or
       <NameOfInstaller>.properties resides in the same directory as the
       installer, it will automatically be used, overriding all other command
       line options, unless the '-f' option is used to point to another valid
       properties file.
    4. if an installer properties file is specified but does not exist, the
       default properties file, if present, will be used.  Otherwise, any
       supplied command line options will be used, or if no additional
       options were specified, the installer will be run using the default
       settings.

6.3 - Test cluster access

curl --basic --user user:pwd https://clus-spark-01.azurehdinsight.net

7 - More

dit/powercenter/eic.txt · Last modified: 2018/02/22 21:46 by gerardnico