BDM - Installation in Azure


1 - About

On Azure, two types of BDM installation are possible. This page describes a BDM installation on an empty VM.

PowerCenter is installed in a single-node topology: a domain with one node. The node hosts the domain, and the Service Manager and all Informatica application services run on that node.

A Big Data Management installation is just a PowerCenter installation in which specific application services are used; the installation below configures the Model Repository Service and the Data Integration Service.

3 - Note

4 - Steps

4.1 - VM

az.cmd vm create ^
    --resource-group myGroup ^
    --name INFA-BDM-01 ^
    --image RedHat:RHEL:7.3:latest ^
    --size Standard_DS11_v2 ^
    --authentication-type password ^
    --admin-username adm ^
    --admin-password pwd ^
    --location westeurope

4.2 - SQL Server

Made with: SQL Server - Installation on Azure via an image

Microsoft SQL Server: allow 200 MB of disk space for the database. Allocate more space based on the amount of data and/or metadata you want to cache.

Create as many users as you need with the following template:

CREATE DATABASE HI_INFA;
ALTER DATABASE HI_INFA SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE HI_INFA SET READ_COMMITTED_SNAPSHOT ON;
 
CREATE LOGIN hi_infa_adm WITH
		PASSWORD = 'pwd',
		DEFAULT_DATABASE = HI_INFA;
 
 
USE HI_INFA;
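-- Clean up a previous attempt. The schema is dropped before the user that owns it.
-- IF EXISTS (SQL Server 2016 and later) avoids an error on the first run.
DROP SCHEMA IF EXISTS hi_infa_adm;
DROP USER IF EXISTS hi_infa_adm;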
CREATE USER hi_infa_adm FOR LOGIN hi_infa_adm;
CREATE SCHEMA hi_infa_adm AUTHORIZATION hi_infa_adm;
GRANT CONNECT TO hi_infa_adm;
GRANT CREATE TABLE TO hi_infa_adm;
GRANT CREATE VIEW TO hi_infa_adm;
ALTER USER hi_infa_adm WITH DEFAULT_SCHEMA = hi_infa_adm;

4.3 - As root

4.3.1 - System User Account Creation

sudo useradd powercenter
sudo passwd powercenter

4.3.2 - Software Preparation

mkdir -p tmp/infa
cd tmp/infa
wget https://infstorageuser.blob.core.windows.net/install/informatica_1020_server_linux-x64.tar
# Extract as root, not as a normal user: access is needed for utime ...
tar -xvf informatica_1020_server_linux-x64.tar
 
# Give the powercenter user write permission on the installation directory
chown -R powercenter:powercenter .

4.3.3 - Firewall

Port 6008 must be accessible from your public IP, both through the Azure firewall and through Firewalld on the VM.

  • Azure Firewall
az.cmd network nsg rule create ^
    --resource-group myGroup ^
    --nsg-name INFA-BDM-01NSG ^
    --name allow-infa-admin-hi ^
    --protocol tcp ^
    --priority 1021 ^
    --destination-port-range 6008 ^
    --source-address-prefixes publicIP
az.cmd network nsg rule create ^
    --resource-group myGroup ^
    --nsg-name INFA-BDM-01NSG ^
    --name allow-infa-node-hi ^
    --protocol tcp ^
    --priority 1022 ^
    --destination-port-range 6005 ^
    --source-address-prefixes publicIP
  • As root, Red Hat Firewall
sudo firewall-cmd --zone=public --add-port=6008/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6005-6009/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6014-6114/tcp --permanent
sudo firewall-cmd --reload
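
Port 6008 is opened twice above: once on its own and once through the range 6005-6009. The following is a minimal sketch, not part of the original procedure, of how one might check that a given port is covered by the list that firewall-cmd --zone=public --list-ports prints. The function name port_is_open is hypothetical.

```shell
# Return 0 if TCP port $1 is covered by a firewalld port list such as
# "6008/tcp 6005-6009/tcp 6014-6114/tcp", otherwise return 1.
# (port_is_open is an illustrative name, not a firewalld command.)
port_is_open() (
    port=$1
    for entry in $2; do
        case $entry in
            */tcp) range=${entry%/tcp} ;;
            *) continue ;;
        esac
        # A single port such as "6008" gives low = high = 6008.
        low=${range%-*}
        high=${range#*-}
        if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
            exit 0
        fi
    done
    exit 1
)
```

Usage, assuming Firewalld is running: port_is_open 6008 "$(sudo firewall-cmd --zone=public --list-ports)" && echo "6008 is open"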

4.3.4 - Resource Limit

Set the open file descriptor limit per process to 16,000 or higher. The recommended limit is 32,000 file descriptors per process.

as root:

vi  /etc/security/limits.conf
powercenter    hard   nofile    32000
powercenter    soft   nofile    32000
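
The soft limit is the one a process actually gets, so it is worth comparing it with the thresholds quoted above after the powercenter user logs in again. A small sketch under that assumption; check_nofile is a hypothetical helper name.

```shell
# Classify a per-process open-file limit against the thresholds quoted above:
# below 16000 is too low, 32000 or more meets the recommendation.
# (check_nofile is an illustrative helper, not an Informatica tool.)
check_nofile() (
    limit=$1
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge 32000 ]; then
        echo "ok"
    elif [ "$limit" -ge 16000 ]; then
        echo "minimum"
    else
        echo "too low"
    fi
)
```

Usage as the powercenter user: check_nofile "$(ulimit -Sn)"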

4.4 - As powercenter

4.4.1 - Environment Variable

Normally there is nothing to do. For information:

  • IATEMPDIR: Location of the temporary files created during installation. Informatica requires 1 GB of disk space for temporary files. The default is the /tmp directory.
  • LANG and LC_ALL: The character encoding determines the types of characters that appear in the UNIX terminal (Latin1 or ISO-8859-1 for French, EUC-JP or Shift JIS for Japanese, UTF-8 for Chinese or Korean).
echo $LANG
# or: locale
en_US.UTF-8
  • Unset the following variables
unset JRE_HOME # Informatica brings its own JRE
unset DISPLAY # The installation runs without X11
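
The points above can be turned into a quick check before launching the installer. This is only a sketch; the function name infa_env_warnings and the wording of the messages are assumptions.

```shell
# Print one warning line per environment setting that the notes above flag
# as a problem for the console installer. Prints nothing if all is fine.
# (infa_env_warnings is an illustrative name.)
infa_env_warnings() {
    if [ -n "${JRE_HOME:-}" ]; then
        echo "JRE_HOME is set: unset it, Informatica brings its own JRE"
    fi
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set: unset it, the installation runs without X11"
    fi
    if [ -z "${LANG:-}" ] && [ -z "${LC_ALL:-}" ]; then
        echo "neither LANG nor LC_ALL is set"
    fi
}
```

Run infa_env_warnings as the powercenter user; an empty output means the three points above are satisfied.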

4.4.2 - ODBC

For a relational database such as SQL Server, the metadata is imported with JDBC but the execution is done with ODBC (to be confirmed).

Unix ODBC must be configured.

~/.bash_profile
export INFA_HOME=/home/powercenter
export ODBCHOME=${INFA_HOME}/ODBC7.1
export ODBCINI=${ODBCHOME}/odbc.ini
export ODBCINST=${ODBCHOME}/odbcinst.ini
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${INFA_HOME}/server/bin:${ODBCHOME}/lib

Restart the server.

sudo service infa stop
sudo service infa start

Verify that odbc.ini contains:

InstallDir=/home/powercenter/ODBC7.1
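
This verification can be scripted. A minimal sketch, assuming the InstallDir key appears once at the start of a line; odbc_install_dir is a hypothetical helper name.

```shell
# Print the value of the InstallDir key found in the odbc.ini file given as $1.
# (odbc_install_dir is an illustrative helper, not part of the ODBC driver.)
odbc_install_dir() {
    sed -n 's/^[[:space:]]*InstallDir[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}
```

Usage: [ "$(odbc_install_dir "$ODBCINI")" = "$ODBCHOME" ] || echo "InstallDir does not match ODBCHOME"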

After the installation, you can add any relational database you want in the connection page, once you have added a DSN for it in the odbc.ini file. Example with Microsoft SQL Server


4.4.3 - Precheck

# Remove the tar file to get space
rm *.tar
# Start the check
./install.sh
OS detected is Linux
unjar task is in progress.............
unjar of ESD completed.....

***************************************************************************
* Welcome to the Informatica 10.2.0 Server Installer.  *
***************************************************************************

Before you continue, read the following documents:
* Informatica 10.2.0 Installation Guide, Informatica Release Guide and Informatica Release Notes.
* B2B Data Transformation 10.2.0 Installation, Configuration Guide and Release Notes.

You can find the 10.2.0 documentation in the Product Documentation section at https://network.informatica.com/.

Configure the LANG and LC_ALL variables to generate the appropriate code pages and create and connect to repositories and Repository Services.
Do you want to continue ? Yes

Then answer the prompts as follows:

  • Yes
  • 1 Install Informatica.
  • 1. Run the Pre-Installation (i10PreInstallChecker) System Check Tool
  • Step 2 of 4: System information
    • Enter for the default installation dir /home/powercenter
    • Enter for the default starting port number 6005
  • Step 3 of 4: Database and JDBC Connection Information
    • 2 for SQLServer
    • hi_infa_adm Database User Id
    • pwd Database User Password
    • No (Default) Schema Name
    • No Secure Database (Default)
    • 1: JDBC URL
    • hi-msft-db-01:1433 for database address
    • HI_INFA for database name
    • 1 configure JDBC Parameters
    • true: Snapshot Serializable

Result:

Informatica Pre-Installation (i10PreInstallChecker) System Check Tool Results
[Pass] Disk Space: Available disk space is 14,242 MB. Sufficient for the Informatica installation.
[Pass] Processors: Available number of processors is 2. Sufficient for the Informatica installation.
[Pass] Physical Memory: Available physical memory is 14,352 MB. Sufficient for the Informatica installation.
[Pass] Temporary Space: Available temporary disk space is 14,242 MB. Sufficient for the Informatica installation.
[Pass] Ports: Port range is 6,005 - 6,009. All port numbers within the port range are available for the Informatica installation.
[Pass] Locale Environment Variable: The LANG environment variable is set to language en_US.UTF-8. The LC_ALL environment variable is set to language null. Sufficient for the Informatica installation.
[Pass] JRE_HOME Environment Variable: The JRE_HOME environment variable does not contain a value. Sufficient for the Informatica installation.
[Pass] File Descriptor Limits: The file descriptor limits per process is 32000. Sufficient for the Informatica installation.
[Pass] CREATE TABLE Privilege: The database user account has the CREATE TABLE privilege. The installer successfully created a database table.
[Pass] INSERT RECORD Privilage: The installer successfully inserted a record into database table.
[Pass] DELETE RECORD Privilage: The installer successfully deleted a record from database table.
[Pass] CREATE VIEW Privilege: The database user account has the CREATE VIEW privilege. The installer successfully created a database view.
[Pass] DROP VIEW Privilege: The database user account has the DROP VIEW privilege. The installer successfully dropped a database view.
[Pass] DROP TABLE Privilege: The database user account has the DROP TABLE privilege. The installer successfully dropped a database table.
[Pass] SQL Server READ COMMITTED Isolation Level: The SQL Server READ COMMITTED isolation level for the database is set to ON. Sufficient for the Informatica installation.
[Pass] SQL Server Case Sensitivity: The SQL Server instance is not case-sensitive. Sufficient for the Informatica installation.
[Information] Informatica Installation Directory: /home/powercenter
[Information] Informatica Starting Port Number: 6005
[Information] Database Type: SQLServer
[Information] Database User ID: hi_infa_adm
[Information] Database Host Name: hi-msft-db-01
[Information] Database Port Number: 1433
[Information] Database Service Name: HI_INFA
[Information] Operating System: Operating system is Linux. Operating system version is 3.10.0-514.28.1.el7.x86_64.
[Information] RAM: The memory module size is 14,352 MB.
[Information] Virtual Memory: Virtual Memory is set to unlimited.
  • Press enter and select 2 to continue the installation.
1. Run the Informatica Kerberos SPN Format Generator
2. Run the Informatica services installation
Select the option to proceed : (Default : 2)

4.4.4 - Installation

  • Step 1 of 9: Welcome Agree 2
  • Step 1A of 9: Install Informatica Services 1 (2 is for Informatica Enterprise Information Catalog)
  • Step 1A of 9: Enable Kerberos: 1 No
  • Step 2 of 9: Installation prerequisites. Ok
  • Step 3 of 9: License and Installation directory:
    • License: /home/powercenter/license.key
    • Install Dir: /home/powercenter/
  • Step 4 of 9: Pre-Installation Summary
Product Name             :      Informatica 10.2.0
Installation Type        :      New Installation
Installation Directory   :      /home/powercenter
Disk Space Requirements:
Required Disk Space      :      11,563 MB
Available Disk Space     :      13,764 MB
  • Step 5 of 9: Installing
...........
Installing... 75%
Installing... 80%
Installing... 85%
Installing... 90%
Installing... 95%
Installing... 100%
  • Step 5A of 9: Domain Selection: 1 Create a new domain, 1, No secure domain, 2 Disable HTTPS, 1 Disable SAML
  • Step 5B of 9: Domain Configuration:
    • 2 Database Type, SQL Server
    • hi_infa_dom user Id
    • pwd password
    • 2 No Schema Name
    • 1 Jdbc Url
    • Database Address: msft-db-01:1433
    • Database Service Name: infa_dom
  • Step 6 of 9: Domain Security: Encryption Key
    • Keyword: Example12 - The keyword must be between 8 and 20 characters long. It must include at least one upper case letter, one lower case letter, one number, and no spaces.
    • Encryption key directory: (default :- /home/powercenter/isp/config/keys)
Information !!! The encryption key will be generated in the /home/powercenter/isp/config/keys
with the file name siteKey. Save the name of the domain, the keyword for the encryption key,
and the encryption key file in a secure location. 

You need to specify the domain name, keyword, and encryption key when you change the encryption key for the domain or move a repository to another domain.
  • Step 6 of 9: Domain and Node Configuration
Domain name: (default :- Domain) :
Node host name: (default :- HI-INFA-BDM-01) :
Node name: (default :- node01) :
Node port number: (default :- 6005) :
Domain user name: (default :- Administrator) :
Domain password: (default :- ) :
Confirm password: (default :- ) :
  • Advanced Port config: No
  • Config MRS and DIS service: Yes
Executing the Command...
--
Defining the domain...
-
Registering the plugins...
-
Starting the service...
-
Pinging the domain...
...........................
...........................
Pinging the domain...
-
Pinging the Administrator service...
...........................
...........................
Pinging the Administrator service...
  • Step 7A of 9: Configure the Model Repository database.
Database type:2->SQLServer
Database user ID: (default :- admin) :hi_infa_mrs
User password: :
Specify Schema Name:  1->No
Secure database:  2->No
Database address: (default :- host_name:port_no) :hi-msft-db-01:1433
Database service name: (default :- DatabaseName) :HI_INFA_MRS
JDBC parameters (default :- SnapshotSerializable=true) :
  • Step 7B of 9: Application Service Parameters
Model Repository Service name: (default :- Model_Repository_Service) :
Data Integration Service name: (default :- Data_Integration_Service) :
HTTP protocol type: 1->http
HTTP Port: (default :- 8095) 

Creating the Model Repository Service...
-
Updating the Model Repository Service...
-
Enabling the Model Repository Service...
-
Creating the Model Repository Contents...
-
Updating the Model Repository Service...
-
Creating the Data Integration Service...
-
Updating the Data Integration Service...
-
Enabling the Data Integration Service...
  • Step 9 of 9: Post-Installation Summary
Installation Status: SUCCESS

The Informatica 10.2.0 installation is complete.

The system services are disabled by default after the installation is complete.
You must configure the services and then enable them in the Informatica Administrator tool.

For more information, see the debug log file:
/home/powercenter/Informatica_10.2.0_Services_2018_01_18_14_30_51.log

Installation Type :New Installation

Informatica Administrator Home Page:
http://HI-INFA-BDM-01:6008

Product Name:  Informatica 10.2.0

4.5 - Test

Go to the administrator console at http://HI-INFA-BDM-01:6008 and enter the Informatica administrator credentials.

5 - Admin

The next sections give administration information.

5.1 - Environment variable

In the .bashrc of the powercenter user:

export INFA_HOME=/home/powercenter
export PATH=$PATH:/home/powercenter/tomcat/bin
export JAVA_HOME=${INFA_HOME}/java/jre

5.2 - Version

  • Tomcat
version.sh
# /home/powercenter/tomcat/bin/version.sh
Using CATALINA_BASE:   /home/powercenter/tomcat
Using CATALINA_HOME:   /home/powercenter/tomcat
Using CATALINA_TMPDIR: /home/powercenter/tomcat/temp
Using JRE_HOME:        /home/powercenter/java/jre
Using CLASSPATH:       /home/powercenter/tomcat/bin/bootstrap.jar:/home/powercenter/tomcat/bin/tomcat-juli.jar
Server version:
Server built:   Mar 28 2017 16:01:48 UTC
Server number:  7.0.77.0
OS Name:        Linux
OS Version:     3.10.0-514.28.1.el7.x86_64
Architecture:   amd64
JVM Version:    1.8.0_131-b11
JVM Vendor:     Oracle Corporation

5.3 - Start and Stop

  • Upload the below file to /tmp
infainit
#!/bin/sh
# chkconfig: 345 99 10
# description: Informatica auto start-stop script for the init system
 
INFA_OWNER=powercenter
export INFA_HOME=/home/powercenter
 
case "$1" in
    'start')
        # Start Informatica
        su - $INFA_OWNER -c "${INFA_HOME}/tomcat/bin/infaservice.sh startup"
        ;;
    'stop')
        # Stop Informatica
        su - $INFA_OWNER -c "${INFA_HOME}/tomcat/bin/infaservice.sh shutdown"
        ;;
esac
 
#
exit
  • As root
sudo su -
mv /tmp/infainit /etc/init.d/infa
chgrp powercenter /etc/init.d/infa
chmod 750 /etc/init.d/infa
chown powercenter /etc/init.d/infa
  • Symlink that gives the level and when to start the script (K=shutdown and S=startup)
ln -s /etc/init.d/infa /etc/rc.d/rc0.d/K01infa
ln -s /etc/init.d/infa /etc/rc.d/rc3.d/S99infa
ln -s /etc/init.d/infa /etc/rc.d/rc5.d/S99infa
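
The link names follow the convention encoded in the chkconfig header of the script (# chkconfig: 345 99 10 means run levels 3, 4 and 5, start priority 99, stop priority 10). As an illustration only, the sketch below derives the start links from that header; the function name start_links is hypothetical, and the links created by hand above cover only levels 3 and 5.

```shell
# Print the start links implied by the "# chkconfig: <levels> <start> <stop>"
# header of init script $1 for service name $2,
# e.g. /etc/rc.d/rc3.d/S99infa for level 3 and start priority 99.
# (start_links is an illustrative name; it only prints, it creates nothing.)
start_links() (
    script=$1
    name=$2
    set -- $(sed -n 's/^# chkconfig:[[:space:]]*//p' "$script" | head -n 1)
    levels=${1:-}
    start=${2:-}
    # Walk through the run levels one character at a time.
    while [ -n "$levels" ]; do
        rest=${levels#?}
        level=${levels%"$rest"}
        echo "/etc/rc.d/rc${level}.d/S${start}${name}"
        levels=$rest
    done
)
```

Usage: start_links /etc/init.d/infa infa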

Then:

service infa start
service infa stop

Output example after startup:

Starting Informatica services on node 'node01'
Using CURRENT_DIR:     /home/powercenter/tomcat/bin
Using INFA_HOME:       /home/powercenter
Using System log directory :   /home/powercenter/logs/node01

The INFA_HOME environment variable must be in the profile of the powercenter installation user.

5.4 - Sudo

Add the powercenter user to the /etc/sudoers file. See SH - Sudo (Switch User and do)

powercenter  ALL=(ALL) NOPASSWD: /sbin/service infa start
powercenter  ALL=(ALL) NOPASSWD: /sbin/service infa stop

5.5 - Log

  • /home/powercenter/logs
  • /home/powercenter/logs/node01/services/DataIntegrationService
  • /home/powercenter/logs/node01/services/ModelRepositoryService
  • /home/powercenter/isp/logs

Administrator console:

  • /home/powercenter/logs/node01/services/AdministratorConsole
  • /home/powercenter/logs/node01/services/AdministratorConsole/_AdminConsole_jsf.log

5.6 - Configuration

  • The node configuration metadata is stored in the configuration file /home/powercenter/isp/config/nodemeta.xml.

5.7 - Privileges

For a developer, the minimum privileges are:

  • AccessDeveloper: privilege to access the repository from the Developer tool
  • Create/Edit and Delete Project (MRS_CREATE_PROJECT, …)

6 - Azure Cluster Configuration

6.1 - Connections

A WASB cluster is just an HDFS-compatible file system. See HDFS compatible file system

When you provision Big Data Management, you must use the virtual network (vnet) where the current instance of HDInsight is configured, so that all nodes are on your network.

To add a connection to your cluster, go to the Administration console > Manage > Connections.

  • Open the node Domain > ClusterConfigurations
  • Right Click > Import
  • With Cluster Connection (the connections are named with the convention <connection type>_<cluster configuration name>, such as Hive_ccMapR)

Information from the cluster (needed during the configuration of the Big Data Management instance):

  • HDInsight Cluster Hostname: Name of the HDInsight cluster where you want to create the Informatica domain.
  • HDInsight Cluster Login Username: User login for the cluster. This is usually the same login you use to log in to the Ambari cluster management tool.
  • Password: Password for the HDInsight cluster user.
  • HDInsight Cluster SSH Hostname: SSH name for the cluster.
  • HDInsight Cluster SSH Username: Account name you use to log in to the cluster head node.
  • Password: Password to access the cluster SSH host.
  • Ambari port: Port to access the Ambari cluster management web page. Default is 443.

When you perform the import, the cluster configuration wizard creates the following connections to access the Hadoop environment (if you choose this option):

  • Hadoop,
  • HBase,
  • HDFS,
  • and Hive

  • In the HDFS storage location, create the following directories on the cluster and set permissions to 777:
    • Blazeworkingdir
    • SPARK_HDFS_STAGING_DIR
    • SPARK_EVENTLOG_DIR
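
A possible way to script the directory creation with the standard hdfs dfs client is sketched below. The function name create_bdm_dirs, the HDFS_CMD variable and the base path in the usage line are assumptions; the directory names are taken literally from the list above.

```shell
# HDFS_CMD defaults to the real client; set HDFS_CMD=echo for a dry run
# that only prints the arguments.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Create the three working directories under base path $1 and set their
# permissions to 777. (create_bdm_dirs is an illustrative name.)
create_bdm_dirs() {
    for dir in Blazeworkingdir SPARK_HDFS_STAGING_DIR SPARK_EVENTLOG_DIR; do
        "$HDFS_CMD" dfs -mkdir -p "$1/$dir" || return 1
        "$HDFS_CMD" dfs -chmod 777 "$1/$dir" || return 1
    done
}
```

Usage on a cluster node, with a hypothetical base path: create_bdm_dirs /bdm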

6.2 - Decryption

To decrypt the Azure storage key in core-site.xml, you need to provide the decryption key. The decryption key is sent inside the application to the node, so you need to repeat this operation each time the cluster is (re)created.

As root

mkdir -p /usr/lib/hdinsight-common/scripts/
mkdir -p /usr/lib/hdinsight-common/certs/

Copy the files below from the cluster to the BDM machine and give ownership to the powercenter user.

/usr/lib/hdinsight-common/scripts/decrypt.sh
/usr/lib/hdinsight-common/certs/key_decryption_cert.prv

Example from the tmp directory:

cp /tmp/decrypt.sh /usr/lib/hdinsight-common/scripts/
cp /tmp/key_decryption_cert.prv /usr/lib/hdinsight-common/certs/
chown -R powercenter:powercenter /usr/lib/hdinsight-common
chmod +x  /usr/lib/hdinsight-common/scripts/decrypt.sh

6.3 - Host file

  • Add an entry for the headnodehost in the /etc/hosts file

Example:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
 
 
10.0.0.16 headnodehost hn0-HI-CLU hn0-hi-clu.3qy321ubaea5iyw5joamf.ax.internal.cloudapp.net
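
Because the entry has to be refreshed each time the cluster is recreated, it can help to make the update repeatable. The sketch below replaces any existing headnodehost line instead of appending a second one; the function name set_headnodehost is hypothetical, and the IP address and host names in the usage line are placeholders.

```shell
# Replace (or add) the line that defines headnodehost in a hosts file.
# $1 = hosts file, $2 = IP address, $3 = extra names for the same line (optional).
# (set_headnodehost is an illustrative name.)
set_headnodehost() (
    file=$1
    tmp=$(mktemp) || exit 1
    # Keep every line that does not mention headnodehost as a whole word.
    grep -v -w 'headnodehost' "$file" > "$tmp" || true
    echo "$2 headnodehost${3:+ $3}" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)
```

Usage as root: set_headnodehost /etc/hosts 10.0.0.16 hn0-HI-CLU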

6.4 - Hadoop Distribution

In the DIS configuration, make sure that the property “Data Integration Service Hadoop Distribution Directory” has the correct path. It must be set to “HDInsight_3.6”.

6.5 - Node ODBC

To be able to get or push data from ODBC on each node:

  • ODBC must be installed
  • and the Informatica environment must be set

6.5.1 - Node ODBC installation

  • Connect to each node (or create a bash script) with the HDInsight sshuser and perform the following as root (or with sudo)
cd /tmp
wget http://www.unixodbc.org/unixODBC-2.3.6.tar.gz
tar xvf unixODBC-2.3.6.tar.gz
cd unixODBC-2.3.6/
./configure
make
make install
export LD_LIBRARY_PATH=/usr/local/lib/

6.5.2 - Node ODBC environment variable

The file hadoopEnv.properties must contain the ODBC environment configuration parameters to be able to create an ODBC mapping from or to a Hive mapping. See ERROR: "RR_4036 Check the ODBCINI environment variable"

The file is located on the BDM machine at /home/powercenter/services/shared/hadoop/HDInsight_3.6/infaConf/hadoopEnv.properties

hadoopEnv.properties
infapdo.env.entry.odbcini=ODBCINI=$HADOOP_NODE_INFA_HOME/ODBC7.1/odbc.ini
infapdo.env.entry.odbchome=ODBCHOME=$HADOOP_NODE_INFA_HOME/ODBC7.1
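
A properties file keeps only one value per key, so each variable needs its own key. The sketch below checks that both variables are assigned, assuming the key names odbcini and odbchome; the function name missing_odbc_entries is hypothetical.

```shell
# Print the name of each ODBC variable that properties file $1 does not assign
# under its own key (assumed keys: infapdo.env.entry.odbcini and
# infapdo.env.entry.odbchome). Prints nothing when both are present.
# (missing_odbc_entries is an illustrative name.)
missing_odbc_entries() {
    grep -q '^infapdo\.env\.entry\.odbcini=ODBCINI=' "$1" || echo "ODBCINI"
    grep -q '^infapdo\.env\.entry\.odbchome=ODBCHOME=' "$1" || echo "ODBCHOME"
}
```

Usage: missing_odbc_entries /home/powercenter/services/shared/hadoop/HDInsight_3.6/infaConf/hadoopEnv.properties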

6.6 - Operations after each recreation of the cluster

  • Update the cluster information with the BDM admin console. Connection > Actions > Refresh Cluster Configuration
  • Add or update the entry for the headnodehost in the /etc/hosts file
  • Copy the private key from the cluster to BDM in the same directory location
/usr/lib/hdinsight-common/certs/key_decryption_cert.prv

7 - Support

7.1 - Error establishing socket to host and port: hostname.database.windows.net:1433. Reason: DISABLED (No such file or directory)

You can get this error when running a silent installation.

Test Connection Exception -java.sql.SQLNonTransientConnectionException: [informatica][SQLServer JDBC Driver]Error establishing socket to host and port: hostname.database.windows.net:1433. Reason: DISABLED (No such file or directory)

Possible resolution:

  • You are trying to install a secure domain but the property TRUSTSTORE_DB_FILE in the SilentInputProperties is not set.
SilentInputProperties.properties
# The TRUSTSTORE_DB_FILE indicates the path and file name of the truststore file for
# the secure domain configuration repository database. If the domain that you create
# or join uses a secure domain configuration repository, set this property to the
# truststore file of the repository database.
 
 
TRUSTSTORE_DB_FILE=/path/to/myCacerts
 
 
# TRUSTSTORE_DB_PASSWD to the password for the truststore file for the secure domain
# configuration repository database.
 
 
TRUSTSTORE_DB_PASSWD=changeit
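
Before starting a silent installation, the property can be checked with a short script. This is a sketch under the assumption that the file uses plain KEY=value lines as in the excerpt above; prop_value and check_truststore are hypothetical helper names.

```shell
# Print the value assigned to key $2 in properties file $1 (last assignment wins).
# (prop_value is an illustrative helper.)
prop_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only if TRUSTSTORE_DB_FILE is set in $1 and names a readable file.
# (check_truststore is an illustrative helper.)
check_truststore() {
    store=$(prop_value "$1" TRUSTSTORE_DB_FILE)
    [ -n "$store" ] && [ -r "$store" ]
}
```

Usage: check_truststore SilentInputProperties.properties || echo "TRUSTSTORE_DB_FILE is missing or unreadable"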

8 - Documentation / Reference