Overview

Installing and using the cbaWorkflow package involves the following stages:

  1. installing the RPM using rpm or yum
  2. editing the configuration files (cbaWorkflow.ini for the C++ based programs and connection.ini for the Tomcat servlets)
  3. setting up the user's environment: add the executables to PATH and the shared libraries to LD_LIBRARY_PATH. CBAWF_HOME should point to the installation directory (/app/cbaWorkflow); optionally, CBAWFINI should point to the configuration file (defaults to ${CBAWF_HOME}/etc/cbaWorkflow.ini)
  4. creating a repository that the workflow uses to operate and to gather statistics on the node runs
  5. deploying the Tomcat interface using the war file


Installation of RPM

  1. Log in as root.
  2. If the yum repository is set up, follow the instructions in packaging cbaWorkflow and execute: yum install cbaWorkflow
  3. If this is not possible, obtain the rpm (currently cbaWorkflow-1.0.0-1.x86_64.rpm) from Nexus, or transfer it by ftp:
    wget http://192.168.2.22:8081/repository/cbayum-hosted/cbaWorkflow/1.0.0/1/cbaWorkflow-1.0.0-1.x86_64.rpm
  4. In the directory where the rpm is located, issue the following command: yum localinstall cbaWorkflow-1.0.0-1.x86_64.rpm

    Installing:
    cbaWorkflow x86_64 1.0.0-1 /cbaWorkflow-1.0.0-1.x86_64 8.9 M

    Transaction Summary
    ==============================================================================

    ============================================================================
    Install 1 Package

    Total size: 8.9 M
    Installed size: 8.9 M
    Is this ok [y/d/N]: y
    Downloading packages:
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
    Installing : cbaWorkflow-1.0.0-1.x86_64 1/1
    Verifying : cbaWorkflow-1.0.0-1.x86_64 1/1

    Installed:
    cbaWorkflow.x86_64 0:1.0.0-1

    Complete!
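
The download and install steps above can be scripted. The sketch below assumes that the Nexus URL follows the name/version/release layout seen in the wget command, which may not hold for other releases; the variable names are illustrative and not part of the package. It only prints the commands, so they can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# Sketch: derive the RPM file name and download URL from the package coordinates.
# All variable names are illustrative; the URL layout is inferred from the
# wget example and is an assumption.
name=cbaWorkflow
version=1.0.0
release=1
arch=x86_64
repo_url=http://192.168.2.22:8081/repository/cbayum-hosted

rpm_file="${name}-${version}-${release}.${arch}.rpm"
rpm_url="${repo_url}/${name}/${version}/${release}/${rpm_file}"

# Print the commands instead of running them (installing requires root).
echo "wget ${rpm_url}"
echo "yum localinstall ${rpm_file}"
```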


Verify the installation:

ls -l /app/cbaWorkflow

rpm -qi cbaWorkflow
Name : cbaWorkflow
Version : 1.0.0
Release : 1
Architecture: x86_64
Install Date: Sat 27 Mar 2021 09:38:29 PM EDT
Group : Applications/File
Size : 9320360
License : available for purchase, free license up to 5 nodes
Signature : (none)
Source RPM : cbaWorkflow-1.0.0-1.src.rpm
Build Date : Sat 27 Mar 2021 04:29:19 PM EDT
Build Host : r02edge.custom-built-apps.com
Relocations : (not relocatable)
Packager : Boris Alexandrov, Custom Built Apps ltd
Vendor : Custom Built Apps ltd, Toronto, Ontario
URL : www.custom-built-apps.com
Summary : workflow application for data pipelines
Description :
The application runs a data pipeline of execution nodes like spark-sql, spark-submit, shell, hdfs commands in orchestration and cadence.
Contains a tomcat based interface for adding the nodes and monitoring the process

Check the file list:

rpm -ql cbaWorkflow
/app/cbaWorkflow/bin/cbaWFMonitorTest
/app/cbaWorkflow/bin/cbaWorkflowTest
/app/cbaWorkflow/bin/repobuilder
/app/cbaWorkflow/bin/shmcreate
/app/cbaWorkflow/bin/shmupdate
/app/cbaWorkflow/bin/shmview
/app/cbaWorkflow/etc/cbaWorkflow.ini
/app/cbaWorkflow/etc/config6
/app/cbaWorkflow/etc/env.sh
/app/cbaWorkflow/etc/run.sh
/app/cbaWorkflow/lib/libcbaEndNode.so
/app/cbaWorkflow/lib/libcbaGraph.so
/app/cbaWorkflow/lib/libcbaHdfsExecNode.so
/app/cbaWorkflow/lib/libcbaNodeWalker.so
/app/cbaWorkflow/lib/libcbaShellNode.so
/app/cbaWorkflow/lib/libcbaSparkSQLNode.so
/app/cbaWorkflow/lib/libcbaSparkSubmitNode.so
/app/cbaWorkflow/lib/libcbaStartNode.so
/app/cbaWorkflow/lib/libcbaUtils.so
/app/cbaWorkflow/lib/libcbaWFRepositoryConnector.so
/app/cbaWorkflow/lib/libcbaWFRepositoryConnectorPG.so
/app/cbaWorkflow/lib/libcbaWFSharedMemory.so
/app/cbaWorkflow/lib/libcbaWorkflowNode.so
/app/cbaWorkflow/wars/cbaWorkflow.war
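
As a quick sanity check, a few of the files from the list above can be tested for existence. This is a minimal sketch: check_layout is a hypothetical helper that is not shipped with the package, and the handful of files it checks is an arbitrary selection. Running rpm -V cbaWorkflow is the more thorough check, since it compares the installed files against the package database.

```shell
#!/usr/bin/env bash
# Sketch: report files from the package list that are missing under a root.
# check_layout is a hypothetical helper, not shipped with cbaWorkflow.
check_layout() {
  local root=$1 missing=0 f
  for f in bin/repobuilder etc/cbaWorkflow.ini etc/env.sh etc/run.sh \
           lib/libcbaUtils.so wars/cbaWorkflow.war; do
    if [ ! -e "${root}/${f}" ]; then
      echo "missing: ${root}/${f}"
      missing=1
    fi
  done
  # Return 0 when every file was found, 1 otherwise.
  return "$missing"
}

# Usage: check_layout /app/cbaWorkflow && echo "layout looks complete"
```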


Configuring the cbaWorkflow.ini file


The file is located in /app/cbaWorkflow/etc.

-- FILE CONTENTS

dbname=cbaworkflowdb
user=dataexplorer1
password=2MuchTime

hostaddr=192.168.2.30
port=5432
polling_time=15
nodes_log_dir=/home/dataexplorer1/logs
connector_lib=libcbaWFRepositoryConnectorPG.so

--- END OF FILE

The repository connector libcbaWFRepositoryConnectorPG.so is used to connect to a PostgreSQL server.

Set dbname to the database in which the repository is going to be created.

Update the other parameters as needed.

Log in as the user who is going to run the pipeline.

Copy cbaWorkflow.ini into your home directory as a hidden file:

cp /app/cbaWorkflow/etc/cbaWorkflow.ini ${HOME}/.cbaWorkflow.ini

Create the directory for logs named in nodes_log_dir:

mkdir /home/dataexplorer1/logs
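
To avoid typing the log directory twice, the value can be read back from the ini file. The sketch below assumes the plain key=value layout of the sample file (no sections, no quoting, no spaces around the equals sign); ini_get is a hypothetical helper, not part of the package.

```shell
#!/usr/bin/env bash
# Sketch: read one key from a key=value file such as cbaWorkflow.ini.
# ini_get is a hypothetical helper; it assumes no sections, no quoting and
# no spaces around '=' (as in the sample file above).
ini_get() {
  local key=$1 file=$2
  # Print the value of the last line that starts with "key=".
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Usage: create the log directory named in the ini file.
# mkdir -p "$(ini_get nodes_log_dir "${HOME}/.cbaWorkflow.ini")"
```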

Setting the environment 

Log in as the user who is going to run the data pipeline.

Copy /app/cbaWorkflow/etc/env.sh to your home directory:

cp /app/cbaWorkflow/etc/env.sh $HOME

chmod 755 env.sh

vi env.sh

Verify that the variables match your environment.

Add the directory containing libstdc++.so.6 to LD_LIBRARY_PATH as the first entry. For example, if the library is /usr/local/lib64/libstdc++.so.6,

then edit LD_LIBRARY_PATH as follows:

export LD_LIBRARY_PATH=/usr/local/lib64:${CBAWF_HOME}/lib:${LD_LIBRARY_PATH}

Source env.sh:

. env.sh

Verify the environment:

[dataexplorer1@r01edge ~]$ env | grep CBA
CBAWFINI=/home/dataexplorer1/.cbaWorkflow.ini
CBAWF_HOME=/app/cbaWorkflow

env | grep PATH
LD_LIBRARY_PATH=/usr/local/lib64:/app/cbaWorkflow/lib:/app/cbaWorkflow/lib:/app/hadoop/lib/native:
PATH=/app/cbaWorkflow/bin:/app/cbaWorkflow/bin:/app/spark/bin:/app/spark/sbin:/app/oozie/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/app/hadoop/sbin:/app/hadoop/bin}:/app/hadoop/sbin:/app/hadoop/bin:/home/hdfs/hadoop/sbin:/home/hdfs/hadoop/bin:/app/hive/bin:/app/spark/bin:/home/dataexplorer1/.local/bin:/home/dataexplorer1/bin

which repobuilder
/app/cbaWorkflow/bin/repobuilder
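
The repeated /app/cbaWorkflow/lib and /app/cbaWorkflow/bin entries in the output above suggest that the variables were extended more than once, for example by sourcing env.sh twice. That is harmless, but a guard keeps the lists tidy. The sketch below is an assumption about how such a guard could look; prepend_once is a hypothetical helper and is not part of the shipped env.sh. It only adds a directory that is absent and does not reorder existing entries.

```shell
#!/usr/bin/env bash
# Sketch: prepend a directory to a colon-separated list only if it is absent.
# prepend_once is a hypothetical helper, not part of the shipped env.sh.
prepend_once() {
  local dir=$1 list=$2
  case ":${list}:" in
    *":${dir}:"*) printf '%s\n' "$list" ;;          # already present
    *) printf '%s\n' "${dir}${list:+:${list}}" ;;    # add in front
  esac
}

# Demonstration on a sample value; env.sh would apply the same calls
# to PATH and LD_LIBRARY_PATH.
libs=/app/hadoop/lib/native
libs=$(prepend_once /app/cbaWorkflow/lib "$libs")
# The libstdc++ directory is added last so that it ends up first.
libs=$(prepend_once /usr/local/lib64 "$libs")
# A repeated call changes nothing.
libs=$(prepend_once /app/cbaWorkflow/lib "$libs")
echo "$libs"
```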

Verify that there are no unresolved symbols in repobuilder:

[dataexplorer1@r01edge ~]$ ldd -r $(which repobuilder)
linux-vdso.so.1 => (0x00007ffca33a8000)
librt.so.1 => /lib64/librt.so.1 (0x00007fc2a7f92000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc2a7d76000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc2a7b72000)
libcbaWFRepositoryConnector.so => /app/cbaWorkflow/lib/libcbaWFRepositoryConnector.so (0x00007fc2a796f000)
libcbaUtils.so => /app/cbaWorkflow/lib/libcbaUtils.so (0x00007fc2a772b000)
libstdc++.so.6 => /usr/local/lib64/libstdc++.so.6 (0x00007fc2a73b0000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc2a70ae000)
libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x00007fc2a6e97000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc2a6ac9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc2a819a000)
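
Reading the ldd output by eye works for one binary; for repeated checks the output can be filtered for the two markers ldd uses for problems. This is a minimal sketch, and ldd_problems is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: scan `ldd -r` output for unresolved libraries or symbols.
# ldd_problems is a hypothetical helper; it reads the ldd output on stdin
# and succeeds (exit status 0) only when a problem line was found.
ldd_problems() {
  # ldd marks a missing library with "not found" and, with -r,
  # a missing symbol with "undefined symbol".
  grep -E 'not found|undefined symbol'
}

# Usage:
# if ldd -r "$(command -v repobuilder)" 2>&1 | ldd_problems; then
#   echo "unresolved dependencies, check LD_LIBRARY_PATH" >&2
# fi
```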

Creating the repository

Verify that the values in cbaWorkflow.ini are correct for the database that is going to hold the repository.

The user in the connection parameters must be able to create tables in that database.

cat $CBAWFINI

Edit the file if needed.

Run the repository generation program, repobuilder:

repobuilder

Generating a new cbaWorkflow repository

Verify that the tables have been created in the repository database, and that the node_types table has been populated.
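
One way to perform both checks is with psql, using the connection values from cbaWorkflow.ini. The sketch below only prints the commands; psql_cmd is a hypothetical helper, the values are the ones from the sample ini file, and it assumes the repository tables live in the default schema. psql will prompt for the password unless it is supplied another way.

```shell
#!/usr/bin/env bash
# Sketch: build psql commands for the repository checks from the ini values.
# psql_cmd is a hypothetical helper; the values are passed in explicitly here
# and would normally come from the file named by CBAWFINI.
psql_cmd() {
  local host=$1 port=$2 user=$3 db=$4 sql=$5
  printf 'psql -h %s -p %s -U %s -d %s -c "%s"\n' \
    "$host" "$port" "$user" "$db" "$sql"
}

# List the tables, then count the rows in node_types.
psql_cmd 192.168.2.30 5432 dataexplorer1 cbaworkflowdb '\dt'
psql_cmd 192.168.2.30 5432 dataexplorer1 cbaworkflowdb 'select count(*) from node_types'
```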

Deploying the war file and setting the permissions in Tomcat

Deploy the war file using the Tomcat Manager application:

http://r01edge.custom-built-apps.com:1962/manager/html

Enter the user ID and password for the manager.

Deploy the application from /app/cbaWorkflow/wars/cbaWorkflow.war (browse to it if Tomcat runs on the host where the package is installed; otherwise copy the file over first).



Check the status message and the path of the deployed application.

Log in as a user who can modify the applications deployed in Tomcat, most likely tomcat:

su - tomcat

Go to the webapps directory of the Tomcat installation:

cd /app/apache-tomcat-9.0.34/webapps

Go to the cgi directory under cbaWorkflow/WEB-INF:

cd /app/apache-tomcat-9.0.34/webapps/cbaWorkflow/WEB-INF/cgi

Make the cadence file executable:

chmod 755 cadence

Edit the connection parameters used by the servlets:

cd /app/apache-tomcat-9.0.34/webapps/cbaWorkflow/WEB-INF/classes

vi connection.ini

#connection information is kept here

url=jdbc:postgresql://192.168.2.30:5432/dataexplorerdb1
user=dataexplorer1
password=2MuchTime
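
Note that the sample URL above names the database dataexplorerdb1, while the sample cbaWorkflow.ini names cbaworkflowdb. If the servlets are meant to work on the same repository as the C++ programs (an assumption, but the likely intent), the two files should agree. The sketch below derives the expected JDBC URL from the ini file so it can be compared with connection.ini; jdbc_url_from_ini is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: derive a PostgreSQL JDBC URL from a cbaWorkflow.ini style file.
# jdbc_url_from_ini is a hypothetical helper; it assumes plain key=value lines.
jdbc_url_from_ini() {
  local file=$1 host port db
  host=$(sed -n 's/^hostaddr=//p' "$file" | tail -n 1)
  port=$(sed -n 's/^port=//p' "$file" | tail -n 1)
  db=$(sed -n 's/^dbname=//p' "$file" | tail -n 1)
  printf 'jdbc:postgresql://%s:%s/%s\n' "$host" "$port" "$db"
}

# Usage: compare the result with the url line in connection.ini.
# jdbc_url_from_ini "${CBAWFINI:-${CBAWF_HOME}/etc/cbaWorkflow.ini}"
```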


Browse to the main page of the application in the web browser:

http://r01edge.custom-built-apps.com:1962/cbaWorkflow/

You should see the main page of the application. On a new repository:

nodes list and add a node dependency will be empty.

Add a node will show the form used for adding nodes.

show execution graph will show the start and end nodes.

Creating directories for run and logs

Create a directory for the workflow run. The scripts, jars and executables for the workflow reside here, and the workflow should be started from this directory. Copy ${CBAWF_HOME}/etc/run.sh to this directory.

mkdir $HOME/run

cp ${CBAWF_HOME}/etc/run.sh ${HOME}/run

Verify the variables in run.sh and update them if needed; JAVA_HOME, for example, probably has to be changed for your host.

Create the logs directory if it does not exist yet:

mkdir -p $HOME/logs


Create the scripts and jars in the ${HOME}/run directory.


Add nodes and dependencies in the web interface

Start adding nodes using the add a node link.

The added nodes then appear in the nodes list.

Add the dependencies between the nodes using the add a node dependency link.


Setting up a cron job

The etc directory contains a run.sh file, which has already been copied to the run directory.

Check that its variables match the current environment.

Make it executable:

chmod 755 $HOME/run/run.sh

Create a crontab entry:

crontab -e

14 14 * * * sh /home/dataexplorer1/run/run.sh
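
The entry above runs run.sh every day at 14:14. cron starts jobs in the user's home directory, while the workflow is supposed to be started from the run directory, so the entry may need to change directory first unless run.sh already does so (not known here). The sketch below prints such an entry and also appends the job output to a log file; cron_line and the file name cron_run.log are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print a daily crontab line that starts run.sh from its own directory
# and appends its output to a log file. cron_line and the log file name
# cron_run.log are illustrative assumptions.
cron_line() {
  local minute=$1 hour=$2 run_dir=$3 log_dir=$4
  printf '%s %s * * * cd %s && sh ./run.sh >> %s/cron_run.log 2>&1\n' \
    "$minute" "$hour" "$run_dir" "$log_dir"
}

cron_line 14 14 /home/dataexplorer1/run /home/dataexplorer1/logs
```

The printed line can be pasted into the editor opened by crontab -e in place of the simpler entry.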