Overview
Installing and using the cbaWorkflow package involves the following stages:
- installing the RPM with rpm or yum
- editing the configuration files (cbaWorkflow.ini for the C++ based programs and connection.ini for the Tomcat servlets)
- setting up the user's environment by adding the executables to PATH and the shared libraries to LD_LIBRARY_PATH. CBAWF_HOME should point to the installation directory (/app/cbaWorkflow); optionally, CBAWFINI should point to the configuration file (defaults to ${CBAWF_HOME}/etc/cbaWorkflow.ini)
- creating a repository that the workflow uses to operate and to gather statistics on node runs
- deploying the Tomcat interface from the WAR file
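The way CBAWF_HOME and CBAWFINI combine can be sketched in shell. This is a minimal illustration, assuming bash, that a non-empty CBAWFINI overrides the default and that CBAWF_HOME is always required; the helper name cbawf_ini_path is hypothetical and not part of the package.

```shell
# Hypothetical helper: print the configuration file cbaWorkflow programs
# would read. Assumption: a non-empty CBAWFINI wins, otherwise the default
# ${CBAWF_HOME}/etc/cbaWorkflow.ini is used.
cbawf_ini_path() {
  if [ -z "${CBAWF_HOME:-}" ]; then
    echo "CBAWF_HOME is not set" >&2
    return 1
  fi
  printf '%s\n' "${CBAWFINI:-${CBAWF_HOME}/etc/cbaWorkflow.ini}"
}
```

For example, `cat "$(cbawf_ini_path)"` shows whichever file is in effect for the current shell.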
Installation of RPM
- log in as root
- if the yum repository is set up, follow the instructions in "packaging cbaWorkflow" and execute: yum install cbaWorkflow
- if this is not possible, obtain the RPM (currently cbaWorkflow-1.0.0-1.x86_64.rpm) from Nexus, e.g. with wget, or transfer it by FTP:
wget http://192.168.2.22:8081/repository/cbayum-hosted/cbaWorkflow/1.0.0/1/cbaWorkflow-1.0.0-1.x86_64.rpm
- in the directory where the RPM is located, issue the following command: yum localinstall cbaWorkflow-1.0.0-1.x86_64.rpm
The yum output should look similar to this:
Installing:
 cbaWorkflow x86_64 1.0.0-1 /cbaWorkflow-1.0.0-1.x86_64 8.9 M

Transaction Summary
==========================================================================================================================================================
Install 1 Package

Total size: 8.9 M
Installed size: 8.9 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : cbaWorkflow-1.0.0-1.x86_64 1/1
Verifying : cbaWorkflow-1.0.0-1.x86_64 1/1

Installed:
cbaWorkflow.x86_64 0:1.0.0-1

Complete!
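If the RPM has to be fetched by hand for later releases, the download URL can be assembled from the version and release. This sketch assumes, without verification, that Nexus keeps the layout .../cbaWorkflow/<version>/<release>/<rpm> seen in the wget command above; the function and variable names are hypothetical.

```shell
# Hypothetical helpers. Assumption: every release follows the Nexus layout
# .../cbayum-hosted/cbaWorkflow/<version>/<release>/<rpm file>
CBAWF_NEXUS_BASE="http://192.168.2.22:8081/repository/cbayum-hosted/cbaWorkflow"

cbawf_rpm_name() {  # usage: cbawf_rpm_name VERSION RELEASE
  printf 'cbaWorkflow-%s-%s.x86_64.rpm\n' "$1" "$2"
}

cbawf_rpm_url() {   # usage: cbawf_rpm_url VERSION RELEASE
  printf '%s/%s/%s/%s\n' "$CBAWF_NEXUS_BASE" "$1" "$2" "$(cbawf_rpm_name "$1" "$2")"
}

# Usage (as root):
#   wget "$(cbawf_rpm_url 1.0.0 1)"
#   yum localinstall "$(cbawf_rpm_name 1.0.0 1)"
```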
Verify the installation:
ls -l /app/cbaWorkflow
rpm -qi cbaWorkflow
Name : cbaWorkflow
Version : 1.0.0
Release : 1
Architecture: x86_64
Install Date: Sat 27 Mar 2021 09:38:29 PM EDT
Group : Applications/File
Size : 9320360
License : available for purchase, free license up to 5 nodes
Signature : (none)
Source RPM : cbaWorkflow-1.0.0-1.src.rpm
Build Date : Sat 27 Mar 2021 04:29:19 PM EDT
Build Host : r02edge.custom-built-apps.com
Relocations : (not relocatable)
Packager : Boris Alexandrov, Custom Built Apps ltd
Vendor : Custom Built Apps ltd, Toronto, Ontario
URL : www.custom-built-apps.com
Summary : workflow application for data pipelines
Description :
The application runs a data pipeline of execution nodes like spark-sql, spark-submit, shell, hdfs commands in orchestration and cadence.
Contains a tomcat based interface for adding the nodes and monitoring the process
Check the list of installed files:
rpm -ql cbaWorkflow
/app/cbaWorkflow/bin/cbaWFMonitorTest
/app/cbaWorkflow/bin/cbaWorkflowTest
/app/cbaWorkflow/bin/repobuilder
/app/cbaWorkflow/bin/shmcreate
/app/cbaWorkflow/bin/shmupdate
/app/cbaWorkflow/bin/shmview
/app/cbaWorkflow/etc/cbaWorkflow.ini
/app/cbaWorkflow/etc/config6
/app/cbaWorkflow/etc/env.sh
/app/cbaWorkflow/etc/run.sh
/app/cbaWorkflow/lib/libcbaEndNode.so
/app/cbaWorkflow/lib/libcbaGraph.so
/app/cbaWorkflow/lib/libcbaHdfsExecNode.so
/app/cbaWorkflow/lib/libcbaNodeWalker.so
/app/cbaWorkflow/lib/libcbaShellNode.so
/app/cbaWorkflow/lib/libcbaSparkSQLNode.so
/app/cbaWorkflow/lib/libcbaSparkSubmitNode.so
/app/cbaWorkflow/lib/libcbaStartNode.so
/app/cbaWorkflow/lib/libcbaUtils.so
/app/cbaWorkflow/lib/libcbaWFRepositoryConnector.so
/app/cbaWorkflow/lib/libcbaWFRepositoryConnectorPG.so
/app/cbaWorkflow/lib/libcbaWFSharedMemory.so
/app/cbaWorkflow/lib/libcbaWorkflowNode.so
/app/cbaWorkflow/wars/cbaWorkflow.war
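The presence check can be scripted. This hypothetical helper, a bash sketch, looks for a subset of the files from the rpm -ql listing under a given installation directory, prints any that are missing and returns non-zero in that case; extend the list as needed.

```shell
# Hypothetical helper: report expected cbaWorkflow files that are missing.
# The list is a subset of the rpm -ql output above.
cbawf_check_files() {  # usage: cbawf_check_files INSTALL_DIR
  local dir=$1 missing=0 f
  for f in bin/repobuilder bin/shmcreate bin/shmupdate bin/shmview \
           etc/cbaWorkflow.ini etc/env.sh etc/run.sh \
           lib/libcbaUtils.so lib/libcbaWFRepositoryConnector.so \
           lib/libcbaWFRepositoryConnectorPG.so wars/cbaWorkflow.war; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `cbawf_check_files /app/cbaWorkflow && echo "all files present"`.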
Configuring the cbaWorkflow.ini file
The file is located in /app/cbaWorkflow/etc:
-- FILE CONTENTS
dbname=cbaworkflowdb
user=dataexplorer1
password=2MuchTime
hostaddr=192.168.2.30
port=5432
polling_time=15
nodes_log_dir=/home/dataexplorer1/logs
connector_lib=libcbaWFRepositoryConnectorPG.so
--- END OF FILE
The repository connector libcbaWFRepositoryConnectorPG.so (connector_lib) is used to connect to a PostgreSQL server.
Set dbname to the database in which the repository is going to be created.
Update the other parameters as needed.
Log in as the user who is going to run the pipeline.
Copy cbaWorkflow.ini into your home directory as a hidden file:
cp /app/cbaWorkflow/etc/cbaWorkflow.ini ${HOME}/.cbaWorkflow.ini
Create the directory for logs (the nodes_log_dir value):
mkdir /home/dataexplorer1/logs
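Values such as nodes_log_dir can be read from the file instead of retyped. A minimal sketch, assuming the flat key=value layout shown above (no spaces around =, no sections); ini_get is a hypothetical helper, not a cbaWorkflow tool.

```shell
# Hypothetical helper: print the value of KEY from a key=value file.
# Lines such as "-- FILE CONTENTS" never match "KEY=" and are ignored.
# If a key appears more than once, the last value wins; exit 1 if absent.
ini_get() {  # usage: ini_get FILE KEY
  awk -v key="$2" '
    index($0, key "=") == 1 { val = substr($0, length(key) + 2); found = 1 }
    END { if (found) print val; else exit 1 }
  ' "$1"
}
```

For example: `mkdir -p "$(ini_get "$HOME/.cbaWorkflow.ini" nodes_log_dir)"`.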
Setting the environment
Log in as the user who is going to run the data pipeline.
Copy /app/cbaWorkflow/etc/env.sh to your home directory:
cp /app/cbaWorkflow/etc/env.sh $HOME
chmod 755 env.sh
vi env.sh
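Verify that the variables match your environment.
Add the directory that contains the libstdc++.so.6 required by cbaWorkflow to LD_LIBRARY_PATH as the first entry, so that it is found before any other copy. For example, if the library is /usr/local/lib64/libstdc++.so.6, edit LD_LIBRARY_PATH as follows: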
export LD_LIBRARY_PATH=/usr/local/lib64:${CBAWF_HOME}/lib:${LD_LIBRARY_PATH}
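To keep LD_LIBRARY_PATH and PATH free of the duplicate entries that appear when env.sh is sourced more than once (as in the sample output below), the export line could use a helper like the one sketched here. path_prepend is hypothetical and assumes bash; it removes any existing copy of the directory and then puts it in front, so /usr/local/lib64 still ends up as the first entry.

```shell
# Hypothetical helper (bash): move DIR to the front of the colon-separated
# variable VARNAME, dropping any existing copies of DIR, and export it.
path_prepend() {  # usage: path_prepend VARNAME DIR
  local var=$1 dir=$2 cur
  cur=":${!var:-}:"                       # guard colons simplify matching
  while [[ $cur == *":$dir:"* ]]; do      # quoted part is matched literally
    cur=${cur/":$dir:"/:}
  done
  cur=${cur#:}                            # drop the guard colons again
  cur=${cur%:}
  export "$var=$dir${cur:+:$cur}"
}

# Possible use in env.sh (order matters: the last call ends up first):
#   path_prepend PATH "${CBAWF_HOME}/bin"
#   path_prepend LD_LIBRARY_PATH "${CBAWF_HOME}/lib"
#   path_prepend LD_LIBRARY_PATH /usr/local/lib64
```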
Source env.sh:
. env.sh
Verify the environment:
[dataexplorer1@r01edge ~]$ env | grep CBA
CBAWFINI=/home/dataexplorer1/.cbaWorkflow.ini
CBAWF_HOME=/app/cbaWorkflow
env | grep PATH
LD_LIBRARY_PATH=/usr/local/lib64:/app/cbaWorkflow/lib:/app/cbaWorkflow/lib:/app/hadoop/lib/native:
PATH=/app/cbaWorkflow/bin:/app/cbaWorkflow/bin:/app/spark/bin:/app/spark/sbin:/app/oozie/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/app/hadoop/sbin:/app/hadoop/bin}:/app/hadoop/sbin:/app/hadoop/bin:/home/hdfs/hadoop/sbin:/home/hdfs/hadoop/bin:/app/hive/bin:/app/spark/bin:/home/dataexplorer1/.local/bin:/home/dataexplorer1/bin
which repobuilder
/app/cbaWorkflow/bin/repobuilder
Verify that repobuilder has no missing libraries or unresolved symbols:
[dataexplorer1@r01edge ~]$ ldd -r $(which repobuilder)
linux-vdso.so.1 => (0x00007ffca33a8000)
librt.so.1 => /lib64/librt.so.1 (0x00007fc2a7f92000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc2a7d76000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc2a7b72000)
libcbaWFRepositoryConnector.so => /app/cbaWorkflow/lib/libcbaWFRepositoryConnector.so (0x00007fc2a796f000)
libcbaUtils.so => /app/cbaWorkflow/lib/libcbaUtils.so (0x00007fc2a772b000)
libstdc++.so.6 => /usr/local/lib64/libstdc++.so.6 (0x00007fc2a73b0000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc2a70ae000)
libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x00007fc2a6e97000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc2a6ac9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc2a819a000)
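The ldd -r check can be made scriptable by searching its output for the two kinds of problems it reports: libraries marked "not found" and "undefined symbol" lines. The unresolved-symbol messages are written to stderr, hence the 2>&1 in the usage line. check_ldd_output is a hypothetical helper.

```shell
# Hypothetical helper: read `ldd -r` output on stdin; print problem lines
# to stderr and return 1 if any library is missing or a symbol is unresolved.
check_ldd_output() {
  local problems
  problems=$(grep -E 'not found|undefined symbol' || true)
  if [ -n "$problems" ]; then
    printf '%s\n' "$problems" >&2
    return 1
  fi
}

# Usage:
#   ldd -r "$(command -v repobuilder)" 2>&1 | check_ldd_output && echo "repobuilder OK"
```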
Creating the repository
Verify that the values in cbaWorkflow.ini are correct for the database that is going to hold the repository.
The user in the connection parameters must be able to create tables in that database.
cat $CBAWFINI
Edit the file if needed.
Run the repository generator program, repobuilder:
repobuilder
Generating a new cbaWorkflow repository
Verify the tables have been created in the repository database:
Verify the node_types table has been populated:
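Both checks can also be run from the command line. The connection keys in cbaWorkflow.ini (dbname, user, password, hostaddr, port) are standard libpq keywords, so they can be joined into a connection string that psql accepts. A sketch, assuming values without spaces or quotes; ini_conninfo is a hypothetical helper, and the tables listed by \dt are whatever repobuilder created.

```shell
# Hypothetical helper: join the libpq keys of a key=value file into a
# connection string. Assumption: values contain no spaces or quotes.
ini_conninfo() {  # usage: ini_conninfo FILE
  awk -F= '
    $1 ~ /^(dbname|user|password|hostaddr|port)$/ {
      out = out (out == "" ? "" : " ") $1 "=" substr($0, length($1) + 2)
    }
    END { print out }
  ' "$1"
}

# Usage:
#   psql "$(ini_conninfo "$CBAWFINI")" -c '\dt'
#   psql "$(ini_conninfo "$CBAWFINI")" -c 'SELECT * FROM node_types;'
```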
Deploying the war file and setting the permissions in Tomcat
Deploy the war file using the Tomcat Manager application:
http://r01edge.custom-built-apps.com:1962/manager/html
Enter the user ID and password for the manager.
Deploy the application from /app/cbaWorkflow/wars/cbaWorkflow.war (browse to it if Tomcat runs on the same host; otherwise copy it to a location you can browse to).
Check the message and the path of the application:
Log in as a user who can deploy applications into Tomcat, most likely tomcat:
su - tomcat
Go to the webapps directory of the Tomcat installation:
cd /app/apache-tomcat-9.0.34/webapps
Go to the cbaWorkflow/WEB-INF/cgi folder:
cd /app/apache-tomcat-9.0.34/webapps/cbaWorkflow/WEB-INF/cgi
Make the file cadence executable:
chmod 755 cadence
Edit the connection parameters used by the servlets:
cd /app/apache-tomcat-9.0.34/webapps/cbaWorkflow/WEB-INF/classes
vi connection.ini
#connection information is kept here
url=jdbc:postgresql://192.168.2.30:5432/dataexplorerdb1
user=dataexplorer1
password=2MuchTime
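The servlets and the C++ programs presumably need to share one repository, yet the sample values name different databases (cbaworkflowdb in cbaWorkflow.ini, dataexplorerdb1 here). A sketch for comparing them, assuming a URL of the form jdbc:postgresql://HOST:PORT/DB with an explicit port and no query parameters; jdbc_parts is hypothetical.

```shell
# Hypothetical helper: split jdbc:postgresql://HOST:PORT/DB into
# "HOST PORT DB"; returns 1 for anything that is not a PostgreSQL JDBC URL.
jdbc_parts() {  # usage: jdbc_parts URL
  local rest=${1#jdbc:postgresql://}
  [ "$rest" != "$1" ] || return 1
  local hostport=${rest%%/*} db=${rest#*/}
  printf '%s %s %s\n' "${hostport%%:*}" "${hostport##*:}" "$db"
}

# Usage (run in WEB-INF/classes):
#   url=$(sed -n 's/^url=//p' connection.ini)
#   read -r host port db <<< "$(jdbc_parts "$url")"
#   grep -qxF "dbname=$db" "$CBAWFINI" || echo "connection.ini uses $db, cbaWorkflow.ini does not"
```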
Browse to the main page of the application in the web browser:
http://r01edge.custom-built-apps.com:1962/cbaWorkflow/
You should see this:
The nodes list and "add a node dependency" pages will be empty.
"Add a node" shows the form you will use to add the nodes.
"Show execution graph" shows the start and end nodes:
Creating directories for run and logs
Create a directory for the workflow run. The scripts, jars and executables for the workflow will reside here, and the workflow should be started from this directory. Copy ${CBAWF_HOME}/etc/run.sh to this directory:
mkdir $HOME/run
cp ${CBAWF_HOME}/etc/run.sh ${HOME}/run
Verify the variables in run.sh and update them if needed; for example, JAVA_HOME is probably wrong for your host.
Create the logs directory if it does not exist yet:
mkdir -p $HOME/logs
Put the scripts and jars for the workflow in the ${HOME}/run directory.
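These steps can be made safe to rerun. A bash sketch, assuming the run and logs directories live directly under the given base directory as above; setup_run_dirs is hypothetical, and it only copies run.sh when none exists, so local edits such as JAVA_HOME are not overwritten.

```shell
# Hypothetical helper: create BASE/run and BASE/logs and copy run.sh from
# the installation only if BASE/run/run.sh does not exist yet.
setup_run_dirs() {  # usage: setup_run_dirs BASE_DIR CBAWF_HOME
  local base=$1 home=$2
  mkdir -p "$base/run" "$base/logs" || return 1
  if [ ! -e "$base/run/run.sh" ]; then
    cp "$home/etc/run.sh" "$base/run/run.sh" || return 1
  fi
}
```

For example: `setup_run_dirs "$HOME" "$CBAWF_HOME"`.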
Add nodes and dependencies in the web interface
Start adding nodes with the "add a node" link:
The nodes now look like this:
Add the node dependencies:
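Setting up a cron job
The etc directory contains the run.sh file, which has already been copied to the run directory.
Inspect it to make sure its variables match the current environment.
Make it executable:
chmod 755 ${HOME}/run/run.sh
Make a crontab entry (this example runs the pipeline every day at 14:14):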
crontab -e
14 14 * * * sh /home/dataexplorer1/run/run.sh
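The entry above has minute 14 and hour 14 with every day, month and weekday. A sketch for adding it without creating duplicates when the setup is repeated; crontab_add_line and CRON_LINE are hypothetical. The cd into the run directory is an assumption, needed only if run.sh does not change to that directory itself, since the workflow should be started from there. Cron also starts jobs with a minimal environment, so run.sh has to set whatever env.sh normally provides.

```shell
# Hypothetical example line; the "cd" is an assumption (see above).
CRON_LINE='14 14 * * * cd /home/dataexplorer1/run && sh ./run.sh'

# Read the current crontab on stdin, write it to stdout with LINE appended
# unless an identical line is already present.
crontab_add_line() {  # usage: crontab_add_line LINE
  local line=$1 current
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage (as the pipeline user):
#   { crontab -l 2>/dev/null || true; } | crontab_add_line "$CRON_LINE" | crontab -
```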