cbaWorkflow application

Custom Built Apps ltd offers the cbaWorkflow application.

cbaWorkflow is a Linux-based workflow application that creates a set of execution nodes for Big Data and data warehouse projects. It is written in C++ and uses free Linux libraries.

cbaWorkflow keeps track of the execution schedule and the states of the nodes, and verifies data consistency in a set of non-blocking components coordinated by the main execution engine. The execution status is kept in a map in shared memory, so service applications can access the map to modify it if required, or to obtain real-time information. Other applications can check the execution status from the database repository table. cbaWorkflow persists the map to the database at a set interval.
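The persistence cycle described above can be pictured with a small sketch. It is illustrative only: cbaWorkflow itself is written in C++ and keeps the map in shared memory, whereas this Python sketch uses an ordinary dictionary. The class name, the node names, the state strings and the persist callback are all hypothetical.

```python
import time
from typing import Callable, Dict


class StatusMap:
    """Toy stand-in for the execution status map (hypothetical, not the real API)."""

    def __init__(self, persist: Callable[[Dict[str, str]], None],
                 interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._states: Dict[str, str] = {}
        self._persist = persist            # e.g. writes a snapshot to a repository table
        self._interval = interval_seconds
        self._clock = clock                # injectable so the logic can be tested
        self._last_persist = clock()

    def set_state(self, node: str, state: str) -> None:
        """Record the latest state of one node."""
        self._states[node] = state

    def get_state(self, node: str) -> str:
        """Return the current in-memory state of one node."""
        return self._states[node]

    def tick(self) -> bool:
        """Persist a snapshot if the interval has elapsed; return True if it did."""
        now = self._clock()
        if now - self._last_persist >= self._interval:
            self._persist(dict(self._states))  # copy, so later updates do not leak in
            self._last_persist = now
            return True
        return False
```

Readers of the in-memory map always see the latest state, while readers of the persisted copy may lag by up to one interval.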

General description

Overall process

Resources

key elements used:

The workflow framework calls applications written in different languages, collects their execution result codes, and parses their logs.
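The idea of calling an external program, reading its result code and scanning its log can be sketched as follows. This is not the cbaWorkflow implementation (which is C++); the function name run_node and the error pattern are assumptions made for illustration.

```python
import re
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical pattern; real nodes may need their own rules.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|Exception)\b")


def run_node(command: Sequence[str]) -> Tuple[int, List[str]]:
    """Run one external program; return its exit code and the suspicious log lines."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into the captured log
        text=True,
        check=False,                # a non-zero exit code is a result, not a crash
    )
    error_lines = [line for line in completed.stdout.splitlines()
                   if ERROR_PATTERN.search(line)]
    return completed.returncode, error_lines


# Example: a child process that logs an error and exits with code 3.
code, errors = run_node(
    [sys.executable, "-c", "print('ERROR: demo'); raise SystemExit(3)"]
)
```

Because the command is passed as an argument list, the same function can start a shell script, a spark-submit call or any other executable without going through shell quoting.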

Installation and configuration instructions

Installation of RPM

  1. Log in as root.
  2. If the yum repository is set up, follow the instructions in "packaging cbaWorkflow" and execute: yum install cbaWorkflow
  3. If the yum-hosted Nexus repository has not been set up, obtain the rpm (currently cbaWorkflow-1.0.5-1.x86_64.rpm) from Nexus, for example with wget: wget http://192.168.2.22:8081/repository/cbayum-hosted/cbaWorkflow/1.0.5/1/cbaWorkflow-1.0.5-1.x86_64.rpm

    In the directory where the rpm is located, issue the following command, substituting the name of the file you downloaded: yum localinstall cbaWorkflow-1.0.0-1.x86_64.rpm

    The sample output in this section comes from the 1.0.0-1 build:
    Installing: cbaWorkflow x86_64 1.0.0-1 /cbaWorkflow-1.0.0-1.x86_64 8.9 M

  4. Verify the installation:
    ls -l /app/cbaWorkflow

    rpm -qi cbaWorkflow

    Name        : cbaWorkflow
    Version     : 1.0.0
    Release     : 1
    Architecture: x86_64
    Install Date: Sat 27 Mar 2021 09:38:29 PM EDT
    Group       : Applications/File
    Size        : 9320360
    License     : available for purchase, free license up to 5 nodes
    Signature   : (none)
    Source RPM  : cbaWorkflow-1.0.0-1.src.rpm
    Build Date  : Sat 27 Mar 2021 04:29:19 PM EDT
    Build Host  : r02edge.custom-built-apps.com
    Relocations : (not relocatable)
    Packager    : Boris Alexandrov, Custom Built Apps ltd
    Vendor      : Custom Built Apps ltd, Toronto, Ontario
    URL         : www.custom-built-apps.com
    Summary     : workflow application for data pipelines
    Description :
    The application runs a data pipeline of execution nodes like spark-sql, spark-submit, shell, hdfs commands in orchestration and cadence.
    Contains a tomcat based interface for adding the nodes and monitoring the process

    Check the file list:
    rpm -ql cbaWorkflow
    /app/cbaWorkflow/bin/cbaWFMonitorTest
    /app/cbaWorkflow/bin/cbaWorkflowTest
    /app/cbaWorkflow/bin/repobuilder
    /app/cbaWorkflow/bin/shmcreate
    /app/cbaWorkflow/bin/shmupdate
    /app/cbaWorkflow/bin/shmview
    /app/cbaWorkflow/etc/cbaWorkflow.ini
    /app/cbaWorkflow/etc/config6
    /app/cbaWorkflow/etc/env.sh
    /app/cbaWorkflow/etc/run.sh
    /app/cbaWorkflow/lib/libcbaEndNode.so
    /app/cbaWorkflow/lib/libcbaGraph.so
    /app/cbaWorkflow/lib/libcbaHdfsExecNode.so
    /app/cbaWorkflow/lib/libcbaNodeWalker.so
    /app/cbaWorkflow/lib/libcbaShellNode.so
    /app/cbaWorkflow/lib/libcbaSparkSQLNode.so
    /app/cbaWorkflow/lib/libcbaSparkSubmitNode.so
    /app/cbaWorkflow/lib/libcbaStartNode.so
    /app/cbaWorkflow/lib/libcbaUtils.so
    /app/cbaWorkflow/lib/libcbaWFRepositoryConnector.so
    /app/cbaWorkflow/lib/libcbaWFRepositoryConnectorPG.so
    /app/cbaWorkflow/lib/libcbaWFSharedMemory.so
    /app/cbaWorkflow/lib/libcbaWorkflowNode.so
    /app/cbaWorkflow/wars/cbaWorkflow.war
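When the check is scripted, the rpm -qi output can be parsed and compared with the rpm file name that was installed. A minimal sketch, assuming the "Key : value" layout shown above; the function names are made up for this example.

```python
from typing import Dict


def parse_rpm_info(text: str) -> Dict[str, str]:
    """Turn the 'Key : value' lines of `rpm -qi` output into a dictionary."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")   # split on the first colon only
        if sep and key.strip():
            # Keep the first occurrence so description text cannot overwrite a field.
            info.setdefault(key.strip(), value.strip())
    return info


def package_label(info: Dict[str, str]) -> str:
    """Rebuild the name-version-release.arch label used in rpm file names."""
    return "{Name}-{Version}-{Release}.{Architecture}".format(**info)
```

For the output above, package_label yields cbaWorkflow-1.0.0-1.x86_64, which can be compared with the name of the rpm file that was downloaded.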

Configuring the cbaWorkflow.ini file


The file is located in /app/cbaWorkflow/etc.
--- FILE CONTENTS
dbname=cbaworkflowdb
user=dataexplorer1
password=2MuchTime
hostaddr=192.168.2.30
port=5432
polling_time=15
nodes_log_dir=/home/dataexplorer1/logs
connector_lib=libcbaWFRepositoryConnectorPG.so
--- END OF FILE
The repository connector libcbaWFRepositoryConnectorPG.so is used to connect to a PostgreSQL server.
Set dbname to the database where the repository is going to be created.
Update the other parameters as needed.
Log in as the user who is going to run the pipeline.
Copy cbaWorkflow.ini into your home directory as a hidden file:
cp /app/cbaWorkflow/etc/cbaWorkflow.ini ${HOME}/.cbaWorkflow.ini
Create the directory for logs (the path given in nodes_log_dir):
mkdir /home/dataexplorer1/logs
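To sanity-check the file before running anything, the flat key=value format can be parsed with a few lines of code. The sketch below is an assumption-laden illustration, not how cbaWorkflow reads the file: it relies only on the fact that dbname, user, password, hostaddr and port are also libpq connection keywords. Whether the connector passes them to libpq in this form is not stated in this document.

```python
from typing import Dict

# Keys from the sample file that are also libpq connection keywords.
CONNECTION_KEYS = ("dbname", "user", "password", "hostaddr", "port")


def parse_ini(text: str) -> Dict[str, str]:
    """Parse flat key=value lines; skip blank lines and the '--' marker lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def conninfo(settings: Dict[str, str]) -> str:
    """Build a libpq keyword/value connection string from the settings."""
    parts = []
    for key in CONNECTION_KEYS:
        if key in settings:
            # libpq quoting: wrap in single quotes, escape backslash and quote.
            value = settings[key].replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"{key}='{value}'")
    return " ".join(parts)
```

The file has no [section] headers, so Python's configparser would reject it as-is, which is why the sketch parses the lines by hand.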

Setting the environment


Log in as the user who is going to run the data pipeline.
Copy /app/cbaWorkflow/etc/env.sh to your home directory:
cp /app/cbaWorkflow/etc/env.sh $HOME
chmod 755 env.sh
vi env.sh
Verify that the variables match your environment.
Add the directory that contains libstdc++.so.6 to LD_LIBRARY_PATH as the first entry. For example, if the library is /usr/local/lib64/libstdc++.so.6,
edit LD_LIBRARY_PATH as follows:
export LD_LIBRARY_PATH=/usr/local/lib64:${CBAWF_HOME}/lib:${LD_LIBRARY_PATH}
Source env.sh:
. env.sh
Verify the environment:
[dataexplorer1@r01edge ~]$ env | grep CBA
CBAWFINI=/home/dataexplorer1/.cbaWorkflow.ini
CBAWF_HOME=/app/cbaWorkflow
env | grep PATH
LD_LIBRARY_PATH=/usr/local/lib64:/app/cbaWorkflow/lib:/app/cbaWorkflow/lib:/app/hadoop/lib/native:
PATH=/app/cbaWorkflow/bin:/app/cbaWorkflow/bin:/app/spark/bin:/app/spark/sbin:/app/oozie/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/app/hadoop/sbin:/app/hadoop/bin}:/app/hadoop/sbin:/app/hadoop/bin:/home/hdfs/hadoop/sbin:/home/hdfs/hadoop/bin:/app/hive/bin:/app/spark/bin:/home/dataexplorer1/.local/bin:/home/dataexplorer1/bin
which repobuilder
/app/cbaWorkflow/bin/repobuilder
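The sample output above contains repeated entries (for example /app/cbaWorkflow/lib twice), which typically happens when env.sh is sourced more than once. A small helper can prepend a directory while dropping duplicates; the function name prepend_path is hypothetical and the sketch is independent of env.sh.

```python
from typing import List


def prepend_path(entry: str, current: str) -> str:
    """Put `entry` first in a colon-separated path list; drop duplicates and empties."""
    seen = set()
    result: List[str] = []
    for item in [entry] + current.split(":"):
        if item and item not in seen:   # skip empty entries and repeats
            seen.add(item)
            result.append(item)
    return ":".join(result)
```

Empty entries are dropped on purpose: the dynamic loader treats an empty element of LD_LIBRARY_PATH, such as the one produced by a trailing colon, as the current directory.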
Verify that there are no unresolved symbols in repobuilder:
[dataexplorer1@r01edge ~]$ ldd -r $(which repobuilder)
linux-vdso.so.1 => (0x00007ffca33a8000)
librt.so.1 => /lib64/librt.so.1 (0x00007fc2a7f92000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc2a7d76000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc2a7b72000)
libcbaWFRepositoryConnector.so => /app/cbaWorkflow/lib/libcbaWFRepositoryConnector.so (0x00007fc2a796f000)
libcbaUtils.so => /app/cbaWorkflow/lib/libcbaUtils.so (0x00007fc2a772b000)
libstdc++.so.6 => /usr/local/lib64/libstdc++.so.6 (0x00007fc2a73b0000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc2a70ae000)
libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x00007fc2a6e97000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc2a6ac9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc2a819a000)
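A clean listing like the one above contains no "not found" or "undefined symbol" lines. When the check is automated, the output can be filtered as in the sketch below. The message wording is assumed to be that of the glibc ldd found on common Linux distributions, and the function name is made up for this example.

```python
from typing import List


def ldd_problems(ldd_output: str) -> List[str]:
    """Return the lines of `ldd -r` output that report a missing library or symbol."""
    problems = []
    for line in ldd_output.splitlines():
        text = line.strip()
        if "not found" in text or "undefined symbol" in text:
            problems.append(text)
    return problems
```

An empty result means every library and symbol was resolved. Any returned line names the library or symbol to fix, usually by correcting LD_LIBRARY_PATH.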

Creating the repository


Verify that the values in cbaWorkflow.ini are correct for the database that is going to hold the repository.
The user in the connection parameters must be able to create tables in that database.
cat $CBAWFINI
Edit the file if needed.
Run the repository generation program, repobuilder:
repobuilder
Generating a new cbaWorkflow repository
Verify that the tables have been created in the repository database, using the PostgreSQL interface: