cbaWorkflow : Generating data for workflow testing

The data for running the scripts and shell commands is generated as follows:

1. Import a set of books from Project Gutenberg: https://www.gutenberg.org/ebooks/
2. Use the plain-text (txt) versions of the books.
3. Put the books into the /stage directory of the development server.

4. Create the HDFS directories:
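The page does not show the command for this step. A minimal sketch follows, assuming the target directory is books/herodotus1 (the path used by the upload loop in step 5). The helper name make_hdfs_dir and the DRY_RUN switch are hypothetical and added for illustration only.

```shell
# Assumption: directory name taken from the upload loop in step 5.
HDFS_DIR="books/herodotus1"

# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
# The -p flag also creates the parent directory (books) and does not fail
# if the directory already exists.
make_hdfs_dir() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "hdfs dfs -mkdir -p $1"
    else
        hdfs dfs -mkdir -p "$1"
    fi
}

# Usage (commented out so that sourcing this sketch has no side effects):
# make_hdfs_dir "$HDFS_DIR"
```

Running it once with DRY_RUN=1 shows the exact command before anything is created in HDFS.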

5. Create files in the HDFS directories by uploading multiple copies of the same file:

# Upload 480 copies of the book, named hdw1.txt through hdw480.txt
CNT=1
for opt in $(seq 480)
do
    hdfs dfs -put herodotus1.txt books/herodotus1/hdw${CNT}.txt
    let CNT++
done

6. Verify that the total data size is sufficient for the applications to run for at least 5 minutes:

hdfs dfs -du -h books

hdfs dfs -count -h -q -v books
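The size check can also be scripted. The sketch below is one possible approach, not part of the original procedure: it reads the total size in bytes from the first column of `hdfs dfs -du -s` and compares it with a minimum. The page does not say how many bytes correspond to a 5-minute run, so MIN_BYTES (1 GiB here) is a placeholder assumption, and both function names are hypothetical.

```shell
# Placeholder threshold in bytes (1 GiB); the real value depends on the
# applications under test and is not given in the source.
MIN_BYTES=$((1024 * 1024 * 1024))

# Succeeds when the first argument (bytes) is at least the second.
size_is_sufficient() {
    [ "$1" -ge "$2" ]
}

# Total size of an HDFS path in bytes: first column of `hdfs dfs -du -s`.
hdfs_total_bytes() {
    hdfs dfs -du -s "$1" | awk '{print $1}'
}

# Usage (commented out so that sourcing this sketch has no side effects):
# if size_is_sufficient "$(hdfs_total_bytes books)" "$MIN_BYTES"; then
#     echo "books is large enough"
# else
#     echo "books is too small; upload more copies"
# fi
```

Keeping the comparison separate from the HDFS call means the threshold logic can be checked without a cluster.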
