cbaWorkflow : Generating data for workflow testing

Overview

The data for running the scripts and shell commands is generated as follows:

  1. A set of books from Project Gutenberg is downloaded from https://www.gutenberg.org/ebooks/
  2. The books are in plain-text (.txt) format.
  3. The books are placed in the /stage directory of the development server.
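
Steps 1 to 3 could be scripted roughly as follows. This is only a sketch: the ebook IDs, the `fetch_books` function name, the `RUN` dry-run switch and the `https://www.gutenberg.org/ebooks/<id>.txt.utf-8` URL form are assumptions that do not come from this page, and should be checked against the Project Gutenberg site before use.

```shell
#!/usr/bin/env bash
# Sketch: stage plain-text books from Project Gutenberg.
# Assumptions (not in the original page): the ebook IDs passed in,
# the URL form below, and the RUN dry-run switch.

STAGE_DIR=${STAGE_DIR:-/stage}   # target directory from step 3
RUN=${RUN-echo}                  # default: print commands; invoke with RUN= to execute

# fetch_books ID... : download each ebook as <id>.txt into STAGE_DIR
fetch_books() {
  local id
  for id in "$@"; do
    $RUN curl -fsSL -o "${STAGE_DIR}/${id}.txt" \
      "https://www.gutenberg.org/ebooks/${id}.txt.utf-8"
  done
}
```

By default, calling `fetch_books` with one or more ebook IDs only prints the `curl` commands, so they can be reviewed before anything is downloaded.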

4. Create the HDFS directories:
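
The command for this step is not recorded on the page. A minimal sketch, assuming the target directory is the `books/herodotus1` path used by the upload loop in step 5 (the `make_dirs` function and the `RUN` dry-run switch are illustrative names, not part of the original procedure):

```shell
RUN=${RUN-echo}   # default: print commands; invoke with RUN= to execute

# make_dirs DIR... : create each HDFS directory, including missing parents
make_dirs() {
  local dir
  for dir in "$@"; do
    $RUN hdfs dfs -mkdir -p "$dir"
  done
}

# Assumed directory, taken from the paths used in step 5
make_dirs books/herodotus1
```

The `-p` option creates missing parent directories and does not fail if the directory already exists, so the step can be repeated safely.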


5. Create files in the HDFS directories by uploading multiple copies of the same file:

# Upload 480 copies of the same book, each under a distinct name
for CNT in $(seq 480)
do
  hdfs dfs -put herodotus1.txt books/herodotus1/hdw${CNT}.txt
done
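
The number of copies can be estimated up front rather than guessed. The sketch below uses ceiling division; the `copies_needed` function and the idea of a target size in bytes are assumptions for illustration, since the page does not say how the figure of 480 was chosen.

```shell
# copies_needed FILE_BYTES TARGET_BYTES
# Print the smallest number of copies whose combined size reaches
# TARGET_BYTES (integer ceiling division).
copies_needed() {
  echo $(( ($2 + $1 - 1) / $1 ))
}
```

For example, `copies_needed "$(wc -c < herodotus1.txt)" "$TARGET_BYTES"` gives a value that could replace 480 in the loop, where `TARGET_BYTES` is a hypothetical variable holding the desired data volume.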


6. Verify that the data size is sufficient for the applications to run for at least 5 minutes:

hdfs dfs -du -h books

hdfs dfs -count -h -q -v books
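
The check can also be automated. This is a sketch under stated assumptions: `MIN_BYTES` is a placeholder threshold (the page does not say how many bytes correspond to a 5-minute run), and `total_bytes` and `size_ok` are hypothetical helper names. It relies on `hdfs dfs -du -s` printing the size in bytes as the first field when `-h` is omitted.

```shell
# Assumption: placeholder threshold of 1 GiB; tune it to the actual workload
MIN_BYTES=${MIN_BYTES:-1073741824}

# total_bytes : read `hdfs dfs -du -s <dir>` output on stdin and print
# the first field of the first line (the size in bytes)
total_bytes() {
  awk 'NR == 1 { print $1 }'
}

# size_ok BYTES MIN : succeed if BYTES is at least MIN
size_ok() {
  [ "$1" -ge "$2" ]
}

# Typical use on a host with the hdfs client installed:
#   bytes=$(hdfs dfs -du -s books | total_bytes)
#   size_ok "$bytes" "$MIN_BYTES" && echo "enough data" || echo "add more copies"
```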








