Overview
The data for running the scripts and shell commands is generated as follows:
1. A set of books from Project Gutenberg is downloaded from https://www.gutenberg.org/ebooks/
2. The books are in plain-text (.txt) format
3. The books are placed in the /stage directory of the development server
4. Create the HDFS directories:
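The notes do not list the command for this step. A minimal sketch, assuming the target directory is books/herodotus1 (the path used by the copy loop) relative to the user's HDFS home directory. The function name create_book_dirs and the HDFS_CMD override are hypothetical and exist only so the command can be dry-run with echo:

```shell
# Assumption: books/herodotus1 is the directory the copy loop writes into.
# HDFS_CMD is a hypothetical override; it defaults to the real hdfs client.
create_book_dirs() {
    local hdfs_cmd="${HDFS_CMD:-hdfs}"
    # -p creates missing parent directories and does not fail if the path exists
    "$hdfs_cmd" dfs -mkdir -p "$@"
}

# Usage: create_book_dirs books/herodotus1
```

Setting HDFS_CMD=echo before calling the function prints the command instead of running it, which is handy for checking the paths first.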
5. Create files in the HDFS directories by uploading multiple copies of the same file:
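# Upload 480 copies of herodotus1.txt under distinct names (hdw1.txt ... hdw480.txt)
CNT=1   # file-name counter
for opt in $(seq 480)
do
    hdfs dfs -put herodotus1.txt books/herodotus1/hdw${CNT}.txt
    CNT=$((CNT + 1))
done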
6. Verify that the total size is sufficient for the applications to run for at least 5 minutes:
hdfs dfs -du -h books
hdfs dfs -count -h -q -v books
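The check can also be scripted. A rough sketch, assuming the first field printed by hdfs dfs -du -s is the total size in bytes. The function name check_min_size, the HDFS_CMD override, and any threshold passed in are hypothetical; the notes do not say how many bytes correspond to a 5-minute run, so that number has to be found by experiment:

```shell
# Hypothetical helper: succeed only if PATH holds at least MIN_BYTES bytes.
# HDFS_CMD is a hypothetical override; it defaults to the real hdfs client.
check_min_size() {
    local path="$1" min_bytes="$2" hdfs_cmd="${HDFS_CMD:-hdfs}"
    local total
    # -du -s prints a single summary line; the first field is the size in bytes
    total=$("$hdfs_cmd" dfs -du -s "$path" | awk '{print $1}')
    if [ "$total" -ge "$min_bytes" ]; then
        echo "OK: $path holds $total bytes"
    else
        echo "TOO SMALL: $path holds $total bytes, need at least $min_bytes"
        return 1
    fi
}

# Usage (the threshold is a placeholder, not a measured value):
#   check_min_size books 1000000000
```

The non-zero return status on failure lets a setup script stop before launching the applications on too little data.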