The hdfs environment is configured, how do I test it? #111

Open
EthanTang-Zq opened this issue Jul 19, 2021 · 0 comments


How do I run the commands for the tests I want to perform?

For example:

  1. TestDFSIO
    TestDFSIO tests the I/O performance of HDFS. It uses a MapReduce job to perform read and write operations concurrently, with each map task reading or writing one file. The map output carries the statistics for the file it processed, and these statistics are then accumulated to generate a summary.

View instructions:

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
TestDFSIO
TestDFSIO.1.7
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes]

  1.1. Test HDFS write performance
    Test content: write ten 128 MB files to the HDFS cluster:

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
TestDFSIO \
-write \
-nrFiles 10 \
-size 128MB \
-resFile /tmp/TestDFSIO_results.log
Note: The job is run on Hadoop as the hdfs user. The path of the local result log does not have to be specified, but in that case the command must be run from a directory the hdfs user can write to, because the log is generated in the working directory. Otherwise, specify the path with -resFile.
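As a minimal sketch of how the write test might be launched from a shell on a cluster node: the multi-line command has to be entered as one command, either on a single line or with a trailing backslash on every line but the last. The sketch assumes an `hdfs` account that `sudo` may switch to and that `/tmp` is writable by it. `DRY_RUN` and `run` are hypothetical helper names used only for illustration; they are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: launch the TestDFSIO write test as the hdfs user.
# DRY_RUN and run() are hypothetical helpers for illustration, not Hadoop features.
JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = actually execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # show the command that would be executed
  else
    "$@"                 # execute the command with its arguments unchanged
  fi
}

# The trailing backslashes join the lines into one command.
run sudo -u hdfs hadoop jar "$JAR" TestDFSIO \
  -write -nrFiles 10 -size 128MB \
  -resFile /tmp/TestDFSIO_results.log
```

With the default `DRY_RUN=1` the script only prints the command, which is a cheap way to check it before running with `DRY_RUN=0` on the cluster.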

View Results:

cat /tmp/TestDFSIO_results.log
----- TestDFSIO -----: write
Date & time: Thu Jun 27 13:46:41 CST 2019
Number of files: 10
Total MBytes processed: 1280.0
Throughput mb/sec: 16.125374788984352
Average IO rate mb/sec: 17.224742889404297
IO rate std deviation: 4.657439940376364
Test exec time sec: 28.751
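To compare runs it can help to pull single values out of the report. The following is a rough sketch that parses the sample report shown above with `awk`; the `field` function is a hypothetical helper, and it assumes each line has the form `label: value`. On a real run, the text would come from the file given to `-resFile` rather than from the embedded sample.

```shell
#!/usr/bin/env bash
# Sample report copied from the output above; on a real run, read the -resFile path instead.
report=$(cat <<'EOF'
----- TestDFSIO -----: write
Date & time: Thu Jun 27 13:46:41 CST 2019
Number of files: 10
Total MBytes processed: 1280.0
Throughput mb/sec: 16.125374788984352
Average IO rate mb/sec: 17.224742889404297
IO rate std deviation: 4.657439940376364
Test exec time sec: 28.751
EOF
)

# field LABEL: print the value after ": " on the line whose label equals LABEL.
# Leading whitespace before the label is ignored (hypothetical helper).
field() {
  printf '%s\n' "$report" |
    awk -F': ' -v key="$1" '{ label = $1; sub(/^[ \t]+/, "", label) }
                            label == key { print $2 }'
}

throughput=$(field "Throughput mb/sec")
files=$(field "Number of files")
total_mb=$(field "Total MBytes processed")
echo "throughput=$throughput files=$files total_mb=$total_mb"
```

The total of 1280.0 MB matches the test parameters: 10 files of 128 MB each.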

  1.2. Test HDFS read performance
    Test content: read ten 128 MB files from the HDFS cluster:

sudo -u hdfs hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
TestDFSIO \
-read \
-nrFiles 10 \
-size 128MB \
-resFile /tmp/TestDFSIO_results.log

  1.3. Clear test data

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
TestDFSIO -clean
19/06/27 13:57:21 INFO fs.TestDFSIO: TestDFSIO.1.7
19/06/27 13:57:21 INFO fs.TestDFSIO: nrFiles = 1
19/06/27 13:57:21 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
19/06/27 13:57:21 INFO fs.TestDFSIO: bufferSize = 1000000
19/06/27 13:57:21 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/06/27 13:57:22 INFO fs.TestDFSIO: Cleaning up test files

  2. nnbench
    nnbench tests the load on the NameNode. It generates a large number of HDFS-related requests to put the NameNode under heavy pressure. The test can simulate creating, reading, renaming and deleting files on HDFS.

View instructions:

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
nnbench -help
NameNode Benchmark 0.4
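Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>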
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
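According to the help text, `-startTime` is given in seconds since the epoch and defaults to launch time plus 2 minutes. As a small sketch, an explicit value could be computed in the shell like this; the 120-second offset simply mirrors the documented default, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Compute an nnbench start time two minutes from now, in seconds since the epoch.
now=$(date +%s)
start_time=$(( now + 120 ))   # 120 s mirrors the default "launch time + 2 mins"
echo "-startTime $start_time"
```

The printed option can then be appended to the nnbench command line if all maps should start at a fixed moment.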

The test uses 10 mappers and 5 reducers to create 1000 files:

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
-operation create_write \
-maps 10 \
-reduces 5 \
-blockSize 1 \
-bytesToWrite 0 \
-numberOfFiles 1000 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true \
-baseDir /benchmarks/NNBench-hostname

Results stored on HDFS:

  3. mrbench
    mrbench runs a small job repeatedly to check whether small jobs run repeatably and efficiently on the cluster.

View instructions:

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
mrbench -help
MRBenchmark.0.0.2
Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns < number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]

Run a job 50 times as a test:

hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
mrbench \
-numRuns 50 \
-maps 10 \
-reduces 5 \
-inputLines 10 \
-inpu

How do I execute these commands?
