Exam Code: CCA-500
Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH)
Certification Provider: Cloudera
Q1. You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive JVM garbage collection. How do you increase the JVM heap size to 3 GB to optimize performance?
A. yarn.application.child.java.opts=-Xsx3072m
B. yarn.application.child.java.opts=-Xmx3072m
C. mapreduce.map.java.opts=-Xms3072m
D. mapreduce.map.java.opts=-Xmx3072m
Answer: D
Explanation: mapreduce.map.java.opts sets the JVM options for map task containers; -Xmx3072m raises the maximum heap to 3 GB, whereas -Xms only sets the initial heap size. Reference: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
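Note: the same setting can also be applied per job at submission time; a minimal sketch, assuming the driver class uses ToolRunner so that -D generic options are honored (jar name, class name, and paths are placeholders):
    # Per-job override of the map-task heap; the container size is kept a bit larger than the heap
    hadoop jar MyJob.jar MyDriver \
      -D mapreduce.map.java.opts=-Xmx3072m \
      -D mapreduce.map.memory.mb=3584 \
      /input /output
Setting it cluster-wide instead means editing mapred-site.xml, where mapreduce.map.memory.mb must likewise stay larger than the configured heap.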
Q2. Given:
You want to clean up this list by removing jobs where the State is KILLED. Which command do you enter?
A. yarn application -refreshJobHistory
B. yarn application -kill application_1374638600275_0109
C. yarn rmadmin -refreshQueue
D. yarn rmadmin -kill application_1374638600275_0109
Answer: B
Explanation: Reference:http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_using-apache-hadoop/content/common_mrv2_commands.html
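Note: a short sketch of the relevant CLI (the application ID is the one shown in the question; -appStates and -kill are standard yarn application options):
    # List applications filtered by state
    yarn application -list -appStates KILLED
    # Kill a specific running application by ID
    yarn application -kill application_1374638600275_0109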
Q3. Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functional cluster?
A. Yes. The daemon will receive data from the NameNode to run Map tasks
B. Yes. The daemon will get data from another (non-local) DataNode to run Map tasks
C. Yes. The daemon will receive Map tasks only
D. Yes. The daemon will receive Reducer tasks only
Answer: B
Q4. For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?
A. Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode
B. Cached in the YARN container running the task, then copied into HDFS on job completion
C. In HDFS, in the directory of the user who generates the job
D. On the local disk of the slave node running the task
Answer: D
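Note: container logs stay under the NodeManager's local log directories on the slave node; a brief sketch of where to look, using placeholder IDs (yarn.nodemanager.log-dirs and yarn.log-aggregation-enable are standard YARN properties):
    # Container logs on the slave node that ran the task
    ls <value of yarn.nodemanager.log-dirs>/<application_id>/<container_id>/
    # Only if log aggregation is enabled (yarn.log-aggregation-enable=true) are the logs of a
    # finished job also copied to HDFS, where they can be fetched with:
    yarn logs -applicationId <application_id>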
Q5. You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
A. HDFS is almost full
B. The NameNode goes down
C. A DataNode is disconnected from the cluster
D. Map or reduce tasks that are stuck in an infinite loop
E. MapReduce jobs are causing excessive memory swaps
Answer: D
Explanation: A task stuck in an infinite loop still reports itself as running, so standard monitoring does not flag it; a nearly full HDFS, a downed NameNode, a disconnected DataNode, and excessive memory swapping are all surfaced by properly configured monitoring.
Q6. Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?
A. SampleJar.jar is sent to the ApplicationMaster which allocates a container for SampleJar.jar
B. SampleJar.jar is placed in a temporary directory in HDFS
C. SampleJar.jar is sent directly to the ResourceManager
D. SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster
Answer: B
Explanation: The client copies the jar and other job resources into a staging directory in HDFS before the ResourceManager launches the ApplicationMaster, which then localizes those resources into its container.
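Note: a sketch of what you can observe after submitting the job; the input/output paths are placeholders, and the staging path assumes the common default for yarn.app.mapreduce.am.staging-dir, which may differ per distribution:
    # Submit from the client machine
    hadoop jar SampleJar.jar MyClass /input /output
    # The client copies the jar and other job resources into an HDFS staging directory, e.g.:
    hdfs dfs -ls /tmp/hadoop-yarn/staging/$USER/.staging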
Q7. You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum Storage. What is the purpose of ZooKeeper in such a configuration?
A. It only keeps track of which NameNode is Active at any given time
B. It monitors an NFS mount point and reports if the mount point disappears
C. It both keeps track of which NameNode is Active at any given time and manages the Edits file, which is a log of changes to the HDFS filesystem
D. It only manages the Edits file, which is a log of changes to the HDFS filesystem
E. Clients connect to ZooKeeper to determine which NameNode is Active
Answer: A
Explanation: Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/PDF/CDH4-High-Availability-Guide.pdf (page 15)
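Note: to see what ZooKeeper-based automatic failover does (and does not) manage, you can query each NameNode's HA state; nn1 and nn2 are hypothetical NameNode IDs taken from dfs.ha.namenodes.<nameservice>:
    # Ask each NameNode which HA state it currently holds
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    # The ZKFC daemons use ZooKeeper only to elect and track the Active NameNode;
    # the edit log itself goes to the JournalNodes (Quorum Journal Manager), not to ZooKeeper.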
Q8. You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop streaming.
Which data serialization system gives the flexibility to do this?
A. CSV
B. XML
C. HTML
D. Avro
E. SequenceFiles
F. JSON
Answer: E
Explanation: SequenceFiles support block compression and provide direct serialization and deserialization of arbitrary data types (not just text). SequenceFiles can be generated as the output of other MapReduce tasks and are an efficient intermediate representation for data that is passing from one MapReduce job to another.
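Note: a minimal streaming sketch of step 2, assuming the images have already been packed into SequenceFiles under /data/images-seq and that analyze_image.py is your (hypothetical) Python mapper; the streaming jar path varies by distribution:
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
      -D mapreduce.job.reduces=0 \
      -files analyze_image.py \
      -inputformat org.apache.hadoop.mapred.SequenceFileAsTextInputFormat \
      -input /data/images-seq \
      -output /data/images-out \
      -mapper "python analyze_image.py"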
Q9. You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
A. Sample the web server logs from the web servers and copy them into HDFS using curl
B. Ingest the server web logs into HDFS using Flume
C. Channel these clickstreams into Hadoop using Hadoop Streaming
D. Import all user clicks from your OLTP databases into Hadoop using Sqoop
E. Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
Answer: B
Explanation: Apache Flume is a service for streaming logs into Hadoop.
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and is robust and fault tolerant with tunable reliability mechanisms for failover and recovery.
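Note: a minimal Flume sketch for this scenario; the agent name, log path, and HDFS target are placeholders, and an exec source tailing the access log is shown only for brevity (a spooling-directory or taildir source is more robust in production):
    # Write a one-agent configuration on each web server, then start the agent
    cat > weblog-agent.conf <<'EOF'
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/httpd/access_log
    a1.sources.r1.channels = c1
    a1.channels.c1.type = memory
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/user/flume/weblogs
    a1.sinks.k1.channel = c1
    EOF
    flume-ng agent --name a1 --conf-file weblog-agent.conf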
Q10. You use the hadoop fs -put command to add a file "sales.txt" to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of this file in this situation?
A. The file will remain under-replicated until the administrator brings that node back online
B. The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file’s replication factor doesn’t fall below)
C. This will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster's replication values are restored
D. The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
Answer: D
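Note: you can watch this behavior on a live cluster with standard HDFS tools; the file path below is a placeholder matching the question:
    # Cluster-wide summary, including the count of under-replicated blocks
    hdfs dfsadmin -report | head -n 20
    # Block and replica placement for the specific file
    hdfs fsck /user/$USER/sales.txt -files -blocks -locations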