Cloudera Certified Administrator for Apache Hadoop (CCAH) v6.8

Page:    1 / 4   
Exam contains 60 questions

You are working on a project where you need to chain together MapReduce and Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?

  • A. Oozie
  • B. ZooKeeper
  • C. HBase
  • D. Sqoop
  • E. HUE


Answer : A
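Oozie expresses these control-flow constructs directly in its workflow definition. Below is a minimal sketch of an Oozie workflow.xml showing a fork, a join, and a decision node; the action names, the EL expression, and the elided action bodies (`...`) are hypothetical placeholders, not a runnable workflow.

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="fork-node"/>
  <!-- fork: run the MapReduce and Pig actions in parallel -->
  <fork name="fork-node">
    <path start="mr-action"/>
    <path start="pig-action"/>
  </fork>
  <action name="mr-action">
    <map-reduce> ... </map-reduce>
    <ok to="join-node"/>
    <error to="fail"/>
  </action>
  <action name="pig-action">
    <pig> ... </pig>
    <ok to="join-node"/>
    <error to="fail"/>
  </action>
  <!-- join: wait for both forked paths before continuing -->
  <join name="join-node" to="decision-node"/>
  <!-- decision: branch on a (hypothetical) workflow property -->
  <decision name="decision-node">
    <switch>
      <case to="end">${wf:conf('mode') eq 'full'}</case>
      <default to="fail"/>
    </switch>
  </decision>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
```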

You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?

  • A. When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
  • B. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
  • C. When your workload consists of processor-intensive tasks
  • D. When your workload generates a large amount of intermediate data, on the order of the input data itself


Answer : A

Identify two features/issues that YARN is designed to address: (Choose two)

  • A. Standardize on a single MapReduce API
  • B. Single point of failure in the NameNode
  • C. Reduce complexity of the MapReduce APIs
  • D. Resource pressure on the JobTracker
  • E. Ability to run frameworks other than MapReduce, such as MPI
  • F. HDFS latency


Answer : D,E

Reference: http://www.revelytix.com/?q=content/hadoop-ecosystem (YARN, first para)

Table schemas in Hive are:

  • A. Stored as metadata on the NameNode
  • B. Stored along with the data in HDFS
  • C. Stored in the Metastore
  • D. Stored in ZooKeeper


Answer : B

Explanation: http://stackoverflow.com/questions/22989592/how-to-get-hive-table-name-based-on-hdfs-location-path-with-out-connecting-to-m

Assuming a cluster running HDFS and MapReduce version 2 (MRv2) on YARN with all settings at their default, what do you need to do when adding a new slave node to the cluster?

  • A. Nothing, other than ensuring that the DNS (or /etc/hosts files on all machines) contains an entry for the new node.
  • B. Restart the NameNode and ResourceManager daemons and resubmit any running jobs.
  • C. Add a new entry to /etc/nodes on the NameNode host.
  • D. Restart the NameNode after updating dfs.number.of.nodes in hdfs-site.xml


Answer : A

Explanation:
http://wiki.apache.org/hadoop/FAQ#I_have_a_new_node_I_want_to_add_to_a_running_Hadoop_cluster.3B_how_do_I_start_services_on_just_one_node.3F

Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?

  • A. SampleJar.jar is sent to the ApplicationMaster, which allocates a container for SampleJar.jar
  • B. SampleJar.jar is placed in a temporary directory in HDFS
  • C. SampleJar.jar is sent directly to the ResourceManager
  • D. SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster


Answer : A

Your cluster has the following characteristics:
-> A rack-aware topology is configured and enabled
-> Replication is set to 3
-> Cluster block size is set to 64MB
Which describes the file read process when a client application connects to the cluster and requests a 50MB file?

  • A. The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads, as part of Hadoop's speculative execution framework.
  • B. The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.
  • C. The client queries the NameNode for the locations of the block, and reads from a random location in the list it receives to reduce network I/O load by balancing which nodes it retrieves data from at any given time.
  • D. The client queries the NameNode, which retrieves the block from the DataNode nearest to the client and then passes that block back to the client.


Answer : B

For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?

  • A. Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode
  • B. Cached in the YARN container running the task, then copied into HDFS on job completion
  • C. In HDFS, in the directory of the user who generates the job
  • D. On the local disk of the slave node running the task


Answer : D

You're upgrading a Hadoop cluster from HDFS and MapReduce version 1 (MRv1) to one running HDFS and MapReduce version 2 (MRv2) on YARN. You want to set and enforce a block size of 128MB for all new files written to the cluster after the upgrade. What should you do?

  • A. You cannot enforce this, since client code can always override this value
  • B. Set dfs.block.size to 128M on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final
  • C. Set dfs.block.size to 128M on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode
  • D. Set dfs.block.size to 134217728 on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final
  • E. Set dfs.block.size to 134217728 on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode


Answer : C
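The two numeric forms quoted in the options describe the same size: older releases expect dfs.block.size as a raw byte count, while newer ones also accept suffixed values such as 128m for dfs.blocksize. A quick sanity check of the conversion:

```python
# 128 MB expressed as the raw byte count that dfs.block.size expects
block_size_bytes = 128 * 1024 * 1024
print(block_size_bytes)  # 134217728
```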

You want a node to swap Hadoop daemon data from RAM to disk only when absolutely necessary. What should you do?

  • A. Delete the /dev/vmswap file on the node
  • B. Delete the /etc/swap file on the node
  • C. Set the ram.swap parameter to 0 in core-site.xml
  • D. Set vm.swappiness to 0 on the node
  • E. Delete the /swapfile file on the node


Answer : D
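Assuming the intended parameter is the Linux kernel's vm.swappiness setting, a typical way to minimize swapping on a Hadoop worker looks like this (exact recommended value varies by kernel version; 1 is often suggested on newer kernels):

```shell
# Check the current value
sysctl vm.swappiness

# Tell the kernel to swap only under real memory pressure
sudo sysctl -w vm.swappiness=0

# Persist across reboots by adding this line to /etc/sysctl.conf:
# vm.swappiness = 0
```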

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.
Which data serialization system gives you the flexibility to do this?

  • A. CSV
  • B. XML
  • C. HTML
  • D. Avro
  • E. SequenceFiles
  • F. JSON


Answer : E

Explanation: Sequence files are block-compressed and provide direct serialization and deserialization of several arbitrary data types (not just text). Sequence files can be generated as the output of other MapReduce tasks and are an efficient intermediate representation for data that is passing from one MapReduce job to another.
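The core idea a SequenceFile provides here is packing many small files into one splittable container of key/value records (filename as key, raw bytes as value). The pure-Python sketch below illustrates that packing idea with a simple length-prefixed layout; it is an illustration only, not the actual SequenceFile binary format, and the function names are hypothetical.

```python
import glob
import os

def pack_images(image_dir, out_path):
    """Pack many small files into one container, SequenceFile-style:
    each record is a (filename key, raw bytes value) pair."""
    with open(out_path, "wb") as out:
        for path in sorted(glob.glob(os.path.join(image_dir, "*"))):
            name = os.path.basename(path).encode()
            with open(path, "rb") as f:
                data = f.read()
            # Length-prefix the key and the value so records can be
            # split back out without any delimiters in the data itself.
            out.write(len(name).to_bytes(4, "big") + name)
            out.write(len(data).to_bytes(4, "big") + data)

def unpack_images(in_path):
    """Read back the (filename, bytes) records written by pack_images."""
    records = []
    with open(in_path, "rb") as f:
        while True:
            hdr = f.read(4)
            if not hdr:
                break
            name = f.read(int.from_bytes(hdr, "big")).decode()
            vlen = int.from_bytes(f.read(4), "big")
            records.append((name, f.read(vlen)))
    return records
```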

Which three basic configuration parameters must you set to migrate your cluster from
MapReduce 1 (MRv1) to MapReduce V2 (MRv2)? (Choose three)

  • A. Configure the NodeManager to enable MapReduce services on YARN by setting the following property in yarn-site.xml: <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value>
  • B. Configure the NodeManager hostname and enable node services on YARN by setting the following property in yarn-site.xml: <name>yarn.nodemanager.hostname</name> <value>your_nodeManager_hostname</value>
  • C. Configure a default scheduler to run on YARN by setting the following property in mapred-site.xml: <name>mapreduce.jobtracker.taskScheduler</name> <Value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  • D. Configure the number of map tasks per job on YARN by setting the following property in mapred-site.xml: <name>mapreduce.job.maps</name> <value>2</value>
  • E. Configure the ResourceManager hostname and enable node services on YARN by setting the following property in yarn-site.xml: <name>yarn.resourcemanager.hostname</name> <value>your_resourceManager_hostname</value>
  • F. Configure MapReduce as a framework running on YARN by setting the following property in mapred-site.xml: <name>mapreduce.framework.name</name> <value>yarn</value>


Answer : A,E,F
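Put together, the commonly cited minimal MRv2 migration properties look like the fragment below (the hostname value is a placeholder, as in the options above):

```xml
<!-- mapred-site.xml: run MapReduce as a framework on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: tell NodeManagers where the ResourceManager is -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>your_resourceManager_hostname</value>
</property>

<!-- yarn-site.xml: enable the MapReduce shuffle auxiliary service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```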

You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node. What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?

  • A. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin -refreshNodes on the NameNode
  • B. Restart the NameNode
  • C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin -refreshNodes on the NameNode
  • D. Nothing; the worker node will automatically join the cluster when NameNode daemon is started


Answer : A
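The refresh step looks like this on the NameNode host (on newer releases the equivalent command is `hdfs dfsadmin -refreshNodes`):

```shell
# Tell the NameNode to re-read its include/exclude host lists; with no
# dfs.hosts file configured, the new DataNode is then admitted.
hadoop dfsadmin -refreshNodes
```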

During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the intermediate data of each Map Task?

  • A. The Mapper stores the intermediate data on the node running the job's ApplicationMaster so that it is available to the YARN Shuffle Service before the data is presented to the Reducer
  • B. The Mapper stores the intermediate data in HDFS on the node where the Map tasks ran, in the HDFS /usercache/$(user)/appcache/application_$(appid) directory for the user who ran the job
  • C. The Mapper transfers the intermediate data immediately to the reducers as it is generated by the Map Task
  • D. YARN holds the intermediate data in the NodeManagers memory (a container) until it is transferred to the Reducer
  • E. The Mapper stores the intermediate data on the underlying filesystem of the local disk, in the directories specified by yarn.nodemanager.local-dirs


Answer : E
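The NodeManager's local directories, which hold container working data including map-side intermediate output, are configured in yarn-site.xml; the paths below are hypothetical examples (one directory per physical disk is a common layout to spread spill I/O):

```xml
<!-- yarn-site.xml: comma-separated list of local spill directories -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/1/yarn/local,/data/2/yarn/local</value>
</property>
```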

Which two are features of Hadoop’s rack topology? (Choose two)

  • A. Configuration of rack awareness is accomplished using a configuration file. You cannot use a rack topology script.
  • B. Hadoop gives preference to intra-rack data transfer in order to conserve bandwidth
  • C. Rack location is considered in the HDFS block placement policy
  • D. HDFS is rack aware but MapReduce daemons are not
  • E. Even for small clusters on a single rack, configuring rack awareness will improve performance


Answer : B,C
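Rack awareness is typically enabled by registering a topology script that maps a host or IP to a rack id such as /rack1. A sketch of the core-site.xml entry (the script path is a hypothetical example; the property was named topology.script.file.name on Hadoop 1):

```xml
<!-- core-site.xml: script invoked by the NameNode to resolve racks -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```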
