Home LinkedIn Skill Assessment LinkedIn Skill Assessment – Hadoop

LinkedIn Skill Assessment

LinkedIn Skill Assessment – Hadoop

Michael Kaiser

August 17, 2022

525

Q1. Partitioner controls the partitioning of what data?

[ ] final keys
[ ] final values
[x] intermediate keys
[ ] intermediate values

Q2. SQL Windowing functions are implemented in Hive using which keywords?

[ ] UNION DISTINCT, RANK
[x] OVER, RANK
[ ] OVER, EXCEPT
[ ] UNION DISTINCT, RANK

Q3. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?

[ ] Add a partitioned shuffle to the Map job.
[x] Add a partitioned shuffle to the Reduce job.
[ ] Break the Reduce job into multiple, chained Reduce jobs.
[ ] Break the Reduce job into multiple, chained Map jobs.

Q4. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?

[ ] encrypted HTTP
[ ] unsigned HTTP
[ ] compressed HTTP
[x] signed HTTP

Q5. MapReduce jobs can be written in which language?

[x] Java or Python
[ ] SQL only
[ ] SQL or Java
[ ] Python or SQL

Q6. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?

[ ] Reducer
[x] Combiner
[ ] Mapper
[ ] Counter

Q7. To verify job status, look for the value `_` in the `_`.

[ ] SUCCEEDED; syslog
[x] SUCCEEDED; stdout
[ ] DONE; syslog
[ ] DONE; stdout

Q8. Which line of code implements a Reducer method in MapReduce 2.0?

[x] public void reduce(Text key, Iterator values, Context context){…}
[ ] public static void reduce(Text key, IntWritable[] values, Context context){…}
[ ] public static void reduce(Text key, Iterator values, Context context){…}
[ ] public void reduce(Text key, IntWritable[] values, Context context){…}

Q9. To get the total number of mapped input records in a map job task, you should review the value of which counter?

[ ] FileInputFormatCounter
[ ] FileSystemCounter
[ ] JobCounter
[x] TaskCounter (NOT SURE)

Q10. Hadoop Core supports which CAP capabilities?

[x] A, P
[ ] C, A
[ ] C, P
[ ] C, A, P

Q11. What are the primary phases of a Reducer?

[ ] combine, map, and reduce
[x] shuffle, sort, and reduce
[ ] reduce, sort, and combine
[ ] map, sort, and combine

Q12. To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the `_` service, which is `_`.

[ ] Oozie; open source
[ ] Oozie; commercial software
[ ] Zookeeper; commercial software
[x] Zookeeper; open source

Q13. For high availability, use multiple nodes of which type?

[ ] data
[x] name
[ ] memory
[ ] worker

Q14. DataNode supports which type of drives?

[x] hot swappable
[ ] cold swappable
[ ] warm swappable
[ ] non-swappable

Q15. Which method is used to implement Spark jobs?

[ ] on disk of all workers
[ ] on disk of the master node
[ ] in memory of the master node
[x] in memory of all workers

Q16. In a MapReduce job, where does the map() function run?

[ ] on the reducer nodes of the cluster
[x] on the data nodes of the cluster (NOT SURE)
[ ] on the master node of the cluster
[ ] on every node of the cluster

Q17. To reference a master file for lookups during Mapping, what type of cache should be used?

[x] distributed cache
[ ] local cache
[ ] partitioned cache
[ ] cluster cache

Q18. Skip bad records provides an option where a certain set of bad input records can be skipped when processing what type of data?

[ ] cache inputs
[ ] reducer inputs
[ ] intermediate values
[x] map inputs

Q19. Which command imports data to Hadoop from a MySQL database?

[ ] spark import –connect jdbc:mysql://mysql.example.com/spark –username spark –warehouse-dir user/hue/oozie/deployments/spark
[ ] sqoop import –connect jdbc:mysql://mysql.example.com/sqoop –username sqoop –warehouse-dir user/hue/oozie/deployments/sqoop
[x] sqoop import –connect jdbc:mysql://mysql.example.com/sqoop –username sqoop –password sqoop –warehouse-dir user/hue/oozie/deployments/sqoop
[ ] spark import –connect jdbc:mysql://mysql.example.com/spark –username spark –password spark –warehouse-dir user/hue/oozie/deployments/spark

Q20. In what form is Reducer output presented?

[x] compressed (NOT SURE)
[ ] sorted
[ ] not sorted
[ ] encrypted

Q21. Which library should be used to unit test MapReduce code?

[ ] JUnit
[ ] XUnit
[x] MRUnit
[ ] HadoopUnit

Q22. If you started the NameNode, then which kind of user must you be?

[ ] hadoop-user
[x] super-user
[ ] node-user
[ ] admin-user

Q23. State _ between the JVMs in a MapReduce job

[ ] can be configured to be shared
[ ] is partially shared
[ ] is shared
[x] is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)

Q24. To create a MapReduce job, what should be coded first?

[ ] a static job() method
[x] a Job class and instance (NOT SURE)
[ ] a job() method
[ ] a static Job class

Q25. To connect Hadoop to AWS S3, which client should you use?

[x] S3A
[ ] S3N
[ ] S3
[ ] the EMR S3

Q26. HBase works with which type of schema enforcement?

[ ] schema on write
[ ] no schema
[ ] external schema
[x] schema on read

Q27. HDFS file are of what type?

[ ] read-write
[ ] read-only
[ ] write-only
[x] append-only

Q28. A distributed cache file path can originate from what location?

[ ] hdfs or top
[ ] http
[x] hdfs or http
[ ] hdfs

Q29. Which library should you use to perform ETL-type MapReduce jobs?

[ ] Hive
[x] Pig
[ ] Impala
[ ] Mahout

Q30. What is the output of the Reducer?

[ ] a relational table
[ ] an update to the input file
[ ] a single, combined list
[x] a set of pairs

map function processes a certain key-value pair and emits a certain number of key-value pairs and the Reduce function processes values grouped by the same key and emits another set of key-value pairs as output.

Q31. To optimize a Mapper, what should you perform first?

[ ] Override the default Partitioner.
[ ] Skip bad records.
[ ] Break up Mappers that do more than one task into multiple Mappers.
[ ] Combine Mappers that do one task into large Mappers.

Q32. When implemented on a public cloud, with what does Hadoop processing interact?

[x] files in object storage
[ ] graph data in graph databases
[ ] relational data in managed RDBMS systems
[ ] JSON data in NoSQL databases

Q33. In the Hadoop system, what administrative mode is used for maintenance?

[ ] data mode
[x] safe mode
[ ] single-user mode
[ ] pseudo-distributed mode

Q34. In what format does RecordWriter write an output file?

[x] pairs
[ ] keys
[ ] values
[ ] pairs

Q35. To what does the Mapper map input key/value pairs?

[ ] an average of keys for values
[ ] a sum of keys for values
[x] a set of intermediate key/value pairs
[ ] a set of final key/value pairs

Q36. Which Hive query returns the first 1,000 values?

[ ] SELECT…WHERE value = 1000
[x] SELECT … LIMIT 1000
[ ] SELECT TOP 1000 …
[ ] SELECT MAX 1000…

Q37. To implement high availability, how many instances of the master node should you configure?

[ ] one
[ ] zero
[ ] shared
[x] two or more (https://data-flair.training/blogs/hadoop-high-availability-tutorial)

Q38. Hadoop 2.x and later implement which service as the resource coordinator?

[ ] kubernetes
[ ] JobManager
[ ] JobTracker
[x] YARN

Q39. In MapReduce, _ have _

[ ] tasks; jobs
[ ] jobs; activities
[x] jobs; tasks
[ ] activities; tasks

Q40. What type of software is Hadoop Common?

[ ] database
[x] distributed computing framework
[ ] operating system
[ ] productivity tool

Q41. If no reduction is desired, you should set the numbers of _ tasks to zero

[ ] combiner
[x] reduce
[ ] mapper
[ ] intermediate

Q42. MapReduce applications use which of these classes to report their statistics?

[ ] mapper
[ ] reducer
[ ] combiner
[x] counter

Q43. _ is the query language, and _ is storage for NoSQL on Hadoop

[ ] HDFS; HQL
[x] HQL; HBase
[ ] HDFS; SQL
[ ] SQL; HBase

Q44. MapReduce 1.0 _ YARN

[x] does not include
[ ] is the same thing as
[ ] includes
[ ] replaces

Q45. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?

[ ] ControllerNode
[ ] DataNode
[ ] MetadataNode
[x] NameNode

Q46. HQL queries produce which job types?

[ ] Impala
[ ] MapReduce
[ ] Spark
[ ] Pig

Q47. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?

1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2

[ ] as (text:CHAR[]); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
[x] as (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
[ ] as (text:CHAR[]); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
[ ] as (text:CHARARRAY); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);

Q48. In a MapReduce job, which phase runs after the Map phase completes?

[x] Combiner
[ ] Reducer
[ ] Map2
[ ] Shuffle and Sort

Q49. Where would you configure the size of a block in a Hadoop environment?

[x] dfs.block.size in hdfs-site.xmls
[ ] orc.write.variable.length.blocks in hive-default.xml
[ ] mapreduce.job.ubertask.maxbytes in mapred-site.xml
[ ] hdfs.block.size in hdfs-site.xml

Q50. Hadoop systems are _ RDBMS systems.

[ ] replacements for
[ ] not used with
[ ] substitutes for
[x] additions for

Q51. Which object can be used to distribute jars or libraries for use in MapReduce tasks?

[x] distributed cache
[ ] library manager
[ ] lookup store
[ ] registry

Q52. To view the execution details of an Impala query plan, which function would you use ?

[x] explain
[ ] query action
[ ] detail
[ ] query plan

Q53. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?

[ ] partitioning
[x] snapshot
[ ] replication
[ ] high availability

Reference

Q54. Hadoop Common is written in which language?

[ ] C++
[ ] C
[ ] Haskell
[x] Java

Q55. Which file system does Hadoop use for storage?

[ ] NAS
[ ] FAT
[x] HDFS
[ ] NFS

Q56. What kind of storage and processing does Hadoop support?

[ ] encrypted
[ ] verified
[x] distributed
[ ] remote

Q57. Hadoop Common consists of which components?

[ ] Spark and YARN
[x] HDFS and MapReduce
[ ] HDFS and S3
[ ] Spark and MapReduce

Q58. Most Apache Hadoop committers’ work is done at which commercial company?

[ ] Cloudera
[ ] Microsoft
[ ] Google
[x] Amazon

Q59. To get information about Reducer job runs, which object should be added?

[] Reporter
[ ] IntReadable
[ ] IntWritable
[ ] Writer

Q60. After changing the default block size and restarting the cluster, to which data does the new size apply?

[ ] all data
[ ] no data
[ ] existing data
[ ] new data

Q61. Which statement should you add to improve the performance of the following query?

SELECT
  c.id,
  c.name,
  c.email_preferences.categories.surveys
FROM customers c;

[ ] GROUP BY
[ ] FILTER
[ ] SUB-SELECT
[ ] SORT

Q62. What custom object should you implement to reduce IO in MapReduce?

[ ] Comparator
[ ] Mapper
[x] Combiner
[ ] Reducer

Q63. You can optimize Hive queries using which method?

[ ] secondary indices
[ ] summary statistics
[ ] column-based statistics
[ ] a primary key index

Q64. If you are processing a single action on each input, what type of job should you create?

[ ] partition-only
[x] map-only
[ ] reduce-only
[ ] combine-only

Q65. The simplest possible MapReduce job optimization is to perform which of these actions?

[ ] Add more master nodes.
[x] Implement optimized InputSplits.
[ ] Add more DataNodes.
[ ] Implement a custom Mapper.

Q66. When you implement a custom Writable, you must also define which of these object?

[ ] a sort policy
[ ] a combiner policy
[ ] a compression policy
[ ] a filter policy

Q67. To copy a file into the Hadoop file system, what command should you use?

[ ] hadoop fs -copy
[ ] hadoop fs -copy
[x] hadoop fs -copyFromLocal
[ ] hadoop fs -copyFromLocal

Q68. delete a Hive _ table and you will delete the table _.

[ ] managed; metadata
[ ] external; data and metadata
[ ] external; metadata
[ ] managed; data

Q69. To see how Hive executed a JOIN operation, use the _ statement and look for the _ value.

[ ] EXPLAIN; JOIN Operator
[ ] QUERY; MAP JOIN Operator
[ ] EXPLAIN; MAP JOIN Operator
[ ] QUERY; JOIN Operator