Friday, August 2, 2013

Interview questions for Freshers and Experience

What is a SequenceFile in Hadoop?

A. ASequenceFilecontains a binaryencoding ofan arbitrary numberof homogeneous writable objects.
B. ASequenceFilecontains a binary encoding of an arbitrary number of heterogeneous writable objects.
C. ASequenceFilecontains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D. ASequenceFilecontains a binary encoding of an arbitrary number key-value pairs. Each key must be the same type. Each value must be sametype.

Is there a map input format in Hadoop?

A.  Yes, but only in Hadoop 0.22+.
B.  Yes, there is a special format for map files.
C.  No, but sequence file input format can read map files.
D.  Both 2 and 3 are correct answers.

What happens if mapper output does not match reducer input in Hadoop?

A.  Hadoop API will convert the data to the type that is needed by the reducer.
B.  Data input/output inconsistency cannot occur. A preliminary validation check is executed prior to the full execution of the job to ensure there is consistency.
C.  The java compiler will report an error during compilation but the job will complete with exceptions.
D.  A real-time exception will be thrown and map-reduce job will fail.

Can you provide multiple input paths to a map-reduce jobs Hadoop?

A.  Yes, but only in Hadoop 0.22+.
B.  No, Hadoop always operates on one input directory.
C.  Yes, developers can add any number of input paths.
D.  Yes, but the limit is currently capped at 10 input paths.

 
Can a custom type for data Map-Reduce processing be implemented in Hadoop?

A.  No, Hadoop does not provide techniques for custom datatypes.
B.  Yes, but only for mappers.
C.  Yes, custom data types can be implemented as long as they implement writable interface.
D.  Yes, but only for reducers.

The Hadoop API uses basic Java types such as LongWritable, Text, IntWritable. They have almost the same features as default java classes. What are these writable data types optimized for?

A.  Writable data types are specifically optimized for network transmissions
B.  Writable data types are specifically optimized for file system storage
C.  Writable data types are specifically optimized for map-reduce processing
D.  Writable data types are specifically optimized for data retrieval

 What is writable in  Hadoop?

A.  Writable is a java interface that needs to be implemented for streaming data to remote servers.
B.  Writable is a java interface that needs to be implemented for HDFS writes.
C.  Writable is a java interface that needs to be implemented for MapReduce processing.
D.  None of these answers are correct.


What is the best performance one can expect from a Hadoop cluster?

A.  The best performance expectation one can have is measured in seconds. This is because Hadoop can only be used for batch processing
B.  The best performance expectation one can have is measured in milliseconds. This is because Hadoop executes in parallel across so many machines
C.  The best performance expectation one can have is measured in minutes. This is because Hadoop can only be used for batch processing
D.  It depends on on the design of the map-reduce program, how many machines in the cluster, and the amount of data being retrieved

What is distributed cache in Hadoop?

A.  The distributed cache is special component on namenode that will cache frequently used data for faster client response. It is used during reduce step.
B.  The distributed cache is special component on datanode that will cache frequently used data for faster client response. It is used during map step.
C.  The distributed cache is a component that caches java objects.
D.  The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.

 
 Can you run Map - Reduce jobs directly on Avro data in Hadoop?

A.  Yes, Avro was specifically designed for data processing via Map-Reduce
B.  Yes, but additional extensive coding is required
C.  No, Avro was specifically designed for data storage only
D.  Avro specifies metadata that allows easier data access. This data cannot be used as part of map-reduce execution, rather input specification only.

 

No comments:

Post a Comment