Hadoop MapReduce & Apache Spark

This month we're taking a deeper dive into some of the differences between MapReduce and Apache Spark. I have answered some simple questions below to give a sense of both platforms, including some of the pros and cons of these highly regarded technologies. I have also included some information on Mahout.
  1. What is the bottleneck in Hadoop MapReduce?
    1. A bottleneck occurs when one system resource takes more time or energy than is typically required, which slows other resources and decreases overall system performance. In Hadoop MapReduce, the resources to watch include RAM (memory), CPU, storage I/O, and network bandwidth. Of these possible bottlenecks, RAM is known to play a key role. RAM is the random access memory that stores data temporarily. MapReduce completes its tasks by reading from and writing back to disk between steps, which is inefficient. Memory should therefore be configured on each node to handle the workload; otherwise it can become a bottleneck that significantly slows Hadoop MapReduce’s performance (Apprize, 2014). For example, memory could be a bottleneck when a machine learning job requires more iterations over the data than the system can handle. A MapReduce workflow for a machine learning model is shown below in Figure 1. If the mapping and reducing nodes cannot handle the data workload being produced, they could fail and cause system downtime.
Figure 1
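To make the disk round-trips concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API; the function name, the JSON files, and the update step are all assumptions made for illustration. Each loop pass stands in for one MapReduce job in an iterative algorithm: it reads the previous job's output from disk, applies an update, and writes the result back for the next job.

```python
import json
import os
import tempfile


def run_iterations_via_disk(values, iterations, workdir):
    """Toy model of chained MapReduce jobs (not the Hadoop API).

    Every iteration reads its input from disk and writes its output
    back to disk, and we count each read and write as one disk operation.
    """
    path = os.path.join(workdir, "iter_0.json")
    with open(path, "w") as f:
        json.dump(values, f)
    disk_ops = 1  # initial write of the input data set

    for i in range(iterations):
        with open(path) as f:  # read the previous job's output
            data = json.load(f)
        disk_ops += 1

        # Hypothetical update step standing in for one training pass.
        data = [x * 0.5 + 1.0 for x in data]

        path = os.path.join(workdir, f"iter_{i + 1}.json")
        with open(path, "w") as f:  # write back for the next job
            json.dump(data, f)
        disk_ops += 1

    with open(path) as f:  # final read of the result
        result = json.load(f)
    disk_ops += 1
    return result, disk_ops
```

In this toy model, the number of disk operations grows as 2 × iterations + 2, so an algorithm that needs many passes over the same data pays the disk cost on every pass. Calling the function with `tempfile.TemporaryDirectory()` as the working directory keeps the intermediate files out of the way.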
  1. What are the bottlenecks or issues in Mahout as a machine learning library in the Hadoop ecosystem?
    1. Mahout is used as a data mining/machine learning framework to develop models for recommendation, classification, and clustering applications within a Hadoop environment. Some of the historically known issues with Mahout stem from its MapReduce legacy. Mahout’s legacy algorithms are implemented as Hadoop MapReduce jobs, which have been found to drag because they lack the in-memory processing that iterative algorithms need to run quickly. Although it is still functional, Mahout does not have the same support it originally had because the focus has shifted to Apache Spark and its libraries. Ultimately, Mahout is older and carries more legacy code. Mahout is now addressing its bottlenecks by integrating Spark back-end support (Barga, 2015). Figure 2 below shows where Mahout fits in the data access section of the Hadoop ecosystem.
Figure 2

  1. How can Spark overcome MapReduce’s bottleneck?
    1. Apache Spark can improve on MapReduce’s RAM-related bottleneck by using in-memory computing. Rather than writing its intermediate data to disk, Spark keeps it in memory where possible, and has been reported to run jobs up to 100 times faster than MapReduce. In-memory computing not only simplifies jobs but also makes Spark suitable for near-real-time analytical processing in addition to batch processing. Spark has a Directed Acyclic Graph (DAG) execution engine, which supports acyclic data flows and in-memory computing. The DAG engine helps optimize Spark over MapReduce by tracking how each partition of an RDD (Resilient Distributed Dataset) is derived, so that partitions can be computed or recovered at any point in time (Apache Spark, 2017). Figure 3 below contrasts MapReduce’s workflow with Apache Spark’s DAG workflow and shows a major difference in how many times data is read from and written back to the Hadoop Distributed File System (HDFS). Spark is able to keep its computations in memory (RAM) for quicker performance on tasks, which is a major advantage.
Figure 3
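The ideas of lazy DAG construction, in-memory caching, and recovery from lineage can be sketched with a small toy class in plain Python. This is not Spark's implementation or API: `ToyRDD`, `drop_cache`, and `compute_count` are hypothetical names invented here, and the class only borrows the method names `map`, `cache`, and `collect` from Spark's RDD interface to keep the analogy readable.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark code).

    Each node remembers its parent and the function that derives it,
    which is its lineage in the DAG.
    """

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source      # base data, only set on the root node
        self._parent = parent      # upstream node in the DAG
        self._fn = fn              # per-element transformation
        self._cache_enabled = False
        self._cached = None
        self.compute_count = 0     # times this node was actually computed

    def map(self, fn):
        # Lazy: this only records a new DAG node; nothing is computed yet.
        return ToyRDD(parent=self, fn=fn)

    def cache(self):
        # Ask for the result to be kept in memory once it is computed.
        self._cache_enabled = True
        return self

    def drop_cache(self):
        # Hypothetical helper: simulate losing the in-memory copy.
        self._cached = None

    def collect(self):
        # Serve from memory when a cached copy exists.
        if self._cached is not None:
            return list(self._cached)
        # Otherwise recompute this node from its lineage.
        self.compute_count += 1
        if self._parent is None:
            data = list(self._source)
        else:
            data = [self._fn(x) for x in self._parent.collect()]
        if self._cache_enabled:
            self._cached = data
        return list(data)
```

A short usage example, under the same assumptions: build `squared = ToyRDD(source=[1, 2, 3]).map(lambda x: x * x).cache()` and then `plus_one = squared.map(lambda x: x + 1)`. Calling `plus_one.collect()` twice computes `squared` only once, because the second call reads it from memory. After `squared.drop_cache()`, the next `collect()` rebuilds it from the recorded lineage instead of failing, which is the recovery behavior described above.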


References:

Apache Spark. (2017, April 8). Direct Acyclic Graph DAG in Apache Spark. Retrieved from Data Flair: https://data-flair.training/blogs/dag-in-apache-spark/
Apprize. (2014). Optimizing Hadoop for MapReduce, Detecting System Bottlenecks. Retrieved from Apprize.info: http://apprize.info/data/hadoop/4.html
Barga, M. (2015, October 22). Apache Mahout and Spark Comparison. Retrieved from MatthewBarga.com: http://matthewbarga.com/blog/index.php/2015/10/22/apache-mahout-and-spark-comparison/
