
Java Spark RDD reduce() Examples - sum, min and max operations


A quick guide to the Spark RDD reduce() method in Java, used here to find the sum, min and max values of a data set.

1. Overview

In this tutorial, we will learn how to use the Spark RDD reduce() method from the Java programming language. Many developers know reduce() from PySpark, but in this article we will see how to perform the sum, min and max operations with a Java RDD.




2. Java Spark RDD - reduce() method


First, let us understand the syntax of the Spark reduce() method in Java.

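public T reduce(org.apache.spark.api.java.function.Function2<T,T,T> f)

This method takes a Function2, which is a functional interface, so a Java 8 lambda expression can be passed for it. On a JavaRDD, this is Spark's own Java interface org.apache.spark.api.java.function.Function2; scala.Function2 is the type taken by the underlying Scala RDD.reduce() method.

Function2 takes two arguments as input and returns one value. For reduce(), both inputs and the output must have the same type T, which is the element type of the RDD.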
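To make the "two inputs, one output, all of the same type" shape concrete, here is a minimal plain-Java sketch that does not use Spark at all. It assumes java.util.function.BinaryOperator as a stand-in for the function passed to reduce(), and combineAll() is a hypothetical helper written only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java illustration only. BinaryOperator<T> has the same (T, T) -> T shape
// as the function given to reduce(), but it is NOT Spark's Function2 type.
class ReduceShapeSketch {

    // Hypothetical helper: combines all values pairwise, from left to right.
    // Assumes the list is not empty.
    static <T> T combineAll(List<T> values, BinaryOperator<T> f) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        T result = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            // two inputs of type T go in, one value of type T comes out
            result = f.apply(result, values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        System.out.println(combineAll(numbers, (a, b) -> a + b));          // 45
        System.out.println(combineAll(numbers, (a, b) -> Math.min(a, b))); // 1
        System.out.println(combineAll(numbers, (a, b) -> Math.max(a, b))); // 9
    }
}
```

The sum, min and max lambdas all fit the same shape, which is why the same reduce() call can be reused for all three operations later in this article.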


3. Java Spark RDD reduce() Example to find the sum


In the example below, we first create the SparkConf and JavaSparkContext in local mode for testing purposes.

Each step is explained in the comments of the program.

We pass a lambda expression to the reduce() method. If you are new to Java, please read the in-depth article on Java 8 Lambda expressions.

You might be surprised by the logic behind the reduce() method. Below is an explanation of its internals. As a developer, it helps to have a basic understanding of what is going on under the hood.

On the RDD, reduce() is called with the logic value1 + value2. Spark applies this function repeatedly to the values within each partition until every partition is reduced to a single value.

If there is more than one partition, the per-partition results are sent to the driver program, where the same logic value1 + value2 is applied again to merge them into the final result. Because values can be combined in any grouping and order, the function passed to reduce() should be commutative and associative.

If the input file or dataset has only one partition, the result of that single partition is returned as the final output.
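The two stages described above can be sketched in plain Java, without Spark. This is only a simulation under stated assumptions: split(), reduceLocally(), partialResults() and twoStageReduce() are hypothetical helpers, the "partitions" are just consecutive chunks of a list, and the input is assumed to be non-empty with at least one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Simulation only: this is not how Spark is implemented, it just mirrors the two stages.
class TwoStageReduceSketch {

    // Hypothetical helper: cuts the list into consecutive chunks that stand in for partitions.
    static <T> List<List<T>> split(List<T> values, int partitions) {
        List<List<T>> result = new ArrayList<>();
        int chunk = (values.size() + partitions - 1) / partitions; // ceiling division
        for (int start = 0; start < values.size(); start += chunk) {
            result.add(values.subList(start, Math.min(start + chunk, values.size())));
        }
        return result;
    }

    // Stage 1: apply the function inside one partition until a single value is left.
    static <T> T reduceLocally(List<T> partition, BinaryOperator<T> f) {
        T acc = partition.get(0);
        for (int i = 1; i < partition.size(); i++) {
            acc = f.apply(acc, partition.get(i));
        }
        return acc;
    }

    // One partial result per partition.
    static <T> List<T> partialResults(List<T> values, int partitions, BinaryOperator<T> f) {
        List<T> partials = new ArrayList<>();
        for (List<T> partition : split(values, partitions)) {
            partials.add(reduceLocally(partition, f));
        }
        return partials;
    }

    // Stage 2: apply the same function again to the partial results.
    static <T> T twoStageReduce(List<T> values, int partitions, BinaryOperator<T> f) {
        return reduceLocally(partialResults(values, partitions, f), f);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        BinaryOperator<Integer> sum = (value1, value2) -> value1 + value2;

        System.out.println(partialResults(numbers, 3, sum)); // [6, 15, 24]
        System.out.println(twoStageReduce(numbers, 3, sum)); // 45
        System.out.println(twoStageReduce(numbers, 1, sum)); // 45
    }
}
```

With three simulated partitions, the partial sums 6, 15 and 24 are merged into 45. With a single partition, the partial result is already the final result.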


package com.javaprogramto.rdd.reduce;

import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDReduceExample {

	public static void main(String[] args) {
		
		// to remove the unwanted loggers from output.
		Logger.getLogger("org.apache").setLevel(Level.WARN);

		// Getting the numbers list.
		List<Integer> numbersList = getSampleData();
		
		// Creating the SparkConf object
		SparkConf sparkConf = new SparkConf().setAppName("Java RDD_Reduce Example").setMaster("local");

		// Creating JavaSparkContext object
		JavaSparkContext sc = new JavaSparkContext(sparkConf);
		
		// Converting List into JavaRDD.
		JavaRDD<Integer> integersRDD =  sc.parallelize(numbersList);
		
		// Getting the sum of all numbers using reduce() method
		Integer sumResult = integersRDD.reduce( (value1, value2) -> value1 + value2);

		// printing the sum
		
		System.out.println("Sum of RDD numbers using reduce() : "+sumResult);
		
		// closing Spark Context
		sc.close();
		
	}

	/**
	 * returns a list of integer numbers
	 * 
	 * @return
	 */
	private static List<Integer> getSampleData() {

		return Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

	}

}

 
Output:
Sum of RDD numbers using reduce() : 45


 

4. Java Spark RDD reduce() min and max Examples


Next, let us find the min and max values from the RDD.

package com.javaprogramto.rdd.reduce;

import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDReduceMinMaxExample {

	public static void main(String[] args) {
		
		// to remove the unwanted loggers from output.
		Logger.getLogger("org.apache").setLevel(Level.WARN);

		// Getting the numbers list.
		List<Integer> numbersList = getSampleData();
		
		// Creating the SparkConf object
		SparkConf sparkConf = new SparkConf().setAppName("Java RDD_Reduce Example").setMaster("local");

		// Creating JavaSparkContext object
		JavaSparkContext sc = new JavaSparkContext(sparkConf);
		
		// Converting List into JavaRDD.
		JavaRDD<Integer> integersRDD =  sc.parallelize(numbersList);
		
		// Finding Min and Max values using reduce() method
		
		Integer minResult = integersRDD.reduce( (value1, value2) -> Math.min(value1, value2));
		
		System.out.println("Min of RDD numbers using reduce() : "+minResult);
		
		Integer maxResult = integersRDD.reduce( (value1, value2) -> Math.max(value1, value2));
		
		System.out.println("Max of RDD numbers using reduce() : "+maxResult);
		
		// closing Spark Context
		sc.close();
		
	}

	/**
	 * returns a list of integer numbers
	 * 
	 * @return
	 */
	private static List<Integer> getSampleData() {

		return Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

	}

}

 
Output:
Min of RDD numbers using reduce() : 1
Max of RDD numbers using reduce() : 9
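Math.min and Math.max work well with reduce() because the result does not depend on how the values are grouped. Spark's API documentation describes the reduce() operator as commutative and associative, and the plain-Java sketch below hints at why. It assumes a hypothetical reduceInChunks() helper that treats consecutive chunks of a non-empty list as partitions and merges the partial results in order; it is a simulation, not Spark code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Simulation only: shows that a non-associative function gives grouping-dependent results.
class OperatorContractSketch {

    // Hypothetical helper: reduces each chunk of chunkSize values, then merges the
    // partial results with the same function. Assumes a non-empty list and chunkSize >= 1.
    static int reduceInChunks(List<Integer> values, int chunkSize, BinaryOperator<Integer> f) {
        Integer merged = null;
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            Integer partial = values.get(start);
            for (int i = start + 1; i < end; i++) {
                partial = f.apply(partial, values.get(i));
            }
            merged = (merged == null) ? partial : f.apply(merged, partial);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        BinaryOperator<Integer> min = (value1, value2) -> Math.min(value1, value2);
        BinaryOperator<Integer> subtract = (value1, value2) -> value1 - value2;

        // min: same answer for one chunk of 9 values and for three chunks of 3 values
        System.out.println(reduceInChunks(numbers, 9, min)); // 1
        System.out.println(reduceInChunks(numbers, 3, min)); // 1

        // subtraction: the answer changes with the grouping
        System.out.println(reduceInChunks(numbers, 9, subtract)); // -43
        System.out.println(reduceInChunks(numbers, 3, subtract)); // 13
    }
}
```

Sum, min and max are safe choices for reduce(). A function such as subtraction would return a result that depends on how the data happens to be partitioned.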
 

5. Conclusion


In this post, we've seen how to use the reduce() aggregate operation on an RDD to find the sum, min and max values, with example programs in Java.




Author: Venkatesh - I love to learn and share the technical stuff.