The streams API was added in Java 8 to ease the task of performing bulk operations, sequentially or in parallel. This API provides two key abstractions:
the stream, which represents a finite or infinite sequence of data elements, and
the stream pipeline, which represents a multistage computation on these elements.
The elements in a stream can come from anywhere. Common sources include collections, arrays, files, regular expression pattern matchers, pseudorandom number generators, and other streams. The data elements in a stream can be object references or primitive values. Three primitive types are supported: int, long, and double.
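The following sketch builds small streams from several of the sources just listed and from each of the three primitive stream types (IntStream, LongStream, DoubleStream). It is illustrative rather than exhaustive. The class name StreamSources and all input values are hypothetical, and file-based sources such as Files.lines are left out because they need a file on disk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical sketch: one small stream per kind of source mentioned in the text.
class StreamSources {
    // Source: a collection (count the strings longer than one character)
    static long fromCollection() {
        return Arrays.asList("a", "bb", "ccc").stream().filter(s -> s.length() > 1).count();
    }

    // Source: an array (Arrays.stream(int[]) yields an IntStream)
    static int fromArray() {
        return Arrays.stream(new int[] {3, 1, 2}).max().getAsInt();
    }

    // Source: a regular expression pattern splitting its input
    static List<String> fromPattern() {
        return Pattern.compile(",").splitAsStream("x,y,z").collect(Collectors.toList());
    }

    // Source: a pseudorandom number generator (five ints in [0, 10), fixed seed)
    static long fromRandom() {
        return new Random(42).ints(5, 0, 10).count();
    }

    // Source: other streams, concatenated into one
    static List<Integer> fromOtherStreams() {
        return Stream.concat(Stream.of(1, 2), Stream.of(3)).collect(Collectors.toList());
    }

    // The three primitive stream types
    static int intSum() { return IntStream.rangeClosed(1, 4).sum(); }
    static long longSum() { return LongStream.of(10L, 20L).sum(); }
    static double doubleAverage() { return DoubleStream.of(0.5, 1.5).average().getAsDouble(); }

    public static void main(String[] args) {
        System.out.println(fromCollection());   // 2
        System.out.println(fromArray());        // 3
        System.out.println(fromPattern());      // [x, y, z]
        System.out.println(fromRandom());       // 5
        System.out.println(fromOtherStreams()); // [1, 2, 3]
        System.out.println(intSum() + " " + longSum() + " " + doubleAverage()); // 10 30 1.0
    }
}
```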
A stream pipeline consists of a source stream followed by zero or more intermediate operations and one terminal operation. Each intermediate operation transforms the stream in some way, such as mapping each element to a function of that element or filtering out all elements that do not satisfy some condition. Intermediate operations all transform one stream into another, whose element type may be the same as the input stream or different from it. The terminal operation performs a final computation on the stream resulting from the last intermediate operation, such as storing its elements into a collection, returning a certain element, or printing all of its elements.
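To make the three parts concrete, here is a minimal sketch (the class and method names are hypothetical) of a pipeline with a collection as its source, two intermediate operations, one of which changes the element type from String to Integer, and a terminal operation that stores the results in a list:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of source -> intermediate operations -> terminal operation.
class PipelineAnatomy {
    // Returns the lengths of the words that have at least minLength characters.
    static List<Integer> longWordLengths(List<String> words, int minLength) {
        return words.stream()                           // source
                .filter(w -> w.length() >= minLength)    // intermediate: Stream<String> -> Stream<String>
                .map(String::length)                     // intermediate: Stream<String> -> Stream<Integer>
                .collect(Collectors.toList());           // terminal: store the elements in a List
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "map", "filter", "collect");
        System.out.println(longWordLengths(words, 6)); // [6, 6, 7]
    }
}
```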
Stream pipelines are evaluated lazily: evaluation doesn’t start until the terminal operation is invoked, and data elements that aren’t required in order to complete the terminal operation are never computed. This lazy evaluation is what makes it possible to work with infinite streams. Note that a stream pipeline without a terminal operation is a silent no-op, so don’t forget to include one.
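One way to observe lazy evaluation is to count how many elements an infinite source actually produces. The sketch below does this with peek, which exists mainly for debugging; the class name LazinessDemo and the even-number task are assumptions for illustration. Because limit(3) is satisfied once the third even number passes, only the integers 1 through 6 are generated, and a pipeline left without a terminal operation never runs its peek at all. An AtomicInteger serves as the counter because a lambda cannot reassign a local variable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: observing lazy evaluation with a counter.
class LazinessDemo {
    // Returns the first n even numbers; 'pulled' counts source elements actually produced.
    static List<Integer> firstEvens(int n, AtomicInteger pulled) {
        return Stream.iterate(1, i -> i + 1)           // infinite source: 1, 2, 3, ...
                .peek(i -> pulled.incrementAndGet())    // runs once per element that is generated
                .filter(i -> i % 2 == 0)
                .limit(n)                               // stops pulling once n elements pass
                .collect(Collectors.toList());          // terminal operation starts the evaluation
    }

    // Builds a pipeline but never calls a terminal operation, so peek never runs.
    static int pipelineWithoutTerminal() {
        AtomicInteger pulled = new AtomicInteger();
        Stream<Integer> neverEvaluated = Stream.of(1, 2, 3).peek(i -> pulled.incrementAndGet());
        return pulled.get();
    }

    public static void main(String[] args) {
        AtomicInteger pulled = new AtomicInteger();
        System.out.println(firstEvens(3, pulled));     // [2, 4, 6]
        System.out.println(pulled.get());              // 6
        System.out.println(pipelineWithoutTerminal()); // 0
    }
}
```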
The streams API is fluent: it is designed to allow all of the calls that comprise a pipeline to be chained into a single expression. In fact, multiple pipelines can be chained together into a single expression.
Consider the following program, which reads the words from a dictionary file and prints all the anagram groups whose size meets a user-specified minimum. Recall that two words are anagrams if they consist of the same letters in a different order. The program reads each word from a user-specified dictionary file and places the words into a map. The map key is the word with its letters alphabetized, so the key for "staple" is "aelpst", and the key for "petals" is also "aelpst": the two words are anagrams, and all anagrams share the same alphabetized form. The map value is a set containing all of the words that share an alphabetized form. After the dictionary has been processed, each set is a complete anagram group. The program then iterates through the map’s values() view and prints each set whose size meets the threshold:
// Prints all large anagram groups in a dictionary iteratively
import java.io.File;
import java.io.IOException;
import java.util.*;
public class Anagrams {
    public static void main(String[] args) throws IOException {
        File dictionary = new File(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        Map<String, Set<String>> groups = new HashMap<>();
        try (Scanner s = new Scanner(dictionary)) {
            while (s.hasNext()) {
                String word = s.next();
                groups.computeIfAbsent(alphabetize(word),
                    (unused) -> new TreeSet<>()).add(word);
            }
        }
        for (Set<String> group : groups.values())
            if (group.size() >= minGroupSize)
                System.out.println(group.size() + ": " + group);
    }
    // Returns s with its characters sorted, e.g. "staple" -> "aelpst"
    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}
The insertion of each word into the map uses the computeIfAbsent method, which was added in Java 8. This method looks up a key in the map: if the key is present, the method simply returns the value associated with it. If not, the method computes a value by applying the given function object to the key, associates this value with the key, and returns the computed value. The computeIfAbsent method simplifies the implementation of maps that associate multiple values with each key.
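As a small, hypothetical illustration of this multimap idiom outside the anagram program, the sketch below groups words by length into sorted sets. computeIfAbsent creates each set the first time its key is seen and returns the existing set on later calls, so a single expression handles both cases. The class name and word list are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a multimap from word length to the set of words with that length.
class MultimapDemo {
    static Map<Integer, Set<String>> byLength(List<String> words) {
        Map<Integer, Set<String>> result = new TreeMap<>();
        for (String w : words)
            // First time a length is seen: create an empty TreeSet and store it.
            // Every time: add the word to the set now associated with that length.
            result.computeIfAbsent(w.length(), k -> new TreeSet<>()).add(w);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(byLength(Arrays.asList("pear", "fig", "plum", "kiwi", "apple")));
        // {3=[fig], 4=[kiwi, pear, plum], 5=[apple]}
    }
}
```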
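Now consider the following program, which solves the same problem but makes heavy use of streams. Note that the entire program, with the exception of the code that opens the dictionary file, is contained in a single expression. The only reason the dictionary is opened in a separate expression is to allow the use of the try-with-resources statement, which ensures that the dictionary file is closed:
// Overuse of streams - don't do this!
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;
public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(
                groupingBy(word -> word.chars().sorted()
                    .collect(StringBuilder::new,
                        (sb, c) -> sb.append((char) c),
                        StringBuilder::append).toString()))
                .values().stream()
                .filter(group -> group.size() >= minGroupSize)
                .map(group -> group.size() + ": " + group)
                .forEach(System.out::println);
        }
    }
}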
It is shorter, but it is also less readable, especially to programmers who are not experts in the use of streams. Overusing streams makes programs hard to read and maintain.
Luckily, there is a happy medium. The following program solves the same problem, using streams without overusing them. The result is a program that’s both shorter and clearer than the original:
// Tasteful use of streams enhances clarity and conciseness
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;
public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(groupingBy(word -> alphabetize(word)))
                .values().stream()
                .filter(group -> group.size() >= minGroupSize)
                .forEach(g -> System.out.println(g.size() + ": " + g));
        }
    }
    // alphabetize method is the same as in original version
}
Even if you have little previous exposure to streams, this program is not hard to understand. It opens the dictionary file in a try-with-resources block, obtaining a stream consisting of all the lines in the file. The stream variable is named words to suggest that each element in the stream is a word. The pipeline on this stream has no intermediate operations; its terminal operation collects all the words into a map that groups the words by their alphabetized form (Item 46). This is essentially the same map that was constructed in both previous versions of the program (the iterative version stored each group in a sorted set rather than a list). Then a new Stream<List<String>> is opened on the values() view of the map. The elements in this stream are, of course, the anagram groups. The stream is filtered so that all of the groups whose size is less than minGroupSize are ignored, and finally, the remaining groups are printed by the terminal operation forEach.
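Because the pipeline reads from Files.lines, running it requires a dictionary file. A sketch that substitutes an in-memory Stream<String> for the file (the word list and the class name InMemoryAnagrams are made up) and collects the output lines into a list instead of printing them might look like this. Returning a list rather than ending with forEach is a small departure from the program above, made only so the result can be checked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;

// Hypothetical sketch: the same pipeline shape, fed from memory instead of Files.lines.
class InMemoryAnagrams {
    static List<String> largeGroups(Stream<String> words, int minGroupSize) {
        return words.collect(groupingBy(word -> alphabetize(word))) // Map<String, List<String>>
                .values().stream()                                  // Stream<List<String>>
                .filter(group -> group.size() >= minGroupSize)
                .map(group -> group.size() + ": " + group)
                .collect(Collectors.toList());
    }

    // Same helper as in the iterative version: sort the word's characters.
    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    public static void main(String[] args) {
        Stream<String> words = Stream.of("staple", "petals", "pastel", "tea", "eat", "apple");
        System.out.println(largeGroups(words, 3)); // [3: [staple, petals, pastel]]
    }
}
```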
Note that the lambda parameter names were chosen carefully. The parameter g should really be named group, but the resulting line of code would be too wide for the book. In the absence of explicit types, careful naming of lambda parameters is essential to the readability of stream pipelines.
Note also that word alphabetization is done in a separate alphabetize method. This enhances readability by providing a name for the operation and keeping implementation details out of the main program. Using helper methods is even more important for readability in stream pipelines than in iterative code because pipelines lack explicit type information and named temporary variables.
The alphabetize method could have been reimplemented to use streams, but a stream-based alphabetize method would have been less clear, more difficult to write correctly, and probably slower. These deficiencies result from Java’s lack of support for primitive char streams (which is not to imply that Java should have supported char streams; it would have been infeasible to do so). To demonstrate the hazards of processing char values with streams, consider the following code:
"Hello world!".chars().forEach(System.out::print);
You might expect it to print Hello world!, but if you run it, you’ll find that it prints 721011081081113211911111410810033. This happens because the elements of the stream returned by "Hello world!".chars() are not char values but int values, so the int overloading of print is invoked. It is admittedly confusing that a method named chars returns a stream of int values. You could fix the program by using a cast to force the invocation of the correct overloading:
"Hello world!".chars().forEach(x -> System.out.print((char) x));
but ideally you should refrain from using streams to process char values.
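The sketch below (hypothetical class and method names) makes the int-versus-char behavior checkable by building strings instead of printing. Appending the raw int values selects StringBuilder.append(int) and reproduces the digit string shown above, while casting each value to char selects append(char) and restores the original text. It demonstrates the pitfall rather than recommending char processing with streams.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: String.chars() produces int values, not char values.
class CharsDemo {
    // Each element of chars() is a char code widened to int.
    static List<Integer> codes(String s) {
        return s.chars().boxed().collect(Collectors.toList());
    }

    // No cast: c is an int, so StringBuilder.append(int) appends its decimal digits.
    static String withoutCast(String s) {
        StringBuilder sb = new StringBuilder();
        s.chars().forEach(c -> sb.append(c));
        return sb.toString();
    }

    // With a cast: append(char) is selected and the original characters come back.
    static String withCast(String s) {
        StringBuilder sb = new StringBuilder();
        s.chars().forEach(c -> sb.append((char) c));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codes("Hi!"));       // [72, 105, 33]
        System.out.println(withoutCast("Hi!")); // 7210533
        System.out.println(withCast("Hi!"));    // Hi!
    }
}
```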
When you start using streams, you may feel the urge to convert all your loops into streams, but resist the urge. As a rule, even moderately complex tasks are best accomplished using some combination of streams and iteration, as illustrated by the Anagrams programs above. So refactor existing code to use streams, and use them in new code, only where it makes sense to do so.
There are some things you can do from code blocks that you can’t do from function objects:
• From a code block, you can read or modify any local variable in scope; from a lambda, you can only read final or effectively final variables [JLS 4.12.4], and you can’t modify any local variables.
• From a code block, you can return from the enclosing method, break or continue an enclosing loop, or throw any checked exception that this method is declared to throw; from a lambda you can do none of these things.
If a computation is best expressed using these techniques, then it’s probably not a good match for streams. Conversely, streams make it very easy to do some things:
• Uniformly transform sequences of elements
• Filter sequences of elements
• Combine sequences of elements using a single operation (for example to add them, concatenate them, or compute their minimum)
• Accumulate sequences of elements into a collection, perhaps grouping them by some common attribute
• Search a sequence of elements for an element satisfying some criterion
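Each of the five tasks above maps onto a short pipeline. The following sketch shows one possible pipeline per bullet on a made-up word list; the class and method names are illustrative only, and the "combine" bullet is shown three ways (a reduce for the sum, a joining collector for concatenation, and min).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch: one small pipeline per task in the list above.
class EasyStreamTasks {
    static final List<String> WORDS = Arrays.asList("alpha", "beta", "gamma", "delta", "pi");

    // Uniformly transform: each word to its length
    static List<Integer> lengths() {
        return WORDS.stream().map(String::length).collect(Collectors.toList());
    }

    // Filter: keep words longer than four characters
    static List<String> longWords() {
        return WORDS.stream().filter(w -> w.length() > 4).collect(Collectors.toList());
    }

    // Combine with a single operation: add, concatenate, take the minimum
    static int totalLength() {
        return WORDS.stream().map(String::length).reduce(0, Integer::sum);
    }

    static String concatenated() {
        return WORDS.stream().collect(Collectors.joining("-"));
    }

    static Optional<String> smallest() {
        return WORDS.stream().min(Comparator.naturalOrder());
    }

    // Accumulate into a collection, grouped by a common attribute (length)
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // Search: the first word that starts with "g"
    static Optional<String> firstStartingWithG() {
        return WORDS.stream().filter(w -> w.startsWith("g")).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(lengths());                  // [5, 4, 5, 5, 2]
        System.out.println(longWords());                // [alpha, gamma, delta]
        System.out.println(totalLength());              // 21
        System.out.println(concatenated());             // alpha-beta-gamma-delta-pi
        System.out.println(smallest().get());           // alpha
        System.out.println(byLength().get(5));          // [alpha, gamma, delta]
        System.out.println(firstStartingWithG().get()); // gamma
    }
}
```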
One thing that is hard to do with streams is to access corresponding elements from multiple stages of a pipeline simultaneously: once you map a value to some other value, the original value is lost. One workaround is to map each value to a pair object containing the original value and the new value, but this is not a satisfying solution, especially if the pair objects are required for multiple stages of a pipeline. The resulting code is messy and verbose, which defeats a primary purpose of streams. When it is applicable, a better workaround is to invert the mapping when you need access to the earlier-stage value.
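On a toy problem (hypothetical: report each square of 1 through max that is divisible by 3, labeled with its root), the two workarounds can be compared side by side. The sketch uses AbstractMap.SimpleImmutableEntry as an ad hoc pair type, which is one choice among several. The inverted version recovers the root with Math.sqrt, which is exact for perfect squares of this size; the inversion only works because squaring positive integers is easy to undo.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch comparing the pair workaround with an inverted mapping.
class PairVersusInversion {
    // Pair workaround: carry each root n alongside its square through every stage.
    static List<String> withPairs(int max) {
        return IntStream.rangeClosed(1, max).boxed()
                .map(n -> new SimpleImmutableEntry<>(n, n * n))
                .filter(e -> e.getValue() % 3 == 0)
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.toList());
    }

    // Inversion: keep only the squares and recompute the root where it is needed.
    static List<String> byInverting(int max) {
        return IntStream.rangeClosed(1, max).boxed()
                .map(n -> n * n)
                .filter(sq -> sq % 3 == 0)
                .map(sq -> (int) Math.sqrt(sq) + ": " + sq) // sqrt is exact for these small perfect squares
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withPairs(10));   // [3: 9, 6: 36, 9: 81]
        System.out.println(byInverting(10)); // [3: 9, 6: 36, 9: 81]
    }
}
```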
For example, let’s write a program to print the first twenty Mersenne primes. To refresh your memory, a Mersenne number is a number of the form $2^p - 1$. If $p$ is prime, the corresponding Mersenne number may be prime; if so, it’s a Mersenne prime. (If $p$ is composite, $2^p - 1$ is always composite, which is why we start from the primes.) As the initial stream in our pipeline, we want all the prime numbers. Here’s a method to return that (infinite) stream. We assume a static import has been used for easy access to the static members of BigInteger:
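static Stream<BigInteger> primes() {
    return Stream.iterate(TWO, BigInteger::nextProbablePrime);
}
Here’s the program to print the first twenty Mersenne primes:
public static void main(String[] args) {
    primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
            .filter(mersenne -> mersenne.isProbablePrime(50))
            .limit(20)
            .forEach(System.out::println);
}
This program is a straightforward encoding of the prose description above: it starts with the primes, computes the corresponding Mersenne numbers, filters out all but the primes (the argument 50 sets the certainty of the probabilistic primality test), limits the resulting stream to twenty elements, and prints them out.
Now suppose that we want to precede each Mersenne prime with its exponent $p$. This value is present only in the initial stream, so it is inaccessible in the terminal operation, which prints the results. Fortunately, it’s easy to compute the exponent of a Mersenne number by inverting the mapping that took place in the first intermediate operation: $2^p - 1$ is written in binary as exactly $p$ one-bits, so the exponent is simply the number of bits in the binary representation. Replacing the terminal operation with the following one generates the desired result:
.forEach(mp -> System.out.println(mp.bitLength() + ": " + mp));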
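To check the inverted mapping without waiting for all twenty Mersenne primes, here is a scaled-down sketch (the class name MersenneSketch and the limit of five are assumptions) that collects each Mersenne prime labeled with the exponent recovered by bitLength(). Like the code above, it relies on BigInteger.TWO, which requires Java 9 or later. The composite 2^11 − 1 = 2047 = 23 × 89 is correctly skipped.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import static java.math.BigInteger.ONE;
import static java.math.BigInteger.TWO;

// Scaled-down sketch of the Mersenne-prime pipeline (BigInteger.TWO needs Java 9+).
class MersenneSketch {
    static Stream<BigInteger> primes() {
        return Stream.iterate(TWO, BigInteger::nextProbablePrime);
    }

    // The first n Mersenne primes, each formatted as "p: 2^p - 1".
    // p is no longer in the stream; it is recovered with bitLength(),
    // because 2^p - 1 is written as exactly p one-bits in binary.
    static List<String> firstMersennePrimes(int n) {
        return primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
                .filter(mersenne -> mersenne.isProbablePrime(50))
                .limit(n)
                .map(mp -> mp.bitLength() + ": " + mp)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        firstMersennePrimes(5).forEach(System.out::println);
        // 2: 3
        // 3: 7
        // 5: 31
        // 7: 127
        // 13: 8191
    }
}
```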
There are plenty of tasks where it is not obvious whether to use streams or iteration. For example, consider the task of initializing a new deck of cards. Assume that Card is an immutable value class that encapsulates a Rank and a Suit, both of which are enum types. This task is representative of any task that requires computing all the pairs of elements that can be chosen from two sets. Mathematicians call this the Cartesian product of the two sets. Here’s an iterative implementation with a nested for-each loop that should look very familiar to you:
// Iterative Cartesian product computation
private static List<Card> newDeck() {
List<Card> result = new ArrayList<>();
for (Suit suit : Suit.values())
for (Rank rank : Rank.values())
result.add(new Card(suit, rank));
return result;
}
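And here is a stream-based implementation that makes use of the intermediate operation flatMap. This operation maps each element in a stream to a stream and then concatenates all of these new streams into a single stream (or flattens them). Note that this implementation contains a nested lambda: the inner lambda, rank -> new Card(suit, rank), uses the suit supplied by the outer one.
// Stream-based Cartesian product computation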
private static List<Card> newDeck() {
return Stream.of(Suit.values())
.flatMap(suit ->
Stream.of(Rank.values())
.map(rank -> new Card(suit, rank)))
.collect(toList());
}
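The text assumes Card, Rank, and Suit without defining them. Under hypothetical minimal definitions of those types, a self-contained sketch that runs both versions of newDeck (renamed iterativeDeck and streamDeck so they can share a class) and confirms that they build the same 52-card deck in the same order might look like this. Card here is an assumed minimal value class whose equals and hashCode exist only so the two lists can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import static java.util.stream.Collectors.toList;

// Hypothetical minimal stand-ins for the types the text assumes.
enum Suit { CLUB, DIAMOND, HEART, SPADE }

enum Rank { ACE, DEUCE, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, JACK, QUEEN, KING }

final class Card {
    private final Suit suit;
    private final Rank rank;

    Card(Suit suit, Rank rank) { this.suit = suit; this.rank = rank; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Card)) return false;
        Card c = (Card) o;
        return c.suit == suit && c.rank == rank;
    }

    @Override public int hashCode() { return 31 * suit.hashCode() + rank.hashCode(); }

    @Override public String toString() { return rank + " of " + suit; }
}

class Decks {
    // Iterative Cartesian product (same logic as the first newDeck)
    static List<Card> iterativeDeck() {
        List<Card> result = new ArrayList<>();
        for (Suit suit : Suit.values())
            for (Rank rank : Rank.values())
                result.add(new Card(suit, rank));
        return result;
    }

    // Stream-based Cartesian product (same logic as the second newDeck)
    static List<Card> streamDeck() {
        return Stream.of(Suit.values())
                .flatMap(suit -> Stream.of(Rank.values())
                        .map(rank -> new Card(suit, rank)))
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Card> deck = iterativeDeck();
        System.out.println(deck.size());                // 52
        System.out.println(deck.equals(streamDeck()));  // true
        System.out.println(deck.get(0));                // ACE of CLUB
    }
}
```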
In summary, some tasks are best accomplished with streams, and others with iteration. Many tasks are best accomplished by combining the two approaches. There are no hard and fast rules for choosing which approach to use for a task, but there are some useful heuristics. In many cases, it will be clear which approach to use; in some cases, it won’t. If you’re not sure whether a task is better served by streams or iteration, try both and see which works better.