Java spliterator что это
A Spliterator may traverse elements individually ( tryAdvance() ) or sequentially in bulk ( forEachRemaining() ).
A Spliterator may also partition off some of its elements (using trySplit() ) as another Spliterator, to be used in possibly-parallel operations. Operations using a Spliterator that cannot split, or does so in a highly imbalanced or inefficient manner, are unlikely to benefit from parallelism. Traversal and splitting exhaust elements; each Spliterator is useful for only a single bulk computation.
A Spliterator also reports a set of characteristics() of its structure, source, and elements from among ORDERED , DISTINCT , SORTED , SIZED , NONNULL , IMMUTABLE , CONCURRENT , and SUBSIZED . These may be employed by Spliterator clients to control, specialize or simplify computation. For example, a Spliterator for a Collection would report SIZED , a Spliterator for a Set would report DISTINCT , and a Spliterator for a SortedSet would also report SORTED . Characteristics are reported as a simple unioned bit set. Some characteristics additionally constrain method behavior; for example if ORDERED , traversal methods must conform to their documented ordering. New characteristics may be defined in the future, so implementors should not assign meanings to unlisted values.
A Spliterator that does not report IMMUTABLE or CONCURRENT is expected to have a documented policy concerning: when the spliterator binds to the element source; and detection of structural interference of the element source detected after binding. A late-binding Spliterator binds to the source of elements at the point of first traversal, first split, or first query for estimated size, rather than at the time the Spliterator is created. A Spliterator that is not late-binding binds to the source of elements at the point of construction or first invocation of any method. Modifications made to the source prior to binding are reflected when the Spliterator is traversed. After binding a Spliterator should, on a best-effort basis, throw ConcurrentModificationException if structural interference is detected. Spliterators that do this are called fail-fast. The bulk traversal method ( forEachRemaining() ) of a Spliterator may optimize traversal and check for structural interference after all elements have been traversed, rather than checking per-element and failing immediately.
Spliterators can provide an estimate of the number of remaining elements via the estimateSize() method. Ideally, as reflected in characteristic SIZED , this value corresponds exactly to the number of elements that would be encountered in a successful traversal. However, even when not exactly known, an estimated value value may still be useful to operations being performed on the source, such as helping to determine whether it is preferable to split further or traverse the remaining elements sequentially.
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition. A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance() , as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.
Primitive subtype specializations of Spliterator are provided for int , long , and double values. The subtype default implementations of tryAdvance(java.util.function.Consumer) and forEachRemaining(java.util.function.Consumer) box primitive values to instances of their corresponding wrapper class. Such boxing may undermine any performance advantages gained by using the primitive specializations. To avoid boxing, the corresponding primitive-based methods should be used. For example, Spliterator.OfInt.tryAdvance(java.util.function.IntConsumer) and Spliterator.OfInt.forEachRemaining(java.util.function.IntConsumer) should be used in preference to Spliterator.OfInt.tryAdvance(java.util.function.Consumer) and Spliterator.OfInt.forEachRemaining(java.util.function.Consumer) . Traversal of primitive values using boxing-based methods tryAdvance() and forEachRemaining() does not affect the order in which the values, transformed to boxed values, are encountered.
Spliterators, like Iterator s, are for traversing the elements of a source. The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. In addition, the protocol for accessing elements via a Spliterator is designed to impose smaller per-element overhead than Iterator , and to avoid the inherent race involved in having separate methods for hasNext() and next() .
For mutable sources, arbitrary and non-deterministic behavior may occur if the source is structurally interfered with (elements added, replaced, or removed) between the time that the Spliterator binds to its data source and the end of traversal. For example, such interference will produce arbitrary, non-deterministic results when using the java.util.stream framework.
- The source cannot be structurally interfered with.
For example, an instance of CopyOnWriteArrayList is an immutable source. A Spliterator created from the source reports a characteristic of IMMUTABLE . - The source manages concurrent modifications.
For example, a key set of a ConcurrentHashMap is a concurrent source. A Spliterator created from the source reports a characteristic of CONCURRENT . - The mutable source provides a late-binding and fail-fast Spliterator.
Late binding narrows the window during which interference can affect the calculation; fail-fast detects, on a best-effort basis, that structural interference has occurred after traversal has commenced and throws ConcurrentModificationException . For example, ArrayList , and many other non-concurrent Collection classes in the JDK, provide a late-binding, fail-fast spliterator. - The mutable source provides a non-late-binding but fail-fast Spliterator.
The source increases the likelihood of throwing ConcurrentModificationException since the window of potential interference is larger. - The mutable source provides a late-binding and non-fail-fast Spliterator.
The source risks arbitrary, non-deterministic behavior after traversal has commenced since interference is not detected. - The mutable source provides a non-late-binding and non-fail-fast Spliterator.
The source increases the risk of arbitrary, non-deterministic behavior since non-detected interference may occur after construction.
Example. Here is a class (not a very useful one, except for illustration) that maintains an array in which the actual data are held in even locations, and unrelated tag data are held in odd locations. Its Spliterator ignores the tags.
Java spliterator что это
A Spliterator may traverse elements individually ( tryAdvance() ) or sequentially in bulk ( forEachRemaining() ).
A Spliterator may also partition off some of its elements (using trySplit() ) as another Spliterator, to be used in possibly-parallel operations. Operations using a Spliterator that cannot split, or does so in a highly imbalanced or inefficient manner, are unlikely to benefit from parallelism. Traversal and splitting exhaust elements; each Spliterator is useful for only a single bulk computation.
A Spliterator also reports a set of characteristics() of its structure, source, and elements from among ORDERED , DISTINCT , SORTED , SIZED , NONNULL , IMMUTABLE , CONCURRENT , and SUBSIZED . These may be employed by Spliterator clients to control, specialize or simplify computation. For example, a Spliterator for a Collection would report SIZED , a Spliterator for a Set would report DISTINCT , and a Spliterator for a SortedSet would also report SORTED . Characteristics are reported as a simple unioned bit set. Some characteristics additionally constrain method behavior; for example if ORDERED , traversal methods must conform to their documented ordering. New characteristics may be defined in the future, so implementors should not assign meanings to unlisted values.
A Spliterator that does not report IMMUTABLE or CONCURRENT is expected to have a documented policy concerning: when the spliterator binds to the element source; and detection of structural interference of the element source detected after binding. A late-binding Spliterator binds to the source of elements at the point of first traversal, first split, or first query for estimated size, rather than at the time the Spliterator is created. A Spliterator that is not late-binding binds to the source of elements at the point of construction or first invocation of any method. Modifications made to the source prior to binding are reflected when the Spliterator is traversed. After binding a Spliterator should, on a best-effort basis, throw ConcurrentModificationException if structural interference is detected. Spliterators that do this are called fail-fast. The bulk traversal method ( forEachRemaining() ) of a Spliterator may optimize traversal and check for structural interference after all elements have been traversed, rather than checking per-element and failing immediately.
Spliterators can provide an estimate of the number of remaining elements via the estimateSize() method. Ideally, as reflected in characteristic SIZED , this value corresponds exactly to the number of elements that would be encountered in a successful traversal. However, even when not exactly known, an estimated value may still be useful to operations being performed on the source, such as helping to determine whether it is preferable to split further or traverse the remaining elements sequentially.
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition. A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance() , as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.
Primitive subtype specializations of Spliterator are provided for int , long , and double values. The subtype default implementations of tryAdvance(java.util.function.Consumer) and forEachRemaining(java.util.function.Consumer) box primitive values to instances of their corresponding wrapper class. Such boxing may undermine any performance advantages gained by using the primitive specializations. To avoid boxing, the corresponding primitive-based methods should be used. For example, Spliterator.OfPrimitive.tryAdvance(java.util.function.IntConsumer) and Spliterator.OfPrimitive.forEachRemaining(java.util.function.IntConsumer) should be used in preference to Spliterator.OfInt.tryAdvance(java.util.function.Consumer) and Spliterator.OfInt.forEachRemaining(java.util.function.Consumer) . Traversal of primitive values using boxing-based methods tryAdvance() and forEachRemaining() does not affect the order in which the values, transformed to boxed values, are encountered.
Spliterators, like Iterator s, are for traversing the elements of a source. The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. In addition, the protocol for accessing elements via a Spliterator is designed to impose smaller per-element overhead than Iterator , and to avoid the inherent race involved in having separate methods for hasNext() and next() .
For mutable sources, arbitrary and non-deterministic behavior may occur if the source is structurally interfered with (elements added, replaced, or removed) between the time that the Spliterator binds to its data source and the end of traversal. For example, such interference will produce arbitrary, non-deterministic results when using the java.util.stream framework.
- The source cannot be structurally interfered with.
For example, an instance of CopyOnWriteArrayList is an immutable source. A Spliterator created from the source reports a characteristic of IMMUTABLE . - The source manages concurrent modifications.
For example, a key set of a ConcurrentHashMap is a concurrent source. A Spliterator created from the source reports a characteristic of CONCURRENT . - The mutable source provides a late-binding and fail-fast Spliterator.
Late binding narrows the window during which interference can affect the calculation; fail-fast detects, on a best-effort basis, that structural interference has occurred after traversal has commenced and throws ConcurrentModificationException . For example, ArrayList , and many other non-concurrent Collection classes in the JDK, provide a late-binding, fail-fast spliterator. - The mutable source provides a non-late-binding but fail-fast Spliterator.
The source increases the likelihood of throwing ConcurrentModificationException since the window of potential interference is larger. - The mutable source provides a late-binding and non-fail-fast Spliterator.
The source risks arbitrary, non-deterministic behavior after traversal has commenced since interference is not detected. - The mutable source provides a non-late-binding and non-fail-fast Spliterator.
The source increases the risk of arbitrary, non-deterministic behavior since non-detected interference may occur after construction.
Example. Here is a class (not a very useful one, except for illustration) that maintains an array in which the actual data are held in even locations, and unrelated tag data are held in odd locations. Its Spliterator ignores the tags.
Использование Spliterator в Java
Итераторы в Java используются для прохождения элементов данного источника. Spliterator в Java является одним из четырех доступных итераторов Java – Iterator, Enumeration, ListIterator и Spliterator . Это интерфейс, доступный в пакете java.util .
Spliterator был впервые представлен в Java 8 для поддержки параллельного программирования. Однако мы можем использовать его как для последовательной, так и для параллельной обработки элементов данных. Чтобы получить экземпляр Java Spliterator , мы будем использовать метод spliterator () :
Мы можем думать о Java Spliterator как:
Характеристики сплитератора :
Интерфейс Spliterator определяет некоторые интегральные константы, представляющие его характеристики. Наш экземпляр может иметь одну или несколько из следующих восьми характеристик:
- SIZED – способен возвращать точное количество элементов в источнике, когда мы вызываем метод valuSize ()
- SUBSIZED – когда мы разделяем экземпляр с помощью trySplit () и получаем также SIZED SplitIterators
- ORDERED – перебор упорядоченной последовательности
- SORTED – перебирать отсортированную последовательность
- NONNULL – источник гарантирует, что значения не равныNULL
- DISTINCT – в нашей исходной последовательности нет дубликатов
- IMMUTABLE – если мы не можем структурно изменить источник элемента
- CONCURRENT – источник элемента может быть безопасно одновременно изменен
Мы можем использовать метод int Характеристики () для запроса характеристик нашего экземпляра Spliterator . Он возвращает значение OR для всех значений квалифицирующих признаков для нашего Spliterator . Для нашего определенного разделителя у нас будет:
hasCharacteristics () :
Мы можем использовать логический метод hasCharacteristics (int признак), чтобы проверить, имеет ли наш экземпляр данную характеристику или нет:
estimateSize ():
МетодtimateSize () возвращает приблизительное количество элементов, оставшихся для итерации. Он возвращает Long.MAX_VALUE, если значение бесконечно, неизвестно или слишком дорого для вычисления. Для SIZED Spliterator он возвращает значение, которое точно соответствует количеству элементов, которые будут встречаться при успешном обходе:
getExactSizeIfKnown ():
Это просто вспомогательный метод, который возвращает оценкуSize (), если это SIZED Spliterator или возвращает -1 :
tryAdvance ():
Подпись метода tryAdvance () выглядит следующим образом:
Метод tryAdvance () в Spliterator объединяет операторы hasNext () и next (), присутствующие в базовом итераторе . Поэтому, если остающийся элемент существует, он выполняет с ним заданное действие, возвращая true; иначе возвращает ложь . Другими словами, он выполняет действие над следующим элементом в последовательности, а затем продвигает итератор.
Если у нас есть ORDERED Spliterator , действие выполняется над следующим элементом в порядке столкновения.
forEachRemaining ():
Метод forEachRemaining (Consumer <? SuperT> action) выполняет данное действие для каждого оставшегося элемента, последовательно в текущем потоке, пока все элементы не будут обработаны или действие не вызовет исключение:
Текущая реализация по умолчанию неоднократно вызывает tryAdvance (), пока не вернет false .
trySplit ():
Если разбиение возможно, метод trySplit () разделяет вызывающий Spliterator и возвращает ссылку на элементы покрытия Spliterator, которые не будут покрываться этим Spliterator по возвращении из этого метода. В противном случае возвращается ноль .
Таким образом, после успешного разделения оригинальный Spliterator будет выполнять итерацию по одной части последовательности, а возвращенный Spliterator – по другой ее части.
Кроме того, возвращаемый Spliterator охватывает строгий префикс элементов для начального ORDERED Spliterator (например, над списком ) :
Spliterator: Concurrency in Java Parallel Stream
Java 1.8 came with streams API which supports parallel execution on Collection. It is simple, elegant but when and how to use them is not well looked at by the developer community. In this post, I will explore Spliterator, which helps to use parallel stream effectively.
Q. How would you create a new list of uppercased words given a list of words.
A. Using Stream this can be done in a single line.
But shall we use parallelStream() in place of stream() ? The answer is, it depends. If the list size is small, it is not suggested to use ParallelStream. The next question could be what is “not small” for ParallelStream? As a rule of thumb, for non-trivial CPU bound operation, if the size of the Collection is more than 1000, it can be considered “not small” and Parallel Streams can become useful. Doug Lea has guidance here http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html . Going by guidance, it is not recommended to use parallelStream() in the above code snippet.
Q. What if operations on Stream are blocking I/O?
For blocking I/O bound traditionally, there would a thread pool which gets one job at a time or all jobs together and the application program hopes it does the right thing. With ParallelStream work is much simple, just adding parallelStream() in stream execution pipeline makes execution concurrent. Caution: As all streams by default share common java run time environment Fork Join Pool, it is strongly recommended to have a custom pool for blocking I/O bound execution. Many executors, along with Fork Join Pool works fine for this, though testing for different scenario is recommended.
In the above example, all words, Hello, World, With, Parallel, Stream would be printed after 5 seconds. This may or may not be desirable. Say if these words are links of action that came as part of the request, and sleep is actually blocking rest calls on the links. Few requests with a high number of links can make the application unresponsive to almost all other requests. This is the classical starvation problem.
Q. How to control the number of concurrent operations in ParallelStream.
A. For this we have to use a new Iterator which is specifically introduced in 1.8 for ParallelStream called Spliterator. Spliterator is a portmanteau of split and iterator. At a high level, this is used to split the stream into chunks of streams which can be executed concurrently, with each chunk iterated sequentially. trySplit() takes care of the first part, i.e. split. It either returns a smaller chunk if the stream is large enough or reruns null signifying stream should not be divided further for parallel execution. tryAdvance() takes care of the second part, i.e. iteration. Iteration for a given chunk of the stream should be continued till tryAdvance() returns true. Let’s see how can we override these two methods and control the number of splits.
Above is an implementation of Spliterator for List . It divides the input List into N splits passed in as the second argument of the static factory method. Disclaimer: Code has no validation and is not production-ready.
Let’s re-write our steam with I/O using the above implementation.
It is almost the same as the last try. The difference is the usage of custom Spliterator . With this change in the above solution for a given request, we will have a maximum of 2 splits. Thus, it ensures all work submitted via CappedListSpliterator would get a maximum of 2 threads, eliminating starvation.
In the above example, elements would be divided into two splits, Hello, World , With, Parallel, Stream. We should see Hello, With followed by World, Parallel and at end Stream printed with a 5-second interval. Below is my try to represent splits using pictures. The same color represents the same passage of time from the original submission to ForkJoinPool. Note this assumes the system had enough thread and no other requests are executing on the pool.
Hopefully, this will encourage you to look at Spliterator and get to know them more. Happy Learning .