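Java Streams: a powerful tool for aggregating and decomposing data
A comparison example
--- Sorting and extracting values
--------- Traditional approach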
List<Transaction> groceryTransactions = new ArrayList<>();
for (Transaction t : transactions) {
    if (t.getType() == Transaction.GROCERY) {
        groceryTransactions.add(t);
    }
}
// sort by value, largest first
Collections.sort(groceryTransactions, new Comparator<Transaction>() {
    @Override
    public int compare(Transaction t1, Transaction t2) {
        return t2.getValue().compareTo(t1.getValue());
    }
});
List<Integer> transactionIds = new ArrayList<>();
for (Transaction t : groceryTransactions) {
    transactionIds.add(t.getId());
}
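---------- Stream approach: the code is more concise and readable; with parallelStream() it can also run faster on large data sets, although a speedup is not guaranteed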
List<Integer> transactionsIds = transactions.parallelStream().
filter(t -> t.getType() == Transaction.GROCERY).
sorted(comparing(Transaction::getValue).reversed()).
map(Transaction::getId).
collect(toList());
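To make the comparison above runnable, here is a minimal sketch. The nested Transaction class, its fields, and the constants GROCERY and OTHER are assumptions standing in for the type used in the snippets; a sequential stream is used, which gives the same result as parallelStream() for this pipeline.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TransactionDemo {
    // Hypothetical stand-in for the Transaction type used in the notes.
    static final class Transaction {
        static final int GROCERY = 1;
        static final int OTHER = 2;
        private final int id;
        private final int type;
        private final int value;

        Transaction(int id, int type, int value) {
            this.id = id;
            this.type = type;
            this.value = value;
        }

        int getId() { return id; }
        int getType() { return type; }
        Integer getValue() { return value; }
    }

    // Keep grocery transactions, sort by value descending, return their ids.
    static List<Integer> groceryIdsByValueDesc(List<Transaction> transactions) {
        return transactions.stream()
                .filter(t -> t.getType() == Transaction.GROCERY)
                .sorted(Comparator.comparing(Transaction::getValue).reversed())
                .map(Transaction::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Transaction> transactions = Arrays.asList(
                new Transaction(1, Transaction.GROCERY, 30),
                new Transaction(2, Transaction.OTHER, 99),
                new Transaction(3, Transaction.GROCERY, 50),
                new Transaction(4, Transaction.GROCERY, 10));
        System.out.println(groceryIdsByValueDesc(transactions)); // [3, 1, 4]
    }
}
```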
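The three steps of data aggregation
- Obtain the data: create a stream
- Intermediate operations: filtering, sorting, de-duplication, mapping to another type, and so on
- Terminal operation: produce the data in the desired form, or the final result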
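The three steps can be sketched as follows. The class name, method name, and sample data are made up for illustration; the intermediate variables exist only to label each step and would normally be chained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineSteps {
    static List<String> shortNamesUpper(List<String> names) {
        // 1. Source: create a stream from the collection.
        Stream<String> source = names.stream();
        // 2. Intermediate operations: lazy, each returns a new stream.
        Stream<String> transformed = source
                .filter(n -> n.length() <= 3)
                .map(String::toUpperCase)
                .distinct();
        // 3. Terminal operation: triggers the work and produces the result.
        return transformed.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob", "charlotte", "Ann", "bob");
        System.out.println(shortNamesUpper(names)); // [ANN, BOB]
    }
}
```

Nothing is filtered or mapped until collect is called; the intermediate operations only describe the pipeline.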
Stream methods
Intermediate: map (mapToInt, flatMap, etc.), filter, distinct, sorted, peek, skip, parallel, sequential, unordered
Terminal: forEach, forEachOrdered, toArray, reduce, collect, min, max, count, iterator
Short-circuiting:
anyMatch, allMatch, noneMatch, findFirst, findAny, limit
Intermediate
concat
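Joins two Streams into a single Stream. If both input Streams are ordered, the new Stream is ordered too; if either input Stream is parallel, the new Stream is parallel as well; when the new Stream is closed, the close handlers of both input Streams are invoked.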
Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5))
.forEach(integer -> System.out.print(integer + " "));
// Output:
// 1 2 3 4 5
distinct
Removes duplicate elements from the original Stream; the new Stream contains no duplicates.
Stream.of(1,2,3,1,2,3)
.distinct()
.forEach(System.out::println); // prints 1, 2, 3 (one per line)
filter
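Filters the original Stream by the given condition; the new Stream contains only the elements that satisfy the condition, and the rest are dropped.
Stream.of(1, 2, 3, 4, 5)
.filter(item -> item > 3)
.forEach(System.out::println); // prints 4, 5 (one per line)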
map
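map applies the given function to each element of the Stream; the new Stream contains only the results of that transformation.
To avoid boxing overhead, the API provides three primitive variants: mapToDouble, mapToInt, and mapToLong. Call the corresponding method to convert the elements of the original Stream to double, int, or long.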
Stream.of("a", "b", "hello")
.map(item-> item.toUpperCase())
.forEach(System.out::println);
Stream.of("1", "2", "3")
.mapToDouble(e->Double.valueOf(e))
.forEach(d-> System.out.println("d = " + d));
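One practical reason to use the primitive variants is that IntStream, LongStream, and DoubleStream offer numeric terminal operations such as sum and average. A small sketch (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

class PrimitiveMapping {
    // mapToInt yields an IntStream, which provides sum() without boxing.
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // average() returns an OptionalDouble; fall back to 0.0 for an empty list.
    static double averageLength(List<String> words) {
        return words.stream().mapToInt(String::length).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "hello");
        System.out.println(totalLength(words)); // 8
        System.out.println(averageLength(Arrays.asList("ab", "abcd"))); // 3.0
    }
}
```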
flatMap
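flatMap is similar to map in that every element of the original Stream is passed through a mapping function. The difference is that the mapping function returns a Stream, and the elements of all the returned Streams are flattened into one result Stream instead of producing a Stream of Streams. If the mapping function returns null for an element, an empty Stream is used in its place. flatMap has three variants for primitive types: flatMapToInt, flatMapToLong, and flatMapToDouble.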
Stream.of(1, 2, 3)
.flatMap(integer -> Stream.of(integer * 10))
.forEach(System.out::println);
// Output:
// 10,20,30
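The flattening behaviour is easier to see when each element maps to several values. A minimal sketch (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    // Each inner list becomes a stream; flatMap concatenates their elements.
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // Each line becomes a stream of its space-separated words.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nested = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(flatten(nested)); // [1, 2, 3, 4, 5]
        System.out.println(words(Arrays.asList("a b", "c"))); // [a, b, c]
    }
}
```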
peek
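peek returns a new Stream containing all elements of the original Stream and takes a Consumer. The Consumer runs on each element as it is consumed from the new Stream, before the element reaches the downstream operation.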
Stream.of(1, 2, 3, 4, 5)
.peek(integer -> System.out.println("accept:" + integer))
.forEach(System.out::println);
// Output:
// accept:1
// 1
// accept:2
// 2
...
skip
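skip discards the first N elements of the original Stream and returns a new Stream of the remaining elements. If the original Stream has more than N elements, the new Stream contains its last (length - N) elements; if it has N or fewer elements, an empty Stream is returned.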
Stream.of(1, 2, 3,4,5)
.skip(2)
.forEach(System.out::println);
// Output:
// 3,4,5
sorted
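Sorts the original Stream and returns a new, sorted Stream. sorted has two variants, sorted() and sorted(Comparator): the former sorts by natural order (the elements must implement Comparable), while the latter takes a custom Comparator so the elements can be ordered as needed.
Stream.of(5, 4, 3, 2, 1)
.sorted()
.forEach(System.out::println);
Stream.of(1, 2, 3, 4, 5)
.sorted((a, b) -> b.compareTo(a)) // descending order; returns 0 for equal elements, as the Comparator contract requires
.forEach(System.out::println);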
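For anything beyond a simple reversal, the Comparator factory methods are usually less error-prone than a hand-written lambda. A sketch that sorts by length and then alphabetically (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedDemo {
    // Primary key: string length; ties are broken by natural (alphabetical) order.
    static List<String> byLengthThenAlpha(List<String> words) {
        return words.stream()
                .sorted(Comparator.comparing(String::length)
                        .thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(byLengthThenAlpha(
                Arrays.asList("pear", "fig", "apple", "kiwi"))); // [fig, kiwi, pear, apple]
    }
}
```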
Terminal
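The collect family
The Stream interface provides the collect method:
<R> R collect(Supplier<R> supplier, // provides the result container
BiConsumer<R, ? super T> accumulator, // how an element is added to the container
BiConsumer<R, R> combiner // how multiple containers are merged
);
For example: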
String concat = stringStream.collect(StringBuilder::new, StringBuilder::append,StringBuilder::append).toString();
// equivalent to the above, written with lambdas to make each role clearer
String concat = stringStream.collect(() -> new StringBuilder(),(l, x) -> l.append(x), (r1, r2) -> r1.append(r2)).toString();
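// List to Map
Lists.<Person>newArrayList().stream()
.collect(() -> new HashMap<Integer, List<Person>>(),
(h, x) -> h.computeIfAbsent(x.getType(), k -> Lists.newArrayList()).add(x),
// merge the per-key lists; HashMap::putAll would overwrite lists that share a key in a parallel run
(m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v, (a, b) -> { a.addAll(b); return a; }))
);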
<R, A> R collect(Collector<? super T, A, R> collector);
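// supply the initial container -> add elements to the container -> merge containers under parallel execution -> transform the merged result
Collector is the interface for mutable reduction operations on a Stream (for example accumulating elements into a collection, or computing statistics such as sum, min, max, or average).
Streams support parallel execution. To avoid contention, each thread taking part in the reduction works on its own result container; the combiner merges those per-thread containers into the final result.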
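The role of the combiner can be checked with a small sketch that runs the same three-argument collect sequentially and in parallel. The class and method names are illustrative; the point is only that ArrayList::addAll merges the per-thread lists so both runs give the same list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class ParallelCollectDemo {
    static List<Integer> collectEvens(int upTo, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, upTo);
        if (parallel) {
            range = range.parallel();
        }
        return range.boxed()
                .filter(n -> n % 2 == 0)
                .collect(ArrayList::new,     // supplier: one container per chunk of work
                         ArrayList::add,     // accumulator: add an element to a container
                         ArrayList::addAll); // combiner: merge two containers
    }

    public static void main(String[] args) {
        System.out.println(collectEvens(10, false)); // [2, 4, 6, 8, 10]
        System.out.println(collectEvens(10, true).equals(collectEvens(10, false))); // true
    }
}
```

In a sequential run the combiner is never needed; in a parallel run the stream is ordered, so the merged list keeps the encounter order.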
--- Collector interface definition
/**
 * @param <T> the type of input elements to the reduction operation
* @param <A> the mutable accumulation type of the reduction operation (often
* hidden as an implementation detail)
* @param <R> the result type of the reduction operation
* @since 1.8
*/
public interface Collector<T, A, R> {
/**
* A function that creates and returns a new mutable result container.
*
* @return a function which returns a new, mutable result container
*/
Supplier<A> supplier();
/**
* A function that folds a value into a mutable result container.
*
* @return a function which folds a value into a mutable result container
*/
BiConsumer<A, T> accumulator();
/**
* A function that accepts two partial results and merges them. The
* combiner function may fold state from one argument into the other and
* return that, or may return a new result container.
*
* @return a function which combines two partial results into a combined
* result
*/
BinaryOperator<A> combiner();
/**
* Perform the final transformation from the intermediate accumulation type
* {@code A} to the final result type {@code R}.
*
* <p>If the characteristic {@code IDENTITY_TRANSFORM} is
* set, this function may be presumed to be an identity transform with an
* unchecked cast from {@code A} to {@code R}.
*
* @return a function which transforms the intermediate result to the final
* result
*/
Function<A, R> finisher();
/**
* Returns a {@code Set} of {@code Collector.Characteristics} indicating
* the characteristics of this Collector. This set should be immutable.
*
* @return an immutable set of collector characteristics
*/
Set<Characteristics> characteristics();
/**
* Returns a new {@code Collector} described by the given {@code supplier},
* {@code accumulator}, and {@code combiner} functions. The resulting
* {@code Collector} has the {@code Collector.Characteristics.IDENTITY_FINISH}
* characteristic.
*
* @param supplier The supplier function for the new collector
* @param accumulator The accumulator function for the new collector
* @param combiner The combiner function for the new collector
* @param characteristics The collector characteristics for the new
* collector
* @param <T> The type of input elements for the new collector
* @param <R> The type of intermediate accumulation result, and final result,
* for the new collector
* @throws NullPointerException if any argument is null
* @return the new {@code Collector}
*/
public static<T, R> Collector<T, R, R> of(Supplier<R> supplier,
BiConsumer<R, T> accumulator,
BinaryOperator<R> combiner,
Characteristics... characteristics) {
Objects.requireNonNull(supplier);
Objects.requireNonNull(accumulator);
Objects.requireNonNull(combiner);
Objects.requireNonNull(characteristics);
Set<Characteristics> cs = (characteristics.length == 0)
? Collectors.CH_ID
: Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH,
characteristics));
return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, cs);
}
/**
* Returns a new {@code Collector} described by the given {@code supplier},
* {@code accumulator}, {@code combiner}, and {@code finisher} functions.
*
* @param supplier The supplier function for the new collector
* @param accumulator The accumulator function for the new collector
* @param combiner The combiner function for the new collector
* @param finisher The finisher function for the new collector
* @param characteristics The collector characteristics for the new
* collector
* @param <T> The type of input elements for the new collector
* @param <A> The intermediate accumulation type of the new collector
* @param <R> The final result type of the new collector
* @throws NullPointerException if any argument is null
* @return the new {@code Collector}
*/
public static<T, A, R> Collector<T, A, R> of(Supplier<A> supplier,
BiConsumer<A, T> accumulator,
BinaryOperator<A> combiner,
Function<A, R> finisher,
Characteristics... characteristics) {
Objects.requireNonNull(supplier);
Objects.requireNonNull(accumulator);
Objects.requireNonNull(combiner);
Objects.requireNonNull(finisher);
Objects.requireNonNull(characteristics);
Set<Characteristics> cs = Collectors.CH_NOID;
if (characteristics.length > 0) {
cs = EnumSet.noneOf(Characteristics.class);
Collections.addAll(cs, characteristics);
cs = Collections.unmodifiableSet(cs);
}
return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, finisher, cs);
}
/**
* Characteristics indicating properties of a {@code Collector}, which can
* be used to optimize reduction implementations.
*/
enum Characteristics {
/**
* Indicates that this collector is <em>concurrent</em>, meaning that
* the result container can support the accumulator function being
* called concurrently with the same result container from multiple
* threads.
*
* <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
* then it should only be evaluated concurrently if applied to an
* unordered data source.
*/
CONCURRENT,
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*/
UNORDERED,
/**
* Indicates that the finisher function is the identity function and
* can be elided. If set, it must be the case that an unchecked cast
* from A to R will succeed.
*/
IDENTITY_FINISH
}
}
-- JDK interface
<R> R collect(Supplier<R> supplier,
BiConsumer<R, ? super T> accumulator,
BiConsumer<R, R> combiner);
-- examples
List<String> asList = stringStream.collect(ArrayList::new, ArrayList::add,
ArrayList::addAll);
String concat = stringStream.collect(StringBuilder::new, StringBuilder::append,
StringBuilder::append)
.toString();
-- JDK interface
<R, A> R collect(Collector<? super T, A, R> collector);
-- examples
List<String> asList = stringStream.collect(Collectors.toList());
Map<String, List<Person>> peopleByCity
= personStream.collect(Collectors.groupingBy(Person::getCity));
Map<String, Map<String, List<Person>>> peopleByStateAndCity
= personStream.collect(Collectors.groupingBy(Person::getState,
Collectors.groupingBy(Person::getCity)));
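Collector<T, A, R> takes three type parameters that constrain the types involved in the mutable reduction:
T: the type of the input elements
A: the type of the mutable accumulation container
R: the type of the result
The Collector interface declares four functions that work together to put elements into a container and turn it into the desired result:
- Supplier<A> supplier(): creates a new result container
- BiConsumer<A, T> accumulator(): adds an element to a result container
- BinaryOperator<A> combiner(): merges two result containers into one
- Function<A, R> finisher(): applies the final transformation to the result container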
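The four functions map directly onto the arguments of the Collector.of factory shown in the interface source. A minimal sketch, where concatenating() is a made-up helper and not a JDK method:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class CollectorOfDemo {
    // T = String (input), A = StringBuilder (container), R = String (result).
    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,        // supplier: new empty container
                (sb, s) -> sb.append(s),   // accumulator: fold one element in
                (a, b) -> a.append(b),     // combiner: merge two containers
                StringBuilder::toString);  // finisher: container -> result
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(parts.stream().collect(concatenating())); // abc
    }
}
```

Because no Characteristics are passed, the collector is treated as ordered and non-concurrent, so a parallel stream over the same list should produce the same string.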
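TODO: custom Collector
Converting to other collections
The chained Stream operations covered so far usually end by turning the Stream back into a collection or a value, for example:
- converting a stream into a collection
- producing a single value after a series of chained operations on a collection
- asserting against a concrete collection in a unit test
toList, toSet, ..., toCollection, toMap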
List<Integer> collectList = Stream.of(1, 2, 3, 4)
.collect(Collectors.toList());
// Internal implementation of Collectors.toList()
public static <T>
Collector<T, ?, List<T>> toList() {
return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
(left, right) -> { left.addAll(right); return left; },
CH_ID);
}
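To produce a collection type other than the one the Stream library picks for you (for example a TreeSet), use toCollection, which takes a function that creates the collection.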
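A short sketch of toCollection with a TreeSet (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ToCollectionDemo {
    // TreeSet::new is the factory; the result is sorted and free of duplicates.
    static TreeSet<String> toSortedSet(List<String> items) {
        return items.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(toSortedSet(Arrays.asList("b", "a", "b", "c"))); // [a, b, c]
    }
}
```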
toMap takes at least two arguments: one function that produces the key and another that produces the value. toMap has three overloads; the simplest is:
toMap(Function<? super T, ? extends K> keyMapper,Function<? super T, ? extends U> valueMapper)
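The three overloads differ in how they handle duplicate keys and which Map they create. A sketch, with illustrative class and method names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ToMapOverloads {
    // Two arguments: throws IllegalStateException if two elements share a key.
    static Map<String, Integer> lengthByWord(List<String> words) {
        return words.stream().collect(Collectors.toMap(w -> w, String::length));
    }

    // Three arguments: the merge function resolves duplicate keys.
    static Map<Integer, String> wordsByLength(List<String> words) {
        return words.stream().collect(
                Collectors.toMap(String::length, w -> w, (a, b) -> a + "," + b));
    }

    // Four arguments: additionally chooses the Map implementation.
    static TreeMap<Integer, String> wordsByLengthSorted(List<String> words) {
        return words.stream().collect(
                Collectors.toMap(String::length, w -> w, (a, b) -> a + "," + b, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "d");
        System.out.println(lengthByWord(words).get("bb")); // 2
        System.out.println(wordsByLengthSorted(words));    // {1=a,d, 2=bb,cc}
    }
}
```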
Converting to a value
collect can also reduce a Stream to a single value. For example, maxBy and minBy produce one value according to a given ordering.
Optional<Integer> collectMaxBy = Stream.of(1, 2, 3, 4)
.collect(Collectors.maxBy(Comparator.comparingInt(o -> o)));
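Partitioning data: partitioningBy
A common use of collect is splitting a Stream into two collections. There are two ways to do it:
- Filter twice. If the filtering is complex, it has to be run on each stream and the code becomes redundant.
- Use partitioningBy. It splits the elements into two parts: a Predicate decides which part each element belongs to, and the result is a Map from Boolean to List.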
Map<Boolean, List<Integer>> collectParti = Stream.of(1, 2, 3, 4)
.collect(Collectors.partitioningBy(it -> it % 2 == 0));
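Grouping data: groupingBy
Grouping is a more natural way to split data: instead of dividing it into true and false parts, it can group by any value.
Call the Stream's collect method and pass in a collector. groupingBy takes a classifier function used to group the data, much as partitioningBy takes a Predicate to split the data into true and false parts. The classifier is a Function, the same kind of object used by the map operation.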
Map<Boolean, List<Integer>> collectGroup= Stream.of(1, 2, 3, 4)
.collect(Collectors.groupingBy(it -> it > 3));
// Internal implementation
public static <T, K, D, A, M extends Map<K, D>>
Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier, // computes the key for each element
Supplier<M> mapFactory, // the concrete Map type to create
Collector<? super T, A, D> downstream // how the values under each key are collected
) {
.......
}
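// Example
// hand-written form
Lists.<Person>newArrayList().stream()
.collect(() -> new HashMap<Integer, List<Person>>(),
(h, x) -> h.computeIfAbsent(x.getType(), k -> Lists.newArrayList()).add(x),
// merge the per-key lists; HashMap::putAll would overwrite lists that share a key in a parallel run
(m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v, (a, b) -> { a.addAll(b); return a; }))
);
// groupingBy form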
Lists.<Person>newArrayList().stream()
.collect(Collectors.groupingBy(Person::getType, HashMap::new, Collectors.toList()));
// because the downstream collector operates on the values, they can be transformed flexibly
// returns Map<type, Set<name>>
Lists.<Person>newArrayList().stream()
.collect(Collectors.groupingBy(Person::getType, HashMap::new, Collectors.mapping(Person::getName,Collectors.toSet())));
Aggregation: reducing
public static <T> Collector<T, ?, T>
reducing(T identity, BinaryOperator<T> op) {
return new CollectorImpl<>(
boxSupplier(identity), // a length-1 Object[] array as the container; an array is used because T may be immutable, so a mutable holder is needed
(a, t) -> { a[0] = op.apply(a[0], t); }, // add an element to the container
(a, b) -> { a[0] = op.apply(a[0], b[0]); return a; }, // merge two containers
a -> a[0], // extract the aggregated result
CH_NOID);
}
// -------- Example
// hand-written form
final Integer[] integers = Lists.newArrayList(1, 2, 3, 4, 5)
.stream()
.collect(() -> new Integer[]{0}, (a, x) -> a[0] += x, (a1, a2) -> a1[0] += a2[0]);
// using reducing
final Integer collect = Lists.newArrayList(1, 2, 3, 4, 5)
.stream()
.collect(Collectors.reducing(0, Integer::sum));
// Stream also provides reduce directly
final Integer reduced = Lists.newArrayList(1, 2, 3, 4, 5)
.stream().reduce(0, Integer::sum);
String joining
String strJoin = Stream.of("1", "2", "3", "4")
.collect(Collectors.joining(",", "[", "]"));
System.out.println("strJoin: " + strJoin);
// Output:
// strJoin: [1,2,3,4]
// Internal implementation of Collectors.joining:
public static Collector<CharSequence, ?, String> joining() {
return new CollectorImpl<CharSequence, StringBuilder, String>(
StringBuilder::new, StringBuilder::append,
(r1, r2) -> { r1.append(r2); return r1; },
StringBuilder::toString, CH_NOID);
}
Combining Collectors
Collectors can be combined with one another:
Map<Boolean, Long> partiCount = Stream.of(1, 2, 3, 4)
.collect(Collectors.partitioningBy(it -> it.intValue() % 2 == 0,
Collectors.counting()));
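The same composition works with groupingBy and any downstream collector. A sketch that groups words by their first letter (class and method names are illustrative, and the words are assumed to be non-empty):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDownstream {
    // Downstream counting(): number of words per initial letter.
    static Map<Character, Long> countByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), Collectors.counting()));
    }

    // Downstream averagingInt(): average word length per initial letter.
    static Map<Character, Double> averageLengthByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0),
                        Collectors.averagingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        System.out.println(countByInitial(words).get('a'));         // 2
        System.out.println(averageLengthByInitial(words).get('a')); // 6.0
    }
}
```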
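Custom Collector
Sum a sequence of numbers: odd numbers are added as they are; even numbers are doubled before being added.
Define a class IntegerSum as the intermediate container: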
public class IntegerSum {
Integer sum;
public IntegerSum(Integer sum) {
this.sum = sum;
}
public IntegerSum doSum(Integer item) {
if (item % 2 == 0) {
this.sum += item * 2;
} else {
this.sum += item;
}
return this;
}
public IntegerSum doCombine(IntegerSum it) {
this.sum += it.sum;
return this;
}
public Integer toValue() {
return this.sum;
}
}
Integer sumRes = Stream.of(1, 2, 3, 4, 5).collect(new Collector<Integer, IntegerSum, Integer>() {
/**
* A function that creates and returns a new mutable result container.
*
* @return a function which returns a new, mutable result container
*/
@Override
public Supplier<IntegerSum> supplier() {
return ()-> new IntegerSum(0);
}
/**
* A function that folds a value into a mutable result container.
*
* @return a function which folds a value into a mutable result container
*/
@Override
public BiConsumer<IntegerSum, Integer> accumulator() {
// return (a, b) -> a.doSum(b);
return IntegerSum::doSum;
}
/**
* A function that accepts two partial results and merges them. The
* combiner function may fold state from one argument into the other and
* return that, or may return a new result container.
*
* @return a function which combines two partial results into a combined
* result
*/
@Override
public BinaryOperator<IntegerSum> combiner() {
// return (a,b)->a.doCombine(b);
return IntegerSum::doCombine;
}
/**
* Perform the final transformation from the intermediate accumulation type
* {@code A} to the final result type {@code R}.
*
* <p>If the characteristic {@code IDENTITY_TRANSFORM} is
* set, this function may be presumed to be an identity transform with an
* unchecked cast from {@code A} to {@code R}.
*
* @return a function which transforms the intermediate result to the final
* result
*/
@Override
public Function<IntegerSum, Integer> finisher() {
return IntegerSum::toValue;
}
/**
* Returns a {@code Set} of {@code Collector.Characteristics} indicating
* the characteristics of this Collector. This set should be immutable.
*
* @return an immutable set of collector characteristics
*/
@Override
public Set<Characteristics> characteristics() {
return Collections.emptySet();
}
});
forEachOrdered
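forEachOrdered is similar to forEach in that both visit every element of the Stream. The difference is that if the Stream has a defined encounter order, forEachOrdered processes the elements in that order even when the Stream is parallel, whereas forEach gives no ordering guarantee. For a Stream created from a list or with Stream.of, the encounter order is the order in which the elements were supplied.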
Stream.of(5,2,1,4,3)
.forEachOrdered(integer -> {
System.out.println("integer:"+integer);
});
// Output:
// integer:5
// integer:2
// integer:1
// integer:4
// integer:3
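The difference only becomes visible on a parallel stream. A minimal sketch (class and method names are illustrative) that records the visiting order under forEachOrdered:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ForEachOrderedDemo {
    // Records the order in which a parallel stream hands elements to forEachOrdered.
    static List<Integer> orderedVisit(List<Integer> input) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        input.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 2, 1, 4, 3);
        System.out.println(orderedVisit(input)); // [5, 2, 1, 4, 3]
    }
}
```

Replacing forEachOrdered with forEach in the same method may yield a different order from run to run, which is why no expected output is given for that variant.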
max, min
// the Comparator defines the ordering
Optional<Integer> maxReversed = Stream.of(1, 2, 3, 4, 5)
.max((o1, o2) -> o2 - o1);
System.out.println("max:" + maxReversed.get()); // prints max:1 (the comparator reverses the order)
Optional<Integer> max = Stream.of(1, 2, 3, 4, 5)
.max((o1, o2) -> o1 - o2);
System.out.println("max:" + max.get()); // prints max:5
Short-circuiting
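- allMatch: returns true if every element of the Stream satisfies the given condition, otherwise false.
- anyMatch: returns true if at least one element satisfies the given condition, otherwise false.
- findAny: returns an Optional containing some element of the Stream, or an empty Optional if the Stream is empty.
- findFirst: returns an Optional containing the first element of the Stream, or an empty Optional if the Stream is empty.
- limit: truncates the Stream so that it has at most N elements. If the original Stream has more than N elements, the first N are kept; if it has N or fewer, all of them are kept.
- noneMatch: returns true if no element of the Stream satisfies the given condition, otherwise false.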
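Short-circuiting means these operations can finish without visiting every element, which is what makes them usable on infinite streams. A sketch with illustrative names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShortCircuitDemo {
    // limit cuts an infinite stream down to its first n elements.
    static List<Integer> firstSquares(int n) {
        return Stream.iterate(1, i -> i + 1)
                .map(i -> i * i)
                .limit(n)
                .collect(Collectors.toList());
    }

    // anyMatch stops at the first match, so it terminates on an infinite stream
    // as long as a match exists (k is assumed to be positive).
    static boolean naturalsContainMultipleOf(int k) {
        return Stream.iterate(1, i -> i + 1).anyMatch(i -> i % k == 0);
    }

    // findFirst returns the first element that survives the filter, if any.
    static Optional<Integer> firstGreaterThan(List<Integer> xs, int threshold) {
        return xs.stream().filter(x -> x > threshold).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(firstSquares(4));                                 // [1, 4, 9, 16]
        System.out.println(naturalsContainMultipleOf(7));                    // true
        System.out.println(firstGreaterThan(Arrays.asList(1, 5, 3, 8), 4));  // Optional[5]
        System.out.println(firstGreaterThan(Arrays.asList(1, 5, 3, 8), 10)); // Optional.empty
    }
}
```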