Netty.docs: Microbenchmarks

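Netty has a module called 'netty-microbench' that runs a suite of microbenchmarks. It is built on top of OpenJDK JMH, the recommended microbenchmarking solution for HotSpot. It comes with batteries included, so no additional dependencies are needed to get started.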

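Running benchmarks

You can run the benchmarks from the command line with Maven or directly from your IDE. To run all tests with the default settings, use mvn -DskipTests=false test. skipTests=false must be set explicitly because the microbenchmarks, which can take a long time, are skipped by default so that they do not run as unit tests during a normal test run.

If all goes well, JMH runs the warmup and measurement iterations for each fork and then prints a summary. A typical benchmark run looks like this (expect a lot of output):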

# Fork: 2 of 2
# Warmup: 10 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Running: io.netty.microbench.buffer.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_1_0
# Warmup Iteration   1: 8454.103 ops/ms
# Warmup Iteration   2: 11551.524 ops/ms
# Warmup Iteration   3: 11677.575 ops/ms
# Warmup Iteration   4: 11404.954 ops/ms
# Warmup Iteration   5: 11553.299 ops/ms
# Warmup Iteration   6: 11514.766 ops/ms
# Warmup Iteration   7: 11661.768 ops/ms
# Warmup Iteration   8: 11667.577 ops/ms
# Warmup Iteration   9: 11551.240 ops/ms
# Warmup Iteration  10: 11692.991 ops/ms
Iteration   1: 11633.877 ops/ms
Iteration   2: 11740.063 ops/ms
Iteration   3: 11751.798 ops/ms
Iteration   4: 11260.071 ops/ms
Iteration   5: 11461.010 ops/ms
Iteration   6: 11642.912 ops/ms
Iteration   7: 11808.595 ops/ms
Iteration   8: 11683.780 ops/ms
Iteration   9: 11750.292 ops/ms
Iteration  10: 11769.986 ops/ms

Result : 11650.238 ±(99.9%) 229.698 ops/ms
  Statistics: (min, avg, max) = (11260.071, 11650.238, 11808.595), stdev = 169.080
  Confidence interval (99.9%): [11420.540, 11879.937]
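
The Statistics line above can be reproduced by hand from the ten measured iterations. The following is a minimal sketch in plain Java, assuming that the reported stdev is the sample standard deviation (dividing by n - 1), which agrees with the printed value. The class name IterationStats is hypothetical and not part of netty-microbench, and the confidence interval is not recomputed here.

```java
import java.util.Arrays;

/** Recomputes the summary statistics from the ten measurement iterations listed above. */
final class IterationStats {

    // Throughput of each measurement iteration in ops/ms, copied from the run above.
    static final double[] SAMPLES = {
        11633.877, 11740.063, 11751.798, 11260.071, 11461.010,
        11642.912, 11808.595, 11683.780, 11750.292, 11769.986
    };

    /** Arithmetic mean of the samples. */
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Sample standard deviation (assumption: divides by n - 1). */
    static double stdev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        double min = Arrays.stream(SAMPLES).min().getAsDouble();
        double max = Arrays.stream(SAMPLES).max().getAsDouble();
        System.out.printf("(min, avg, max) = (%.3f, %.3f, %.3f), stdev = %.3f%n",
                min, mean(SAMPLES), max, stdev(SAMPLES));
    }
}
```

A large stdev relative to the mean is a hint that the run was noisy and that more iterations or a quieter machine may be needed.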

At the end, the test output looks similar to the following (the numbers depend on your system setup and configuration):

Benchmark                                                                Mode   Samples         Mean   Mean error    Units
i.n.m.b.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_1_0          thrpt        20    11658.812      120.728   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_2_256        thrpt        20    10308.626      147.528   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_3_1024       thrpt        20     8855.815       55.933   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_4_4096       thrpt        20     5545.538     1279.721   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_5_16384      thrpt        20     6741.581       75.975   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledDirectAllocAndFree_6_65536      thrpt        20     7252.869       70.609   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledHeapAllocAndFree_1_0            thrpt        20     9750.225       73.900   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledHeapAllocAndFree_2_256          thrpt        20     9936.639      657.818   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledHeapAllocAndFree_3_1024         thrpt        20     8903.130      197.533   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledHeapAllocAndFree_4_4096         thrpt        20     6664.157       74.163   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledHeapAllocAndFree_5_16384        thrpt        20     6374.924      337.869   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.pooledHeapAllocAndFree_6_65536        thrpt        20     6386.337       44.960   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree_1_0        thrpt        20     2137.241       30.792   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree_2_256      thrpt        20     1873.727       41.843   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree_3_1024     thrpt        20     1902.025       34.473   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree_4_4096     thrpt        20     1534.347       20.509   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree_5_16384    thrpt        20      838.804       12.575   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree_6_65536    thrpt        20      276.976        3.021   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree_1_0          thrpt        20    35820.568      259.187   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree_2_256        thrpt        20    19660.951      295.012   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree_3_1024       thrpt        20     6264.614       77.704   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree_4_4096       thrpt        20     2921.598       95.492   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree_5_16384      thrpt        20      991.631       49.220   ops/ms
i.n.m.b.ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree_6_65536      thrpt        20      261.718       11.108   ops/ms
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 993.382 sec - in io.netty.microbench.buffer.ByteBufAllocatorBenchmark

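You can also run the benchmarks directly from your IDE. If you have imported the netty parent project, open the microbench subproject and navigate to the src/main/java/io/netty/microbench namespace. In the buffer namespace, you can run ByteBufAllocatorBenchmark like any other JUnit-based test. The main difference is that, for now, you cannot run individual sub-benchmarks; you can only run the full benchmark at once. The console should show the same output as when running through mvn directly.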

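Writing benchmarks

Writing a benchmark is not hard, but getting it right is. That is not because the microbench project is difficult to use, but because you need to avoid the common pitfalls of benchmarking. Fortunately, the JMH suite provides helpful annotations and features that mitigate most of them. To get started, extend your benchmark class from AbstractMicrobenchmark, which ensures that the test runs through JUnit and configures some defaults: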

public class MyBenchmark extends AbstractMicrobenchmark {

}

The next step is to create a method annotated with @GenerateMicroBenchmark (and give it a descriptive name):

@GenerateMicroBenchmark
public void measureSomethingHere() {

}

For samples and inspiration on how to write proper JMH tests, the best place to look is the samples that accompany JMH. Also check out the talks by one of the main authors of JMH.
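
To make the pitfalls mentioned above more concrete, here is a rough sketch of what a hand-rolled measurement loop has to take care of: discarding warmup iterations, measuring for a fixed time, and consuming the result of the measured code so that it cannot be optimized away. This is plain JDK code, not JMH, and the class NaiveHarness and its methods are hypothetical. It is only meant to show the moving parts that JMH and AbstractMicrobenchmark handle for you, and it should not be used for real measurements.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

/** Hand-rolled timing loop; illustrative only, JMH does this far more rigorously. */
final class NaiveHarness {

    // Results are accumulated here so the measured work has a visible side effect.
    static volatile long sink;

    /** Runs the task for roughly durationNanos and returns operations per millisecond. */
    static double runIteration(IntSupplier task, long durationNanos) {
        long ops = 0;
        long acc = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            acc += task.getAsInt(); // consume the result of the measured code
            ops++;
            // Reading the clock on every operation distorts very cheap tasks;
            // this is one of the details a real harness has to handle better.
            elapsed = System.nanoTime() - start;
        } while (elapsed < durationNanos);
        sink += acc;
        return ops / (elapsed / 1_000_000.0);
    }

    /** Discards the warmup iterations, then returns the throughput of each measured one. */
    static double[] measure(IntSupplier task, int warmups, int measurements, long durationNanos) {
        for (int i = 0; i < warmups; i++) {
            runIteration(task, durationNanos);
        }
        double[] results = new double[measurements];
        for (int i = 0; i < measurements; i++) {
            results[i] = runIteration(task, durationNanos);
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] data = new byte[256];
        double[] opsPerMs = measure(() -> Arrays.hashCode(data), 2, 3, 50_000_000L);
        for (int i = 0; i < opsPerMs.length; i++) {
            System.out.printf("Iteration %d: %.3f ops/ms%n", i + 1, opsPerMs[i]);
        }
    }
}
```

Unlike this sketch, JMH also runs benchmarks in separate forked JVMs, which a loop inside a single process cannot imitate.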

Customizing runtime conditions

The default settings (found in AbstractMicrobenchmark) are:

Warmup iterations: 10
Measurement iterations: 10
Number of forks: 2

These settings can be customized at runtime through system properties (warmupIterations, measureIterations, forks):

mvn -DskipTests=false -DwarmupIterations=2 -DmeasureIterations=3 -Dforks=1 test

Using very few iterations is generally not recommended, but it can be useful for checking that a benchmark works before running the comprehensive benchmark later.

Note that you can also customize these defaults per test through annotations:

@Warmup(iterations = 20)
@Fork(1)
public class MyBenchmark extends AbstractMicrobenchmark {

}

This can be done per class and per method (benchmark). Note that command-line arguments always override the annotation defaults.
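
The precedence described above (command-line system property first, then annotation, then built-in default) can be sketched as follows. This is an illustration of the rule only, not the actual code in AbstractMicrobenchmark. The class BenchmarkSettings and the method resolve are hypothetical, while the property names and default values are the ones listed in this section.

```java
/** Sketch of the precedence rule: system property, then annotation, then default. */
final class BenchmarkSettings {

    static final int DEFAULT_WARMUP_ITERATIONS = 10;
    static final int DEFAULT_MEASURE_ITERATIONS = 10;
    static final int DEFAULT_FORKS = 2;

    /**
     * @param property        system property name, for example "forks"
     * @param annotationValue value taken from an annotation, or null if none is present
     * @param defaultValue    built-in default
     */
    static int resolve(String property, Integer annotationValue, int defaultValue) {
        // Integer.getInteger returns null if the property is unset or not a number.
        Integer fromCommandLine = Integer.getInteger(property);
        if (fromCommandLine != null) {
            return fromCommandLine;
        }
        if (annotationValue != null) {
            return annotationValue;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // Mirrors @Warmup(iterations = 20) and @Fork(1) from the example above.
        System.out.println("warmup  = " + resolve("warmupIterations", 20, DEFAULT_WARMUP_ITERATIONS));
        System.out.println("measure = " + resolve("measureIterations", null, DEFAULT_MEASURE_ITERATIONS));
        System.out.println("forks   = " + resolve("forks", 1, DEFAULT_FORKS));
    }
}
```

Starting this sketch with -Dforks=3 makes the forks value resolve to 3 even though the annotation value is 1, which is the behaviour the note about command-line arguments describes.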