Linux Preemption Models

The kernel has quietly gained several configurable preemption models, so here are some notes.
The RT patch is something for the future and may need further investigation.

Configuring the Preemption Model

The menu location changed in linux-4.19.

Old menu layout:
Kernel Features  --->
  Preemption Model

New menu layout:
General setup  --->
  Preemption Model
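
Whichever menu the option sits in, the selection ends up as CONFIG_ symbols in the generated .config file. The following is a minimal sketch, not part of any kernel tooling, that scans .config-style text for the symbols discussed in this note. The function name `enabled_preempt_symbols` and the sample text are made up for illustration, and which symbols actually exist depends on the kernel version.

```python
import re

# Symbols named in this note; which ones exist depends on the kernel version.
PREEMPT_SYMBOLS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
    "CONFIG_PREEMPT_RT",
    "CONFIG_PREEMPT_DYNAMIC",
)


def enabled_preempt_symbols(config_text):
    """Return the symbols from PREEMPT_SYMBOLS that are set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and do not match.
        m = re.match(r"^(CONFIG_[A-Z0-9_]+)=y\s*$", line)
        if m and m.group(1) in PREEMPT_SYMBOLS:
            enabled.add(m.group(1))
    # Report in the fixed order of PREEMPT_SYMBOLS.
    return [s for s in PREEMPT_SYMBOLS if s in enabled]


# Hypothetical .config fragment, for illustration only.
sample = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_DYNAMIC=y
"""
print(enabled_preempt_symbols(sample))
```

On a real build tree the text would come from the .config file in the build directory instead of the sample string.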

Preemption Model

CONFIG_PREEMPT_RT was introduced in linux-5.3:

Introduce CONFIG_PREEMPT_RT as a clear sign that the RT patchset will be fully integrated into the mainline kernel in the future merge

CONFIG_PREEMPT_DYNAMIC was introduced in linux-5.12.
There is no write-up of CONFIG_PREEMPT_DYNAMIC (the following is from git):

CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if
the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE,
CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() /
__preempt_schedule_notrace_function()).

CONFIG_SCHED_CORE was introduced in linux-5.14.
There is no write-up of CONFIG_SCHED_CORE (the following is from git):

Introduce the basic infrastructure to have a core wide rq->lock.

This relies on the rq->__lock order being in increasing CPU number
(inside a core). It is also constrained to SMT8 per lockdep (and
SMT256 per preempt_count).

Luckily SMT8 is the max supported SMT count for Linux (Mips, Sparc and
Power are known to have this).

The following sections are excerpts from kernel/Kconfig.preempt.

PREEMPT_NONE

No Forced Preemption (Server)

This is the traditional Linux preemption model, geared towards
throughput. It will still provide good latencies most of the
time, but there are no guarantees and occasional longer delays
are possible.

Select this option if you are building a kernel for a server or
scientific/computation system, or if you want to maximize the
raw processing power of the kernel, irrespective of scheduling
latencies.

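In short: this is the traditional, throughput-oriented Linux preemption model. Latency is usually good, but nothing is guaranteed and occasional longer delays can occur.

Choose it for servers and scientific/computation systems, or whenever raw kernel processing power matters more than scheduling latency.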

PREEMPT_VOLUNTARY

Voluntary Kernel Preemption (Desktop)

This option reduces the latency of the kernel by adding more
"explicit preemption points" to the kernel code. These new
preemption points have been selected to reduce the maximum
latency of rescheduling, providing faster application reactions,
at the cost of slightly lower throughput.

This allows reaction to interactive events by allowing a
low priority process to voluntarily preempt itself even if it
is in kernel mode executing a system call. This allows
applications to run more 'smoothly' even when the system is
under load.

Select this if you are building a kernel for a desktop system.

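In short: this option lowers kernel latency by adding more "explicit preemption points" to kernel code. The points are chosen to reduce the maximum rescheduling latency, so applications react faster at the cost of slightly lower throughput.

A low-priority process can then voluntarily give up the CPU even while it is in kernel mode executing a system call, which lets the system respond to interactive events and keeps applications running "smoothly" under load.

Choose it when building a kernel for a desktop system.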
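
To make the effect of extra preemption points concrete, here is a toy model (an illustration only, not how the kernel measures anything). It treats a kernel code path as a time span and the preemption points as offsets within it. The worst-case rescheduling delay is then the longest stretch between two neighbouring points. The function name and the numbers are hypothetical.

```python
def longest_non_preemptible_stretch(duration, preemption_points):
    """Longest wait before the next chance to reschedule, in the units of duration."""
    # Keep only points strictly inside the code path, sorted and de-duplicated.
    points = sorted(p for p in set(preemption_points) if 0 < p < duration)
    boundaries = [0] + points + [duration]
    # The worst case is the widest gap between neighbouring boundaries.
    return max(b - a for a, b in zip(boundaries, boundaries[1:]))


# A 100-unit code path with a single preemption point in the middle ...
print(longest_non_preemptible_stretch(100, [50]))                  # 50
# ... and the same path after more points have been added.
print(longest_non_preemptible_stretch(100, [20, 40, 50, 60, 80]))  # 20
```

Each extra point costs a little work at run time, which is the throughput trade-off the help text mentions.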

PREEMPT

Preemptible Kernel (Low-Latency Desktop)

This option reduces the latency of the kernel by making
all kernel code (that is not executing in a critical section)
preemptible.  This allows reaction to interactive events by
permitting a low priority process to be preempted involuntarily
even if it is in kernel mode executing a system call and would
otherwise not be about to reach a natural preemption point.
This allows applications to run more 'smoothly' even when the
system is under load, at the cost of slightly lower throughput
and a slight runtime overhead to kernel code.

Select this if you are building a kernel for a desktop or
embedded system with latency requirements in the milliseconds
range.

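In short: this option lowers kernel latency by making all kernel code preemptible, except code that is executing in a critical section. A low-priority process can be preempted involuntarily even while it is in kernel mode executing a system call and is not about to reach a natural preemption point, which lets the system respond to interactive events. Applications run more "smoothly" under load, at the cost of slightly lower throughput and a slight runtime overhead in kernel code.

Choose it for desktop or embedded systems with latency requirements in the milliseconds range.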

PREEMPT_RT

Fully Preemptible Kernel (Real-Time)

This option turns the kernel into a real-time kernel by replacing
various locking primitives (spinlocks, rwlocks, etc.) with
preemptible priority-inheritance aware variants, enforcing
interrupt threading and introducing mechanisms to break up long
non-preemptible sections. This makes the kernel, except for very
low level and critical code paths (entry code, scheduler, low
level interrupt handling) fully preemptible and brings most
execution contexts under scheduler control.

Select this if you are building a kernel for systems which
require real-time guarantees.

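In short: this option turns the kernel into a real-time kernel. It replaces various locking primitives (spinlocks, rwlocks, etc.) with preemptible, priority-inheritance-aware variants, enforces interrupt threading, and introduces mechanisms to break up long non-preemptible sections. Apart from very low-level, critical code paths (entry code, scheduler, low-level interrupt handling), the kernel becomes fully preemptible and most execution contexts come under scheduler control.

Choose it for systems that require real-time guarantees.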
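
The phrase "priority-inheritance aware" may be easier to follow with a toy model. The sketch below is not the kernel's implementation; the classes `Task` and `PiLock`, the task names, and the convention that a larger number means a more important task are all assumptions made for illustration. It shows only the core idea: while a low-priority owner blocks a high-priority waiter, the owner temporarily runs at the waiter's priority.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority  # larger number = more important (assumption)
        self.priority = priority       # effective priority, may be boosted


class PiLock:
    """Toy lock with priority inheritance; models a task holding one lock only."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def acquire(self, task):
        """Return True if the lock was taken, False if the task has to wait."""
        if self.owner is None:
            self.owner = task
            return True
        # Contended: queue the waiter and boost the owner if the waiter outranks it.
        self.waiters.append(task)
        if task.priority > self.owner.priority:
            self.owner.priority = task.priority
        return False

    def release(self):
        # The old owner drops back to its base priority.
        self.owner.priority = self.owner.base_priority
        if self.waiters:
            # Hand the lock to the highest-priority waiter.
            self.waiters.sort(key=lambda t: t.priority, reverse=True)
            self.owner = self.waiters.pop(0)
        else:
            self.owner = None


low, high = Task("logger", 1), Task("control-loop", 90)
lock = PiLock()
lock.acquire(low)
lock.acquire(high)                    # has to wait; "logger" is boosted
print(low.priority)                   # 90
lock.release()
print(low.priority, lock.owner.name)  # 1 control-loop
```

Without the boost, a medium-priority task could keep "logger" off the CPU and so delay "control-loop" indefinitely, which is the priority-inversion problem that inheritance addresses.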

PREEMPT_DYNAMIC

Preemption behaviour defined on boot

This option allows to define the preemption model on the kernel
command line parameter and thus override the default preemption
model defined during compile time.

The feature is primarily interesting for Linux distributions which
provide a pre-built kernel binary to reduce the number of kernel
flavors they offer while still offering different usecases.

The runtime overhead is negligible with HAVE_STATIC_CALL_INLINE enabled
but if runtime patching is not available for the specific architecture
then the potential overhead should be considered.

Interesting if you want the same pre-built kernel should be used for
both Server and Desktop workloads.

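In short: with this option the preemption model can be chosen on the kernel command line, overriding the default selected at compile time.

It is mainly of interest to Linux distributions that ship pre-built kernel binaries: they can offer fewer kernel flavors while still covering different use cases.

With HAVE_STATIC_CALL_INLINE enabled the runtime overhead is negligible; on architectures without runtime patching, the potential overhead should be taken into account.

It is worth considering when the same pre-built kernel has to serve both server and desktop workloads.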
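
The help text does not name the command line parameter; in the kernel's parameter documentation it is preempt=, with the values none, voluntary and full (newer kernels may accept more). The following sketch only mimics the override logic on a command line string. It is not kernel code, and the function name and the rule that a later occurrence wins are assumptions.

```python
KNOWN_MODES = ("none", "voluntary", "full")  # newer kernels may accept more


def preempt_mode_from_cmdline(cmdline, compiled_default):
    """Return the mode requested via preempt=, else the compile-time default."""
    mode = compiled_default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "preempt" and sep and value in KNOWN_MODES:
            mode = value  # a later occurrence overrides an earlier one (assumption)
    return mode


print(preempt_mode_from_cmdline("root=/dev/sda1 ro preempt=full quiet", "voluntary"))
# full
```

On a running system the command line string could be read from /proc/cmdline.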

SCHED_CORE

Core Scheduling for SMT

This option permits Core Scheduling, a means of coordinated task
selection across SMT siblings. When enabled -- see
prctl(PR_SCHED_CORE) -- task selection ensures that all SMT siblings
will execute a task from the same 'core group', forcing idle when no
matching task is found.

Use of this feature includes:
 - mitigation of some (not all) SMT side channels;
 - limiting SMT interference to improve determinism and/or performance.

SCHED_CORE is default disabled. When it is enabled and unused,
which is the likely usage by Linux distributions, there should
be no measurable impact on performance.

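In short: this option enables Core Scheduling, a way of coordinating task selection across SMT siblings. Once it is switched on for a task (see prctl(PR_SCHED_CORE)), task selection ensures that all SMT siblings run tasks from the same "core group"; a sibling with no matching task is forced idle.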
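
The selection rule can be pictured with a toy model. This is a deliberately simplified illustration, not the scheduler's actual algorithm. Each SMT sibling has a list of runnable tasks tagged with a group label (the kernel calls this tag a cookie). The function name, the task names, and the convention that a larger number means a more important task are assumptions.

```python
def pick_for_core(runqueues):
    """Pick one task per SMT sibling so that all picks share one group tag.

    runqueues: one list per sibling of (priority, name, cookie) tuples.
    Returns one task name per sibling, or None where the sibling is forced idle.
    """
    all_tasks = [t for rq in runqueues for t in rq]
    if not all_tasks:
        return [None] * len(runqueues)
    # The most important runnable task on the core decides the group.
    core_cookie = max(all_tasks, key=lambda t: t[0])[2]
    picks = []
    for rq in runqueues:
        matching = [t for t in rq if t[2] == core_cookie]
        best = max(matching, key=lambda t: t[0]) if matching else None
        picks.append(best[1] if best else None)  # None = forced idle
    return picks


# Two siblings of one core; groups "A" and "B" must never run side by side.
siblings = [
    [(10, "vm-a/vcpu0", "A"), (5, "vm-b/vcpu0", "B")],
    [(7, "vm-b/vcpu1", "B")],
]
print(pick_for_core(siblings))  # ['vm-a/vcpu0', None]
```

The second sibling stays idle although it has a runnable task, which is the "forcing idle" cost mentioned in the help text.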

Uses of this feature include: