From: Frank van der Linden <fvdl@google.com>
To: Alexander Krabler <Alexander.Krabler@kuka.com>
Cc: "linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>,
    "linux-mm@kvack.org" <linux-mm@kvack.org>,
    Dennis Schimmel <Dennis.Schimmel@kuka.com>,
    Daniel Braunwarth <Daniel.Braunwarth@kuka.com>
Subject: Re: Realtime threads delayed due to kcompactd0
Date: Thu, 31 Jul 2025 11:34:50 -0700
Message-ID: <CAPTztWYf82p6hq=Q2mVq6_9WTjRaQiH81z4idZL6MZ1PXufPwQ@mail.gmail.com>
In-Reply-To: <DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com>

On Thu, Jul 24, 2025 at 10:30 PM Alexander Krabler
<Alexander.Krabler@kuka.com> wrote:
>
> Hi all,
>
> some of our realtime tasks get delayed from time to time due to activity of kcompactd0.
> Out of nothing, realtime tasks go into uninterruptible sleep for some time.
> This delay can be as much as 1.1ms, which is not acceptable for us.
>
> Our hardware is an aarch64-based SOC with 8 A72 cores, kernel is 6.12.17 with PREEMPT_RT.
> We have CONFIG_COMPACTION and CONFIG_MIGRATION enabled.
>
> Here are some snippets from ftrace:
> kcompactd0-88 [001] 13112.100041: mm_compaction_begin: zone_start=0x80000 migrate_pfn=0x80000 free_pfn=0xffe00 zone_end=0x100000, mode=sync
> ...
> kcompactd0-88 [001] 13112.159782: mm_compaction_isolate_migratepages: range=(0x85800 ~ 0x85841) nr_scanned=65 nr_taken=32
> kcompactd0-88 [001] 13112.159810: mm_compaction_isolate_freepages: range=(0xddc40 ~ 0xddc48) nr_scanned=8 nr_taken=8
> kcompactd0-88 [001] 13112.160002: irq_handler_entry: irq=11 name=arch_timer
> kcompactd0-88 [001] 13112.160012: irq_handler_exit: irq=11 ret=handled
> kcompactd0-88 [001] 13112.160121: mm_compaction_migratepages: nr_migrated=32 nr_failed=0
> kcompactd0-88 [001] 13112.160122: mm_compaction_finished: node=0 zone=DMA order=-1 ret=continue
> kcompactd0-88 [001] 13112.160185: mm_compaction_isolate_migratepages: range=(0x85841 ~ 0x85a00) nr_scanned=447 nr_taken=166
> kcompactd0-88 [001] 13112.160204: mm_compaction_isolate_freepages: range=(0xddc48 ~ 0xddd80) nr_scanned=312 nr_taken=196
> tRealtime-16499 [004] 13112.160511: sched_switch: tRealtime:16499 [25] D ==> tKRC:16479 [39]
> tRealtime-16499 [004] 13112.160512: kernel_stack: <stack trace >
> => __schedule (ffffcde843022d6c)
> => schedule (ffffcde843023464)
> => io_schedule (ffffcde8430235ec)
> => migration_entry_wait_on_locked (ffffcde8424a1ad8)
> => migration_entry_wait (ffffcde84254c400)
> => do_swap_page (ffffcde8424f7fac)
> => __handle_mm_fault (ffffcde8424f8b64)
> => handle_mm_fault (ffffcde8424f9bc0)
> => do_page_fault (ffffcde843030380)
> => do_translation_fault (ffffcde84303072c)
> => do_mem_abort (ffffcde84222f674)
> => el0_ia (ffffcde84301eb20)
> => el0t_64_sync_handler (ffffcde84301f020)
> => el0t_64_sync (ffffcde842211514)
> kcompactd0-88 [001] 13112.160557: sched_pi_setprio: comm=kcompactd0 pid=88 oldprio=39 newprio=120
> kcompactd0-88 [001] 13112.160569: sched_waking: comm=tKRC pid=16479 prio=39 target_cpu=004
> kcompactd0-88 [001] 13112.160986: sched_waking: comm=tKRC pid=16479 prio=39 target_cpu=004
> kcompactd0-88 [001] 13112.161412: sched_waking: comm=tOther pid=16520 prio=40 target_cpu=004
> kcompactd0-88 [001] 13112.161457: sched_pi_setprio: comm=kcompactd0 pid=88 oldprio=40 newprio=120
> kcompactd0-88 [001] 13112.161465: sched_waking: comm=tOther pid=16520 prio=40 target_cpu=004
> kcompactd0-88 [001] 13112.161654: sched_waking: comm=tRealtime pid=16499 prio=25 target_cpu=004
>
> In our setup kcompactd0 gets enough CPU time (on core 1), however, it seems strange that it doesn't get the priority inherited from blocked realtime tasks.
> (It does for short amounts of time, which seems to be due to the locks inside migration_entry_wait_on_locked.)
>
> Is there anything we can do here?
>
> Thanks,
> Alexander
Yes, we have (likely) seen this issue too, in a !CONFIG_PREEMPT setting.

The basic problem is that the calling thread (kcompactd, or any thread
that goes into direct compaction) creates a resource that other threads
have to wait on until the migration is done, in the form of the
migration PTEs. Since a migration PTE is not a lock that is held by the
thread doing the migration, there is no priority inheritance in the
realtime case, and priority inversion can happen.
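
To make the distinction concrete, here is a toy model in Python (not kernel code). Task, OwnedLock and MigrationEntry are made-up names for illustration only; the priorities 25 and 120 are taken from the trace quoted above, using the kernel convention that a lower number means a higher priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    prio: int  # kernel-style: lower number means higher priority


@dataclass
class OwnedLock:
    """Hypothetical stand-in for an rtmutex: it records which task holds it."""
    owner: Optional[Task] = None

    def block_on(self, waiter: Task) -> None:
        # Priority inheritance works because the owner is known.
        if self.owner is not None and waiter.prio < self.owner.prio:
            self.owner.prio = waiter.prio


class MigrationEntry:
    """Hypothetical stand-in for a migration PTE: a marker with no owner."""

    def block_on(self, waiter: Task) -> None:
        # Nothing links the entry to the migrating task, so nobody is boosted.
        pass


kcompactd = Task("kcompactd0", 120)
rt = Task("tRealtime", 25)

OwnedLock(owner=kcompactd).block_on(rt)
after_lock = kcompactd.prio    # owner inherited the waiter's priority

kcompactd.prio = 120           # reset for the second case
MigrationEntry().block_on(rt)
after_entry = kcompactd.prio   # unchanged: the waiter just sleeps
```

In this model the waiter on the owned lock boosts kcompactd0 to 25, while the waiter on the migration entry leaves it at 120.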
This issue has always been there, but it has been made more prominent
with batch migration. With batch migration, all migration PTEs are set
up in the first step, followed by a TLB flush, and only then is the
copy / new mapping setup done. So, the migration PTEs stick around for
longer, and the chance that other threads block on them is higher. For
the !CONFIG_PREEMPT case, the cond_resched() in the loop can also cause
the thread creating the migration PTEs to be descheduled while a
number of migration PTEs are in place, so there is a similar chance of
priority inversion.
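
A back-of-the-envelope model of how long each migration PTE stays in place, again as a Python sketch rather than kernel code. The unit costs for unmap, flush and copy are assumptions chosen only to show the shape of the effect, the function names are made up, and the batch size of 32 is borrowed from nr_taken=32 in the quoted trace. The descheduled parameter models the migrating thread losing the CPU after the unmap phase.

```python
def per_page_windows(n_pages, unmap=1, flush=1, copy=1):
    """Lifetime of each migration entry when pages are migrated one by one."""
    now = 0
    windows = []
    for _ in range(n_pages):
        now += unmap            # migration entry installed here
        installed = now
        now += flush            # TLB flush for this page
        now += copy             # copy and map the new page, entry removed
        windows.append(now - installed)
    return windows


def batch_windows(n_pages, unmap=1, flush=1, copy=1, descheduled=0):
    """Lifetime of each migration entry when the whole batch is unmapped first."""
    now = 0
    installed = []
    for _ in range(n_pages):
        now += unmap            # all entries are installed up front
        installed.append(now)
    now += descheduled          # migrating thread loses the CPU here (assumed)
    now += flush                # one TLB flush for the whole batch
    windows = []
    for t in installed:
        now += copy             # entries are removed one at a time
        windows.append(now - t)
    return windows


single = per_page_windows(32)
batch = batch_windows(32)
stalled = batch_windows(32, descheduled=100)
```

With these unit costs every entry lives for 2 time units in the per-page case and 33 in the batch case, and any time the migrating thread spends descheduled is added to every entry in the batch.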
I'm not sure what the right thing to do would be. Either explicitly
boost the priority of the migrating thread temporarily during
migrate_pages_batch, or mitigate the issue by dealing with 'busy' pages
more quickly in migrate_pages_batch.
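
The first option is the usual raise-then-restore pattern. A minimal Python sketch of that pattern, not a proposed patch: Task and boosted() are hypothetical names, and the boost target of 25 is just the priority of the blocked realtime task in the quoted trace. What the kernel would actually boost to is an open question.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int  # kernel-style: lower number means higher priority


@contextmanager
def boosted(task, prio):
    """Hypothetical helper: raise task to at least `prio`, restore on exit."""
    saved = task.prio
    task.prio = min(task.prio, prio)
    try:
        yield task
    finally:
        # Restore even if the batch fails part-way through.
        task.prio = saved


migrator = Task("kcompactd0", 120)
with boosted(migrator, 25):
    during = migrator.prio   # migration entries would be in place here
after = migrator.prio
```

The point of the try/finally is that the boost never outlives the window in which migration entries exist.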
- Frank