Re: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing


From: Mateusz Guzik <mjguzik@gmail.com>
To: Yibin Liu <liuyibin@hygon.cn>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	Liam.Howlett@oracle.com,  viro@zeniv.linux.org.uk,
	brauner@kernel.org, wujianyong@hygon.cn,  huangsj@hygon.cn,
	zhongyuan@hygon.cn
Subject: Re: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
Date: Tue, 21 Apr 2026 21:46:47 +0200
Message-ID: <CAGudoHHki3gv-HXXMALePDoC+tmao4oWcYgCo9kXNDkEhW4E4g@mail.gmail.com>
In-Reply-To: <20260421020932.3212532-1-liuyibin@hygon.cn>

On Tue, Apr 21, 2026 at 4:11 AM Yibin Liu <liuyibin@hygon.cn> wrote:
>
> UnixBench execl/shellscript (dynamically linked binaries) at 64+ cores are
> bottlenecked on the i_mmap_rwsem semaphore due to heavy vma insert/remove
> operations on the i_mmap tree, where libc.so.6 is the most frequent,
> followed by ld-linux-x86-64.so.2 and the test executable itself.
>
> This patch marks such files to skip rmap operations, avoiding frequent
> interval tree insert/remove that cause i_mmap_rwsem lock contention.
> The downside is these files can no longer be reclaimed (along with compact
> and ksm), but since they are small and resident anyway, it's acceptable.
> When all mapping processes exit, files can still be reclaimed normally.
>
> Performance testing shows ~80% improvement in UnixBench execl/shellscript
> scores on Hygon 7490, AMD zen4 9754 and Intel emerald rapids platform.
>

The other responders have been a little harsh and, despite them raising
valid points, I don't think they gave a proper review.

The bigger picture is that the problematic rwsem is taken several
times during a fork + exec + exit cycle. Normally you end up with 5
distinct mappings per binary/so, each created with a separate lock
acquire.

Some time ago I patched exit to batch the processing, leaving 1 acquire
in that codepath. fork can and should be patched in a similar vein.
Real workloads certainly suffer from it, but I don't know if this
particular unixbench test exercises fork at all. This is on top of
forking itself being avoidable should the kernel grow a better
interface for executing binaries.
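
To illustrate the batching idea in isolation, here is a toy userspace
sketch. It is not kernel code and not the actual exit patch:
CountingLock, remove_one_by_one, remove_batched and demo are made-up
names, a Python set stands in for the i_mmap tree, and the only number
taken from this mail is the 5 mappings per file.

```python
import threading

MAPPINGS_PER_FILE = 5  # figure quoted in the mail


class CountingLock:
    """A mutex that records how many times it has been acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquires += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def remove_one_by_one(lock, tree, mappings):
    # One lock round trip per mapping: the unbatched pattern.
    for m in mappings:
        with lock:
            tree.discard(m)


def remove_batched(lock, tree, mappings):
    # Take the lock once and process every mapping under it.
    with lock:
        for m in mappings:
            tree.discard(m)


def demo():
    """Return (unbatched acquires, batched acquires) for one file."""
    mappings = [f"vma{i}" for i in range(MAPPINGS_PER_FILE)]
    unbatched, batched = CountingLock(), CountingLock()
    remove_one_by_one(unbatched, set(mappings), mappings)
    remove_batched(batched, set(mappings), mappings)
    return unbatched.acquires, batched.acquires
```

Calling demo() returns the two acquire counts side by side. The work
done under the lock is the same in both variants; only the number of
times the lock changes hands differs.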

This leaves us with mapping creation on exec. This problem is
unfixable without introduction of better APIs for userspace, which
constitutes quite a challenge.

The end result is the absolutely horrible case of multiple acquires of
the same lock per iteration.

One common idea for reducing contention boils down to shortening lock
hold time. This has very limited effect in the face of the aforementioned
multiple acquires and is at best a stopgap -- no matter what, the
ceiling is dictated by the extra acquires and it is incredibly low.
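
A back-of-the-envelope model of that ceiling, as a sketch only. It
assumes each acquire of a contended lock carries a fixed serialized
handoff cost on top of the work done under it. The function name, the
cost split and all the microsecond values are invented for
illustration and are not measurements from any machine.

```python
def ceiling_iters_per_sec(acquires, handoff_us, work_us):
    """Upper bound on iterations/s through one contended lock.

    acquires   -- lock acquisitions per iteration
    handoff_us -- assumed serialized cost of each acquire (microseconds)
    work_us    -- assumed total work done under the lock per iteration
    """
    serialized_us = acquires * handoff_us + work_us
    return 1_000_000 / serialized_us


# Hypothetical numbers: 5 acquires per iteration, 1 us handoff each.
baseline = ceiling_iters_per_sec(acquires=5, handoff_us=1.0, work_us=2.0)

# Shortening hold time: even with zero work under the lock, the five
# handoffs remain, so the bound is 1_000_000 / 5 = 200000.0 per second.
zero_hold = ceiling_iters_per_sec(acquires=5, handoff_us=1.0, work_us=0.0)

# A single acquire doing all of the original work still beats that.
single = ceiling_iters_per_sec(acquires=1, handoff_us=1.0, work_us=2.0)
```

Under these assumptions the bound is independent of core count, and
driving work_us to zero can never lift it past the limit set by
acquires * handoff_us.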

Your patch keeps the problematic acquire pattern intact, and while the
80% win might sound encouraging, the end result still severely
underperforms even a state where the lock is taken once in total
during exec.

Besides that, the internally-visible side effect of a non-functional
rmap (and thus of e.g. truncate) is pretty bad in its own right, but
let's ignore it. The primary problem here is that the patch
exposes a mechanism for userspace to dictate this in the first place.
Even ignoring the question of who should be using it and when, the
real solution to the problem would be confined to the kernel. Suppose
this patch lands and such a solution is implemented later -- now the
kernel is stuck having to support a now-useless (if not outright
harmful) feature.

What will fix the problem is sharding the state in some capacity,
provided no unfixable showstopper turns up.
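
For readers unfamiliar with the term, a generic sketch of what
sharding a lock-protected structure can look like. This is one
possible reading of "sharding the state", written as a userspace toy;
it is not a proposed kernel design. ShardedSet, the modulo shard
selection and the integer keys are all hypothetical.

```python
import threading


class ShardedSet:
    """A set split into independently locked shards."""

    def __init__(self, nshards):
        self.locks = [threading.Lock() for _ in range(nshards)]
        self.shards = [set() for _ in range(nshards)]
        self.acquires = [0] * nshards  # per-shard acquire counters

    def _index(self, key):
        # Hypothetical shard choice: integer key modulo shard count.
        return key % len(self.shards)

    def insert(self, key):
        # Common path: touches exactly one shard lock.
        i = self._index(key)
        with self.locks[i]:
            self.acquires[i] += 1
            self.shards[i].add(key)

    def remove(self, key):
        i = self._index(key)
        with self.locks[i]:
            self.acquires[i] += 1
            self.shards[i].discard(key)

    def walk(self):
        # Rare path: a full traversal has to visit every shard.
        found = []
        for i, lock in enumerate(self.locks):
            with lock:
                self.acquires[i] += 1
                found.extend(self.shards[i])
        return sorted(found)
```

The tradeoff this shape makes is that inserts and removals from
different shards no longer serialize on one lock, while anything that
needs the whole picture pays for visiting all shards.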

Any other approach is putting small band-aids on it and can be a
consideration only if decentralizing the locking proves too
problematic.

Pedro apparently volunteered to do the work, so I think we can wait to
see what he is going to end up cooking.

I hope this helps.

Thread overview: 4+ messages
2026-04-21  2:09 Yibin Liu
2026-04-21 14:38 ` Matthew Wilcox
2026-04-21 15:37 ` Pedro Falcato
2026-04-21 19:46 ` Mateusz Guzik [this message]
