From: Gabriel Krisman Bertazi <krisman@suse.de>
To: linux-mm@kvack.org
Cc: Gabriel Krisman Bertazi <krisman@suse.de>,
	linux-kernel@vger.kernel.org, jack@suse.cz,
	Mateusz Guzik <mjguzik@gmail.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Michal Hocko <mhocko@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [RFC PATCH 0/4] Optimize rss_stat initialization/teardown for single-threaded tasks
Date: Thu, 27 Nov 2025 18:36:27 -0500
Message-ID: <20251127233635.4170047-1-krisman@suse.de>

The cost of the pcpu memory allocation is non-negligible on systems
with many CPUs, and it is quite visible when forking a new task, as
reported on a few occasions.  In particular, Jan Kara reported that the
commit introducing per-cpu counters for rss_stat caused a 10% regression
in system time for gitsource on his system [1].  On that same occasion,
Jan suggested special-casing single-threaded tasks: since we know
there won't be frequent remote updates of rss_stat for single-threaded
applications, we could use a local counter for most updates and an
atomic counter for the infrequent remote updates.  This patchset
implements this idea.
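
To make the local/atomic split concrete, here is a minimal userspace
sketch in C11.  It is not the code from the patches: the st_counter
names are hypothetical, and it assumes a single owner context that
updates "local" without atomics while every other context goes through
"remote".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical type for illustration; not taken from the series. */
struct st_counter {
	long local;         /* owner-only updates, plain arithmetic */
	atomic_long remote; /* updates from any other context */
};

static void st_counter_init(struct st_counter *c)
{
	assert(c != NULL);
	c->local = 0;
	atomic_init(&c->remote, 0);
}

/* Fast path: only the owning task may call this. */
static void st_counter_add_local(struct st_counter *c, long v)
{
	c->local += v;
}

/* Slow path: safe to call from contexts other than the owner. */
static void st_counter_add_remote(struct st_counter *c, long v)
{
	atomic_fetch_add_explicit(&c->remote, v, memory_order_relaxed);
}

/*
 * Approximate read: sums both halves.  A concurrent reader would need
 * an annotated load of "local" (READ_ONCE() in kernel code); this
 * sketch reads it directly and is only exercised single-threaded.
 */
static long st_counter_read(struct st_counter *c)
{
	return c->local +
	       atomic_load_explicit(&c->remote, memory_order_relaxed);
}
```

The point of the split is that the common update touches a plain long,
while the rare cross-context update pays for an atomic operation.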

It exposes a dual-mode counter that starts as a simple counter, cheap to
initialize on single-threaded tasks, and that can be upgraded in flight
to a full-fledged per-cpu counter later.  Patch 3 then modifies the
rss_stat counters to use that structure, forcing the upgrade as soon as
a second task sharing the mm_struct is spawned.  By delaying the
initialization cost until the MM is shared, we cover single-threaded
applications fairly cheaply, while not penalizing applications that
spawn multiple threads.  On a 256c system, where the pcpu allocation of
rss_stat is quite noticeable, this reduced the wall-clock time of an
artificial fork-intensive microbenchmark (calling /bin/true in a loop)
by between 6% and 15%, depending on the number of cores.  In a more
realistic benchmark, it showed a 1.5% improvement in kernbench elapsed
time.
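
As a rough illustration of the deferred upgrade, the userspace sketch
below models a counter that allocates nothing at init time and only
allocates its per-slot storage when upgraded.  Everything here is an
assumption made for illustration: the dual_counter names are
hypothetical, NR_SLOTS stands in for the number of possible CPUs,
calloc() stands in for the pcpu allocator, and there is no
synchronization at all, so it shows only the mode switch, not the
concurrency handling of the real series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_SLOTS 4 /* stand-in for the number of possible CPUs */

/* Hypothetical type for illustration; not the API added by the series. */
struct dual_counter {
	long simple; /* value accumulated while in simple mode */
	long *slots; /* NULL in simple mode, per-slot deltas once upgraded */
};

/* Cheap init: no allocation happens here. */
static void dual_counter_init(struct dual_counter *c)
{
	c->simple = 0;
	c->slots = NULL;
}

/* Switch to per-slot mode.  Returns false if the allocation fails. */
static bool dual_counter_upgrade(struct dual_counter *c)
{
	if (c->slots)
		return true; /* already upgraded */

	c->slots = calloc(NR_SLOTS, sizeof(*c->slots));
	if (!c->slots)
		return false;

	/* Carry the value accumulated so far over into slot 0. */
	c->slots[0] = c->simple;
	c->simple = 0;
	return true;
}

static void dual_counter_add(struct dual_counter *c, unsigned int slot,
			     long v)
{
	assert(slot < NR_SLOTS);
	if (c->slots)
		c->slots[slot] += v;
	else
		c->simple += v;
}

static long dual_counter_sum(const struct dual_counter *c)
{
	long sum = c->simple;

	if (c->slots)
		for (unsigned int i = 0; i < NR_SLOTS; i++)
			sum += c->slots[i];
	return sum;
}

static void dual_counter_destroy(struct dual_counter *c)
{
	free(c->slots);
	c->slots = NULL;
}
```

In this model, a task that never shares its mm would never call
dual_counter_upgrade() and so would never pay for the allocation; the
fork path that creates a second user of the mm would be the natural
place to call it.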

More performance data, including profiles, is available in the patch
modifying the rss_stat counters.

While this patchset exposes a single user of this API, it should be
useful in more cases, which is why I made it into a proper API.  In
addition, considering the recent efforts in this area, such as
hierarchical per-cpu counters, which are orthogonal to this work because
they improve multi-threaded workloads, abstracting this behind a new API
could help merge both works.

Finally, this is an RFC because it is early work.  In particular, I'd
be interested in more benchmark suggestions, and I'd like feedback on
whether this new interface should be implemented inside percpu_counters
as lazy counters or as a completely separate interface.

Thanks,

[1] https://lore.kernel.org/all/20230608111408.s2minsenlcjow7q3@quack3

---

Cc: linux-kernel@vger.kernel.org
Cc: jack@suse.cz
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>

Gabriel Krisman Bertazi (4):
  lib/percpu_counter: Split out a helper to insert into hotplug list
  lib: Support lazy initialization of per-cpu counters
  mm: Avoid percpu MM counters on single-threaded tasks
  mm: Split a slow path for updating mm counters

 arch/s390/mm/gmap_helpers.c         |   4 +-
 arch/s390/mm/pgtable.c              |   4 +-
 fs/exec.c                           |   2 +-
 include/linux/lazy_percpu_counter.h | 145 ++++++++++++++++++++++++++++
 include/linux/mm.h                  |  26 ++---
 include/linux/mm_types.h            |   4 +-
 include/linux/percpu_counter.h      |   5 +-
 include/trace/events/kmem.h         |   4 +-
 kernel/events/uprobes.c             |   2 +-
 kernel/fork.c                       |  14 ++-
 lib/percpu_counter.c                |  68 ++++++++++---
 mm/filemap.c                        |   2 +-
 mm/huge_memory.c                    |  22 ++---
 mm/khugepaged.c                     |   6 +-
 mm/ksm.c                            |   2 +-
 mm/madvise.c                        |   2 +-
 mm/memory.c                         |  20 ++--
 mm/migrate.c                        |   2 +-
 mm/migrate_device.c                 |   2 +-
 mm/rmap.c                           |  16 +--
 mm/swapfile.c                       |   6 +-
 mm/userfaultfd.c                    |   2 +-
 22 files changed, 276 insertions(+), 84 deletions(-)
 create mode 100644 include/linux/lazy_percpu_counter.h

-- 
2.51.0



