linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "zhangpeng (AS)" <zhangpeng362@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<akpm@linux-foundation.org>, <dennisszhou@gmail.com>,
	<shakeelb@google.com>, <surenb@google.com>,
	<kent.overstreet@linux.dev>, <mhocko@suse.cz>, <vbabka@suse.cz>,
	<yuzhao@google.com>, <yu.ma@intel.com>,
	<wangkefeng.wang@huawei.com>, <sunnanyong@huawei.com>
Subject: Re: [RFC PATCH 0/3] mm: convert mm's rss stats into lazy_percpu_counter
Date: Mon, 15 Apr 2024 20:33:44 +0800	[thread overview]
Message-ID: <e30f3d21-6957-a5fd-d61f-44d7d52f63cb@huawei.com> (raw)
In-Reply-To: <20240412135333.btd6e7wfprg4cmx2@quack3>

On 2024/4/12 21:53, Jan Kara wrote:

> On Fri 12-04-24 17:24:38, Peng Zhang wrote:
>> From: ZhangPeng <zhangpeng362@huawei.com>
>>
>> Since commit f1a7941243c1 ("mm: convert mm's rss stats into
>> percpu_counter"), the rss_stats have converted into percpu_counter,
>> which convert the error margin from (nr_threads * 64) to approximately
>> (nr_cpus ^ 2). However, the new percpu allocation in mm_init() causes a
>> performance regression on fork/exec/shell. Even after commit
>> 14ef95be6f55 ("kernel/fork: group allocation/free of per-cpu counters
>> for mm struct"), the performance of fork/exec/shell is still poor
>> compared to previous kernel versions.
>>
>> To mitigate performance regression, we use lazy_percpu_counter[1] to
>> delay the allocation of percpu memory for rss_stats. After lmbench test,
>> we will get 3% ~ 6% performance improvement for lmbench
>> fork_proc/exec_proc/shell_proc after conversion.
>>
>> The test results are as follows:
>>
>>               base           base+revert        base+lazy_percpu_counter
>>
>> fork_proc    427.4ms        394.1ms  (7.8%)    413.9ms  (3.2%)
>> exec_proc    2205.1ms       2042.2ms (7.4%)    2072.0ms (6.0%)
>> shell_proc   3180.9ms       2963.7ms (6.8%)    3010.7ms (5.4%)
>>
>> This solution has not been fully evaluated and tested. The main idea of
>> this RFC patch series is to get the community's opinion on this approach.
> Thanks! I like the idea and in fact I wanted to do something similar (just
> never got to it). Thread [2] has couple of good observations regarding this
> problem. Couple of thoughts regarding your approach:
>
> 1) I think switching to pcpu counter when update rate exceeds 256 updates/s
> is not a great fit for RSS because the updates are going to be frequent in
> some cases but usually they will all happen from one thread. So I think it
> would make more sense to move the decision of switching to pcpu mode from
> the counter itself into the callers and just switch on clone() when the
> second thread gets created.
>
> 2) I thought that for RSS lazy percpu counters, we could directly use
> struct percpu_counter and just make it that if 'counters' is NULL, the
> counter is in atomic mode (count is used as atomic_long_t), if counters !=
> NULL, we are in pcpu mode.

Thanks for your reply!
Agree with your thoughts, I'll implement it in the next version.

> 3) In [2] Mateusz had a good observation that the old RSS counters actually
> used atomic operations only in rare cases so even lazy pcpu counters are
> going to have worse performance for singlethreaded processes than the old
> code. We could *almost* get away with non-atomic updates to counter->count
> if it was not for occasional RSS updates from unrelated tasks. So it might
> be worth it to further optimize the counters as:
>
> struct rss_counter_single {
> 	void *state;			/* To detect switching to pcpu mode */
> 	atomic_long_t counter_atomic;	/* Used for foreign updates */
> 	long counter;			/* Used by local updates */
> }
>
> struct rss_counter {
> 	union {
> 		struct rss_counter_single single;
> 		/* struct percpu_counter needs to be modified to have
> 		 * 'counters' first to avoid issues for different
> 		 * architectures or with CONFIG_HOTPLUG_CPU enabled */
> 		struct percpu_counter pcpu;
> 	}
> }
>
> But I'm not sure this complexity is worth it so I'd do it as a separate
> patch with separate benchmarking if at all.
>
> 								Honza

Agreed. Single-threaded processes don't need atomic operations, and
this scenario needs to be thoroughly tested.
I'll try to implement it in another patch series after I finish the
basic approach.

> [2] https://lore.kernel.org/all/ZOPSEJTzrow8YFix@snowbird/
>
>> [1] https://lore.kernel.org/linux-iommu/20230501165450.15352-8-surenb@google.com/
>>
>> Kent Overstreet (1):
>>    Lazy percpu counters
>>
>> ZhangPeng (2):
>>    lazy_percpu_counter: include struct percpu_counter in struct
>>      lazy_percpu_counter
>>    mm: convert mm's rss stats into lazy_percpu_counter
>>
>>   include/linux/lazy-percpu-counter.h |  88 +++++++++++++++++++
>>   include/linux/mm.h                  |   8 +-
>>   include/linux/mm_types.h            |   4 +-
>>   include/trace/events/kmem.h         |   4 +-
>>   kernel/fork.c                       |  12 +--
>>   lib/Makefile                        |   2 +-
>>   lib/lazy-percpu-counter.c           | 131 ++++++++++++++++++++++++++++
>>   7 files changed, 232 insertions(+), 17 deletions(-)
>>   create mode 100644 include/linux/lazy-percpu-counter.h
>>   create mode 100644 lib/lazy-percpu-counter.c
>>
>> -- 
>> 2.25.1
>>
-- 
Best Regards,
Peng



      reply	other threads:[~2024-04-15 12:33 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-12  9:24 Peng Zhang
2024-04-12  9:24 ` [RFC PATCH 1/3] Lazy percpu counters Peng Zhang
2024-04-12  9:24 ` [RFC PATCH 2/3] lazy_percpu_counter: include struct percpu_counter in struct lazy_percpu_counter Peng Zhang
2024-04-12  9:24 ` [RFC PATCH 3/3] mm: convert mm's rss stats into lazy_percpu_counter Peng Zhang
2024-04-12 13:53 ` [RFC PATCH 0/3] " Jan Kara
2024-04-15 12:33   ` zhangpeng (AS) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e30f3d21-6957-a5fd-d61f-44d7d52f63cb@huawei.com \
    --to=zhangpeng362@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=dennisszhou@gmail.com \
    --cc=jack@suse.cz \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=shakeelb@google.com \
    --cc=sunnanyong@huawei.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=yu.ma@intel.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox