Re: [PATCH v3 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Huang, Ying" <ying.huang@intel.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: akpm@linux-foundation.org,  mgorman@techsingularity.net,
	linux-mm@kvack.org,  Matthew Wilcox <willy@infradead.org>,
	 David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v3 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
Date: Mon, 05 Aug 2024 11:02:17 +0800	[thread overview]
Message-ID: <87mslrfz9i.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <CALOAHbCPUgbtijz8o1vUjLbZO2714aKXxrXWGsQvqStAiJsxhQ@mail.gmail.com> (Yafang Shao's message of "Mon, 5 Aug 2024 09:58:33 +0800")

Yafang Shao <laoar.shao@gmail.com> writes:

> On Mon, Aug 5, 2024 at 9:41 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Yafang Shao <laoar.shao@gmail.com> writes:
>>
>> [snip]
>>
>> >
>> > Why introduce a systl knob?
>> > ===========================
>> >
>> > From the above data, it's clear that different CPU types have varying
>> > allocation latencies concerning zone->lock contention. Typically, people
>> > don't release individual kernel packages for each type of x86_64 CPU.
>> >
>> > Furthermore, for latency-insensitive applications, we can keep the default
>> > setting for better throughput.
>>
>> Do you have any data to prove that the default setting is better for
>> throughput?  If so, that will be a strong support for your patch.
>
> No, I don't. The primary reason we can't change the default value from
> 5 to 0 across our fleet of servers is that you initially set it to 5.
> The sysadmins believe you had a strong reason for setting it to 5 by
> default; otherwise, it would be considered careless for the upstream
> kernel. I also believe you must have had a solid justification for
> setting the default value to 5; otherwise, why would you have
> submitted your patches?

In commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to
avoid too long latency"), I tried my best to run test on the machines
available with a micro-benchmark (will-it-scale/page_fault1) which
exercises kernel page allocator heavily.  From the data in commit,
larger CONFIG_PCP_BATCH_SCALE_MAX helps throughput a little, but not
much.  The 99% alloc/free latency can be kept within about 100us with
CONFIG_PCP_BATCH_SCALE_MAX == 5.  So, we chose 5 as default value.

But, we can always improve the default value with more data, on more
types of machines and with more types of benchmarks, etc.

Your data suggest smaller default value because you have data to show
that larger default value has the latency spike issue (as large as tens
ms) for some practical workloads.  Which weren't tested previously.  In
contrast, we don't have strong data to show the throughput advantages of
larger CONFIG_PCP_BATCH_SCALE_MAX value.

So, I suggest to use a smaller default value for
CONFIG_PCP_BATCH_SCALE_MAX.  But, we may need more test to check the
data for 1, 2, 3, and 4, in addtion to 0 and 5 to determine the best
choice.

>>
>> > In our production environment, we set this
>> > value to 0 for applications running on Kubernetes servers while keeping it
>> > at the default value of 5 for other applications like big data. It's not
>> > common to release individual kernel packages for each application.
>> >
>>
>> [snip]
>>

--
Best Regards,
Huang, Ying

next prev parent reply	other threads:[~2024-08-05  3:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-04  8:01 [PATCH v3 0/3] mm: " Yafang Shao
2024-08-04  8:01 ` [PATCH v3 1/3] mm/page_alloc: A minor fix to the calculation of pcp->free_count Yafang Shao
2024-08-04  8:01 ` [PATCH v3 2/3] mm/page_alloc: Avoid changing pcp->high decaying when adjusting CONFIG_PCP_BATCH_SCALE_MAX Yafang Shao
2024-08-04  8:01 ` [PATCH v3 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max Yafang Shao
2024-08-05  1:38   ` Huang, Ying
2024-08-05  1:58     ` Yafang Shao
2024-08-05  3:02       ` Huang, Ying [this message]
2024-08-05  3:17         ` Yafang Shao
2024-08-05  4:32           ` Huang, Ying
2024-08-05  4:48             ` Yafang Shao
2024-08-05  5:00               ` Huang, Ying
2024-08-05  5:36                 ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mslrfz9i.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=rientjes@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox