From: "Huang, Ying" <ying.huang@intel.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: akpm@linux-foundation.org,  linux-mm@kvack.org,
	 Matthew Wilcox <willy@infradead.org>,
	 David Rientjes <rientjes@google.com>,
	 Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH] mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the minimum pagelist
Date: Tue, 02 Jul 2024 15:23:32 +0800
Message-ID: <878qykntor.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <20240701142046.6050-1-laoar.shao@gmail.com> (Yafang Shao's message of "Mon, 1 Jul 2024 22:20:46 +0800")

Hi, Yafang,

Yafang Shao <laoar.shao@gmail.com> writes:

> Currently, we're encountering latency spikes in our container environment
> when a specific container with multiple Python-based tasks exits.

Can you show some data?  On what kind of machine do you see this, and
how long is the latency?

> These
> tasks may hold the zone->lock for an extended period, significantly
> impacting latency for other containers attempting to allocate memory.

So, it is the allocation latency that is affected, not the application
exit latency?  Could you measure the run time of free_pcppages_bulk()?
This can be done via the ftrace function_graph tracer.  We want to
check whether this is a common issue.
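
For example, something like the following should work (a rough sketch;
this assumes tracefs is mounted at /sys/kernel/tracing, on older
kernels it may be under /sys/kernel/debug/tracing):

  # cd /sys/kernel/tracing
  # echo free_pcppages_bulk > set_graph_function
  # echo function_graph > current_tracer
  # echo 1 > tracing_on
  ... reproduce the container exit ...
  # grep free_pcppages_bulk trace

The per-call duration reported by the function_graph tracer should be
a reasonable approximation of how long zone->lock is held for each
bulk free.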

In commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to
avoid too long latency"), we measured the allocation/free latency for
different values of CONFIG_PCP_BATCH_SCALE_MAX.  The target of that
commit is to keep the latency <= 100us.
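
(For reference, this is a build-time knob; you can check what a given
kernel was built with, for example:

  $ grep PCP_BATCH_SCALE_MAX /boot/config-$(uname -r)
  CONFIG_PCP_BATCH_SCALE_MAX=5

where 5 is the default and the config file location varies by distro.)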

> As a workaround, we've found that minimizing the pagelist size, such as
> setting it to 4 times the batch size, can help mitigate these spikes.
> However, managing vm.percpu_pagelist_high_fraction across a large fleet of
> servers poses challenges due to variations in CPU counts, NUMA nodes, and
> physical memory capacities.
>
> To enhance practicality, we propose allowing -1 to be set for
> vm.percpu_pagelist_high_fraction to designate the minimum pagelist size.

If it is really necessary, can we just use a large enough number for
vm.percpu_pagelist_high_fraction?  For example, (1 << 30)?
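
That is, without any kernel change, something like the following (an
untested sketch; IIUC the kernel clamps the resulting high mark to a
floor of a few times the batch size, see zone_highsize() in
mm/page_alloc.c):

  # sysctl -w vm.percpu_pagelist_high_fraction=1073741824   # (1 << 30)
  # grep -E 'high:|batch:' /proc/zoneinfo

The second command reads back the per-CPU pageset high/batch values so
you can verify the effect.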

> Furthermore, considering the challenges associated with utilizing
> vm.percpu_pagelist_high_fraction, it would be beneficial to introduce a
> more intuitive parameter, vm.percpu_pagelist_high_size, that would permit
> direct specification of the pagelist size as a multiple of the batch size.
> This methodology would mirror the functionality of vm.dirty_ratio and
> vm.dirty_bytes, providing users with greater flexibility and control.
>
> We have discussed the possibility of introducing multiple small zones to
> mitigate the contention on the zone->lock [0], but this approach is likely
> to require a longer-term implementation effort.
>
> Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Huang, Ying" <ying.huang@intel.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>

[snip]

--
Best Regards,
Huang, Ying

