From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net,
linux-mm@kvack.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH v2 0/3] mm: Introduce a new sysctl knob vm.pcp_batch_scale_max
Date: Mon, 29 Jul 2024 10:35:29 +0800
Message-ID: <20240729023532.1555-1-laoar.shao@gmail.com>
Background
==========
In our containerized environment, we have a specific type of container
that runs 18 processes, each consuming approximately 6GB of RSS. The
workload is organized as separate processes rather than threads because
the Python Global Interpreter Lock (GIL) is a bottleneck in a
multi-threaded setup. When these containers exit, other containers
hosted on the same machine experience significant latency spikes.
Investigation
=============
During my investigation of this issue, I found that the latency spikes
were caused by zone->lock contention, which can be illustrated as
follows:

  CPU A (Freer)                    CPU B (Allocator)
  lock zone->lock
  free pages                       lock zone->lock (waits)
  unlock zone->lock
                                   alloc pages
                                   unlock zone->lock

If the freer holds zone->lock for an extended period, the allocator has
to wait, and thus latency spikes occur.
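The effect of the free batch size on lock hold time can be sketched with
a toy calculation (userspace Python, not kernel code; the numbers are
illustrative only):

```python
import math

def lock_profile(total_pages: int, batch: int) -> tuple[int, int]:
    """For freeing total_pages in batches of `batch` pages, return
    (number of zone->lock acquisitions, pages freed per hold).
    Fewer acquisitions mean each critical section is longer, so a
    concurrent allocator waits longer per acquisition."""
    return math.ceil(total_pages / batch), batch

# ~6GB of RSS in 4KiB pages, as in the containers described above
pages = (6 << 30) // (4 << 10)  # 1572864 pages

print(lock_profile(pages, 4096))  # (384, 4096): few, long holds
print(lock_profile(pages, 64))    # (24576, 64): many, short holds
```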
I also wrote a Python script to reproduce the issue on my test servers;
see the details in patch #3. It is worth noting that the reproducer runs
against the upstream kernel.
Experimenting
=============
The more pages are freed in one batch, the longer the zone->lock is
held. My attempt therefore involves reducing the batch size. After I
restricted the batch to its smallest size, there were no more complaints
about latency spikes.
However, during my experiments I found that
CONFIG_PCP_BATCH_SCALE_MAX is hard to use in practice, so this series
tries to improve it.
The Proposal
============
This series encompasses two minor refinements to the PCP high watermark
auto-tuning mechanism, along with the introduction of a new sysctl knob
that serves as a more practical alternative to the previous configuration
method.
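As a rough sketch of what the knob controls (assuming, per the Kconfig
help text, that the per-CPU pagelist free batch is scaled as
base_batch << scale_max; treat the exact formula and the base batch
value below as assumptions, not quotes of the kernel code):

```python
def max_pages_per_hold(base_batch: int, scale_max: int) -> int:
    """Hypothetical upper bound on pages freed per zone->lock hold,
    assuming the batch is scaled as base_batch << scale_max."""
    return base_batch << scale_max

# With an assumed base pcp->batch of 63 pages:
for scale in (0, 2, 5):
    print(scale, max_pages_per_hold(63, scale))
# 0 -> 63, 2 -> 252, 5 -> 2016 pages per lock hold
```

With the proposed vm.pcp_batch_scale_max, an administrator could lower
the scale at runtime (e.g. `sysctl -w vm.pcp_batch_scale_max=0`) instead
of rebuilding the kernel with a different CONFIG_PCP_BATCH_SCALE_MAX.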
Future work
===========
To ultimately mitigate the zone->lock contention issue, several suggestions
have been proposed. One approach involves dividing large zones into
multiple smaller zones, as suggested by Matthew[0], while another
entails splitting
the zone->lock using a mechanism similar to memory arenas and shifting away
from relying solely on zone_id to identify the range of free lists a
particular page belongs to, as suggested by Mel[1]. However, implementing
these solutions is likely to necessitate a more extended development
effort.
Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
Link: https://lore.kernel.org/linux-mm/20240705130943.htsyhhhzbcptnkcu@techsingularity.net/ [1]
Changes:
- v1 -> v2: Commit log refinement
- v1: mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
https://lwn.net/Articles/981069/
- mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the
minimum pagelist
https://lore.kernel.org/linux-mm/20240701142046.6050-1-laoar.shao@gmail.com/
Yafang Shao (3):
mm/page_alloc: A minor fix to the calculation of pcp->free_count
mm/page_alloc: Avoid changing pcp->high decaying when adjusting
CONFIG_PCP_BATCH_SCALE_MAX
mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++
mm/Kconfig | 11 -------
mm/page_alloc.c | 40 ++++++++++++++++++-------
3 files changed, 47 insertions(+), 21 deletions(-)
--
2.43.5