From: "Harry (Hyeonggon) Yoo" <42.hyeyoo@gmail.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: gourry@gourry.net, hyeonggon.yoo@sk.com,
ying.huang@linux.alibaba.com, rafael@kernel.org, lenb@kernel.org,
gregkh@linuxfoundation.org, akpm@linux-foundation.org,
honggyu.kim@sk.com, rakie.kim@sk.com, dan.j.williams@intel.com,
Jonathan.Cameron@huawei.com, dave.jiang@intel.com,
horen.chuang@linux.dev, hannes@cmpxchg.org,
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-mm@kvack.org, kernel-team@meta.com
Subject: Re: [PATCH v4] Weighted Interleave Auto-tuning
Date: Tue, 4 Feb 2025 16:50:26 +0900
Message-ID: <Z6HGwq731v+VX1CP@localhost.localdomain>
In-Reply-To: <20250128222332.3835931-1-joshua.hahnjy@gmail.com>
On Tue, Jan 28, 2025 at 02:23:31PM -0800, Joshua Hahn wrote:
> On machines with multiple memory nodes, interleaving page allocations
> across nodes allows for better utilization of each node's bandwidth.
> Previous work by Gregory Price [1] introduced weighted interleave, which
> allowed for pages to be allocated across nodes according to user-set ratios.
>
> Ideally, these weights should be proportional to their bandwidth, so
> that under bandwidth pressure, each node uses its maximal efficient
> bandwidth and prevents latency from increasing exponentially.
>
> At the same time, we want these weights to be as small as possible.
> Having ratios that involve large co-prime numbers like 7639:1345:7 leads
> to awkward and inefficient allocations, since the node with weight 7
> will remain mostly unused (and despite being proportional to bandwidth,
> will not aid in relieving the bandwidth pressure in the other two nodes).
>
> This patch introduces an auto-configuration mode for the interleave
> weights that aims to balance the two goals of setting node weights to be
> proportional to their bandwidths and keeping the weight values low.
> In order to perform the weight re-scaling, we use an internal
> "weightiness" value (fixed to 32) that defines interleave aggression.
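To make the trade-off concrete, here's a minimal userspace sketch of one way bandwidth could be turned into small weights: scale each node's bandwidth against the total so that every weight lands in [1, WEIGHTINESS], then divide all weights by their GCD. The names (`WEIGHTINESS`, `iw_gcd`, `scale_weights`) and the rounding choice (truncate, then clamp to 1) are assumptions for illustration, not the patch's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constant, mirroring the fixed "weightiness" of 32 described above. */
#define WEIGHTINESS 32u

/* Hypothetical helper: Euclid's algorithm; iw_gcd(0, x) == x. */
static unsigned int iw_gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical sketch: map per-node bandwidth to interleave weights.
 * Each weight is bw * WEIGHTINESS / sum(bw), truncated and clamped to
 * at least 1, then all weights are divided by their common GCD so that,
 * e.g., 16:32 becomes 1:2.
 */
static void scale_weights(const unsigned int *bw, unsigned int *weights,
			  size_t nr)
{
	uint64_t sum = 0;
	unsigned int g = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		sum += bw[i];

	for (i = 0; i < nr; i++) {
		/* bw[i] <= sum, so w never exceeds WEIGHTINESS. */
		uint64_t w = sum ? (uint64_t)bw[i] * WEIGHTINESS / sum : 0;

		weights[i] = w ? (unsigned int)w : 1;	/* keep every node in rotation */
		g = iw_gcd(g, weights[i]);
	}

	for (i = 0; i < nr; i++)
		weights[i] /= g;
}
```

Under these assumptions, bandwidths of 300 and 100 scale to 24 and 8 and reduce to 3:1, while bandwidths in the ratio 7639:1345:7 come out as 27:4:1, so the slowest node keeps a small but non-zero share. Rounding to nearest instead of truncating would be an equally plausible choice.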
>
> In this auto configuration mode, node weights are dynamically updated
> every time there is a hotplug event that introduces new bandwidth.
>
> Users can also enter manual mode by writing "N" or "0" to the new "auto"
> sysfs interface. When a user enters manual mode, the system stops
> dynamically updating any of the node weights, even during hotplug events
> that can shift the optimal weight distribution. The system also enters
> manual mode any time a user sets a node's weight directly by using the
> nodeN interface introduced in [1]. On the other hand, auto mode is
> only entered by explicitly writing "Y" or "1" to the auto interface.
>
> There is one functional change that this patch makes to the existing
> weighted_interleave ABI: previously, writing 0 directly to a nodeN
> interface was said to reset the weight to the system default. Before
> this patch, the default for all weights was 1, which meant that writing
> 0 and 1 were functionally equivalent.
>
> This patch introduces "real" defaults, but moves away from letting users
> use 0 as a "set to default" interface. Rather, users who want to use
> system defaults should use auto mode. This patch seems to be the
> appropriate place to make this change, since we would like to remove
> this usage before users begin to rely on the feature in userspace.
> Moreover, users will not be losing any functionality; they can still
> write 1 into a node if they want a weight of 1. Thus, we deprecate the
> "write zero to reset" feature in favor of returning an error, the same
> way we would return an error when the user writes any other invalid
> weight to the interface.
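The stricter nodeN semantics can be modelled in userspace roughly as follows. This is a sketch, assuming weights are stored as 8-bit values (valid range 1 to 255) and that only a successful write switches the system to manual mode; `node_weight_store`, `node_weight`, `auto_mode` and `MAX_NODES` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state standing in for the kernel's weight table and mode. */
#define MAX_NODES 4
static unsigned char node_weight[MAX_NODES] = {1, 1, 1, 1};
static bool auto_mode = true;

/*
 * Hypothetical model of a write to nodeN: parse a decimal weight,
 * reject 0 (no longer "reset to default") and anything that does not
 * fit in 8 bits with -EINVAL, and switch to manual mode on success.
 */
static int node_weight_store(int nid, const char *buf)
{
	char *end;
	unsigned long val;

	if (nid < 0 || nid >= MAX_NODES)
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')		/* tolerate the newline echo appends */
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val == 0 || val > 255)
		return -EINVAL;

	node_weight[nid] = (unsigned char)val;
	auto_mode = false;		/* a direct write always drops to manual mode */
	return 0;
}
```

With this model, `echo 0 > nodeN` fails with -EINVAL and leaves the mode untouched, while `echo 1 > nodeN` is the explicit way to ask for a weight of 1.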
>
> [1] https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@memverge.com/
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Co-developed-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
Hi Joshua,
I'm glad we're close to finalizing the interface.
I believe the author has successfully addressed major concerns
through the revisions. The interface and the code now look good to me.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
With a few nits:
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> index 0b7972de04e9..c26879f59d5d 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> @@ -20,6 +20,34 @@ Description: Weight configuration interface for nodeN
[...snip...]
> +What: /sys/kernel/mm/mempolicy/weighted_interleave/auto
> +Date: January 2025
> +Contact: Linux memory management mailing list <linux-mm@kvack.org>
> +Description: Auto-weighting configuration interface
> +
> + Configuration mode for weighted interleave. A 'Y' indicates
> + that the system is in auto mode, and an 'N' indicates that
> + the system is in manual mode. All other values are invalid.
> +
> + In auto mode, all node weights are re-calculated and overwritten
> + (visible via the nodeN interfaces) whenever new bandwidth data
> + is made available during either boot or hotplug events.
> +
> + In manual mode, node weights can only be updated by the user.
> + If a node is hotplugged while the user is in manual mode,
> + the node will have a default weight of 1.
> +
> + Modes can be changed by writing Y, N, 1, or 0 to the interface.
> + All other strings will be ignored, and -EINVAL will be returned.
> + If Y or 1 is written to the interface but the recalculation or
> + updates fail at any point (-ENOMEM or -ENODEV), then the system
> + will remain in manual mode.
nit: the commit log says that writing 'N' or '0' switches to manual
mode and writing 'Y' or '1' switches to auto mode, but the documentation
does not explicitly state what '0' and '1' do. Could that be spelled out?
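To illustrate the semantics I'd like spelled out, here is a minimal userspace model of a write to the auto file, assuming exactly the four documented strings are accepted (plus the trailing newline that echo appends). `auto_store`, `auto_mode`, `recalc_weights` and `recalc_error` are hypothetical names; `recalc_weights` merely stands in for the bandwidth-based recalculation and its -ENOMEM/-ENODEV failures.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mode flag: true = auto, false = manual. */
static bool auto_mode;

/*
 * Hypothetical stand-in for the bandwidth-based recalculation.
 * Set recalc_error to -ENOMEM or -ENODEV to simulate a failure.
 */
static int recalc_error;

static int recalc_weights(void)
{
	return recalc_error;
}

/*
 * Hypothetical model of a write to weighted_interleave/auto:
 * "Y" or "1" requests auto mode, "N" or "0" requests manual mode,
 * anything else is rejected with -EINVAL and leaves the mode unchanged.
 */
static int auto_store(const char *buf)
{
	size_t len = strlen(buf);
	bool want_auto;
	int err;

	if (len && buf[len - 1] == '\n')	/* tolerate echo's newline */
		len--;
	if (len != 1)
		return -EINVAL;

	switch (buf[0]) {
	case 'Y':
	case '1':
		want_auto = true;
		break;
	case 'N':
	case '0':
		want_auto = false;
		break;
	default:
		return -EINVAL;
	}

	if (!want_auto) {
		auto_mode = false;
		return 0;
	}

	err = recalc_weights();
	if (err) {
		/* Recalculation failed: stay in (or fall back to) manual mode. */
		auto_mode = false;
		return err;
	}
	auto_mode = true;
	return 0;
}
```

If the implementation parses the input with kstrtobool(), note that it also accepts lowercase 'y'/'n' and "on"/"off", which "All other values are invalid" would then need to reflect.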
> + Writing a new weight to a node directly via the nodeN interface
> + will also automatically update the system to manual mode.
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 80a3481c0470..cc94cba112dd 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -20,6 +20,7 @@
> #include <linux/list_sort.h>
> #include <linux/memregion.h>
> #include <linux/memory.h>
> +#include <linux/mempolicy.h>
nit: is this #include directive necessary?
--
Harry