From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20240405164920.2844-1-mcassell411@gmail.com>
From: Vratislav Bendel <vbendel@redhat.com>
Date: Sat, 27 Apr 2024 16:05:32 +0200
Subject: Re: [PATCH] Documentation/admin-guide/sysctl/vm.rst adding the importance of NUMA-node count to documentation
To: Matthew Cassell <mcassell411@gmail.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, rppt@kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Content-Type: multipart/alternative; boundary="0000000000006936ec06171484cc"

--0000000000006936ec06171484cc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

IMHO you went too encyclopedia-style this time. :)
My extra point was just to elaborate on what the "additional actions" means,
but I would suggest keeping it as concise as possible.
Also the bit values are already explained - no need to repeat that.

On Fri, Apr 12, 2024 at 10:48 PM Matthew Cassell <mcassell411@gmail.com> wrote:

> Thanks for the feedback. Here is a quick outline I came up with on your
> advice:
>
> [...] (original content)
>
> Keep in mind enabling bits in zone_reclaim_mode makes the most sense
> for topologies consisting of multiple NUMA nodes. In addition to vanilla
> zone_reclaim (clean and unmapped pages), there exist additional bits that
> expand which pages are eligible to be reclaimed and dictate scan_control
> policy during the reclaim process. The page allocator will attempt to reclaim
> memory locally in accordance with these bits before attempting to allocate
> on remote nodes.
>
> Allow dirty pages to become candidates for memory reclaim::
>
>         echo 2 > /proc/sys/vm/zone_reclaim_mode
>
> [...] (original content)
>
> Allow mapped pages to become candidates for memory reclaim::
>
>         echo 4 > /proc/sys/vm/zone_reclaim_mode
>
> [...] (original content)
>
> I'm trying to balance between keeping the original content, being descriptive,
> and not going into encyclopedia-mode. My motivation was to stress the importance
> of NUMA-node count and describe the additional bits more per your advice.
> I added the echo snippets to better segue the aggressive options. Any thoughts
> on the above?
>
> On Thu, Apr 11, 2024 at 2:54 AM Vratislav Bendel <vbendel@redhat.com> wrote:
> >
> > On Fri, Apr 5, 2024 at 6:49 PM Matthew Cassell <mcassell411@gmail.com> wrote:
> > >
> > > If any bits are set in node_reclaim_mode (tunable via
> > > /proc/sys/vm/zone_reclaim_mode) within get_pages_from_freelist(), then
> > > page allocations start getting early access to reclaim via the
> > > node_reclaim() code path when memory pressure increases. This behavior
> > > provides the most optimization for multiple NUMA node machines. The above
> > > is mentioned in:
> > >
> > > Commit 9eeff2395e3cfd05c9b2e6 ("[PATCH] Zone reclaim: Reclaim logic")
> > > states "Zone reclaim is of particular importance for NUMA machines. It
> > > can be more beneficial to reclaim a page than taking the performance
> > > penalties that come with allocating a page on a REMOTE zone."
> > >
> > > While the pros/cons of staying on node versus allocating remotely are
> > > mentioned in commit histories and mailing lists. It isn't specifically
> > > mentioned in Documentation/ and isn't possible with a lone node. Imagine a
> > > situation where CONFIG_NUMA=y (the default on most major distributions)
> > > and only a single NUMA node exists. The latter is an oxymoron
> > > (single-node == uniform memory access). Informing the user via vm.rst that
> > > the most bang for their buck is when multiple nodes exist seems helpful.
> > >
> >
> > I agree that the documentation could be improved to better express the
> > implications
> > and relevance of setting zone_reclaim_mode bits.
> >
> > Though I would suggest to go a step further and also elaborate on
> > those "additional actions",
> > for example something like:
> > "The page allocator will attempt to reclaim memory within the zone,
> > depending on the bits set,
> > before looking for free pages in other zones, namely on remote memory nodes."
> >
> > > Signed-off-by: Matthew Cassell <mcassell411@gmail.com>
> > > ---
> > >  Documentation/admin-guide/sysctl/vm.rst | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > > index c59889de122b..10270548af2a 100644
> > > --- a/Documentation/admin-guide/sysctl/vm.rst
> > > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > > @@ -1031,7 +1031,8 @@ Consider enabling one or more zone_reclaim mode bits if it's known that the
> > >  workload is partitioned such that each partition fits within a NUMA node
> > >  and that accessing remote memory would cause a measurable performance
> > >  reduction.  The page allocator will take additional actions before
> > > -allocating off node pages.
> > > +allocating off node pages. Keep in mind enabling bits in zone_reclaim_mode
> > > +makes the most sense for topologies consisting of multiple NUMA nodes.
> > >
> > >  Allowing zone reclaim to write out pages stops processes that are
> > >  writing large amounts of data from dirtying pages on other nodes. Zone
> > > --
> > > 2.34.1
> > >
> >
>

--0000000000006936ec06171484cc
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000006936ec06171484cc--