From: Baoquan He <bhe@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>,
Gabriel Krisman Bertazi <krisman@suse.de>,
akpm@linux-foundation.org, linux-mm@kvack.org,
Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH] Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
Date: Thu, 27 Feb 2025 18:24:09 +0800 [thread overview]
Message-ID: <Z8A9SafZwbNzSlYw@MiWiFi-R3L-srv> (raw)
In-Reply-To: <b8b939e8-e00c-4013-a45f-425ce06a8bd5@suse.cz>
On 02/27/25 at 10:16am, Vlastimil Babka wrote:
> On 2/26/25 16:57, Baoquan He wrote:
> > On 02/26/25 at 01:01pm, Michal Hocko wrote:
> >>
> >> Why do you think anything needs to be adjusted?
> >
> > No, I don't think like that. But I am wondering what makes you get
> > the conclusion.
> >
> >>
> >> > I haven't thought of the whole zone fallback list to interleave nodes
> >> > which invovles a lot of change.
> >> >
> >> > >
> >> > > Btw. has 96a5c186efff tried to fix any actual runtime problem? The
> >> > > changelog doesn't say much about that.
> >> >
> >> > No, no actual problem was observed on tht.
> >>
> >> OK
> >>
> >> > I was just trying to make
> >> > clear the semantics because I was confused by its obscure value printing
> >> > of zone->lowmem_reserve[] in /proc/zoneinfo.
> >> >
> >> > I think we can merge this reverting firstly, then to investigate how to
> >> > better clarify it.
> >>
> >> What do you think needs to be clarify? How exactly is the original
> >> output confusing?
> >
> > When I did the change, I wrote the reason in commit log. I don't think
> > you care to read it from your talking. Let me explain again, in
> > setup_per_zone_lowmem_reserve(), each zone's protection value is
> > calculated based on its own node's zones. E.g below on node 0, its
> > Movable zone and Device zone are empty but still show influence on
> > Normal/DMA32/DMA zone, this is unreasonable from the protection value
> > calculating code and its showing.
Ah, I saw your mail when I finished my replying to Michal. Thanks for
your sharing with deliberate details, I almost agree with them all.
>
> It's not unreasonable. A GFP_HIGHUSER_MOVABLE can use up to the Movable
> zone, so e.g. the dma32 zone should be protected from such an allocation, so
> it has space for GFP_DMA32 restricted allocations.
Yes, I didn't realize that when I did the change in commit 96a5c186efff,
sorry about that.
>
> If no Movable zone exists, but Normal zone does, the result is the
> protection will be the same for GFP_KERNEL allocations (that can use up to
> the Normal zone) and GFP_HIGHUSER_MOVABLE allocations. (i.e. the number of
> 22134 in your listing is the same for both indexes). That's fine. But
> setting the protection from Movable allocations to 0 as commit 96a5c186efff
> did was simply a bug, as that can directly lead to GFP_HIGHUSER_MOVABLE
> depleting ZONE_DMA32.
Yes, agree. I think that's the reason Gabriel observed the regression.
>
> The only "unreasonable" part here is that we define and show protections
> from ZONE_DEVICE allocations. The usage of this zone is AFAIK completely
> separate from normal page allocation through zonelists, so we could exclude
> it, if anyone cared enough.
I think this is not the only unreasonable part.
sysctl_lowmem_reserve_ratio is a knob provided for user to tune the
memory management. While the underlying code relative to the set ratio
can't meet the expections. Even though we revert my patch, it seems to
work well, while the protection value is not under good management. It
just happens to work. Because the protection value is calculated
relative to __GFP_THISNODE allocation, while ignoring the FALLBACK
allocation. I think you have pointed that out greatly in below comment.
>
> > If really as your colleague Gabriel said, the protection value of DMA zone
> > on node 0 will impact allocation when targeted zone is Movable zone, we
> > may need consider the protection value calcuation acorss nodes. Because
> > the impact happens among different nodes. I only said we can do
> > investigation, I didn't said we need change or have to change.
>
> There might be a theoretical issue if e.g. Node 0 only contained DMA and
> DMA32 zones and nothing else, while the Normal zone is on Node 1, there
> would be no protection for DMA/DMA32 zones from Normal allocations, as
> setup_per_zone_lowmem_reserve() considers each node separately and thus
> would not take Normal zone size from Node 1 into account.
>
> Should we sum zone sizes accross all nodes then? But then __GFP_THISNODE
> Normal allocations for node 0 would never succeed? Or we'd need a separate
> lowmem_reserve array for those?
Yeah, I have the same thought as you here. We may need adapation here.
>
> I guess the issue doesn't happen in practice. In any case it's out of scope
> of the reverted commit and the revert.
It could happen on arm64 because arm64 only has ZONE_DMA by default and
its boundary is not fixed. I saw all zones are ZONE_DMA on arm64, I
guess it could be easier to see a arm64 system which only has ZONE_DMA
on node 0 and ZONE_NORMAL/MOVABLE on other nodes.
>
> > Node 0, zone DMA
> > ......
> > pages free 2816
> > ......
> > protection: (0, 1582, 23716, 23716, 23716)
> > Node 0, zone DMA32
> > pages free 403269
> > ......
> > protection: (0, 0, 22134, 22134, 22134)
> > Node 0, zone Normal
> > pages free 5423879
> > ......
> > protection: (0, 0, 0, 0, 0)
> > Node 0, zone Movable
> > pages free 0
> > ......
> > protection: (0, 0, 0, 0, 0)
> > Node 0, zone Device
> > pages free 0
> > ......
> > protection: (0, 0, 0, 0, 0)
> >
> >>
> >> --
> >> Michal Hocko
> >> SUSE Labs
> >>
> >
> >
>
next prev parent reply other threads:[~2025-02-27 10:24 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-26 3:22 Gabriel Krisman Bertazi
2025-02-26 6:54 ` Michal Hocko
2025-02-26 10:00 ` Baoquan He
2025-02-26 10:52 ` Michal Hocko
2025-02-26 11:00 ` Michal Hocko
2025-02-26 11:51 ` Baoquan He
2025-02-26 12:01 ` Michal Hocko
2025-02-26 15:57 ` Baoquan He
2025-02-26 17:46 ` Michal Hocko
2025-02-27 9:41 ` Baoquan He
2025-02-27 9:16 ` Vlastimil Babka
2025-02-27 10:24 ` Baoquan He [this message]
2025-02-27 13:16 ` Vlastimil Babka
2025-02-27 15:53 ` Baoquan He
2025-02-26 13:07 ` Vlastimil Babka
2025-02-26 16:05 ` Gabriel Krisman Bertazi
2025-02-26 23:00 ` Andrew Morton
2025-02-26 13:00 ` Vlastimil Babka
2025-02-27 11:50 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8A9SafZwbNzSlYw@MiWiFi-R3L-srv \
--to=bhe@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=krisman@suse.de \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox