From: Vlastimil Babka <vbabka@suse.cz>
To: Dennis Zhou <dennis@kernel.org>, Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Filipe Manana <fdmanana@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm, percpu: do not consider sleepable allocations atomic
Date: Fri, 21 Feb 2025 10:48:28 +0100
Message-ID: <cfc2d4f2-08d4-45c1-830f-d1786306454a@suse.cz>
In-Reply-To: <Z7fmnsHTU49eYEaU@snowbird>
On 2/21/25 03:36, Dennis Zhou wrote:
> I've thought about this in the back of my head for the past few weeks. I
> think I have 2 questions about this change.
>
> 1. Back to what TJ said earlier about probing. I feel like GFP_KERNEL
> allocations should be okay because that more or less is control plane
> time? I'm not sure dropping PR_SET_IO_FLUSHER is all that big of a
> workaround?
This solves the iscsid case, but not the other cases where GFP_KERNEL
allocations are fundamentally impossible.
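
For reference, the workaround on the iscsid side would amount to something
like the sketch below: clear the flag with prctl(2) around the control-plane
path that ends up allocating percpu memory, and set it again for the I/O
path. The prctl() calls are real (PR_SET_IO_FLUSHER, Linux 5.6+, needs
CAP_SYS_RESOURCE); the surrounding function is made up for illustration:

	#include <sys/prctl.h>
	#include <stdio.h>

	/* hypothetical stand-in for whatever triggers pcpu_alloc() in-kernel */
	static void do_control_plane_setup(void)
	{
		/* netlink/ioctl work that allocates percpu memory ... */
	}

	int main(void)
	{
		/* iscsid normally runs with the flag set */
		if (prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0))
			perror("PR_SET_IO_FLUSHER");

		/* drop it around the allocation-heavy control-plane path ... */
		prctl(PR_SET_IO_FLUSHER, 0, 0, 0, 0);
		do_control_plane_setup();

		/* ... and restore it before going back to flushing I/O */
		prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0);

		return 0;
	}
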
> 2. This change breaks the feedback loop as we discussed above.
> Historically we've targeted 2-4 free pages' worth of percpu memory.
> This is done by kicking the percpu work off. That does GFP_KERNEL
> allocations and if that requires reclaim then it goes and does it.
> However, now we're saying kswapd is going to work in parallel while
> we try to get pages in the worker thread.
>
> Given you're more versed in the reclaim side, I presume it must be
> pretty bad if we're failing to get order-0 pages even if we have
> NOFS/NOIO set?
IMHO yes, so I don't think we need to preemptively fear that situation too
much. OTOH in the current state, depleting pcpu's atomic reserves and
failing pcpu_alloc() because we're not allowed to take the mutex can happen
easily, even when there's plenty of free memory.
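
To spell out why: pcpu_alloc() treats anything that isn't the full
GFP_KERNEL as atomic, and under PR_SET_IO_FLUSHER the task's GFP_KERNEL is
masked down to NOIO by current_gfp_context(). Roughly (a simplified sketch
of the check, from memory, not a verbatim copy of mm/percpu.c):

	/* simplified sketch of the classification in pcpu_alloc() */
	bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;

	/*
	 * GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS
	 * GFP_NOFS   = __GFP_RECLAIM | __GFP_IO    -> is_atomic == true
	 * GFP_NOIO   = __GFP_RECLAIM               -> is_atomic == true
	 *
	 * Atomic requests are served only from the reserved/pre-populated
	 * chunks and never take pcpu_alloc_mutex, so sleepable-but-
	 * constrained callers drain the atomic reserve instead of
	 * populating new chunks.
	 */
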
> My feeling is that we should add back some knowledge of the
> dependency so if the worker fails to get pages, it doesn't reschedule
> immediately. Maybe it's as simple as adding a sleep in the worker or
> playing with delayed work...
I think if we wanted things to be more robust (and perhaps there's no need
to; see above), the best way would be to make the worker preallocate with
GFP_KERNEL outside of pcpu_alloc_mutex. I assume that's not easy to
implement, as page table allocations are involved in the process and we
don't have a way to supply preallocated memory for those.
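
The simpler back-off you suggest could indeed be just a delayed requeue when
the worker makes no progress. A rough sketch, assuming pcpu_balance_work
were converted from a plain work_struct to a delayed_work (the helper name
below is made up, standing in for the existing population logic):

	static void pcpu_balance_workfn(struct work_struct *work);
	static DECLARE_DELAYED_WORK(pcpu_balance_work, pcpu_balance_workfn);

	static void pcpu_balance_workfn(struct work_struct *work)
	{
		/* made-up helper for the existing population pass */
		bool progress = pcpu_balance_populated_some();

		if (!progress &&
		    pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
			/* back off instead of being re-kicked immediately */
			schedule_delayed_work(&pcpu_balance_work,
					      msecs_to_jiffies(100));
	}

But as said above, I'm not sure the extra complexity is warranted if failing
order-0 NOFS/NOIO allocations already means we're in serious trouble.
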
> Thanks,
> Dennis