linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>, bpf <bpf@vger.kernel.org>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	 Steven Rostedt <rostedt@goodmis.org>,
	Hou Tao <houtao1@huawei.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	shakeel.butt@linux.dev,  Thomas Gleixner <tglx@linutronix.de>,
	Tejun Heo <tj@kernel.org>, linux-mm <linux-mm@kvack.org>,
	 Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation
Date: Tue, 10 Dec 2024 14:06:32 -0800	[thread overview]
Message-ID: <CAADnVQKj40zerCcfcLwXOTcL+13rYzrraxWABRSRQcPswz6Brw@mail.gmail.com> (raw)
In-Reply-To: <Z1gEUmHkF1ikgbor@tiehlicka>

On Tue, Dec 10, 2024 at 1:05 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 10-12-24 05:31:30, Matthew Wilcox wrote:
> > On Mon, Dec 09, 2024 at 06:39:31PM -0800, Alexei Starovoitov wrote:
> > > +   if (preemptible() && !rcu_preempt_depth())
> > > +           return alloc_pages_node_noprof(nid,
> > > +                                          GFP_NOWAIT | __GFP_ZERO,
> > > +                                          order);
> > > +   return alloc_pages_node_noprof(nid,
> > > +                                  __GFP_TRYLOCK | __GFP_NOWARN | __GFP_ZERO,
> > > +                                  order);
> >
> > [...]
> >
> > > @@ -4009,7 +4018,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> > >      * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
> > >      */
> > >     alloc_flags |= (__force int)
> > > -           (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
> > > +           (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM | __GFP_TRYLOCK));
> >
> > It's not quite clear to me that we need __GFP_TRYLOCK to implement this.
> > I was originally wondering if this wasn't a memalloc_nolock_save() /
> > memalloc_nolock_restore() situation (akin to memalloc_nofs_save/restore),
> > but I wonder if we can simply do:
> >
> >       if (!preemptible() || rcu_preempt_depth())
> >               alloc_flags |= ALLOC_TRYLOCK;
>
> preemptible is unusable without CONFIG_PREEMPT_COUNT but I do agree that
> __GFP_TRYLOCK is not really a preferred way to go forward. For 3
> reasons.
>
> First I do not really like the name as it tells what it does rather than
> how it should be used. This is a general pattern of many gfp flags
> unfotrunatelly and historically it has turned out error prone. If a gfp
> flag is really needed then something like __GFP_ANY_CONTEXT should be
> used.  If the current implementation requires to use try_lock for
> zone->lock or other changes is not an implementation detail but the user
> should have a clear understanding that allocation is allowed from any
> context (NMI, IRQ or otherwise atomic contexts).

__GFP_ANY_CONTEXT would make sense if we wanted to make it available
for all kernel users. In this case I agree with Sebastian.
This is bpf specific feature, since it doesn't know the context.
All other kernel users should pick GFP_KERNEL or ATOMIC or NOWAIT.
Exposing GFP_ANY_CONTEXT to all may lead to sloppy code in drivers
and elsewhere.

> Is there any reason why GFP_ATOMIC cannot be extended to support new
> contexts? This allocation mode is already documented to be usable from
> atomic contexts except from NMI and raw_spinlocks. But is it feasible to
> extend the current implementation to use only trylock on zone->lock if
> called from in_nmi() to reduce unexpected failures on contention for
> existing users?

No. in_nmi() doesn't help. It's the lack of reentrance of slab and page
allocator that is an issue.
The page alloctor might grab zone lock. In !RT it will disable irqs.
In RT will stay sleepable. Both paths will be calling other
kernel code including tracepoints, potential kprobes, etc
and bpf prog may be attached somewhere.
If it calls alloc_page() it may deadlock on zone->lock.
pcpu lock is thankfully trylock already.
So !irqs_disabled() part of preemptible() guarantees that
zone->lock won't deadlock in !RT.
And rcu_preempt_depth() case just steers bpf into try lock only path in RT.
Since there is no way to tell whether it's safe to call
sleepable spin_lock(&zone->lock).

>
> Third, do we even want such a strong guarantee in the generic page
> allocator path and make it even more complex and harder to maintain?

I'm happy to add myself as R: or M: for trylock bits,
since that will be a fundamental building block for bpf.

> We
> already have a precence in form of __alloc_pages_bulk which is a special
> case allocator mode living outside of the page allocator path. It seems
> that it covers most of your requirements except the fallback to the
> regular allocation path AFAICS. Is this something you could piggy back
> on?

__alloc_pages_bulk() has all the same issues. It takes locks.
Also it doesn't support GFP_ACCOUNT which is a show stopper.
All bpf allocations are going through memcg.


  parent reply	other threads:[~2024-12-10 22:06 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-10  2:39 [PATCH bpf-next v2 0/6] bpf, mm: Introduce __GFP_TRYLOCK Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation Alexei Starovoitov
2024-12-10  5:31   ` Matthew Wilcox
2024-12-10  9:05     ` Michal Hocko
2024-12-10 20:25       ` Shakeel Butt
2024-12-11 10:08         ` Michal Hocko
2024-12-10 22:06       ` Alexei Starovoitov [this message]
2024-12-11 10:19         ` Michal Hocko
2024-12-12 15:07         ` Sebastian Sewior
2024-12-12 15:21           ` Michal Hocko
2024-12-12 15:35             ` Sebastian Sewior
2024-12-12 15:48               ` Steven Rostedt
2024-12-12 16:00                 ` Sebastian Sewior
2024-12-13 17:44                   ` Steven Rostedt
2024-12-13 18:44                     ` Alexei Starovoitov
2024-12-13 18:57                       ` Alexei Starovoitov
2024-12-13 20:09                       ` Steven Rostedt
2024-12-13 21:00                         ` Steven Rostedt
2024-12-13 22:02                           ` Alexei Starovoitov
2024-12-12 21:57               ` Alexei Starovoitov
2024-12-10 21:42     ` Alexei Starovoitov
2024-12-10  9:01   ` Sebastian Andrzej Siewior
2024-12-10 21:53     ` Alexei Starovoitov
2024-12-11  8:38       ` Vlastimil Babka
2024-12-12  2:14         ` Alexei Starovoitov
2024-12-12  8:54           ` Vlastimil Babka
2024-12-10 18:39   ` Vlastimil Babka
2024-12-10 22:42     ` Alexei Starovoitov
2024-12-11  8:48       ` Vlastimil Babka
2024-12-10  2:39 ` [PATCH bpf-next v2 2/6] mm, bpf: Introduce free_pages_nolock() Alexei Starovoitov
2024-12-10  8:35   ` Sebastian Andrzej Siewior
2024-12-10 22:49     ` Alexei Starovoitov
2024-12-12 14:44       ` Sebastian Andrzej Siewior
2024-12-12 19:57         ` Alexei Starovoitov
2024-12-11 10:11   ` Vlastimil Babka
2024-12-12  1:43     ` Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 3/6] locking/local_lock: Introduce local_trylock_irqsave() Alexei Starovoitov
2024-12-11 10:53   ` Vlastimil Babka
2024-12-11 11:55     ` Vlastimil Babka
2024-12-12  2:49       ` Alexei Starovoitov
2024-12-12  9:15         ` Vlastimil Babka
2024-12-13 14:02           ` Vlastimil Babka
2024-12-12 15:15   ` Sebastian Andrzej Siewior
2024-12-12 19:59     ` Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 4/6] memcg: Add __GFP_TRYLOCK support Alexei Starovoitov
2024-12-11 23:47   ` kernel test robot
2024-12-10  2:39 ` [PATCH bpf-next v2 5/6] mm, bpf: Use __GFP_ACCOUNT in try_alloc_pages() Alexei Starovoitov
2024-12-11 12:05   ` Vlastimil Babka
2024-12-12  2:54     ` Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 6/6] bpf: Use try_alloc_pages() to allocate pages for bpf needs Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQKj40zerCcfcLwXOTcL+13rYzrraxWABRSRQcPswz6Brw@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=houtao1@huawei.com \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=memxor@gmail.com \
    --cc=mhocko@suse.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shakeel.butt@linux.dev \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox