From: Jason Gunthorpe <jgg@ziepe.ca>
To: Yonghong Song <yhs@fb.com>
Cc: Liam Howlett <liam.howlett@oracle.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Luigi Rizzo <lrizzo@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
"kernel-team@fb.com" <kernel-team@fb.com>,
"walken@fb.com" <walken@fb.com>
Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock
Date: Wed, 8 Sep 2021 20:33:31 -0300 [thread overview]
Message-ID: <20210908233331.GA3544071@ziepe.ca> (raw)
In-Reply-To: <7aece51f-141c-db55-5d4c-8c6658b6a1fc@fb.com>
On Wed, Sep 08, 2021 at 12:11:54PM -0700, Yonghong Song wrote:
>
>
> On 9/8/21 11:49 AM, Jason Gunthorpe wrote:
> > On Wed, Sep 08, 2021 at 06:30:52PM +0000, Liam Howlett wrote:
> >
> > > /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
> > > -struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> > > +struct vm_area_struct *find_vma_non_owner(struct mm_struct *mm,
> > > + unsigned long addr)
> > > {
> > > struct rb_node *rb_node;
> > > struct vm_area_struct *vma;
> > > - mmap_assert_locked(mm);
> > > + VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> > > /* Check the cache first. */
> > > vma = vmacache_find(mm, addr);
> > > if (likely(vma))
> > > @@ -2325,6 +2326,11 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> > > return vma;
> > > }
> > > +struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> > > +{
> > > + lockdep_assert_held(&mm->mmap_lock);
> > > + return find_vma_non_owner(mm, addr);
> > > +}
> > > EXPORT_SYMBOL(find_vma);
> > > /*
> > >
> > >
> > > Although this leaks more into the mm API and was referred to as ugly
> > > previously, it does provide a working solution and still maintains the
> > > same level of checking.
> >
> > I think it is no better than before.
> >
> > The solution must be to not break lockdep in the BPF side. If Peter's
> > reworked algorithm is not OK then BPF should drop/acquire the lockdep
> > when it punts the unlock to the WQ.
>
> The current warning is triggered by bpf calling find_vma().
Yes, but that is because the lockdep has already been dropped.
It looks to me like it basically does this:
mmap_read_trylock_non_owner(current->mm)
vma = find_vma(current->mm, ips[i]);
if (!work) {
mmap_read_unlock_non_owner(current->mm);
} else {
work->mm = current->mm;
irq_work_queue(&work->irq_work);
And the only reason for this lockdep madness is because the
irq_work_queue() does:
static void do_up_read(struct irq_work *entry)
{
struct stack_map_irq_work *work;
if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
return;
work = container_of(entry, struct stack_map_irq_work, irq_work);
mmap_read_unlock_non_owner(work->mm);
}
This is all about deferring the unlock to outside an IRQ context. The
lockdep ownership is transfered from the IRQ to the work, which is
something that we don't usually do or model in lockdep.
Lockdep complains with the straightforward code because exiting an IRQ
with locks held is illegal.
The saner version is more like:
mmap_read_trylock(current->mm)
vma = find_vma(current->mm, ips[i]);
if (!work) {
mmap_read_unlock(current->mm);
} else {
work->mm = current->mm;
<tell lockdep we really do mean to return with
the lock held>
rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
irq_work_queue(&work->irq_work);
do_up_read():
<tell lockdep the lock was already released from the map>
mmap_read_unlock_non_owner(work->mm);
ie properly model in lockdep that ownership moves from the IRQ to the
work. At least we don't corrupt the core mm code with this insanity.
Jason
next prev parent reply other threads:[~2021-09-08 23:33 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-08 4:44 Yonghong Song
2021-09-08 12:20 ` Daniel Borkmann
2021-09-08 13:53 ` Jason Gunthorpe
2021-09-08 14:15 ` Peter Zijlstra
2021-09-08 14:43 ` Luigi Rizzo
2021-09-08 15:12 ` Liam Howlett
2021-09-08 16:09 ` Yonghong Song
2021-09-08 17:09 ` Luigi Rizzo
2021-09-08 17:21 ` Alexei Starovoitov
2021-09-08 17:52 ` Andrew Morton
2021-09-08 18:02 ` Alexei Starovoitov
2021-09-08 18:15 ` Andrew Morton
2021-09-08 18:20 ` Alexei Starovoitov
2021-09-08 18:30 ` Liam Howlett
2021-09-08 18:45 ` Yonghong Song
2021-09-08 18:49 ` Jason Gunthorpe
2021-09-08 19:11 ` Yonghong Song
2021-09-08 23:33 ` Jason Gunthorpe [this message]
2021-09-09 5:50 ` Yonghong Song
2021-09-09 8:05 ` Peter Zijlstra
2021-09-08 18:43 ` Liam Howlett
2021-09-08 19:42 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210908233331.GA3544071@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=liam.howlett@oracle.com \
--cc=linux-mm@kvack.org \
--cc=lrizzo@google.com \
--cc=peterz@infradead.org \
--cc=walken@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox