From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B07AEC433EF for ; Wed, 8 Sep 2021 23:33:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 220FE6044F for ; Wed, 8 Sep 2021 23:33:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 220FE6044F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ziepe.ca Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 72B596B0071; Wed, 8 Sep 2021 19:33:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6D9AE6B0072; Wed, 8 Sep 2021 19:33:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 552CA900002; Wed, 8 Sep 2021 19:33:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 437646B0071 for ; Wed, 8 Sep 2021 19:33:35 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id E04C32CFF9 for ; Wed, 8 Sep 2021 23:33:34 +0000 (UTC) X-FDA: 78566010348.21.C832CB3 Received: from mail-qk1-f175.google.com (mail-qk1-f175.google.com [209.85.222.175]) by imf23.hostedemail.com (Postfix) with ESMTP id 8DFCE900009B for ; Wed, 8 Sep 2021 23:33:34 +0000 (UTC) Received: by mail-qk1-f175.google.com with SMTP id y144so4498102qkb.6 for ; Wed, 08 Sep 2021 16:33:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=MrFx8f2Sbic52YCjTt/XxUCD8D0mz2UtAejCbMntRqM=; b=Y7otoefmsPF3VMf0m5rBzollI1ZgyX0Tgmchly0g12QfYjznjH4lzQQe11mcXXqYvJ +WssRuUBIHSFxPzQpMZJCht2WaU1h2VWq7yIuXtjkHDlDgQ+c7iI6eYzJAq7HuUKHgIg fjCFIPZvkiDP6zU8HLt40739TRlDYSKRzL/478UEmcplbdmDUiRGjwvdcpIdBh6D9N2r 5GhxU5yPUxVCAM54aNCh7hGOLS8a+wreviKplPqFKwVxoj/Pv5cDyDhtH7yoAO3v/ueh SKEjt31f79K1SXgL+0hFrhpqE7GudawpI1u0Rn6fga+z9GifcS8uLqvAjCbok2ZNYU8A d9XA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=MrFx8f2Sbic52YCjTt/XxUCD8D0mz2UtAejCbMntRqM=; b=TjJfcYu7xgA9Lb9AxwnitRcZ3REGzpDsoyEiF5p+fuHRM/8E2NdhcsjALjvH+YkTxD f5qoY/kgdp0onToIhQgcA6Zt1QdtXkUrleY7rVgNaegnRpFl8zcQWsabA4nUICBVUnOy ipDFx1AbPCgeKjW2uiGDbjpFm+VO+hydQwNDYmmVwKVekUKY/FLXpLbdabD/8YcSDOR5 65h2iyMOh2/28gGLz7XAFA2an5SC75naAsV/HWR+OK5hVrtJsiN+c5AIEJctuaLx3BxS BJs/Qbh/keJx5iFZMuKht2khs0kgupPk+nXw8f3z0fdyeDLgH3lYH+J8qlglMk11Dwwv RW6g== X-Gm-Message-State: AOAM53182r4MYxUIXnVpxqp0uqb1WrKCIrtQDSamMoENNWPeSaw33wFG PsgEsMiksm2Uqa4spyqfJWRH5Q== X-Google-Smtp-Source: ABdhPJxQQSxNdwoc1gf8pPeDJefusJYSa7mYD4j+G6VAAQpFQNRMbsDasczgOrbWfUIEJPcBee/tgA== X-Received: by 2002:a05:620a:4514:: with SMTP id t20mr176750qkp.114.1631144012886; Wed, 08 Sep 2021 16:33:32 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-162-113-129.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.162.113.129]) by smtp.gmail.com with ESMTPSA id k186sm101073qkd.47.2021.09.08.16.33.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 Sep 2021 16:33:32 -0700 (PDT) Received: from jgg by mlx with local (Exim 4.94) (envelope-from ) id 1mO74N-00EsAz-LZ; Wed, 08 Sep 2021 20:33:31 -0300 Date: Wed, 8 Sep 2021 20:33:31 -0300 From: Jason Gunthorpe To: Yonghong Song Cc: Liam Howlett , Alexei Starovoitov , Andrew Morton , Luigi Rizzo , Peter Zijlstra , Daniel Borkmann , bpf , "linux-mm@kvack.org" , Alexei Starovoitov , Andrii Nakryiko , "kernel-team@fb.com" , "walken@fb.com" Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock Message-ID: <20210908233331.GA3544071@ziepe.ca> References: <20210908172118.n2f4w7epm6hh62zf@ast-mbp.dhcp.thefacebook.com> <20210908105259.c47dcc4e4371ebb5e147ee6e@linux-foundation.org> <20210908180258.yjh62e5oouckar5b@ast-mbp.dhcp.thefacebook.com> <20210908111527.9a611426e257d55ccbbf46eb@linux-foundation.org> <20210908183032.zoh6dj5xh455z47f@revolver> <20210908184912.GA1200268@ziepe.ca> <7aece51f-141c-db55-5d4c-8c6658b6a1fc@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7aece51f-141c-db55-5d4c-8c6658b6a1fc@fb.com> X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 8DFCE900009B Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=ziepe.ca header.s=google header.b=Y7otoefm; dmarc=none; spf=pass (imf23.hostedemail.com: domain of jgg@ziepe.ca designates 209.85.222.175 as permitted sender) smtp.mailfrom=jgg@ziepe.ca X-Stat-Signature: y41w441hknudoi6k11a58hbx6x93b75x X-HE-Tag: 1631144014-703746 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 08, 2021 at 12:11:54PM -0700, Yonghong Song wrote: > > > On 9/8/21 11:49 AM, Jason Gunthorpe wrote: > > On Wed, Sep 08, 2021 at 06:30:52PM +0000, Liam Howlett wrote: > > > > > /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */ > > > -struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) > > > +struct vm_area_struct *find_vma_non_owner(struct mm_struct *mm, > > > + unsigned long addr) > > > { > > > struct rb_node *rb_node; > > > struct vm_area_struct *vma; > > > - mmap_assert_locked(mm); > > > + VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > > > /* Check the cache first. */ > > > vma = vmacache_find(mm, addr); > > > if (likely(vma)) > > > @@ -2325,6 +2326,11 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) > > > return vma; > > > } > > > +struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) > > > +{ > > > + lockdep_assert_held(&mm->mmap_lock); > > > + return find_vma_non_owner(mm, addr); > > > +} > > > EXPORT_SYMBOL(find_vma); > > > /* > > > > > > > > > Although this leaks more into the mm API and was referred to as ugly > > > previously, it does provide a working solution and still maintains the > > > same level of checking. > > > > I think it is no better than before. > > > > The solution must be to not break lockdep in the BPF side. If Peter's > > reworked algorithm is not OK then BPF should drop/acquire the lockdep > > when it punts the unlock to the WQ. > > The current warning is triggered by bpf calling find_vma(). Yes, but that is because the lockdep has already been dropped. It looks to me like it basically does this: mmap_read_trylock_non_owner(current->mm) vma = find_vma(current->mm, ips[i]); if (!work) { mmap_read_unlock_non_owner(current->mm); } else { work->mm = current->mm; irq_work_queue(&work->irq_work); And the only reason for this lockdep madness is because the irq_work_queue() does: static void do_up_read(struct irq_work *entry) { struct stack_map_irq_work *work; if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT))) return; work = container_of(entry, struct stack_map_irq_work, irq_work); mmap_read_unlock_non_owner(work->mm); } This is all about deferring the unlock to outside an IRQ context. The lockdep ownership is transfered from the IRQ to the work, which is something that we don't usually do or model in lockdep. Lockdep complains with the straightforward code because exiting an IRQ with locks held is illegal. The saner version is more like: mmap_read_trylock(current->mm) vma = find_vma(current->mm, ips[i]); if (!work) { mmap_read_unlock(current->mm); } else { work->mm = current->mm; rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_); irq_work_queue(&work->irq_work); do_up_read(): mmap_read_unlock_non_owner(work->mm); ie properly model in lockdep that ownership moves from the IRQ to the work. At least we don't corrupt the core mm code with this insanity. Jason