From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0AF2C4363A for ; Wed, 28 Oct 2020 12:10:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C0E2A246EE for ; Wed, 28 Oct 2020 12:10:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C0E2A246EE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A3BD86B005C; Wed, 28 Oct 2020 08:10:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9C5466B005D; Wed, 28 Oct 2020 08:10:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 88CCD6B0062; Wed, 28 Oct 2020 08:10:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0133.hostedemail.com [216.40.44.133]) by kanga.kvack.org (Postfix) with ESMTP id 52F846B005C for ; Wed, 28 Oct 2020 08:10:14 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id DE8A9181AEF00 for ; Wed, 28 Oct 2020 12:10:13 +0000 (UTC) X-FDA: 77421216306.09.nail70_2e0b72227284 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin09.hostedemail.com (Postfix) with ESMTP id C2C27180AD806 for ; Wed, 28 Oct 2020 12:10:13 +0000 (UTC) X-HE-Tag: nail70_2e0b72227284 X-Filterd-Recvd-Size: 7250 Received: from fireflyinternet.com (unknown [77.68.26.236]) by imf42.hostedemail.com (Postfix) with ESMTP for ; Wed, 28 Oct 2020 12:10:10 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from localhost (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP (TLS) id 22822675-1500050 for multiple; Wed, 28 Oct 2020 12:10:09 +0000 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable In-Reply-To: <20200727211012.30948-1-urezki@gmail.com> References: <20200727211012.30948-1-urezki@gmail.com> Subject: Re: [PATCH v2 1/1] rcu/tree: Drop the lock before entering to page allocator From: Chris Wilson Cc: Andrew Morton , Theodore Y . Ts'o , Matthew Wilcox , Joel Fernandes , Sebastian Andrzej Siewior , Uladzislau Rezki , Oleksiy Avramchenko To: LKML , Paul E . McKenney , RCU , Uladzislau Rezki (Sony) , linux-mm@kvack.org Date: Wed, 28 Oct 2020 12:10:06 +0000 Message-ID: <160388700630.31056.4624109658766915987@build.alporthouse.com> User-Agent: alot/0.9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Quoting Uladzislau Rezki (Sony) (2020-07-27 22:10:12) > If the kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING > option, the lockedp will complain about violation of the > nesting rules: >=20 > > [ 28.060389] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > [ 28.060389] [ BUG: Invalid wait context ] > [ 28.060389] 5.8.0-rc3-rcu #211 Tainted: G E > [ 28.060389] ----------------------------- > [ 28.060390] vmalloc_test/0/523 is trying to lock: > [ 28.060390] ffff96df7ffe0228 (&zone->lock){-.-.}-{3:3}, at: get_page_f= rom_freelist+0xcf0/0x16d0 > [ 28.060391] other info that might help us debug this: > [ 28.060391] context-{5:5} > [ 28.060392] 2 locks held by vmalloc_test/0/523: > [ 28.060392] #0: ffffffffc06750d0 (prepare_for_test_rwsem){++++}-{4:4}= , at: test_func+0x76/0x240 [test_vmalloc] > [ 28.060393] #1: ffff96df5fa1d390 (krc.lock){..-.}-{2:2}, at: kvfree_c= all_rcu+0x5c/0x230 > [ 28.060395] stack backtrace: > [ 28.060395] CPU: 0 PID: 523 Comm: vmalloc_test/0 Tainted: G = E 5.8.0-rc3-rcu #211 > [ 28.060395] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO= S 1.12.0-1 04/01/2014 > [ 28.060396] Call Trace: > [ 28.060397] dump_stack+0x96/0xd0 > [ 28.060397] __lock_acquire.cold.65+0x166/0x2d7 > [ 28.060398] ? find_held_lock+0x2d/0x90 > [ 28.060399] lock_acquire+0xad/0x370 > [ 28.060400] ? get_page_from_freelist+0xcf0/0x16d0 > [ 28.060401] ? mark_held_locks+0x48/0x70 > [ 28.060402] _raw_spin_lock+0x25/0x30 > [ 28.060403] ? get_page_from_freelist+0xcf0/0x16d0 > [ 28.060404] get_page_from_freelist+0xcf0/0x16d0 > [ 28.060405] ? __lock_acquire+0x3ee/0x1b90 > [ 28.060407] __alloc_pages_nodemask+0x16a/0x3a0 > [ 28.060408] __get_free_pages+0xd/0x30 > [ 28.060409] kvfree_call_rcu+0x18a/0x230 > We've encountered the same warning and applying this patches resolves the issue. > Internally the kfree_rcu() uses raw_spinlock_t whereas the > page allocator internally deals with spinlock_t to access > to its zones. >=20 > In order to prevent such vialation that is in question we > can drop the internal raw_spinlock_t before entering to > the page allocaor. >=20 > Short changelog (v1 -> v2): > - rework the commit message; > - rework the patch making it smaller; > - add more comments. >=20 > Signed-off-by: Uladzislau Rezki (Sony) > --- > kernel/rcu/tree.c | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) >=20 > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 21c2fa5bd8c3..2de112404121 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3287,6 +3287,8 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cp= u *krcp, void *ptr) > return false; > =20 > lockdep_assert_held(&krcp->lock); > + lockdep_assert_irqs_disabled(); > + > idx =3D !!is_vmalloc_addr(ptr); > =20 > /* Check if a new block is required. */ > @@ -3306,6 +3308,29 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_c= pu *krcp, void *ptr) > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > return false; > =20 > + /* > + * If built with CONFIG_PROVE_RAW_LOCK_NESTING op= tion, > + * the lockedp will complain about violation of t= he > + * nesting rules. It does the raw_spinlock vs. sp= inlock > + * nesting checks. > + * > + * That is why we drop the raw lock. Please note = IRQs are > + * still disabled it guarantees that the "current= " stays > + * on the same CPU later on when the raw lock is = taken > + * back. > + * > + * It is important because if the page allocator = is invoked > + * in fully preemptible context, it can be that w= e get a page > + * but end up on another CPU. That another CPU mi= ght not need > + * a page because of having some extra spots in i= ts internal > + * array for pointer collecting. Staying on same = CPU eliminates > + * described issue. > + * > + * Dropping the lock also reduces the critical se= ction by > + * the time taken by the page allocator to obtain= a page. > + */ > + raw_spin_unlock(&krcp->lock); Aiui, this makes the CONFIG_PREEMPT_RT check and subsequent comment redunda= nt. > /* > * NOTE: For one argument of kvfree_rcu() we can > * drop the lock and get the page in sleepable > @@ -3315,6 +3340,8 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cp= u *krcp, void *ptr) > */ > bnode =3D (struct kvfree_rcu_bulk_data *) > __get_free_page(GFP_NOWAIT | __GFP_NOWARN= ); > + > + raw_spin_lock(&krcp->lock); > } The consequence of dropping the lock here is that we may create two bnodes and add them both to krcp->bkvhead[idx]. The list remains intact, but the second entry may be underutilised. That should be transient. It doesn't seem like the patch will break things, and removes the lockdep splat, Tested-by: Chris Wilson -Chris