From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2187C4743F for ; Mon, 7 Jun 2021 01:49:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4A58E61107 for ; Mon, 7 Jun 2021 01:49:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A58E61107 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AA71F6B006C; Sun, 6 Jun 2021 21:49:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A561C6B006E; Sun, 6 Jun 2021 21:49:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8CFFA6B0070; Sun, 6 Jun 2021 21:49:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0111.hostedemail.com [216.40.44.111]) by kanga.kvack.org (Postfix) with ESMTP id 5AB7B6B006C for ; Sun, 6 Jun 2021 21:49:45 -0400 (EDT) Received: from smtpin40.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id D4110181AC9CC for ; Mon, 7 Jun 2021 01:49:44 +0000 (UTC) X-FDA: 78225246288.40.66B2A74 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf30.hostedemail.com (Postfix) with ESMTP id F363DE000240 for ; Mon, 7 Jun 2021 01:49:42 +0000 (UTC) IronPort-SDR: KGfG6d0dmcLEl/Arw1jf4Y2YvyXsy8QfwL+3HNmIN6KYsvw23SP9M9VzgYfxw0rdHI4TxbIVW8 VRwZmwKwGNXA== X-IronPort-AV: E=McAfee;i="6200,9189,10007"; a="204362004" X-IronPort-AV: E=Sophos;i="5.83,254,1616482800"; d="scan'208";a="204362004" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jun 2021 18:49:40 -0700 IronPort-SDR: 3PUzMW2jkoWSM/qOdA+Fl+k07nVjPV4EcfDX3+4V/seeO66ik3CWRopufaoE8IWxuWckxM16Bu c4IC/M7e6MbQ== X-IronPort-AV: E=Sophos;i="5.83,254,1616482800"; d="scan'208";a="481348673" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.239.159.119]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jun 2021 18:49:36 -0700 From: "Huang, Ying" To: Will Deacon Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Daniel Jordan , Dan Carpenter , Andrea Parri , Peter Zijlstra , Andi Kleen , Dave Hansen , Omar Sandoval , Paul McKenney , Tejun Heo , Will Deacon , Miaohe Lin Subject: Re: [PATCH -V2] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() References: <20210520073301.1676294-1-ying.huang@intel.com> <20210602160351.GG31179@willie-the-truck> Date: Mon, 07 Jun 2021 09:49:35 +0800 In-Reply-To: <20210602160351.GG31179@willie-the-truck> (Will Deacon's message of "Wed, 2 Jun 2021 17:03:52 +0100") Message-ID: <87mts24f68.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F363DE000240 Authentication-Results: imf30.hostedemail.com; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none); spf=none (imf30.hostedemail.com: domain of ying.huang@intel.com has no SPF policy when checking 192.55.52.115) smtp.mailfrom=ying.huang@intel.com X-Stat-Signature: 391aakgemkcamgpucq13ik5ncxattcsu X-HE-Tag: 1623030582-592566 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi, Will, Will Deacon writes: > On Thu, May 20, 2021 at 03:33:01PM +0800, Huang Ying wrote: >> Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array >> accesses to avoid NULL derefs"), the typical code to reference the >> swap_info[] is as follows, >> >> type = swp_type(swp_entry); >> if (type >= nr_swapfiles) >> /* handle invalid swp_entry */; >> p = swap_info[type]; >> /* access fields of *p. OOPS! p may be NULL! */ >> >> Because the ordering isn't guaranteed, it's possible that >> swap_info[type] is read before "nr_swapfiles". And that may result >> in NULL pointer dereference. >> >> So after commit c10d38cc8d3e, the code becomes, >> >> struct swap_info_struct *swap_type_to_swap_info(int type) >> { >> if (type >= READ_ONCE(nr_swapfiles)) >> return NULL; >> smp_rmb(); >> return READ_ONCE(swap_info[type]); >> } >> >> /* users */ >> type = swp_type(swp_entry); >> p = swap_type_to_swap_info(type); >> if (!p) >> /* handle invalid swp_entry */; >> /* dereference p */ >> >> Where the value of swap_info[type] (that is, "p") is checked to be >> non-zero before being dereferenced. So, the NULL deferencing >> becomes impossible even if "nr_swapfiles" is read after >> swap_info[type]. Therefore, the "smp_rmb()" becomes unnecessary. >> >> And, we don't even need to read "nr_swapfiles" here. Because the >> non-zero checking for "p" is sufficient. We just need to make sure we >> will not access out of the boundary of the array. With the change, >> nr_swapfiles will only be accessed with swap_lock held, except in >> swapcache_free_entries(). Where the absolute correctness of the value >> isn't needed, as described in the comments. >> >> We still need to guarantee swap_info[type] is read before being >> dereferenced. That can be satisfied via the data dependency ordering >> enforced by READ_ONCE(swap_info[type]). This needs to be paired with >> proper write barriers. So smp_store_release() is used in >> alloc_swap_info() to guarantee the fields of *swap_info[type] is >> initialized before swap_info[type] itself being written. Note that >> the fields of *swap_info[type] is initialized to be 0 via kvzalloc() >> firstly. The assignment and deferencing of swap_info[type] is like >> rcu_assign_pointer() and rcu_dereference(). >> >> Signed-off-by: "Huang, Ying" >> Cc: Daniel Jordan >> Cc: Dan Carpenter >> Cc: Andrea Parri >> Cc: Peter Zijlstra (Intel) >> Cc: Andi Kleen >> Cc: Dave Hansen >> Cc: Omar Sandoval >> Cc: Paul McKenney >> Cc: Tejun Heo >> Cc: Will Deacon >> Cc: Miaohe Lin >> >> v2: >> >> - Revise the patch description and comments per Peter's comments. >> >> --- >> mm/swapfile.c | 15 ++++++--------- >> 1 file changed, 6 insertions(+), 9 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 2aad85751991..65dd979a0f94 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -100,11 +100,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >> >> static struct swap_info_struct *swap_type_to_swap_info(int type) >> { >> - if (type >= READ_ONCE(nr_swapfiles)) >> + if (type >= MAX_SWAPFILES) >> return NULL; >> >> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >> - return READ_ONCE(swap_info[type]); >> + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ >> } >> >> static inline unsigned char swap_count(unsigned char ent) >> @@ -2884,14 +2883,12 @@ static struct swap_info_struct *alloc_swap_info(void) >> } >> if (type >= nr_swapfiles) { >> p->type = type; >> - WRITE_ONCE(swap_info[type], p); >> /* >> - * Write swap_info[type] before nr_swapfiles, in case a >> - * racing procfs swap_start() or swap_next() is reading them. >> - * (We never shrink nr_swapfiles, we never free this entry.) >> + * Publish the swap_info_struct after initializing it. >> + * Note that kvzalloc() above zeroes all its fields. >> */ >> - smp_wmb(); >> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >> + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ >> + nr_swapfiles++; > > Although I like this change, I comment you are removing refers to some > dodgy-looking code. For example, swap_start() has this loop: > > for (type = 0; (si = swap_type_to_swap_info(type)); type++) { > if (!(si->flags & SWP_USED) || !si->swap_map) > continue; > > so won't this just end up dereferencing NULL if nr_swapfiles < MAX_SWAPFILES? for (type = 0; (si = swap_type_to_swap_info(type)); type++) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Because this is the second sub-statement inside "for ()", I think that "si" will be checked to be non-NULL before executing the statements inside "{}" follows "for ()"? > I think you need to check all callers of swap_type_to_swap_info() are > either validating the 'type' they pass in or check the returned pointer. Yes. I have checked all callers of swap_type_to_swap_info(). One suspecting caller is swap_cluster_readahead(). But after the following patch in mmotm tree, mm/swapfile: use percpu_ref to serialize against concurrent swapoff get/put_swap_device() will enclose the swap_cluster_readahead() to check the swap entry beforehand. Best Regards, Huang, Ying