From: "Huang, Ying"
To: Miaohe Lin
Cc: Linux-MM, linux-kernel, Andrew Morton, Matthew Wilcox, Yu Zhao,
    "Shakeel Butt", Alex Shi, "Minchan Kim"
Subject: Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage
References: <364d7ce9-ccb7-fa04-7067-44a96be87060@huawei.com>
    <8735wdbdy4.fsf@yhuang6-desk1.ccr.corp.intel.com>
    <0cb765aa-1783-cd62-c4a4-b3fbc620532d@huawei.com>
Date: Tue, 30 Mar 2021 11:44:35 +0800
In-Reply-To: <0cb765aa-1783-cd62-c4a4-b3fbc620532d@huawei.com> (Miaohe Lin's message of "Tue, 30 Mar 2021 11:15:52 +0800")
Message-ID: <87h7kt9ufw.fsf@yhuang6-desk1.ccr.corp.intel.com>

Miaohe Lin writes:

> On 2021/3/30 9:57, Huang, Ying wrote:
>> Hi, Miaohe,
>>
>> Miaohe Lin writes:
>>
>>> Hi all,
>>> I am investigating the swap code, and I found the possible race window below:
>>>
>>> CPU 1                                               CPU 2
>>> -----                                               -----
>>> do_swap_page
>>>   skip swapcache case (synchronous swap_readpage)
>>>     alloc_page_vma
>>>                                                     swapoff
>>>                                                       release swap_file, bdev, or ...
>>>       swap_readpage
>>>         check sis->flags is ok
>>>           access swap_file, bdev or ...[oops!]
>>>                                                         si->flags = 0
>>>
>>> The swapcache case is ok because swapoff will wait on the page_lock of the swapcache page.
>>> Can this really happen, or am I missing something?
>>> I would be really grateful for any reply. Thanks! :)
>>
>> This appears possible.  Even for swapcache case, we can't guarantee the
>
> Many thanks for the reply!
>
>> swap entry gotten from the page table is always valid too.  The
>
> The page table may change at any time, and we may thus do some useless work.
> But the pte_same() check could handle these races correctly as long as they do
> not result in an oops.
>
>> underlying swap device can be swapped off at the same time.  So we use
>> get/put_swap_device() for that.  Maybe we need similar stuff here.
>
> Using get/put_swap_device() to guard against swapoff for swap_readpage() sounds
> really bad, as swap_readpage() may take a really long time. Also, such a race may
> not be really harmful, because swapoff is usually done only at system shutdown.
> I cannot figure out a simple and stable way to fix this. Any suggestions, or
> could anyone help get rid of this race?

Some reference counting on the swap device can prevent the swap device
from being swapped off while it is in use.  To reduce the performance
overhead on the hot path as much as possible, it appears we can use
percpu_ref.

Best Regards,
Huang, Ying
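
For illustration, the reference-counting idea above can be modelled in
userspace C roughly as follows.  This is only a sketch under stated
assumptions, not kernel code: struct swap_dev_model and all of its helpers
are hypothetical names, and a single shared atomic counter stands in for
percpu_ref, which keeps per-CPU counters so that the fast path stays cheap.

```c
/* Userspace model only: every name below is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct swap_dev_model {
	atomic_long users;	/* readers currently between tryget and put */
	atomic_bool live;	/* cleared once swapoff has started */
	bool released;		/* stands in for swap_file, bdev, ... being freed */
};

static void swap_dev_model_init(struct swap_dev_model *d)
{
	atomic_init(&d->users, 0);
	atomic_init(&d->live, true);
	d->released = false;
}

/*
 * Reader side: announce ourselves first, then check that swapoff has not
 * begun.  If it has, drop the reference again and report failure.
 */
static bool swap_dev_model_tryget(struct swap_dev_model *d)
{
	atomic_fetch_add(&d->users, 1);
	if (!atomic_load(&d->live)) {
		atomic_fetch_sub(&d->users, 1);
		return false;
	}
	return true;
}

/* Reader side: drop a reference taken by a successful tryget. */
static void swap_dev_model_put(struct swap_dev_model *d)
{
	long old = atomic_fetch_sub(&d->users, 1);

	assert(old > 0);
	(void)old;
}

/* swapoff side, step 1: stop handing out new references. */
static void swap_dev_model_kill(struct swap_dev_model *d)
{
	atomic_store(&d->live, false);
}

/*
 * swapoff side, step 2: release the resources only if the device has been
 * killed and no reader is left.  Returns false if the caller has to wait
 * and retry.
 */
static bool swap_dev_model_try_release(struct swap_dev_model *d)
{
	if (atomic_load(&d->live) || atomic_load(&d->users) != 0)
		return false;
	d->released = true;
	return true;
}
```

In this model, the synchronous swap_readpage() path would call
swap_dev_model_tryget() before touching the device and
swap_dev_model_put() afterwards, while swapoff would call
swap_dev_model_kill() and then wait until swap_dev_model_try_release()
succeeds before freeing anything.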