References: <364d7ce9-ccb7-fa04-7067-44a96be87060@huawei.com> <8735wdbdy4.fsf@yhuang6-desk1.ccr.corp.intel.com> <0cb765aa-1783-cd62-c4a4-b3fbc620532d@huawei.com> <87h7kt9ufw.fsf@yhuang6-desk1.ccr.corp.intel.com>
In-Reply-To: <87h7kt9ufw.fsf@yhuang6-desk1.ccr.corp.intel.com>
From: Yu Zhao
Date: Mon, 29 Mar 2021 23:47:13 -0600
Subject: Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage
To: "Huang, Ying", Miaohe Lin
Cc: Linux-MM, linux-kernel, Andrew Morton, Matthew Wilcox, Shakeel Butt, Alex Shi, Minchan Kim

On Mon, Mar 29, 2021 at 9:44 PM Huang, Ying wrote:
>
> Miaohe Lin writes:
>
> > On 2021/3/30 9:57, Huang, Ying wrote:
> >> Hi, Miaohe,
> >>
> >> Miaohe Lin writes:
> >>
> >>> Hi all,
> >>> I am investigating the swap code, and I found the below possible race window:
> >>>
> >>> CPU 1                                   CPU 2
> >>> -----                                   -----
> >>> do_swap_page
> >>>   skip swapcache case (synchronous swap_readpage)
> >>>     alloc_page_vma
> >>>                                         swapoff
> >>>                                           release swap_file, bdev, or ...
> >>>     swap_readpage
> >>>       check sis->flags is ok
> >>>         access swap_file, bdev or ...[oops!]
> >>>                                           si->flags = 0
> >>>
> >>> The swapcache case is ok because swapoff will wait on the page_lock of
> >>> the swapcache page.
> >>> Will this really happen, or am I missing something?
> >>> Any reply would be really appreciated. Thanks! :)
> >>
> >> This appears possible. Even for the swapcache case, we can't guarantee that the
> >
> > Many thanks for the reply!
> >
> >> swap entry obtained from the page table is always valid either. The
> >
> > The page table may change at any time, and we may thus do some useless work.
> > But the pte_same() check could handle these races correctly if they do not
> > result in an oops.
> >
> >> underlying swap device can be swapped off at the same time. So we use
> >> get/put_swap_device() for that.
> >> Maybe we need something similar here.
> >
> > Using get/put_swap_device() to guard against swapoff for swap_readpage() sounds
> > really bad, as swap_readpage() may take a really long time. Also, such a race may
> > not be really harmful, because swapoff is usually done only at system shutdown.
> > I cannot figure out a simple and stable way to fix this. Any suggestions, or
> > could anyone help get rid of such a race?
>
> Some reference counting on the swap device can prevent the swap device from
> being swapped off. To reduce the performance overhead on the hot path as
> much as possible, it appears we can use the percpu_ref.

Hi,

I've been seeing crashes when testing the latest kernels with
  stress-ng --class vm -a 20 -t 600s --temp-path /tmp

I haven't had time to look into them yet:

DEBUG_VM:
BUG: unable to handle page fault for address: ffff905c33c9a000
Call Trace:
 get_swap_pages+0x278/0x590
 get_swap_page+0x1ab/0x280
 add_to_swap+0x7d/0x130
 shrink_page_list+0xf84/0x25f0
 reclaim_pages+0x313/0x430
 madvise_cold_or_pageout_pte_range+0x95c/0xaa0

KASAN:
==================================================================
BUG: KASAN: slab-out-of-bounds in __frontswap_store+0xc9/0x2e0
Read of size 8 at addr ffff88901f646f18 by task stress-ng-mrema/31329

CPU: 2 PID: 31329 Comm: stress-ng-mrema Tainted: G S I L 5.12.0-smp-DEV #2
Call Trace:
 dump_stack+0xff/0x165
 print_address_description+0x81/0x390
 __kasan_report+0x154/0x1b0
 ? __frontswap_store+0xc9/0x2e0
 ? __frontswap_store+0xc9/0x2e0
 kasan_report+0x47/0x60
 kasan_check_range+0x2f3/0x340
 __kasan_check_read+0x11/0x20
 __frontswap_store+0xc9/0x2e0
 swap_writepage+0x52/0x80
 pageout+0x489/0x7f0
 shrink_page_list+0x1b11/0x2c90
 reclaim_pages+0x6ca/0x930
 madvise_cold_or_pageout_pte_range+0x1260/0x13a0

Allocated by task 16813:
 ____kasan_kmalloc+0xb0/0xe0
 __kasan_kmalloc+0x9/0x10
 __kmalloc_node+0x52/0x70
 kvmalloc_node+0x50/0x90
 __se_sys_swapon+0x353a/0x4860
 __x64_sys_swapon+0x5b/0x70

The buggy address belongs to the object at ffff88901f640000
 which belongs to the cache kmalloc-32k of size 32768
The buggy address is located 28440 bytes inside of
 32768-byte region [ffff88901f640000, ffff88901f648000)
The buggy address belongs to the page:
page:0000000032d23e33 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101f640
head:0000000032d23e33 order:4 compound_mapcount:0 compound_pincount:0
flags: 0x400000000010200(slab|head)
raw: 0400000000010200 ffffea00062b8408 ffffea000a6e9008 ffff888100040300
raw: 0000000000000000 ffff88901f640000 0000000100000001 000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88901f646e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88901f646e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88901f646f00: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
                            ^
 ffff88901f646f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88901f647000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Relevant config options I could think of:
  CONFIG_MEMCG_SWAP=y
  CONFIG_THP_SWAP=y
  CONFIG_ZSWAP=y
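The reference-counting idea suggested in the quoted discussion (pin the swap device, with a percpu_ref in the kernel, so that swapoff cannot release swap_file or bdev while a synchronous swap_readpage() is still using them) can be illustrated with a minimal userspace C11 sketch. It assumes a single shared atomic counter in place of a real per-CPU one, and fake_swap_dev, dev_tryget(), dev_put() and dev_kill_and_release() are hypothetical names, not kernel APIs; it shows only the ordering (refuse new readers, drop the base reference, wait for zero, then release), not a proposed patch.

```c
#include <sched.h>      /* sched_yield() */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>     /* NULL */

/* Hypothetical stand-in for a swap device; not a kernel structure. */
struct fake_swap_dev {
    atomic_long refs;   /* 1 base reference + 1 per in-flight reader */
    atomic_bool dying;  /* set once the swapoff-like teardown starts */
    int *backing;       /* stands in for swap_file/bdev; valid while refs > 0 */
};

static void dev_init(struct fake_swap_dev *d, int *backing)
{
    atomic_init(&d->refs, 1);       /* base reference owned by "swapon" */
    atomic_init(&d->dying, false);
    d->backing = backing;
}

/* Reader side (cf. get_swap_device()): pin the device before touching
 * d->backing. Only a non-zero count is ever incremented, so once the
 * count has reached zero no reader can get back in. */
static bool dev_tryget(struct fake_swap_dev *d)
{
    long r = atomic_load(&d->refs);
    while (r > 0 && !atomic_load(&d->dying)) {
        /* On failure, r is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&d->refs, &r, r + 1))
            return true;
    }
    return false;
}

/* Reader side (cf. put_swap_device()): drop the pin. */
static void dev_put(struct fake_swap_dev *d)
{
    atomic_fetch_sub(&d->refs, 1);
}

/* Teardown side (cf. swapoff): stop new readers, drop the base
 * reference, wait for in-flight readers, and only then release. */
static void dev_kill_and_release(struct fake_swap_dev *d)
{
    atomic_store(&d->dying, true);
    atomic_fetch_sub(&d->refs, 1);
    while (atomic_load(&d->refs) != 0)
        sched_yield();              /* the kernel would sleep, not spin */
    d->backing = NULL;              /* no reader can still be using it */
}
```

A synchronous reader would bracket its use of d->backing with dev_tryget()/dev_put() and bail out when dev_tryget() fails; with this scheme teardown blocks until every in-flight reader has dropped its pin. A real percpu_ref keeps get/put on per-CPU counters until percpu_ref_kill() switches it to a shared atomic count, which is why it was suggested for this hot path.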