From: "Huang, Ying"
To: Gao Xiang
Cc: Andrew Morton , , , Rafael Aquini , Carlos Maiolino , Eric Sandeen , stable
Subject: Re: [PATCH] mm, THP, swap: fix allocating cluster for swapfile by mistake
References: <20200819195613.24269-1-hsiangkao@redhat.com>
Date: Thu, 20 Aug 2020 12:36:08 +0800
In-Reply-To: <20200819195613.24269-1-hsiangkao@redhat.com> (Gao Xiang's message of "Thu, 20 Aug 2020 03:56:13 +0800")
Message-ID: <871rk2x7bb.fsf@yhuang-dev.intel.com>

Gao Xiang writes:

> SWP_FS doesn't mean the device is file-backed swap device,
> which just means each writeback request should go through fs
> by DIO.
> Or it'll just use extents added by .swap_activate(),
> but it also works as file-backed swap device.
>
> So in order to achieve the goal of the original patch,
> SWP_BLKDEV should be used instead.
>
> FS corruption can be observed with SSD device + XFS +
> fragmented swapfile due to CONFIG_THP_SWAP=y.
>
> Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
> Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
> Cc: "Huang, Ying"
> Cc: stable
> Signed-off-by: Gao Xiang

Good catch! The fix itself looks good to me, although the description
is a little confusing.

After some digging, it seems that SWP_FS is set on the swap devices
whose swap entry reads/writes go through the file-system-specific
callbacks (currently used by swap over NFS only).

Best Regards,
Huang, Ying

> ---
>
> I reproduced the issue with the following details:
>
> Environment:
> QEMU + upstream kernel + buildroot + NVMe (2 GB)
>
> Kernel config:
> CONFIG_BLK_DEV_NVME=y
> CONFIG_THP_SWAP=y
>
> Some reproducible steps:
> mkfs.xfs -f /dev/nvme0n1
> mkdir /tmp/mnt
> mount /dev/nvme0n1 /tmp/mnt
> bs="32k"
> sz="1024m"    # doesn't matter too much, I also tried 16m
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
>
> mkswap /tmp/mnt/sw
> swapon /tmp/mnt/sw
>
> stress --vm 2 --vm-bytes 600M    # doesn't matter too much as well
>
> Symptoms:
>  - FS corruption (e.g. checksum failure)
>  - memory corruption at: 0xd2808010
>  - segfault
> ...
>
>  mm/swapfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6c26916e95fd..2937daf3ca02 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1074,7 +1074,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  			goto nextsi;
>  		}
>  		if (size == SWAPFILE_CLUSTER) {
> -			if (!(si->flags & SWP_FS))
> +			if (si->flags & SWP_BLKDEV)
>  				n_ret = swap_alloc_cluster(si, swp_entries);
>  		} else
>  			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,