From: "Huang, Ying"
To: Matthew Wilcox
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Data corruption problem with swapfiles and THP
Date: Fri, 13 Aug 2021 08:21:38 +0800
In-Reply-To: (Matthew Wilcox's message of "Thu, 12 Aug 2021 16:07:32 +0100")
Message-ID: <87a6lm6vot.fsf@yhuang6-desk2.ccr.corp.intel.com>

Matthew Wilcox writes:

> There is an assumption in the swap writepage path that a THP is physically
> contiguous on swap:
>
>         bio->bi_iter.bi_sector = swap_page_sector(page);
>         bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
>         bio->bi_end_io = end_write_func;
>         bio_add_page(bio, page, thp_size(page), 0);
>
> As far as I can tell, this is not necessarily true.
> If a file is not contiguous, we can have an extent which is 1MB long
> followed by an extent somewhere else on storage that's 1MB long. When we
> try to write a 2MB page to swap, we overwrite whatever's on the block
> device after that first 1MB extent.
>
> (Came across this by code examination while looking at getting rid of
> the bio path entirely; no attempt has been made to produce this problem;
> something else may prevent it from actually happening)

Yes. A THP must first be split before it is swapped out to a swap
device backed by a file. Please take a look at get_swap_pages():

		if (size == SWAPFILE_CLUSTER) {
			if (si->flags & SWP_BLKDEV)
				n_ret = swap_alloc_cluster(si, swp_entries);
		} else
			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
						    n_goal, swp_entries);

If the swap device is backed by a file, (si->flags & SWP_BLKDEV) == 0,
so only normal (not huge) swap entries can be allocated. As a result,
the THP is split.

Best Regards,
Huang, Ying
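
For readers who want to poke at the branch quoted from get_swap_pages(),
here is a minimal userspace sketch of that decision. It is not kernel
code: model_swap_info, model_get_swap_pages(), model_thp_must_split(),
MODEL_SWP_BLKDEV and MODEL_SWAPFILE_CLUSTER are hypothetical stand-ins
with illustrative values, and both allocators are assumed to always
succeed when they are reached.

```c
/*
 * Userspace model of the get_swap_pages() branch discussed above.
 * All names and constant values are illustrative assumptions, not the
 * kernel's definitions.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SWP_BLKDEV       (1u << 0) /* stand-in for SWP_BLKDEV */
#define MODEL_SWAPFILE_CLUSTER 512       /* stand-in for SWAPFILE_CLUSTER */

struct model_swap_info {
	unsigned int flags;
};

/* Returns how many swap entries the model hands out for one request. */
int model_get_swap_pages(const struct model_swap_info *si,
			 int n_goal, int size)
{
	int n_ret = 0;

	assert(si != NULL && n_goal >= 1);

	if (size == MODEL_SWAPFILE_CLUSTER) {
		/* Huge entry: only a block device may get a whole cluster. */
		if (si->flags & MODEL_SWP_BLKDEV)
			n_ret = 1; /* models swap_alloc_cluster() succeeding */
	} else {
		/* Normal entries: files and block devices are treated alike. */
		n_ret = n_goal; /* models scan_swap_map_slots() succeeding */
	}

	return n_ret;
}

/* True when no huge entry is available, so the THP has to be split. */
bool model_thp_must_split(const struct model_swap_info *si)
{
	return model_get_swap_pages(si, 1, MODEL_SWAPFILE_CLUSTER) == 0;
}
```

With flags == 0 (a file-backed swap device in this model) a request of
size MODEL_SWAPFILE_CLUSTER gets zero entries, while a request of size 1
still gets n_goal entries. That is the property the reply relies on: the
contiguous 2MB write is never attempted against a swap file.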