From: "Huang, Ying"
To: Chris Li
Cc: Kairui Song, Minchan Kim, linux-mm@kvack.org, Andrew Morton, Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand, linux-kernel@vger.kernel.org, Yu Zhao
Subject: Re: Whether is the race for SWP_SYNCHRONOUS_IO possible? (was Re: [PATCH v3 6/7] mm/swap, shmem: use unified swapin helper for shmem)
In-Reply-To: (Chris Li's message of "Wed, 31 Jan 2024 15:45:18 -0800")
References: <20240129175423.1987-1-ryncsn@gmail.com> <20240129175423.1987-7-ryncsn@gmail.com> <87bk92gqpx.fsf_-_@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 01 Feb 2024 08:52:46 +0800
Message-ID: <87plxhf1k1.fsf@yhuang6-desk2.ccr.corp.intel.com>

Chris Li writes:

> On Tue, Jan 30, 2024 at 7:58 PM Kairui Song wrote:
>>
>> Hi Ying,
>>
>> On Wed, Jan 31, 2024 at 10:53 AM Huang, Ying wrote:
>> >
>> > Hi, Minchan,
>> >
>> > When I review the patchset from Kairui, I checked the code to skip swap
>> > cache in do_swap_page() for swap device with SWP_SYNCHRONOUS_IO.  Is the
>> > following race possible?  Where a page is swapped out to a swap device
>> > with SWP_SYNCHRONOUS_IO and the swap count is 1.  Then 2 threads of the
>> > process runs on CPU0 and CPU1 as below.  CPU0 is running do_swap_page().
>>
>> Chris raised a similar issue about the shmem path, and I was worrying
>> about the same issue in previous discussions about do_swap_page:
>> https://lore.kernel.org/linux-mm/CAMgjq7AwFiDb7cAMkWMWb3vkccie1-tocmZfT7m4WRb_UKPghg@mail.gmail.com/
>
> Ha thanks for remembering that.
>
>>
>> """
>> In do_swap_page path, multiple process could swapin the page at the
>> same time (a mapped once page can still be shared by sub threads),
>> they could get different folios. The later pte lock and pte_same check
>> is not enough, because while one process is not holding the pte lock,
>> another process could read-in, swap_free the entry, then swap-out the
>> page again, using same entry, an ABA problem. The race is not likely
>> to happen in reality but in theory possible.
>> """
>>
>> >
>> > CPU0                             CPU1
>> > ----                             ----
>> > swap_cache_get_folio()
>> > check sync io and swap count
>> > alloc folio
>> > swap_readpage()
>> > folio_lock_or_retry()
>> >                                  swap in the swap entry
>> >                                  write page
>> >                                  swap out to same swap entry
>> > pte_offset_map_lock()
>> > check pte_same()
>> > swap_free()   <-- new content lost!
>> > set_pte_at()  <-- stale page!
>> > folio_unlock()
>> > pte_unmap_unlock()
>>
>> Thank you very much for highlighting this!
>>
>> My concern previously is the same as yours (swapping out using the
>> same entry is like an ABA issue, where pte_same failed to detect the
>> page table change), later when working on V3, I mistakenly thought
>> that's impossible as entry should be pinned until swap_free on CPU0,
>> and I'm wrong. CPU1 can also just call swap_free, then swap count is
>> dropped to 0 and it can just swap out using the same entry. Now I
>> think my patch 6/7 is also affected by this potential race. Seems
>> nothing can stop it from doing this.
>>
>> Actually I was trying to make a reproducer locally, due to swap slot
>> cache, swap allocation algorithm, and the short race window, this is
>> very unlikely to happen though.
>
> You can put some sleep in some of the CPU0 where expect the other race
> to happen to manual help triggering it. Yes, it sounds hard to trigger
> in real life due to reclaim swap out.
>
>>
>> How about we just increase the swap count temporarily in the direct
>> swap in path (after alloc folio), then drop the count after pte_same
>> (or shmem_add_to_page_cache in shmem path)? That seems enough to
>> prevent the entry reuse issue.
>
> Sounds like a good solution.

Yes.  It seems that this can solve the race.

--
Best Regards,
Huang, Ying
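
To make the interleaving and the proposed temporary swap count reference
easier to follow, here is a rough single-threaded toy model in user-space C.
It is only a sketch under stated assumptions, not kernel code: struct slot,
struct pte, toy_pte_same(), toy_alloc_slot() and replay() are all
hypothetical names, the allocator simply reuses the first slot whose count
is 0, and the fault retry is reduced to a fresh read of the slot.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 2

/* Toy swap slot (hypothetical): a reference count plus the stored data. */
struct slot {
	int count;
	int data;
};

/* Toy PTE (hypothetical): a swap entry (slot index) or mapped content. */
struct pte {
	bool is_swap;
	int entry;
	int page;
};

/* Stand-in for pte_same(): it can only compare the value in the PTE. */
static bool toy_pte_same(struct pte a, struct pte b)
{
	return a.is_swap == b.is_swap && a.entry == b.entry;
}

/* Swap-out takes the first slot that nobody references any more. */
static int toy_alloc_slot(const struct slot *slots)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (slots[i].count == 0)
			return i;
	return -1;
}

/*
 * Replay the CPU0/CPU1 interleaving from the diagram in one thread.
 * Returns the content CPU0's fault ends up with; *latest receives the
 * content CPU1 wrote before swapping the page out again.
 */
int replay(bool pin, int *latest)
{
	struct slot slots[NR_SLOTS] = { { .count = 1, .data = 100 } };
	struct pte pte = { .is_swap = true, .entry = 0 };

	/* CPU0: snapshot the PTE, allocate a folio, read the slot. */
	struct pte orig = pte;
	if (pin)
		slots[orig.entry].count++;	/* proposed temporary reference */
	int folio = slots[orig.entry].data;

	/* CPU1: swap in the same entry, free it, then write the page. */
	pte = (struct pte){ .is_swap = false, .page = slots[0].data };
	slots[0].count--;
	pte.page = 101;
	*latest = pte.page;

	/* CPU1: swap the page out again into whichever slot is free. */
	int out = toy_alloc_slot(slots);
	assert(out >= 0);
	slots[out] = (struct slot){ .count = 1, .data = pte.page };
	pte = (struct pte){ .is_swap = true, .entry = out };

	/* CPU0: take the PTE lock and compare against the snapshot. */
	bool same = toy_pte_same(orig, pte);
	if (pin)
		slots[orig.entry].count--;	/* drop the temporary reference */
	if (!same)
		return slots[pte.entry].data;	/* fault retried: fresh read */

	slots[orig.entry].count--;		/* swap_free(): new content lost */
	return folio;				/* set_pte_at(): stale page */
}
```

In this model, replay(false, &latest) returns 100 while latest is 101: the
slot is reused, the PTE value matches the snapshot, and the stale folio gets
mapped.  replay(true, &latest) returns 101, because the extra reference
keeps the count above 0, the swap-out has to pick a different slot, and the
PTE comparison then fails as intended.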