From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Fri, 29 Mar 2024 13:49:05 +0800
Subject: Re: summarize all information again at bottom//reply: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru
To: Matthew Wilcox
Cc: 黄朝阳 (Zhaoyang Huang), Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, 康纪滨 (Steve Kang)
On Thu, Mar 28, 2024 at 10:12 PM Matthew Wilcox wrote:
>
> On Thu, Mar 28, 2024 at 12:03:02PM +0800, Zhaoyang Huang wrote:
> > On Thu, Mar 28, 2024 at 11:18 AM Matthew Wilcox wrote:
> > >
> > > On Thu, Mar 28, 2024 at 09:27:31AM +0800, Zhaoyang Huang wrote:
> > > > ok, I missed the
> > > > refcnt from alloc_pages. However, I still think it is a bug to call
> > > > readahead_folio() in read_pages(), as the refcnt obtained by
> > > > alloc_pages() should be the folio's final guard, paired with the one
> > > > checked in shrink_folio_list->__remove_mapping->folio_ref_freeze(2)
> > > > (the 2 represents alloc_pages & page cache). If we removed this one without
> > >
> > > __remove_mapping() requires that the caller holds the folio locked.
> > > Since the readahead code unlocks the folio, __remove_mapping() cannot
> > > be run because the caller of __remove_mapping() will wait for the folio
> > > lock.
> >
> > reposting the whole timing sequence to make it clearer and to fix the
> > incorrect description in my previous feedback
>
> I can't understand what you think the problem is here.  Please try
> again.
>
> > Follow the refcount through.
> >
> > In page_cache_ra_unbounded():
> >
> >         folio = filemap_alloc_folio(gfp_mask, 0);
> > (folio has refcount 1)
> >         ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> > (folio has refcount 2, PG_lru)
> >
> > Then we call read_pages()
> > First we call ->readahead() which for some reason stops early.
> > Then we call readahead_folio() which calls folio_put()
> > (folio has refcount 1)
> > Then we call folio_get()
> > (folio has refcount 2)
> > Then we call filemap_remove_folio()
> > (folio has refcount 1)
> > Then we call folio_unlock()
> > Then we call folio_put()
> >
> > Amending steps for the previous timing sequence below, where [1] races
> > with [2], which has nothing to do with __remove_mapping(). IMO, no file
> > folio should be freed by folio_put(), as the refcnt obtained by
> > alloc_pages() keeps the count imbalanced until
> > shrink_folio_list->__remove_mapping, where folio_ref_freeze(2) implies
> > that the refcnts of alloc_pages and the isolation should be the last
> > two. release_pages is a special scenario
release_pages is a special scenario > > that the refcnt of alloc_pages is freed implicitly in > > delete_from_page_cache_batch->filemap_free_folio. > > > > folio_put() > > { > > if(folio_put_test_zero()) > > *** we should NOT be here as the refcnt of alloc_pages should NOT be dr= opped *** > > if (folio_test_lru()) > > *** preempted here with refcnt =3D=3D 0 and pass PG_lru check *** > > [1] > > lruvec_del_folio() > > Then thread_isolate call folio_isolate_lru() > > folio_isolate_lru() > > { > > folio_test_clear_lru() > > folio_get() > > [2] > > lruvec_del_folio() > > } > > -----------------------------------------------------------------------= --------------------- > > shrink_folio_list() > > { > > __remove_mapping() > > { > > refcount =3D 1 + folio_nr_pages; > > *** the refcount =3D 1 + 1 implies there should be only the refcnt of > > alloc_pages and previous isolation for a no-busy folio as all PTE has > > gone*** > > if (!folio_ref_freeze(refcount)) > > goto keeplock; > > } > > } key steps in brief: Thread_truncate get folio to its local fbatch by find_get_entry in step 2 The refcnt is deducted to 1 which is not as expect as from alloc_pages but from thread_truncate's local fbatch in step 7 Thread_reclaim succeed to isolate the folio by the wrong refcnt(not the value but meaning) in step 8 Thread_truncate hit the VM_BUG_ON in step 9 all steps: Thread_readahead: 0. folio =3D filemap_alloc_folio(gfp_mask, 0); (folio has refcount 1) 1. ret =3D filemap_add_folio(mapping, folio, index + i, gfp_mask); (folio has refcount 2) 2. thread_truncate hold one refcnt and add this folio to fbatch_truncate (folio has refcount 3(alloc, page cache, fbatch_truncate), PG_lru) 3. Then we call read_pages() First we call ->readahead() which for some reason stops early. 4. Then we call readahead_folio() which calls folio_put() (folio has refcount 2) 5. Then we call folio_get() (folio has refcount 3) 6. Then we call filemap_remove_folio() (folio has refcount 2) 7. 
   Then we call folio_unlock(),
   then we call folio_put()
   (folio has refcount 1 (fbatch_truncate))
8. thread_reclaim calls shrink_inactive_list->isolate_lru_folios:
   shrink_inactive_list
       isolate_lru_folios
           if (!folio_test_lru(folio))
           if (!folio_try_get(folio))
           if (!folio_test_clear_lru(folio))
           list_move(folio, dst)
   (folio has refcount 2)
8.1. thread_reclaim calls shrink_folio_list->__remove_mapping:
   shrink_folio_list()
       __remove_mapping()
       (refcount = 2)
           if (!folio_ref_freeze(2)) // the freeze succeeds
           list_add(folio, free_folios);
   (folio has refcount 0)
9. thread_truncate will hit the refcnt VM_BUG_ON(refcnt == 0) in
   folio_put_testzero