From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Tue, 26 Mar 2024 17:06:55 +0800
Subject: Re: summarize all information again at bottom//reply: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru
To: Matthew Wilcox
Cc: 黄朝阳 (Zhaoyang Huang), Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, 康纪滨 (Steve Kang)
On Mon, Mar 25, 2024 at 11:22 AM Matthew Wilcox wrote:
>
> On Sun, Mar 24, 2024 at 07:14:27PM +0800, Zhaoyang Huang wrote:
> > ok. It seems like madvise is robust enough to leave no BUGs. I presume
> > there are another two scenarios which reach folio_isolate_lru by ways
> > other than the PTE. Besides, scenario 2 reminds me of a previous bug I
> > reported, where find_get_entry entered a livelock: the folio's
> > refcnt == 0 but it remained in the xarray, which made the reset->retry
> > loop spin forever. I would like to reply in that thread for more
> > details.
> >
> > Scenario 1:
> > 0. Thread_bad gets the folio by find_get_entry and is preempted before
> > folio_lock (could be the second-round scan of
> > truncate_inode_pages_range)
> >     refcnt == 2 (page_cache, fbatch_bad), PG_lru == true, PG_lock == false
> >     folio = find_get_entry
> >     folio_try_get_rcu
> >     <preempted>
> >     folio_try_lock
> >
> > 1. Thread_truncate gets the folio via
> > truncate_inode_pages_range->find_lock_entries
> >     refcnt == 3 (page_cache, fbatch_bad, fbatch_truncate), PG_lru ==
> > true, PG_lock == true
>
> Hang on, you can't have two threads in truncate_inode_pages_range()
> at the same time.  I appreciate that we don't have any documentation
> of that, but if it were possible, we'd see other crashes.  Removing
> the folio from the page cache sets folio->mapping to NULL.  And
> __filemap_remove_folio() uses folio->mapping in
> filemap_unaccount_folio() and page_cache_delete(), so we'd get NULL
> pointer dereferences.

ok. I will check whether it is possible to enter this scenario by
another path.

>
> I see a hint in the DAX code that it's an fs-dependent lock:
>
>         /*
>          * This gets called from truncate / punch_hole path. As such, the caller
>          * must hold locks protecting against concurrent modifications of the
>          * page cache (usually fs-private i_mmap_sem for writing). Since the
>          * caller has seen a DAX entry for this index, we better find it
>          * at that index as well...
>          */
>
> so maybe that's why there's no lockdep_assert() in
> truncate_inode_pages_range(), but there should be a comment.
>
> > Scenario 2:
> > 0. Thread_bad gets the folio by find_get_entry and is preempted before
> > folio_lock (could be the second-round scan of
> > truncate_inode_pages_range)
> >     refcnt == 2 (page_cache, fbatch_bad), PG_lru == true, PG_lock == false
> >     folio = find_get_entry
> >     folio_try_get_rcu
> >     <preempted>
> >     folio_try_lock
> >
> > 1. Thread_readahead removes the folio from the page cache and drops
> > one refcnt via filemap_remove_folio (getting rid of the folios which
> > failed to launch IO during readahead)
> >     refcnt == 1 (fbatch_bad), PG_lru == true, PG_lock == true
>
> So readahead inserts the folio locked, and then calls
> filemap_remove_folio() without having unlocked it.
> filemap_remove_folio() sets folio->mapping to NULL in
> page_cache_delete().  When "Thread_bad" wakes up, it gets the
> folio lock, calls truncate_inode_folio() and sees that
> folio->mapping != mapping, so it doesn't call filemap_remove_folio().
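To make sure I am reading it the same way: a minimal sketch of
truncate_inode_folio() as I understand it from mm/truncate.c (written
from memory, so details may differ between kernel versions):

    int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
    {
            /*
             * Bail out when folio->mapping was already cleared by an
             * earlier removal from the page cache.
             */
            if (folio->mapping != mapping)
                    return -EIO;

            truncate_cleanup_folio(folio);
            filemap_remove_folio(folio);
            return 0;
    }

So in that ordering Thread_bad would only unlock the folio and drop its
own fbatch reference, rather than dropping the page-cache refcnt a
second time.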
> > 4. Thread_bad schedules back from step 0 and wrongly drops one refcnt
> > when doing truncate_inode_folio->filemap_remove_folio, as it takes
> > this refcnt as the page-cache one
> >     refcnt == 1' (thread_isolate), PG_lru == false, PG_lock == false
> >     find_get_entries
> >         folio = find_get_entry
> >         folio_try_get_rcu
> >     folio_lock <no check of mapping != mapping as find_lock_entries does>
> >     truncate_inode_folio
> >         filemap_remove_folio

Please review the following scenario, where the folio wrongly gets two
refcnts dropped when the non-IO folios within the ractl are cleaned up.
Should we switch to __readahead_folio there?

0. Thread_bad gets the folio by find_get_entry and is preempted after
folio_try_get_rcu (could be the second-round scan of
truncate_inode_pages_range)
    refcnt == 2 (page_cache, fbatch_bad), PG_lru == true, PG_lock == false
    folio = find_get_entry
    folio_try_get_rcu
    <preempted>

1. Thread_readahead removes the folio from the page cache and drops 2
refcnts via readahead_folio & filemap_remove_folio (getting rid of the
folios which failed to launch IO during readahead)
    refcnt == 0, PG_lru == true, PG_lock == true
    read_pages
    ...
        folio = readahead_folio
        ******** For the folio which cannot launch IO, should we NOT
        drop the refcnt here, but use __readahead_folio instead? ********
            folio_get
            filemap_remove_folio(folio)
        folio_unlock
        folio_put

2. folio_unlock
    refcnt == 0, PG_lru == true, PG_lock == false

3. Thread_isolate takes one refcnt and calls folio_isolate_lru (could
be any process)
    refcnt == 1 (thread_isolate), PG_lru == true, PG_lock == false

4. Thread_isolate proceeds to clear PG_lru and is preempted before
folio_get
    refcnt == 1 (thread_isolate), PG_lru == false, PG_lock == false
    folio_test_clear_lru
    <preempted>
    folio_get

5. Thread_bad schedules back from step 0 and proceeds to drop one
refcnt via release_pages and hits the BUG
    refcnt == 0' (thread_isolate), PG_lru == false, PG_lock == false
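To make the window in steps 3-5 explicit, this is folio_isolate_lru()
as I read it in mm/vmscan.c (a simplified sketch from memory, which may
differ slightly between kernel versions); the race I am describing sits
between folio_test_clear_lru() and folio_get():

    bool folio_isolate_lru(struct folio *folio)
    {
            bool ret = false;

            VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);

            if (folio_test_clear_lru(folio)) {
                    struct lruvec *lruvec;

                    /*
                     * Step 4 above: preemption right here leaves PG_lru
                     * already cleared while no extra reference has been
                     * taken yet.
                     */
                    folio_get(folio);
                    lruvec = folio_lruvec_lock_irq(folio);
                    lruvec_del_folio(lruvec, folio);
                    unlock_page_lruvec_irq(lruvec);
                    ret = true;
            }

            return ret;
    }

If Thread_bad drops the last refcnt inside that window (step 5), the
folio is freed while Thread_isolate still expects to use it, which is
where the BUG shows up.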