From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Fri, 22 Mar 2024 09:52:36 +0800
Subject: Re: summarize all information again at bottom//reply: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru
To: Matthew Wilcox
Cc: 黄朝阳 (Zhaoyang Huang), Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, 康纪滨 (Steve Kang)

On Thu, Mar 21, 2024 at 8:36 PM Matthew Wilcox wrote:
>
> On Thu, Mar 21, 2024 at 04:25:07PM +0800, Zhaoyang Huang wrote:
> > ok. Could the scenario below be suspected of leaving an orphan folio
> > in step 7 and introducing the bug in step 8? In the scenario,
> > Thread_filemap behaves as a backdoor for Thread_madv by creating the
> > pte after Thread_truncate finishes cleaning all page tables.
> >
> > 0. Thread_bad gets the folio by folio_get_entry, stores it in its
> > local fbatch_bad and goes to sleep
>
> There's no function called folio_get_entry(), but clearly thread_bad
> should have a refcount on it at this point.
>
> > 1. Thread_filemap gets the folio via
> > filemap_map_pages->next_uptodate_folio->xas_next_entry and gets
> > preempted
> > refcnt == 1 (page_cache), PG_lru == true
>
> so the refcount should be 2 here.
>
> > 2. Thread_truncate gets the folio via
> > truncate_inode_pages_range->find_lock_entries
> > refcnt == 2 (fbatch_trunc, page_cache), PG_lru == true
> >
> > 3. Thread_truncate proceeds to truncate_cleanup_folio
> > refcnt == 2 (fbatch_trunc, page_cache), PG_lru == true
> >
> > 4. Thread_truncate proceeds to delete_from_page_cache_batch
> > refcnt == 1 (fbatch_trunc), PG_lru == true
> >
> > 5. Thread_filemap schedules back, proceeds to set up a pte and ends
> > up with folio->_mapcount = 0 & folio->refcount += 1
> > refcnt == 2 (pte, fbatch_temp), PG_lru == true
> >
> > 6. Thread_madv clears the folio's PG_lru via
> > madvise_xxx_pte_range->folio_isolate_lru->folio_test_clear_lru
> > refcnt == 2 (pte, fbatch_temp), PG_lru == false
> >
> > 7. Thread_truncate calls folio_batch_release and fails to free the
> > folio as the refcount has not reached 0
> > refcnt == 1 (pte), PG_lru == false
> > ******** The folio becomes an orphan here: it is no longer in the
> > page cache but is still mapped in the task's VM. ********
> >
> > 8. Thread_xxx is scheduled back from 0 to do
> > release_pages(fbatch_bad), and the folio introduces the bug.
>
> ... because if these steps happen as 7, 8, 6, you hit the BUG in
> folio_isolate_lru().

Thanks for the comments. I have fixed the typo and updated the timing
sequence below, adding the possible preemption points so that the
refcount makes sense at each step.
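For reference, the BUG Matthew mentions is the refcount assertion at
the top of folio_isolate_lru(). A rough paraphrase (based on
mm/vmscan.c; the exact code differs between kernel versions):

	/* Paraphrase of folio_isolate_lru(), not verbatim. */
	bool folio_isolate_lru(struct folio *folio)
	{
		bool ret = false;

		/*
		 * The caller must already hold a reference; an orphan
		 * folio whose last reference is dropped concurrently
		 * trips this assertion.
		 */
		VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);

		if (folio_test_clear_lru(folio)) {
			struct lruvec *lruvec;

			folio_get(folio);
			lruvec = folio_lruvec_lock_irq(folio);
			lruvec_del_folio(lruvec, folio);
			unlock_page_lruvec_irq(lruvec);
			ret = true;
		}

		return ret;
	}

The updated sequence: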
0. Thread_bad gets the folio via find_get_entry and is preempted before
taking the refcount (this could be the second-round scan of
truncate_inode_pages_range); see the find_get_entry() sketch at the
bottom.
   refcnt == 1 (page_cache), PG_lru == true, PG_lock == false
   find_get_entry
       folio = xas_find
       <preempted>
       folio_try_get_rcu

1. Thread_filemap gets the folio via
filemap_map_pages->next_uptodate_folio->xas_next_entry and is preempted
   refcnt == 1 (page_cache), PG_lru == true, PG_lock == false
   filemap_map_pages
       next_uptodate_folio
           xas_next_entry
           <preempted>
           folio_try_get_rcu

2. Thread_truncate gets the folio via
truncate_inode_pages_range->find_lock_entries
   refcnt == 2 (page_cache, fbatch_truncate), PG_lru == true,
   PG_lock == true

3. Thread_truncate proceeds to truncate_cleanup_folio
   refcnt == 2 (page_cache, fbatch_truncate), PG_lru == true,
   PG_lock == true

4. Thread_truncate proceeds to delete_from_page_cache_batch
   refcnt == 1 (fbatch_truncate), PG_lru == true, PG_lock == true

4.1 folio_unlock
   refcnt == 1 (fbatch_truncate), PG_lru == true, PG_lock == false

5. Thread_filemap schedules back from '1', proceeds to set up a pte
and ends up with folio->_mapcount = 0 & folio->refcount += 1
   refcnt == 1 -> 2 (+fbatch_filemap) -> 3 -> 2 (pte, fbatch_truncate),
   PG_lru == true, PG_lock == true->false

6. Thread_madv clears the folio's PG_lru via
madvise_xxx_pte_range->folio_isolate_lru->folio_test_clear_lru
   refcnt == 2 (pte, fbatch_truncate), PG_lru == false, PG_lock == false

7. Thread_truncate calls folio_batch_release and fails to free the
folio as the refcount has not reached 0
   refcnt == 1 (pte), PG_lru == false, PG_lock == false
   ******** The folio becomes an orphan here: it is no longer in the
   page cache but is still mapped in the task's VM. ********

8. Thread_bad schedules back from '0' and the folio is collected into
fbatch_bad
   refcnt == 2 (pte, fbatch_bad), PG_lru == false, PG_lock == true

9. Thread_bad wrongly drops one refcount in filemap_remove_folio by
taking this refcount as the page cache's
   refcnt == 1 (fbatch_bad), PG_lru == false, PG_lock == true->false
   truncate_inode_folio
       filemap_remove_folio
           filemap_free_folio
   ****** The refcount is wrongly decreased here by being taken as the
   page cache reference. ******

10. Thread_bad calls release_pages(fbatch_bad), where the folio
triggers the bug.
   release_pages
       folio_put_testzero == true
       folio_test_lru == false
       list_add(folio->lru, pages_to_free)
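To make step 10 concrete: release_pages() only takes a folio off the
LRU before freeing it if PG_lru is still set. A simplified paraphrase
(based on mm/swap.c; the batch-walk details are omitted and differ
between kernel versions):

	/* Simplified paraphrase of release_pages(), not verbatim. */
	LIST_HEAD(pages_to_free);

	for (i = 0; i < nr; i++) {
		struct folio *folio = folios[i];	/* simplified */

		if (!folio_put_testzero(folio))
			continue;	/* other references remain */

		if (folio_test_lru(folio)) {
			/* Normal path: pull the folio off the LRU. */
			lruvec = folio_lruvec_relock_irqsave(folio,
							lruvec, &flags);
			lruvec_del_folio(lruvec, folio);
			__folio_clear_lru_flags(folio);
		}

		/*
		 * In the sequence above, PG_lru was already cleared in
		 * step 6, so the LRU removal is skipped and the folio
		 * goes straight to the free list while a pte still
		 * maps it.
		 */
		list_add(&folio->lru, &pages_to_free);
	}

	free_unref_page_list(&pages_to_free);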
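And the preemption window in step 0 (likewise step 1) sits between the
xarray lookup and folio_try_get_rcu(). A simplified paraphrase of
find_get_entry() (based on mm/filemap.c; the xa_mark_t handling is
dropped):

	/* Simplified paraphrase of find_get_entry(), not verbatim. */
	static void *find_get_entry(struct xa_state *xas, pgoff_t max)
	{
		struct folio *folio;

	retry:
		folio = xas_find(xas, max);
		if (xas_retry(xas, folio))
			goto retry;
		if (!folio || xa_is_value(folio))
			return folio;

		/*
		 * A preemption here is the window in step 0: the folio
		 * has been found, but no reference is held yet.
		 */
		if (!folio_try_get_rcu(folio))
			goto reset;

		/* Recheck that the folio is still at this index. */
		if (unlikely(folio != xas_reload(xas))) {
			folio_put(folio);
			goto reset;
		}

		return folio;

	reset:
		xas_reset(xas);
		goto retry;
	}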