Date: Mon, 18 Mar 2024 03:28:10 +0000
From: Matthew Wilcox <willy@infradead.org>
To: 黄朝阳 (Zhaoyang Huang)
Cc: Zhaoyang Huang, Andrew Morton, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, 康纪滨 (Steve Kang)
Subject: Re: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru

On Mon, Mar 18, 2024 at 01:37:04AM +0000, 黄朝阳 (Zhaoyang Huang) wrote:
> >On Sun, Mar 17, 2024 at 12:07:40PM +0800, Zhaoyang Huang wrote:
> >> Could it be this scenario, where folio comes from pte(thread 0), local
> >> fbatch(thread 1) and page cache(thread 2) concurrently and proceed
> >> intermixed without lock's protection? Actually, IMO, thread 1 also
> >> could see the folio with refcnt==1 since it doesn't care if the page
> >> is on the page cache or not.
> >>
> >> madivise_cold_and_pageout does no explicit folio_get thing since the
> >> folio comes from pte which implies it has one refcnt from pagecache
> >
> >Mmm, no.  It's implicit, but madvise_cold_or_pageout_pte_range()
> >does guarantee that the folio has at least one refcount.
> >
> >Since we get the folio from vm_normal_folio(vma, addr, ptent); we know that
> >there is at least one mapcount on the folio.  refcount is always >= mapcount.
> >Since we hold pte_offset_map_lock(), we know that mapcount (and therefore
> >refcount) cannot be decremented until we call pte_unmap_unlock(), which we
> >don't do until we have called folio_isolate_lru().
> >
> >Good try though, took me a few minutes of looking at it to convince
> >myself that it was safe.
> >
> >Something to bear in mind is that if the race you outline is real,
> >failing to hold a refcount on the folio leaves the caller susceptible
> >to the VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio); if the other
> >thread calls folio_put().
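
To spell that guarantee out in code, the relevant loop in
madvise_cold_or_pageout_pte_range() looks very roughly like the sketch
below (simplified from memory, not the exact code; the cold/deactivate
path, large-folio handling and error paths are omitted, and the local
variables mm, pmd, vma, addr, end and folio_list are as in the real
function):

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	for (; addr < end; pte++, addr += PAGE_SIZE) {
		ptent = ptep_get(pte);
		if (!pte_present(ptent))
			continue;

		folio = vm_normal_folio(vma, addr, ptent);
		if (!folio)
			continue;

		/*
		 * The folio is mapped by this PTE, so mapcount >= 1 and
		 * refcount >= mapcount.  While the PTE lock is held the
		 * mapping cannot be torn down, so the refcount cannot
		 * drop to zero before pte_unmap_unlock() below.
		 */
		if (folio_isolate_lru(folio))
			list_add(&folio->lru, &folio_list);
	}
	pte_unmap_unlock(start_pte, ptl);
	/* only from here on can the last reference be dropped */

The exact checks are illustrative; the point is only the ordering of
pte_offset_map_lock(), folio_isolate_lru() and pte_unmap_unlock().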
> Resend the chart via outlook.
> I think the problem rely on an special timing which is rare, I would
> like to list them below in timing sequence.
>
> 1. thread 0 calls folio_isolate_lru with refcnt == 1

(I assume you mean refcnt == 2 here, otherwise none of this makes sense)

> 2. thread 1 calls release_pages with refcnt == 2.(IMO, it could be 1 as
>    release_pages doesn't care if the folio is used by page cache or fs)
> 3. thread 2 decrease refcnt to 1 by calling filemap_free_folio.(as I
>    mentioned in 2, thread 2 is not mandatary here)
> 4. thread 1 calls folio_put_testzero and pass.(lruvec->lock has not been
>    take here)

But there's already a bug here.  Rearrange the order of this:

2. thread 1 calls release_pages with refcount == 2 (decreasing refcount to 1)
3. thread 2 decreases refcount to 0 by calling filemap_free_folio
1. thread 0 calls folio_isolate_lru() and hits the BUG().

> 5. thread 0 clear folio's PG_lru by calling folio_test_clear_lru. The
>    folio_get behind has no meaning there.
> 6. thread 1 failed in folio_test_lru and leave the folio on the LRU.
> 7. thread 1 add folio to pages_to_free wrongly which could break the
>    LRU's->list and will have next folio experience list_del_invalid
>
> #thread 0(madivise_cold_and_pageout)  #1(lru_add_drain->fbatch_release_pages)  #2(read_pages->filemap_remove_folios)
> refcnt == 1(represent page cache)     refcnt==2(another one represent LRU)     folio comes from page cache

This is still illegible.  Try it this way:

Thread 0                            Thread 1                  Thread 2
madvise_cold_or_pageout_pte_range
                                    lru_add_drain
                                    fbatch_release_pages
                                                              read_pages
                                                              filemap_remove_folio

Some accuracy in your report would also be appreciated.  There's no
function called madivise_cold_and_pageout, nor is there a function called
filemap_remove_folios().  It's a little detail, but it's annoying for me
to try to find which function you're actually referring to.  I have to
guess, and it puts me in a bad mood.

At any rate, these three functions cannot do what you're proposing.  In
read_pages(), when we call filemap_remove_folio(), the folio in question
will not have the uptodate flag set, so can never have been put in the
page tables, so cannot be found by madvise().

Also, as I said in my earlier email, madvise_cold_or_pageout_pte_range()
does guarantee that the refcount on the folio is held and can never
decrease to zero while folio_isolate_lru() is running.  So that's two
ways this scenario cannot happen.
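
For completeness, the check that the reordered 2/3/1 sequence above would
trip is the assertion at the top of folio_isolate_lru(); from memory it is
roughly:

	bool folio_isolate_lru(struct folio *folio)
	{
		bool ret = false;

		/*
		 * Fires when the caller holds no reference of its own and
		 * another thread has already dropped the last one -- which
		 * is exactly what the reordered sequence would produce.
		 */
		VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);

		if (folio_test_clear_lru(folio)) {
			struct lruvec *lruvec;

			folio_get(folio);
			lruvec = folio_lruvec_lock_irq(folio);
			lruvec_del_folio(lruvec, folio);
			unlock_page_lruvec_irq(lruvec);
			ret = true;
		}

		return ret;
	}

A caller that does hold its own reference (as madvise does by virtue of
the mapping plus the PTE lock) can never reach that assertion with a zero
refcount.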