From: Zhaoyang Huang
Date: Thu, 28 Mar 2024 09:27:31 +0800
Subject: Re: summarize all information again at bottom//reply: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru
To: Matthew Wilcox
Cc: 黄朝阳 (Zhaoyang Huang), Andrew Morton, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", 康纪滨 (Steve Kang)

On Wed, Mar 27, 2024 at 8:31 PM Matthew Wilcox wrote:
>
> On Wed, Mar 27, 2024 at 09:25:59AM +0800, Zhaoyang Huang wrote:
> > > Ignoring any other thread, you're basically saying that there's a
> > > refcount imbalance here.  Which means we'd hit an assert (that folio
> > > refcount went below zero) in the normal case where another thread wasn't
> > > simultaneously trying to do anything.
> > Theoretically Yes but it is rare in practice as aops->readahead will
> > launch all pages to IO under most scenarios.
>
> Rare, but this path has been tested.
>
> > read_pages
> >     aops->readahead[1]
> >     ...
> >     while (folio = readahead_folio)[2]
> >         filemap_remove_folio
> >
> > IMO, according to the comments of readahead_page, the refcnt
> > represents page cache dropped in [1] makes sense for two reasons, '1.
> > The folio is going to do IO and is locked until IO done;2. The refcnt
> > will be added back when found again from the page cache and then serve
> > for PTE or vfs' while it doesn't make sense in [2] as the refcnt of
> > page cache will be dropped in filemap_remove_folio
> >
> >  * Context: The page is locked and has an elevated refcount.  The caller
> >  * should decreases the refcount once the page has been submitted for I/O
> >  * and unlock the page once all I/O to that page has completed.
> >  * Return: A pointer to the next page, or %NULL if we are done.
>
> Follow the refcount through.
>
> In page_cache_ra_unbounded():
>
>                 folio = filemap_alloc_folio(gfp_mask, 0);
> (folio has refcount 1)
>                 ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> (folio has refcount 2)
>
> Then we call read_pages()
> First we call ->readahead() which for some reason stops early.
> Then we call readahead_folio() which calls folio_put()
> (folio has refcount 1)
> Then we call folio_get()
> (folio has refcount 2)
> Then we call filemap_remove_folio()
> (folio has refcount 1)
> Then we call folio_unlock()
> Then we call folio_put()

OK, I missed the refcnt from alloc_pages. However, I still think it is
a bug to call readahead_folio() in read_pages(), because the refcnt
obtained by alloc_pages should be the folio's final guard: it is paired
with the one checked in
shrink_folio_list->__remove_mapping->folio_ref_freeze(2) (the 2
represents alloc_pages plus the page cache). If we drop this one
without isolating the folio from the LRU, the following race could
happen. Furthermore, the refcnt dropped in readahead_folio() represents
the page cache, so it doesn't make sense to drop it twice in
read_pages().

0. Thread_readahead:
    folio_put()
        folio_put_test_zero() == true
        __folio_put()
            folio_test_lru() == true
1. Thread_isolate
    folio_isolate_lru
        folio_test_clear_lru()
        lruvec_del_folio()
2. Thread_readahead
    folio_put()
        folio_put_test_zero() == true
        __folio_put
            folio_test_lru() == true
            lruvec_del_folio()

> (folio has refcount 0 and is freed)
>
> Yes, other things can happen in there to increment the refcount, so this
> folio_put() might not be the last put, but we hold the folio locked the
> entire time, so many things which might be attempted will block on the
> folio lock.  In particular, nobody can remove it from the page cache,
> so its refcount cannot reach 0 until the last folio_put() of the
> sequence.
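
For reference, the refcount sequence quoted above can be replayed in a
small user-space model. This is only a sketch, not kernel code: struct
toy_folio and every toy_* helper are hypothetical names made up for
illustration, and each one only mimics the refcount effect that the
quoted walk-through attributes to the real function named in its
comment. It models the single-threaded case only, so it says nothing
about the race with folio_isolate_lru.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the state this walk needs. */
struct toy_folio {
	int refcount;        /* models the folio refcount */
	bool in_page_cache;  /* true while the page cache holds a reference */
	bool locked;         /* models the folio lock */
};

/* Models folio_get(). */
static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

/* Models folio_put(); returns true if this put dropped the last reference. */
static bool toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0); /* an imbalance would trip here */
	return --f->refcount == 0;
}

/* Models filemap_add_folio(): the page cache takes a reference, folio is locked. */
static void toy_add_to_cache(struct toy_folio *f)
{
	toy_get(f);
	f->in_page_cache = true;
	f->locked = true;
}

/* Models filemap_remove_folio(): the page cache drops its reference. */
static void toy_remove_from_cache(struct toy_folio *f)
{
	assert(f->locked && f->in_page_cache);
	f->in_page_cache = false;
	toy_put(f);
}

/*
 * Replays the quoted sequence for one folio that ->readahead() left behind.
 * trace[i] receives the refcount after step i; the return value is the
 * final refcount.
 */
int toy_refcount_walk(int trace[6])
{
	/* Models filemap_alloc_folio(): the allocation reference. */
	struct toy_folio f = { .refcount = 1, .in_page_cache = false, .locked = false };

	trace[0] = f.refcount;
	toy_add_to_cache(&f);        /* filemap_add_folio() */
	trace[1] = f.refcount;
	toy_put(&f);                 /* readahead_folio() calls folio_put() */
	trace[2] = f.refcount;
	toy_get(&f);                 /* folio_get() in read_pages() */
	trace[3] = f.refcount;
	toy_remove_from_cache(&f);   /* filemap_remove_folio() */
	trace[4] = f.refcount;
	f.locked = false;            /* folio_unlock() */
	toy_put(&f);                 /* final folio_put() */
	trace[5] = f.refcount;
	return f.refcount;
}
```

Calling toy_refcount_walk() from a main() fills the trace with the
counts from the quoted walk-through (1, 2, 1, 2, 1, 0), and the assert
in toy_put() never fires, which is the "no imbalance in the normal
case" point.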