Date: Tue, 13 Aug 2024 21:23:29 -0700
From: Boris Burkov
To: Matthew Wilcox
Cc: Shakeel Butt, linux-mm@kvack.org, linmiaohe@huawei.com
Subject: Re: [PATCH RFC] mm: fix refcount check in mapping_evict_folio
Message-ID: <20240814042329.GA402156@zen.localdomain>
References: <5qxfn7y43mf6vkkk6adfgenilahe5uykscz7muxq7tjmjijxg3@dpshd4mrcxgn> <20240814032715.GA400993@zen.localdomain>

On Wed, Aug 14, 2024 at 04:46:13AM +0100, Matthew Wilcox wrote:
> On Tue, Aug 13, 2024 at 08:27:15PM -0700, Boris Burkov wrote:
> > On Wed, Aug 14, 2024 at 04:15:25AM +0100, Matthew Wilcox wrote:
> > > On Tue, Aug 13, 2024 at 12:58:09PM -0700, Shakeel Butt wrote:
> > > > > +	/*
> > > > > +	 * The refcount will be elevated if any page in the folio is mapped.
> > > > > +	 *
> > > > > +	 * The refcounts break down as follows:
> > > > > +	 * 1 per mapped page
> > > > > +	 * 1 from folio_attach_private, if private is set
> > > > > +	 * 1 from allocating the page in the first place
> > > > > +	 * 1 from the caller
> > > > > +	 */
> > > >
> > > > I think the above explanation is correct at least from my code
> > > > inspection. Most of the callers are related to memory failure. I would
> > > > reword the "1 per mapped page" to "1 per page in page cache" or
> > > > something as mapped here might mean mapped in page tables.
> > >
> > > It's not though. The "1 from allocating the page in the first place"
> > > is donated to the page cache. It's late here and I don't have the
> > > ability to work through what's really going on here.
> >
> > Can you explain what you mean by "donated to the page cache" more
> > precisely?
> >
> > Perhaps there is something better btrfs can do with its refcounting
> > as it calls alloc_pages_bulk_array, then filemap_add_folio, and finally
> > folio_attach_private. But I am not sure which of those refcounts we can
> > (or should?) drop.
>
> Look at how readahead works for normal files; ignore what btrfs is doing
> because it's probably wrong. I'm going to use the term "expected
> refcount" because there may also be temporary speculative refcounts
> from stale references (either GUP or pagecache).
>
> 	folio = filemap_alloc_folio(gfp_mask, 0);
> (expected refcount 1)
> 	ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> (expected refcount 1 + nr_pages)
> 	read_pages(ractl);
> 		aops->readahead(rac);
> 		... calls readahead_folio() which calls folio_put()
> (expected refcount nr_pages)
>
> if filesystem calls folio_attach_private(), add one to the expected
> refcount.
>
> That's it. Folios in the pagecache should have a refcount of
> nr_pages + 1 if private data exists. Every caller who has called
> filemap_get_folio() has an extra refcount.
> Every user mapping of a page adds one to the refcount (and to the
> mapcount).

Thank you for the extra explanation; that is very helpful.

>
> If btrfs superblocks have an extra refcount, they're wrong and should
> have it put somewhere.

I suppose that, by analogy, btrfs should drop a reference sometime after
calling filemap_add_folio() on the metadata page. I'll look into making
that change instead of this patch, since it seems the expected refcount
was correct after all and btrfs was holding an extra one.

>
> At some point, I intend to reduce the number of atomic operations we do
> by having filemap_add_folio() increment by one fewer than it currently
> does, and removing the folio_put() in readahead_folio(). I haven't been
> brave enough to do that yet.
>
> I also think we should not increment the refcount by nr_pages when we
> add it to the page cache. Incrementing by one should be sufficient.
> And that would mean that we can just delete the "folio_ref_add()"
> in __filemap_add_folio().
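To make the expected-refcount rule concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: struct toy_folio and every toy_* helper are hypothetical stand-ins for the kernel functions named in their comments, and the eviction-style check assumes the caller holds exactly one extra reference (as a filemap_get_folio() caller would). It walks the readahead sequence quoted above, where the allocation reference is dropped, and a simplified path that never drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio; NOT the kernel structure. */
struct toy_folio {
	int refcount;     /* models folio_ref_count() */
	int nr_pages;     /* models folio_nr_pages() */
	bool has_private; /* set by the folio_attach_private() model */
};

/* Models filemap_alloc_folio(): the allocation reference. */
static void toy_alloc(struct toy_folio *f, int nr_pages)
{
	f->refcount = 1;
	f->nr_pages = nr_pages;
	f->has_private = false;
}

/* Models filemap_add_folio(): the page cache takes nr_pages references. */
static void toy_add_to_page_cache(struct toy_folio *f)
{
	f->refcount += f->nr_pages;
}

/* Models folio_attach_private(): one reference while private data is set. */
static void toy_attach_private(struct toy_folio *f)
{
	f->has_private = true;
	f->refcount += 1;
}

/* Models folio_put(), e.g. the one inside readahead_folio(). */
static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0); /* a put with no reference held is a bug */
	f->refcount -= 1;
}

/* Expected refcount of an unmapped page cache folio that nobody else
 * holds: nr_pages, plus one if private data exists. */
static int toy_expected_refcount(const struct toy_folio *f)
{
	return f->nr_pages + (f->has_private ? 1 : 0);
}

/* Eviction-style check: allow exactly one extra reference (the caller's);
 * anything above that means someone else, such as a user mapping, still
 * holds the folio. */
static bool toy_only_caller_holds_extra_ref(const struct toy_folio *f)
{
	return f->refcount <= toy_expected_refcount(f) + 1;
}

/* Readahead-style lifecycle: the allocation reference is donated to the
 * page cache and later dropped by readahead_folio()'s folio_put(). */
static struct toy_folio toy_readahead_path(int nr_pages, bool with_private)
{
	struct toy_folio f;

	toy_alloc(&f, nr_pages);        /* refcount 1 */
	toy_add_to_page_cache(&f);      /* refcount 1 + nr_pages */
	if (with_private)
		toy_attach_private(&f); /* one more for private data */
	toy_put(&f);                    /* allocation reference dropped */
	return f;
}

/* Hypothetical lifecycle with no matching put after adding to the page
 * cache, leaving one reference too many (a simplified model of the
 * extra btrfs reference discussed in this thread). */
static struct toy_folio toy_missing_put_path(int nr_pages)
{
	struct toy_folio f;

	toy_alloc(&f, nr_pages);
	toy_add_to_page_cache(&f);
	toy_attach_private(&f);
	return f;
}
```

For example, toy_readahead_path(4, true) ends at refcount 5 (nr_pages + 1), and after adding one reference for the caller the check still passes. toy_missing_put_path(1) ends at 3 instead of the expected 2, so once the caller takes its reference the check fails, which is the symptom of the extra reference btrfs was holding.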