Date: Wed, 14 Aug 2024 04:46:13 +0100
From: Matthew Wilcox <willy@infradead.org>
To: Boris Burkov
Cc: Shakeel Butt, linux-mm@kvack.org, linmiaohe@huawei.com
Subject: Re: [PATCH RFC] mm: fix refcount check in mapping_evict_folio
References: <5qxfn7y43mf6vkkk6adfgenilahe5uykscz7muxq7tjmjijxg3@dpshd4mrcxgn>
 <20240814032715.GA400993@zen.localdomain>
In-Reply-To: <20240814032715.GA400993@zen.localdomain>

On Tue, Aug 13, 2024 at 08:27:15PM -0700, Boris Burkov wrote:
> On Wed, Aug 14, 2024 at 04:15:25AM +0100, Matthew Wilcox wrote:
> > On Tue, Aug 13, 2024 at 12:58:09PM -0700, Shakeel Butt wrote:
> > > > +	/*
> > > > +	 * The refcount will be elevated if any page in the folio is mapped.
> > > > +	 *
> > > > +	 * The refcounts break down as follows:
> > > > +	 * 1 per mapped page
> > > > +	 * 1 from folio_attach_private, if private is set
> > > > +	 * 1 from allocating the page in the first place
> > > > +	 * 1 from the caller
> > > > +	 */
> > >
> > > I think the above explanation is correct at least from my code
> > > inspection. Most of the callers are related to memory failure. I would
> > > reword the "1 per mapped page" to "1 per page in page cache" or
> > > something as mapped here might mean mapped in page tables.
> >
> > It's not though. The "1 from allocating the page in the first place"
> > is donated to the page cache. It's late here and I don't have the
> > ability to work through what's really going on here.
>
> Can you explain what you mean by "donated to the page cache" more
> precisely?
>
> Perhaps there is something better btrfs can do with its refcounting
> as it calls alloc_pages_bulk_array, then filemap_add_folio, and finally
> folio_attach_private. But I am not sure which of those refcounts we can
> (or should?) drop.

Look at how readahead works for normal files; ignore what btrfs is doing
because it's probably wrong.  I'm going to use the term "expected
refcount" because there may also be temporary speculative refcounts from
stale references (either GUP or pagecache).

	folio = filemap_alloc_folio(gfp_mask, 0);
		(expected refcount 1)
	ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
		(expected refcount 1 + nr_pages)
	read_pages(ractl);
	  aops->readahead(rac);
	  ... calls readahead_folio() which calls folio_put()
		(expected refcount nr_pages)

If the filesystem calls folio_attach_private(), add one to the expected
refcount.

That's it.  Folios in the pagecache should have a refcount of nr_pages,
+ 1 if private data exists.  Every caller who has called
filemap_get_folio() has an extra refcount.  Every user mapping of a page
adds one to the refcount (and to the mapcount).
If btrfs superblocks have an extra refcount, they're wrong and should
have it put somewhere.

At some point, I intend to reduce the number of atomic operations we do
by having filemap_add_folio() increment by one fewer than it currently
does, and removing the folio_put() in readahead_folio().  I haven't
been brave enough to do that yet.

I also think we should not increment the refcount by nr_pages when we
add it to the page cache.  Incrementing by one should be sufficient.
And that would mean that we can just delete the "folio_ref_add()" in
__filemap_add_folio().