Date: Wed, 13 Nov 2024 04:57:28 +0000
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand
Cc: Jason Gunthorpe, Fuad Tabba, linux-mm@kvack.org, kvm@vger.kernel.org,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org,
	muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com,
	pbonzini@redhat.com, seanjc@google.com, jhubbard@nvidia.com,
	ackerleytng@google.com, vannapurve@google.com,
	mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
	quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk
Subject: Re: [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
In-Reply-To: <430b6a38-facf-4127-b1ef-5cfe7c495d63@redhat.com>
References: <20241108162040.159038-1-tabba@google.com>
	<20241108170501.GI539304@nvidia.com>
	<9dc212ac-c4c3-40f2-9feb-a8bcf71a1246@redhat.com>
	<20241112135348.GA28228@nvidia.com>
	<430b6a38-facf-4127-b1ef-5cfe7c495d63@redhat.com>

On Tue, Nov 12, 2024 at 03:22:46PM +0100, David Hildenbrand wrote:
> On 12.11.24 14:53, Jason Gunthorpe wrote:
> > On Tue, Nov 12, 2024 at 10:10:06AM +0100, David Hildenbrand wrote:
> > > On 12.11.24 06:26, Matthew Wilcox wrote:
> > > > I don't want you to respin.  I think this is a bad idea.
> > >
> > > I'm hoping you'll find some more time to explain what exactly you
> > > don't like, because this series only refactors what we already have.
> > >
> > > I enjoy seeing the special casing (especially hugetlb) gone from
> > > mm/swap.c.

I don't.  The list of 'if's is better than the indirect function call.
That's terribly expensive, and the way we reuse the lru.next field is
fragile.  Not to mention that it introduces a new thing for the
hardening people to fret over.
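To make the comparison concrete, the two shapes look roughly like this
(a sketch only: folio_test_hugetlb() and free_huge_folio() are the real
hooks, while folio_get_owner_ops() paraphrases what this series
proposes, overlaying the ops pointer on lru.next):

	/* Today: an explicit if ladder on the free path, roughly what
	 * mm/swap.c does.  Cheap, and the cases are visible at a glance. */
	void __folio_put(struct folio *folio)
	{
		if (folio_test_hugetlb(folio)) {
			free_huge_folio(folio);
			return;
		}
		/* ... normal folio freeing ... */
	}

	/* The series: an indirect call through ops overlaid on lru.next.
	 * folio_get_owner_ops() is a paraphrase of the proposed helper. */
	void __folio_put(struct folio *folio)
	{
		const struct folio_owner_ops *ops = folio_get_owner_ops(folio);

		if (ops) {
			ops->free(folio);	/* indirect call: retpoline-expensive */
			return;
		}
		/* ... normal folio freeing ... */
	}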
> > And, IMHO, seems like overkill.  We have only a handful of cases -
> > maybe we shouldn't be trying to get to full generality but just
> > handle a couple of cases directly?  I don't really think it is such
> > a bad thing to have an if ladder on the free path if we have only a
> > couple of things.  Certainly it looks better than doing overlaying
> > tricks.

> I'd really like to abstract the hugetlb handling if possible.  The
> way it stands, it's just very odd.

There might be ways to make that better.  I haven't really been
looking too hard at making that special handling go away.

> We'll need some reliable way to identify these folios that need care.
> guest_memfd will be using folio->mapcount for now, so for now we
> couldn't set a page type like hugetlb does.

If hugetlb can set lru.next at a certain point, then guest_memfd could
set a page type at a similar point, no?

> > Also, how does this translate to Matthew's memdesc world?

In a memdesc world, pages no longer have a refcount.  We might still
have put_page(), which will now be a very complicated (and
out-of-line) function that looks up what kind of memdesc it is and
operates on the memdesc's refcount ... if it has one.  I don't know
if it'll be exported to modules; I can see uses in the mm code, but
I'm not sure whether modules will have a need.

Each memdesc type will have its own function to call to free the
memdesc.  So we'll still have folio_put().  But slab does not have,
need, nor want a refcount, so it'll just use slab_free() (a rough
sketch of that dispatch follows below).

I expect us to keep around a list of recently-freed memdescs of a
particular type with their pages still attached, so that we can
allocate them again quickly (or reclaim them under memory pressure).
Once that freelist overflows, we'll free a batch of them to the buddy
allocator (for the pages) and the slab allocator (for the memdesc
itself).

> guest_memfd and hugetlb would be operating on folios (at least for
> now), which contain the refcount, lru, private, ... so nothing
> special there.
>
> Once we actually decouple "struct folio" from "struct page", we
> *might* have to play fewer tricks, because we could just have a
> callback pointer there.  But then again, maybe we also want to save
> some space in there.
>
> Do we want dedicated memdescs for hugetlb/guest_memfd that extend
> folios in the future?  I don't know, maybe.

I've certainly considered going so far as a per-fs folio.  So we'd
have an ext4_folio, a btrfs_folio, an iomap_folio, etc.  That'd let
us get rid of folio->private, but I'm not sure that C's type system
can really handle this nicely.  Maybe in a Rust world ;-)

What I'm thinking about is that I'd really like to be able to declare
that all the functions in ext4_aops only accept pointers to
ext4_folio, so ext4_dirty_folio() can't be called with pointers to
_any_ folio, but specifically with folios which were previously
allocated for ext4.  I don't know if Rust lets you do something like
that.

> I'm currently wondering if we can use folio->private for the time
> being.  Either:
>
> (a) If folio->private is still set once the refcount drops to 0, that
> indicates there is a freeing callback/owner_ops.  We'd have to make
> hugetlb not use folio->private and convert others to clear
> folio->private before freeing.
>
> (b) Use bit X of folio->private to indicate that it has "owner_ops"
> meaning.  We'd have to make hugetlb not use folio->private and make
> others not use bit X.  Might be harder and overkill, because right
> now we only really need the callback when refcount==0.
>
> (c) Use some other indication that folio->private contains folio_ops.
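To illustrate the memdesc free path described above: every name below
is hypothetical, since none of this exists in the tree yet.  The point
is only that put_page() becomes a type dispatch, and that
refcount-less types such as slab never go through it:

	/* Hypothetical memdesc-world put_page(); all names are invented. */
	void put_page(struct page *page)
	{
		struct memdesc *md = page_memdesc(page);	/* look up the descriptor */

		switch (memdesc_type(md)) {
		case MEMDESC_FOLIO:
			folio_put(memdesc_to_folio(md));	/* folios keep a refcount */
			break;
		case MEMDESC_SLAB:
			/* slab has no refcount; it is freed via slab_free(), never here */
			WARN_ON_ONCE(1);
			break;
		/* ... one case per refcounted memdesc type ... */
		}
	}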
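For concreteness, option (b) might look something like the following;
this is purely illustrative, and the bit choice and helper names are
invented:

	/* Option (b) sketch: tag one low bit of folio->private to say
	 * "this is an owner_ops pointer, not fs-private data". */
	#define FOLIO_OWNER_OPS_BIT	0

	static inline bool folio_has_owner_ops(struct folio *folio)
	{
		return (unsigned long)folio->private & BIT(FOLIO_OWNER_OPS_BIT);
	}

	static inline const struct folio_owner_ops *
	folio_owner_ops(struct folio *folio)
	{
		return (void *)((unsigned long)folio->private &
				~BIT(FOLIO_OWNER_OPS_BIT));
	}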
I really don't want to use folio_ops / folio_owner_ops.

I read
https://lore.kernel.org/all/CAGtprH_JP2w-4rq02h_Ugvq5KuHX7TUvegOS7xUs_iy5hriE7g@mail.gmail.com/
and I still don't understand what you're trying to do.  Would it work
to use aops->free_folio() to notify you when the folio is being
removed from the address space?
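For reference, ->free_folio() is an existing hook in struct
address_space_operations, called when a folio is removed from the
address space.  Wiring guest_memfd up to it might look roughly like
this (the guest_memfd side is invented here for illustration):

	/* Sketch: have the page cache notify guest_memfd on removal.
	 * ->free_folio() is a real aops callback; kvm_gmem_free_folio()
	 * as written here is illustrative only. */
	static void kvm_gmem_free_folio(struct folio *folio)
	{
		/* last chance to untrack/reclaim the folio's pages */
	}

	static const struct address_space_operations kvm_gmem_aops = {
		.dirty_folio	= noop_dirty_folio,
		.free_folio	= kvm_gmem_free_folio,
		/* ... */
	};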