From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 19 Feb 2024 15:03:17 +0000
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand
Cc: Andrew Morton, linux-mm@kvack.org
Subject: Re: [PATCH v2 01/18] mm: Make folios_put() the basis of release_pages()
References: <20240217022546.1496101-1-willy@infradead.org>
 <20240217022546.1496101-2-willy@infradead.org>
 <15797535-3107-4724-acec-7c006da490f3@redhat.com>
In-Reply-To: <15797535-3107-4724-acec-7c006da490f3@redhat.com>

On Mon, Feb 19, 2024 at 10:43:06AM +0100, David Hildenbrand wrote:
> On 17.02.24 03:25, Matthew Wilcox (Oracle) wrote:
> > By making release_pages() call folios_put(), we can get rid of the calls
> > to compound_head() for the callers that already know they have folios.
> > We can also get rid of the lock_batch tracking as we know the size
> > of the batch is limited by folio_batch.  This does reduce the maximum
> > number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX
> > (32) to PAGEVEC_SIZE (15).  I do not expect this to make a significant
> > difference, but if it does, we can increase PAGEVEC_SIZE to 31.
>
> I'm afraid that won't apply to current mm-unstable anymore, where we can
> now put multiple references to a single folio (as part of unmapping
> large PTE-mapped folios).

Argh.  I'm not a huge fan of that approach, but let's live with it for
now.  How about this as a replacement patch?  It compiles ...
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1743bdeab506..42de41e469a1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@ struct anon_vma;
 struct anon_vma_chain;
 struct user_struct;
 struct pt_regs;
+struct folio_batch;
 
 extern int sysctl_page_lock_unfairness;
 
@@ -1519,6 +1520,8 @@ static inline void folio_put_refs(struct folio *folio, int refs)
 		__folio_put(folio);
 }
 
+void folios_put_refs(struct folio_batch *folios, unsigned int *refs);
+
 /*
  * union release_pages_arg - an array of pages or folios
  *
@@ -1541,18 +1544,19 @@ void release_pages(release_pages_arg, int nr);
 /**
  * folios_put - Decrement the reference count on an array of folios.
  * @folios: The folios.
- * @nr: How many folios there are.
  *
- * Like folio_put(), but for an array of folios.  This is more efficient
- * than writing the loop yourself as it will optimise the locks which
- * need to be taken if the folios are freed.
+ * Like folio_put(), but for a batch of folios.  This is more efficient
+ * than writing the loop yourself as it will optimise the locks which need
+ * to be taken if the folios are freed.  The folios batch is returned
+ * empty and ready to be reused for another batch; there is no need to
+ * reinitialise it.
  *
  * Context: May be called in process or interrupt context, but not in NMI
  * context.  May be called while holding a spinlock.
  */
-static inline void folios_put(struct folio **folios, unsigned int nr)
+static inline void folios_put(struct folio_batch *folios)
 {
-	release_pages(folios, nr);
+	folios_put_refs(folios, NULL);
 }
 
 static inline void put_page(struct page *page)
diff --git a/mm/mlock.c b/mm/mlock.c
index 086546ac5766..1ed2f2ab37cd 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -206,8 +206,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 	if (lruvec)
 		unlock_page_lruvec_irq(lruvec);
 
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 void mlock_drain_local(void)
diff --git a/mm/swap.c b/mm/swap.c
index e5380d732c0d..6b736fceccfa 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -89,7 +89,7 @@ static void __page_cache_release(struct folio *folio)
 		__folio_clear_lru_flags(folio);
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
-	/* See comment on folio_test_mlocked in release_pages() */
+	/* See comment on folio_test_mlocked in folios_put() */
 	if (unlikely(folio_test_mlocked(folio))) {
 		long nr_pages = folio_nr_pages(folio);
 
@@ -175,7 +175,7 @@ static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
 	 * while the LRU lock is held.
	 *
 	 * (That is not true of __page_cache_release(), and not necessarily
-	 * true of release_pages(): but those only clear the mlocked flag after
+	 * true of folios_put(): but those only clear the mlocked flag after
 	 * folio_put_testzero() has excluded any other users of the folio.)
 	 */
 	if (folio_evictable(folio)) {
@@ -221,8 +221,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
@@ -946,47 +945,30 @@ void lru_cache_disable(void)
 }
 
 /**
- * release_pages - batched put_page()
- * @arg: array of pages to release
- * @nr: number of pages
+ * folios_put_refs - Reduce the reference count on a batch of folios.
+ * @folios: The folios.
+ * @refs: The number of refs to subtract from each folio.
  *
- * Decrement the reference count on all the pages in @arg.  If it
- * fell to zero, remove the page from the LRU and free it.
+ * Like folio_put(), but for a batch of folios.  This is more efficient
+ * than writing the loop yourself as it will optimise the locks which need
+ * to be taken if the folios are freed.  The folios batch is returned
+ * empty and ready to be reused for another batch; there is no need
+ * to reinitialise it.  If @refs is NULL, we subtract one from each
+ * folio refcount.
  *
- * Note that the argument can be an array of pages, encoded pages,
- * or folio pointers.  We ignore any encoded bits, and turn any of
- * them into just a folio that gets free'd.
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context.  May be called while holding a spinlock.
  */
-void release_pages(release_pages_arg arg, int nr)
+void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 {
 	int i;
-	struct encoded_page **encoded = arg.encoded_pages;
 	LIST_HEAD(pages_to_free);
 	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
-	unsigned int lock_batch;
 
-	for (i = 0; i < nr; i++) {
-		unsigned int nr_refs = 1;
-		struct folio *folio;
-
-		/* Turn any of the argument types into a folio */
-		folio = page_folio(encoded_page_ptr(encoded[i]));
-
-		/* Is our next entry actually "nr_pages" -> "nr_refs" ? */
-		if (unlikely(encoded_page_flags(encoded[i]) &
-			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
-			nr_refs = encoded_nr_pages(encoded[++i]);
-
-		/*
-		 * Make sure the IRQ-safe lock-holding time does not get
-		 * excessive with a continuous string of pages from the
-		 * same lruvec. The lock is held only if lruvec != NULL.
-		 */
-		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
-			unlock_page_lruvec_irqrestore(lruvec, flags);
-			lruvec = NULL;
-		}
+	for (i = 0; i < folios->nr; i++) {
+		struct folio *folio = folios->folios[i];
+		unsigned int nr_refs = refs ? refs[i] : 1;
 
 		if (is_huge_zero_page(&folio->page))
 			continue;
@@ -1016,13 +998,8 @@ void release_pages(release_pages_arg arg, int nr)
 		}
 
 		if (folio_test_lru(folio)) {
-			struct lruvec *prev_lruvec = lruvec;
-
 			lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
 									&flags);
-			if (prev_lruvec != lruvec)
-				lock_batch = 0;
-
 			lruvec_del_folio(lruvec, folio);
 			__folio_clear_lru_flags(folio);
 		}
@@ -1046,6 +1023,47 @@ void release_pages(release_pages_arg arg, int nr)
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
+	folios->nr = 0;
+}
+EXPORT_SYMBOL(folios_put_refs);
+
+/**
+ * release_pages - batched put_page()
+ * @arg: array of pages to release
+ * @nr: number of pages
+ *
+ * Decrement the reference count on all the pages in @arg.  If it
+ * fell to zero, remove the page from the LRU and free it.
+ *
+ * Note that the argument can be an array of pages, encoded pages,
+ * or folio pointers.  We ignore any encoded bits, and turn any of
+ * them into just a folio that gets free'd.
+ */
+void release_pages(release_pages_arg arg, int nr)
+{
+	struct folio_batch fbatch;
+	int refs[PAGEVEC_SIZE];
+	struct encoded_page **encoded = arg.encoded_pages;
+	int i;
+
+	folio_batch_init(&fbatch);
+	for (i = 0; i < nr; i++) {
+		/* Turn any of the argument types into a folio */
+		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
+
+		/* Is our next entry actually "nr_pages" -> "nr_refs" ? */
+		refs[fbatch.nr] = 1;
+		if (unlikely(encoded_page_flags(encoded[i]) &
+			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
+			refs[fbatch.nr] = encoded_nr_pages(encoded[++i]);
+
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		folios_put_refs(&fbatch, refs);
+	}
+
+	if (fbatch.nr)
+		folios_put_refs(&fbatch, refs);
 }
 EXPORT_SYMBOL(release_pages);
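
[Editorial note: for readers following the API change, here is a minimal
caller-side sketch.  It is not part of the patch; put_folio_array() is a
made-up name, and the sketch assumes only the folio_batch API from
<linux/pagevec.h> plus folios_put() as redefined above.  It shows the
pattern the patch establishes: fill a batch, and let folios_put() both
drop one reference on each folio and reinitialise the batch for reuse.]

	/* Hypothetical helper, for illustration only. */
	static void put_folio_array(struct folio **folios, unsigned long nr)
	{
		struct folio_batch fbatch;
		unsigned long i;

		folio_batch_init(&fbatch);
		for (i = 0; i < nr; i++) {
			/* folio_batch_add() returns the slots remaining */
			if (!folio_batch_add(&fbatch, folios[i]))
				folios_put(&fbatch);
		}
		/*
		 * folios_put() resets fbatch.nr, so this final flush is
		 * safe even if the loop just flushed a full batch.
		 */
		folios_put(&fbatch);
	}

[A caller that holds several references per folio - the mm-unstable case
David raises above - would instead build a parallel refs[] array and call
folios_put_refs(&fbatch, refs), exactly as the new release_pages() does.]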