From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: Andrew Morton
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-mm@kvack.org, Hugh Dickins
Subject: [PATCH 01/22] mm: Remove folio_pincount_ptr() and head_compound_pincount()
Date: Sat, 31 Dec 2022 21:45:49 +0000
Message-Id: <20221231214610.2800682-2-willy@infradead.org>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20221231214610.2800682-1-willy@infradead.org>
References: <20221231214610.2800682-1-willy@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

We can use folio->_pincount directly, since all users are guarded by
tests of compound/large.
Signed-off-by: Matthew Wilcox (Oracle)
---
 Documentation/core-api/pin_user_pages.rst | 29 +++++++++++------------
 include/linux/mm.h                        | 14 ++---------
 include/linux/mm_types.h                  |  5 ----
 mm/debug.c                                |  4 ++--
 mm/gup.c                                  |  8 +++----
 mm/huge_memory.c                          |  4 ++--
 mm/hugetlb.c                              |  4 ++--
 mm/page_alloc.c                           |  9 ++++---
 8 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index b18416f4500f..674edf62f186 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -55,18 +55,17 @@ flags the caller provides. The caller is required to pass in a non-null struct
 pages* array, and the function then pins pages by incrementing each by a
 special value: GUP_PIN_COUNTING_BIAS.
 
-For compound pages, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
-an exact form of pin counting is achieved, by using the 2nd struct page
-in the compound page. A new struct page field, compound_pincount, has
-been added in order to support this.
-
-This approach for compound pages avoids the counting upper limit problems that
-are discussed below. Those limitations would have been aggravated severely by
-huge pages, because each tail page adds a refcount to the head page. And in
-fact, testing revealed that, without a separate compound_pincount field,
-page overflows were seen in some huge page stress tests.
-
-This also means that huge pages and compound pages do not suffer
+For large folios, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
+the extra space available in the struct folio is used to store the
+pincount directly.
+
+This approach for large folios avoids the counting upper limit problems
+that are discussed below. Those limitations would have been aggravated
+severely by huge pages, because each tail page adds a refcount to the
+head page. And in fact, testing revealed that, without a separate pincount
+field, refcount overflows were seen in some huge page stress tests.
+
+This also means that huge pages and large folios do not suffer
 from the false positives problem that is mentioned below.::
 
 Function
@@ -264,9 +263,9 @@ place.)
 Other diagnostics
 =================
 
-dump_page() has been enhanced slightly, to handle these new counting
-fields, and to better report on compound pages in general. Specifically,
-for compound pages, the exact (compound_pincount) pincount is reported.
+dump_page() has been enhanced slightly to handle these new counting
+fields, and to better report on large folios in general. Specifically,
+for large folios, the exact pincount is reported.
 
 References
 ==========
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..ec801f24ef61 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1006,11 +1006,6 @@ static inline void folio_set_compound_dtor(struct folio *folio,
 
 void destroy_large_folio(struct folio *folio);
 
-static inline int head_compound_pincount(struct page *head)
-{
-	return atomic_read(compound_pincount_ptr(head));
-}
-
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
@@ -1637,11 +1632,6 @@ static inline struct folio *pfn_folio(unsigned long pfn)
 	return page_folio(pfn_to_page(pfn));
 }
 
-static inline atomic_t *folio_pincount_ptr(struct folio *folio)
-{
-	return &folio_page(folio, 1)->compound_pincount;
-}
-
 /**
  * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.
  * @folio: The folio.
@@ -1659,7 +1649,7 @@ static inline atomic_t *folio_pincount_ptr(struct folio *folio)
  * expected to be able to deal gracefully with a false positive.
  *
  * For large folios, the result will be exactly correct. That's because
- * we have more tracking data available: the compound_pincount is used
+ * we have more tracking data available: the _pincount field is used
  * instead of the GUP_PIN_COUNTING_BIAS scheme.
 *
 * For more information, please see Documentation/core-api/pin_user_pages.rst.
@@ -1670,7 +1660,7 @@ static inline atomic_t *folio_pincount_ptr(struct folio *folio)
 static inline bool folio_maybe_dma_pinned(struct folio *folio)
 {
 	if (folio_test_large(folio))
-		return atomic_read(folio_pincount_ptr(folio)) > 0;
+		return atomic_read(&folio->_pincount) > 0;
 
 	/*
 	 * folio_ref_count() is signed. If that refcount overflows, then
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3b8475007734..5d9bf1f79e96 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -443,11 +443,6 @@ static inline atomic_t *subpages_mapcount_ptr(struct page *page)
 	return &page[1].subpages_mapcount;
 }
 
-static inline atomic_t *compound_pincount_ptr(struct page *page)
-{
-	return &page[1].compound_pincount;
-}
-
 /*
  * Used for sizing the vmemmap region on some architectures
  */
diff --git a/mm/debug.c b/mm/debug.c
index 7f8e5f744e42..893c9dbf76ca 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -94,11 +94,11 @@ static void __dump_page(struct page *page)
 			page, page_ref_count(head), mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
-		pr_warn("head:%p order:%u compound_mapcount:%d subpages_mapcount:%d compound_pincount:%d\n",
+		pr_warn("head:%p order:%u compound_mapcount:%d subpages_mapcount:%d pincount:%d\n",
 				head, compound_order(head),
 				head_compound_mapcount(head),
 				head_subpages_mapcount(head),
-				head_compound_pincount(head));
+				atomic_read(&folio->_pincount));
 	}
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..38ba1697dd61 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -111,7 +111,7 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
  *    FOLL_GET: folio's refcount will be incremented by @refs.
  *
  *    FOLL_PIN on large folios: folio's refcount will be incremented by
- *    @refs, and its compound_pincount will be incremented by @refs.
+ *    @refs, and its pincount will be incremented by @refs.
  *
  *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
  *    @refs * GUP_PIN_COUNTING_BIAS.
@@ -157,7 +157,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 		 * try_get_folio() is left intact.
 		 */
 		if (folio_test_large(folio))
-			atomic_add(refs, folio_pincount_ptr(folio));
+			atomic_add(refs, &folio->_pincount);
 		else
 			folio_ref_add(folio,
 					refs * (GUP_PIN_COUNTING_BIAS - 1));
@@ -182,7 +182,7 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 	if (flags & FOLL_PIN) {
 		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
 		if (folio_test_large(folio))
-			atomic_sub(refs, folio_pincount_ptr(folio));
+			atomic_sub(refs, &folio->_pincount);
 		else
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
@@ -232,7 +232,7 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 		 */
 		if (folio_test_large(folio)) {
 			folio_ref_add(folio, 1);
-			atomic_add(1, folio_pincount_ptr(folio));
+			atomic_add(1, &folio->_pincount);
 		} else {
 			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
 		}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index abe6cfd92ffa..ca2eaec84726 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2479,9 +2479,9 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	 * of swap cache pages that store the swp_entry_t in tail pages.
 	 * Fix up and warn once if private is unexpectedly set.
 	 *
-	 * What of 32-bit systems, on which head[1].compound_pincount overlays
+	 * What of 32-bit systems, on which folio->_pincount overlays
 	 * head[1].private? No problem: THP_SWAP is not enabled on 32-bit, and
-	 * compound_pincount must be 0 for folio_ref_freeze() to have succeeded.
+	 * pincount must be 0 for folio_ref_freeze() to have succeeded.
 	 */
 	if (!folio_test_swapcache(page_folio(head))) {
 		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index db895230ee7e..c01493ceeb8d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1480,7 +1480,7 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
 
 	atomic_set(folio_mapcount_ptr(folio), 0);
 	atomic_set(folio_subpages_mapcount_ptr(folio), 0);
-	atomic_set(folio_pincount_ptr(folio), 0);
+	atomic_set(&folio->_pincount, 0);
 
 	for (i = 1; i < nr_pages; i++) {
 		p = folio_page(folio, i);
@@ -2002,7 +2002,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	}
 	atomic_set(folio_mapcount_ptr(folio), -1);
 	atomic_set(folio_subpages_mapcount_ptr(folio), 0);
-	atomic_set(folio_pincount_ptr(folio), 0);
+	atomic_set(&folio->_pincount, 0);
 	return true;
 
 out_error:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..a04ed7f72b36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -775,11 +775,13 @@ void free_compound_page(struct page *page)
 
 static void prep_compound_head(struct page *page, unsigned int order)
 {
+	struct folio *folio = (struct folio *)page;
+
 	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
 	set_compound_order(page, order);
 	atomic_set(compound_mapcount_ptr(page), -1);
 	atomic_set(subpages_mapcount_ptr(page), 0);
-	atomic_set(compound_pincount_ptr(page), 0);
+	atomic_set(&folio->_pincount, 0);
 }
 
 static void prep_compound_tail(struct page *head, int tail_idx)
@@ -1291,6 +1293,7 @@ static inline bool free_page_is_bad(struct page *page)
 
 static int free_tail_pages_check(struct page *head_page, struct page *page)
 {
+	struct folio *folio = (struct folio *)head_page;
 	int ret = 1;
 
 	/*
@@ -1314,8 +1317,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 			bad_page(page, "nonzero subpages_mapcount");
 			goto out;
 		}
-		if (unlikely(head_compound_pincount(head_page))) {
-			bad_page(page, "nonzero compound_pincount");
+		if (unlikely(atomic_read(&folio->_pincount))) {
+			bad_page(page, "nonzero pincount");
 			goto out;
 		}
 		break;
-- 
2.35.1