Message-ID: <7b3b7f2e-7109-4e72-b1cf-259cb56f3629@linux.alibaba.com>
Date: Sun, 7 Apr 2024 18:19:28 +0800
Subject: Re: [PATCH 10/10] mm: page_alloc: consolidate free page accounting
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, "Huang, Ying",
 David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240320180429.678181-11-hannes@cmpxchg.org>
References: <20240320180429.678181-1-hannes@cmpxchg.org>
 <20240320180429.678181-11-hannes@cmpxchg.org>

On 2024/3/21 02:02, Johannes Weiner wrote:
> Free page accounting currently happens a bit too high up the call
> stack, where it has to deal with guard pages, compaction capturing,
> block stealing and even page isolation. This is subtle and fragile,
> and makes it difficult to hack on the code.
>
> Now that type violations on the freelists have been fixed, push the
> accounting down to where pages enter and leave the freelist.
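To summarize the shape of the change before the diff (a rough sketch only, with
the signatures simplified relative to the patch below): every place that links a
page into or unlinks it from a free_area list now pairs the list operation with
one accounting helper, so nothing above that level touches NR_FREE_PAGES or
NR_FREE_CMA_PAGES directly anymore:

	/* all freelist counter updates funnel through one helper */
	static inline void account_freepages(struct zone *zone, int nr_pages,
					     int migratetype)
	{
		/* isolated pageblocks are not accounted as free */
		if (is_migrate_isolate(migratetype))
			return;

		__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
		if (is_migrate_cma(migratetype))
			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
	}

	/* list insertion and accounting always travel together ... */
	static inline void add_to_free_list(struct page *page, struct zone *zone,
					    unsigned int order, int migratetype,
					    bool tail)
	{
		__add_to_free_list(page, zone, order, migratetype, tail);
		account_freepages(zone, 1 << order, migratetype);
	}

	/* ... and del_page_from_free_list() does the mirror-image -(1 << order) */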
>
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/mm.h     |  18 ++--
>  include/linux/vmstat.h |   8 --
>  mm/debug_page_alloc.c  |  12 +--
>  mm/internal.h          |   5 --
>  mm/page_alloc.c        | 194 +++++++++++++++++++++++------------
>  mm/page_isolation.c    |   3 +-
>  6 files changed, 120 insertions(+), 120 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8147b1302413..bd2e94391c7e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3781,24 +3781,22 @@ static inline bool page_is_guard(struct page *page)
>  	return PageGuard(page);
>  }
>  
> -bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -		      int migratetype);
> +bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order);
>  static inline bool set_page_guard(struct zone *zone, struct page *page,
> -				  unsigned int order, int migratetype)
> +				  unsigned int order)
>  {
>  	if (!debug_guardpage_enabled())
>  		return false;
> -	return __set_page_guard(zone, page, order, migratetype);
> +	return __set_page_guard(zone, page, order);
>  }
>  
> -void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -			int migratetype);
> +void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order);
>  static inline void clear_page_guard(struct zone *zone, struct page *page,
> -				    unsigned int order, int migratetype)
> +				    unsigned int order)
>  {
>  	if (!debug_guardpage_enabled())
>  		return;
> -	__clear_page_guard(zone, page, order, migratetype);
> +	__clear_page_guard(zone, page, order);
>  }
>  
>  #else /* CONFIG_DEBUG_PAGEALLOC */
> @@ -3808,9 +3806,9 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
>  static inline bool debug_guardpage_enabled(void) { return false; }
>  static inline bool page_is_guard(struct page *page) { return false; }
>  static inline bool set_page_guard(struct zone *zone, struct page *page,
> -			unsigned int order, int migratetype) { return false; }
> +			unsigned int order) { return false; }
>  static inline void clear_page_guard(struct zone *zone, struct page *page,
> -				unsigned int order, int migratetype) {}
> +				unsigned int order) {}
>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>  
>  #ifdef __HAVE_ARCH_GATE_AREA
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 343906a98d6e..735eae6e272c 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -487,14 +487,6 @@ static inline void node_stat_sub_folio(struct folio *folio,
>  	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
>  }
>  
> -static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
> -					     int migratetype)
> -{
> -	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
> -	if (is_migrate_cma(migratetype))
> -		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
> -}
> -
>  extern const char * const vmstat_text[];
>  
>  static inline const char *zone_stat_name(enum zone_stat_item item)
> diff --git a/mm/debug_page_alloc.c b/mm/debug_page_alloc.c
> index 6755f0c9d4a3..d46acf989dde 100644
> --- a/mm/debug_page_alloc.c
> +++ b/mm/debug_page_alloc.c
> @@ -32,8 +32,7 @@ static int __init debug_guardpage_minorder_setup(char *buf)
>  }
>  early_param("debug_guardpage_minorder", debug_guardpage_minorder_setup);
>  
> -bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -		      int migratetype)
> +bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order)
>  {
>  	if (order >= debug_guardpage_minorder())
>  		return false;
> @@ -41,19 +40,12 @@ bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
>  	__SetPageGuard(page);
>  	INIT_LIST_HEAD(&page->buddy_list);
>  	set_page_private(page, order);
> -	/* Guard pages are not available for any usage */
> -	if (!is_migrate_isolate(migratetype))
> -		__mod_zone_freepage_state(zone, -(1 << order), migratetype);
>  
>  	return true;
>  }
>  
> -void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -			int migratetype)
> +void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order)
>  {
>  	__ClearPageGuard(page);
> -
>  	set_page_private(page, 0);
> -	if (!is_migrate_isolate(migratetype))
> -		__mod_zone_freepage_state(zone, (1 << order), migratetype);
>  }
> diff --git a/mm/internal.h b/mm/internal.h
> index d6e6c7d9f04e..0a4007b03d0d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1036,11 +1036,6 @@ static inline bool is_migrate_highatomic(enum migratetype migratetype)
>  	return migratetype == MIGRATE_HIGHATOMIC;
>  }
>  
> -static inline bool is_migrate_highatomic_page(struct page *page)
> -{
> -	return get_pageblock_migratetype(page) == MIGRATE_HIGHATOMIC;
> -}
> -
>  void setup_zone_pageset(struct zone *zone);
>  
>  struct migration_target_control {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index efb2581ac142..c46491f83ac2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -642,42 +642,72 @@ compaction_capture(struct capture_control *capc, struct page *page,
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> -/* Used for pages not on another list */
> -static inline void add_to_free_list(struct page *page, struct zone *zone,
> -				    unsigned int order, int migratetype)
> +static inline void account_freepages(struct page *page, struct zone *zone,
> +				     int nr_pages, int migratetype)
>  {
> -	struct free_area *area = &zone->free_area[order];
> +	if (is_migrate_isolate(migratetype))
> +		return;
>  
> -	list_add(&page->buddy_list, &area->free_list[migratetype]);
> -	area->nr_free++;
> +	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
> +
> +	if (is_migrate_cma(migratetype))
> +		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
>  }
>  
>  /* Used for pages not on another list */
> -static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
> -					 unsigned int order, int migratetype)
> +static inline void __add_to_free_list(struct page *page, struct zone *zone,
> +				      unsigned int order, int migratetype,
> +				      bool tail)
>  {
>  	struct free_area *area = &zone->free_area[order];
>  
> -	list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), migratetype, 1 << order);
> +
> +	if (tail)
> +		list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	else
> +		list_add(&page->buddy_list, &area->free_list[migratetype]);
>  	area->nr_free++;
>  }
>  
> +static inline void add_to_free_list(struct page *page, struct zone *zone,
> +				    unsigned int order, int migratetype,
> +				    bool tail)
> +{
> +	__add_to_free_list(page, zone, order, migratetype, tail);
> +	account_freepages(page, zone, 1 << order, migratetype);
> +}
> +
>  /*
>   * Used for pages which are on another list. Move the pages to the tail
>   * of the list - so the moved pages won't immediately be considered for
>   * allocation again (e.g., optimization for memory onlining).
>   */
>  static inline void move_to_free_list(struct page *page, struct zone *zone,
> -				     unsigned int order, int migratetype)
> +				     unsigned int order, int old_mt, int new_mt)
>  {
>  	struct free_area *area = &zone->free_area[order];
>  
> -	list_move_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	/* Free page moving can fail, so it happens before the type update */
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), old_mt, 1 << order);
> +
> +	list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
> +
> +	account_freepages(page, zone, -(1 << order), old_mt);
> +	account_freepages(page, zone, 1 << order, new_mt);
>  }
>  
> -static inline void del_page_from_free_list(struct page *page, struct zone *zone,
> -					   unsigned int order)
> +static inline void __del_page_from_free_list(struct page *page, struct zone *zone,
> +					     unsigned int order, int migratetype)
>  {
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), migratetype, 1 << order);
> +
>  	/* clear reported state and update reported page count */
>  	if (page_reported(page))
>  		__ClearPageReported(page);
> @@ -688,6 +718,13 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
>  	zone->free_area[order].nr_free--;
>  }
>  
> +static inline void del_page_from_free_list(struct page *page, struct zone *zone,
> +					   unsigned int order, int migratetype)
> +{
> +	__del_page_from_free_list(page, zone, order, migratetype);
> +	account_freepages(page, zone, -(1 << order), migratetype);
> +}
> +
>  static inline struct page *get_page_from_free_area(struct free_area *area,
>  						   int migratetype)
>  {
> @@ -759,18 +796,16 @@ static inline void __free_one_page(struct page *page,
>  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>  
>  	VM_BUG_ON(migratetype == -1);
> -	if (likely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -
>  	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>  
> +	account_freepages(page, zone, 1 << order, migratetype);
> +
>  	while (order < MAX_PAGE_ORDER) {
> -		if (compaction_capture(capc, page, order, migratetype)) {
> -			__mod_zone_freepage_state(zone, -(1 << order),
> -						  migratetype);
> +		int buddy_mt = migratetype;
> +
> +		if (compaction_capture(capc, page, order, migratetype))
>  			return;
> -		}

IIUC, if the freed page is captured by compaction, the free page statistics
should be decreased correspondingly; otherwise there is a slight regression
in my thpcompact benchmark:

thpcompact Percentage Faults Huge
                           k6.9-rc2-base     base + patch10 + 2 fixes
Percentage huge-1        78.18 (  0.00%)       71.92 ( -8.01%)
Percentage huge-3        86.70 (  0.00%)       86.07 ( -0.73%)
Percentage huge-5        90.26 (  0.00%)       78.02 (-13.57%)
Percentage huge-7        92.34 (  0.00%)       78.67 (-14.81%)
Percentage huge-12       91.18 (  0.00%)       81.04 (-11.12%)
Percentage huge-18       89.00 (  0.00%)       79.57 (-10.60%)
Percentage huge-24       90.52 (  0.00%)       80.07 (-11.54%)
Percentage huge-30       94.44 (  0.00%)       96.28 (  1.95%)
Percentage huge-32       93.09 (  0.00%)       99.39 (  6.77%)

I added the fix below on top of your fix 2, and with it the thpcompact
percentages look good again. What do you think of this fix?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8330c5c2de6b..2facf844ef84 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
 	while (order < MAX_PAGE_ORDER) {
 		int buddy_mt = migratetype;
 
-		if (compaction_capture(capc, page, order, migratetype))
+		if (compaction_capture(capc, page, order, migratetype)) {
+			account_freepages(zone, -(1 << order), migratetype);
 			return;
+		}
 
 		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
 		if (!buddy)

With my fix, the THP percentage looks better:

                           k6.9-rc2-base     base + patch10 + 2 fixes + my fix
Percentage huge-1        78.18 (  0.00%)       82.83 (  5.94%)
Percentage huge-3        86.70 (  0.00%)       93.47 (  7.81%)
Percentage huge-5        90.26 (  0.00%)       94.73 (  4.95%)
Percentage huge-7        92.34 (  0.00%)       95.22 (  3.12%)
Percentage huge-12       91.18 (  0.00%)       92.40 (  1.34%)
Percentage huge-18       89.00 (  0.00%)       85.39 ( -4.06%)
Percentage huge-24       90.52 (  0.00%)       94.70 (  4.61%)
Percentage huge-30       94.44 (  0.00%)       97.00 (  2.71%)
Percentage huge-32       93.09 (  0.00%)       92.87 ( -0.24%)
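To spell out the imbalance this addresses (a simplified sketch only, argument
lists abbreviated, not the exact code of the series): __free_one_page() now
counts the page as free up front, so the early return that hands the page to
the compaction capturer, without it ever being linked into a freelist, has to
drop that count again, otherwise NR_FREE_PAGES is left inflated by 1 << order
on every capture:

	/* Sketch of __free_one_page() with the fix applied; not the exact code. */
	account_freepages(zone, 1 << order, migratetype);	/* counted as free */

	while (order < MAX_PAGE_ORDER) {
		if (compaction_capture(capc, page, order, migratetype)) {
			/*
			 * The page goes straight to the capturer and never
			 * reaches a freelist, so undo the count taken above.
			 */
			account_freepages(zone, -(1 << order), migratetype);
			return;
		}
		/* ... otherwise merge buddies and add_to_free_list() ... */
	}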