From: "Huang, Ying"
To: Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zi Yan, Ryan Roberts,
    Andrew Morton, "Matthew Wilcox (Oracle)", David Hildenbrand,
    "Yin, Fengwei", Yu Zhao, Vlastimil Babka, Johannes Weiner, Baolin Wang,
    Kemeng Shi, Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
    John Hubbard
Subject: Re: [RFC PATCH 1/4] mm/compaction: add support for >0 order folio memory compaction.
References: <20230912162815.440749-1-zi.yan@sent.com> <20230912162815.440749-2-zi.yan@sent.com>
Date: Tue, 10 Oct 2023 16:07:00 +0800
In-Reply-To: <20230912162815.440749-2-zi.yan@sent.com> (Zi Yan's message of "Tue, 12 Sep 2023 12:28:12 -0400")
Message-ID: <87mswqhpej.fsf@yhuang6-desk2.ccr.corp.intel.com>

Zi Yan writes:

> From: Zi Yan
>
> Before, memory compaction only migrates order-0 folios and skips >0 order
> folios. This commit adds support for >0 order folio compaction by keeping
> isolated free pages at their original size without splitting them into
> order-0 pages and using them directly during migration process.
>
> What is different from the prior implementation:
> 1. All isolated free pages are kept in a MAX_ORDER+1 array of page lists,
>    where each page list stores free pages in the same order.
> 2. All free pages are not post_alloc_hook() processed nor buddy pages,
>    although their orders are stored in first page's private like buddy
>    pages.
> 3. During migration, in new page allocation time (i.e., in
>    compaction_alloc()), free pages are then processed by post_alloc_hook().
>    When migration fails and a new page is returned (i.e., in
>    compaction_free()), free pages are restored by reversing the
>    post_alloc_hook() operations.
>
> Step 3 is done for a latter optimization that splitting and/or merging free
> pages during compaction becomes easier.
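
To make the per-order bookkeeping described in the commit message concrete, here is a minimal userspace sketch. It is not kernel code and not the patch's implementation: `SKETCH_MAX_ORDER`, `struct list_node`, `struct free_block`, `struct free_lists` and the `free_lists_*()` helpers are all hypothetical names made up for illustration, and the tiny list implementation only imitates the kernel's `list_head`. The sketch keeps one list per order, files each isolated block under its own order without splitting it, and hands out only a block of exactly the requested order.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER 10	/* assumption: stands in for MAX_ORDER */

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical stand-in for one isolated free page block. */
struct free_block {
	struct list_node lru;
	unsigned int order;	/* block covers 1 << order base pages */
};

/* One list per order, plus the total number of base pages held. */
struct free_lists {
	struct list_node pages[SKETCH_MAX_ORDER + 1];
	unsigned long nr_base_pages;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Insert node right after head. */
static void list_push(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_unlink(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

static void free_lists_init(struct free_lists *fl)
{
	for (int order = 0; order <= SKETCH_MAX_ORDER; order++)
		list_init(&fl->pages[order]);
	fl->nr_base_pages = 0;
}

/* File an isolated block under its own order, without splitting it. */
static void free_lists_add(struct free_lists *fl, struct free_block *blk)
{
	assert(blk->order <= SKETCH_MAX_ORDER);
	list_push(&blk->lru, &fl->pages[blk->order]);
	fl->nr_base_pages += 1UL << blk->order;
}

/* Take a block of exactly the requested order, or NULL if none is held. */
static struct free_block *free_lists_take(struct free_lists *fl,
					  unsigned int order)
{
	struct list_node *node;

	if (order > SKETCH_MAX_ORDER || list_is_empty(&fl->pages[order]))
		return NULL;

	node = fl->pages[order].next;
	list_unlink(node);
	fl->nr_base_pages -= 1UL << order;
	/* Recover the containing block from its embedded list node. */
	return (struct free_block *)((char *)node -
				     offsetof(struct free_block, lru));
}
```

In this sketch a request for order 1 fails even when an order-2 block is held, because nothing is split. Emptiness of each per-order list is tested on the list itself, which is a simplification chosen for the sketch, and the total is counted in base pages, mirroring the `1 << order` adjustments the patch makes to `cc->nr_freepages`.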
>
> Signed-off-by: Zi Yan
> ---
>  mm/compaction.c | 108 +++++++++++++++++++++++++++++++++++++++---------
>  mm/internal.h   |   7 +++-
>  2 files changed, 94 insertions(+), 21 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 01ba298739dd..868e92e55d27 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -107,6 +107,44 @@ static void split_map_pages(struct list_head *list)
>  	list_splice(&tmp_list, list);
>  }
>
> +static unsigned long release_free_list(struct free_list *freepages)
> +{
> +	int order;
> +	unsigned long high_pfn = 0;
> +
> +	for (order = 0; order <= MAX_ORDER; order++) {
> +		struct page *page, *next;
> +
> +		list_for_each_entry_safe(page, next, &freepages[order].pages, lru) {
> +			unsigned long pfn = page_to_pfn(page);
> +
> +			list_del(&page->lru);
> +			/*
> +			 * Convert free pages into post allocation pages, so
> +			 * that we can free them via __free_page.
> +			 */
> +			post_alloc_hook(page, order, __GFP_MOVABLE);
> +			__free_pages(page, order);
> +			if (pfn > high_pfn)
> +				high_pfn = pfn;
> +		}
> +	}
> +	return high_pfn;
> +}
> +
> +static void sort_free_pages(struct list_head *src, struct free_list *dst)
> +{
> +	unsigned int order;
> +	struct page *page, *next;
> +
> +	list_for_each_entry_safe(page, next, src, lru) {
> +		order = buddy_order(page);
> +
> +		list_move(&page->lru, &dst[order].pages);
> +		dst[order].nr_free++;
> +	}
> +}
> +
>  #ifdef CONFIG_COMPACTION
>  bool PageMovable(struct page *page)
>  {
> @@ -1422,6 +1460,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>  {
>  	unsigned long start_pfn, end_pfn;
>  	struct page *page;
> +	LIST_HEAD(freelist);
>
>  	/* Do not search around if there are enough pages already */
>  	if (cc->nr_freepages >= cc->nr_migratepages)
> @@ -1439,7 +1478,8 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>  	if (!page)
>  		return;
>
> -	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
> +	isolate_freepages_block(cc, &start_pfn, end_pfn, &freelist, 1, false);
> +	sort_free_pages(&freelist, cc->freepages);
>
>  	/* Skip this pageblock in the future as it's full or nearly full */
>  	if (start_pfn == end_pfn && !cc->no_set_skip_hint)
> @@ -1568,7 +1608,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
>  				nr_scanned += nr_isolated - 1;
>  				total_isolated += nr_isolated;
>  				cc->nr_freepages += nr_isolated;
> -				list_add_tail(&page->lru, &cc->freepages);
> +				list_add_tail(&page->lru, &cc->freepages[order].pages);
>  				count_compact_events(COMPACTISOLATED, nr_isolated);
>  			} else {
>  				/* If isolation fails, abort the search */
> @@ -1642,13 +1682,13 @@ static void isolate_freepages(struct compact_control *cc)
>  	unsigned long isolate_start_pfn; /* exact pfn we start at */
>  	unsigned long block_end_pfn; /* end of current pageblock */
>  	unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> -	struct list_head *freelist = &cc->freepages;
>  	unsigned int stride;
> +	LIST_HEAD(freelist);
>
>  	/* Try a small search of the free lists for a candidate */
>  	fast_isolate_freepages(cc);
>  	if (cc->nr_freepages)
> -		goto splitmap;
> +		return;
>
>  	/*
>  	 * Initialise the free scanner. The starting point is where we last
> @@ -1708,7 +1748,8 @@ static void isolate_freepages(struct compact_control *cc)
>
>  		/* Found a block suitable for isolating free pages from. */
>  		nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> -					block_end_pfn, freelist, stride, false);
> +					block_end_pfn, &freelist, stride, false);
> +		sort_free_pages(&freelist, cc->freepages);
>
>  		/* Update the skip hint if the full pageblock was scanned */
>  		if (isolate_start_pfn == block_end_pfn)
> @@ -1749,10 +1790,6 @@ static void isolate_freepages(struct compact_control *cc)
>  	 * and the loop terminated due to isolate_start_pfn < low_pfn
>  	 */
>  	cc->free_pfn = isolate_start_pfn;
> -
> -splitmap:
> -	/* __isolate_free_page() does not map the pages */
> -	split_map_pages(freelist);
>  }
>
>  /*
> @@ -1763,18 +1800,21 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>  {
>  	struct compact_control *cc = (struct compact_control *)data;
>  	struct folio *dst;
> +	int order = folio_order(src);
>
> -	if (list_empty(&cc->freepages)) {
> +	if (!cc->freepages[order].nr_free) {
>  		isolate_freepages(cc);
> -
> -		if (list_empty(&cc->freepages))
> +		if (!cc->freepages[order].nr_free)
>  			return NULL;
>  	}
>
> -	dst = list_entry(cc->freepages.next, struct folio, lru);
> +	dst = list_first_entry(&cc->freepages[order].pages, struct folio, lru);
> +	cc->freepages[order].nr_free--;
>  	list_del(&dst->lru);
> -	cc->nr_freepages--;
> -
> +	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
> +	if (order)
> +		prep_compound_page(&dst->page, order);
> +	cc->nr_freepages -= 1 << order;
>  	return dst;
>  }
>
> @@ -1786,9 +1826,34 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>  static void compaction_free(struct folio *dst, unsigned long data)
>  {
>  	struct compact_control *cc = (struct compact_control *)data;
> +	int order = folio_order(dst);
> +	struct page *page = &dst->page;
>
> -	list_add(&dst->lru, &cc->freepages);
> -	cc->nr_freepages++;
> +	if (order) {
> +		int i;
> +
> +		page[1].flags &= ~PAGE_FLAGS_SECOND;
> +		for (i = 1; i < (1 << order); i++) {
> +			page[i].mapping = NULL;
> +			clear_compound_head(&page[i]);
> +			page[i].flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +		}
> +
> +	}
> +	/* revert post_alloc_hook() operations */
> +	page->mapping = NULL;
> +	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +	set_page_count(page, 0);
> +	page_mapcount_reset(page);
> +	reset_page_owner(page, order);
> +	page_table_check_free(page, order);
> +	arch_free_page(page, order);
> +	set_page_private(page, order);
> +	INIT_LIST_HEAD(&dst->lru);
> +
> +	list_add(&dst->lru, &cc->freepages[order].pages);
> +	cc->freepages[order].nr_free++;
> +	cc->nr_freepages += 1 << order;
>  }
>
>  /* possible outcome of isolate_migratepages */
> @@ -2412,6 +2477,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	const bool sync = cc->mode != MIGRATE_ASYNC;
>  	bool update_cached;
>  	unsigned int nr_succeeded = 0;
> +	int order;
>
>  	/*
>  	 * These counters track activities during zone compaction. Initialize
> @@ -2421,7 +2487,10 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	cc->total_free_scanned = 0;
>  	cc->nr_migratepages = 0;
>  	cc->nr_freepages = 0;
> -	INIT_LIST_HEAD(&cc->freepages);
> +	for (order = 0; order <= MAX_ORDER; order++) {
> +		INIT_LIST_HEAD(&cc->freepages[order].pages);
> +		cc->freepages[order].nr_free = 0;
> +	}
>  	INIT_LIST_HEAD(&cc->migratepages);
>
>  	cc->migratetype = gfp_migratetype(cc->gfp_mask);
> @@ -2607,7 +2676,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	 * so we don't leave any returned pages behind in the next attempt.
>  	 */
>  	if (cc->nr_freepages > 0) {
> -		unsigned long free_pfn = release_freepages(&cc->freepages);
> +		unsigned long free_pfn = release_free_list(cc->freepages);
>
>  		cc->nr_freepages = 0;
>  		VM_BUG_ON(free_pfn == 0);
> @@ -2626,7 +2695,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>
>  	trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
>
> -	VM_BUG_ON(!list_empty(&cc->freepages));
>  	VM_BUG_ON(!list_empty(&cc->migratepages));
>
>  	return ret;
> diff --git a/mm/internal.h b/mm/internal.h
> index 8c90e966e9f8..f5c691bb5c1c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -465,6 +465,11 @@ int split_free_page(struct page *free_page,
>  /*
>   * in mm/compaction.c
>   */
> +
> +struct free_list {
> +	struct list_head pages;
> +	unsigned long nr_free;

Do we really need nr_free?  Is it enough just to use
list_empty(&free_list->pages)?

> +};
>  /*
>   * compact_control is used to track pages being migrated and the free pages
>   * they are being migrated to during memory compaction. The free_pfn starts
> @@ -473,7 +478,7 @@ int split_free_page(struct page *free_page,
>   * completes when free_pfn <= migrate_pfn
>   */
>  struct compact_control {
> -	struct list_head freepages; /* List of free pages to migrate to */
> +	struct free_list freepages[MAX_ORDER + 1]; /* List of free pages to migrate to */
>  	struct list_head migratepages; /* List of pages being migrated */
>  	unsigned int nr_freepages; /* Number of isolated free pages */
>  	unsigned int nr_migratepages; /* Number of pages to migrate */

--
Best Regards,
Huang, Ying