From: "Huang, Ying"
To: Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zi Yan, Ryan Roberts, Andrew Morton, "Matthew Wilcox (Oracle)", David Hildenbrand, "Yin, Fengwei", Yu Zhao, Vlastimil Babka, Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares, John Hubbard
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction
References: <20230912162815.440749-1-zi.yan@sent.com>
Date: Mon, 09 Oct 2023 15:12:30 +0800
In-Reply-To: <20230912162815.440749-1-zi.yan@sent.com> (Zi Yan's message of "Tue, 12 Sep 2023 12:28:11 -0400")
Message-ID: <87a5ssjmld.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Zi,

Thanks for your patch!

Zi Yan writes:

> From: Zi Yan
>
> Hi all,
>
> This patchset enables >0 order folio memory compaction, which is one of
> the prerequisites for large folio support[1]. It is on top of
> mm-everything-2023-09-11-22-56.
>
> Overview
> ===
>
> To support >0 order folio compaction, the patchset changes how free pages used
> for migration are kept during compaction.

migrate_pages() can split a large folio on allocation failure. So
the minimal implementation could be

- allow large folios to be migrated in compaction
- return -ENOMEM for order > 0 in compaction_alloc()

The performance may not be desirable. But that could serve as a
baseline for further optimization.

And, if we can measure the performance for each step of optimization,
that will be even better.

> Free pages used to be split into
> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
> page order stored in page->private is zeroed, and page reference is set to 1).
> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
> on their order without post allocation process. When migrate_pages() asks for
> a new page, one of the free pages, based on the requested page order, is
> then processed and given out.
>
>
> Optimizations
> ===
>
> 1. Free page split is added to increase migration success rate in case
> a source page does not have a matched free page in the free page lists.
> Free page merge is possible but not implemented, since existing
> PFN-based buddy page merge algorithm requires the identification of
> buddy pages, but free pages kept for memory compaction cannot have
> PageBuddy set to avoid confusing other PFN scanners.
>
> 2. Sort source pages in ascending order before migration is added to

Trivial.

s/ascending/descending/

> reduce free page split. Otherwise, high order free pages might be
> prematurely split, causing undesired high order folio migration failures.
>
>
> TODOs
> ===
>
> 1. Refactor free page post allocation and free page preparation code so
> that compaction_alloc() and compaction_free() can call functions instead
> of hard coding.
>
> 2. One possible optimization is to allow migrate_pages() to continue
> even if get_new_folio() returns a NULL. In general, that means there is
> not enough memory. But in >0 order folio compaction case, that means
> there is no suitable free page at source page order. It might be better
> to skip that page and finish the rest of migration to achieve a better
> compaction result.

We can split the source folio if get_new_folio() returns NULL. So, do
we really need this?

In general, we may reconsider all further optimizations given that
splitting is already available.

> 3. Another possible optimization is to enable free page merge. It is
> possible that a to-be-migrated page causes free page split then fails to
> migrate eventually. We would lose a high order free page without free
> page merge function. But a way of identifying free pages for memory
> compaction is needed to reuse existing PFN-based buddy page merge.
>
> 4. The implemented >0 order folio compaction algorithm is quite naive
> and does not consider all possible situations. A better algorithm can
> improve compaction success rate.
>
>
> Feel free to give comments and ask questions.
>
> Thanks.
>
>
> [1] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
> Zi Yan (4):
>   mm/compaction: add support for >0 order folio memory compaction.
>   mm/compaction: optimize >0 order folio compaction with free page
>     split.
>   mm/compaction: optimize >0 order folio compaction by sorting source
>     pages.
>   mm/compaction: enable compacting >0 order folios.
>
>  mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
>  mm/internal.h   |   7 +-
>  2 files changed, 176 insertions(+), 36 deletions(-)

--
Best Regards,
Huang, Ying
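
To make the ordering point above concrete, here is a minimal userspace
sketch. It is not code from the patchset: it only models the per-order
free lists as counters and shows why handing out destinations for the
largest source folios first avoids splitting a high order free block too
early. All names (struct free_lists, take_free_block(), migrate_all())
are hypothetical, and MAX_ORDER is fixed at 10 here purely as an
assumption for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ORDER 10	/* assumed value for this sketch only */

/* Toy model of the per-order free page lists: counts blocks, no real pages. */
struct free_lists {
	unsigned int nr[MAX_ORDER + 1];	/* nr[o] = free blocks of order o */
};

/*
 * Take one free block of exactly @order.  If that list is empty, split the
 * smallest larger block; each halving leaves one buddy on the next lower list.
 * Returns 0 on success, -1 if no block of order >= @order is left.
 */
static int take_free_block(struct free_lists *fl, unsigned int order)
{
	unsigned int o = order;

	assert(order <= MAX_ORDER);
	while (o <= MAX_ORDER && fl->nr[o] == 0)
		o++;
	if (o > MAX_ORDER)
		return -1;

	fl->nr[o]--;
	while (o > order) {
		o--;
		fl->nr[o]++;	/* unused half stays free one order lower */
	}
	return 0;
}

/* qsort() comparator: largest order first. */
static int cmp_order_desc(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x < y) - (x > y);
}

/*
 * Try to find a destination for each of @n source folios, given by order.
 * @fl is passed by value, so the caller's lists are not modified.
 * Returns the number of base pages that found a destination.
 */
static size_t migrate_all(struct free_lists fl, const unsigned int *orders,
			  size_t n, int sort_desc)
{
	unsigned int *tmp = malloc(n * sizeof(*tmp));
	size_t i, pages = 0;

	if (!tmp)
		return 0;
	memcpy(tmp, orders, n * sizeof(*tmp));
	if (sort_desc)
		qsort(tmp, n, sizeof(*tmp), cmp_order_desc);
	for (i = 0; i < n; i++)
		if (take_free_block(&fl, tmp[i]) == 0)
			pages += (size_t)1 << tmp[i];
	free(tmp);
	return pages;
}
```

With one free order-2 block and one free order-0 block, source folios of
order {0, 0, 2} taken in that order cause the order-2 block to be split
for the second order-0 folio, so the order-2 folio finds no destination.
Taking the same folios largest first places the order-2 folio and one
order-0 folio without any split.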