From: "Huang, Ying"
To: Chris Li
Cc: Ryan Roberts, Andrew Morton, Kairui Song, Hugh Dickins, Kalesh Singh,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
Subject: Re: [PATCH v4 2/3] mm: swap: mTHP allocate swap entries from nonfull list
In-Reply-To: (Chris Li's message of "Thu, 25 Jul 2024 21:50:56 -0700")
Date: Fri, 26 Jul 2024 13:52:02 +0800
Message-ID: <87o76k3dkt.fsf@yhuang6-desk2.ccr.corp.intel.com>

Chris Li writes:

> On Thu, Jul 25, 2024 at 7:07 PM Huang, Ying wrote:
>> > If swap entries are freed with a random distribution, you need 16
>> > contiguous swap entries free at the same time at a 16-entry-aligned
>> > base location. The total amount of order-4 free swap space, added up,
>> > is much lower than the order-0 allocatable swap space.
>> > If one entry is free with 50% probability (swapfile half full), then
>> > the probability that 16 contiguous swap entries are free is
>> > 0.5^16 = 1.5e-5.
>> > If the swapfile is 80% full, that number drops to 6.5e-12.
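As a rough check of the arithmetic quoted above, here is a minimal sketch.
It assumes every swap entry is free independently with probability
(1 - fill fraction), which is the random-distribution assumption of the
quoted text and not a measured workload. The function name and its
parameters are hypothetical and exist only for illustration.

```python
def aligned_run_free_probability(fill_fraction: float, order: int) -> float:
    """Probability that all 2**order entries of one aligned slot are free.

    Assumption (from the quoted text, not from measurement): each entry is
    free independently with probability 1 - fill_fraction.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be within [0, 1]")
    if order < 0:
        raise ValueError("order must be non-negative")
    free_probability = 1.0 - fill_fraction
    entries_needed = 1 << order  # an order-4 allocation needs 16 entries
    return free_probability ** entries_needed


# The two cases from the quoted text: swapfile half full and 80% full.
for fill in (0.5, 0.8):
    p = aligned_run_free_probability(fill, order=4)
    print(f"swapfile {fill:.0%} full: P(order-4 slot fully free) = {p:.2e}")
```

With spatial locality the independence assumption no longer holds, so real
workloads can deviate from these numbers in either direction.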
>>
>> This depends on workloads.  Quite a few workloads show some degree
>> of spatial locality.  For a workload with no spatial locality at all,
>> as above, mTHP may not be a good choice in the first place.
>
> The fragmentation comes from the order-0 entries, not from the mTHP. mTHP
> has its own valid use cases, and should be kept separate from how you
> use the order-0 entries. That is why I consider that this kind of strategy
> only works in the lucky case. I would much prefer a strategy that
> is guaranteed to work and does not depend on luck.

It seems that you have a perfect solution in mind.  I will learn about
it when you post it.

>> >> - Order-4 pages need to be swapped out, but not enough order-4 non-full
>> >>   clusters are available.
>> >
>> > Exactly.
>> >
>> >>
>> >> So, we need a way to migrate non-full clusters among orders to adjust to
>> >> the various situations automatically.
>> >
>> > There is no easy way to migrate swap entries to different locations.
>> > That is why I like to have discontiguous swap entry allocation for
>> > mTHP.
>>
>> We suggest migrating non-full swap clusters among different lists, not
>> swap entries.
>
> Then you have the downside of reducing the total number of high-order
> clusters. By chance, it is much easier to fragment a cluster than to
> anti-fragment one. The orders of clusters have a natural
> tendency to move down rather than up, given a long enough time of
> random access. It will likely run out of high-order clusters in the
> long run if we don't have any separation of orders.

As in my example above, you may have almost 0 high-order clusters forever.
So, your solution only works for very specific use cases.  It's not a
general solution.

>>
>> But yes, data is needed for any performance-related change.
>>
>> BTW: I think non-full cluster isn't a good name.  Partial cluster is
>> much better and follows the same convention as partial slab.
>
> I am not opposed to it.
> The only reason I am holding off on the rename is
> because there are patches from Kairui that I am testing which depend on it.
> Let's finish up the V5 patch with the swap cache reclaim code path,
> then do the renaming as one batch job. We actually have more than one
> list that holds partially full clusters. It helps reduce repeated
> scans of a cluster that is not full but also not able to allocate
> swap entries for this order. Naming just one of them
> "partial" is not precise either, because the other lists are also
> partially full. We'd better give them precise meanings systematically.

I don't think that it's hard to do a search/replace before the next
version.

--
Best Regards,
Huang, Ying