From: "Huang, Ying"
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org,
    david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com,
    linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
    ryan.roberts@arm.com, shy828301@gmail.com, surenb@google.com,
    kaleshsingh@google.com, hughd@google.com, v-songbaohua@oppo.com,
    willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
    baolin.wang@linux.alibaba.com, shakeel.butt@linux.dev,
    senozhatsky@chromium.org, minchan@kernel.org
Subject: Re: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile
In-Reply-To: (Barry Song's message of "Wed, 3 Jul 2024 19:58:51 +1200")
References: <20240629111010.230484-1-21cnbao@gmail.com>
    <87ikxnj8az.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 04 Jul 2024 09:40:03 +0800
Message-ID: <8734oqhr4c.fsf@yhuang6-desk2.ccr.corp.intel.com>

Barry Song <21cnbao@gmail.com> writes:

> On Wed, Jul 3, 2024 at 6:33 PM Huang, Ying wrote:
>>
>
> Ying, thanks!
>
>> Barry Song <21cnbao@gmail.com> writes:

[snip]

>> > This patch introduces mTHP swap-in support. For now, we limit mTHP
>> > swap-ins to contiguous swaps that were likely swapped out from mTHP as
>> > a whole.
>> >
>> > Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
>> > case. This is the simplest and most common use case, benefiting millions
>>
>> I admit that Android is an important target platform of Linux kernel.
>> But I will not advocate that it's MOST common ...
>
> Okay, I understand that there are still many embedded systems similar
> to Android, even if they are not Android :-)
>
>>
>> > of Android phones and similar devices with minimal implementation
>> > cost. In this straightforward scenario, large folios are always exclusive,
>> > eliminating the need to handle complex rmap and swapcache issues.
>> >
>> > It offers several benefits:
>> > 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
>> >    swap-out and swap-in.
>> > 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
>> >    without fragmentation. Based on the observed data [1] on Chris's and Ryan's
>> >    THP swap allocation optimization, aligned swap-in plays a crucial role
>> >    in the success of THP_SWPOUT.
>> > 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
>> >    and enhancing compression ratios significantly. We have another patchset
>> >    to enable mTHP compression and decompression in zsmalloc/zRAM[2].
>> >
>> > Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
>> > to be an optimal approach. There's a critical distinction between pagecache
>> > and anonymous pages: pagecache can be evicted and later retrieved from disk,
>> > potentially becoming a mTHP upon retrieval, whereas anonymous pages must
>> > always reside in memory or swapfile. If we swap in small folios and identify
>> > adjacent memory suitable for swapping in as mTHP, those pages that have been
>> > converted to small folios may never transition to mTHP. The process of
>> > converting mTHP into small folios remains irreversible. This introduces
>> > the risk of losing all mTHP through several swap-out and swap-in cycles,
>> > let alone losing the benefits of defragmentation, improved compression
>> > ratios, and reduced CPU usage based on mTHP compression/decompression.
>>
>> I understand that the most optimal policy in your use cases may be
>> always swapping-in mTHP in highest order. But, it may not be in some
>> other use cases. For example, relatively slow swap devices, non-fault
>> sub-pages swapped out again before usage, etc.
>>
>> So, IMO, the default policy should be the one that can adapt to the
>> requirements automatically. For example, if most non-fault sub-pages
>> will be read/written before being swapped out again, we should swap-in
>> in larger order, otherwise in smaller order. Swap readahead is one
>> possible way to do that. But, I admit that this may not work perfectly
>> in your use cases.
>>
>> Previously I hoped that we could start with this automatic policy that
>> helps everyone, then check whether it can satisfy your requirements
>> before implementing the optimal policy for you. But it appears that you
>> don't agree with this.
>>
>> Based on the above, IMO, we should not use your policy as default at
>> least for now. A user space interface can be implemented to select
>> different swap-in order policy similar to that of mTHP allocation order
>> policy. We need a different policy because the performance characteristics
>> of the memory allocation are quite different from those of swap-in. For
>> example, the SSD reading could be much slower than the memory
>> allocation. With the policy selection, I think that we can implement
>> mTHP swap-in for non-SWAP_SYNCHRONOUS too. Users need to know what they
>> are doing.
>
> Agreed. Ryan also suggested something similar before.
> Could we add this user policy by:
>
> /sys/kernel/mm/transparent_hugepage/hugepages-/swapin_enabled
> which could be 0 or 1, I assume we don't need so many "always inherit
> madvise never"?
>
> Do you have any suggestions regarding the user interface?

/sys/kernel/mm/transparent_hugepage/hugepages-/swapin_enabled looks good to me.
To be consistent with "enabled" in the same directory and, more
importantly, to be extensible, I think that it's better to start with at
least "always never". I believe that we will add "auto" in the future
to tune automatically, and that it can eventually become the default.

--
Best Regards,
Huang, Ying
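
A minimal sketch of how a user-space tool might read and build the value
of such a knob. It assumes that swapin_enabled would follow the same
choice-list format the existing "enabled" file uses, where all accepted
values are listed and the active one is bracketed (for example
"always inherit madvise [never]"). The swapin_enabled file is only a
proposal in this thread, and the helper names parse_policy and
format_policy are hypothetical.

```python
def parse_policy(text):
    """Parse a sysfs-style choice list such as 'always [never]'.

    Returns (active, choices). Assumes exactly one bracketed token marks
    the active value, as the existing THP "enabled" file does.
    """
    choices = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            if active is not None:
                raise ValueError("more than one active value")
            active = token
        choices.append(token)
    if active is None:
        raise ValueError("no active value marked")
    return active, choices


def format_policy(active, choices):
    """Render the choice list with the active value bracketed."""
    if active not in choices:
        raise ValueError(f"unknown policy: {active}")
    return " ".join(f"[{c}]" if c == active else c for c in choices)


# Starting with "always never" leaves room to append "auto" later
# without changing how existing readers parse the file.
print(parse_policy("always [never]"))
print(format_policy("auto", ["always", "never", "auto"]))
```

Keeping a list of named values instead of 0/1 is what makes the later
addition of "auto" a compatible change for readers written this way.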