From: "Huang, Ying" <ying.huang@intel.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Matthew Wilcox, akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com, chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com, xiang@kernel.org, yosryahmed@google.com, yuzhao@google.com, Chuanhua Han, Barry Song
Subject: Re: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole
In-Reply-To: (Barry Song's message of "Fri, 15 Mar 2024 21:54:29 +1300")
References: <20240304081348.197341-1-21cnbao@gmail.com> <20240304081348.197341-6-21cnbao@gmail.com> <87wmq3yji6.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 15 Mar 2024 17:15:33 +0800
Message-ID: <87sf0rx3d6.fsf@yhuang6-desk2.ccr.corp.intel.com>
Barry Song <21cnbao@gmail.com> writes:

> On Fri, Mar 15, 2024 at 9:43 PM Huang, Ying wrote:
>>
>> Barry Song <21cnbao@gmail.com> writes:
>>
>> > From: Chuanhua Han
>> >
>> > On an embedded system like Android, more than half of anon memory is
>> > actually in swap devices such as zRAM. For example, while an app is
>> > switched to the background, most of its memory might be swapped out.
>> >
>> > Now we have the mTHP feature. Unfortunately, if we don't support
>> > large folio swap-in, then once those large folios are swapped out we
>> > immediately lose the performance gain we get from large folios and
>> > from hardware optimizations such as CONT-PTE.
>> >
>> > This patch brings up mTHP swap-in support. Right now, we limit mTHP
>> > swap-in to contiguous swap entries which were likely swapped out
>> > from an mTHP as a whole.
>> >
>> > Meanwhile, the current implementation only covers the SWAP_SYNCHRONOUS
>> > case.
>> > It doesn't support swapin_readahead as large folios yet, since this
>> > kind of shared memory is much smaller than the memory mapped by a
>> > single process.
>>
>> In contrast, I still think that it's better to start with the normal
>> swap-in path, then expand to the SWAP_SYNCHRONOUS case.
>
> I'd rather try the reverse direction, as non-sync anon memory is only
> around 3% on a phone, per my observation.

A phone is not the only platform that Linux runs on.

>>
>> In the normal swap-in path, we can take advantage of swap readahead
>> information to determine the order of the swapped-in large folio.
>> That is, if the return value of swapin_nr_pages() > 1, then we can
>> try to allocate and swap in a large folio.
>
> I am not quite sure we still need to depend on this. In
> do_anonymous_page, we have already broken that assumption and
> allocate a large folio directly.

I don't think that we have a sophisticated policy for allocating large
folios yet. Large folios can waste memory for some workloads, so I
don't think it's a good idea to always allocate a large folio.
Readahead gives us an opportunity to play with the policy.

> On the other hand, compressing/decompressing large folios as a whole
> rather than one page at a time can save a large percentage of CPU
> time and provide a much better compression ratio. With a hardware
> accelerator, this is even faster.

I am not against supporting large folios for compressing/decompressing.
I just suggest doing that later, after we experiment with the normal
swap-in path. The SWAP_SYNCHRONOUS swap-in code is an optimization on
top of normal swap-in, so it seems natural to support large folio
swap-in for the normal path first.

> So I'd rather get large folio swap-in involved more aggressively than
> depending on readahead.

We can take advantage of the readahead algorithm in the
SWAP_SYNCHRONOUS optimization too. The sub-pages that are not accessed
by the page fault can be treated as readahead. I think that is a
better policy than always allocating a large folio.
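To make the order-selection idea concrete, here is a minimal userspace C model of the policy sketched above: pick the largest folio order that does not exceed the readahead hint and that is backed by a naturally aligned, fully contiguous run of swap entries. This is purely illustrative; `swapin_folio_order`, `swapin_nr_pages_hint`, and the `entry_valid` array are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model of the policy discussed above: choose the largest
 * folio order that (a) does not exceed the readahead hint and (b) is
 * backed by contiguous, still-valid swap entries aligned to that order. */

#define MAX_SWAPIN_ORDER 4  /* e.g. up to 16 pages (64KiB with 4KiB pages) */

/* entry_valid[i] models whether slot i of the naturally aligned block
 * containing the fault still holds a valid entry from the same
 * swapped-out mTHP. */
static bool range_contiguous(const bool *entry_valid, int nr)
{
	for (int i = 0; i < nr; i++)
		if (!entry_valid[i])
			return false;
	return true;
}

/* ra_hint: number of pages the readahead heuristic would read in
 * (a stand-in for swapin_nr_pages()); offset: the faulting swap slot
 * offset, used for the natural alignment that mTHP requires. */
static int swapin_folio_order(int ra_hint, unsigned long offset,
			      const bool *entry_valid)
{
	int order = 0;

	/* Largest power of two <= ra_hint, capped at MAX_SWAPIN_ORDER. */
	while ((2 << order) <= ra_hint && order + 1 <= MAX_SWAPIN_ORDER)
		order++;

	/* Shrink until the block is naturally aligned and contiguous. */
	while (order > 0 &&
	       ((offset & ((1UL << order) - 1)) ||
		!range_contiguous(entry_valid, 1 << order)))
		order--;

	return order;
}
```

With a readahead hint of 8 pages and an aligned, fully contiguous block this picks order 3; a hole in the swap entries or a misaligned fault address makes it fall back to a smaller order, down to order 0 (a single page), which mirrors the "readahead as policy, single page as fallback" argument above.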
>>
>> To do that, we need to track whether the sub-pages are accessed. I
>> guess we need that information for large file folio readahead too.
>>
>> Hi, Matthew,
>>
>> Can you help us with tracking whether the sub-pages of a readahead
>> large folio have been accessed?
>>
>> > Right now, we are re-faulting large folios which are still in the
>> > swapcache as a whole. This can effectively reduce the extra loops
>> > and early exits we introduced in arch_swap_restore() while
>> > supporting MTE restore for folios rather than pages. On the other
>> > hand, it can also reduce do_swap_page calls, as PTEs used to be set
>> > one by one even when we hit a large folio in the swapcache.
>> >

--
Best Regards,
Huang, Ying