From: "Huang, Ying"
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org,
    david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com,
    linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
    ryan.roberts@arm.com, shy828301@gmail.com, surenb@google.com,
    kaleshsingh@google.com, hughd@google.com, v-songbaohua@oppo.com,
    willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
    baolin.wang@linux.alibaba.com, shakeel.butt@linux.dev,
    senozhatsky@chromium.org, minchan@kernel.org
Subject: Re: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile
In-Reply-To: <20240629111010.230484-1-21cnbao@gmail.com> (Barry Song's message of "Sat, 29 Jun 2024 23:10:08 +1200")
Date: Wed, 03 Jul 2024 14:31:16 +0800
Message-ID: <87ikxnj8az.fsf@yhuang6-desk2.ccr.corp.intel.com>

Barry Song <21cnbao@gmail.com> writes:

> From: Barry Song
>
> In an embedded system like Android, more than half of anonymous memory is
> actually stored in swap devices such as zRAM. For instance, when an app
> is switched to the background, most of its memory might be swapped out.
>
> Currently, we have mTHP features, but unfortunately, without support
> for large folio swap-ins, once those large folios are swapped out,
> we lose them immediately because mTHP is a one-way ticket.

Not exactly a one-way ticket: we have (or will have) khugepaged. But I
admit that it may not be good enough for you.

> This is unacceptable and reduces mTHP to merely a toy on systems
> with significant swap utilization.

That may be true on your systems. It may not be on some other systems.

> This patch introduces mTHP swap-in support. For now, we limit mTHP
> swap-ins to contiguous swaps that were likely swapped out from mTHP as
> a whole.
>
> Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
> case. This is the simplest and most common use case, benefiting millions

I admit that Android is an important target platform for the Linux kernel.
But I would not claim that it is the MOST common ...

> of Android phones and similar devices with minimal implementation
> cost. In this straightforward scenario, large folios are always exclusive,
> eliminating the need to handle complex rmap and swapcache issues.
>
> It offers several benefits:
> 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
> swap-out and swap-in.
> 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
> without fragmentation. Based on the observed data [1] on Chris's and Ryan's
> THP swap allocation optimization, aligned swap-in plays a crucial role
> in the success of THP_SWPOUT.
> 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
> and enhancing compression ratios significantly. We have another patchset
> to enable mTHP compression and decompression in zsmalloc/zRAM[2].
>
> Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
> to be an optimal approach.
> There's a critical distinction between pagecache
> and anonymous pages: pagecache can be evicted and later retrieved from disk,
> potentially becoming a mTHP upon retrieval, whereas anonymous pages must
> always reside in memory or swapfile. If we swap in small folios and identify
> adjacent memory suitable for swapping in as mTHP, those pages that have been
> converted to small folios may never transition to mTHP. The process of
> converting mTHP into small folios remains irreversible. This introduces
> the risk of losing all mTHP through several swap-out and swap-in cycles,
> let alone losing the benefits of defragmentation, improved compression
> ratios, and reduced CPU usage based on mTHP compression/decompression.

I understand that the optimal policy for your use cases may be to always
swap in mTHP at the highest order. But that may not hold for some other
use cases, for example relatively slow swap devices, or non-faulting
sub-pages being swapped out again before they are used.

So, IMO, the default policy should be one that adapts to the requirements
automatically. For example, if most non-faulting sub-pages will be read
or written before being swapped out again, we should swap in at a larger
order, otherwise at a smaller order. Swap readahead is one possible way
to do that. But I admit that this may not work perfectly for your use
cases.

Previously I hoped that we could start with this automatic policy, which
helps everyone, and then check whether it satisfies your requirements
before implementing the optimal policy for you. But it appears that you
don't agree with this.

Based on the above, IMO, we should not use your policy as the default, at
least for now. A user space interface can be implemented to select among
swap-in order policies, similar to the one for the mTHP allocation order
policy. We need a separate policy because the performance characteristics
of memory allocation are quite different from those of swap-in. For
example, reading from an SSD could be much slower than allocating memory.
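
A toy user-space model (Python, not kernel code) may make the adaptive
default and the policy selection more concrete. Everything named here is
an assumption for illustration only: the policy names "always",
"adaptive" and "never", the SwapinOrderSelector class, the 0.5 threshold
and the one-order-at-a-time step do not exist in the kernel or in this
patchset.

```python
from dataclasses import dataclass

# Hypothetical policy names; not an existing kernel interface.
POLICIES = ("always", "adaptive", "never")


@dataclass
class SwapinOrderSelector:
    """Toy model of choosing the folio order for a swap-in fault."""

    policy: str = "adaptive"
    max_order: int = 4       # order 4 = 16 base pages = 64 KiB with 4 KiB pages
    threshold: float = 0.5   # assumed cut-off for "most sub-pages were used"
    order: int = 0           # current order, only moved by the adaptive policy

    def __post_init__(self) -> None:
        if self.policy not in POLICIES:
            raise ValueError(f"unknown policy: {self.policy!r}")

    def next_order(self) -> int:
        """Return the order to use for the next swap-in fault."""
        if self.policy == "always":
            return self.max_order
        if self.policy == "never":
            return 0
        return self.order

    def feedback(self, touched: int, neighbours: int) -> None:
        """Record how many of the non-faulting sub-pages in the candidate
        range were read or written before the range was swapped out again.
        """
        if self.policy != "adaptive" or neighbours <= 0:
            return
        if touched / neighbours >= self.threshold:
            # Most neighbours were used: try one order larger next time.
            self.order = min(self.order + 1, self.max_order)
        else:
            # Most neighbours were wasted work: fall back one order.
            self.order = max(self.order - 1, 0)
```

In this sketch "always" roughly corresponds to always swapping in at the
highest order, "never" to plain order-0 swap-in, and "adaptive" grows or
shrinks the order from observed sub-page usage, in the spirit of how
swap readahead sizes its window.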

With the policy selection, I think that we can implement mTHP swap-in for
non-SWAP_SYNCHRONOUS too. Users need to know what they are doing.

> Conversely, in deploying mTHP on millions of real-world products with this
> feature in OPPO's out-of-tree code[3], we haven't observed any significant
> increase in memory footprint for 64KiB mTHP based on CONT-PTE on ARM64.
>
> [1] https://lore.kernel.org/linux-mm/20240622071231.576056-1-21cnbao@gmail.com/
> [2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
> [3] OnePlusOSS / android_kernel_oneplus_sm8550
> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
>

[snip]

--
Best Regards,
Huang, Ying