From: Barry Song <21cnbao@gmail.com>
Date: Wed, 3 Jul 2024 20:32:22 +1200
Subject: Re: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org,
    david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com,
    linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
    ryan.roberts@arm.com, shy828301@gmail.com, surenb@google.com,
    kaleshsingh@google.com, hughd@google.com, v-songbaohua@oppo.com,
    willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
    baolin.wang@linux.alibaba.com, shakeel.butt@linux.dev,
    senozhatsky@chromium.org, minchan@kernel.org
References: <20240629111010.230484-1-21cnbao@gmail.com>
    <87ikxnj8az.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Jul 3, 2024 at 7:58 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Jul 3, 2024 at 6:33 PM Huang, Ying wrote:
> >
>
> Ying, thanks!
>
> > Barry Song <21cnbao@gmail.com> writes:
> >
> > > From: Barry Song
> > >
> > > In an embedded system like Android, more than half of anonymous memory is
> > > actually stored in swap devices such as zRAM. For instance, when an app
> > > is switched to the background, most of its memory might be swapped out.
> > >
> > > Currently, we have mTHP features, but unfortunately, without support
> > > for large folio swap-ins, once those large folios are swapped out,
> > > we lose them immediately because mTHP is a one-way ticket.
> >
> > Not exactly a one-way ticket, we have (or will have) khugepaged.  But I
> > admit that it may be not good enough for you.
>
> That's right. From what I understand, khugepaged only supports PMD THP
> so far?
> Moreover, I have concerns that khugepaged might not be suitable for
> all mTHPs, for the following reasons:
>
> 1. The lifecycle of mTHP might not be that long. We pay the cost of
> the collapse, but the folio could be swapped out right after that. We
> expect THP to be durable and not become obsolete quickly, given the
> significant amount of money we spent on it.
>
> 2. mTHP's size might not be substantial enough for a collapse. For
> example, if we can find an effective method, such as Yu's TAO or
> others, we can achieve a high success rate in mTHP allocations at a
> minimal cost rather than depending on compaction/collapse.
>
> 3. Managing the collapse, unmap, and map steps could be a significant
> challenge for the power consumption of phones, considering that the
> number of mTHPs could be much larger than that of PMD-mapped THPs.
> This could happen quite often.
>
> >
> > > This is unacceptable and reduces mTHP to merely a toy on systems
> > > with significant swap utilization.
> >
> > May be true in your systems.  May be not in some other systems.
>
> I agree that this isn't a concern for systems without significant
> swapout and swapin activity.
> However, on Android, where we frequently switch between applications
> like YouTube, Chrome, Zoom, WeChat, Alipay, TikTok, and others,
> swapping could occur throughout the day :-)
>
> >
> > > This patch introduces mTHP swap-in support. For now, we limit mTHP
> > > swap-ins to contiguous swaps that were likely swapped out from mTHP as
> > > a whole.
> > >
> > > Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
> > > case. This is the simplest and most common use case, benefiting millions
> >
> > I admit that Android is an important target platform of Linux kernel.
> > But I will not advocate that it's MOST common ...
>
> Okay, I understand that there are still many embedded systems similar
> to Android, even if they are not Android :-)
>
> >
> > > of Android phones and similar devices with minimal implementation
> > > cost. In this straightforward scenario, large folios are always exclusive,
> > > eliminating the need to handle complex rmap and swapcache issues.
> > >
> > > It offers several benefits:
> > > 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
> > >    swap-out and swap-in.
> > > 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
> > >    without fragmentation. Based on the observed data [1] on Chris's and Ryan's
> > >    THP swap allocation optimization, aligned swap-in plays a crucial role
> > >    in the success of THP_SWPOUT.
> > > 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
> > >    and enhancing compression ratios significantly. We have another patchset
> > >    to enable mTHP compression and decompression in zsmalloc/zRAM[2].
> > >
> > > Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
> > > to be an optimal approach.
> > > There's a critical distinction between pagecache
> > > and anonymous pages: pagecache can be evicted and later retrieved from disk,
> > > potentially becoming a mTHP upon retrieval, whereas anonymous pages must
> > > always reside in memory or swapfile. If we swap in small folios and identify
> > > adjacent memory suitable for swapping in as mTHP, those pages that have been
> > > converted to small folios may never transition to mTHP. The process of
> > > converting mTHP into small folios remains irreversible. This introduces
> > > the risk of losing all mTHP through several swap-out and swap-in cycles,
> > > let alone losing the benefits of defragmentation, improved compression
> > > ratios, and reduced CPU usage based on mTHP compression/decompression.
> >
> > I understand that the most optimal policy in your use cases may be
> > always swapping-in mTHP in highest order.  But, it may be not in some
> > other use cases.  For example, relative slow swap devices, non-fault
> > sub-pages swapped out again before usage, etc.
> >
> > So, IMO, the default policy should be the one that can adapt to the
> > requirements automatically.  For example, if most non-fault sub-pages
> > will be read/written before being swapped out again, we should swap-in
> > in larger order, otherwise in smaller order.  Swap readahead is one
> > possible way to do that.  But, I admit that this may not work perfectly
> > in your use cases.
> >
> > Previously I hope that we can start with this automatic policy that
> > helps everyone, then check whether it can satisfy your requirements
> > before implementing the optimal policy for you.  But it appears that you
> > don't agree with this.
> >
> > Based on the above, IMO, we should not use your policy as default at
> > least for now.  A user space interface can be implemented to select
> > different swap-in order policy similar as that of mTHP allocation order
> > policy.
> > We need a different policy because the performance characteristics
> > of the memory allocation are quite different from those of swap-in.  For
> > example, the SSD reading could be much slower than the memory
> > allocation.  With the policy selection, I think that we can implement
> > mTHP swap-in for non-SWAP_SYNCHRONOUS too.  Users need to know what they
> > are doing.
>
> Agreed. Ryan also suggested something similar before.
> Could we add this user policy by:
>
> /sys/kernel/mm/transparent_hugepage/hugepages-/swapin_enabled
> which could be 0 or 1, I assume we don't need so many "always inherit
> madvise never"?

I actually meant:

Firstly, we respect the existing THP policy, and then we apply
swapin_enabled after checking both the allowable and the suitable
orders. Pseudo code like this:

        orders = thp_vma_allowable_orders(vma, vma->vm_flags,
                        TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
        orders = thp_vma_suitable_orders(vma, vmf->address, orders);
        orders = thp_swapin_allowable_order(orders);

>
> Do you have any suggestions regarding the user interface?
>
> >
> > > Conversely, in deploying mTHP on millions of real-world products with this
> > > feature in OPPO's out-of-tree code[3], we haven't observed any significant
> > > increase in memory footprint for 64KiB mTHP based on CONT-PTE on ARM64.
> > >
> > > [1] https://lore.kernel.org/linux-mm/20240622071231.576056-1-21cnbao@gmail.com/
> > > [2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
> > > [3] OnePlusOSS / android_kernel_oneplus_sm8550
> > > https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
> > >
> >
> > [snip]
> >
> > --
> > Best Regards,
> > Huang, Ying
>
> Thanks
> Barry
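
To make the three-step filtering in the pseudo code above a bit more
concrete, here is a rough userspace sketch. It is not kernel code and
only models the idea: the set of candidate orders is a bitmask, and each
step clears bits. Everything in it is an assumption for illustration:
struct toy_vma, the two policy masks passed as plain arguments, the
helper names, 4KiB base pages with PMD_ORDER 9, and in particular the
per-size swapin_enabled mask, which is only a proposal in this thread.
The suitability check is also simplified to "the naturally aligned block
around the faulting address must fit inside the VMA".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4KiB base pages */
#define PMD_ORDER  9    /* assumption: 2MiB PMD with 4KiB pages */

/* Toy stand-in for a VMA: only the address range matters here. */
struct toy_vma {
	uint64_t start; /* inclusive */
	uint64_t end;   /* exclusive */
};

/* Step 1: keep only orders enabled by the existing THP policy mask. */
static unsigned long allowable_orders(unsigned long thp_enabled_mask,
				      unsigned long orders)
{
	return orders & thp_enabled_mask;
}

/* Step 2: keep orders whose aligned block around addr fits in the VMA. */
static unsigned long suitable_orders(const struct toy_vma *vma, uint64_t addr,
				     unsigned long orders)
{
	unsigned long out = 0;

	assert(vma->start < vma->end);
	for (int order = 1; order < PMD_ORDER; order++) {
		uint64_t size = (uint64_t)1 << (PAGE_SHIFT + order);
		uint64_t base = addr & ~(size - 1);

		if (!(orders & (1UL << order)))
			continue;
		if (base >= vma->start && base + size <= vma->end)
			out |= 1UL << order;
	}
	return out;
}

/* Step 3: apply the (hypothetical) per-size swapin_enabled mask. */
static unsigned long swapin_allowable_orders(unsigned long swapin_enabled_mask,
					     unsigned long orders)
{
	return orders & swapin_enabled_mask;
}

/* Largest remaining order, or 0 to fall back to a single small page. */
static int pick_swapin_order(const struct toy_vma *vma, uint64_t addr,
			     unsigned long thp_enabled_mask,
			     unsigned long swapin_enabled_mask)
{
	unsigned long orders = (1UL << PMD_ORDER) - 1; /* BIT(PMD_ORDER) - 1 */
	int best = 0;

	orders = allowable_orders(thp_enabled_mask, orders);
	orders = suitable_orders(vma, addr, orders);
	orders = swapin_allowable_orders(swapin_enabled_mask, orders);

	for (int order = 1; order < PMD_ORDER; order++)
		if (orders & (1UL << order))
			best = order;
	return best;
}
```

For example, with a VMA covering [0x10000, 0x50000), a fault at 0x24000,
THP enabled for orders 2, 4 and 6, and swap-in enabled for orders 2 and
4, pick_swapin_order() returns 4. Order 6 is already dropped in step 2
because its aligned 256KiB block would start below the VMA, and clearing
order 4 in the swap-in mask makes the result fall to order 2.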