From: Barry Song <21cnbao@gmail.com>
Date: Wed, 3 Jul 2024 19:58:51 +1200
Subject: Re: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 ryan.roberts@arm.com, shy828301@gmail.com, surenb@google.com,
 kaleshsingh@google.com, hughd@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
 baolin.wang@linux.alibaba.com, shakeel.butt@linux.dev,
 senozhatsky@chromium.org, minchan@kernel.org
References: <20240629111010.230484-1-21cnbao@gmail.com>
 <87ikxnj8az.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87ikxnj8az.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Jul 3, 2024 at 6:33 PM Huang, Ying wrote:

Ying, thanks!

> Barry Song <21cnbao@gmail.com> writes:
>
> > From: Barry Song
> >
> > In an embedded system like Android, more than half of anonymous memory is
> > actually stored in swap devices such as zRAM. For instance, when an app
> > is switched to the background, most of its memory might be swapped out.
> >
> > Currently, we have mTHP features, but unfortunately, without support
> > for large folio swap-ins, once those large folios are swapped out,
> > we lose them immediately because mTHP is a one-way ticket.
>
> No exactly one-way ticket, we have (or will have) khugepaged.  But I
> admit that it may be not good enough for you.

That's right. From what I understand, khugepaged currently supports only
PMD-sized THP? Moreover, I have concerns that khugepaged might not be
suitable for all mTHPs, for the following reasons:

1. The lifecycle of an mTHP might not be that long. We pay the cost of the
collapse, but the folio could be swapped out right after that. We expect a
THP to be durable and not become obsolete quickly, given the significant
cost we paid for it.

2. An mTHP's size might not be substantial enough to justify a collapse.
For example, if we can find an effective method, such as Yu's TAO or others,
we can achieve a high success rate in mTHP allocations at minimal cost
rather than depending on compaction/collapse.

3. It could be a significant challenge to manage the collapse (unmap and
map) work against the power consumption of phones, considering that the
number of mTHPs could be much larger than the number of PMD-mapped THPs,
so this work could run quite often.

> > This is unacceptable and reduces mTHP to merely a toy on systems
> > with significant swap utilization.
>
> May be true in your systems.  May be not in some other systems.

I agree that this isn't a concern for systems without significant swapout
and swapin activity.
However, on Android, where we frequently switch between applications like
YouTube, Chrome, Zoom, WeChat, Alipay, TikTok, and others, swapping could
occur throughout the day :-)

> > This patch introduces mTHP swap-in support. For now, we limit mTHP
> > swap-ins to contiguous swaps that were likely swapped out from mTHP as
> > a whole.
> >
> > Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
> > case. This is the simplest and most common use case, benefiting millions
>
> I admit that Android is an important target platform of Linux kernel.
> But I will not advocate that it's MOST common ...

Okay, I understand that there are still many embedded systems similar to
Android, even if they are not Android :-)

> > of Android phones and similar devices with minimal implementation
> > cost. In this straightforward scenario, large folios are always exclusive,
> > eliminating the need to handle complex rmap and swapcache issues.
> >
> > It offers several benefits:
> > 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
> >    swap-out and swap-in.
> > 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
> >    without fragmentation. Based on the observed data [1] on Chris's and Ryan's
> >    THP swap allocation optimization, aligned swap-in plays a crucial role
> >    in the success of THP_SWPOUT.
> > 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
> >    and enhancing compression ratios significantly. We have another patchset
> >    to enable mTHP compression and decompression in zsmalloc/zRAM[2].
> >
> > Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
> > to be an optimal approach. There's a critical distinction between pagecache
> > and anonymous pages: pagecache can be evicted and later retrieved from disk,
> > potentially becoming a mTHP upon retrieval, whereas anonymous pages must
> > always reside in memory or swapfile.
> > If we swap in small folios and identify
> > adjacent memory suitable for swapping in as mTHP, those pages that have been
> > converted to small folios may never transition to mTHP. The process of
> > converting mTHP into small folios remains irreversible. This introduces
> > the risk of losing all mTHP through several swap-out and swap-in cycles,
> > let alone losing the benefits of defragmentation, improved compression
> > ratios, and reduced CPU usage based on mTHP compression/decompression.
>
> I understand that the most optimal policy in your use cases may be
> always swapping-in mTHP in highest order.  But, it may be not in some
> other use cases.  For example, relative slow swap devices, non-fault
> sub-pages swapped out again before usage, etc.
>
> So, IMO, the default policy should be the one that can adapt to the
> requirements automatically.  For example, if most non-fault sub-pages
> will be read/written before being swapped out again, we should swap-in
> in larger order, otherwise in smaller order.  Swap readahead is one
> possible way to do that.  But, I admit that this may not work perfectly
> in your use cases.
>
> Previously I hope that we can start with this automatic policy that
> helps everyone, then check whether it can satisfy your requirements
> before implementing the optimal policy for you.  But it appears that you
> don't agree with this.
>
> Based on the above, IMO, we should not use your policy as default at
> least for now.  A user space interface can be implemented to select
> different swap-in order policy similar as that of mTHP allocation order
> policy.  We need a different policy because the performance characters
> of the memory allocation is quite different from that of swap-in.  For
> example, the SSD reading could be much slower than the memory
> allocation.  With the policy selection, I think that we can implement
> mTHP swap-in for non-SWAP_SYNCHRONOUS too.  Users need to know what they
> are doing.

Agreed.
Ryan also suggested something similar before. Could we add this user
policy via:

/sys/kernel/mm/transparent_hugepage/hugepages-/swapin_enabled

which could be 0 or 1? I assume we don't need as many values as
"always inherit madvise never". Do you have any suggestions regarding
the user interface?

> > Conversely, in deploying mTHP on millions of real-world products with this
> > feature in OPPO's out-of-tree code[3], we haven't observed any significant
> > increase in memory footprint for 64KiB mTHP based on CONT-PTE on ARM64.
> >
> > [1] https://lore.kernel.org/linux-mm/20240622071231.576056-1-21cnbao@gmail.com/
> > [2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
> > [3] OnePlusOSS / android_kernel_oneplus_sm8550
> > https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
> >
> > [snip]
>
> --
> Best Regards,
> Huang, Ying

Thanks
Barry
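
For illustration only, the per-size 0/1 policy discussed above could be
combined with the "contiguous, aligned swap entries" restriction from the
cover letter roughly as follows. This is a userspace sketch under stated
assumptions, not code from the patchset: pick_swapin_order(),
MAX_SWAPIN_ORDER, the enabled_mask bitmask (bit N set when swap-in is
allowed for order N) and the flat array of swap offsets are all
hypothetical names and simplifications. Real kernel code would inspect the
PTEs and the swap map instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap: order 4 is 16 pages, i.e. 64KiB with 4KiB base pages. */
#define MAX_SWAPIN_ORDER 4

/*
 * Pick the swap-in order for a fault at entry 'idx'.
 *
 * offsets[]    : swap offset recorded for each page slot (simplified model)
 * n            : number of entries in offsets[]
 * enabled_mask : hypothetical policy bits, bit N set if order N may be used
 *
 * Returns the largest enabled order whose naturally aligned block around
 * 'idx' holds contiguous swap offsets that start at an offset aligned to
 * the block size. Returns 0 (single-page swap-in) if no order qualifies.
 */
static int pick_swapin_order(const unsigned long *offsets, size_t n,
                             size_t idx, unsigned int enabled_mask)
{
    assert(offsets != NULL && idx < n);

    for (int order = MAX_SWAPIN_ORDER; order > 0; order--) {
        size_t nr = (size_t)1 << order;      /* pages in this block */
        size_t start = idx & ~(nr - 1);      /* aligned block start */
        bool contiguous = true;

        if (!(enabled_mask & (1u << order)))
            continue;                        /* policy disables this size */
        if (start + nr > n)
            continue;                        /* block exceeds the range */
        if (offsets[start] % nr != 0)
            continue;                        /* swap offset not aligned */

        for (size_t i = 1; i < nr; i++) {
            if (offsets[start + i] != offsets[start] + i) {
                contiguous = false;          /* not swapped out as a whole */
                break;
            }
        }
        if (contiguous)
            return order;
    }
    return 0;
}
```

For example, with 16 contiguous offsets starting at 32 and only bit 2 set
in enabled_mask, a fault at index 5 selects order 2 (entries 4..7). With no
bits set, every fault falls back to order 0, i.e. a small-folio swap-in.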