References: <20240629111010.230484-1-21cnbao@gmail.com> <87ikxnj8az.fsf@yhuang6-desk2.ccr.corp.intel.com> <8734oqhr4c.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <8734oqhr4c.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Barry Song <21cnbao@gmail.com>
Date: Thu, 4 Jul 2024 22:23:20 +1200
Subject: Re: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 ryan.roberts@arm.com, shy828301@gmail.com, surenb@google.com,
 kaleshsingh@google.com, hughd@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
 baolin.wang@linux.alibaba.com, shakeel.butt@linux.dev,
 senozhatsky@chromium.org, minchan@kernel.org

On Thu, Jul 4, 2024 at 1:42 PM Huang, Ying wrote:
>
> Barry Song <21cnbao@gmail.com> writes:
>
> > On Wed, Jul 3, 2024 at 6:33 PM Huang, Ying wrote:
> >>
> >
> > Ying, thanks!
> >
> >> Barry Song <21cnbao@gmail.com> writes:
>
> [snip]
>
> >> > This patch introduces mTHP swap-in support. For now, we limit mTHP
> >> > swap-ins to contiguous swaps that were likely swapped out from mTHP as
> >> > a whole.
> >> >
> >> > Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
> >> > case. This is the simplest and most common use case, benefiting millions
> >>
> >> I admit that Android is an important target platform of Linux kernel.
> >> But I will not advocate that it's MOST common ...
> >
> > Okay, I understand that there are still many embedded systems similar
> > to Android, even if
> > they are not Android :-)
> >
> >>
> >> > of Android phones and similar devices with minimal implementation
> >> > cost. In this straightforward scenario, large folios are always exclusive,
> >> > eliminating the need to handle complex rmap and swapcache issues.
> >> >
> >> > It offers several benefits:
> >> > 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
> >> >    swap-out and swap-in.
> >> > 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
> >> >    without fragmentation. Based on the observed data [1] on Chris's and Ryan's
> >> >    THP swap allocation optimization, aligned swap-in plays a crucial role
> >> >    in the success of THP_SWPOUT.
> >> > 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
> >> >    and enhancing compression ratios significantly. We have another patchset
> >> >    to enable mTHP compression and decompression in zsmalloc/zRAM[2].
> >> >
> >> > Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
> >> > to be an optimal approach. There's a critical distinction between pagecache
> >> > and anonymous pages: pagecache can be evicted and later retrieved from disk,
> >> > potentially becoming a mTHP upon retrieval, whereas anonymous pages must
> >> > always reside in memory or swapfile. If we swap in small folios and identify
> >> > adjacent memory suitable for swapping in as mTHP, those pages that have been
> >> > converted to small folios may never transition to mTHP. The process of
> >> > converting mTHP into small folios remains irreversible. This introduces
> >> > the risk of losing all mTHP through several swap-out and swap-in cycles,
> >> > let alone losing the benefits of defragmentation, improved compression
> >> > ratios, and reduced CPU usage based on mTHP compression/decompression.
> >>
> >> I understand that the most optimal policy in your use cases may be
> >> always swapping-in mTHP in highest order. But, it may be not in some
> >> other use cases. For example, relative slow swap devices, non-fault
> >> sub-pages swapped out again before usage, etc.
> >>
> >> So, IMO, the default policy should be the one that can adapt to the
> >> requirements automatically. For example, if most non-fault sub-pages
> >> will be read/written before being swapped out again, we should swap-in
> >> in larger order, otherwise in smaller order. Swap readahead is one
> >> possible way to do that. But, I admit that this may not work perfectly
> >> in your use cases.
> >>
> >> Previously I hope that we can start with this automatic policy that
> >> helps everyone, then check whether it can satisfy your requirements
> >> before implementing the optimal policy for you. But it appears that you
> >> don't agree with this.
> >>
> >> Based on the above, IMO, we should not use your policy as default at
> >> least for now. A user space interface can be implemented to select
> >> different swap-in order policy similar as that of mTHP allocation order
> >> policy. We need a different policy because the performance characters
> >> of the memory allocation is quite different from that of swap-in. For
> >> example, the SSD reading could be much slower than the memory
> >> allocation. With the policy selection, I think that we can implement
> >> mTHP swap-in for non-SWAP_SYNCHRONOUS too. Users need to know what they
> >> are doing.
> >
> > Agreed. Ryan also suggested something similar before.
> > Could we add this user policy by:
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-/swapin_enabled
> > which could be 0 or 1, I assume we don't need so many "always inherit
> > madvise never"?
> >
> > Do you have any suggestions regarding the user interface?
>
> /sys/kernel/mm/transparent_hugepage/hugepages-/swapin_enabled
>
> looks good to me. To be consistent with "enabled" in the same
> directory, and more importantly, to be extensible, I think that it's
> better to start with at least "always never". I believe that we will
> add "auto" in the future to tune automatically. Which can be used as
> default finally.

Sounds good to me. Thanks!

>
> --
> Best Regards,
> Huang, Ying

Barry
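
For illustration only: the existing per-size "enabled" file reports its options on one line, with the active one in square brackets (for example "always inherit madvise [never]"). Assuming the proposed swapin_enabled file, which is only being discussed here and may not exist in this form, followed the same convention with "always never", a user-space reader might be sketched as below. The function name and the sample string are hypothetical.

```python
def selected_policy(line: str) -> str:
    """Return the bracketed (currently selected) token, e.g. 'never'."""
    for token in line.split():
        # The active option is the one wrapped in square brackets.
        if token.startswith("[") and token.endswith("]") and len(token) > 2:
            return token[1:-1]
    raise ValueError(f"no selected policy in {line!r}")


# Hypothetical contents of the proposed swapin_enabled file.
sample = "always [never]\n"
print(selected_policy(sample))
```

Keeping the bracketed-list form rather than a bare 0/1 would let a later "auto" value be added without changing how such a reader works.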