From: Nhat Pham
Date: Thu, 29 Aug 2024 10:10:32 -0700
Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
To: "Sridhar, Kanchana P"
Cc: "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
    "hannes@cmpxchg.org", "yosryahmed@google.com", "ryan.roberts@arm.com",
    "Huang, Ying", "21cnbao@gmail.com" <21cnbao@gmail.com>,
    "akpm@linux-foundation.org", "Zou, Nanhai", "Feghali, Wajdi K",
    "Gopal, Vinodh", Usama Arif, Chengming Zhou
References: <20240828093516.30228-1-kanchana.p.sridhar@intel.com>

On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P wrote:
>
> > -----Original Message-----
> > From: Nhat Pham
> > Sent: Wednesday, August 28, 2024 2:35 PM
> > To: Sridhar, Kanchana P
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> > Huang, Ying; 21cnbao@gmail.com; akpm@linux-foundation.org;
> > Zou, Nanhai; Feghali, Wajdi K; Gopal, Vinodh
> > Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> >
> > On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar wrote:
> > >
> > > Hi All,
> > >
> > > This patch-series enables zswap_store() to accept and store mTHP
> > > folios. The most significant contribution in this series is from the
> > > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> > > migrated to v6.11-rc3 in patch 2/4 of this series.
> > >
> > > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> > >      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
> > >
> > > Additionally, there is an attempt to modularize some of the functionality
> > > in zswap_store(), to make it more amenable to supporting any-order
> > > mTHPs. For instance, the function zswap_store_entry() stores a zswap_entry
> > > in the xarray. Likewise, zswap_delete_stored_offsets() can be used to
> > > delete all offsets corresponding to a higher order folio stored in zswap.
> > >
> >
> > Will this have any conflict with mTHP swap work? Especially with mTHP
> > swap-in and zswap writeback.
> >
> > My understanding is from zswap's perspective, the large folio is
> > broken apart into independent subpages, correct? What happens when we
> > have partially written back mTHP (i.e some subpages are in zswap
> > still, whereas others are written back to swap). Would this
> > automatically prevent mTHP swapin?
>
> That is a good point. To begin with, this patch-series would make the default
> behavior for mTHP swapout/storage and swapin for ZSWAP to be on par with
> ZRAM. From zswap's perspective, imo this is a significant step forward towards
> realizing cold memory storage with mTHP folios. However, it is only a starting
> point that makes the behavior uniform across zswap/zram. Initially, workloads
> would see a one-time benefit with reclaim being able to swapout mTHP
> folios without splitting, to zswap. If the mTHPs were cold memory, then we
> would have derived latency gains towards memory savings (with zswap).
>
> However, if the mTHP were part of "not so cold" memory, this would result
> in a one-way mTHP conversion to 4K folios. Depending on workloads and their
> access patterns, we could either see individual 4K folios being swapped in,
> or entire chunks if not the entire (original) mTHP needing to be swapped in.
>
> It should be noted that this is more of a performance vs. cold memory
> preservation trade-off that needs to drive mTHP reclaim, storage, swapin and
> writeback policy. Different workloads could require different policies. However,
> even though this patch is only a starting point, it is still functionally correct
> by being equivalent to zram-mTHP, and compatible with the rest of mm and
> swap as far as mTHP. Another important functionality/data consistency decision
> I made in this patch series is error handling during zswap_store() of mTHP:
> in case of any errors, all swap offsets for the mTHP are deleted from the
> zswap xarray/zpool, since we know that the mTHP will now have to be stored
> in the backing swap device. IOW, an mTHP is either entirely stored in zswap,
> or entirely not stored in zswap.
>
> To answer your question, we would need to come up with what the semantics
> would need to be for zswap zpool storage granularity, swapin granularity,
> readahead granularity and writeback wrt mTHP and how the overall swap
> sub-system needs to "preserve" mTHP vs. splitting mTHP into 4K/lower-order
> folios during swapout.
> Once we have a good understanding of these policies,
> we could implement them in zswap. Alternately, develop an abstraction that is
> one level above zswap/zram and makes things easier and shareable between
> zswap and zram. By this, I mean fundamental assumptions such as consecutive
> swap offsets (for instance). To some extent, this implies that an mTHP as a
> swap entity is defined by consecutiveness of swap offsets. Maybe the policy
> to keep mTHPs in the system over extended duration might be to assemble
> them dynamically based on swapin_readahead() decisions (which is based on
> workload access patterns). In other words, mTHPs could be a useful abstraction
> that can be static or even dynamic based on working set characteristics, and
> cold memory preservation. This is quite a complex topic imho.
>
> As we know, Barry Song and Chuanhua Han have started the discussion on
> this in their zram mTHP swapin series [1].

Yeah, I'm a bit more concerned with the correctness aspect. As long as
it's not buggy, we can implement mTHP zswapout first and force
individual subpage (z)swapin for now (since we cannot prevent writeback
from writing back individual subpages).

We can discuss a strategy to harmonize mTHP and zswap (with writeback)
as we go along.

BTW, I think we're not cc-ing Chengming? Is the get_maintainer.pl script
not working properly? Let me add him manually. Please include him in
future submissions and responses, as he is also a zswap reviewer :)

Also cc-ing Usama, who is interested in this work.

>
> [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/T/#u
>
> Thanks,
> Kanchana
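
For reference, the all-or-nothing rule from the quoted text ("an mTHP is
either entirely stored in zswap, or entirely not stored in zswap") can be
sketched in userspace C. This is only an illustrative toy, not the code
from the patch series: every toy_* name, the fixed-size bool array standing
in for the zswap xarray, and the injected failure offset are assumptions
made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 64 /* hypothetical size of the toy swap area */

/* Toy stand-in for the zswap xarray: stored[i] is true if offset i is held. */
struct toy_zswap {
	bool stored[TOY_NR_SLOTS];
	size_t fail_at; /* offset whose store fails; TOY_NR_SLOTS means never */
};

/* Store a single subpage; fails when out of range or at the injected offset. */
static bool toy_store_page(struct toy_zswap *z, size_t offset)
{
	if (offset >= TOY_NR_SLOTS || offset == z->fail_at)
		return false;
	z->stored[offset] = true;
	return true;
}

/* Drop every offset in [start, start + nr), whether or not it was stored. */
static void toy_delete_offsets(struct toy_zswap *z, size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr && i < TOY_NR_SLOTS; i++)
		z->stored[i] = false;
}

/*
 * Store a folio covering nr_pages consecutive offsets. On the first failure,
 * delete the whole range so that no partial folio is left behind, and report
 * failure so the caller can fall back to the backing swap device.
 */
static bool toy_store_folio(struct toy_zswap *z, size_t start, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++) {
		if (!toy_store_page(z, start + i)) {
			toy_delete_offsets(z, start, nr_pages);
			return false;
		}
	}
	return true;
}

/* Count how many offsets are currently held. */
static size_t toy_count_stored(const struct toy_zswap *z)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_SLOTS; i++)
		n += z->stored[i];
	return n;
}
```

With fail_at set to an offset inside the folio, toy_store_folio() returns
false and toy_count_stored() is unchanged from before the call, which is
the property that keeps a folio from ending up partly in zswap and partly
on the backing device at store time.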