From: Nhat Pham
Date: Fri, 18 Oct 2024 10:21:09 -0700
Subject: Re: [RFC PATCH v1 6/7] mm: do_swap_page() calls swapin_readahead() zswap load batching interface.
To: Usama Arif
Cc: David Hildenbrand, Kanchana P Sridhar, linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, chengming.zhou@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, hughd@google.com, willy@infradead.org, bfoster@redhat.com, dchinner@redhat.com, chrisl@kernel.org, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
In-Reply-To: <169e5cb6-701a-474c-a703-60daee8b4d3f@gmail.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com> <20241018064805.336490-7-kanchana.p.sridhar@intel.com> <71bcbd3f-a8bd-4014-aabe-081006cc62f8@redhat.com> <169e5cb6-701a-474c-a703-60daee8b4d3f@gmail.com>
On Fri, Oct 18, 2024 at 4:04 AM Usama Arif wrote:
>
>
> On 18/10/2024 08:26, David Hildenbrand wrote:
> > On 18.10.24 08:48, Kanchana P Sridhar wrote:
> >> This patch invokes the swapin_readahead() based batching interface to
> >> prefetch a batch of 4K folios for zswap load with batch decompressions
> >> in parallel using IAA hardware. swapin_readahead() prefetches folios based
> >> on vm.page-cluster and the usefulness of prior prefetches to the
> >> workload. As folios are created in the swapcache and the readahead code
> >> calls swap_read_folio() with a "zswap_batch" and a "non_zswap_batch", the
> >> respective folio_batches get populated with the folios to be read.
> >>
> >> Finally, the swapin_readahead() procedures will call the newly added
> >> process_ra_batch_of_same_type() which:
> >>
> >>  1) Reads all the non_zswap_batch folios sequentially by calling
> >>     swap_read_folio().
> >>  2) Calls swap_read_zswap_batch_unplug() with the zswap_batch which calls
> >>     zswap_finish_load_batch() that finally decompresses each
> >>     SWAP_CRYPTO_SUB_BATCH_SIZE sub-batch (i.e. upto 8 pages in a prefetch
> >>     batch of say, 32 folios) in parallel with IAA.
> >>
> >> Within do_swap_page(), we try to benefit from batch decompressions in both
> >> these scenarios:
> >>
> >>  1) single-mapped, SWP_SYNCHRONOUS_IO:
> >>       We call swapin_readahead() with "single_mapped_path = true". This is
> >>       done only in the !zswap_never_enabled() case.
> >>  2) Shared and/or non-SWP_SYNCHRONOUS_IO folios:
> >>       We call swapin_readahead() with "single_mapped_path = false".
> >>
> >> This will place folios in the swapcache: a design choice that handles cases
> >> where a folio that is "single-mapped" in process 1 could be prefetched in
> >> process 2; and handles highly contended server scenarios with stability.
> >> There are checks added at the end of do_swap_page(), after the folio has
> >> been successfully loaded, to detect if the single-mapped swapcache folio is
> >> still single-mapped, and if so, folio_free_swap() is called on the folio.
> >>
> >> Within the swapin_readahead() functions, if single_mapped_path is true, and
> >> either the platform does not have IAA, or, if the platform has IAA and the
> >> user selects a software compressor for zswap (details of sysfs knob
> >> follow), readahead/batching are skipped and the folio is loaded using
> >> zswap_load().
> >>
> >> A new swap parameter "singlemapped_ra_enabled" (false by default) is added
> >> for platforms that have IAA, zswap_load_batching_enabled() is true, and we
> >> want to give the user the option to run experiments with IAA and with
> >> software compressors for zswap (swap device is SWP_SYNCHRONOUS_IO):
> >>
> >> For IAA:
> >>  echo true > /sys/kernel/mm/swap/singlemapped_ra_enabled
> >>
> >> For software compressors:
> >>  echo false > /sys/kernel/mm/swap/singlemapped_ra_enabled
> >>
> >> If "singlemapped_ra_enabled" is set to false, swapin_readahead() will skip
> >> prefetching folios in the "single-mapped SWP_SYNCHRONOUS_IO" do_swap_page()
> >> path.
> >>
> >> Thanks Ying Huang for the really helpful brainstorming discussions on the
> >> swap_read_folio() plug design.
> >>
> >> Suggested-by: Ying Huang
> >> Signed-off-by: Kanchana P Sridhar
> >> ---
> >>  mm/memory.c     | 187 +++++++++++++++++++++++++++++++++++++-----------
> >>  mm/shmem.c      |   2 +-
> >>  mm/swap.h       |  12 ++--
> >>  mm/swap_state.c | 157 ++++++++++++++++++++++++++++++++++++----
> >>  mm/swapfile.c   |   2 +-
> >>  5 files changed, 299 insertions(+), 61 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index b5745b9ffdf7..9655b85fc243 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -3924,6 +3924,42 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
> >>       return 0;
> >>   }
> >> +/*
> >> + * swapin readahead based batching interface for zswap batched loads using IAA:
> >> + *
> >> + * Should only be called for and if the faulting swap entry in do_swap_page
> >> + * is single-mapped and SWP_SYNCHRONOUS_IO.
> >> + *
> >> + * Detect if the folio is in the swapcache, is still mapped to only this
> >> + * process, and further, there are no additional references to this folio
> >> + * (for e.g. if another process simultaneously readahead this swap entry
> >> + * while this process was handling the page-fault, and got a pointer to the
> >> + * folio allocated by this process in the swapcache), besides the references
> >> + * that were obtained within __read_swap_cache_async() by this process that is
> >> + * faulting in this single-mapped swap entry.
> >> + */
> >
> > How is this supposed to work for large folios?
> >
>
> Hi,
>
> I was looking at zswapin large folio support and have posted a RFC in [1].
> I got bogged down with some prod stuff, so wasn't able to send it earlier.
>
> It looks quite different, and I think simpler from this series, so might be
> a good comparison.
>
> [1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@gmail.com/
>
> Thanks,
> Usama

I agree.
I think the lower-hanging fruit here is to build upon Usama's patch.
Kanchana, do you think we can just use the new batch decompression
infrastructure and apply it to Usama's large folio zswap loading?

I'm not denying the readahead idea outright, but it seems much more
complicated. There are questions about the benefits of readahead when
applied to zswap in the first place - IIUC, zram circumvents that logic
in several cases, and zswap shares many characteristics with zram (both
are fast, synchronous compression devices).

So let's pick the low-hanging fruit first, get the wins, and stress test
the new infrastructure. Then we can discuss the readahead idea later?
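
To make the suggestion concrete, here is a rough userspace sketch (not
kernel code, and not taken from either series) of what "apply batch
decompression to a large folio" could look like: the subpages of one
folio are walked in sub-batches of at most 8, matching the "upto 8
pages" figure quoted above. SUB_BATCH_SIZE, decompress_batch_fn,
load_large_folio_batched() and count_pages() are all hypothetical names
invented for illustration.

```c
#include <stddef.h>

/* Hypothetical: mirrors the "upto 8 pages" sub-batch size quoted above. */
#define SUB_BATCH_SIZE 8

/*
 * Hypothetical callback standing in for a batched decompress call.
 * Returns 0 on success, non-zero on failure.
 */
typedef int (*decompress_batch_fn)(size_t first_page, size_t nr, void *ctx);

/*
 * Walk the nr_pages subpages of one large folio in sub-batches of at
 * most SUB_BATCH_SIZE pages. Returns the number of sub-batches issued,
 * or -1 as soon as one sub-batch fails.
 */
static int load_large_folio_batched(size_t nr_pages, decompress_batch_fn fn,
                                    void *ctx)
{
    int batches = 0;

    for (size_t first = 0; first < nr_pages; first += SUB_BATCH_SIZE) {
        size_t nr = nr_pages - first;

        if (nr > SUB_BATCH_SIZE)
            nr = SUB_BATCH_SIZE;
        if (fn(first, nr, ctx) != 0)
            return -1;
        batches++;
    }
    return batches;
}

/* Example callback: only counts the pages it was asked to decompress. */
static int count_pages(size_t first_page, size_t nr, void *ctx)
{
    (void)first_page;
    *(size_t *)ctx += nr;
    return 0;
}
```

With a 32-page folio this issues four sub-batches of 8 pages each; a
13-page folio gets one sub-batch of 8 and one of 5. In a real
implementation the callback would be whatever batch decompression entry
point the series ends up exposing, which this sketch does not try to
guess.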