From: Usama Arif <usamaarif642@gmail.com>
Date: Thu, 5 Sep 2024 11:53:27 +0100
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
To: Barry Song <21cnbao@gmail.com>, Yosry Ahmed
Cc: akpm@linux-foundation.org, chengming.zhou@linux.dev, david@redhat.com,
 hannes@cmpxchg.org, hughd@google.com, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, nphamcs@gmail.com,
 shakeel.butt@linux.dev, willy@infradead.org, ying.huang@intel.com,
 hanchuanhua@oppo.com
References: <20240612124750.2220726-2-usamaarif642@gmail.com>
 <20240904055522.2376-1-21cnbao@gmail.com>

On 05/09/2024 11:33, Barry Song wrote:
> On Thu, Sep 5, 2024 at 10:10 PM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Thu, Sep 5, 2024 at 8:49 PM Barry Song <21cnbao@gmail.com> wrote:
>>>
>>> On Thu, Sep 5, 2024 at 7:55 PM Yosry Ahmed wrote:
>>>>
>>>> On Thu, Sep 5, 2024 at 12:03 AM Barry Song <21cnbao@gmail.com> wrote:
>>>>>
>>>>> On Thu, Sep 5, 2024 at 5:41 AM Yosry Ahmed wrote:
>>>>>>
>>>>>> [..]
>>>>>>>> I understand the point of doing this to unblock the synchronous large
>>>>>>>> folio swapin support work, but at some point we're gonna have to
>>>>>>>> actually handle the cases where a large folio being swapped in is
>>>>>>>> partially in the swap cache, zswap, the zeromap, etc.
>>>>>>>>
>>>>>>>> All these cases will need similar-ish handling, and I suspect we won't
>>>>>>>> just skip swapping in large folios in all these cases.
>>>>>>>
>>>>>>> I agree that this is definitely the goal. `swap_read_folio()` should be
>>>>>>> a dependable API that always returns reliable data, regardless of
>>>>>>> whether `zeromap` or `zswap` is involved. Despite these issues, mTHP
>>>>>>> swap-in shouldn't be held back. Significant efforts are underway to
>>>>>>> support large folios in `zswap`, and progress is being made. Not to
>>>>>>> mention we've already allowed `zeromap` to proceed, even though it
>>>>>>> doesn't support large folios.
>>>>>>>
>>>>>>> It's genuinely unfair to let the lack of mTHP support in `zeromap` and
>>>>>>> `zswap` hold swap-in hostage.
>>>>>>
>>>>>
>>>>> Hi Yosry,
>>>>>
>>>>>> Well, two points here:
>>>>>>
>>>>>> 1. I did not say that we should block the synchronous mTHP swapin work
>>>>>> for this :) I said the next item on the TODO list for mTHP swapin
>>>>>> support should be handling these cases.
>>>>>
>>>>> Thanks for your clarification!
>>>>>
>>>>>>
>>>>>> 2. I think two things are getting conflated here. Zswap needs to
>>>>>> support mTHP swapin. Zeromap already supports mTHPs AFAICT. What is
>>>>>> truly missing, and is outside the scope of zswap/zeromap, is being
>>>>>> able to support hybrid mTHP swapin.
>>>>>>
>>>>>> When swapping in an mTHP, the swapped entries can be on disk, in the
>>>>>> swapcache, in zswap, or in the zeromap. Even if all these things
>>>>>> support mTHPs individually, we essentially need support to form an
>>>>>> mTHP from swap entries in different backends. That's what I meant.
>>>>>> Actually if we have that, we may not really need mTHP swapin support
>>>>>> in zswap, because we can just form the large folio in the swap layer
>>>>>> from multiple zswap entries.
>>>>>>
>>>>>
>>>>> After further consideration, I've actually started to disagree with
>>>>> the idea of supporting hybrid swapin (forming an mTHP from swap
>>>>> entries in different backends).
>>>>> My reasoning is as follows:
>>>>
>>>> I do not have any data about this, so you could very well be right
>>>> here. Handling hybrid swapin could be simply falling back to the
>>>> smallest order we can swapin from a single backend. We can at least
>>>> start with this, and collect data about how many mTHP swapins fall
>>>> back due to hybrid backends. This way we only take on the complexity
>>>> if needed.
>>>>
>>>> I did imagine though that it's possible for two virtually contiguous
>>>> folios to be swapped out to contiguous swap entries and end up in
>>>> different media (e.g. if only one of them is zero-filled). I am not
>>>> sure how rare it would be in practice.
>>>>
>>>>>
>>>>> 1. The scenario where an mTHP is partially zeromap, partially zswap,
>>>>> etc., would be an extremely rare case, as long as we're swapping out
>>>>> the mTHP as a whole and all the modules are handling it accordingly.
>>>>> It's highly unlikely to form this mix of zeromap, zswap, and
>>>>> swapcache unless contiguous VMA virtual addresses happen to get
>>>>> small folios with aligned and contiguous swap slots. Even then, they
>>>>> would need to be partially zeromap and partially non-zeromap, zswap,
>>>>> etc.
>>>>
>>>> As I mentioned, we can start simple and collect data for this. If it's
>>>> rare and we don't need to handle it, that's good.
>>>>
>>>>>
>>>>> As you mentioned, zeromap handles mTHP as a whole during swap-out,
>>>>> marking all subpages of the entire mTHP as zeromap rather than just
>>>>> a subset of them.
>>>>>
>>>>> And swap-in can also entirely map a swapcache folio which is a large
>>>>> folio, based on our previous patchset which is already in mainline:
>>>>> "mm: swap: entirely map large folios found in swapcache"
>>>>> https://lore.kernel.org/all/20240529082824.150954-1-21cnbao@gmail.com/
>>>>>
>>>>> It seems the only thing we're missing is zswap support for mTHP.
>>>>
>>>> It is still possible for two virtually contiguous folios to be swapped
>>>> out to contiguous swap entries. It is also possible that a large folio
>>>> is swapped out as a whole, then only a part of it is swapped in later
>>>> due to memory pressure. If that part is later reclaimed again and gets
>>>> added to the swapcache, we can run into the hybrid swapin situation.
>>>> There may be other scenarios as well, I did not think this through.
>>>>
>>>>>
>>>>> 2. Implementing hybrid swap-in would be extremely tricky and could
>>>>> disrupt several software layers. I can share some pseudo code below:
>>>>
>>>> Yeah it definitely would be complex, so we need proper justification
>>>> for it.
>>>>
>>>>>
>>>>> swap_read_folio()
>>>>> {
>>>>>         if (zeromap_full)
>>>>>                 folio_read_from_zeromap()
>>>>>         else if (zswap_map_full)
>>>>>                 folio_read_from_zswap()
>>>>>         else {
>>>>>                 folio_read_from_swapfile()
>>>>>                 if (zeromap_partial)
>>>>>                         /* fill zeroes for partially-zeromap subpages */
>>>>>                         folio_read_from_zeromap_fixup()
>>>>>                 if (zswap_partial)
>>>>>                         /* zswap_load for partially zswap-mapped subpages */
>>>>>                         folio_read_from_zswap_fixup()
>>>>>
>>>>>                 folio_mark_uptodate()
>>>>>                 folio_unlock()
>>>>>         }
>>>>> }
>>>>>
>>>>> We'd also need to modify folio_read_from_swapfile() to skip
>>>>> folio_mark_uptodate() and folio_unlock() after completing the BIO.
>>>>> This approach seems to entirely disrupt the software layers.
>>>>>
>>>>> This could also lead to unnecessary IO operations for subpages that
>>>>> require fixup. Since such cases are quite rare, I believe the added
>>>>> complexity isn't worth it.
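
Just to make the cost concrete for myself, the zeromap half of such a
fixup could actually be fairly small. An untested sketch, with the
helper name taken from your pseudo code:

static void folio_read_from_zeromap_fixup(struct folio *folio)
{
        struct swap_info_struct *sis = swp_swap_info(folio->swap);
        unsigned long offset = swp_offset(folio->swap);
        long i;

        /* zero-fill only the subpages whose slots are set in the zeromap */
        for (i = 0; i < folio_nr_pages(folio); i++)
                if (test_bit(offset + i, sis->zeromap))
                        clear_highpage(folio_page(folio, i));
}

But I agree the intrusive part is elsewhere: folio_read_from_swapfile()
would have to stop calling folio_mark_uptodate()/folio_unlock() itself,
which cuts across the existing layering.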
>>>>>
>>>>> My point is that we should simply check that all PTEs have consistent
>>>>> zeromap, zswap, and swapcache statuses before proceeding, and
>>>>> otherwise fall back to the next lower order if needed. This approach
>>>>> improves performance and avoids complex corner cases.
>>>>
>>>> Agree that we should start with that, although we should probably
>>>> fall back to the largest order we can swapin from a single backend,
>>>> rather than the next lower order.
>>>>
>>>>>
>>>>> So once zswap mTHP is there, I would also expect an API similar to
>>>>> swap_zeromap_entries_check(), for example zswap_entries_check(entry, nr),
>>>>> which can return whether we have full, none, or partial zswap, to
>>>>> replace the existing zswap_never_enabled().
>>>>
>>>> I think a better API would be similar to what Usama had. Basically
>>>> take in (entry, nr) and return how much of it is in zswap starting at
>>>> entry, so that we can decide the swapin order.
>>>>
>>>> Maybe we can adjust your proposed swap_zeromap_entries_check() as well
>>>> to do that? Basically return the number of swap entries in the zeromap
>>>> starting at 'entry'. If 'entry' itself is not in the zeromap we return
>>>> 0 naturally. That would be a small adjustment/fix over what Usama had,
>>>> but implementing it with bitmap operations like you did would be
>>>> better.
>>>
>>> I assume you mean the below:
>>>
>>> /*
>>>  * Return the number of contiguous zeromap entries starting from entry
>>>  */
>>> static inline unsigned int swap_zeromap_entries_count(swp_entry_t entry, int nr)
>>> {
>>>         struct swap_info_struct *sis = swp_swap_info(entry);
>>>         unsigned long start = swp_offset(entry);
>>>         unsigned long end = start + nr;
>>>         unsigned long idx;
>>>
>>>         idx = find_next_bit(sis->zeromap, end, start);
>>>         if (idx != start)
>>>                 return 0;
>>>
>>>         return find_next_zero_bit(sis->zeromap, end, start) - idx;
>>> }
>>>
>>> If yes, I really like this idea.
>>>
>>> It seems much better than using an enum, which would require adding a
>>> new data structure :-) Additionally, returning the number allows
>>> callers to fall back to the largest possible order, rather than trying
>>> the next lower orders sequentially.
>>
>> No, returning 0 after only checking the first entry would still
>> reintroduce the current bug, where the start entry is zeromap but other
>> entries might not be. We need another value to indicate whether the
>> entries are consistent if we want to avoid the enum:
>>
>> /*
>>  * Return the number of contiguous zeromap entries starting from entry;
>>  * if all entries have a consistent zeromap status, *consistent will be
>>  * set to true, otherwise to false.
>>  */
>> static inline unsigned int swap_zeromap_entries_count(swp_entry_t entry,
>>                 int nr, bool *consistent)
>> {
>>         struct swap_info_struct *sis = swp_swap_info(entry);
>>         unsigned long start = swp_offset(entry);
>>         unsigned long end = start + nr;
>>         unsigned long s_idx, c_idx;
>>
>>         s_idx = find_next_bit(sis->zeromap, end, start);
>>         if (s_idx == end) {
>>                 *consistent = true;
>>                 return 0;
>>         }
>>
>>         c_idx = find_next_zero_bit(sis->zeromap, end, start);
>>         if (c_idx == end) {
>>                 *consistent = true;
>>                 return nr;
>>         }
>>
>>         *consistent = false;
>>         if (s_idx == start)
>>                 return 0;
>>         return c_idx - s_idx;
>> }
>>
>> I can actually switch the places of the "consistent" flag and the
>> returned number if that looks better.
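
If we go with this signature, I think the mm/page_io.c side stays
simple. An untested sketch of how swap_read_folio_zeromap() could
consume it:

static bool swap_read_folio_zeromap(struct folio *folio)
{
        bool consistent;
        unsigned int nr;

        nr = swap_zeromap_entries_count(folio->swap,
                                        folio_nr_pages(folio), &consistent);

        /* mixed ranges should have been filtered out at allocation time */
        VM_WARN_ON_ONCE(!consistent);

        /* no entry is in the zeromap: read from the real backend */
        if (!nr)
                return false;

        /* every entry is in the zeromap: the whole folio is zero-filled */
        folio_zero_range(folio, 0, folio_size(folio));
        folio_mark_uptodate(folio);
        return true;
}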
>
> I'd rather make it simpler by:
>
> /*
>  * Check if all entries have a consistent zeromap status: return true if
>  * all entries are zeromap or all are non-zeromap, else return false.
>  */
> static inline bool swap_zeromap_entries_check(swp_entry_t entry, int nr)
> {
>         struct swap_info_struct *sis = swp_swap_info(entry);
>         unsigned long start = swp_offset(entry);
>         unsigned long end = start + *nr;
>

I guess you meant end = start + nr here?

>         if (find_next_bit(sis->zeromap, end, start) == end)
>                 return true;
>         if (find_next_zero_bit(sis->zeromap, end, start) == end)
>                 return true;
>

So if the zeromap is all false, this still returns true. We can't use
this function in swap_read_folio_zeromap() to check, at swap-in time, if
all entries were zeros, right?

>         return false;
> }
>
> mm/page_io.c can combine this with reading the zeromap of the first
> entry to decide if it will read the folio from the zeromap; mm/memory.c
> only needs the bool to fall back to the largest possible order.
>
> static inline unsigned long thp_swap_suitable_orders(...)
> {
>         int order, nr;
>
>         order = highest_order(orders);
>
>         while (orders) {
>                 nr = 1 << order;
>                 if ((addr >> PAGE_SHIFT) % nr == swp_offset % nr &&
>                     swap_zeromap_entries_check(entry, nr))
>                         break;
>                 order = next_order(&orders, order);
>         }
>
>         return orders;
> }
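
And to spell out the mm/page_io.c combination you describe: with the
bool helper, swap_read_folio_zeromap() would have to peek at the first
bit itself, roughly like this (again untested):

static bool swap_read_folio_zeromap(struct folio *folio)
{
        struct swap_info_struct *sis = swp_swap_info(folio->swap);

        /* a mixed range must have been rejected before we get here */
        VM_WARN_ON_ONCE(!swap_zeromap_entries_check(folio->swap,
                                                    folio_nr_pages(folio)));

        /* the range is consistent, so the first bit speaks for all entries */
        if (!test_bit(swp_offset(folio->swap), sis->zeromap))
                return false;

        folio_zero_range(folio, 0, folio_size(folio));
        folio_mark_uptodate(folio);
        return true;
}

That would work, although the "is it all zeros" decision then lives in
two places instead of one.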
>
>>
>>>
>>> Hi Usama,
>>> what is your take on this?
>>>
>>>>
>>>>>
>>>>> Though I am not sure how cheap zswap can implement it,
>>>>> swap_zeromap_entries_check() could be two simple bit operations:
>>>>>
>>>>> +static inline zeromap_stat_t swap_zeromap_entries_check(swp_entry_t entry, int nr)
>>>>> +{
>>>>> +        struct swap_info_struct *sis = swp_swap_info(entry);
>>>>> +        unsigned long start = swp_offset(entry);
>>>>> +        unsigned long end = start + nr;
>>>>> +
>>>>> +        if (find_next_bit(sis->zeromap, end, start) == end)
>>>>> +                return SWAP_ZEROMAP_NON;
>>>>> +        if (find_next_zero_bit(sis->zeromap, end, start) == end)
>>>>> +                return SWAP_ZEROMAP_FULL;
>>>>> +
>>>>> +        return SWAP_ZEROMAP_PARTIAL;
>>>>> +}
>>>>>
>>>>> 3. swapcache is different from zeromap and zswap. Swapcache indicates
>>>>> that the memory is still available and should be re-mapped rather
>>>>> than allocating a new folio. Our previous patchset has implemented a
>>>>> full re-map of an mTHP in do_swap_page(), as mentioned in 1.
>>>>>
>>>>> For the same reason as point 1, partial swapcache is a rare edge
>>>>> case. Not re-mapping it and instead allocating a new folio would add
>>>>> significant complexity.
>>>>>
>>>>>>>
>>>>>>> Nonetheless, `zeromap` and `zswap` are distinct cases. With `zeromap`,
>>>>>>> we permit almost all mTHP swap-ins, except for those rare situations
>>>>>>> where small folios that were swapped out happen to have contiguous
>>>>>>> and aligned swap slots.
>>>>>>>
>>>>>>> swapcache is another quite different story; since our user scenarios
>>>>>>> begin from the simplest sync io on mobile phones, we don't quite
>>>>>>> care about swapcache.
>>>>>>
>>>>>> Right. The reason I bring this up is, as I mentioned above, that
>>>>>> there is a common problem of forming large folios from different
>>>>>> sources, which includes the swap cache. The fact that synchronous
>>>>>> swapin does not use the swapcache was a happy coincidence for you,
>>>>>> as you can add support for mTHP swapins without handling this case
>>>>>> yet ;)
>>>>>
>>>>> As I mentioned above, I'd really rather filter out those corner cases
>>>>> than support them, not just for the current situation to unlock the
>>>>> swap-in series :-)
>>>>
>>>> If they are indeed corner cases, then I definitely agree.
>>>
>
> Thanks
> Barry