Message-ID: <076d2577-61d0-42a6-a95f-11326684a2f2@gmail.com>
Date: Thu, 5 Sep 2024 11:50:16 +0100
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
From: Usama Arif
To: Barry Song <21cnbao@gmail.com>
Cc: Yosry Ahmed, akpm@linux-foundation.org, chengming.zhou@linux.dev, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, nphamcs@gmail.com, shakeel.butt@linux.dev, willy@infradead.org, ying.huang@intel.com, hanchuanhua@oppo.com
References: <20240612124750.2220726-2-usamaarif642@gmail.com> <20240904055522.2376-1-21cnbao@gmail.com>

On 05/09/2024 11:42, Barry Song wrote:
> On Thu, Sep 5, 2024 at 10:37 PM Usama Arif wrote:
>>
>>
>>
>> On 05/09/2024 11:10, Barry Song wrote:
>>> On Thu, Sep 5, 2024 at 8:49 PM Barry Song <21cnbao@gmail.com> wrote:
>>>>
>>>> On Thu, Sep 5, 2024 at 7:55 PM Yosry Ahmed wrote:
>>>>>
>>>>> On Thu, Sep 5, 2024 at 12:03 AM Barry Song <21cnbao@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Sep 5, 2024 at 5:41 AM Yosry Ahmed wrote:
>>>>>>>
>>>>>>> [..]
>>>>>>>>> I understand the point of doing this to unblock the synchronous large
>>>>>>>>> folio swapin support work, but at some point we're gonna have to
>>>>>>>>> actually handle the cases where a large folio being swapped in is
>>>>>>>>> partially in the swap cache, zswap, the zeromap, etc.
>>>>>>>>>
>>>>>>>>> All these cases will need similar-ish handling, and I suspect we won't
>>>>>>>>> just skip swapping in large folios in all these cases.
>>>>>>>>
>>>>>>>> I agree that this is definitely the goal. `swap_read_folio()` should be a
>>>>>>>> dependable API that always returns reliable data, regardless of whether
>>>>>>>> `zeromap` or `zswap` is involved. Despite these issues, mTHP swap-in shouldn't
>>>>>>>> be held back. Significant efforts are underway to support large folios in
>>>>>>>> `zswap`, and progress is being made. Not to mention we've already allowed
>>>>>>>> `zeromap` to proceed, even though it doesn't support large folios.
>>>>>>>>
>>>>>>>> It's genuinely unfair to let the lack of mTHP support in `zeromap` and
>>>>>>>> `zswap` hold swap-in hostage.
>>>>>>>
>>>>>>
>>>>>> Hi Yosry,
>>>>>>
>>>>>>> Well, two points here:
>>>>>>>
>>>>>>> 1. I did not say that we should block the synchronous mTHP swapin work
>>>>>>> for this :) I said the next item on the TODO list for mTHP swapin
>>>>>>> support should be handling these cases.
>>>>>>
>>>>>> Thanks for your clarification!
>>>>>>
>>>>>>>
>>>>>>> 2. I think two things are getting conflated here. Zswap needs to
>>>>>>> support mTHP swapin*. Zeromap already supports mTHPs AFAICT. What is
>>>>>>> truly missing, and is outside the scope of zswap/zeromap, is being able to
>>>>>>> support hybrid mTHP swapin.
>>>>>>>
>>>>>>> When swapping in an mTHP, the swapped entries can be on disk, in the
>>>>>>> swapcache, in zswap, or in the zeromap. Even if all these things
>>>>>>> support mTHPs individually, we essentially need support to form an
>>>>>>> mTHP from swap entries in different backends. That's what I meant.
>>>>>>> Actually if we have that, we may not really need mTHP swapin support
>>>>>>> in zswap, because we can just form the large folio in the swap layer
>>>>>>> from multiple zswap entries.
>>>>>>>
>>>>>>
>>>>>> After further consideration, I've actually started to disagree with the idea
>>>>>> of supporting hybrid swapin (forming an mTHP from swap entries in different
>>>>>> backends). My reasoning is as follows:
>>>>>
>>>>> I do not have any data about this, so you could very well be right
>>>>> here. Handling hybrid swapin could be simply falling back to the
>>>>> smallest order we can swapin from a single backend. We can at least
>>>>> start with this, and collect data about how many mTHP swapins fallback
>>>>> due to hybrid backends. This way we only take the complexity if
>>>>> needed.
>>>>>
>>>>> I did imagine though that it's possible for two virtually contiguous
>>>>> folios to be swapped out to contiguous swap entries and end up in
>>>>> different media (e.g. if only one of them is zero-filled). I am not
>>>>> sure how rare it would be in practice.
>>>>>
>>>>>>
>>>>>> 1. The scenario where an mTHP is partially zeromap, partially zswap, etc.,
>>>>>> would be an extremely rare case, as long as we're swapping out the mTHP as
>>>>>> a whole and all the modules are handling it accordingly. It's highly
>>>>>> unlikely to form this mix of zeromap, zswap, and swapcache unless the
>>>>>> contiguous VMA virtual address happens to get some small folios with
>>>>>> aligned and contiguous swap slots. Even then, they would need to be
>>>>>> partially zeromap and partially non-zeromap, zswap, etc.
>>>>>
>>>>> As I mentioned, we can start simple and collect data for this. If it's
>>>>> rare and we don't need to handle it, that's good.
>>>>>
>>>>>>
>>>>>> As you mentioned, zeromap handles mTHP as a whole during swapping
>>>>>> out, marking all subpages of the entire mTHP as zeromap rather than just
>>>>>> a subset of them.
>>>>>>
>>>>>> And swap-in can also entirely map a swapcache which is a large folio based
>>>>>> on our previous patchset which has been in mainline:
>>>>>> "mm: swap: entirely map large folios found in swapcache"
>>>>>> https://lore.kernel.org/all/20240529082824.150954-1-21cnbao@gmail.com/
>>>>>>
>>>>>> It seems the only thing we're missing is zswap support for mTHP.
>>>>>
>>>>> It is still possible for two virtually contiguous folios to be swapped
>>>>> out to contiguous swap entries. It is also possible that a large folio
>>>>> is swapped out as a whole, then only a part of it is swapped in later
>>>>> due to memory pressure. If that part is later reclaimed again and gets
>>>>> added to the swapcache, we can run into the hybrid swapin situation.
>>>>> There may be other scenarios as well, I did not think this through.
>>>>>
>>>>>>
>>>>>> 2. Implementing hybrid swap-in would be extremely tricky and could disrupt
>>>>>> several software layers. I can share some pseudo code below:
>>>>>
>>>>> Yeah it definitely would be complex, so we need proper justification for it.
>>>>>
>>>>>>
>>>>>> swap_read_folio()
>>>>>> {
>>>>>>         if (zeromap_full)
>>>>>>                 folio_read_from_zeromap()
>>>>>>         else if (zswap_map_full)
>>>>>>                 folio_read_from_zswap()
>>>>>>         else {
>>>>>>                 folio_read_from_swapfile()
>>>>>>                 if (zeromap_partial)
>>>>>>                         /* fill zero for partially zeromap subpages */
>>>>>>                         folio_read_from_zeromap_fixup()
>>>>>>                 if (zswap_partial)
>>>>>>                         /* zswap_load for partially zswap-mapped subpages */
>>>>>>                         folio_read_from_zswap_fixup()
>>>>>>
>>>>>>                 folio_mark_uptodate()
>>>>>>                 folio_unlock()
>>>>>>         }
>>>>>> }
>>>>>>
>>>>>> We'd also need to modify folio_read_from_swapfile() to skip
>>>>>> folio_mark_uptodate() and folio_unlock() after completing the BIO.
>>>>>> This approach seems to entirely disrupt the software layers.
>>>>>>
>>>>>> This could also lead to unnecessary IO operations for subpages that
>>>>>> require fixup. Since such cases are quite rare, I believe the added
>>>>>> complexity isn't worth it.
>>>>>>
>>>>>> My point is that we should simply check that all PTEs have consistent zeromap,
>>>>>> zswap, and swapcache statuses before proceeding, otherwise fall back to the next
>>>>>> lower order if needed. This approach improves performance and avoids complex
>>>>>> corner cases.
>>>>>
>>>>> Agree that we should start with that, although we should probably
>>>>> fallback to the largest order we can swapin from a single backend,
>>>>> rather than the next lower order.
>>>>>
>>>>>>
>>>>>> So once zswap mTHP is there, I would also expect an API similar to
>>>>>> swap_zeromap_entries_check(), for example:
>>>>>> zswap_entries_check(entry, nr), which can return whether we have
>>>>>> full, non, or partial zswap, to replace the existing
>>>>>> zswap_never_enabled().
>>>>>
>>>>> I think a better API would be similar to what Usama had. Basically
>>>>> take in (entry, nr) and return how much of it is in zswap starting at
>>>>> entry, so that we can decide the swapin order.
>>>>>
>>>>> Maybe we can adjust your proposed swap_zeromap_entries_check() as well
>>>>> to do that? Basically return the number of swap entries in the zeromap
>>>>> starting at 'entry'. If 'entry' itself is not in the zeromap we return
>>>>> 0 naturally. That would be a small adjustment/fix over what Usama had,
>>>>> but implementing it with bitmap operations like you did would be
>>>>> better.
>>>>
>>>> I assume you mean the below
>>>>
>>>> /*
>>>>  * Return the number of contiguous zeromap entries started from entry
>>>>  */
>>>> static inline unsigned int swap_zeromap_entries_count(swp_entry_t entry, int nr)
>>>> {
>>>>         struct swap_info_struct *sis = swp_swap_info(entry);
>>>>         unsigned long start = swp_offset(entry);
>>>>         unsigned long end = start + nr;
>>>>         unsigned long idx;
>>>>
>>>>         idx = find_next_bit(sis->zeromap, end, start);
>>>>         if (idx != start)
>>>>                 return 0;
>>>>
>>>>         return find_next_zero_bit(sis->zeromap, end, start) - idx;
>>>> }
>>>>
>>>> If yes, I really like this idea.
>>>>
>>>> It seems much better than using an enum, which would require adding a new
>>>> data structure :-) Additionally, returning the number allows callers
>>>> to fall back to the largest possible order, rather than trying next
>>>> lower orders sequentially.
>>>
>>> No, returning 0 after only checking first entry would still reintroduce
>>> the current bug, where the start entry is zeromap but other entries
>>> might not be. We need another value to indicate whether the entries
>>> are consistent if we want to avoid the enum:
>>>
>>> /*
>>>  * Return the number of contiguous zeromap entries started from entry;
>>>  * If all entries have consistent zeromap, *consistent will be true;
>>>  * otherwise, false;
>>>  */
>>> static inline unsigned int swap_zeromap_entries_count(swp_entry_t entry,
>>>                 int nr, bool *consistent)
>>> {
>>>         struct swap_info_struct *sis = swp_swap_info(entry);
>>>         unsigned long start = swp_offset(entry);
>>>         unsigned long end = start + nr;
>>>         unsigned long s_idx, c_idx;
>>>
>>>         s_idx = find_next_bit(sis->zeromap, end, start);
>>
>> In all of the implementations you sent, you are using find_next_bit(..,end, start), but
>> I believe it should be find_next_bit(..,nr, start)?
>
> I guess no, the tricky thing is that size means the size from the first bit of
> bitmap but not from the "start" bit?
>

Ah OK, we should probably change the function prototype to take end. It's fine then if that's the case.

>> TBH, I liked the enum implementation you had in https://lore.kernel.org/all/20240905002926.1055-1-21cnbao@gmail.com/
>> It's the easiest to review and understand, and least likely to introduce any bugs.
>> But it could be a personal preference.
>> The likelihood of having contiguous zeromap entries *that* is less than nr is very low right?
>> If so we could go with the enum implementation?
>
> what about the bool implementation I sent in the last email, it seems the
> simplest code.
>

Looking now.

>>
>>
>>>         if (s_idx == end) {
>>>                 *consistent = true;
>>>                 return 0;
>>>         }
>>>
>>>         c_idx = find_next_zero_bit(sis->zeromap, end, start);
>>>         if (c_idx == end) {
>>>                 *consistent = true;
>>>                 return nr;
>>>         }
>>>
>>>         *consistent = false;
>>>         if (s_idx == start)
>>>                 return 0;
>>>         return c_idx - s_idx;
>>> }
>>>
>>> I can actually switch the places of the "consistent" and returned
>>> number if that looks better.
>>>
>>>>
>>>> Hi Usama,
>>>> what is your take on this?
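
A minimal userspace sketch of the counting helper quoted above, meant only to make the `find_next_bit()` argument order concrete. In the kernel, the second argument of `find_next_bit(addr, size, offset)` is the bitmap size in bits, that is, an absolute upper bound rather than a length measured from `offset`, which is why `end = start + nr` is passed. The names `model_find_next_bit()`, `model_find_next_zero_bit()` and `zeromap_count()` are hypothetical stand-ins that work on a plain `unsigned long` array instead of `sis->zeromap`. One assumption to flag: in the mixed case this sketch returns 0 when the first entry is not in the zeromap (`s_idx != start`); the quoted version tests `s_idx == start` at that point, which looks inverted relative to its own comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for test_bit(): read bit `idx` of the bitmap. */
static bool model_test_bit(const unsigned long *map, unsigned long idx)
{
	return (map[idx / MODEL_BITS_PER_LONG] >> (idx % MODEL_BITS_PER_LONG)) & 1UL;
}

/*
 * Model of find_next_bit(): first set bit in [offset, size), or `size`
 * if there is none. `size` is an absolute bit index, not a length.
 */
static unsigned long model_find_next_bit(const unsigned long *map,
					 unsigned long size, unsigned long offset)
{
	while (offset < size && !model_test_bit(map, offset))
		offset++;
	return offset < size ? offset : size;
}

/* Model of find_next_zero_bit(): first clear bit in [offset, size). */
static unsigned long model_find_next_zero_bit(const unsigned long *map,
					      unsigned long size, unsigned long offset)
{
	while (offset < size && model_test_bit(map, offset))
		offset++;
	return offset < size ? offset : size;
}

/*
 * Number of contiguous zeromap bits starting at `start`. *consistent is
 * true when all `nr` bits agree (all set or all clear).
 */
static unsigned int zeromap_count(const unsigned long *map, unsigned long start,
				  unsigned int nr, bool *consistent)
{
	unsigned long end = start + nr;
	unsigned long s_idx, c_idx;

	assert(nr > 0);

	s_idx = model_find_next_bit(map, end, start);
	if (s_idx == end) {		/* no bit set in the range */
		*consistent = true;
		return 0;
	}

	c_idx = model_find_next_zero_bit(map, end, start);
	if (c_idx == end) {		/* every bit set in the range */
		*consistent = true;
		return nr;
	}

	*consistent = false;
	if (s_idx != start)		/* first entry is not in the zeromap */
		return 0;
	return c_idx - s_idx;		/* run of set bits beginning at start */
}
```

With bits 2..5 set in the bitmap, `zeromap_count(map, 2, 4, &c)` returns 4 with `c` true, while `zeromap_count(map, 2, 6, &c)` still returns 4 but sets `c` to false. Passing `nr` instead of `end` as the size would make the search range empty whenever `start >= nr`.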
>>>>
>>>>>
>>>>>>
>>>>>> Though I am not sure how cheap zswap can implement it,
>>>>>> swap_zeromap_entries_check() could be two simple bit operations:
>>>>>>
>>>>>> +static inline zeromap_stat_t swap_zeromap_entries_check(swp_entry_t entry, int nr)
>>>>>> +{
>>>>>> +        struct swap_info_struct *sis = swp_swap_info(entry);
>>>>>> +        unsigned long start = swp_offset(entry);
>>>>>> +        unsigned long end = start + nr;
>>>>>> +
>>>>>> +        if (find_next_bit(sis->zeromap, end, start) == end)
>>>>>> +                return SWAP_ZEROMAP_NON;
>>>>>> +        if (find_next_zero_bit(sis->zeromap, end, start) == end)
>>>>>> +                return SWAP_ZEROMAP_FULL;
>>>>>> +
>>>>>> +        return SWAP_ZEROMAP_PARTIAL;
>>>>>> +}
>>>>>>
>>>>>> 3. swapcache is different from zeromap and zswap. Swapcache indicates
>>>>>> that the memory is still available and should be re-mapped rather than
>>>>>> allocating a new folio. Our previous patchset has implemented a full
>>>>>> re-map of an mTHP in do_swap_page() as mentioned in 1.
>>>>>>
>>>>>> For the same reason as point 1, partial swapcache is a rare edge case.
>>>>>> Not re-mapping it and instead allocating a new folio would add
>>>>>> significant complexity.
>>>>>>
>>>>>>>>
>>>>>>>> Nonetheless, `zeromap` and `zswap` are distinct cases. With `zeromap`, we
>>>>>>>> permit almost all mTHP swap-ins, except for those rare situations where
>>>>>>>> small folios that were swapped out happen to have contiguous and aligned
>>>>>>>> swap slots.
>>>>>>>>
>>>>>>>> swapcache is another quite different story, since our user scenarios begin from
>>>>>>>> the simplest sync io on mobile phones, we don't quite care about swapcache.
>>>>>>>
>>>>>>> Right. The reason I bring this up is as I mentioned above, there is a
>>>>>>> common problem of forming large folios from different sources, which
>>>>>>> includes the swap cache. The fact that synchronous swapin does not use
>>>>>>> the swapcache was a happy coincidence for you, as you can add support
>>>>>>> for mTHP swapins without handling this case yet ;)
>>>>>>
>>>>>> As I mentioned above, I'd really rather filter out those corner cases
>>>>>> than support them, not just for the current situation to unlock
>>>>>> swap-in series :-)
>>>>>
>>>>> If they are indeed corner cases, then I definitely agree.
>>>>
>
> Thanks
> Barry
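
For reference, a small userspace sketch of the enum-style check discussed in the thread, together with the "fall back to the next lower order" policy it was paired with. Everything here is an assumption for illustration: `zeromap_check()` and `pick_swapin_order()` are hypothetical names, the bitmap is a plain `unsigned long` array rather than `sis->zeromap`, the candidate range is taken to be the naturally aligned block around the faulting offset, and only the zeromap is consulted (zswap and swapcache are ignored).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Tri-state result, mirroring the NON / FULL / PARTIAL idea from the thread. */
enum zeromap_stat { ZEROMAP_NON, ZEROMAP_FULL, ZEROMAP_PARTIAL };

static bool sketch_test_bit(const unsigned long *map, unsigned long idx)
{
	return (map[idx / SKETCH_BITS_PER_LONG] >> (idx % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Classify bits [start, start + nr): none set, all set, or a mix. */
static enum zeromap_stat zeromap_check(const unsigned long *map,
				       unsigned long start, unsigned long nr)
{
	unsigned long set = 0, i;

	assert(nr > 0);

	for (i = start; i < start + nr; i++)
		set += sketch_test_bit(map, i);

	if (set == 0)
		return ZEROMAP_NON;
	if (set == nr)
		return ZEROMAP_FULL;
	return ZEROMAP_PARTIAL;
}

/*
 * Try orders from max_order downwards and return the first one whose
 * aligned range around `offset` is not a mix of zeromap and non-zeromap
 * entries. Order 0 (a single entry) is always consistent.
 */
static unsigned int pick_swapin_order(const unsigned long *map,
				      unsigned long offset, unsigned int max_order)
{
	unsigned int order;

	for (order = max_order; order > 0; order--) {
		unsigned long nr = 1UL << order;
		unsigned long start = offset & ~(nr - 1);	/* align down */

		if (zeromap_check(map, start, nr) != ZEROMAP_PARTIAL)
			break;
	}
	return order;
}
```

This walks the orders sequentially, which is the simpler policy. A count-returning helper would instead let the caller jump straight to the largest usable order, at the cost of a slightly less obvious interface.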