From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 922F8C27C75 for ; Tue, 11 Jun 2024 17:51:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 106AE6B00A5; Tue, 11 Jun 2024 13:51:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0B6C66B00A7; Tue, 11 Jun 2024 13:51:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EBFAF6B00A8; Tue, 11 Jun 2024 13:51:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id CD37F6B00A5 for ; Tue, 11 Jun 2024 13:51:52 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 77A4EC08A2 for ; Tue, 11 Jun 2024 17:51:52 +0000 (UTC) X-FDA: 82219350864.09.E56942C Received: from mail-vs1-f43.google.com (mail-vs1-f43.google.com [209.85.217.43]) by imf24.hostedemail.com (Postfix) with ESMTP id AF568180007 for ; Tue, 11 Jun 2024 17:51:50 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="Zg5/TOqG"; spf=pass (imf24.hostedemail.com: domain of yosryahmed@google.com designates 209.85.217.43 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1718128310; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=AgOOvOxwqfeCMq6yLdCzK4I4GbZLJXDhNrH/6ViA8nM=; b=M3xyju4y8GeGvMLiyGA6xbF9EiGHuky1r6tIesGXDOJQLwgJD7Tw0WhTOAGqhzsFNsxIF6 NmG0Z2P5bP+PylQ8JDCKeyf+PnODdrP2SRXHcGKcFSbcTwJqlLIaPyfvq3P5k4uiIryola h/eTVfhrM+2umjsKNzN5jZmAvP8Tyqk= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1718128310; a=rsa-sha256; cv=none; b=xbFZQA8vTUUShuMTE04G6HN0V/BgRZsEejsLb7hWC0EbOvKrcNP7R4QMSq592XOzsTHig0 doTvhEM1PiK7CZCd5ix2GtvmbYBXfrjOIHMpbf4cfFSRDy+C3M5Z4XjTGIImF25Ptsj8ZB edGn4LVkRfdfssEyNLDqesZ96CUzdTQ= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="Zg5/TOqG"; spf=pass (imf24.hostedemail.com: domain of yosryahmed@google.com designates 209.85.217.43 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-vs1-f43.google.com with SMTP id ada2fe7eead31-48c54370c56so786665137.3 for ; Tue, 11 Jun 2024 10:51:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1718128309; x=1718733109; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=AgOOvOxwqfeCMq6yLdCzK4I4GbZLJXDhNrH/6ViA8nM=; b=Zg5/TOqG9fAUdAU9v6IQUumpUMzBVbL6aO/uVEf1UtlhYCOCHrgVHq1SKobaE2FRIk OA6RyXe6sTCebm3y9Zxhz3V076wpF9nV55PIv7VpdvjghSU7+Acz32UiYZelmO2Q7a8E nixmHSZSwC3hfo57kh27xZ5yOz+8SPy3I1eEEGtgcOtzsyHpZCmGPE0TYXTl/5hPAmGl GnLjPwkikAcapz2+3xTb4BEOcgSCGs2zOBivVWXdX378K7B9W8lmLa3VQh+vIXqeqpnN 50KAUQO/wxSebvxbGaflLAzQLWhBJbEXc0kHL352G3ixGPyC4yI/s5PtFv9KuqqHVDMb jYVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718128309; x=1718733109; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=AgOOvOxwqfeCMq6yLdCzK4I4GbZLJXDhNrH/6ViA8nM=; b=fCmCoyS6/i0e6l60ype6gbactKOTjkVP5NbAUPhv85mOW5bRkhJFYZE93wkIj6SeQr F/QECALpTAPAK6Ahw1ftA5FF+dRlIl8Vw2qlVBwx/KsXpjNJvAF8cadSnWbn3IIqRMEU WJiMg9jmdSL0E1OWZFbmhCxrtZKuJxpNyJnRtD6af1QzZR2Ox/5G2SpiIJvIlaP4uaXr JVY60zEOLbjqJF76+upqG5JUEpzzsCdG+/42SJe/6wr5ii7I+6z9PJlyZ11Hqr8ANvTZ B0HYuba3lW3kw+qaGpikibZGiIApVCA4ML0OpXolu1gAcCaDLwohoWFm/mk8Jd+ZPKNx uxwg== X-Forwarded-Encrypted: i=1; AJvYcCXbKPxBQte99MUutLa975te0NIvTwqVlIEZY7wReObZ0BM9bBjmldQTZ708AqGD1DNUMRquGG8y0cYbiVmlRSn9cx4= X-Gm-Message-State: AOJu0YzSsvYmU8589Kwc849P0IvkSGtY+Xj0pVyPKPRDrnEBpfmHPb/5 0JE9P1z1WaHXBwfoDzZEMNiMGYT/Odbb/6Pr2lTP8jHwOzW0XrqDcppK9arCV/IJWsoRldnYUn5 R8NTo0CYMnQagNljFArmf/QAvVbgvGxkL0J7r X-Google-Smtp-Source: AGHT+IFW69M4nfsYi29M3zBfyivmaOCONNrtXO/Vdnq3OjZCVVh12iV0VLVWjulmWfiQpP23Fs8hLe/yAlGcfs44q6o= X-Received: by 2002:a05:6102:833:b0:48d:8c7f:551a with SMTP id ada2fe7eead31-48d8dbdc025mr1300935137.21.1718128309441; Tue, 11 Jun 2024 10:51:49 -0700 (PDT) MIME-Version: 1.0 References: <20240610121820.328876-1-usamaarif642@gmail.com> <20240610121820.328876-2-usamaarif642@gmail.com> <9ddfe544-636d-4638-ae0e-053674e47322@gmail.com> <08ea43f2-13d2-4b27-ae62-42cebc185c7b@gmail.com> In-Reply-To: <08ea43f2-13d2-4b27-ae62-42cebc185c7b@gmail.com> From: Yosry Ahmed Date: Tue, 11 Jun 2024 10:51:08 -0700 Message-ID: Subject: Re: [PATCH v3 1/2] mm: store zero pages to be swapped out in a bitmap To: Usama Arif Cc: 21cnbao@gmail.com, akpm@linux-foundation.org, hannes@cmpxchg.org, david@redhat.com, ying.huang@intel.com, hughd@google.com, willy@infradead.org, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Shakeel Butt Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: AF568180007 X-Rspam-User: X-Rspamd-Server: rspam08 X-Stat-Signature: j7gfnzsndjfrayjkq1drd8bqp6iftsne X-HE-Tag: 1718128310-417060 X-HE-Meta: U2FsdGVkX18mL0EUEJzUlSeoCoU7GVkBaH6Asho5XALXyO62+153TKdJ3R5XFeLFAZesv77kQiB30NTGAYGPntWc0QhaI3xrhe1UxIeAMcs/DVINQ2GB8sGaAbuujmBX0cvLtUL1ZkoMhmsVD9wCmDK/HD86GCJLLzlr31+vA67ztaQsBGI5x19tsJmHNJyOXfGb6szPB0O8uM/2o8FM87zFLlgb5scCGQQZPs/i1nt2YFcDbui75IM3i8Vc8heOzeUIKdl5mAA4xhAJhu8D3P1inQthcIp2JAEY1mdBDjOCQN1/o+gCuf0OrcPljTM3AQfj7yNm4Jz7YC8zkHV+DVVkx92MD+RnEGMHOD4ItfMXwUmClFRJYWQ4TrDeiAcqeDNOiaD7Zu7XXZhtqXgctc5QIb/+WZjTOeL6LBPxQ1GLIXtBFvfeRJL/RHv8ykP5bWyDja+JuCVpcxeXzHlcOqQNKx8sY/ULWE3AGoxWL7VF7qBYpa6FghQzRZmvw+jbL7z+1JqlNlqjR1JMecDfgikq62i1Ka4CDlrNfnkqnKmmX7as3+dZMtjS3HiSPv0KCK5ItXBY5qpZdbN0nvo7abrug406qGz7BRgwG2ipQURewkaxEp+b1XOLCgopKjXClxB4wcL74EePqnTgCiSSxQzQeG/5j5tQ4G+AIWU7dFzcf4PJxIwUlh3nEMdjeb0qQtFuFwgNFT6EgBBY1tnWb0XgoSAfY9uV2kLGbUerMveYDR9Uhmpkn1MDKlWo45B+WxdRc9vxLaNxGPpN6OYdtdqGB/IepjO7PTvudT2ozisLJhf+CI+x7E2UAV6Nchvp4aUPyZ7ZPCBdjpdIg/RmFPImUcAi7FvPa5l5kQ8w4/dyriE+98mdmEocqkMAQWWUdKaaUs7VasGTVYkmTCM3wzPCLLb7jBw69c2wjdwaUfjLd8EjutcJpKZImKBKCxY61qLfHoTgpDG0IG4/mW4 PZIeDl02 eIZD9FtkvTf8YXieC4SuGF4UfL8wHDkJO+Rv1kDDvSBGnhnyQCnmaIgB+8R6UKnXnelV62A1m7L0e0xT6/NTywXMX8GPc2A5ykzxiHA0Le1bkvOOr4kknP1a1xZfZihkoV6eyeP/34/8jfvXVrpPgj7V5JX9TUKrSstzWJd4fFd4IdfqpahcjloENUwWCY+vg8NfEzLzF5AhrDNB0NL7TgrPEu3qTKZCfm71M X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: [..] > >> I think its better to handle this in Barrys patch. I feel this series is > >> close to its final state, i.e. the only diff I have for the next > >> revision is below to remove start/end_writeback for zer_filled case. I > >> will comment on Barrys patch once the I send out the next revision of this. > > Sorry I did not make myself clearer. I did not mean that you should > > handle the large folio swapin here. This needs to be handled at a > > higher level because as you mentioned, a large folio may be partially > > in the zeromap, zswap, swapcache, disk, etc. > > > > What I meant is that we should probably have a debug check to make > > sure this doesn't go unhandled. For zswap, I am trying to add a > > warning and fail the swapin operation if a large folio slips through > > to zswap. We can do something similar here if folks agree this is the > > right way in the interim: > > https://lore.kernel.org/lkml/20240611024516.1375191-3-yosryahmed@google.com/. > > > > Maybe I am too paranoid, but I think it's easy to mess up these things > > when working on large folio swapin imo. > > So there is a difference between zswap and this optimization. In this > optimization, if the zeromap is set for all the folio bits, then we > should do large folio swapin. There still needs to be a change in Barrys > patch in alloc_swap_folio, but apart from that does the below diff over > v3 make it better? I will send a v4 with this if it sounds good. > > > diff --git a/mm/page_io.c b/mm/page_io.c > index 6400be6e4291..bf01364748a9 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -234,18 +234,24 @@ static void swap_zeromap_folio_clear(struct folio > *folio) > } > } > > -static bool swap_zeromap_folio_test(struct folio *folio) > +/* > + * Return the index of the first subpage which is not zero-filled > + * according to swap_info_struct->zeromap. > + * If all pages are zero-filled according to zeromap, it will return > + * folio_nr_pages(folio). > + */ > +static long swap_zeromap_folio_test(struct folio *folio) > { > struct swap_info_struct *sis = swp_swap_info(folio->swap); > swp_entry_t entry; > - unsigned int i; > + long i; Why long? > > for (i = 0; i < folio_nr_pages(folio); i++) { > entry = page_swap_entry(folio_page(folio, i)); > if (!test_bit(swp_offset(entry), sis->zeromap)) > - return false; > + return i; > } > - return true; > + return i; > } > > /* > @@ -581,6 +587,7 @@ void swap_read_folio(struct folio *folio, bool > synchronous, > { > struct swap_info_struct *sis = swp_swap_info(folio->swap); > bool workingset = folio_test_workingset(folio); > + long first_non_zero_page_idx; > unsigned long pflags; > bool in_thrashing; > > @@ -598,10 +605,19 @@ void swap_read_folio(struct folio *folio, bool > synchronous, > psi_memstall_enter(&pflags); > } > delayacct_swapin_start(); > - if (swap_zeromap_folio_test(folio)) { > + first_non_zero_page_idx = swap_zeromap_folio_test(folio); > + if (first_non_zero_page_idx == folio_nr_pages(folio)) { > folio_zero_fill(folio); > folio_mark_uptodate(folio); > folio_unlock(folio); > + } else if (first_non_zero_page_idx != 0) { > + /* > + * The case for when only *some* of subpages being > swapped-in were recorded > + * in sis->zeromap, while the rest are in zswap/disk is > currently not handled. > + * WARN in this case and return without marking the > folio uptodate so that > + * an IO error is emitted (e.g. do_swap_page() will sigbus). > + */ > + WARN_ON_ONCE(1); > } else if (zswap_load(folio)) { > folio_mark_uptodate(folio); > folio_unlock(folio); > > This is too much noise for swap_read_folio(). How about adding swap_read_folio_zeromap() that takes care of this and decides whether or not to call folio_mark_uptodate()? -static bool swap_zeromap_folio_test(struct folio *folio) +/* + * Return the index of the first subpage which is not zero-filled according to + * swap_info_struct->zeromap. If all pages are zero-filled according to + * zeromap, it will return folio_nr_pages(folio). + */ +static unsigned int swap_zeromap_folio_test(struct folio *folio) { struct swap_info_struct *sis = swp_swap_info(folio->swap); swp_entry_t entry; @@ -243,9 +248,9 @@ static bool swap_zeromap_folio_test(struct folio *folio) for (i = 0; i < folio_nr_pages(folio); i++) { entry = page_swap_entry(folio_page(folio, i)); if (!test_bit(swp_offset(entry), sis->zeromap)) - return false; + return i; } - return true; + return i; } /* @@ -511,6 +516,25 @@ static void sio_read_complete(struct kiocb *iocb, long ret) mempool_free(sio, sio_pool); } +static bool swap_read_folio_zeromap(struct folio *folio) +{ + unsigned int idx = swap_zeromap_folio_test(folio); + + if (idx == 0) + return false; + + /* + * Swapping in a large folio that is partially in the zeromap is not + * currently handled. Return true without marking the folio uptodate so + * that an IO error is emitted (e.g. do_swap_page() will sigbus). + */ + if (WARN_ON_ONCE(idx < folio_nr_pages(folio))) + return true; + + folio_zero_fill(folio); + folio_mark_uptodate(folio); + return true +} + static void swap_read_folio_fs(struct folio *folio, struct swap_iocb **plug) { struct swap_info_struct *sis = swp_swap_info(folio->swap); @@ -600,9 +624,7 @@ void swap_read_folio(struct folio *folio, bool synchronous, psi_memstall_enter(&pflags); } delayacct_swapin_start(); - if (swap_zeromap_folio_test(folio)) { - folio_zero_fill(folio); - folio_mark_uptodate(folio); + if (swap_read_folio_zeromap(folio)) { folio_unlock(folio); } else if (zswap_load(folio)) { folio_mark_uptodate(folio);