From: Yosry Ahmed
Date: Tue, 11 Jun 2024 08:42:51 -0700
Subject: Re: [PATCH v3 1/2] mm: store zero pages to be swapped out in a bitmap
To: Usama Arif
Cc: 21cnbao@gmail.com, akpm@linux-foundation.org, hannes@cmpxchg.org, david@redhat.com, ying.huang@intel.com, hughd@google.com, willy@infradead.org, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Shakeel Butt
In-Reply-To: <9ddfe544-636d-4638-ae0e-053674e47322@gmail.com>
References: <20240610121820.328876-1-usamaarif642@gmail.com> <20240610121820.328876-2-usamaarif642@gmail.com> <9ddfe544-636d-4638-ae0e-053674e47322@gmail.com>
On Tue, Jun 11, 2024 at 4:49 AM Usama Arif wrote:
>
> @@ -515,8 +600,11 @@ void swap_read_folio(struct folio *folio, bool synchronous,
> >>>>                 psi_memstall_enter(&pflags);
> >>>>         }
> >>>>         delayacct_swapin_start();
> >>>> -
> >>>> -       if (zswap_load(folio)) {
> >>>> +       if (swap_zeromap_folio_test(folio)) {
> >>>> +               folio_zero_fill(folio);
> >>>> +               folio_mark_uptodate(folio);
> >>>> +               folio_unlock(folio);
> >>> We don't currently support swapping in large folios, but it is a work
> >>> in progress, and this will break once we have it.
> >>> swap_zeromap_folio_test() will return false even if parts of the folio
> >>> are in fact zero-filled. Then, we will go read those from disk swap,
> >>> essentially corrupting data.
> >>
> >> So yes, with this patch I tested swap out of the large zero folio, but when
> >> swapping in it was page by page. My assumption was that swap_read_folio
> >> (when support is added) would only be passed a large folio that was earlier
> >> swapped out as a large folio. So if a zero-filled large folio was
> >> swapped out, the zeromap will be set for all the pages in that folio,
> >> and at large folio swap in (when it is supported), it will see that all
> >> the bits in the zeromap for that folio are set, and will just
> >> folio_zero_fill.
> >>
> >> If even a single page in the large folio has non-zero data then the zeromap
> >> will not store it and it will go to either zswap or disk, and at read time,
> >> as not all the bits in the zeromap are set, it will still go to either zswap
> >> or disk, so I think this works?
> >>
> >> Is my assumption wrong that large folios can be swapped in only if
> >> they were swapped out as large? I think this code works in that case.
> >
> > I think the assumption is incorrect. I think we would just check if
> > contiguous PTEs have contiguous swap entries and swap in the folio as a
> > large folio in this case. It is likely that the swap entries are
> > contiguous because it was swapped out as a large folio, but I don't
> > think it's guaranteed.
>
> Yes, makes sense. Thanks for explaining this.
>
> > For example, here is a patch that implements large swapin support for
> > the synchronous swapin case, and I think it just checks that the PTEs
> > have contiguous swap entries:
> > https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/
> >
> > This makes a lot of sense because otherwise you'd have to keep track
> > of how the folios were composed at the time of swapout, to be able to
> > swap the same folios back in.
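[Editor's note: the contiguity check described in the quoted discussion above can be sketched in plain C. This is a hypothetical userspace model, not the kernel code; the linked patch operates on real PTEs and swp_entry_t values rather than bare offsets.]

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the idea above: nr consecutive PTEs may be swapped in as
 * one large folio only if their swap offsets are consecutive. This is
 * a userspace model with plain unsigned long offsets; the kernel
 * checks pte_t / swp_entry_t values instead.
 */
static bool swap_offsets_contiguous(const unsigned long *offsets, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (offsets[i] != offsets[0] + i)
			return false; /* hole or reordering: fall back to per-page swapin */
	return true;
}
```

As the discussion notes, contiguity is likely when the folio was swapped out as a large folio, but it is not guaranteed, which is why the check is needed at swapin time.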
>
> I think the solution to large folio swap-in for this optimization and
> zswap is not in swap_read_folio in this patch series or any call further
> down the stack, as it's too late by the time you reach here, but in
> Barry's patch. This needs to happen much earlier, when deciding the size
> of the folio at folio creation in alloc_swap_folio, after Barry checks:
>
>     if (is_zswap_enabled())
>         goto fallback;
>
> Once the order is decided, we would need to check whether the bits in the
> zeromap array starting from swap_offset(entry) and ending at
> swap_offset(entry) + 2^order are set. If no bits are set, or all bits
> are set, then we allocate a large folio. Otherwise, we goto fallback.
>
> I think it's better to handle this in Barry's patch. I feel this series is
> close to its final state, i.e. the only diff I have for the next
> revision is below, to remove start/end_writeback for the zero_filled case.
> I will comment on Barry's patch once I send out the next revision of this.

Sorry I did not make myself clearer. I did not mean that you should
handle the large folio swapin here. This needs to be handled at a higher
level because, as you mentioned, a large folio may be partially in the
zeromap, zswap, swapcache, disk, etc.

What I meant is that we should probably have a debug check to make sure
this doesn't go unhandled. For zswap, I am trying to add a warning and
fail the swapin operation if a large folio slips through to zswap. We can
do something similar here if folks agree this is the right way in the
interim:
https://lore.kernel.org/lkml/20240611024516.1375191-3-yosryahmed@google.com/

Maybe I am too paranoid, but I think it's easy to mess up these things
when working on large folio swapin.
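[Editor's note: the all-set / all-clear test Usama describes, over the zeromap bits for swap_offset(entry) through swap_offset(entry) + 2^order - 1, could look roughly like the sketch below. The function name and byte-array representation are hypothetical; the real code would use the kernel's bitmap and test_bit() helpers on the swap device's zeromap.]

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch: a folio of 2^order pages can be allocated as a large folio
 * only if the zeromap bits for its swap offsets are uniform: all set
 * (whole folio is zero-filled, just folio_zero_fill at swapin) or all
 * clear (whole folio lives in zswap/disk). A mix means goto fallback
 * and swap in page by page.
 */
static bool zeromap_range_uniform(const unsigned char *zeromap,
				  size_t offset, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	bool first = zeromap[offset] != 0;

	for (size_t i = 1; i < nr; i++)
		if ((zeromap[offset + i] != 0) != first)
			return false; /* mixed: some pages zero-filled, some not */
	return true;
}
```

This mirrors the rule stated above: allocate a large folio when no bits or all bits in the range are set, otherwise fall back.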