From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 21 Oct 2024 14:09:48 -0700
Subject: Re: [RFC 1/4] mm/zswap: skip swapcache for swapping in zswap pages
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	david@redhat.com, willy@infradead.org, kanchana.p.sridhar@intel.com,
	nphamcs@gmail.com, chengming.zhou@linux.dev, ryan.roberts@arm.com,
	ying.huang@intel.com, 21cnbao@gmail.com, riel@surriel.com,
	shakeel.butt@linux.dev, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Kairui Song, Kairui Song
In-Reply-To: <20241018105026.2521366-2-usamaarif642@gmail.com>
References: <20241018105026.2521366-1-usamaarif642@gmail.com> <20241018105026.2521366-2-usamaarif642@gmail.com>
On Fri, Oct 18, 2024 at 3:50 AM Usama Arif wrote:
>
> As mentioned in [1], there is a significant improvement in no
> readahead swapin performance for super fast devices when skipping
> swapcache.

FYI, Kairui was working on removing the swapcache bypass completely,
which I think may be a good thing:
https://lore.kernel.org/lkml/20240326185032.72159-1-ryncsn@gmail.com/

However, that series is old, since before the large folio swapin
support, so I am not sure if/when he intends to refresh it.
In his approach there is still a swapin path for synchronous swapin
though, which we can still utilize for zswap.

>
> With large folio zswapin support added in later patches, this will also
> mean this path will also act as "readahead" by swapping in multiple
> pages into large folios. further improving performance.
>
> [1] https://lore.kernel.org/all/1505886205-9671-5-git-send-email-minchan@kernel.org/T/#m5a792a04dfea20eb7af4c355d00503efe1c86a93
>
> Signed-off-by: Usama Arif
> ---
>  include/linux/zswap.h |  6 ++++++
>  mm/memory.c           |  3 ++-
>  mm/page_io.c          |  1 -
>  mm/zswap.c            | 46 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index d961ead91bf1..e418d75db738 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -27,6 +27,7 @@ struct zswap_lruvec_state {
>  unsigned long zswap_total_pages(void);
>  bool zswap_store(struct folio *folio);
>  bool zswap_load(struct folio *folio);
> +bool zswap_present_test(swp_entry_t swp, int nr_pages);
>  void zswap_invalidate(swp_entry_t swp);
>  int zswap_swapon(int type, unsigned long nr_pages);
>  void zswap_swapoff(int type);
> @@ -49,6 +50,11 @@ static inline bool zswap_load(struct folio *folio)
>  	return false;
>  }
>
> +static inline bool zswap_present_test(swp_entry_t swp, int nr_pages)
> +{
> +	return false;
> +}
> +
>  static inline void zswap_invalidate(swp_entry_t swp) {}
>  static inline int zswap_swapon(int type, unsigned long nr_pages)
>  {
> diff --git a/mm/memory.c b/mm/memory.c
> index 03e5452dd0c0..49d243131169 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4289,7 +4289,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		swapcache = folio;
>
>  	if (!folio) {
> -		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> +		if ((data_race(si->flags & SWP_SYNCHRONOUS_IO) ||
> +		     zswap_present_test(entry, 1)) &&
>  		    __swap_count(entry) == 1) {
>  			/* skip swapcache */
>  			folio = alloc_swap_folio(vmf);
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 4aa34862676f..2a15b197968a 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -602,7 +602,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
>  	unsigned long pflags;
>  	bool in_thrashing;
>
> -	VM_BUG_ON_FOLIO(!folio_test_swapcache(folio) && !synchronous, folio);
>  	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>  	VM_BUG_ON_FOLIO(folio_test_uptodate(folio), folio);
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 7f00cc918e7c..f4b03071b2fb 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1576,6 +1576,52 @@ bool zswap_store(struct folio *folio)
>  	return ret;
>  }
>
> +static bool swp_offset_in_zswap(unsigned int type, pgoff_t offset)
> +{
> +	return (offset >> SWAP_ADDRESS_SPACE_SHIFT) < nr_zswap_trees[type];

I am not sure I understand what we are looking for here. When does this
return false? Aren't the zswap trees always allocated during swapon?

> +}
> +
> +/* Returns true if the entire folio is in zswap */

There isn't really a folio at this point, maybe "Returns true if the
entire range is in zswap"?

Also, this is racy because an exclusive load, invalidation, or
writeback can cause an entry to be removed from zswap. Under what
conditions is this safe? The caller can probably guarantee we don't
race against invalidation, but can we guarantee that concurrent
exclusive loads or writebacks don't happen? If the answer is yes, this
needs to be properly documented.

> +bool zswap_present_test(swp_entry_t swp, int nr_pages)
> +{
> +	pgoff_t offset = swp_offset(swp), tree_max_idx;
> +	int max_idx = 0, i = 0, tree_offset = 0;
> +	unsigned int type = swp_type(swp);
> +	struct zswap_entry *entry = NULL;
> +	struct xarray *tree;
> +
> +	while (i < nr_pages) {
> +		tree_offset = offset + i;
> +		/* Check if the tree exists. */
> +		if (!swp_offset_in_zswap(type, tree_offset))
> +			return false;
> +
> +		tree = swap_zswap_tree(swp_entry(type, tree_offset));
> +		XA_STATE(xas, tree, tree_offset);

Please do not mix declarations with code.

> +
> +		tree_max_idx = tree_offset % SWAP_ADDRESS_SPACE_PAGES ?
> +			ALIGN(tree_offset, SWAP_ADDRESS_SPACE_PAGES) :
> +			ALIGN(tree_offset + 1, SWAP_ADDRESS_SPACE_PAGES);

Does this work if we always use ALIGN(tree_offset + 1,
SWAP_ADDRESS_SPACE_PAGES)?

> +		max_idx = min(offset + nr_pages, tree_max_idx) - 1;
> +		rcu_read_lock();
> +		xas_for_each(&xas, entry, max_idx) {
> +			if (xas_retry(&xas, entry))
> +				continue;
> +			i++;
> +		}
> +		rcu_read_unlock();
> +		/*
> +		 * If xas_for_each exits because entry is NULL and

nit: add () to the end of function names (i.e. xas_for_each())

> +		 * the number of entries checked are less then max idx,

s/then/than

> +		 * then zswap does not contain the entire folio.
> +		 */
> +		if (!entry && offset + i <= max_idx)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  bool zswap_load(struct folio *folio)
>  {
>  	swp_entry_t swp = folio->swap;
> --
> 2.43.5
>