References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org> <20240730-swap-allocator-v5-6-cb9c148b9297@kernel.org>
From: Chris Li
Date: Sun, 4 Aug 2024 11:06:43 -0700
Subject: Re: [PATCH v5 6/9] mm: swap: allow cache reclaim to skip slot cache
To: Kairui Song
Cc: Barry Song, Andrew Morton, Hugh Dickins, Ryan Roberts, "Huang, Ying", Kalesh Singh, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"
On Sat, Aug 3, 2024 at 5:19 AM Kairui Song wrote:
>
> On Sat, Aug 3, 2024 at 6:39 PM Barry Song wrote:
> >
> > On Wed, Jul 31, 2024 at 2:49 PM wrote:
> > >
> > > From: Kairui Song
> > >
> > > Currently we free the reclaimed slots through slot cache even
> > > if the slot is required to be empty immediately.
> > > As a result
> > > the reclaim caller will see the slot still occupied even after a
> > > successful reclaim, and need to keep reclaiming until slot cache
> > > get flushed. This caused ineffective or over reclaim when SWAP is
> > > under stress.
> > >
> > > So introduce a new flag allowing the slot to be emptied bypassing
> > > the slot cache.
> > >
> > > Signed-off-by: Kairui Song
> > > ---
> > >  mm/swapfile.c | 152 +++++++++++++++++++++++++++++++++++++++++-----------
> > >  1 file changed, 109 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index 9b63b2262cc2..4c0fc0409d3c 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -53,8 +53,15 @@
> > >  static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
> > >                                  unsigned char);
> > >  static void free_swap_count_continuations(struct swap_info_struct *);
> > > +static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
> > > +                                 unsigned int nr_pages);
> > >  static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
> > >                              unsigned int nr_entries);
> > > +static bool folio_swapcache_freeable(struct folio *folio);
> > > +static struct swap_cluster_info *lock_cluster_or_swap_info(
> > > +       struct swap_info_struct *si, unsigned long offset);
> > > +static void unlock_cluster_or_swap_info(struct swap_info_struct *si,
> > > +                                       struct swap_cluster_info *ci);
> > >
> > >  static DEFINE_SPINLOCK(swap_lock);
> > >  static unsigned int nr_swapfiles;
> > > @@ -129,8 +136,25 @@ static inline unsigned char swap_count(unsigned char ent)
> > >   * corresponding page
> > >   */
> > >  #define TTRS_UNMAPPED          0x2
> > > -/* Reclaim the swap entry if swap is getting full*/
> > > +/* Reclaim the swap entry if swap is getting full */
> > >  #define TTRS_FULL              0x4
> > > +/* Reclaim directly, bypass the slot cache and don't touch device lock */
> > > +#define TTRS_DIRECT            0x8
> > > +
> > > +static bool swap_is_has_cache(struct swap_info_struct *si,
> > > +                             unsigned long offset, int nr_pages)
> > > +{
> > > +       unsigned char *map = si->swap_map + offset;
> > > +       unsigned char *map_end = map + nr_pages;
> > > +
> > > +       do {
> > > +               VM_BUG_ON(!(*map & SWAP_HAS_CACHE));
> > > +               if (*map != SWAP_HAS_CACHE)
> > > +                       return false;
> > > +       } while (++map < map_end);
> > > +
> > > +       return true;
> > > +}
> > >
> > >  /*
> > >   * returns number of pages in the folio that backs the swap entry. If positive,
> > > @@ -141,12 +165,22 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> > >                                  unsigned long offset, unsigned long flags)
> > >  {
> > >         swp_entry_t entry = swp_entry(si->type, offset);
> > > +       struct address_space *address_space = swap_address_space(entry);
> > > +       struct swap_cluster_info *ci;
> > >         struct folio *folio;
> > > -       int ret = 0;
> > > +       int ret, nr_pages;
> > > +       bool need_reclaim;
> > >
> > > -       folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
> > > +       folio = filemap_get_folio(address_space, swap_cache_index(entry));
> > >         if (IS_ERR(folio))
> > >                 return 0;
> > > +
> > > +       /* offset could point to the middle of a large folio */
> > > +       entry = folio->swap;
> > > +       offset = swp_offset(entry);
> > > +       nr_pages = folio_nr_pages(folio);
> > > +       ret = -nr_pages;
> > > +
> > >         /*
> > >          * When this function is called from scan_swap_map_slots() and it's
> > >          * called by vmscan.c at reclaiming folios. So we hold a folio lock
> > > @@ -154,14 +188,50 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> > >          * case and you should use folio_free_swap() with explicit folio_lock()
> > >          * in usual operations.
> > >          */
> > > -       if (folio_trylock(folio)) {
> > > -               if ((flags & TTRS_ANYWAY) ||
> > > -                   ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> > > -                   ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)))
> > > -                       ret = folio_free_swap(folio);
> > > -               folio_unlock(folio);
> > > +       if (!folio_trylock(folio))
> > > +               goto out;
> > > +
> > > +       need_reclaim = ((flags & TTRS_ANYWAY) ||
> > > +                       ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> > > +                       ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> > > +       if (!need_reclaim || !folio_swapcache_freeable(folio))
> > > +               goto out_unlock;
> > > +
> > > +       /*
> > > +        * It's safe to delete the folio from swap cache only if the folio's
> > > +        * swap_map is HAS_CACHE only, which means the slots have no page table
> > > +        * reference or pending writeback, and can't be allocated to others.
> > > +        */
> > > +       ci = lock_cluster_or_swap_info(si, offset);
> > > +       need_reclaim = swap_is_has_cache(si, offset, nr_pages);
> > > +       unlock_cluster_or_swap_info(si, ci);
> > > +       if (!need_reclaim)
> > > +               goto out_unlock;
> > > +
> > > +       if (!(flags & TTRS_DIRECT)) {
> > > +               /* Free through slot cache */
> > > +               delete_from_swap_cache(folio);
> > > +               folio_set_dirty(folio);
> > > +               ret = nr_pages;
> > > +               goto out_unlock;
> > >         }
> > > -       ret = ret ? folio_nr_pages(folio) : -folio_nr_pages(folio);
> > > +
> > > +       xa_lock_irq(&address_space->i_pages);
> > > +       __delete_from_swap_cache(folio, entry, NULL);
> > > +       xa_unlock_irq(&address_space->i_pages);
> > > +       folio_ref_sub(folio, nr_pages);
> > > +       folio_set_dirty(folio);
> > > +
> > > +       spin_lock(&si->lock);
> > > +       /* Only sinple page folio can be backed by zswap */
> > > +       if (!nr_pages)
> > > +               zswap_invalidate(entry);
> >
> > I am trying to figure out if I am mad :-) Does nr_pages == 0 means single
> > page folio?
> >
>
> Hi Barry
>
> I'm sorry, this should be nr_pages == 1, I messed up order and nr, as
> zswap only works for single page folios.
Ack. Should be nr_pages == 1. Barry, thanks for catching that.

Chris