From: Kairui Song
Date: Sat, 3 Aug 2024 20:18:57 +0800
Subject: Re: [PATCH v5 6/9] mm: swap: allow cache reclaim to skip slot cache
To: Barry Song
Cc: chrisl@kernel.org, Andrew Morton, Hugh Dickins, Ryan Roberts, "Huang, Ying", Kalesh Singh, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org> <20240730-swap-allocator-v5-6-cb9c148b9297@kernel.org>

On Sat, Aug 3, 2024 at 6:39 PM Barry Song wrote:
>
> On Wed, Jul 31, 2024 at 2:49 PM wrote:
> >
> > From: Kairui Song
> >
> > Currently we free the reclaimed slots through slot cache even
> > if the slot is required to be empty immediately. As a result
> > the reclaim caller will see the slot still occupied even after a
> > successful reclaim, and need to keep reclaiming until slot cache
> > get flushed. This caused ineffective or over reclaim when SWAP is
> > under stress.
> >
> > So introduce a new flag allowing the slot to be emptied bypassing
> > the slot cache.
> >
> > Signed-off-by: Kairui Song
> > ---
> >  mm/swapfile.c | 152 +++++++++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 109 insertions(+), 43 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 9b63b2262cc2..4c0fc0409d3c 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -53,8 +53,15 @@
> >  static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
> >                                   unsigned char);
> >  static void free_swap_count_continuations(struct swap_info_struct *);
> > +static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
> > +                                  unsigned int nr_pages);
> >  static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
> >                               unsigned int nr_entries);
> > +static bool folio_swapcache_freeable(struct folio *folio);
> > +static struct swap_cluster_info *lock_cluster_or_swap_info(
> > +        struct swap_info_struct *si, unsigned long offset);
> > +static void unlock_cluster_or_swap_info(struct swap_info_struct *si,
> > +                                        struct swap_cluster_info *ci);
> >
> >  static DEFINE_SPINLOCK(swap_lock);
> >  static unsigned int nr_swapfiles;
> > @@ -129,8 +136,25 @@ static inline unsigned char swap_count(unsigned char ent)
> >   * corresponding page
> >   */
> >  #define TTRS_UNMAPPED          0x2
> > -/* Reclaim the swap entry if swap is getting full*/
> > +/* Reclaim the swap entry if swap is getting full */
> >  #define TTRS_FULL              0x4
> > +/* Reclaim directly, bypass the slot cache and don't touch device lock */
> > +#define TTRS_DIRECT            0x8
> > +
> > +static bool swap_is_has_cache(struct swap_info_struct *si,
> > +                              unsigned long offset, int nr_pages)
> > +{
> > +        unsigned char *map = si->swap_map + offset;
> > +        unsigned char *map_end = map + nr_pages;
> > +
> > +        do {
> > +                VM_BUG_ON(!(*map & SWAP_HAS_CACHE));
> > +                if (*map != SWAP_HAS_CACHE)
> > +                        return false;
> > +        } while (++map < map_end);
> > +
> > +        return true;
> > +}
> >
> >  /*
> >   * returns number of pages in the folio that backs the swap entry. If positive,
> > @@ -141,12 +165,22 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> >                                   unsigned long offset, unsigned long flags)
> >  {
> >          swp_entry_t entry = swp_entry(si->type, offset);
> > +        struct address_space *address_space = swap_address_space(entry);
> > +        struct swap_cluster_info *ci;
> >          struct folio *folio;
> > -        int ret = 0;
> > +        int ret, nr_pages;
> > +        bool need_reclaim;
> >
> > -        folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
> > +        folio = filemap_get_folio(address_space, swap_cache_index(entry));
> >          if (IS_ERR(folio))
> >                  return 0;
> > +
> > +        /* offset could point to the middle of a large folio */
> > +        entry = folio->swap;
> > +        offset = swp_offset(entry);
> > +        nr_pages = folio_nr_pages(folio);
> > +        ret = -nr_pages;
> > +
> >          /*
> >           * When this function is called from scan_swap_map_slots() and it's
> >           * called by vmscan.c at reclaiming folios. So we hold a folio lock
> > @@ -154,14 +188,50 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> >           * case and you should use folio_free_swap() with explicit folio_lock()
> >           * in usual operations.
> >           */
> > -        if (folio_trylock(folio)) {
> > -                if ((flags & TTRS_ANYWAY) ||
> > -                    ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> > -                    ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)))
> > -                        ret = folio_free_swap(folio);
> > -                folio_unlock(folio);
> > +        if (!folio_trylock(folio))
> > +                goto out;
> > +
> > +        need_reclaim = ((flags & TTRS_ANYWAY) ||
> > +                        ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> > +                        ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> > +        if (!need_reclaim || !folio_swapcache_freeable(folio))
> > +                goto out_unlock;
> > +
> > +        /*
> > +         * It's safe to delete the folio from swap cache only if the folio's
> > +         * swap_map is HAS_CACHE only, which means the slots have no page table
> > +         * reference or pending writeback, and can't be allocated to others.
> > +         */
> > +        ci = lock_cluster_or_swap_info(si, offset);
> > +        need_reclaim = swap_is_has_cache(si, offset, nr_pages);
> > +        unlock_cluster_or_swap_info(si, ci);
> > +        if (!need_reclaim)
> > +                goto out_unlock;
> > +
> > +        if (!(flags & TTRS_DIRECT)) {
> > +                /* Free through slot cache */
> > +                delete_from_swap_cache(folio);
> > +                folio_set_dirty(folio);
> > +                ret = nr_pages;
> > +                goto out_unlock;
> >          }
> > -        ret = ret ? folio_nr_pages(folio) : -folio_nr_pages(folio);
> > +
> > +        xa_lock_irq(&address_space->i_pages);
> > +        __delete_from_swap_cache(folio, entry, NULL);
> > +        xa_unlock_irq(&address_space->i_pages);
> > +        folio_ref_sub(folio, nr_pages);
> > +        folio_set_dirty(folio);
> > +
> > +        spin_lock(&si->lock);
> > +        /* Only sinple page folio can be backed by zswap */
> > +        if (!nr_pages)
> > +                zswap_invalidate(entry);
>
> I am trying to figure out if I am mad :-) Does nr_pages == 0 mean single
> page folio?
>

Hi Barry,

I'm sorry, this should be nr_pages == 1; I mixed up the folio order and
nr_pages, as zswap only works for single page folios.
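
To make the mixup concrete, here is a tiny stand-alone sketch (plain
userspace C, not the kernel code; the helper names are invented just for
illustration). A folio always has nr_pages >= 1, and order 0 corresponds
to nr_pages == 1, so the !nr_pages test above can never fire; once the
variable holds folio_nr_pages() rather than the order, the condition has
to become nr_pages == 1:

#include <stdbool.h>
#include <stdio.h>

/* invented helper: reverse of nr_pages = 1 << order */
static int order_of(int nr_pages)
{
        int order = 0;

        while ((1 << order) < nr_pages)
                order++;
        return order;
}

/* check as written in the hunk above: never true, since nr_pages >= 1 */
static bool invalidate_zswap_buggy(int nr_pages)
{
        return !nr_pages;
}

/* corrected check: only single page folios can be backed by zswap */
static bool invalidate_zswap_fixed(int nr_pages)
{
        return nr_pages == 1;
}

int main(void)
{
        for (int nr = 1; nr <= 8; nr <<= 1)
                printf("nr_pages=%d order=%d buggy=%d fixed=%d\n",
                       nr, order_of(nr),
                       invalidate_zswap_buggy(nr),
                       invalidate_zswap_fixed(nr));
        return 0;
}

Running this prints buggy=0 for every folio size and fixed=1 only for the
nr_pages=1 case, which is the behavior the hunk above intends.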