From: Zhaoyu Liu <liuzhaoyu.zackary@bytedance.com>
Date: Mon, 1 Apr 2024 10:09:08 -0400
Subject: [PATCH] mm: swap: pre-check swap_has_cache to avoid page allocation
To: akpm@linux-foundation.org, ying.huang@intel.com, songmuchun@bytedance.com
Cc: david@redhat.com, willy@infradead.org, chrisl@kernel.org,
    nphamcs@gmail.com, kasong@tencent.com, yosryahmed@google.com,
    guo.ziliang@zte.com.cn, linux-kernel@vger.kernel.org, linux-mm@kvack.org

Test setup: qemu arm64 running the latest kernel, with 100M of memory
and a 1024M swapfile. Create a 1G shared anonymous mmap and have two
processes randomly access the shared memory (see the sketch below).
When they race on the swap cache, each wasted "alloc_pages_mpol +
swapcache_prepare + folio_put" round trip takes about 1475 us on
average.
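
For reference, a minimal reproducer sketch for the workload described
above (hypothetical: the sizes, seeds and iteration count are
illustrative, not the exact test program used for the measurement):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAP_SIZE (1UL << 30)	/* 1G shared anonymous mapping */
#define PAGE_SZ  4096UL

int main(void)
{
	char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	pid_t pid;
	long i;

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Two processes touch random pages of the same shared
	 * mapping, so under memory pressure (100M RAM, 1024M swap)
	 * they contend on the swap cache.
	 */
	pid = fork();
	srand(pid ? 1 : 2);

	for (i = 0; i < 100000000L; i++) {
		size_t off = ((size_t)rand() * PAGE_SZ) % MAP_SIZE;
		p[off]++;	/* fault the page, possibly from swap */
	}

	if (pid > 0)
		wait(NULL);
	return 0;
}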

So skip the page allocation when SWAP_HAS_CACHE is already set:
just call schedule_timeout_uninterruptible() and keep trying to
acquire the page from the swap cache via filemap_get_folio(),
which speeds up __read_swap_cache_async().

Signed-off-by: Zhaoyu Liu <liuzhaoyu.zackary@bytedance.com>
---
 include/linux/swap.h |  6 ++++++
 mm/swap_state.c      | 10 ++++++++++
 mm/swapfile.c        | 15 +++++++++++++++
3 files changed, 31 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a211a0383425..8a0013299f38 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -480,6 +480,7 @@ extern sector_t swapdev_block(int, pgoff_t);
 extern int __swap_count(swp_entry_t entry);
 extern int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
+extern bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry);
 struct swap_info_struct *swp_swap_info(swp_entry_t entry);
 struct backing_dev_info;
 extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
@@ -570,6 +571,11 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
+static inline bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry)
+{
+	return false;
+}
+
 static inline swp_entry_t folio_alloc_swap(struct folio *folio)
 {
 	swp_entry_t entry;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bfc7e8c58a6d..f130cfc669ce 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -462,6 +462,15 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	if (!swap_swapcount(si, entry) && swap_slot_cache_enabled)
 		goto fail_put_swap;
 
+	/*
+	 * Skip the page allocation if SWAP_HAS_CACHE is already set;
+	 * just schedule_timeout_uninterruptible() and keep trying to
+	 * acquire the page from the swap cache via filemap_get_folio(),
+	 * to speed up __read_swap_cache_async().
+	 */
+	if (swap_has_cache(si, entry))
+		goto skip_alloc;
+
 	/*
 	 * Get a new folio to read into from swap. Allocate it now,
 	 * before marking swap_map SWAP_HAS_CACHE, when -EEXIST will
@@ -483,6 +492,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	if (err != -EEXIST)
 		goto fail_put_swap;
 
+skip_alloc:
 	/*
 	 * Protect against a recursive call to __read_swap_cache_async()
 	 * on the same entry waiting forever here because SWAP_HAS_CACHE
diff --git a/mm/swapfile.c b/mm/swapfile.c
index cf900794f5ed..5388950c4ca6 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1513,6 +1513,21 @@ int swp_swapcount(swp_entry_t entry)
 	return count;
 }
 
+/*
+ * Check whether the swap entry is already tagged with SWAP_HAS_CACHE.
+ */
+bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry)
+{
+	pgoff_t offset = swp_offset(entry);
+	struct swap_cluster_info *ci;
+	bool has_cache;
+
+	ci = lock_cluster_or_swap_info(si, offset);
+	has_cache = !!(si->swap_map[offset] & SWAP_HAS_CACHE);
+	unlock_cluster_or_swap_info(si, ci);
+	return has_cache;
+}
+
 static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
 					 swp_entry_t entry)
 {
--
2.25.1
