From: Usama Arif <usamaarif642@gmail.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
shakeel.butt@linux.dev, david@redhat.com, ying.huang@intel.com,
hughd@google.com, willy@infradead.org, nphamcs@gmail.com,
chengming.zhou@linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
Date: Thu, 13 Jun 2024 12:37:44 +0100 [thread overview]
Message-ID: <85804484-9973-41a1-a05d-000833285f39@gmail.com> (raw)
In-Reply-To: <ZmoBf6RPJzC2RaqM@google.com>
On 12/06/2024 21:13, Yosry Ahmed wrote:
> On Wed, Jun 12, 2024 at 01:43:35PM +0100, Usama Arif wrote:
> [..]
>
> Hi Usama,
>
> A few more comments/questions, sorry for not looking closely earlier.
No worries, Thanks for the reviews!
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f1e559e216bd..48d8dca0b94b 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -453,6 +453,8 @@ static unsigned int cluster_list_del_first(struct swap_cluster_list *list,
>> static void swap_cluster_schedule_discard(struct swap_info_struct *si,
>> unsigned int idx)
>> {
>> + unsigned int i;
>> +
>> /*
>> * If scan_swap_map_slots() can't find a free cluster, it will check
>> * si->swap_map directly. To make sure the discarding cluster isn't
>> @@ -461,6 +463,13 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si,
>> */
>> memset(si->swap_map + idx * SWAPFILE_CLUSTER,
>> SWAP_MAP_BAD, SWAPFILE_CLUSTER);
>> + /*
>> + * zeromap can see updates from concurrent swap_writepage() and swap_read_folio()
>> + * call on other slots, hence use atomic clear_bit for zeromap instead of the
>> + * non-atomic bitmap_clear.
>> + */
> I don't think this is accurate. swap_read_folio() does not update the
> zeromap. I think the need for an atomic operation here is because we may
> be updating adjacent bits simulatenously, so we may cause lost updates
> otherwise (i.e. corrupting adjacent bits).
Thanks, will change to "Use atomic clear_bit instead of non-atomic
bitmap_clear to prevent adjacent bits corruption due to simultaneous
writes." in the next revision
>
>> + for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> + clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
> Could you explain why we need to clear the zeromap here?
>
> swap_cluster_schedule_discard() is called from:
> - swap_free_cluster() -> free_cluster()
>
> This is already covered below.
>
> - swap_entry_free() -> dec_cluster_info_page() -> free_cluster()
>
> Each entry in the cluster should have its zeromap bit cleared in
> swap_entry_free() before the entire cluster is free and we call
> free_cluster().
>
> Am I missing something?
Yes, it looks like this one is not needed as swap_entry_free and
swap_free_cluster would already have cleared the bit. Will remove it.
I had initially started checking what codepaths zeromap would need to be
cleared. But then thought I could do it wherever si->swap_map is cleared
or set to SWAP_MAP_BAD, which is why I added it here.
>>
>> cluster_list_add_tail(&si->discard_clusters, si->cluster_info, idx);
>>
>> @@ -482,7 +491,7 @@ static void __free_cluster(struct swap_info_struct *si, unsigned long idx)
>> static void swap_do_scheduled_discard(struct swap_info_struct *si)
>> {
>> struct swap_cluster_info *info, *ci;
>> - unsigned int idx;
>> + unsigned int idx, i;
>>
>> info = si->cluster_info;
>>
>> @@ -498,6 +507,8 @@ static void swap_do_scheduled_discard(struct swap_info_struct *si)
>> __free_cluster(si, idx);
>> memset(si->swap_map + idx * SWAPFILE_CLUSTER,
>> 0, SWAPFILE_CLUSTER);
>> + for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> + clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
> Same here. I didn't look into the specific code paths, but shouldn't the
> cluster be unused (and hence its zeromap bits already cleared?).
>
I think this one is needed (or atleast very good to have). There are 2
paths:
1) swap_cluster_schedule_discard (clears zeromap) -> swap_discard_work
-> swap_do_scheduled_discard (clears zeromap)
Path 1 doesnt need it as swap_cluster_schedule_discard already clears it.
2) scan_swap_map_slots -> scan_swap_map_try_ssd_cluster ->
swap_do_scheduled_discard (clears zeromap)
Path 2 might need it as zeromap isnt cleared earlier I believe
(eventhough I think it might already be 0).
Even if its cleared in path 2, I think its good to keep this one, as the
function is swap_do_scheduled_discard, i.e. incase it gets directly
called or si->discard_work gets scheduled anywhere else in the future,
it should do as the function name suggests, i.e. swap discard(clear
zeromap).
>> unlock_cluster(ci);
>> }
>> }
>> @@ -1059,9 +1070,12 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>> {
>> unsigned long offset = idx * SWAPFILE_CLUSTER;
>> struct swap_cluster_info *ci;
>> + unsigned int i;
>>
>> ci = lock_cluster(si, offset);
>> memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
>> + for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> + clear_bit(offset + i, si->zeromap);
>> cluster_set_count_flag(ci, 0, 0);
>> free_cluster(si, idx);
>> unlock_cluster(ci);
>> @@ -1336,6 +1350,7 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
>> count = p->swap_map[offset];
>> VM_BUG_ON(count != SWAP_HAS_CACHE);
>> p->swap_map[offset] = 0;
>> + clear_bit(offset, p->zeromap);
> I think instead of clearing the zeromap in swap_free_cluster() and here
> separately, we can just do it in swap_range_free(). I suspect this may
> be the only place we really need to clear the zero in the swapfile code.
Sure, we could move it to swap_range_free, but then also move the
clearing of swap_map.
When it comes to clearing zeromap, I think its just generally a good
idea to clear it wherever swap_map is cleared.
So the diff over v4 looks like below (should address all comments but
not remove it from swap_do_scheduled_discard, and
move si->swap_map/zeromap clearing from
swap_free_cluster/swap_entry_free to swap_range_free):
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 48d8dca0b94b..39cad0d09525 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -463,13 +463,6 @@ static void swap_cluster_schedule_discard(struct
swap_info_struct *si,
*/
memset(si->swap_map + idx * SWAPFILE_CLUSTER,
SWAP_MAP_BAD, SWAPFILE_CLUSTER);
- /*
- * zeromap can see updates from concurrent swap_writepage() and
swap_read_folio()
- * call on other slots, hence use atomic clear_bit for zeromap
instead of the
- * non-atomic bitmap_clear.
- */
- for (i = 0; i < SWAPFILE_CLUSTER; i++)
- clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
cluster_list_add_tail(&si->discard_clusters, si->cluster_info,
idx);
@@ -758,6 +751,15 @@ static void swap_range_free(struct swap_info_struct
*si, unsigned long offset,
unsigned long begin = offset;
unsigned long end = offset + nr_entries - 1;
void (*swap_slot_free_notify)(struct block_device *, unsigned
long);
+ unsigned int i;
+
+ memset(si->swap_map + offset, 0, nr_entries);
+ /*
+ * Use atomic clear_bit operations only on zeromap instead of
non-atomic
+ * bitmap_clear to prevent adjacent bits corruption due to
simultaneous writes.
+ */
+ for (i = 0; i < nr_entries; i++)
+ clear_bit(offset + i, si->zeromap);
if (offset < si->lowest_bit)
si->lowest_bit = offset;
@@ -1070,12 +1072,8 @@ static void swap_free_cluster(struct
swap_info_struct *si, unsigned long idx)
{
unsigned long offset = idx * SWAPFILE_CLUSTER;
struct swap_cluster_info *ci;
- unsigned int i;
ci = lock_cluster(si, offset);
- memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
- for (i = 0; i < SWAPFILE_CLUSTER; i++)
- clear_bit(offset + i, si->zeromap);
cluster_set_count_flag(ci, 0, 0);
free_cluster(si, idx);
unlock_cluster(ci);
@@ -1349,8 +1347,6 @@ static void swap_entry_free(struct
swap_info_struct *p, swp_entry_t entry)
ci = lock_cluster(p, offset);
count = p->swap_map[offset];
VM_BUG_ON(count != SWAP_HAS_CACHE);
- p->swap_map[offset] = 0;
- clear_bit(offset, p->zeromap);
dec_cluster_info_page(p, p->cluster_info, offset);
unlock_cluster(ci);
next prev parent reply other threads:[~2024-06-13 11:37 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-12 12:43 [PATCH v4 0/2] " Usama Arif
2024-06-12 12:43 ` [PATCH v4 1/2] " Usama Arif
2024-06-12 20:13 ` Yosry Ahmed
2024-06-13 11:37 ` Usama Arif [this message]
2024-06-13 16:38 ` Yosry Ahmed
2024-06-13 19:21 ` Usama Arif
2024-06-13 19:26 ` Yosry Ahmed
2024-06-13 19:38 ` Usama Arif
2024-09-04 5:55 ` Barry Song
2024-09-04 7:12 ` Yosry Ahmed
2024-09-04 7:17 ` Barry Song
2024-09-04 7:22 ` Yosry Ahmed
2024-09-04 7:54 ` Barry Song
2024-09-04 17:40 ` Yosry Ahmed
2024-09-05 7:03 ` Barry Song
2024-09-05 7:55 ` Yosry Ahmed
2024-09-05 8:49 ` Barry Song
2024-09-05 10:10 ` Barry Song
2024-09-05 10:33 ` Barry Song
2024-09-05 10:53 ` Usama Arif
2024-09-05 11:00 ` Barry Song
2024-09-05 19:19 ` Usama Arif
2024-09-05 17:36 ` Yosry Ahmed
2024-09-05 19:28 ` Yosry Ahmed
2024-09-06 10:22 ` Barry Song
2024-09-05 10:37 ` Usama Arif
2024-09-05 10:42 ` Barry Song
2024-09-05 10:50 ` Usama Arif
2024-09-04 11:14 ` Usama Arif
2024-09-04 23:44 ` Barry Song
2024-09-04 23:47 ` Barry Song
2024-09-04 23:57 ` Yosry Ahmed
2024-09-05 0:29 ` Barry Song
2024-09-05 7:38 ` Yosry Ahmed
2024-06-12 12:43 ` [PATCH v4 2/2] mm: remove code to handle same filled pages Usama Arif
2024-06-12 15:09 ` Nhat Pham
2024-06-12 16:34 ` Usama Arif
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=85804484-9973-41a1-a05d-000833285f39@gmail.com \
--to=usamaarif642@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=chengming.zhou@linux.dev \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox