[PATCH 0/3] mm: zswap: trivial folio conversions

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] mm: zswap: trivial folio conversions
@ 2024-05-24  3:38 Yosry Ahmed
  2024-05-24  3:38 ` [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}() Yosry Ahmed
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-24  3:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Matthew Wilcox,
	linux-mm, linux-kernel, Yosry Ahmed

Some trivial folio conversions in zswap code.

The mean reason I included a cover letter is that I wanted to get
feedback on what other trivial conversions can/should be done in
mm/zswap.c (keeping in mind that only order-0 folios are supported
anyway).  These are the things I came across while searching for 'page'
in mm/zswap.c, and chose not to do anything about for now:

1. zswap_max_pages(), zswap_accept_thr_pages(), zswap_total_pages():
  - We can use 'size' instead of 'pages' and shift the return by
    PAGE_SHIFT. This adds an unnecessary shift, but I doubt it matters
    at all. The motivation is to get rid of 'page' to find things that
    should be converted more easily.

2. Counters names: zswap_stored_pages, zswap_written_back_pages, etc.

3. Comments all over the place reference 'page' instead of 'folio'.

4. shrink_memcg_cb(), zswap_shrinker_scan():
  - Rename encountered_page_in_swap_cache to
    encounterd_folio_in_swap_cache, or even better: folio_eexist or
    hit_swap_cache.

5. entry_to_nid():
  - It's tempting to try to use folio_to_nid(virt_to_folio()), but I
    think this adds an unnecessary call to compound_head(). It may not
    matter in practice though because the page is always a head page.

Yosry Ahmed (3):
  mm: zswap: use sg_set_folio() in zswap_{compress/decompress}()
  mm :zswap: use kmap_local_folio() in zswap_load()
  mm: zswap: make same_filled functions folio-friendly

 mm/zswap.c | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

-- 
2.45.1.288.g0e0cd299f1-goog



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}()
  2024-05-24  3:38 [PATCH 0/3] mm: zswap: trivial folio conversions Yosry Ahmed
@ 2024-05-24  3:38 ` Yosry Ahmed
  2024-06-03  6:03   ` Chengming Zhou
  2024-05-24  3:38 ` [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load() Yosry Ahmed
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-24  3:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Matthew Wilcox,
	linux-mm, linux-kernel, Yosry Ahmed

sg_set_folio() is equivalent to sg_set_page() for order-0 folios, which
are the only ones supported by zswap. Now zswap_decompress() can take in
a folio directly.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/zswap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index a50e2986cd2fa..3693df96c81fe 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -917,7 +917,7 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
 
 	dst = acomp_ctx->buffer;
 	sg_init_table(&input, 1);
-	sg_set_page(&input, &folio->page, PAGE_SIZE, 0);
+	sg_set_folio(&input, folio, PAGE_SIZE, 0);
 
 	/*
 	 * We need PAGE_SIZE * 2 here since there maybe over-compression case,
@@ -971,7 +971,7 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
-static void zswap_decompress(struct zswap_entry *entry, struct page *page)
+static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 {
 	struct zpool *zpool = zswap_find_zpool(entry);
 	struct scatterlist input, output;
@@ -1000,7 +1000,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct page *page)
 
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
-	sg_set_page(&output, page, PAGE_SIZE, 0);
+	sg_set_folio(&output, folio, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
 	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
 	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
@@ -1073,7 +1073,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 		return -ENOMEM;
 	}
 
-	zswap_decompress(entry, &folio->page);
+	zswap_decompress(entry, folio);
 
 	count_vm_event(ZSWPWB);
 	if (entry->objcg)
@@ -1580,7 +1580,7 @@ bool zswap_load(struct folio *folio)
 		return false;
 
 	if (entry->length)
-		zswap_decompress(entry, page);
+		zswap_decompress(entry, folio);
 	else {
 		dst = kmap_local_page(page);
 		zswap_fill_page(dst, entry->value);
-- 
2.45.1.288.g0e0cd299f1-goog



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load()
  2024-05-24  3:38 [PATCH 0/3] mm: zswap: trivial folio conversions Yosry Ahmed
  2024-05-24  3:38 ` [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}() Yosry Ahmed
@ 2024-05-24  3:38 ` Yosry Ahmed
  2024-05-28 15:16   ` Nhat Pham
  2024-06-03  6:04   ` Chengming Zhou
  2024-05-24  3:38 ` [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly Yosry Ahmed
  2024-05-24  3:59 ` [PATCH 0/3] mm: zswap: trivial folio conversions Matthew Wilcox
  3 siblings, 2 replies; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-24  3:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Matthew Wilcox,
	linux-mm, linux-kernel, Yosry Ahmed

Eliminate the last explicit 'struct page' reference in mm/zswap.c.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/zswap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 3693df96c81fe..bac66991fb14e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1551,7 +1551,6 @@ bool zswap_load(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
 	pgoff_t offset = swp_offset(swp);
-	struct page *page = &folio->page;
 	bool swapcache = folio_test_swapcache(folio);
 	struct xarray *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
@@ -1582,7 +1581,7 @@ bool zswap_load(struct folio *folio)
 	if (entry->length)
 		zswap_decompress(entry, folio);
 	else {
-		dst = kmap_local_page(page);
+		dst = kmap_local_folio(folio, 0);
 		zswap_fill_page(dst, entry->value);
 		kunmap_local(dst);
 	}
-- 
2.45.1.288.g0e0cd299f1-goog



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly
  2024-05-24  3:38 [PATCH 0/3] mm: zswap: trivial folio conversions Yosry Ahmed
  2024-05-24  3:38 ` [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}() Yosry Ahmed
  2024-05-24  3:38 ` [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load() Yosry Ahmed
@ 2024-05-24  3:38 ` Yosry Ahmed
  2024-05-28 15:18   ` Nhat Pham
  2024-06-03  6:07   ` Chengming Zhou
  2024-05-24  3:59 ` [PATCH 0/3] mm: zswap: trivial folio conversions Matthew Wilcox
  3 siblings, 2 replies; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-24  3:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Matthew Wilcox,
	linux-mm, linux-kernel, Yosry Ahmed

A variable name 'page' is used in zswap_is_folio_same_filled() and
zswap_fill_page() to point at the kmapped data in a folio. Use 'data'
instead to avoid confusion and stop it from showing up when searching
for 'page' references in mm/zswap.c.

While we are at it, move the kmap/kunmap calls into zswap_fill_page(),
make it take in a folio, and rename it to zswap_fill_folio().

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/zswap.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index bac66991fb14e..b9b35ef86d9be 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1375,35 +1375,35 @@ static void shrink_worker(struct work_struct *w)
 **********************************/
 static bool zswap_is_folio_same_filled(struct folio *folio, unsigned long *value)
 {
-	unsigned long *page;
+	unsigned long *data;
 	unsigned long val;
-	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1;
+	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*data) - 1;
 	bool ret = false;
 
-	page = kmap_local_folio(folio, 0);
-	val = page[0];
+	data = kmap_local_folio(folio, 0);
+	val = data[0];
 
-	if (val != page[last_pos])
+	if (val != data[last_pos])
 		goto out;
 
 	for (pos = 1; pos < last_pos; pos++) {
-		if (val != page[pos])
+		if (val != data[pos])
 			goto out;
 	}
 
 	*value = val;
 	ret = true;
 out:
-	kunmap_local(page);
+	kunmap_local(data);
 	return ret;
 }
 
-static void zswap_fill_page(void *ptr, unsigned long value)
+static void zswap_fill_folio(struct folio *folio, unsigned long value)
 {
-	unsigned long *page;
+	unsigned long *data = kmap_local_folio(folio, 0);
 
-	page = (unsigned long *)ptr;
-	memset_l(page, value, PAGE_SIZE / sizeof(unsigned long));
+	memset_l(data, value, PAGE_SIZE / sizeof(unsigned long));
+	kunmap_local(data);
 }
 
 /*********************************
@@ -1554,7 +1554,6 @@ bool zswap_load(struct folio *folio)
 	bool swapcache = folio_test_swapcache(folio);
 	struct xarray *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
-	u8 *dst;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 
@@ -1580,11 +1579,8 @@ bool zswap_load(struct folio *folio)
 
 	if (entry->length)
 		zswap_decompress(entry, folio);
-	else {
-		dst = kmap_local_folio(folio, 0);
-		zswap_fill_page(dst, entry->value);
-		kunmap_local(dst);
-	}
+	else
+		zswap_fill_folio(folio, entry->value);
 
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
-- 
2.45.1.288.g0e0cd299f1-goog



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-24  3:38 [PATCH 0/3] mm: zswap: trivial folio conversions Yosry Ahmed
                   ` (2 preceding siblings ...)
  2024-05-24  3:38 ` [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly Yosry Ahmed
@ 2024-05-24  3:59 ` Matthew Wilcox
  2024-05-24 19:53   ` Yosry Ahmed
  3 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2024-05-24  3:59 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou,
	linux-mm, linux-kernel

On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> Some trivial folio conversions in zswap code.

The three patches themselves look good.

> The mean reason I included a cover letter is that I wanted to get
> feedback on what other trivial conversions can/should be done in
> mm/zswap.c (keeping in mind that only order-0 folios are supported
> anyway).  These are the things I came across while searching for 'page'
> in mm/zswap.c, and chose not to do anything about for now:

I think there's a deeper question to answer before answering these
questions, which is what we intend to do with large folios and zswap in
the future.  Do we intend to split them?  Compress them as a large
folio?  Compress each page in a large folio separately?  I can see an
argument for choices 2 and 3, but I think choice 1 is going to be
increasingly untenable.

> 1. zswap_max_pages(), zswap_accept_thr_pages(), zswap_total_pages():
>   - We can use 'size' instead of 'pages' and shift the return by
>     PAGE_SHIFT. This adds an unnecessary shift, but I doubt it matters
>     at all. The motivation is to get rid of 'page' to find things that
>     should be converted more easily.
> 
> 2. Counters names: zswap_stored_pages, zswap_written_back_pages, etc.
> 
> 3. Comments all over the place reference 'page' instead of 'folio'.
> 
> 4. shrink_memcg_cb(), zswap_shrinker_scan():
>   - Rename encountered_page_in_swap_cache to
>     encounterd_folio_in_swap_cache, or even better: folio_eexist or
>     hit_swap_cache.
> 
> 5. entry_to_nid():
>   - It's tempting to try to use folio_to_nid(virt_to_folio()), but I
>     think this adds an unnecessary call to compound_head(). It may not
>     matter in practice though because the page is always a head page.
> 
> Yosry Ahmed (3):
>   mm: zswap: use sg_set_folio() in zswap_{compress/decompress}()
>   mm :zswap: use kmap_local_folio() in zswap_load()
>   mm: zswap: make same_filled functions folio-friendly
> 
>  mm/zswap.c | 41 ++++++++++++++++++-----------------------
>  1 file changed, 18 insertions(+), 23 deletions(-)
> 
> -- 
> 2.45.1.288.g0e0cd299f1-goog
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-24  3:59 ` [PATCH 0/3] mm: zswap: trivial folio conversions Matthew Wilcox
@ 2024-05-24 19:53   ` Yosry Ahmed
  2024-05-24 23:12     ` Yosry Ahmed
  0 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-24 19:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou,
	linux-mm, linux-kernel

On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > Some trivial folio conversions in zswap code.
>
> The three patches themselves look good.
>
> > The mean reason I included a cover letter is that I wanted to get
> > feedback on what other trivial conversions can/should be done in
> > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > anyway).  These are the things I came across while searching for 'page'
> > in mm/zswap.c, and chose not to do anything about for now:
>
> I think there's a deeper question to answer before answering these
> questions, which is what we intend to do with large folios and zswap in
> the future.  Do we intend to split them?  Compress them as a large
> folio?  Compress each page in a large folio separately?  I can see an
> argument for choices 2 and 3, but I think choice 1 is going to be
> increasingly untenable.

Yeah I was kinda getting the small things out of the way so that zswap
is fully folio-ized, before we think about large folios. I haven't
given it a lot of thought, but here's what I have in mind.

Right now, I think most configs enable zswap will disable
CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
let's assume that today we are splitting large folios before they go
to zswap (i.e. choice 1).

What we do next depends on how the core swap intends to deal with
large folios. My understanding based on recent developments is that we
intend to swapout large folios as a whole, but I saw some discussions
about splitting all large folios before swapping them out, or leaving
them whole but swapping them out in order-0 chunks.

I assume the rationale is that there is little benefit to keeping the
folios whole because they will most likely be freed soon anyway, but I
understand not wanting to spend time on splitting them, so swapping
them out in order-0 chunks makes some sense to me. It also dodges the
whole fragmentation issue.

If we do either of these things in the core swap code, then I think
zswap doesn't need to do anything to support large folios. If not,
then we need to make a choice between 2 (compress large folios) &
choice 3 (compress each page separately) as you mentioned.

Compressing large folios as a whole means that we need to decompress
them as a whole to read a single page, which I think could be very
inefficient in some cases or force us to swapin large folios. Unless
of course we end up in a world where we mostly swapin the same large
folios that we swapped out. Although there can be additional
compression savings from compressing large folios as a whole.

Hence, I think choice 3 is the most reasonable one, at least for the
short-term. I also think this is what zram does, but I haven't
checked. Even if we all agree on this, there are still questions that
we need to answer. For example, do we allocate zswap_entry's for each
order-0 chunk right away, or do we allocate a single zswap_entry for
the entire folio, and then "split" it during swapin if we only need to
read part of the folio?

Wondering what others think here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-24 19:53   ` Yosry Ahmed
@ 2024-05-24 23:12     ` Yosry Ahmed
  2024-05-28 19:08       ` Nhat Pham
  0 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-24 23:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou,
	linux-mm, linux-kernel, David Hildenbrand, Barry Song, Chris Li,
	Ryan Roberts, Kairui Song

On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > > Some trivial folio conversions in zswap code.
> >
> > The three patches themselves look good.
> >
> > > The mean reason I included a cover letter is that I wanted to get
> > > feedback on what other trivial conversions can/should be done in
> > > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > > anyway).  These are the things I came across while searching for 'page'
> > > in mm/zswap.c, and chose not to do anything about for now:
> >
> > I think there's a deeper question to answer before answering these
> > questions, which is what we intend to do with large folios and zswap in
> > the future.  Do we intend to split them?  Compress them as a large
> > folio?  Compress each page in a large folio separately?  I can see an
> > argument for choices 2 and 3, but I think choice 1 is going to be
> > increasingly untenable.
>
> Yeah I was kinda getting the small things out of the way so that zswap
> is fully folio-ized, before we think about large folios. I haven't
> given it a lot of thought, but here's what I have in mind.
>
> Right now, I think most configs enable zswap will disable
> CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
> let's assume that today we are splitting large folios before they go
> to zswap (i.e. choice 1).
>
> What we do next depends on how the core swap intends to deal with
> large folios. My understanding based on recent developments is that we
> intend to swapout large folios as a whole, but I saw some discussions
> about splitting all large folios before swapping them out, or leaving
> them whole but swapping them out in order-0 chunks.
>
> I assume the rationale is that there is little benefit to keeping the
> folios whole because they will most likely be freed soon anyway, but I
> understand not wanting to spend time on splitting them, so swapping
> them out in order-0 chunks makes some sense to me. It also dodges the
> whole fragmentation issue.
>
> If we do either of these things in the core swap code, then I think
> zswap doesn't need to do anything to support large folios. If not,
> then we need to make a choice between 2 (compress large folios) &
> choice 3 (compress each page separately) as you mentioned.
>
> Compressing large folios as a whole means that we need to decompress
> them as a whole to read a single page, which I think could be very
> inefficient in some cases or force us to swapin large folios. Unless
> of course we end up in a world where we mostly swapin the same large
> folios that we swapped out. Although there can be additional
> compression savings from compressing large folios as a whole.
>
> Hence, I think choice 3 is the most reasonable one, at least for the
> short-term. I also think this is what zram does, but I haven't
> checked. Even if we all agree on this, there are still questions that
> we need to answer. For example, do we allocate zswap_entry's for each
> order-0 chunk right away, or do we allocate a single zswap_entry for
> the entire folio, and then "split" it during swapin if we only need to
> read part of the folio?
>
> Wondering what others think here.

More thoughts that came to mind here:

- Whether we go with choice 2 or 3, we may face a latency issue. Zswap
compression happens synchronously in the context of reclaim, so if we
start handling large folios in zswap, it may be more efficient to do
it asynchronously like swap to disk.

- Supporting compression of large folios depends on zsmalloc (and
maybe other allocators) supporting it. There have been patches from
Barry to add this support to some extent, but I didn't take a close
look at those.

Adding other folks from the mTHP swap discussions here in case they
have other thoughts about zswap.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load()
  2024-05-24  3:38 ` [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load() Yosry Ahmed
@ 2024-05-28 15:16   ` Nhat Pham
  2024-06-03  6:04   ` Chengming Zhou
  1 sibling, 0 replies; 16+ messages in thread
From: Nhat Pham @ 2024-05-28 15:16 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Johannes Weiner, Chengming Zhou, Matthew Wilcox,
	linux-mm, linux-kernel

On Thu, May 23, 2024 at 8:38 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Eliminate the last explicit 'struct page' reference in mm/zswap.c.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Reviewed-by: Nhat Pham <nphamcs@gmail.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly
  2024-05-24  3:38 ` [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly Yosry Ahmed
@ 2024-05-28 15:18   ` Nhat Pham
  2024-06-03  6:07   ` Chengming Zhou
  1 sibling, 0 replies; 16+ messages in thread
From: Nhat Pham @ 2024-05-28 15:18 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Johannes Weiner, Chengming Zhou, Matthew Wilcox,
	linux-mm, linux-kernel

On Thu, May 23, 2024 at 8:38 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> A variable name 'page' is used in zswap_is_folio_same_filled() and
> zswap_fill_page() to point at the kmapped data in a folio. Use 'data'
> instead to avoid confusion and stop it from showing up when searching
> for 'page' references in mm/zswap.c.
>
> While we are at it, move the kmap/kunmap calls into zswap_fill_page(),
> make it take in a folio, and rename it to zswap_fill_folio().
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Seems reasonable to me.
Reviewed-by: Nhat Pham <nphamcs@gmail.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-24 23:12     ` Yosry Ahmed
@ 2024-05-28 19:08       ` Nhat Pham
  2024-05-28 19:32         ` Yosry Ahmed
  2024-06-02  1:30         ` Barry Song
  0 siblings, 2 replies; 16+ messages in thread
From: Nhat Pham @ 2024-05-28 19:08 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, Chengming Zhou,
	linux-mm, linux-kernel, David Hildenbrand, Barry Song, Chris Li,
	Ryan Roberts, Kairui Song

On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > > > Some trivial folio conversions in zswap code.
> > >
> > > The three patches themselves look good.
> > >
> > > > The mean reason I included a cover letter is that I wanted to get
> > > > feedback on what other trivial conversions can/should be done in
> > > > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > > > anyway).  These are the things I came across while searching for 'page'
> > > > in mm/zswap.c, and chose not to do anything about for now:
> > >
> > > I think there's a deeper question to answer before answering these
> > > questions, which is what we intend to do with large folios and zswap in
> > > the future.  Do we intend to split them?  Compress them as a large
> > > folio?  Compress each page in a large folio separately?  I can see an
> > > argument for choices 2 and 3, but I think choice 1 is going to be
> > > increasingly untenable.
> >
> > Yeah I was kinda getting the small things out of the way so that zswap
> > is fully folio-ized, before we think about large folios. I haven't
> > given it a lot of thought, but here's what I have in mind.
> >
> > Right now, I think most configs enable zswap will disable
> > CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
> > let's assume that today we are splitting large folios before they go
> > to zswap (i.e. choice 1).
> >
> > What we do next depends on how the core swap intends to deal with
> > large folios. My understanding based on recent developments is that we
> > intend to swapout large folios as a whole, but I saw some discussions
> > about splitting all large folios before swapping them out, or leaving
> > them whole but swapping them out in order-0 chunks.
> >
> > I assume the rationale is that there is little benefit to keeping the
> > folios whole because they will most likely be freed soon anyway, but I
> > understand not wanting to spend time on splitting them, so swapping
> > them out in order-0 chunks makes some sense to me. It also dodges the
> > whole fragmentation issue.
> >
> > If we do either of these things in the core swap code, then I think
> > zswap doesn't need to do anything to support large folios. If not,
> > then we need to make a choice between 2 (compress large folios) &
> > choice 3 (compress each page separately) as you mentioned.
> >
> > Compressing large folios as a whole means that we need to decompress
> > them as a whole to read a single page, which I think could be very
> > inefficient in some cases or force us to swapin large folios. Unless
> > of course we end up in a world where we mostly swapin the same large
> > folios that we swapped out. Although there can be additional
> > compression savings from compressing large folios as a whole.
> >
> > Hence, I think choice 3 is the most reasonable one, at least for the
> > short-term. I also think this is what zram does, but I haven't
> > checked. Even if we all agree on this, there are still questions that
> > we need to answer. For example, do we allocate zswap_entry's for each
> > order-0 chunk right away, or do we allocate a single zswap_entry for
> > the entire folio, and then "split" it during swapin if we only need to
> > read part of the folio?
> >
> > Wondering what others think here.
>
> More thoughts that came to mind here:
>
> - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
> compression happens synchronously in the context of reclaim, so if we
> start handling large folios in zswap, it may be more efficient to do
> it asynchronously like swap to disk.

We've been discussing this in private as well :)

It doesn't have to be these two extremes right? I'm perfectly happy
with starting with compressing each subpage separately, but perhaps we
can consider managing larger folios in bigger chunks (say 64KB). That
way, on swap-in, we just have to bring a whole chunk in, not the
entire folio, and still take advantage of compression efficiencies on
bigger-than-one-page chunks. I'd also check with other filesystems
that leverage compression, to see what's their unit of compression is.

I believe this is the approach Barry is suggesting for zram:

https://lore.kernel.org/linux-block/20240327214816.31191-2-21cnbao@gmail.com/T/

Once the zsmalloc infrastructure is there, we can play with this :)

Barry - what's the progress regarding this front?

>
> - Supporting compression of large folios depends on zsmalloc (and
> maybe other allocators) supporting it. There have been patches from
> Barry to add this support to some extent, but I didn't take a close
> look at those.
>
> Adding other folks from the mTHP swap discussions here in case they
> have other thoughts about zswap.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-28 19:08       ` Nhat Pham
@ 2024-05-28 19:32         ` Yosry Ahmed
  2024-06-03  6:19           ` Chengming Zhou
  2024-06-02  1:30         ` Barry Song
  1 sibling, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-05-28 19:32 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, Chengming Zhou,
	linux-mm, linux-kernel, David Hildenbrand, Barry Song, Chris Li,
	Ryan Roberts, Kairui Song

On Tue, May 28, 2024 at 12:08 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > > > > Some trivial folio conversions in zswap code.
> > > >
> > > > The three patches themselves look good.
> > > >
> > > > > The mean reason I included a cover letter is that I wanted to get
> > > > > feedback on what other trivial conversions can/should be done in
> > > > > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > > > > anyway).  These are the things I came across while searching for 'page'
> > > > > in mm/zswap.c, and chose not to do anything about for now:
> > > >
> > > > I think there's a deeper question to answer before answering these
> > > > questions, which is what we intend to do with large folios and zswap in
> > > > the future.  Do we intend to split them?  Compress them as a large
> > > > folio?  Compress each page in a large folio separately?  I can see an
> > > > argument for choices 2 and 3, but I think choice 1 is going to be
> > > > increasingly untenable.
> > >
> > > Yeah I was kinda getting the small things out of the way so that zswap
> > > is fully folio-ized, before we think about large folios. I haven't
> > > given it a lot of thought, but here's what I have in mind.
> > >
> > > Right now, I think most configs enable zswap will disable
> > > CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
> > > let's assume that today we are splitting large folios before they go
> > > to zswap (i.e. choice 1).
> > >
> > > What we do next depends on how the core swap intends to deal with
> > > large folios. My understanding based on recent developments is that we
> > > intend to swapout large folios as a whole, but I saw some discussions
> > > about splitting all large folios before swapping them out, or leaving
> > > them whole but swapping them out in order-0 chunks.
> > >
> > > I assume the rationale is that there is little benefit to keeping the
> > > folios whole because they will most likely be freed soon anyway, but I
> > > understand not wanting to spend time on splitting them, so swapping
> > > them out in order-0 chunks makes some sense to me. It also dodges the
> > > whole fragmentation issue.
> > >
> > > If we do either of these things in the core swap code, then I think
> > > zswap doesn't need to do anything to support large folios. If not,
> > > then we need to make a choice between 2 (compress large folios) &
> > > choice 3 (compress each page separately) as you mentioned.
> > >
> > > Compressing large folios as a whole means that we need to decompress
> > > them as a whole to read a single page, which I think could be very
> > > inefficient in some cases or force us to swapin large folios. Unless
> > > of course we end up in a world where we mostly swapin the same large
> > > folios that we swapped out. Although there can be additional
> > > compression savings from compressing large folios as a whole.
> > >
> > > Hence, I think choice 3 is the most reasonable one, at least for the
> > > short-term. I also think this is what zram does, but I haven't
> > > checked. Even if we all agree on this, there are still questions that
> > > we need to answer. For example, do we allocate zswap_entry's for each
> > > order-0 chunk right away, or do we allocate a single zswap_entry for
> > > the entire folio, and then "split" it during swapin if we only need to
> > > read part of the folio?
> > >
> > > Wondering what others think here.
> >
> > More thoughts that came to mind here:
> >
> > - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
> > compression happens synchronously in the context of reclaim, so if we
> > start handling large folios in zswap, it may be more efficient to do
> > it asynchronously like swap to disk.
>
> We've been discussing this in private as well :)
>
> It doesn't have to be these two extremes right? I'm perfectly happy
> with starting with compressing each subpage separately, but perhaps we
> can consider managing larger folios in bigger chunks (say 64KB). That
> way, on swap-in, we just have to bring a whole chunk in, not the
> entire folio, and still take advantage of compression efficiencies on
> bigger-than-one-page chunks. I'd also check with other filesystems
> that leverage compression, to see what's their unit of compression is.

Right. But I think it will be a clearer win to start with compressing
each subpage separately, and it avoids splitting folios during reclaim
to zswap. It also doesn't depend on the zsmalloc work.

Once we have that, we can experiment with compressing folios in larger
chunks. The tradeoffs become less clear at that point, and the number
of variables you can tune goes up :)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-28 19:08       ` Nhat Pham
  2024-05-28 19:32         ` Yosry Ahmed
@ 2024-06-02  1:30         ` Barry Song
  1 sibling, 0 replies; 16+ messages in thread
From: Barry Song @ 2024-06-02  1:30 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Yosry Ahmed, Matthew Wilcox, Andrew Morton, Johannes Weiner,
	Chengming Zhou, linux-mm, linux-kernel, David Hildenbrand,
	Chris Li, Ryan Roberts, Kairui Song

On Wed, May 29, 2024 at 7:08 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > > > > Some trivial folio conversions in zswap code.
> > > >
> > > > The three patches themselves look good.
> > > >
> > > > > The mean reason I included a cover letter is that I wanted to get
> > > > > feedback on what other trivial conversions can/should be done in
> > > > > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > > > > anyway).  These are the things I came across while searching for 'page'
> > > > > in mm/zswap.c, and chose not to do anything about for now:
> > > >
> > > > I think there's a deeper question to answer before answering these
> > > > questions, which is what we intend to do with large folios and zswap in
> > > > the future.  Do we intend to split them?  Compress them as a large
> > > > folio?  Compress each page in a large folio separately?  I can see an
> > > > argument for choices 2 and 3, but I think choice 1 is going to be
> > > > increasingly untenable.
> > >
> > > Yeah I was kinda getting the small things out of the way so that zswap
> > > is fully folio-ized, before we think about large folios. I haven't
> > > given it a lot of thought, but here's what I have in mind.
> > >
> > > Right now, I think most configs enable zswap will disable
> > > CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
> > > let's assume that today we are splitting large folios before they go
> > > to zswap (i.e. choice 1).
> > >
> > > What we do next depends on how the core swap intends to deal with
> > > large folios. My understanding based on recent developments is that we
> > > intend to swapout large folios as a whole, but I saw some discussions
> > > about splitting all large folios before swapping them out, or leaving
> > > them whole but swapping them out in order-0 chunks.
> > >
> > > I assume the rationale is that there is little benefit to keeping the
> > > folios whole because they will most likely be freed soon anyway, but I
> > > understand not wanting to spend time on splitting them, so swapping
> > > them out in order-0 chunks makes some sense to me. It also dodges the
> > > whole fragmentation issue.
> > >
> > > If we do either of these things in the core swap code, then I think
> > > zswap doesn't need to do anything to support large folios. If not,
> > > then we need to make a choice between 2 (compress large folios) &
> > > choice 3 (compress each page separately) as you mentioned.
> > >
> > > Compressing large folios as a whole means that we need to decompress
> > > them as a whole to read a single page, which I think could be very
> > > inefficient in some cases or force us to swapin large folios. Unless
> > > of course we end up in a world where we mostly swapin the same large
> > > folios that we swapped out. Although there can be additional
> > > compression savings from compressing large folios as a whole.
> > >
> > > Hence, I think choice 3 is the most reasonable one, at least for the
> > > short-term. I also think this is what zram does, but I haven't
> > > checked. Even if we all agree on this, there are still questions that
> > > we need to answer. For example, do we allocate zswap_entry's for each
> > > order-0 chunk right away, or do we allocate a single zswap_entry for
> > > the entire folio, and then "split" it during swapin if we only need to
> > > read part of the folio?
> > >
> > > Wondering what others think here.
> >
> > More thoughts that came to mind here:
> >
> > - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
> > compression happens synchronously in the context of reclaim, so if we
> > start handling large folios in zswap, it may be more efficient to do
> > it asynchronously like swap to disk.
>
> We've been discussing this in private as well :)
>
> It doesn't have to be these two extremes right? I'm perfectly happy
> with starting with compressing each subpage separately, but perhaps we
> can consider managing larger folios in bigger chunks (say 64KB). That
> way, on swap-in, we just have to bring a whole chunk in, not the
> entire folio, and still take advantage of compression efficiencies on
> bigger-than-one-page chunks. I'd also check with other filesystems
> that leverage compression, to see what's their unit of compression is.
>
> I believe this is the approach Barry is suggesting for zram:
>
> https://lore.kernel.org/linux-block/20240327214816.31191-2-21cnbao@gmail.com/T/
>
> Once the zsmalloc infrastructure is there, we can play with this :)
>
> Barry - what's the progress regarding this front?

Thanks for reaching out.

Not too much. It depends on large folios swap-in because we need to swap in
large folios if we compress them as a whole. For example, if we swap out
64KiB but only swap in 4KiB, we still need to decompress the entire 64KiB
but copy only 4KiB.

Recently, we’ve only addressed the large folio swap-in refault cases in
the mm-unstable branch[1].
[1] https://lore.kernel.org/linux-mm/20240529082824.150954-1-21cnbao@gmail.com/

Currently, swap-in is not allocating large folios in any mm branch.

A major debate is that my original patch[2] started from SYNC_IO case for zRAM
and embedded devices first, while Ying argue we should start from non-SYNC
IO and decide swapin sizes based on read-ahead window but not based on
the original sizes of how folios are swapped out.
[2] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/

So I guess we need more work to get large folios swap-in ready, and it
won't happen
shortly.

>
> >
> > - Supporting compression of large folios depends on zsmalloc (and
> > maybe other allocators) supporting it. There have been patches from
> > Barry to add this support to some extent, but I didn't take a close
> > look at those.
> >
> > Adding other folks from the mTHP swap discussions here in case they
> > have other thoughts about zswap.

Thanks
Barry


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}()
  2024-05-24  3:38 ` [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}() Yosry Ahmed
@ 2024-06-03  6:03   ` Chengming Zhou
  0 siblings, 0 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-06-03  6:03 UTC (permalink / raw)
  To: Yosry Ahmed, Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Matthew Wilcox, linux-mm, linux-kernel

On 2024/5/24 11:38, Yosry Ahmed wrote:
> sg_set_folio() is equivalent to sg_set_page() for order-0 folios, which
> are the only ones supported by zswap. Now zswap_decompress() can take in
> a folio directly.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

LGTM, thanks!

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

> ---
>  mm/zswap.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index a50e2986cd2fa..3693df96c81fe 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -917,7 +917,7 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
>  
>  	dst = acomp_ctx->buffer;
>  	sg_init_table(&input, 1);
> -	sg_set_page(&input, &folio->page, PAGE_SIZE, 0);
> +	sg_set_folio(&input, folio, PAGE_SIZE, 0);
>  
>  	/*
>  	 * We need PAGE_SIZE * 2 here since there maybe over-compression case,
> @@ -971,7 +971,7 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
>  	return comp_ret == 0 && alloc_ret == 0;
>  }
>  
> -static void zswap_decompress(struct zswap_entry *entry, struct page *page)
> +static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  {
>  	struct zpool *zpool = zswap_find_zpool(entry);
>  	struct scatterlist input, output;
> @@ -1000,7 +1000,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct page *page)
>  
>  	sg_init_one(&input, src, entry->length);
>  	sg_init_table(&output, 1);
> -	sg_set_page(&output, page, PAGE_SIZE, 0);
> +	sg_set_folio(&output, folio, PAGE_SIZE, 0);
>  	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
>  	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>  	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> @@ -1073,7 +1073,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>  		return -ENOMEM;
>  	}
>  
> -	zswap_decompress(entry, &folio->page);
> +	zswap_decompress(entry, folio);
>  
>  	count_vm_event(ZSWPWB);
>  	if (entry->objcg)
> @@ -1580,7 +1580,7 @@ bool zswap_load(struct folio *folio)
>  		return false;
>  
>  	if (entry->length)
> -		zswap_decompress(entry, page);
> +		zswap_decompress(entry, folio);
>  	else {
>  		dst = kmap_local_page(page);
>  		zswap_fill_page(dst, entry->value);


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load()
  2024-05-24  3:38 ` [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load() Yosry Ahmed
  2024-05-28 15:16   ` Nhat Pham
@ 2024-06-03  6:04   ` Chengming Zhou
  1 sibling, 0 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-06-03  6:04 UTC (permalink / raw)
  To: Yosry Ahmed, Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Matthew Wilcox, linux-mm, linux-kernel

On 2024/5/24 11:38, Yosry Ahmed wrote:
> Eliminate the last explicit 'struct page' reference in mm/zswap.c.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

LGTM, thanks!

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

> ---
>  mm/zswap.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 3693df96c81fe..bac66991fb14e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1551,7 +1551,6 @@ bool zswap_load(struct folio *folio)
>  {
>  	swp_entry_t swp = folio->swap;
>  	pgoff_t offset = swp_offset(swp);
> -	struct page *page = &folio->page;
>  	bool swapcache = folio_test_swapcache(folio);
>  	struct xarray *tree = swap_zswap_tree(swp);
>  	struct zswap_entry *entry;
> @@ -1582,7 +1581,7 @@ bool zswap_load(struct folio *folio)
>  	if (entry->length)
>  		zswap_decompress(entry, folio);
>  	else {
> -		dst = kmap_local_page(page);
> +		dst = kmap_local_folio(folio, 0);
>  		zswap_fill_page(dst, entry->value);
>  		kunmap_local(dst);
>  	}


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly
  2024-05-24  3:38 ` [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly Yosry Ahmed
  2024-05-28 15:18   ` Nhat Pham
@ 2024-06-03  6:07   ` Chengming Zhou
  1 sibling, 0 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-06-03  6:07 UTC (permalink / raw)
  To: Yosry Ahmed, Andrew Morton
  Cc: Johannes Weiner, Nhat Pham, Matthew Wilcox, linux-mm, linux-kernel

On 2024/5/24 11:38, Yosry Ahmed wrote:
> A variable name 'page' is used in zswap_is_folio_same_filled() and
> zswap_fill_page() to point at the kmapped data in a folio. Use 'data'
> instead to avoid confusion and stop it from showing up when searching
> for 'page' references in mm/zswap.c.
> 
> While we are at it, move the kmap/kunmap calls into zswap_fill_page(),
> make it take in a folio, and rename it to zswap_fill_folio().
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

LGTM, thanks!

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

> ---
>  mm/zswap.c | 30 +++++++++++++-----------------
>  1 file changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index bac66991fb14e..b9b35ef86d9be 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1375,35 +1375,35 @@ static void shrink_worker(struct work_struct *w)
>  **********************************/
>  static bool zswap_is_folio_same_filled(struct folio *folio, unsigned long *value)
>  {
> -	unsigned long *page;
> +	unsigned long *data;
>  	unsigned long val;
> -	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1;
> +	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*data) - 1;
>  	bool ret = false;
>  
> -	page = kmap_local_folio(folio, 0);
> -	val = page[0];
> +	data = kmap_local_folio(folio, 0);
> +	val = data[0];
>  
> -	if (val != page[last_pos])
> +	if (val != data[last_pos])
>  		goto out;
>  
>  	for (pos = 1; pos < last_pos; pos++) {
> -		if (val != page[pos])
> +		if (val != data[pos])
>  			goto out;
>  	}
>  
>  	*value = val;
>  	ret = true;
>  out:
> -	kunmap_local(page);
> +	kunmap_local(data);
>  	return ret;
>  }
>  
> -static void zswap_fill_page(void *ptr, unsigned long value)
> +static void zswap_fill_folio(struct folio *folio, unsigned long value)
>  {
> -	unsigned long *page;
> +	unsigned long *data = kmap_local_folio(folio, 0);
>  
> -	page = (unsigned long *)ptr;
> -	memset_l(page, value, PAGE_SIZE / sizeof(unsigned long));
> +	memset_l(data, value, PAGE_SIZE / sizeof(unsigned long));
> +	kunmap_local(data);
>  }
>  
>  /*********************************
> @@ -1554,7 +1554,6 @@ bool zswap_load(struct folio *folio)
>  	bool swapcache = folio_test_swapcache(folio);
>  	struct xarray *tree = swap_zswap_tree(swp);
>  	struct zswap_entry *entry;
> -	u8 *dst;
>  
>  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
>  
> @@ -1580,11 +1579,8 @@ bool zswap_load(struct folio *folio)
>  
>  	if (entry->length)
>  		zswap_decompress(entry, folio);
> -	else {
> -		dst = kmap_local_folio(folio, 0);
> -		zswap_fill_page(dst, entry->value);
> -		kunmap_local(dst);
> -	}
> +	else
> +		zswap_fill_folio(folio, entry->value);
>  
>  	count_vm_event(ZSWPIN);
>  	if (entry->objcg)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] mm: zswap: trivial folio conversions
  2024-05-28 19:32         ` Yosry Ahmed
@ 2024-06-03  6:19           ` Chengming Zhou
  0 siblings, 0 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-06-03  6:19 UTC (permalink / raw)
  To: Yosry Ahmed, Nhat Pham
  Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel, David Hildenbrand, Barry Song, Chris Li,
	Ryan Roberts, Kairui Song

On 2024/5/29 03:32, Yosry Ahmed wrote:
> On Tue, May 28, 2024 at 12:08 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>
>>> On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>>
>>>> On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>>
>>>>> On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
>>>>>> Some trivial folio conversions in zswap code.
>>>>>
>>>>> The three patches themselves look good.
>>>>>
>>>>>> The mean reason I included a cover letter is that I wanted to get
>>>>>> feedback on what other trivial conversions can/should be done in
>>>>>> mm/zswap.c (keeping in mind that only order-0 folios are supported
>>>>>> anyway).  These are the things I came across while searching for 'page'
>>>>>> in mm/zswap.c, and chose not to do anything about for now:
>>>>>
>>>>> I think there's a deeper question to answer before answering these
>>>>> questions, which is what we intend to do with large folios and zswap in
>>>>> the future.  Do we intend to split them?  Compress them as a large
>>>>> folio?  Compress each page in a large folio separately?  I can see an
>>>>> argument for choices 2 and 3, but I think choice 1 is going to be
>>>>> increasingly untenable.
>>>>
>>>> Yeah I was kinda getting the small things out of the way so that zswap
>>>> is fully folio-ized, before we think about large folios. I haven't
>>>> given it a lot of thought, but here's what I have in mind.
>>>>
>>>> Right now, I think most configs enable zswap will disable
>>>> CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
>>>> let's assume that today we are splitting large folios before they go
>>>> to zswap (i.e. choice 1).
>>>>
>>>> What we do next depends on how the core swap intends to deal with
>>>> large folios. My understanding based on recent developments is that we
>>>> intend to swapout large folios as a whole, but I saw some discussions
>>>> about splitting all large folios before swapping them out, or leaving
>>>> them whole but swapping them out in order-0 chunks.
>>>>
>>>> I assume the rationale is that there is little benefit to keeping the
>>>> folios whole because they will most likely be freed soon anyway, but I
>>>> understand not wanting to spend time on splitting them, so swapping
>>>> them out in order-0 chunks makes some sense to me. It also dodges the
>>>> whole fragmentation issue.
>>>>
>>>> If we do either of these things in the core swap code, then I think
>>>> zswap doesn't need to do anything to support large folios. If not,
>>>> then we need to make a choice between 2 (compress large folios) &
>>>> choice 3 (compress each page separately) as you mentioned.
>>>>
>>>> Compressing large folios as a whole means that we need to decompress
>>>> them as a whole to read a single page, which I think could be very
>>>> inefficient in some cases or force us to swapin large folios. Unless
>>>> of course we end up in a world where we mostly swapin the same large
>>>> folios that we swapped out. Although there can be additional
>>>> compression savings from compressing large folios as a whole.
>>>>
>>>> Hence, I think choice 3 is the most reasonable one, at least for the
>>>> short-term. I also think this is what zram does, but I haven't
>>>> checked. Even if we all agree on this, there are still questions that
>>>> we need to answer. For example, do we allocate zswap_entry's for each
>>>> order-0 chunk right away, or do we allocate a single zswap_entry for
>>>> the entire folio, and then "split" it during swapin if we only need to
>>>> read part of the folio?
>>>>
>>>> Wondering what others think here.
>>>
>>> More thoughts that came to mind here:
>>>
>>> - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
>>> compression happens synchronously in the context of reclaim, so if we
>>> start handling large folios in zswap, it may be more efficient to do
>>> it asynchronously like swap to disk.
>>
>> We've been discussing this in private as well :)
>>
>> It doesn't have to be these two extremes right? I'm perfectly happy
>> with starting with compressing each subpage separately, but perhaps we
>> can consider managing larger folios in bigger chunks (say 64KB). That
>> way, on swap-in, we just have to bring a whole chunk in, not the
>> entire folio, and still take advantage of compression efficiencies on
>> bigger-than-one-page chunks. I'd also check with other filesystems
>> that leverage compression, to see what's their unit of compression is.
> 
> Right. But I think it will be a clearer win to start with compressing
> each subpage separately, and it avoids splitting folios during reclaim
> to zswap. It also doesn't depend on the zsmalloc work.
> 
> Once we have that, we can experiment with compressing folios in larger
> chunks. The tradeoffs become less clear at that point, and the number
> of variables you can tune goes up :)

Agree, it's a good approach! And it hasn't any decompression amplification
problem.

Thanks.


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-06-03  6:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-24  3:38 [PATCH 0/3] mm: zswap: trivial folio conversions Yosry Ahmed
2024-05-24  3:38 ` [PATCH 1/3] mm: zswap: use sg_set_folio() in zswap_{compress/decompress}() Yosry Ahmed
2024-06-03  6:03   ` Chengming Zhou
2024-05-24  3:38 ` [PATCH 2/3] mm :zswap: use kmap_local_folio() in zswap_load() Yosry Ahmed
2024-05-28 15:16   ` Nhat Pham
2024-06-03  6:04   ` Chengming Zhou
2024-05-24  3:38 ` [PATCH 3/3] mm: zswap: make same_filled functions folio-friendly Yosry Ahmed
2024-05-28 15:18   ` Nhat Pham
2024-06-03  6:07   ` Chengming Zhou
2024-05-24  3:59 ` [PATCH 0/3] mm: zswap: trivial folio conversions Matthew Wilcox
2024-05-24 19:53   ` Yosry Ahmed
2024-05-24 23:12     ` Yosry Ahmed
2024-05-28 19:08       ` Nhat Pham
2024-05-28 19:32         ` Yosry Ahmed
2024-06-03  6:19           ` Chengming Zhou
2024-06-02  1:30         ` Barry Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox