linux-mm.kvack.org archive mirror
* Re: [PATCHv3 04/11] mm/zswap: Use PG_dropbehind instead of PG_reclaim
       [not found] ` <20250130100050.1868208-5-kirill.shutemov@linux.intel.com>
@ 2025-01-30 17:39   ` Nhat Pham
  0 siblings, 0 replies; 14+ messages in thread
From: Nhat Pham @ 2025-01-30 17:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Oscar Salvador, Ran Xiaokai,
	Rodrigo Vivi, Simona Vetter, Steven Rostedt, Tvrtko Ursulin,
	Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx, dri-devel,
	linux-kernel, linux-fsdevel, linux-mm, linux-trace-kernel

On Thu, Jan 30, 2025 at 2:02 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.

Neat!

>
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> zswap_writeback_entry().
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Yosry Ahmed <yosryahmed@google.com>

Acked-by: Nhat Pham <nphamcs@gmail.com>



* Re: [PATCHv3 05/11] mm/truncate: Use folio_set_dropbehind() instead of deactivate_file_folio()
       [not found] ` <20250130100050.1868208-6-kirill.shutemov@linux.intel.com>
@ 2025-01-30 20:34   ` Yu Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2025-01-30 20:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Thu, Jan 30, 2025 at 3:01 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
>
> The new flag allows replacing the whole deactivate_file_folio() machinery
> with a simple folio_set_dropbehind().
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Yu Zhao <yuzhao@google.com>



* Re: [PATCHv3 01/11] mm/migrate: Transfer PG_dropbehind to the new folio
       [not found] ` <20250130100050.1868208-2-kirill.shutemov@linux.intel.com>
@ 2025-01-30 20:36   ` Yu Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2025-01-30 20:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Thu, Jan 30, 2025 at 3:01 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Do not lose the flag on page migration.
>
> Ideally, these folios should be freed instead of migrated. But that
> requires finding the right spot to do this, and proper testing.
>
> Transfer the flag for now.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Yu Zhao <yuzhao@google.com>



* Re: [PATCHv3 11/11] mm: Rename PG_dropbehind to PG_reclaim
       [not found] ` <20250130100050.1868208-12-kirill.shutemov@linux.intel.com>
@ 2025-01-30 20:38   ` Yu Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2025-01-30 20:38 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Thu, Jan 30, 2025 at 3:01 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Now that PG_reclaim is gone, its name can be reclaimed for better
> use :)
>
> Rename PG_dropbehind to PG_reclaim and rename all helpers around it.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Yu Zhao <yuzhao@google.com>



* Re: [PATCHv3 02/11] drm/i915/gem: Convert __shmem_writeback() to folios
       [not found] ` <20250130100050.1868208-3-kirill.shutemov@linux.intel.com>
@ 2025-01-31 13:37   ` Andi Shyti
  2025-01-31 14:21   ` Matthew Wilcox
  2025-01-31 20:27   ` Shakeel Butt
  2 siblings, 0 replies; 14+ messages in thread
From: Andi Shyti @ 2025-01-31 13:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

Hi Kirill,

On Thu, Jan 30, 2025 at 12:00:40PM +0200, Kirill A. Shutemov wrote:
> Use folios instead of pages.
> 
> This is preparation for removing PG_reclaim.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>

looks good:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi



* Re: [PATCHv3 03/11] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim
       [not found] ` <20250130100050.1868208-4-kirill.shutemov@linux.intel.com>
@ 2025-01-31 13:38   ` Andi Shyti
  0 siblings, 0 replies; 14+ messages in thread
From: Andi Shyti @ 2025-01-31 13:38 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

Hi Kirill,

On Thu, Jan 30, 2025 at 12:00:41PM +0200, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> __shmem_writeback()
> 
> It is safe to leave PG_dropbehind on the folio if, for some reason
> (bug?), the folio is not in a writeback state after ->writepage().
> In these cases, the kernel had to clear PG_reclaim as it shared a page
> flag bit with PG_readahead.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi



* Re: [PATCHv3 02/11] drm/i915/gem: Convert __shmem_writeback() to folios
       [not found] ` <20250130100050.1868208-3-kirill.shutemov@linux.intel.com>
  2025-01-31 13:37   ` [PATCHv3 02/11] drm/i915/gem: Convert __shmem_writeback() to folios Andi Shyti
@ 2025-01-31 14:21   ` Matthew Wilcox
  2025-01-31 20:27   ` Shakeel Butt
  2 siblings, 0 replies; 14+ messages in thread
From: Matthew Wilcox @ 2025-01-31 14:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Chris Wilson, Andrew Morton, Jens Axboe, Jason A. Donenfeld,
	Andi Shyti, Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Thu, Jan 30, 2025 at 12:00:40PM +0200, Kirill A. Shutemov wrote:
> Use folios instead of pages.
> 
> This is preparation for removing PG_reclaim.

Well, this is a horrid little function.  Rather than iterating just the
dirty folios, it iterates all folios, then locks them before checking
whether they're dirty.

I don't know whether the comments are correct or the code is correct.
This comment doesn't match with setting PageReclaim:

         * Leave mmapings intact (GTT will have been revoked on unbinding,
         * leaving only CPU mmapings around) and add those pages to the LRU
         * instead of invoking writeback so they are aged and paged out
         * as normal.

so I wonder if Chris was confused about what PageReclaim actually does.
Let's find out if he still remembers what he thought it did!



* Re: [PATCHv3 02/11] drm/i915/gem: Convert __shmem_writeback() to folios
       [not found] ` <20250130100050.1868208-3-kirill.shutemov@linux.intel.com>
  2025-01-31 13:37   ` [PATCHv3 02/11] drm/i915/gem: Convert __shmem_writeback() to folios Andi Shyti
  2025-01-31 14:21   ` Matthew Wilcox
@ 2025-01-31 20:27   ` Shakeel Butt
  2 siblings, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2025-01-31 20:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Thu, Jan 30, 2025 at 12:00:40PM +0200, Kirill A. Shutemov wrote:
> Use folios instead of pages.
> 
> This is preparation for removing PG_reclaim.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index fe69f2c8527d..9016832b20fc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -320,25 +320,25 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
>  
>  	/* Begin writeback on each dirty page */
>  	for (i = 0; i < size >> PAGE_SHIFT; i++) {

With folio conversion, should the iteration step be folio_nr_pages()
instead of 1?

> -		struct page *page;
> +		struct folio *folio;
>  
> -		page = find_lock_page(mapping, i);
> -		if (!page)
> +		folio = filemap_lock_folio(mapping, i);
> +		if (!folio)
>  			continue;
>  
> -		if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
> +		if (!folio_mapped(folio) && folio_clear_dirty_for_io(folio)) {
>  			int ret;
>  
> -			SetPageReclaim(page);
> -			ret = mapping->a_ops->writepage(page, &wbc);
> +			folio_set_reclaim(folio);
> +			ret = mapping->a_ops->writepage(&folio->page, &wbc);
>  			if (!PageWriteback(page))
> -				ClearPageReclaim(page);
> +				folio_clear_reclaim(folio);
>  			if (!ret)
>  				goto put;
>  		}
> -		unlock_page(page);
> +		folio_unlock(folio);
>  put:
> -		put_page(page);
> +		folio_put(folio);
>  	}
>  }
>  
> -- 
> 2.47.2
> 



* Re: [PATCHv3 06/11] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
       [not found] ` <20250130100050.1868208-7-kirill.shutemov@linux.intel.com>
@ 2025-01-31 21:32   ` Shakeel Butt
  2025-02-01  8:01   ` Kairui Song
  1 sibling, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2025-01-31 21:32 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Thu, Jan 30, 2025 at 12:00:44PM +0200, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> pageout().
> 
> It is safe to leave PG_dropbehind on the folio if, for some reason
> (bug?), the folio is not in a writeback state after ->writepage().
> In these cases, the kernel had to clear PG_reclaim as it shared a page
> flag bit with PG_readahead.

Is it correct to say that leaving PG_dropbehind on folios which don't
have the writeback state set after ->writepage() (i.e. store to zswap)
is fine because PG_dropbehind is not in PAGE_FLAGS_CHECK_AT_FREE?

> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/vmscan.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bc1826020159..c97adb0fdaa4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -692,19 +692,16 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>  		if (shmem_mapping(mapping) && folio_test_large(folio))
>  			wbc.list = folio_list;
>  
> -		folio_set_reclaim(folio);
> +		folio_set_dropbehind(folio);
> +
>  		res = mapping->a_ops->writepage(&folio->page, &wbc);
>  		if (res < 0)
>  			handle_write_error(mapping, folio, res);
>  		if (res == AOP_WRITEPAGE_ACTIVATE) {
> -			folio_clear_reclaim(folio);
> +			folio_clear_dropbehind(folio);
>  			return PAGE_ACTIVATE;
>  		}
>  
> -		if (!folio_test_writeback(folio)) {
> -			/* synchronous write or broken a_ops? */
> -			folio_clear_reclaim(folio);
> -		}
>  		trace_mm_vmscan_write_folio(folio);
>  		node_stat_add_folio(folio, NR_VMSCAN_WRITE);
>  		return PAGE_SUCCESS;
> -- 
> 2.47.2
> 
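
The kind of check the PAGE_FLAGS_CHECK_AT_FREE question refers to can be
sketched as a mask test. This is a userspace toy under stated assumptions:
the TOY_* flag names, their bit positions and the contents of the toy mask
are all invented for illustration and say nothing about which bits the real
mask contains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real page flag layout differs. */
enum toy_flag {
	TOY_LOCKED     = 1u << 0,
	TOY_WRITEBACK  = 1u << 1,
	TOY_LRU        = 1u << 2,
	TOY_DROPBEHIND = 1u << 3,
};

/*
 * Toy version of a "must be clear at free time" mask. TOY_DROPBEHIND is
 * deliberately left out, mirroring the assumption made in the question.
 */
#define TOY_FLAGS_CHECK_AT_FREE (TOY_LOCKED | TOY_WRITEBACK | TOY_LRU)

/* Return true if freeing a page with these flags would pass the check. */
bool toy_free_is_clean(unsigned int flags)
{
	return (flags & TOY_FLAGS_CHECK_AT_FREE) == 0;
}
```

In this model a stale TOY_DROPBEHIND bit passes the free-time check, while a
stale TOY_WRITEBACK bit does not. If the real flag were part of the real
mask, leaving it set would instead be reported as a bad page state.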



* Re: [PATCHv3 06/11] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
       [not found] ` <20250130100050.1868208-7-kirill.shutemov@linux.intel.com>
  2025-01-31 21:32   ` [PATCHv3 06/11] mm/vmscan: " Shakeel Butt
@ 2025-02-01  8:01   ` Kairui Song
  2025-02-03  7:14     ` Yu Zhao
  2025-02-03  8:39     ` Kirill A. Shutemov
  1 sibling, 2 replies; 14+ messages in thread
From: Kairui Song @ 2025-02-01  8:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Thu, Jan 30, 2025 at 6:02 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
>
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> pageout().
>
> It is safe to leave PG_dropbehind on the folio if, for some reason
> (bug?), the folio is not in a writeback state after ->writepage().
> In these cases, the kernel had to clear PG_reclaim as it shared a page
> flag bit with PG_readahead.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/vmscan.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bc1826020159..c97adb0fdaa4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -692,19 +692,16 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>                 if (shmem_mapping(mapping) && folio_test_large(folio))
>                         wbc.list = folio_list;
>
> -               folio_set_reclaim(folio);
> +               folio_set_dropbehind(folio);
> +
>                 res = mapping->a_ops->writepage(&folio->page, &wbc);
>                 if (res < 0)
>                         handle_write_error(mapping, folio, res);
>                 if (res == AOP_WRITEPAGE_ACTIVATE) {
> -                       folio_clear_reclaim(folio);
> +                       folio_clear_dropbehind(folio);
>                         return PAGE_ACTIVATE;
>                 }
>
> -               if (!folio_test_writeback(folio)) {
> -                       /* synchronous write or broken a_ops? */
> -                       folio_clear_reclaim(folio);
> -               }
>                 trace_mm_vmscan_write_folio(folio);
>                 node_stat_add_folio(folio, NR_VMSCAN_WRITE);
>                 return PAGE_SUCCESS;
> --
> 2.47.2
>

Hi, I'm seeing the following panic with SWAP after this commit:

[   29.672319] Oops: general protection fault, probably for non-canonical address 0xffff88909a3be3: 0000 [#1] PREEMPT SMP NOPTI
[   29.675503] CPU: 82 UID: 0 PID: 5145 Comm: tar Kdump: loaded Not tainted 6.13.0.ptch-g1fe9ea48ec98 #917
[   29.677508] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
[   29.678886] RIP: 0010:__lock_acquire+0x20/0x15d0
[   29.679891] Code: 90 90 90 90 90 90 90 90 90 90 41 57 41 56 41 55 41 54 55 53 48 83 ec 30 8b 2d 10 ac f3 01 44 8b ac 24 88 00 00 00 85 ed 74 64 <48> 8b 07 49 89 ff 48 3d 20 1d bf 83 74 56 8b 1d 8c f5 b1 01 41 89
[   29.683852] RSP: 0018:ffffc9000bea3148 EFLAGS: 00010002
[   29.684980] RAX: ffff8890874b2940 RBX: 0000000000000200 RCX: 0000000000000000
[   29.686510] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00ffff88909a3be3
[   29.688031] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[   29.689561] R10: 0000000000000000 R11: 0000000000000020 R12: 00ffff88909a3be3
[   29.691087] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   29.692613] FS:  00007fa05c2824c0(0000) GS:ffff88a03fa80000(0000) knlGS:0000000000000000
[   29.694339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   29.695581] CR2: 000055f9abb7fc7d CR3: 00000010932f2002 CR4: 0000000000770eb0
[   29.697109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   29.698637] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   29.700161] PKRU: 55555554
[   29.700759] Call Trace:
[   29.701296]  <TASK>
[   29.701770]  ? __die_body+0x1e/0x60
[   29.702540]  ? die_addr+0x3c/0x60
[   29.703267]  ? exc_general_protection+0x18f/0x3c0
[   29.704290]  ? asm_exc_general_protection+0x26/0x30
[   29.705345]  ? __lock_acquire+0x20/0x15d0
[   29.706215]  ? lockdep_hardirqs_on_prepare+0xda/0x190
[   29.707304]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   29.708452]  lock_acquire+0xbf/0x2e0
[   29.709229]  ? folio_unmap_invalidate+0x12f/0x220
[   29.710257]  ? __folio_end_writeback+0x15d/0x430
[   29.711260]  ? __folio_end_writeback+0x116/0x430
[   29.712261]  _raw_spin_lock+0x30/0x40
[   29.713064]  ? folio_unmap_invalidate+0x12f/0x220
[   29.714076]  folio_unmap_invalidate+0x12f/0x220
[   29.715058]  folio_end_writeback+0xdf/0x190
[   29.715967]  swap_writepage_bdev_sync+0x1e0/0x450
[   29.716994]  ? __pfx_submit_bio_wait_endio+0x10/0x10
[   29.718074]  swap_writepage+0x46b/0x6b0
[   29.718917]  pageout+0x14b/0x360
[   29.719628]  shrink_folio_list+0x67d/0xec0
[   29.720519]  ? mark_held_locks+0x48/0x80
[   29.721375]  evict_folios+0x2a7/0x9e0
[   29.722179]  try_to_shrink_lruvec+0x19a/0x270
[   29.723130]  lru_gen_shrink_lruvec+0x70/0xc0
[   29.724060]  ? __lock_acquire+0x558/0x15d0
[   29.724954]  shrink_lruvec+0x57/0x780
[   29.725754]  ? find_held_lock+0x2d/0xa0
[   29.726588]  ? rcu_read_unlock+0x17/0x60
[   29.727449]  shrink_node+0x2ad/0x930
[   29.728229]  do_try_to_free_pages+0xbd/0x4e0
[   29.729160]  try_to_free_mem_cgroup_pages+0x123/0x2c0
[   29.730252]  try_charge_memcg+0x222/0x660
[   29.731128]  charge_memcg+0x3c/0x80
[   29.731888]  __mem_cgroup_charge+0x30/0x70
[   29.732776]  shmem_alloc_and_add_folio+0x1a5/0x480
[   29.733818]  ? filemap_get_entry+0x155/0x390
[   29.734748]  shmem_get_folio_gfp+0x28c/0x6c0
[   29.735680]  shmem_write_begin+0x5a/0xc0
[   29.736535]  generic_perform_write+0x12a/0x2e0
[   29.737503]  shmem_file_write_iter+0x86/0x90
[   29.738428]  vfs_write+0x364/0x530
[   29.739180]  ksys_write+0x6c/0xe0
[   29.739906]  do_syscall_64+0x66/0x140
[   29.740713]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   29.741800] RIP: 0033:0x7fa05c439984
[   29.742584] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[   29.746542] RSP: 002b:00007ffece7720f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   29.748157] RAX: ffffffffffffffda RBX: 0000000000002800 RCX: 00007fa05c439984
[   29.749682] RDX: 0000000000002800 RSI: 000055f9cfa08000 RDI: 0000000000000004
[   29.751216] RBP: 00007ffece772140 R08: 0000000000002800 R09: 0000000000000007
[   29.752743] R10: 0000000000000180 R11: 0000000000000202 R12: 000055f9cfa08000
[   29.754262] R13: 0000000000000004 R14: 0000000000002800 R15: 00000000000009af
[   29.755797]  </TASK>
[   29.756285] Modules linked in: zram virtiofs

I'm testing with PROVE_LOCKING on. It seems folio_unmap_invalidate() is
called for a swapcache folio, which it doesn't handle well. The following
patch on top of mm-unstable seems to fix it:

diff --git a/mm/filemap.c b/mm/filemap.c
index 4fe551037bf7..98493443d120 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1605,8 +1605,9 @@ static void folio_end_reclaim_write(struct folio *folio)
         * invalidation in that case.
         */
        if (in_task() && folio_trylock(folio)) {
-               if (folio->mapping)
-                       folio_unmap_invalidate(folio->mapping, folio, 0);
+               struct address_space *mapping = folio_mapping(folio);
+               if (mapping)
+                       folio_unmap_invalidate(mapping, folio, 0);
                folio_unlock(folio);
        }
 }
diff --git a/mm/truncate.c b/mm/truncate.c
index e922ceb66c44..4f3e34c52d8b 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -565,23 +565,29 @@ int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio,
        if (!filemap_release_folio(folio, gfp))
                return -EBUSY;

-       spin_lock(&mapping->host->i_lock);
+       if (!folio_test_swapcache(folio)) {
+               spin_lock(&mapping->host->i_lock);
+               BUG_ON(folio_has_private(folio));
+       }
+
        xa_lock_irq(&mapping->i_pages);
        if (folio_test_dirty(folio))
                goto failed;

-       BUG_ON(folio_has_private(folio));
        __filemap_remove_folio(folio, NULL);
        xa_unlock_irq(&mapping->i_pages);
        if (mapping_shrinkable(mapping))
                inode_add_lru(mapping->host);
-       spin_unlock(&mapping->host->i_lock);
+
+       if (!folio_test_swapcache(folio))
+               spin_unlock(&mapping->host->i_lock);

        filemap_free_folio(mapping, folio);
        return 1;
 failed:
        xa_unlock_irq(&mapping->i_pages);
-       spin_unlock(&mapping->host->i_lock);
+       if (!folio_test_swapcache(folio))
+               spin_unlock(&mapping->host->i_lock);
        return -EBUSY;
 }
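
Why folio->mapping is not a safe thing to dereference here can be sketched
in userspace. This is a toy model, assuming (as I understand the kernel
convention) that an anonymous folio's mapping field holds a pointer with low
tag bits set rather than a plain struct address_space pointer, and that
folio_mapping() hands back the swap address space for swapcache folios. All
toy_* names and the TOY_* constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag bits kept in the low bits of the mapping pointer. */
#define TOY_MAPPING_ANON  0x1u
#define TOY_MAPPING_FLAGS 0x3u

struct toy_address_space {
	int id;
};

struct toy_folio {
	void *mapping;  /* address_space pointer, or a tagged anon pointer */
	bool swapcache; /* stands in for folio_test_swapcache() */
};

/* Single toy swap address space; the kernel has several. */
static struct toy_address_space toy_swap_space = { .id = 42 };

/* Tag a pointer the way an anonymous folio's mapping field is tagged. */
void *toy_tag_anon(void *anon_vma)
{
	return (void *)((uintptr_t)anon_vma | TOY_MAPPING_ANON);
}

/* Toy folio_mapping(): never returns a tagged pointer to the caller. */
struct toy_address_space *toy_folio_mapping(const struct toy_folio *folio)
{
	if (folio->swapcache)
		return &toy_swap_space;
	if ((uintptr_t)folio->mapping & TOY_MAPPING_FLAGS)
		return NULL; /* anon, not in swap cache: no address_space */
	return folio->mapping;
}
```

In this model a swapcache folio has a non-NULL mapping field, so a bare
"if (folio->mapping)" test passes, yet the value is not a usable
address_space pointer. Going through the helper returns either a real
address space or NULL.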



* Re: [PATCHv3 06/11] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  2025-02-01  8:01   ` Kairui Song
@ 2025-02-03  7:14     ` Yu Zhao
  2025-02-03  8:39     ` Kirill A. Shutemov
  1 sibling, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2025-02-03  7:14 UTC (permalink / raw)
  To: Kairui Song, Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Sat, Feb 1, 2025 at 1:02 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Thu, Jan 30, 2025 at 6:02 PM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > The recently introduced PG_dropbehind allows for freeing folios
> > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > to be involved to get the folio freed.
> >
> > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > pageout().
> >
> > It is safe to leave PG_dropbehind on the folio if, for some reason
> > (bug?), the folio is not in a writeback state after ->writepage().
> > In these cases, the kernel had to clear PG_reclaim as it shared a page
> > flag bit with PG_readahead.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > ---
> >  mm/vmscan.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bc1826020159..c97adb0fdaa4 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -692,19 +692,16 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> >                 if (shmem_mapping(mapping) && folio_test_large(folio))
> >                         wbc.list = folio_list;
> >
> > -               folio_set_reclaim(folio);
> > +               folio_set_dropbehind(folio);
> > +
> >                 res = mapping->a_ops->writepage(&folio->page, &wbc);
> >                 if (res < 0)
> >                         handle_write_error(mapping, folio, res);
> >                 if (res == AOP_WRITEPAGE_ACTIVATE) {
> > -                       folio_clear_reclaim(folio);
> > +                       folio_clear_dropbehind(folio);
> >                         return PAGE_ACTIVATE;
> >                 }
> >
> > -               if (!folio_test_writeback(folio)) {
> > -                       /* synchronous write or broken a_ops? */
> > -                       folio_clear_reclaim(folio);
> > -               }
> >                 trace_mm_vmscan_write_folio(folio);
> >                 node_stat_add_folio(folio, NR_VMSCAN_WRITE);
> >                 return PAGE_SUCCESS;
> > --
> > 2.47.2
> >
>
> Hi, I'm seeing following panic with SWAP after this commit:
>
> [   29.672319] Oops: general protection fault, probably for non-canonical address 0xffff88909a3be3: 0000 [#1] PREEMPT SMP NOPTI
> [   29.675503] CPU: 82 UID: 0 PID: 5145 Comm: tar Kdump: loaded Not tainted 6.13.0.ptch-g1fe9ea48ec98 #917
> [   29.677508] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
> [   29.678886] RIP: 0010:__lock_acquire+0x20/0x15d0
> [   29.679891] Code: 90 90 90 90 90 90 90 90 90 90 41 57 41 56 41 55 41 54 55 53 48 83 ec 30 8b 2d 10 ac f3 01 44 8b ac 24 88 00 00 00 85 ed 74 64 <48> 8b 07 49 89 ff 48 3d 20 1d bf 83 74 56 8b 1d 8c f5 b1 01 41 89
> [   29.683852] RSP: 0018:ffffc9000bea3148 EFLAGS: 00010002
> [   29.684980] RAX: ffff8890874b2940 RBX: 0000000000000200 RCX: 0000000000000000
> [   29.686510] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00ffff88909a3be3
> [   29.688031] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
> [   29.689561] R10: 0000000000000000 R11: 0000000000000020 R12: 00ffff88909a3be3
> [   29.691087] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [   29.692613] FS:  00007fa05c2824c0(0000) GS:ffff88a03fa80000(0000) knlGS:0000000000000000
> [   29.694339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   29.695581] CR2: 000055f9abb7fc7d CR3: 00000010932f2002 CR4: 0000000000770eb0
> [   29.697109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   29.698637] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   29.700161] PKRU: 55555554
> [   29.700759] Call Trace:
> [   29.701296]  <TASK>
> [   29.701770]  ? __die_body+0x1e/0x60
> [   29.702540]  ? die_addr+0x3c/0x60
> [   29.703267]  ? exc_general_protection+0x18f/0x3c0
> [   29.704290]  ? asm_exc_general_protection+0x26/0x30
> [   29.705345]  ? __lock_acquire+0x20/0x15d0
> [   29.706215]  ? lockdep_hardirqs_on_prepare+0xda/0x190
> [   29.707304]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [   29.708452]  lock_acquire+0xbf/0x2e0
> [   29.709229]  ? folio_unmap_invalidate+0x12f/0x220
> [   29.710257]  ? __folio_end_writeback+0x15d/0x430
> [   29.711260]  ? __folio_end_writeback+0x116/0x430
> [   29.712261]  _raw_spin_lock+0x30/0x40
> [   29.713064]  ? folio_unmap_invalidate+0x12f/0x220
> [   29.714076]  folio_unmap_invalidate+0x12f/0x220
> [   29.715058]  folio_end_writeback+0xdf/0x190
> [   29.715967]  swap_writepage_bdev_sync+0x1e0/0x450
> [   29.716994]  ? __pfx_submit_bio_wait_endio+0x10/0x10
> [   29.718074]  swap_writepage+0x46b/0x6b0
> [   29.718917]  pageout+0x14b/0x360
> [   29.719628]  shrink_folio_list+0x67d/0xec0
> [   29.720519]  ? mark_held_locks+0x48/0x80
> [   29.721375]  evict_folios+0x2a7/0x9e0
> [   29.722179]  try_to_shrink_lruvec+0x19a/0x270
> [   29.723130]  lru_gen_shrink_lruvec+0x70/0xc0
> [   29.724060]  ? __lock_acquire+0x558/0x15d0
> [   29.724954]  shrink_lruvec+0x57/0x780
> [   29.725754]  ? find_held_lock+0x2d/0xa0
> [   29.726588]  ? rcu_read_unlock+0x17/0x60
> [   29.727449]  shrink_node+0x2ad/0x930
> [   29.728229]  do_try_to_free_pages+0xbd/0x4e0
> [   29.729160]  try_to_free_mem_cgroup_pages+0x123/0x2c0
> [   29.730252]  try_charge_memcg+0x222/0x660
> [   29.731128]  charge_memcg+0x3c/0x80
> [   29.731888]  __mem_cgroup_charge+0x30/0x70
> [   29.732776]  shmem_alloc_and_add_folio+0x1a5/0x480
> [   29.733818]  ? filemap_get_entry+0x155/0x390
> [   29.734748]  shmem_get_folio_gfp+0x28c/0x6c0
> [   29.735680]  shmem_write_begin+0x5a/0xc0
> [   29.736535]  generic_perform_write+0x12a/0x2e0
> [   29.737503]  shmem_file_write_iter+0x86/0x90
> [   29.738428]  vfs_write+0x364/0x530
> [   29.739180]  ksys_write+0x6c/0xe0
> [   29.739906]  do_syscall_64+0x66/0x140
> [   29.740713]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   29.741800] RIP: 0033:0x7fa05c439984
> [   29.742584] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 74 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20
> 48 89
> [   29.746542] RSP: 002b:00007ffece7720f8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [   29.748157] RAX: ffffffffffffffda RBX: 0000000000002800 RCX: 00007fa05c439984
> [   29.749682] RDX: 0000000000002800 RSI: 000055f9cfa08000 RDI: 0000000000000004
> [   29.751216] RBP: 00007ffece772140 R08: 0000000000002800 R09: 0000000000000007
> [   29.752743] R10: 0000000000000180 R11: 0000000000000202 R12: 000055f9cfa08000
> [   29.754262] R13: 0000000000000004 R14: 0000000000002800 R15: 00000000000009af
> [   29.755797]  </TASK>
> [   29.756285] Modules linked in: zram virtiofs
>
> I'm testing with PROVE_LOCKING on. It seems folio_unmap_invalidate is
> called for swapcache folio and it doesn't work well, following PATCH
> on top of mm-unstable seems fix it well:

I think there is a bigger problem here. folio_end_reclaim_write()
currently calls folio_unmap_invalidate() to remove the mapping, and
that's very different from what __remove_mapping() does in the reclaim
path: not only does it break the swapcache case, it also loses the
shadow entry.



* Re: [PATCHv3 06/11] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  2025-02-01  8:01   ` Kairui Song
  2025-02-03  7:14     ` Yu Zhao
@ 2025-02-03  8:39     ` Kirill A. Shutemov
  2025-02-04  0:47       ` Andrew Morton
  2025-02-06  6:34       ` Sergey Senozhatsky
  1 sibling, 2 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2025-02-03  8:39 UTC (permalink / raw)
  To: Kairui Song
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Sat, Feb 01, 2025 at 04:01:43PM +0800, Kairui Song wrote:
> On Thu, Jan 30, 2025 at 6:02 PM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > The recently introduced PG_dropbehind allows for freeing folios
> > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > to be involved to get the folio freed.
> >
> > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > pageout().
> >
> > It is safe to leave PG_dropbehind on the folio if, for some reason
> > (bug?), the folio is not in a writeback state after ->writepage().
> > In these cases, the kernel had to clear PG_reclaim as it shared a page
> > flag bit with PG_readahead.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > ---
> >  mm/vmscan.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bc1826020159..c97adb0fdaa4 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -692,19 +692,16 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> >                 if (shmem_mapping(mapping) && folio_test_large(folio))
> >                         wbc.list = folio_list;
> >
> > -               folio_set_reclaim(folio);
> > +               folio_set_dropbehind(folio);
> > +
> >                 res = mapping->a_ops->writepage(&folio->page, &wbc);
> >                 if (res < 0)
> >                         handle_write_error(mapping, folio, res);
> >                 if (res == AOP_WRITEPAGE_ACTIVATE) {
> > -                       folio_clear_reclaim(folio);
> > +                       folio_clear_dropbehind(folio);
> >                         return PAGE_ACTIVATE;
> >                 }
> >
> > -               if (!folio_test_writeback(folio)) {
> > -                       /* synchronous write or broken a_ops? */
> > -                       folio_clear_reclaim(folio);
> > -               }
> >                 trace_mm_vmscan_write_folio(folio);
> >                 node_stat_add_folio(folio, NR_VMSCAN_WRITE);
> >                 return PAGE_SUCCESS;
> > --
> > 2.47.2
> >
> 
> Hi, I'm seeing following panic with SWAP after this commit:
> 
> [   29.672319] Oops: general protection fault, probably for
> non-canonical address 0xffff88909a3be3: 0000 [#1] PREEMPT SMP NOPTI
> [   29.675503] CPU: 82 UID: 0 PID: 5145 Comm: tar Kdump: loaded Not
> tainted 6.13.0.ptch-g1fe9ea48ec98 #917
> [   29.677508] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
> [   29.678886] RIP: 0010:__lock_acquire+0x20/0x15d0

Ouch.

I failed to trigger it in my setup. Could you share your reproducer?

> I'm testing with PROVE_LOCKING on. It seems folio_unmap_invalidate is
> called for swapcache folio and it doesn't work well, following PATCH
> on top of mm-unstable seems fix it well:

Right. I don't understand swapping well enough. I missed this.

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4fe551037bf7..98493443d120 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1605,8 +1605,9 @@ static void folio_end_reclaim_write(struct folio *folio)
>          * invalidation in that case.
>          */
>         if (in_task() && folio_trylock(folio)) {
> -               if (folio->mapping)
> -                       folio_unmap_invalidate(folio->mapping, folio, 0);
> +               struct address_space *mapping = folio_mapping(folio);
> +               if (mapping)
> +                       folio_unmap_invalidate(mapping, folio, 0);
>                 folio_unlock(folio);
>         }
>  }

Once you do this, folio_unmap_invalidate() will never succeed for
swapcache folios, as the folio->mapping != mapping check will always be
true and it will fail with -EBUSY.

I guess we need to do something similar to what __remove_mapping() does
for swapcache folios.
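
To make the -EBUSY point concrete, here is a minimal userspace sketch of
the check. It is a model under stated assumptions, not kernel code: the
structs and the model_*() helpers are hypothetical stand-ins, and only
the folio->mapping != mapping comparison is modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: stand-in types, not the kernel definitions. */
struct address_space {
	const char *name;
};

struct folio {
	struct address_space *mapping;    /* models folio->mapping */
	bool swapcache;                   /* models folio_test_swapcache() */
	struct address_space *swap_space; /* models the swap address space */
};

/* Models folio_mapping(): a swapcache folio reports the swap address space. */
static struct address_space *model_folio_mapping(const struct folio *folio)
{
	if (folio->swapcache)
		return folio->swap_space;
	return folio->mapping;
}

/* Models only the ownership check inside folio_unmap_invalidate(). */
static int model_unmap_invalidate(struct address_space *mapping,
				  const struct folio *folio)
{
	if (folio->mapping != mapping)
		return -EBUSY;
	return 0;
}

/* Models folio_end_reclaim_write() with the folio_mapping() change applied. */
static int model_end_reclaim_write(const struct folio *folio)
{
	struct address_space *mapping = model_folio_mapping(folio);

	if (!mapping)
		return 0;
	return model_unmap_invalidate(mapping, folio);
}
```

In this model a file folio, whose mapping field already points at its
address space, passes the check, while a swapcache folio always gets
-EBUSY because its mapping field never equals the swap address space.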

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCHv3 06/11] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  2025-02-03  8:39     ` Kirill A. Shutemov
@ 2025-02-04  0:47       ` Andrew Morton
  2025-02-06  6:34       ` Sergey Senozhatsky
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2025-02-04  0:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kairui Song, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Mon, 3 Feb 2025 10:39:58 +0200 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 4fe551037bf7..98493443d120 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1605,8 +1605,9 @@ static void folio_end_reclaim_write(struct folio *folio)
> >          * invalidation in that case.
> >          */
> >         if (in_task() && folio_trylock(folio)) {
> > -               if (folio->mapping)
> > -                       folio_unmap_invalidate(folio->mapping, folio, 0);
> > +               struct address_space *mapping = folio_mapping(folio);
> > +               if (mapping)
> > +                       folio_unmap_invalidate(mapping, folio, 0);
> >                 folio_unlock(folio);
> >         }
> >  }
> 
> Once you do this, folio_unmap_invalidate() will never succeed for
> swapcache as folio->mapping != mapping check will always be true and it
> will fail with -EBUSY.
> 
> I guess we need to do something similar to what __remove_mapping() does
> for swapcache folios.

Thanks, I'll drop the v3 series from mm.git.



* Re: [PATCHv3 06/11] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  2025-02-03  8:39     ` Kirill A. Shutemov
  2025-02-04  0:47       ` Andrew Morton
@ 2025-02-06  6:34       ` Sergey Senozhatsky
  1 sibling, 0 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2025-02-06  6:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kairui Song, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu,
	Mathieu Desnoyers, Miklos Szeredi, Nhat Pham, Oscar Salvador,
	Ran Xiaokai, Rodrigo Vivi, Simona Vetter, Steven Rostedt,
	Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed, Yu Zhao, intel-gfx,
	dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On (25/02/03 10:39), Kirill A. Shutemov wrote:
> > Hi, I'm seeing following panic with SWAP after this commit:
> >
> > [   29.672319] Oops: general protection fault, probably for
> > non-canonical address 0xffff88909a3be3: 0000 [#1] PREEMPT SMP NOPTI
> > [   29.675503] CPU: 82 UID: 0 PID: 5145 Comm: tar Kdump: loaded Not
> > tainted 6.13.0.ptch-g1fe9ea48ec98 #917
> > [   29.677508] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
> > [   29.678886] RIP: 0010:__lock_acquire+0x20/0x15d0
>
> Ouch.
>
> I failed to trigger it my setup. Could you share your reproducer?

I'm seeing this as well (backtraces below).

My repro is:

- 4GB VM with 2 zram devices
  - one is set up as swap
  - the other one has an ext4 fs on it
    - I dd large files to it


---

xa_lock_irq(&mapping->i_pages):

[   94.609589][  T157] Oops: general protection fault, probably for non-canonical address 0xe01ffbf11020301a: 0000 [#1] PREEMPT SMP KASAN PTI
[   94.611881][  T157] KASAN: maybe wild-memory-access in range [0x00ffff88810180d0-0x00ffff88810180d7]
[   94.613567][  T157] CPU: 1 UID: 0 PID: 157 Comm: kswapd0 Not tainted 6.13.0+ #927
[   94.614947][  T157] RIP: 0010:__lock_acquire+0x6a/0x1ef0
[   94.615942][  T157] Code: 08 84 d2 0f 85 ed 13 00 00 44 8b 05 24 30 d5 02 45 85 c0 0f 84 bc 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 eb 18 00 00 49 8b 04 24 48 3d a0 8b ac 84 0f 84
[   94.619668][  T157] RSP: 0018:ffff88810510eec0 EFLAGS: 00010002
[   94.620835][  T157] RAX: dffffc0000000000 RBX: 1ffff11020a21df5 RCX: 1ffffffff084c092
[   94.622329][  T157] RDX: 001ffff11020301a RSI: 0000000000000000 RDI: 00ffff88810180d1
[   94.623779][  T157] RBP: 00ffff88810180d1 R08: 0000000000000001 R09: 0000000000000000
[   94.625213][  T157] R10: ffffffff8425d0d7 R11: 0000000000000000 R12: 00ffff88810180d1
[   94.626656][  T157] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[   94.628086][  T157] FS:  0000000000000000(0000) GS:ffff88815aa80000(0000) knlGS:0000000000000000
[   94.629700][  T157] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   94.630894][  T157] CR2: 00007f757719c2b0 CR3: 0000000003c82005 CR4: 0000000000770ef0
[   94.632333][  T157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   94.633796][  T157] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   94.635265][  T157] PKRU: 55555554
[   94.635909][  T157] Call Trace:
[   94.636512][  T157]  <TASK>
[   94.637052][  T157]  ? show_trace_log_lvl+0x1a7/0x2e0
[   94.638005][  T157]  ? show_trace_log_lvl+0x1a7/0x2e0
[   94.638960][  T157]  ? lock_acquire.part.0+0xfa/0x310
[   94.639909][  T157]  ? __die_body.cold+0x8/0x12
[   94.640765][  T157]  ? die_addr+0x42/0x70
[   94.641530][  T157]  ? exc_general_protection+0x12e/0x210
[   94.642558][  T157]  ? asm_exc_general_protection+0x22/0x30
[   94.643610][  T157]  ? __lock_acquire+0x6a/0x1ef0
[   94.644506][  T157]  ? _raw_spin_unlock_irq+0x24/0x40
[   94.645468][  T157]  ? __wait_for_common+0x2f2/0x610
[   94.646412][  T157]  ? pci_mmcfg_reserved+0x120/0x120
[   94.647364][  T157]  ? submit_bio_noacct_nocheck+0x32e/0x3e0
[   94.648448][  T157]  ? lock_is_held_type+0x81/0xe0
[   94.649360][  T157]  lock_acquire.part.0+0xfa/0x310
[   94.650288][  T157]  ? folio_unmap_invalidate+0x286/0x550
[   94.651324][  T157]  ? __lock_acquire+0x1ef0/0x1ef0
[   94.652250][  T157]  ? submit_bio_wait+0x17c/0x200
[   94.653166][  T157]  ? submit_bio_wait_endio+0x40/0x40
[   94.654140][  T157]  ? lock_acquire+0x18a/0x1f0
[   94.655008][  T157]  _raw_spin_lock+0x2c/0x40
[   94.655853][  T157]  ? folio_unmap_invalidate+0x286/0x550
[   94.656879][  T157]  folio_unmap_invalidate+0x286/0x550
[   94.657866][  T157]  folio_end_writeback+0x146/0x190
[   94.658815][  T157]  swap_writepage_bdev_sync+0x312/0x410
[   94.659840][  T157]  ? swap_read_folio_bdev_sync+0x3c0/0x3c0
[   94.660917][  T157]  ? do_raw_spin_lock+0x12a/0x260
[   94.661845][  T157]  ? __rwlock_init+0x150/0x150
[   94.662726][  T157]  ? bio_kmalloc+0x20/0x20
[   94.663548][  T157]  ? swapcache_clear+0xd0/0xd0
[   94.664431][  T157]  swap_writepage+0x2a5/0x720
[   94.665298][  T157]  pageout+0x304/0x6a0
[   94.666052][  T157]  ? get_pte_pfn.isra.0+0x4d0/0x4d0
[   94.667025][  T157]  ? find_held_lock+0x2d/0x110
[   94.667912][  T157]  ? enable_swap_slots_cache+0x90/0x90
[   94.668925][  T157]  ? arch_tlbbatch_flush+0x1f6/0x370
[   94.669903][  T157]  shrink_folio_list+0x19b5/0x2600
[   94.670856][  T157]  ? pageout+0x6a0/0x6a0
[   94.671649][  T157]  ? isolate_folios+0x156/0x320
[   94.672544][  T157]  ? find_held_lock+0x2d/0x110
[   94.673428][  T157]  ? mark_lock+0xcc/0x12c0
[   94.674258][  T157]  ? mark_lock_irq+0x1cd0/0x1cd0
[   94.675174][  T157]  ? reacquire_held_locks+0x4d0/0x4d0
[   94.676166][  T157]  ? mark_held_locks+0x94/0xe0
[   94.677045][  T157]  evict_folios+0x4bb/0x1580
[   94.677890][  T157]  ? isolate_folios+0x320/0x320
[   94.678787][  T157]  ? __lock_acquire+0xc4c/0x1ef0
[   94.679695][  T157]  ? lock_is_held_type+0x81/0xe0
[   94.680607][  T157]  try_to_shrink_lruvec+0x41e/0x9e0
[   94.681564][  T157]  ? __lock_acquire+0xc4c/0x1ef0
[   94.682482][  T157]  ? evict_folios+0x1580/0x1580
[   94.683390][  T157]  ? lock_release+0x105/0x260
[   94.684255][  T157]  lru_gen_shrink_node+0x25d/0x660
[   94.685202][  T157]  ? balance_pgdat+0x5b5/0xf00
[   94.686083][  T157]  ? try_to_shrink_lruvec+0x9e0/0x9e0
[   94.687076][  T157]  ? pgdat_balanced+0xb8/0x110
[   94.687957][  T157]  balance_pgdat+0x532/0xf00
[   94.688803][  T157]  ? shrink_node.part.0+0xc30/0xc30
[   94.689758][  T157]  ? io_schedule_timeout+0x110/0x110
[   94.690741][  T157]  ? reacquire_held_locks+0x4d0/0x4d0
[   94.691723][  T157]  ? __lock_acquire+0x1ef0/0x1ef0
[   94.692643][  T157]  ? zone_watermark_ok_safe+0x32/0x290
[   94.693650][  T157]  ? inactive_is_low.isra.0+0xe0/0xe0
[   94.694639][  T157]  ? do_raw_spin_lock+0x12a/0x260
[   94.695567][  T157]  kswapd+0x2ef/0x4e0
[   94.696297][  T157]  ? balance_pgdat+0xf00/0xf00
[   94.697176][  T157]  ? __kthread_parkme+0xb1/0x1c0
[   94.698087][  T157]  ? balance_pgdat+0xf00/0xf00
[   94.698971][  T157]  kthread+0x38b/0x700
[   94.699721][  T157]  ? kthread_is_per_cpu+0xb0/0xb0
[   94.700648][  T157]  ? lock_acquire+0x18a/0x1f0
[   94.701516][  T157]  ? kthread_is_per_cpu+0xb0/0xb0
[   94.702438][  T157]  ret_from_fork+0x2d/0x70
[   94.703267][  T157]  ? kthread_is_per_cpu+0xb0/0xb0
[   94.704193][  T157]  ret_from_fork_asm+0x11/0x20
[   94.705074][  T157]  </TASK>


Also a UAF in kcompactd:

[   95.249096][  T146] ==================================================================
[   95.254091][  T146] BUG: KASAN: slab-use-after-free in kcompactd+0x9cd/0xa60
[   95.257959][  T146] Read of size 4 at addr ffff888105100018 by task kcompactd0/146
[   95.262100][  T146] 
[   95.263347][  T146] CPU: 11 UID: 0 PID: 146 Comm: kcompactd0 Tainted: G      D W          6.13.0+ #927
[   95.263363][  T146] Tainted: [D]=DIE, [W]=WARN
[   95.263367][  T146] Call Trace:
[   95.263379][  T146]  <TASK>
[   95.263386][  T146]  dump_stack_lvl+0x57/0x80
[   95.263403][  T146]  print_address_description.constprop.0+0x88/0x330
[   95.263416][  T146]  ? kcompactd+0x9cd/0xa60
[   95.263425][  T146]  print_report+0xe2/0x1cc
[   95.263433][  T146]  ? __virt_addr_valid+0x1d1/0x3b0
[   95.263442][  T146]  ? kcompactd+0x9cd/0xa60
[   95.263449][  T146]  ? kcompactd+0x9cd/0xa60
[   95.263456][  T146]  kasan_report+0xb9/0x180
[   95.263466][  T146]  ? kcompactd+0x9cd/0xa60
[   95.263476][  T146]  kcompactd+0x9cd/0xa60
[   95.263487][  T146]  ? kcompactd_do_work+0x710/0x710
[   95.263495][  T146]  ? prepare_to_swait_exclusive+0x260/0x260
[   95.263506][  T146]  ? __kthread_parkme+0xb1/0x1c0
[   95.263520][  T146]  ? kcompactd_do_work+0x710/0x710
[   95.263527][  T146]  kthread+0x38b/0x700
[   95.263535][  T146]  ? kthread_is_per_cpu+0xb0/0xb0
[   95.263542][  T146]  ? lock_acquire+0x18a/0x1f0
[   95.263552][  T146]  ? kthread_is_per_cpu+0xb0/0xb0
[   95.263559][  T146]  ret_from_fork+0x2d/0x70
[   95.263569][  T146]  ? kthread_is_per_cpu+0xb0/0xb0
[   95.263576][  T146]  ret_from_fork_asm+0x11/0x20
[   95.263589][  T146]  </TASK>
[   95.263592][  T146] 
[   95.293474][  T146] Allocated by task 2:
[   95.294209][  T146]  kasan_save_stack+0x1e/0x40
[   95.295111][  T146]  kasan_save_track+0x10/0x30
[   95.295978][  T146]  __kasan_slab_alloc+0x62/0x70
[   95.296860][  T146]  kmem_cache_alloc_node_noprof+0xdb/0x2a0
[   95.297915][  T146]  dup_task_struct+0x32/0x550
[   95.298797][  T146]  copy_process+0x309/0x45d0
[   95.299656][  T146]  kernel_clone+0xb7/0x600
[   95.300451][  T146]  kernel_thread+0xb0/0xe0
[   95.301253][  T146]  kthreadd+0x3b5/0x620
[   95.302019][  T146]  ret_from_fork+0x2d/0x70
[   95.302865][  T146]  ret_from_fork_asm+0x11/0x20
[   95.303724][  T146] 
[   95.304146][  T146] Freed by task 0:
[   95.304836][  T146]  kasan_save_stack+0x1e/0x40
[   95.305708][  T146]  kasan_save_track+0x10/0x30
[   95.306569][  T146]  kasan_save_free_info+0x37/0x50
[   95.307515][  T146]  __kasan_slab_free+0x33/0x40
[   95.308402][  T146]  kmem_cache_free+0xff/0x480
[   95.309256][  T146]  delayed_put_task_struct+0x15a/0x1d0
[   95.310258][  T146]  rcu_do_batch+0x2ee/0xb70
[   95.311113][  T146]  rcu_core+0x4a6/0xa10
[   95.311868][  T146]  handle_softirqs+0x191/0x650
[   95.312747][  T146]  __irq_exit_rcu+0xaf/0xe0
[   95.313643][  T146]  irq_exit_rcu+0xa/0x20
[   95.314536][  T146]  sysvec_apic_timer_interrupt+0x65/0x80
[   95.315616][  T146]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[   95.316702][  T146] 
[   95.317127][  T146] Last potentially related work creation:
[   95.318155][  T146]  kasan_save_stack+0x1e/0x40
[   95.319006][  T146]  kasan_record_aux_stack+0x97/0xa0
[   95.319947][  T146]  __call_rcu_common.constprop.0+0x70/0x7b0
[   95.321014][  T146]  __schedule+0x75d/0x1720
[   95.321817][  T146]  schedule_idle+0x55/0x80
[   95.322624][  T146]  cpu_startup_entry+0x50/0x60
[   95.323490][  T146]  start_secondary+0x1b6/0x210
[   95.324354][  T146]  common_startup_64+0x12c/0x138
[   95.325248][  T146] 
[   95.325669][  T146] The buggy address belongs to the object at ffff888105100000
[   95.325669][  T146]  which belongs to the cache task_struct of size 8200
[   95.328215][  T146] The buggy address is located 24 bytes inside of
[   95.328215][  T146]  freed 8200-byte region [ffff888105100000, ffff888105102008)
[   95.330692][  T146] 
[   95.331116][  T146] The buggy address belongs to the physical page:
[   95.332275][  T146] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x105100
[   95.333862][  T146] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   95.335399][  T146] flags: 0x8000000000000040(head|zone=2)
[   95.336425][  T146] page_type: f5(slab)
[   95.337155][  T146] raw: 8000000000000040 ffff888100a80c80 dead000000000122 0000000000000000
[   95.338716][  T146] raw: 0000000000000000 0000000000030003 00000000f5000000 0000000000000000
[   95.340273][  T146] head: 8000000000000040 ffff888100a80c80 dead000000000122 0000000000000000
[   95.341844][  T146] head: 0000000000000000 0000000000030003 00000000f5000000 0000000000000000
[   95.343418][  T146] head: 8000000000000003 ffffea0004144001 ffffffffffffffff 0000000000000000
[   95.344977][  T146] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[   95.346540][  T146] page dumped because: kasan: bad access detected
[   95.347701][  T146] 
[   95.348123][  T146] Memory state around the buggy address:
[   95.349139][  T146]  ffff8881050fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   95.350598][  T146]  ffff8881050fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   95.352054][  T146] >ffff888105100000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   95.353510][  T146]                             ^
[   95.354389][  T146]  ffff888105100080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   95.355856][  T146]  ffff888105100100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   95.357315][  T146] ==================================================================



end of thread, other threads:[~2025-02-06  6:34 UTC | newest]

Thread overview: 14+ messages
     [not found] <20250130100050.1868208-1-kirill.shutemov@linux.intel.com>
     [not found] ` <20250130100050.1868208-5-kirill.shutemov@linux.intel.com>
2025-01-30 17:39   ` [PATCHv3 04/11] mm/zswap: Use PG_dropbehind instead of PG_reclaim Nhat Pham
     [not found] ` <20250130100050.1868208-6-kirill.shutemov@linux.intel.com>
2025-01-30 20:34   ` [PATCHv3 05/11] mm/truncate: Use folio_set_dropbehind() instead of deactivate_file_folio() Yu Zhao
     [not found] ` <20250130100050.1868208-2-kirill.shutemov@linux.intel.com>
2025-01-30 20:36   ` [PATCHv3 01/11] mm/migrate: Transfer PG_dropbehind to the new folio Yu Zhao
     [not found] ` <20250130100050.1868208-12-kirill.shutemov@linux.intel.com>
2025-01-30 20:38   ` [PATCHv3 11/11] mm: Rename PG_dropbehind to PG_reclaim Yu Zhao
     [not found] ` <20250130100050.1868208-3-kirill.shutemov@linux.intel.com>
2025-01-31 13:37   ` [PATCHv3 02/11] drm/i915/gem: Convert __shmem_writeback() to folios Andi Shyti
2025-01-31 14:21   ` Matthew Wilcox
2025-01-31 20:27   ` Shakeel Butt
     [not found] ` <20250130100050.1868208-4-kirill.shutemov@linux.intel.com>
2025-01-31 13:38   ` [PATCHv3 03/11] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim Andi Shyti
     [not found] ` <20250130100050.1868208-7-kirill.shutemov@linux.intel.com>
2025-01-31 21:32   ` [PATCHv3 06/11] mm/vmscan: " Shakeel Butt
2025-02-01  8:01   ` Kairui Song
2025-02-03  7:14     ` Yu Zhao
2025-02-03  8:39     ` Kirill A. Shutemov
2025-02-04  0:47       ` Andrew Morton
2025-02-06  6:34       ` Sergey Senozhatsky
