* [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd
@ 2025-02-25 0:33 Mike Snitzer
2025-02-25 2:20 ` [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback Mike Snitzer
2025-02-26 14:58 ` [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Christoph Hellwig
0 siblings, 2 replies; 11+ messages in thread
From: Mike Snitzer @ 2025-02-25 0:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
it so nfs_release_folio() can skip calling nfs_wb_folio() from
kcompactd.
Otherwise NFS can deadlock waiting for kcompactd enduced writeback
which recurses back to NFS (which triggers writeback to NFSD via
NFS loopback mount on the same host, NFSD blocks waiting for XFS's
call to __filemap_get_folio):
6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.
{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}
Fixes: 96780ca55e3cb ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/file.c | 3 ++-
include/linux/compaction.h | 5 +++++
include/linux/sched.h | 2 +-
mm/compaction.c | 3 +++
4 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1bb646752e466..033feeab8c346 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/gfp.h>
#include <linux/swap.h>
+#include <linux/compaction.h>
#include <linux/uaccess.h>
#include <linux/filelock.h>
@@ -457,7 +458,7 @@ static bool nfs_release_folio(struct folio *folio, gfp_t gfp)
/* If the private flag is set, then the folio is not freeable */
if (folio_test_private(folio)) {
if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
- current_is_kswapd())
+ current_is_kswapd() || current_is_kcompactd())
return false;
if (nfs_wb_folio(folio->mapping->host, folio) < 0)
return false;
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e947764960496..7bf0c521db634 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -80,6 +80,11 @@ static inline unsigned long compact_gap(unsigned int order)
return 2UL << order;
}
+static inline int current_is_kcompactd(void)
+{
+ return current->flags & PF_KCOMPACTD;
+}
+
#ifdef CONFIG_COMPACTION
extern unsigned int extfrag_for_order(struct zone *zone, unsigned int order);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8982820dae213..0d1d70aded38f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1682,7 +1682,7 @@ extern struct pid *cad_pid;
#define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */
#define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */
#define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */
-#define PF__HOLE__00010000 0x00010000
+#define PF_KCOMPACTD 0x00010000 /* I am kcompactd */
#define PF_KSWAPD 0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */
diff --git a/mm/compaction.c b/mm/compaction.c
index 384e4672998e5..2dd03105e6898 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3164,6 +3164,7 @@ static int kcompactd(void *p)
if (!cpumask_empty(cpumask))
set_cpus_allowed_ptr(tsk, cpumask);
+ tsk->flags | PF_KCOMPACTD;
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3220,6 +3221,8 @@ static int kcompactd(void *p)
pgdat->proactive_compact_trigger = false;
}
+ tsk->flags &= ~PF_KCOMPACTD;
+
return 0;
}
--
2.44.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
2025-02-25 0:33 [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Mike Snitzer
@ 2025-02-25 2:20 ` Mike Snitzer
2025-02-25 6:53 ` Andrew Morton
2025-02-26 14:58 ` [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Christoph Hellwig
1 sibling, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2025-02-25 2:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
it so nfs_release_folio() can skip calling nfs_wb_folio() from
kcompactd.
Otherwise NFS can deadlock waiting for kcompactd enduced writeback
which recurses back to NFS (which triggers writeback to NFSD via
NFS loopback mount on the same host, NFSD blocks waiting for XFS's
call to __filemap_get_folio):
6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.
{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}
Fixes: 96780ca55e3cb ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/file.c | 3 ++-
include/linux/compaction.h | 5 +++++
include/linux/sched.h | 2 +-
mm/compaction.c | 3 +++
4 files changed, 11 insertions(+), 2 deletions(-)
v2: use |= to properly set PF_KCOMPACTD in kcompactd(). Also adjust
subject to reflect this patch fixes a deadlock.
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1bb646752e466..033feeab8c346 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/gfp.h>
#include <linux/swap.h>
+#include <linux/compaction.h>
#include <linux/uaccess.h>
#include <linux/filelock.h>
@@ -457,7 +458,7 @@ static bool nfs_release_folio(struct folio *folio, gfp_t gfp)
/* If the private flag is set, then the folio is not freeable */
if (folio_test_private(folio)) {
if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
- current_is_kswapd())
+ current_is_kswapd() || current_is_kcompactd())
return false;
if (nfs_wb_folio(folio->mapping->host, folio) < 0)
return false;
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e947764960496..7bf0c521db634 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -80,6 +80,11 @@ static inline unsigned long compact_gap(unsigned int order)
return 2UL << order;
}
+static inline int current_is_kcompactd(void)
+{
+ return current->flags & PF_KCOMPACTD;
+}
+
#ifdef CONFIG_COMPACTION
extern unsigned int extfrag_for_order(struct zone *zone, unsigned int order);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8982820dae213..0d1d70aded38f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1682,7 +1682,7 @@ extern struct pid *cad_pid;
#define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */
#define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */
#define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */
-#define PF__HOLE__00010000 0x00010000
+#define PF_KCOMPACTD 0x00010000 /* I am kcompactd */
#define PF_KSWAPD 0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */
diff --git a/mm/compaction.c b/mm/compaction.c
index 384e4672998e5..836c36f9b2f00 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3164,6 +3164,7 @@ static int kcompactd(void *p)
if (!cpumask_empty(cpumask))
set_cpus_allowed_ptr(tsk, cpumask);
+ tsk->flags |= PF_KCOMPACTD;
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3220,6 +3221,8 @@ static int kcompactd(void *p)
pgdat->proactive_compact_trigger = false;
}
+ tsk->flags &= ~PF_KCOMPACTD;
+
return 0;
}
--
2.44.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
2025-02-25 2:20 ` [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback Mike Snitzer
@ 2025-02-25 6:53 ` Andrew Morton
2025-02-25 17:02 ` Mike Snitzer
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2025-02-25 6:53 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
On Mon, 24 Feb 2025 21:20:02 -0500 Mike Snitzer <snitzer@kernel.org> wrote:
> Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
> it so nfs_release_folio() can skip calling nfs_wb_folio() from
> kcompactd.
>
--- a/mm/compaction.c~a
+++ a/mm/compaction.c
@@ -3182,7 +3182,7 @@ static int kcompactd(void *p)
long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
long timeout = default_timeout;
- tsk->flags |= PF_KCOMPACTD;
+ current->flags |= PF_KCOMPACTD;
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3239,7 +3239,7 @@ static int kcompactd(void *p)
pgdat->proactive_compact_trigger = false;
}
- tsk->flags &= ~PF_KCOMPACTD;
+ current->flags &= ~PF_KCOMPACTD;
return 0;
}
I am of course concerned about how well tested this was!
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
2025-02-25 6:53 ` Andrew Morton
@ 2025-02-25 17:02 ` Mike Snitzer
2025-02-25 20:19 ` [PATCH v3 " Mike Snitzer
2025-02-25 20:28 ` Mike Snitzer
0 siblings, 2 replies; 11+ messages in thread
From: Mike Snitzer @ 2025-02-25 17:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
On Mon, Feb 24, 2025 at 10:53:06PM -0800, Andrew Morton wrote:
> On Mon, 24 Feb 2025 21:20:02 -0500 Mike Snitzer <snitzer@kernel.org> wrote:
>
> > Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
> > it so nfs_release_folio() can skip calling nfs_wb_folio() from
> > kcompactd.
> >
>
> --- a/mm/compaction.c~a
> +++ a/mm/compaction.c
> @@ -3182,7 +3182,7 @@ static int kcompactd(void *p)
> long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
> long timeout = default_timeout;
>
> - tsk->flags |= PF_KCOMPACTD;
> + current->flags |= PF_KCOMPACTD;
> set_freezable();
>
> pgdat->kcompactd_max_order = 0;
> @@ -3239,7 +3239,7 @@ static int kcompactd(void *p)
> pgdat->proactive_compact_trigger = false;
> }
>
> - tsk->flags &= ~PF_KCOMPACTD;
> + current->flags &= ~PF_KCOMPACTD;
>
> return 0;
> }
>
> I am of course concerned about how well tested this was!
Understandable, the patch was created and tested against v6.12.16.
Sincere apologies, I didn't rebase before submitting.
I'll rebase, retest and then send v3 today.
Thanks,
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
2025-02-25 17:02 ` Mike Snitzer
@ 2025-02-25 20:19 ` Mike Snitzer
2025-02-25 20:28 ` Mike Snitzer
1 sibling, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2025-02-25 20:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
it so nfs_release_folio() can skip calling nfs_wb_folio() from
kcompactd.
Otherwise NFS can deadlock waiting for kcompactd enduced writeback
which recurses back to NFS (which triggers writeback to NFSD via
NFS loopback mount on the same host, NFSD blocks waiting for XFS's
call to __filemap_get_folio):
6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.
{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}
Fixes: 96780ca55e3cb ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/file.c | 3 ++-
include/linux/compaction.h | 5 +++++
include/linux/sched.h | 2 +-
mm/compaction.c | 3 +++
4 files changed, 11 insertions(+), 2 deletions(-)
v3: rebase to latest Linus tree (commit cc8a0934d099).
v2: use |= to properly set PF_KCOMPACTD in kcompactd(). Also adjust
subject to reflect this patch fixes a deadlock.
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1bb646752e46..033feeab8c34 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/gfp.h>
#include <linux/swap.h>
+#include <linux/compaction.h>
#include <linux/uaccess.h>
#include <linux/filelock.h>
@@ -457,7 +458,7 @@ static bool nfs_release_folio(struct folio *folio, gfp_t gfp)
/* If the private flag is set, then the folio is not freeable */
if (folio_test_private(folio)) {
if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
- current_is_kswapd())
+ current_is_kswapd() || current_is_kcompactd())
return false;
if (nfs_wb_folio(folio->mapping->host, folio) < 0)
return false;
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e94776496049..7bf0c521db63 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -80,6 +80,11 @@ static inline unsigned long compact_gap(unsigned int order)
return 2UL << order;
}
+static inline int current_is_kcompactd(void)
+{
+ return current->flags & PF_KCOMPACTD;
+}
+
#ifdef CONFIG_COMPACTION
extern unsigned int extfrag_for_order(struct zone *zone, unsigned int order);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d..9c15365a30c0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1701,7 +1701,7 @@ extern struct pid *cad_pid;
#define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */
#define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */
#define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */
-#define PF__HOLE__00010000 0x00010000
+#define PF_KCOMPACTD 0x00010000 /* I am kcompactd */
#define PF_KSWAPD 0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */
diff --git a/mm/compaction.c b/mm/compaction.c
index 12ed8425fa17..a3203d97123e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3181,6 +3181,7 @@ static int kcompactd(void *p)
long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
long timeout = default_timeout;
+ current->flags |= PF_KCOMPACTD;
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3237,6 +3238,8 @@ static int kcompactd(void *p)
pgdat->proactive_compact_trigger = false;
}
+ current->flags &= ~PF_KCOMPACTD;
+
return 0;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
2025-02-25 17:02 ` Mike Snitzer
2025-02-25 20:19 ` [PATCH v3 " Mike Snitzer
@ 2025-02-25 20:28 ` Mike Snitzer
1 sibling, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2025-02-25 20:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
it so nfs_release_folio() can skip calling nfs_wb_folio() from
kcompactd.
Otherwise NFS can deadlock waiting for kcompactd enduced writeback
which recurses back to NFS (which triggers writeback to NFSD via
NFS loopback mount on the same host, NFSD blocks waiting for XFS's
call to __filemap_get_folio):
6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.
{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}
Fixes: 96780ca55e3cb ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/file.c | 3 ++-
include/linux/compaction.h | 5 +++++
include/linux/sched.h | 2 +-
mm/compaction.c | 3 +++
4 files changed, 11 insertions(+), 2 deletions(-)
v3: rebase to latest Linus tree (commit cc8a0934d099).
v2: use |= to properly set PF_KCOMPACTD in kcompactd(). Also adjust
subject to reflect this patch fixes a deadlock.
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1bb646752e46..033feeab8c34 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/gfp.h>
#include <linux/swap.h>
+#include <linux/compaction.h>
#include <linux/uaccess.h>
#include <linux/filelock.h>
@@ -457,7 +458,7 @@ static bool nfs_release_folio(struct folio *folio, gfp_t gfp)
/* If the private flag is set, then the folio is not freeable */
if (folio_test_private(folio)) {
if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
- current_is_kswapd())
+ current_is_kswapd() || current_is_kcompactd())
return false;
if (nfs_wb_folio(folio->mapping->host, folio) < 0)
return false;
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e94776496049..7bf0c521db63 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -80,6 +80,11 @@ static inline unsigned long compact_gap(unsigned int order)
return 2UL << order;
}
+static inline int current_is_kcompactd(void)
+{
+ return current->flags & PF_KCOMPACTD;
+}
+
#ifdef CONFIG_COMPACTION
extern unsigned int extfrag_for_order(struct zone *zone, unsigned int order);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d..9c15365a30c0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1701,7 +1701,7 @@ extern struct pid *cad_pid;
#define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */
#define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */
#define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */
-#define PF__HOLE__00010000 0x00010000
+#define PF_KCOMPACTD 0x00010000 /* I am kcompactd */
#define PF_KSWAPD 0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */
diff --git a/mm/compaction.c b/mm/compaction.c
index 12ed8425fa17..a3203d97123e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3181,6 +3181,7 @@ static int kcompactd(void *p)
long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
long timeout = default_timeout;
+ current->flags |= PF_KCOMPACTD;
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3237,6 +3238,8 @@ static int kcompactd(void *p)
pgdat->proactive_compact_trigger = false;
}
+ current->flags &= ~PF_KCOMPACTD;
+
return 0;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd
2025-02-25 0:33 [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Mike Snitzer
2025-02-25 2:20 ` [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback Mike Snitzer
@ 2025-02-26 14:58 ` Christoph Hellwig
2025-02-26 15:44 ` Mike Snitzer
` (2 more replies)
1 sibling, 3 replies; 11+ messages in thread
From: Christoph Hellwig @ 2025-02-26 14:58 UTC (permalink / raw)
To: Mike Snitzer
Cc: Andrew Morton, Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
On Mon, Feb 24, 2025 at 07:33:01PM -0500, Mike Snitzer wrote:
> Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
> it so nfs_release_folio() can skip calling nfs_wb_folio() from
> kcompactd.
>
> Otherwise NFS can deadlock waiting for kcompactd enduced writeback
> which recurses back to NFS (which triggers writeback to NFSD via
> NFS loopback mount on the same host, NFSD blocks waiting for XFS's
> call to __filemap_get_folio):
Having a flag for a specific kernel thread feels wrong. I'm not an
expert in this area, but as fast as I can tell the problem is that
kcompactd should be calling into ->release_folio without __GFP_IO
set.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd
2025-02-26 14:58 ` [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Christoph Hellwig
@ 2025-02-26 15:44 ` Mike Snitzer
2025-02-26 15:55 ` Mike Snitzer
2025-02-26 17:22 ` Trond Myklebust
2025-02-26 22:02 ` Andrew Morton
2 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2025-02-26 15:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
On Wed, Feb 26, 2025 at 06:58:11AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 24, 2025 at 07:33:01PM -0500, Mike Snitzer wrote:
> > Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
> > it so nfs_release_folio() can skip calling nfs_wb_folio() from
> > kcompactd.
> >
> > Otherwise NFS can deadlock waiting for kcompactd enduced writeback
> > which recurses back to NFS (which triggers writeback to NFSD via
> > NFS loopback mount on the same host, NFSD blocks waiting for XFS's
> > call to __filemap_get_folio):
>
> Having a flag for a specific kernel thread feels wrong. I'm not an
> expert in this area, but as fast as I can tell the problem is that
> kcompactd should be calling into ->release_folio without __GFP_IO
> set.
We can easily remove PF_KCOMPACTD and current_is_kcompactd() if/when
more analysis and a cleaner fix emerges (from you, mm experts, etc).
I'm not saying you're wrong about GFP flags needing to be clamped by
kcompactd's call to ->release_folio, but it seems more finishing work
needed.
As you can tell from my patch, we already have kswapd specialization
with PF_KSWAPD and current_is_kswapd(). A consumer of large folios
knowing that it being called as kcompactd is useful, hence my fix for
deadlock seen with NFS loopback mounts on memory constrained systems.
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd
2025-02-26 15:44 ` Mike Snitzer
@ 2025-02-26 15:55 ` Mike Snitzer
0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2025-02-26 15:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
On Wed, Feb 26, 2025 at 10:44:15AM -0500, Mike Snitzer wrote:
> On Wed, Feb 26, 2025 at 06:58:11AM -0800, Christoph Hellwig wrote:
> > On Mon, Feb 24, 2025 at 07:33:01PM -0500, Mike Snitzer wrote:
> > > Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
> > > it so nfs_release_folio() can skip calling nfs_wb_folio() from
> > > kcompactd.
> > >
> > > Otherwise NFS can deadlock waiting for kcompactd enduced writeback
> > > which recurses back to NFS (which triggers writeback to NFSD via
> > > NFS loopback mount on the same host, NFSD blocks waiting for XFS's
> > > call to __filemap_get_folio):
> >
> > Having a flag for a specific kernel thread feels wrong. I'm not an
> > expert in this area, but as fast as I can tell the problem is that
> > kcompactd should be calling into ->release_folio without __GFP_IO
> > set.
>
> We can easily remove PF_KCOMPACTD and current_is_kcompactd() if/when
> more analysis and a cleaner fix emerges (from you, mm experts, etc).
> I'm not saying you're wrong about GFP flags needing to be clamped by
> kcompactd's call to ->release_folio, but it seems more finishing work
> needed.
>
> As you can tell from my patch, we already have kswapd specialization
> with PF_KSWAPD and current_is_kswapd(). A consumer of large folios
> knowing that it being called as kcompactd is useful, hence my fix for
> deadlock seen with NFS loopback mounts on memory constrained systems.
Also, as followup to my fix: Trond and I have started exploring giving
NFS the ability to do more meaningful work on behalf of kswapd and
kcompactd from ->release_folio (rather than simply punt, by returning
false, like your proposed GFP flag clamping would do), see:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=kernel-6.12/nfs-next&id=b8fd0d3864e4bd43eaca1045002e89bede8fd91f
(I'm still working on testing/refining this patch...)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd
2025-02-26 14:58 ` [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Christoph Hellwig
2025-02-26 15:44 ` Mike Snitzer
@ 2025-02-26 17:22 ` Trond Myklebust
2025-02-26 22:02 ` Andrew Morton
2 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2025-02-26 17:22 UTC (permalink / raw)
To: snitzer, hch; +Cc: linux-mm, linux-nfs, anna.schumaker, akpm
On Wed, 2025-02-26 at 06:58 -0800, Christoph Hellwig wrote:
> On Mon, Feb 24, 2025 at 07:33:01PM -0500, Mike Snitzer wrote:
> > Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check
> > for
> > it so nfs_release_folio() can skip calling nfs_wb_folio() from
> > kcompactd.
> >
> > Otherwise NFS can deadlock waiting for kcompactd enduced writeback
> > which recurses back to NFS (which triggers writeback to NFSD via
> > NFS loopback mount on the same host, NFSD blocks waiting for XFS's
> > call to __filemap_get_folio):
>
> Having a flag for a specific kernel thread feels wrong. I'm not an
> expert in this area, but as fast as I can tell the problem is that
> kcompactd should be calling into ->release_folio without __GFP_IO
> set.
I've had that conversation before with several mm folks before we added
the PF_KSWAPD flag, and they appeared to disagree.
However perhaps these daemons could set PF_MEMALLOC_NOFS before calling
into release_folio()?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd
2025-02-26 14:58 ` [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Christoph Hellwig
2025-02-26 15:44 ` Mike Snitzer
2025-02-26 17:22 ` Trond Myklebust
@ 2025-02-26 22:02 ` Andrew Morton
2 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2025-02-26 22:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Trond Myklebust, Anna Schumaker, linux-nfs, linux-mm
On Wed, 26 Feb 2025 06:58:11 -0800 Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Feb 24, 2025 at 07:33:01PM -0500, Mike Snitzer wrote:
> > Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for
> > it so nfs_release_folio() can skip calling nfs_wb_folio() from
> > kcompactd.
> >
> > Otherwise NFS can deadlock waiting for kcompactd enduced writeback
> > which recurses back to NFS (which triggers writeback to NFSD via
> > NFS loopback mount on the same host, NFSD blocks waiting for XFS's
> > call to __filemap_get_folio):
>
> Having a flag for a specific kernel thread feels wrong.
Why do you think this?
We are getting a bit short of space in tsk->flags. Five PF__HOLE__'s remain.
For this patch and for kswapd I guess we could simply record the
task_struct*'s in a static array, something like
static struct task_struct nice_name[max nodes];
...
if (current == nice_name[my node])
...
It's something we can look at if we start getting really tight on PF_ space.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-02-26 22:02 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-25 0:33 [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Mike Snitzer
2025-02-25 2:20 ` [PATCH v2 for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback Mike Snitzer
2025-02-25 6:53 ` Andrew Morton
2025-02-25 17:02 ` Mike Snitzer
2025-02-25 20:19 ` [PATCH v3 " Mike Snitzer
2025-02-25 20:28 ` Mike Snitzer
2025-02-26 14:58 ` [PATCH for-akpm for-6.14-rcX] NFS: fix nfs_release_folio() to not call nfs_wb_folio() from kcompactd Christoph Hellwig
2025-02-26 15:44 ` Mike Snitzer
2025-02-26 15:55 ` Mike Snitzer
2025-02-26 17:22 ` Trond Myklebust
2025-02-26 22:02 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox