* [PATCH V4 0/2] JFS: Implement migrate_folio for jfs_metapage_aops
From: Shivank Garg @ 2025-04-22 11:40 UTC
To: shaggy, akpm
Cc: willy, shivankg, david, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
This series addresses a warning that occurs during memory compaction
because JFS is missing a migrate_folio operation. The warning was
introduced by commit 7ee3647243e5 ("migrate: Remove call to ->writepage"),
which added explicit warnings when filesystems don't implement
migrate_folio.
syzbot reported the following [1]:
jfs_metapage_aops does not implement migrate_folio
WARNING: CPU: 1 PID: 5861 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline]
WARNING: CPU: 1 PID: 5861 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
Modules linked in:
CPU: 1 UID: 0 PID: 5861 Comm: syz-executor280 Not tainted 6.15.0-rc1-next-20250411-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
RIP: 0010:fallback_migrate_folio mm/migrate.c:953 [inline]
RIP: 0010:move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
The series implements metapage_migrate_folio(), which handles both the
single-metapage and the multiple-metapages-per-page configurations.
[1]: https://syzkaller.appspot.com/bug?extid=8bb6fd945af4e0ad9299
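For context, a minimal sketch of where a migration handler slots into an
address_space_operations table (.migrate_folio and filemap_dirty_folio are
real kernel interfaces; the table below is illustrative, not the full
jfs_metapage_aops):

	/* Illustrative sketch only -- see patch 2/2 for the real table. */
	static const struct address_space_operations example_aops = {
		.dirty_folio	= filemap_dirty_folio,
		.migrate_folio	= metapage_migrate_folio, /* added by this series */
	};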
Previous Versions:
V1/V2:
https://lore.kernel.org/all/20250413172356.561544-1-shivankg@amd.com
V3:
https://lore.kernel.org/all/20250417060630.197278-1-shivankg@amd.com
#syz test: https://github.com/AMDESE/linux-mm.git f17a3b8bc
Shivank Garg (2):
mm: add folio_migration_expected_refs() as inline function
jfs: implement migrate_folio for jfs_metapage_aops
fs/jfs/jfs_metapage.c   | 94 +++++++++++++++++++++++++++++++++++++++++
include/linux/migrate.h | 26 ++++++++++++++++++++++++++
mm/migrate.c            | 22 ++++------------------
3 files changed, 124 insertions(+), 18 deletions(-)
--
2.34.1
* [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: Shivank Garg @ 2025-04-22 11:40 UTC
To: shaggy, akpm
Cc: willy, shivankg, david, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
Rename the previously static folio_expected_refs() to clarify its
purpose and scope, making it an inline function
folio_migration_expected_refs() to calculate expected folio references
during migration. The function is only suitable for folios unmapped from
page tables.
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
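For illustration, a minimal usage sketch of the renamed helper (the caller
name here is hypothetical; the check-then-migrate pattern mirrors
mm/migrate.c and the JFS handler in patch 2/2):

	static int example_migrate_folio(struct address_space *mapping,
			struct folio *dst, struct folio *src,
			enum migrate_mode mode)
	{
		/* Bail out if anyone holds a reference we cannot account for. */
		if (folio_ref_count(src) != folio_migration_expected_refs(mapping, src))
			return -EAGAIN;

		return filemap_migrate_folio(mapping, dst, src, mode);
	}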
include/linux/migrate.h | 26 ++++++++++++++++++++++++++
mm/migrate.c | 22 ++++------------------
2 files changed, 30 insertions(+), 18 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index aaa2114498d6..083293a6d261 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -60,6 +60,32 @@ struct movable_operations {
 /* Defined in mm/debug.c: */
 extern const char *migrate_reason_names[MR_TYPES];
 
+/**
+ * folio_migrate_expected_refs - Count expected references for an unmapped folio.
+ * @mapping: The address space the folio belongs to.
+ * @folio: The folio to check.
+ *
+ * Calculate the expected reference count for a folio during migration.
+ * This function is only suitable for folios that are unmapped from page tables
+ * (i.e., no references from page table mappings: !folio_mapped()).
+ *
+ * Return: The expected reference count
+ */
+static inline int folio_migration_expected_refs(struct address_space *mapping,
+		struct folio *folio)
+{
+	int refs = 1;
+
+	if (!mapping)
+		return refs;
+
+	refs += folio_nr_pages(folio);
+	if (folio_test_private(folio))
+		refs++;
+
+	return refs;
+}
+
 #ifdef CONFIG_MIGRATION
 void putback_movable_pages(struct list_head *l);
diff --git a/mm/migrate.c b/mm/migrate.c
index 6e2488e5dbe4..6c785abce90e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -445,20 +445,6 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 }
 #endif
 
-static int folio_expected_refs(struct address_space *mapping,
-		struct folio *folio)
-{
-	int refs = 1;
-
-	if (!mapping)
-		return refs;
-
-	refs += folio_nr_pages(folio);
-	if (folio_test_private(folio))
-		refs++;
-
-	return refs;
-}
-
 /*
  * Replace the folio in the mapping.
  *
@@ -601,7 +587,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 int folio_migrate_mapping(struct address_space *mapping,
 		struct folio *newfolio, struct folio *folio, int extra_count)
 {
-	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
+	int expected_count = folio_migration_expected_refs(mapping, folio) + extra_count;
 
 	if (folio_ref_count(folio) != expected_count)
 		return -EAGAIN;
@@ -618,7 +604,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 		struct folio *dst, struct folio *src)
 {
 	XA_STATE(xas, &mapping->i_pages, folio_index(src));
-	int rc, expected_count = folio_expected_refs(mapping, src);
+	int rc, expected_count = folio_migration_expected_refs(mapping, src);
 
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
@@ -749,7 +735,7 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 		struct folio *src, void *src_private,
 		enum migrate_mode mode)
 {
-	int rc, expected_count = folio_expected_refs(mapping, src);
+	int rc, expected_count = folio_migration_expected_refs(mapping, src);
 
 	/* Check whether src does not have extra refs before we do more work */
 	if (folio_ref_count(src) != expected_count)
@@ -837,7 +823,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		return migrate_folio(mapping, dst, src, mode);
 
 	/* Check whether page does not have extra refs before we do more work */
-	expected_count = folio_expected_refs(mapping, src);
+	expected_count = folio_migration_expected_refs(mapping, src);
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
--
2.34.1
* [PATCH V4 2/2] jfs: implement migrate_folio for jfs_metapage_aops
From: Shivank Garg @ 2025-04-22 11:40 UTC
To: shaggy, akpm
Cc: willy, shivankg, david, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
Add the missing migrate_folio operation to jfs_metapage_aops to fix
warnings during memory compaction. These warnings were introduced by
commit 7ee3647243e5 ("migrate: Remove call to ->writepage") which
added explicit warnings when filesystems don't implement migrate_folio.
The system reports the following warnings:
jfs_metapage_aops does not implement migrate_folio
WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline]
WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
Implement metapage_migrate_folio(), which handles both the single-metapage
and the multiple-metapages-per-page configurations.
Fixes: 35474d52c605 ("jfs: Convert metapage_writepage to metapage_write_folio")
Reported-by: syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67faff52.050a0220.379d84.001b.GAE@google.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
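A note on the two configurations: the new code follows the existing split
in fs/jfs/jfs_metapage.c, which looks roughly like this (a sketch; the
exact definitions are in the file):

	#define MPS_PER_PAGE (PAGE_SIZE >> L2PSIZE)
	#if MPS_PER_PAGE > 1
	/* several metapages share one page; folio->private holds a meta_anchor */
	#else
	/* one metapage per page; folio->private is the metapage itself */
	#endif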
fs/jfs/jfs_metapage.c | 94 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index df575a873ec6..a12fbd92cc69 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -15,6 +15,7 @@
 #include <linux/mempool.h>
 #include <linux/seq_file.h>
 #include <linux/writeback.h>
+#include <linux/migrate.h>
 #include "jfs_incore.h"
 #include "jfs_superblock.h"
 #include "jfs_filsys.h"
@@ -151,6 +152,54 @@ static inline void dec_io(struct folio *folio, blk_status_t status,
 		handler(folio, anchor->status);
 }
 
+static int __metapage_migrate_folio(struct address_space *mapping, struct folio *dst,
+		struct folio *src, enum migrate_mode mode)
+{
+	struct meta_anchor *src_anchor = src->private;
+	struct metapage *mps[MPS_PER_PAGE] = {0};
+	struct metapage *mp;
+	int i, rc;
+
+	for (i = 0; i < MPS_PER_PAGE; i++) {
+		mp = src_anchor->mp[i];
+		if (mp && metapage_locked(mp))
+			return -EAGAIN;
+	}
+
+	rc = filemap_migrate_folio(mapping, dst, src, mode);
+	if (rc != MIGRATEPAGE_SUCCESS)
+		return rc;
+
+	for (i = 0; i < MPS_PER_PAGE; i++) {
+		mp = src_anchor->mp[i];
+		if (!mp)
+			continue;
+		if (unlikely(insert_metapage(dst, mp))) {
+			/* If error, roll back previously inserted pages */
+			for (int j = 0; j < i; j++) {
+				if (mps[j])
+					remove_metapage(dst, mps[j]);
+			}
+			return -EAGAIN;
+		}
+		mps[i] = mp;
+	}
+
+	/* Update the metapages and remove them from src */
+	for (i = 0; i < MPS_PER_PAGE; i++) {
+		mp = mps[i];
+		if (mp) {
+			int page_offset = mp->data - folio_address(src);
+
+			mp->data = folio_address(dst) + page_offset;
+			mp->folio = dst;
+			remove_metapage(src, mp);
+		}
+	}
+
+	return MIGRATEPAGE_SUCCESS;
+}
+
 #else
 static inline struct metapage *folio_to_mp(struct folio *folio, int offset)
 {
@@ -175,6 +224,32 @@ static inline void remove_metapage(struct folio *folio, struct metapage *mp)
 #define inc_io(folio) do {} while(0)
 #define dec_io(folio, status, handler) handler(folio, status)
 
+static int __metapage_migrate_folio(struct address_space *mapping, struct folio *dst,
+		struct folio *src, enum migrate_mode mode)
+{
+	struct metapage *mp;
+	int page_offset;
+	int rc;
+
+	mp = folio_to_mp(src, 0);
+	if (mp && metapage_locked(mp))
+		return -EAGAIN;
+
+	rc = filemap_migrate_folio(mapping, dst, src, mode);
+	if (rc != MIGRATEPAGE_SUCCESS)
+		return rc;
+
+	if (unlikely(insert_metapage(dst, mp)))
+		return -EAGAIN;
+
+	page_offset = mp->data - folio_address(src);
+	mp->data = folio_address(dst) + page_offset;
+	mp->folio = dst;
+	remove_metapage(src, mp);
+
+	return MIGRATEPAGE_SUCCESS;
+}
+
 #endif
 
 static inline struct metapage *alloc_metapage(gfp_t gfp_mask)
@@ -554,6 +629,24 @@ static bool metapage_release_folio(struct folio *folio, gfp_t gfp_mask)
 	return ret;
 }
 
+/**
+ * metapage_migrate_folio - Migration function for JFS metapages
+ */
+static int metapage_migrate_folio(struct address_space *mapping, struct folio *dst,
+		struct folio *src, enum migrate_mode mode)
+{
+	int expected_count;
+
+	if (!src->private)
+		return filemap_migrate_folio(mapping, dst, src, mode);
+
+	/* Check whether page does not have extra refs before we do more work */
+	expected_count = folio_migration_expected_refs(mapping, src);
+	if (folio_ref_count(src) != expected_count)
+		return -EAGAIN;
+
+	return __metapage_migrate_folio(mapping, dst, src, mode);
+}
+
 static void metapage_invalidate_folio(struct folio *folio, size_t offset,
 		size_t length)
 {
@@ -570,6 +663,7 @@ const struct address_space_operations jfs_metapage_aops = {
 	.release_folio = metapage_release_folio,
 	.invalidate_folio = metapage_invalidate_folio,
 	.dirty_folio = filemap_dirty_folio,
+	.migrate_folio = metapage_migrate_folio,
 };
 
 struct metapage *__get_metapage(struct inode *inode, unsigned long lblock,
--
2.34.1
* Re: [syzbot] [mm?] WARNING in move_to_new_folio
From: syzbot @ 2025-04-22 13:59 UTC
To: akpm, apopple, david, donettom, jane.chu, jfs-discussion,
linux-kernel, linux-mm, shaggy, shivankg, syzkaller-bugs,
wangkefeng.wang, willy, ziy
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
legacy bootconsole [earlyser0] disabled
[ 2.023888][ T0] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 2.025496][ T0] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 2.026234][ T0] ... MAX_LOCK_DEPTH: 48
[ 2.027545][ T0] ... MAX_LOCKDEP_KEYS: 8192
[ 2.028529][ T0] ... CLASSHASH_SIZE: 4096
[ 2.045554][ T0] ... MAX_LOCKDEP_ENTRIES: 1048576
[ 2.047369][ T0] ... MAX_LOCKDEP_CHAINS: 1048576
[ 2.049406][ T0] ... CHAINHASH_SIZE: 524288
[ 2.051376][ T0] memory used by lock dependency info: 106625 kB
[ 2.053638][ T0] memory used for stack traces: 8320 kB
[ 2.055715][ T0] per task-struct memory footprint: 1920 bytes
[ 2.058123][ T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 2.062523][ T0] ACPI: Core revision 20241212
[ 2.064822][ T0] APIC: Switch to symmetric I/O mode setup
[ 2.066614][ T0] x2apic enabled
[ 2.071611][ T0] APIC: Switched APIC routing to: physical x2apic
[ 2.080202][ T0] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
[ 2.082132][ T0] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb63109b96, max_idle_ns: 440795265316 ns
[ 2.094615][ T0] Calibrating delay loop (skipped) preset value.. 4399.99 BogoMIPS (lpj=21999980)
[ 2.097159][ T0] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 2.098499][ T0] Last level dTLB entries: 4KB 64, 2MB 32, 4MB 32, 1GB 4
[ 2.099918][ T0] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 2.102698][ T0] Spectre V2 : Spectre BHI mitigation: SW BHB clearing on syscall and VM exit
[ 2.104484][ T0] Spectre V2 : Mitigation: IBRS
[ 2.105540][ T0] Spectre V2 : Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT
[ 2.109513][ T0] RETBleed: Mitigation: IBRS
[ 2.111317][ T0] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 2.113299][ T0] Spectre V2 : User space: Mitigation: STIBP via prctl
[ 2.114649][ T0] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[ 2.117580][ T0] MDS: Mitigation: Clear CPU buffers
[ 2.118495][ T0] TAA: Mitigation: Clear CPU buffers
[ 2.120451][ T0] MMIO Stale Data: Vulnerable: Clear CPU buffers attempted, no microcode
[ 2.123857][ T0] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 2.124558][ T0] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 2.126861][ T0] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 2.129671][ T0] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 2.134556][ T0] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 2.558992][ T0] Freeing SMP alternatives memory: 132K
[ 2.560524][ T0] pid_max: default: 32768 minimum: 301
[ 2.562227][ T0] LSM: initializing lsm=lockdown,capability,landlock,yama,safesetid,tomoyo,apparmor,bpf,ima,evm
[ 2.565143][ T0] landlock: Up and running.
[ 2.566184][ T0] Yama: becoming mindful.
[ 2.567752][ T0] TOMOYO Linux initialized
[ 2.570977][ T0] AppArmor: AppArmor initialized
[ 2.575204][ T0] LSM support for eBPF active
[ 2.584242][ T0] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, vmalloc hugepage)
[ 2.588245][ T0] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage)
[ 2.594675][ T0] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, vmalloc)
[ 2.598390][ T0] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, vmalloc)
[ 2.606995][ T0] Running RCU synchronous self tests
[ 2.609248][ T0] Running RCU synchronous self tests
[ 2.733732][ T1] smpboot: CPU0: Intel(R) Xeon(R) CPU @ 2.20GHz (family: 0x6, model: 0x4f, stepping: 0x0)
[ 2.734535][ T9] ------------[ cut here ]------------
[ 2.734535][ T9] WARNING: CPU: 0 PID: 9 at arch/x86/mm/tlb.c:919 switch_mm_irqs_off+0x686/0x810
[ 2.734535][ T9] Modules linked in:
serialport: Connected to syzkaller.us-central1-c.ci-upstream-linux-next-kasan-gce-root-test-job-parallel-0 port 1 (session ID: 8b8c42debbe22e4907102c6ca34b474bf814b5a2c14fd2264a6785197a17a5b8, active connections: 1).
[ 2.734535][ T9] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.0-rc2-next-20250417-syzkaller-04782-gf17a3b8bcabd #0 PREEMPT(full)
[ 2.734535][ T9] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[ 2.734535][ T9] Workqueue: events once_deferred
[ 2.734535][ T9] RIP: 0010:switch_mm_irqs_off+0x686/0x810
[ 2.734535][ T9] Code: 90 41 f7 c5 00 08 00 00 0f 84 ee fa ff ff 90 0f 0b 90 e9 e5 fa ff ff 90 0f 0b 90 e9 76 fe ff ff 90 0f 0b 90 e9 cc fb ff ff 90 <0f> 0b 90 4d 39 f4 0f 85 eb fb ff ff e9 31 fc ff ff 90 0f 0b 90 e9
[ 2.734535][ T9] RSP: 0000:ffffc900000e75c0 EFLAGS: 00010056
[ 2.734535][ T9] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff816f9cdd
[ 2.734535][ T9] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88801b070940
[ 2.734535][ T9] RBP: ffffc900000e7690 R08: ffff88801b070947 R09: 1ffff1100360e128
[ 2.734535][ T9] R10: dffffc0000000000 R11: ffffed100360e129 R12: ffffffff8ee49240
[ 2.734535][ T9] R13: ffff88801b070940 R14: ffffffff8ee49240 R15: 0000000000000000
[ 2.734535][ T9] FS: 0000000000000000(0000) GS:ffff888124fa0000(0000) knlGS:0000000000000000
[ 2.734535][ T9] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.734535][ T9] CR2: ffff88823ffff000 CR3: 000000001b078000 CR4: 00000000003506f0
[ 2.734535][ T9] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.734535][ T9] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.734535][ T9] Call Trace:
[ 2.734535][ T9] <TASK>
[ 2.734535][ T9] ? __pfx_switch_mm_irqs_off+0x10/0x10
[ 2.734535][ T9] unuse_temporary_mm+0x160/0x270
[ 2.734535][ T9] ? __pfx_unuse_temporary_mm+0x10/0x10
[ 2.734535][ T9] ? __text_poke+0x6bb/0xb40
[ 2.734535][ T9] ? __text_poke+0x6bb/0xb40
[ 2.734535][ T9] ? serial8250_isa_init_ports+0x6b/0x110
[ 2.734535][ T9] __text_poke+0x7b6/0xb40
[ 2.734535][ T9] ? serial8250_isa_init_ports+0x6b/0x110
[ 2.734535][ T9] ? __pfx_text_poke_memcpy+0x10/0x10
[ 2.734535][ T9] ? __pfx___text_poke+0x10/0x10
[ 2.734535][ T9] ? __pfx___mutex_trylock_common+0x10/0x10
[ 2.734535][ T9] ? __pfx___might_resched+0x10/0x10
[ 2.734535][ T9] ? rcu_is_watching+0x15/0xb0
[ 2.734535][ T9] smp_text_poke_batch_finish+0x3e7/0x12c0
[ 2.734535][ T9] ? arch_jump_label_transform_apply+0x17/0x30
[ 2.734535][ T9] ? __pfx___mutex_lock+0x10/0x10
[ 2.734535][ T9] ? __pfx_smp_text_poke_batch_finish+0x10/0x10
[ 2.734535][ T9] ? arch_jump_label_transform_queue+0x9b/0x100
[ 2.734535][ T9] ? __jump_label_update+0x387/0x3b0
[ 2.734535][ T9] arch_jump_label_transform_apply+0x1c/0x30
[ 2.734535][ T9] static_key_disable_cpuslocked+0xd2/0x1c0
[ 2.734535][ T9] static_key_disable+0x1a/0x20
[ 2.734535][ T9] once_deferred+0x70/0xb0
[ 2.734535][ T9] ? process_scheduled_works+0x9cb/0x18e0
[ 2.734535][ T9] process_scheduled_works+0xac3/0x18e0
[ 2.734535][ T9] ? __pfx_process_scheduled_works+0x10/0x10
[ 2.734535][ T9] ? assign_work+0x367/0x3d0
[ 2.734535][ T9] worker_thread+0x870/0xd50
[ 2.734535][ T9] ? __kthread_parkme+0x1a8/0x200
[ 2.734535][ T9] ? __pfx_worker_thread+0x10/0x10
[ 2.734535][ T9] kthread+0x7b7/0x940
[ 2.734535][ T9] ? __pfx_worker_thread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? _raw_spin_unlock_irq+0x23/0x50
[ 2.734535][ T9] ? lockdep_hardirqs_on+0x9d/0x150
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ret_from_fork+0x4b/0x80
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ret_from_fork_asm+0x1a/0x30
[ 2.734535][ T9] </TASK>
[ 2.734535][ T9] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 2.734535][ T9] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.0-rc2-next-20250417-syzkaller-04782-gf17a3b8bcabd #0 PREEMPT(full)
[ 2.734535][ T9] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[ 2.734535][ T9] Workqueue: events once_deferred
[ 2.734535][ T9] Call Trace:
[ 2.734535][ T9] <TASK>
[ 2.734535][ T9] dump_stack_lvl+0x241/0x360
[ 2.734535][ T9] ? __pfx_dump_stack_lvl+0x10/0x10
[ 2.734535][ T9] ? __pfx__printk+0x10/0x10
[ 2.734535][ T9] ? vscnprintf+0x5d/0x90
[ 2.734535][ T9] panic+0x349/0x880
[ 2.734535][ T9] ? __warn+0x174/0x4d0
[ 2.734535][ T9] ? __pfx_panic+0x10/0x10
[ 2.734535][ T9] ? ret_from_fork_asm+0x1a/0x30
[ 2.734535][ T9] __warn+0x344/0x4d0
[ 2.734535][ T9] ? switch_mm_irqs_off+0x686/0x810
[ 2.734535][ T9] report_bug+0x2b3/0x500
[ 2.734535][ T9] ? switch_mm_irqs_off+0x686/0x810
[ 2.734535][ T9] ? switch_mm_irqs_off+0x686/0x810
[ 2.734535][ T9] ? switch_mm_irqs_off+0x688/0x810
[ 2.734535][ T9] handle_bug+0x89/0x170
[ 2.734535][ T9] exc_invalid_op+0x1a/0x50
[ 2.734535][ T9] asm_exc_invalid_op+0x1a/0x20
[ 2.734535][ T9] RIP: 0010:switch_mm_irqs_off+0x686/0x810
[ 2.734535][ T9] Code: 90 41 f7 c5 00 08 00 00 0f 84 ee fa ff ff 90 0f 0b 90 e9 e5 fa ff ff 90 0f 0b 90 e9 76 fe ff ff 90 0f 0b 90 e9 cc fb ff ff 90 <0f> 0b 90 4d 39 f4 0f 85 eb fb ff ff e9 31 fc ff ff 90 0f 0b 90 e9
[ 2.734535][ T9] RSP: 0000:ffffc900000e75c0 EFLAGS: 00010056
[ 2.734535][ T9] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff816f9cdd
[ 2.734535][ T9] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88801b070940
[ 2.734535][ T9] RBP: ffffc900000e7690 R08: ffff88801b070947 R09: 1ffff1100360e128
[ 2.734535][ T9] R10: dffffc0000000000 R11: ffffed100360e129 R12: ffffffff8ee49240
[ 2.734535][ T9] R13: ffff88801b070940 R14: ffffffff8ee49240 R15: 0000000000000000
[ 2.734535][ T9] ? switch_mm_irqs_off+0x26d/0x810
[ 2.734535][ T9] ? __pfx_switch_mm_irqs_off+0x10/0x10
[ 2.734535][ T9] unuse_temporary_mm+0x160/0x270
[ 2.734535][ T9] ? __pfx_unuse_temporary_mm+0x10/0x10
[ 2.734535][ T9] ? __text_poke+0x6bb/0xb40
[ 2.734535][ T9] ? __text_poke+0x6bb/0xb40
[ 2.734535][ T9] ? serial8250_isa_init_ports+0x6b/0x110
[ 2.734535][ T9] __text_poke+0x7b6/0xb40
[ 2.734535][ T9] ? serial8250_isa_init_ports+0x6b/0x110
[ 2.734535][ T9] ? __pfx_text_poke_memcpy+0x10/0x10
[ 2.734535][ T9] ? __pfx___text_poke+0x10/0x10
[ 2.734535][ T9] ? __pfx___mutex_trylock_common+0x10/0x10
[ 2.734535][ T9] ? __pfx___might_resched+0x10/0x10
[ 2.734535][ T9] ? rcu_is_watching+0x15/0xb0
[ 2.734535][ T9] smp_text_poke_batch_finish+0x3e7/0x12c0
[ 2.734535][ T9] ? arch_jump_label_transform_apply+0x17/0x30
[ 2.734535][ T9] ? __pfx___mutex_lock+0x10/0x10
[ 2.734535][ T9] ? __pfx_smp_text_poke_batch_finish+0x10/0x10
[ 2.734535][ T9] ? arch_jump_label_transform_queue+0x9b/0x100
[ 2.734535][ T9] ? __jump_label_update+0x387/0x3b0
[ 2.734535][ T9] arch_jump_label_transform_apply+0x1c/0x30
[ 2.734535][ T9] static_key_disable_cpuslocked+0xd2/0x1c0
[ 2.734535][ T9] static_key_disable+0x1a/0x20
[ 2.734535][ T9] once_deferred+0x70/0xb0
[ 2.734535][ T9] ? process_scheduled_works+0x9cb/0x18e0
[ 2.734535][ T9] process_scheduled_works+0xac3/0x18e0
[ 2.734535][ T9] ? __pfx_process_scheduled_works+0x10/0x10
[ 2.734535][ T9] ? assign_work+0x367/0x3d0
[ 2.734535][ T9] worker_thread+0x870/0xd50
[ 2.734535][ T9] ? __kthread_parkme+0x1a8/0x200
[ 2.734535][ T9] ? __pfx_worker_thread+0x10/0x10
[ 2.734535][ T9] kthread+0x7b7/0x940
[ 2.734535][ T9] ? __pfx_worker_thread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ? _raw_spin_unlock_irq+0x23/0x50
[ 2.734535][ T9] ? lockdep_hardirqs_on+0x9d/0x150
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ret_from_fork+0x4b/0x80
[ 2.734535][ T9] ? __pfx_kthread+0x10/0x10
[ 2.734535][ T9] ret_from_fork_asm+0x1a/0x30
[ 2.734535][ T9] </TASK>
[ 2.734535][ T9] Rebooting in 86400 seconds..
syzkaller build log:
go env (err=<nil>)
GO111MODULE='auto'
GOARCH='amd64'
GOBIN=''
GOCACHE='/syzkaller/.cache/go-build'
GOENV='/syzkaller/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/syzkaller/jobs-2/linux/gopath/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/syzkaller/jobs-2/linux/gopath'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/syzkaller/jobs-2/linux/gopath/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.7.linux-amd64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/syzkaller/jobs-2/linux/gopath/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.7.linux-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.7'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/syzkaller/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/syzkaller/jobs-2/linux/gopath/src/github.com/google/syzkaller/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1748814211=/tmp/go-build -gno-record-gcc-switches'
git status (err=<nil>)
HEAD detached at 0bd6db4180
nothing to commit, working tree clean
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
make .descriptions
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
bin/syz-sysgen
touch .descriptions
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=0bd6db418098e2d98a2edf948b41410d3d9f9e70 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20250411-130225'" -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog
mkdir -p ./bin/linux_amd64
g++ -o ./bin/linux_amd64/syz-executor executor/executor.cc \
-m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -std=c++17 -I. -Iexecutor/_include -DGOOS_linux=1 -DGOARCH_amd64=1 \
-DHOSTGOOS_linux=1 -DGIT_REVISION=\"0bd6db418098e2d98a2edf948b41410d3d9f9e70\"
/usr/bin/ld: /tmp/ccfRLisG.o: in function `Connection::Connect(char const*, char const*)':
executor.cc:(.text._ZN10Connection7ConnectEPKcS1_[_ZN10Connection7ConnectEPKcS1_]+0x104): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Error text is too large and was truncated, full error text is at:
https://syzkaller.appspot.com/x/error.txt?x=1143c568580000
Tested on:
commit: f17a3b8b jfs: implement migrate_folio for jfs_metapage..
git tree: https://github.com/AMDESE/linux-mm.git
kernel config: https://syzkaller.appspot.com/x/.config?x=796b05042c1188b
dashboard link: https://syzkaller.appspot.com/bug?extid=8bb6fd945af4e0ad9299
compiler: Debian clang version 15.0.6, Debian LLD 15.0.6
Note: no patches were applied.
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: David Hildenbrand @ 2025-04-22 15:18 UTC
To: Shivank Garg, shaggy, akpm
Cc: willy, wangkefeng.wang, jane.chu, ziy, donettom, apopple,
jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 22.04.25 13:40, Shivank Garg wrote:
> Rename the previously static folio_expected_refs() to clarify its
> purpose and scope, making it an inline function
> folio_migration_expected_refs() to calculate expected folio references
> during migration. The function is only suitable for folios unmapped from
> page tables.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
Thanks!
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: Zi Yan @ 2025-04-22 15:22 UTC
To: Shivank Garg
Cc: shaggy, akpm, willy, david, wangkefeng.wang, jane.chu, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 22 Apr 2025, at 7:40, Shivank Garg wrote:
> Rename the previously static folio_expected_refs() to clarify its
> purpose and scope, making it an inline function
> folio_migration_expected_refs() to calculate expected folio references
> during migration. The function is only suitable for folios unmapped from
> page tables.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
> include/linux/migrate.h | 26 ++++++++++++++++++++++++++
> mm/migrate.c | 22 ++++------------------
> 2 files changed, 30 insertions(+), 18 deletions(-)
>
Acked-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
* Re: [PATCH V4 2/2] jfs: implement migrate_folio for jfs_metapage_aops
From: David Hildenbrand @ 2025-04-22 15:23 UTC
To: Shivank Garg, shaggy, akpm
Cc: willy, wangkefeng.wang, jane.chu, ziy, donettom, apopple,
jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 22.04.25 13:40, Shivank Garg wrote:
> Add the missing migrate_folio operation to jfs_metapage_aops to fix
> warnings during memory compaction. These warnings were introduced by
> commit 7ee3647243e5 ("migrate: Remove call to ->writepage") which
> added explicit warnings when filesystems don't implement migrate_folio.
>
> The system reports the following warnings:
> jfs_metapage_aops does not implement migrate_folio
> WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline]
> WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
>
> Implement metapage_migrate_folio(), which handles both the single-metapage
> and the multiple-metapages-per-page configurations.
>
> Fixes: 35474d52c605 ("jfs: Convert metapage_writepage to metapage_write_folio")
> Reported-by: syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/67faff52.050a0220.379d84.001b.GAE@google.com
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
> fs/jfs/jfs_metapage.c | 94 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 94 insertions(+)
>
> diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
> index df575a873ec6..a12fbd92cc69 100644
> --- a/fs/jfs/jfs_metapage.c
> +++ b/fs/jfs/jfs_metapage.c
> @@ -15,6 +15,7 @@
>  #include <linux/mempool.h>
>  #include <linux/seq_file.h>
>  #include <linux/writeback.h>
> +#include <linux/migrate.h>
>  #include "jfs_incore.h"
>  #include "jfs_superblock.h"
>  #include "jfs_filsys.h"
> @@ -151,6 +152,54 @@ static inline void dec_io(struct folio *folio, blk_status_t status,
>  		handler(folio, anchor->status);
>  }
>  
> +static int __metapage_migrate_folio(struct address_space *mapping, struct folio *dst,
> +		struct folio *src, enum migrate_mode mode)
> +{
> +	struct meta_anchor *src_anchor = src->private;
> +	struct metapage *mps[MPS_PER_PAGE] = {0};
> +	struct metapage *mp;
> +	int i, rc;
> +
> +	for (i = 0; i < MPS_PER_PAGE; i++) {
> +		mp = src_anchor->mp[i];
> +		if (mp && metapage_locked(mp))
> +			return -EAGAIN;
> +	}
> +
> +	rc = filemap_migrate_folio(mapping, dst, src, mode);
> +	if (rc != MIGRATEPAGE_SUCCESS)
> +		return rc;
> +
> +	for (i = 0; i < MPS_PER_PAGE; i++) {
> +		mp = src_anchor->mp[i];
> +		if (!mp)
> +			continue;
> +		if (unlikely(insert_metapage(dst, mp))) {
> +			/* If error, roll back previously inserted pages */
> +			for (int j = 0; j < i; j++) {
> +				if (mps[j])
> +					remove_metapage(dst, mps[j]);
> +			}
> +			return -EAGAIN;
> +		}
> +		mps[i] = mp;
> +	}
> +
> +	/* Update the metapages and remove them from src */
> +	for (i = 0; i < MPS_PER_PAGE; i++) {
> +		mp = mps[i];
> +		if (mp) {
> +			int page_offset = mp->data - folio_address(src);
> +
> +			mp->data = folio_address(dst) + page_offset;
> +			mp->folio = dst;
> +			remove_metapage(src, mp);
> +		}
> +	}
> +
> +	return MIGRATEPAGE_SUCCESS;
> +}
> +
>  #else
>  static inline struct metapage *folio_to_mp(struct folio *folio, int offset)
>  {
> @@ -175,6 +224,32 @@ static inline void remove_metapage(struct folio *folio, struct metapage *mp)
>  #define inc_io(folio) do {} while(0)
>  #define dec_io(folio, status, handler) handler(folio, status)
>  
> +static int __metapage_migrate_folio(struct address_space *mapping, struct folio *dst,
> +		struct folio *src, enum migrate_mode mode)
> +{
> +	struct metapage *mp;
> +	int page_offset;
> +	int rc;
> +
> +	mp = folio_to_mp(src, 0);
> +	if (mp && metapage_locked(mp))
> +		return -EAGAIN;
> +
> +	rc = filemap_migrate_folio(mapping, dst, src, mode);
> +	if (rc != MIGRATEPAGE_SUCCESS)
> +		return rc;
> +
> +	if (unlikely(insert_metapage(dst, mp)))
> +		return -EAGAIN;
> +
> +	page_offset = mp->data - folio_address(src);
> +	mp->data = folio_address(dst) + page_offset;
> +	mp->folio = dst;
> +	remove_metapage(src, mp);
> +
> +	return MIGRATEPAGE_SUCCESS;
> +}
> +
>  #endif
>  
>  static inline struct metapage *alloc_metapage(gfp_t gfp_mask)
> @@ -554,6 +629,24 @@ static bool metapage_release_folio(struct folio *folio, gfp_t gfp_mask)
>  	return ret;
>  }
>  
> +/**
> + * metapage_migrate_folio - Migration function for JFS metapages
> + */
> +static int metapage_migrate_folio(struct address_space *mapping, struct folio *dst,
> +		struct folio *src, enum migrate_mode mode)
> +{
> +	int expected_count;
> +
> +	if (!src->private)
> +		return filemap_migrate_folio(mapping, dst, src, mode);
> +
> +	/* Check whether page does not have extra refs before we do more work */
> +	expected_count = folio_migration_expected_refs(mapping, src);
> +	if (folio_ref_count(src) != expected_count)
Probably no need for the temporary variable.
Hm, makes me wonder if it should be called
folio_migration_expected_ref_count() ... :)
But it's even longer; whatever you think is best.
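For illustration, dropping the temporary as suggested would read roughly:

	if (folio_ref_count(src) != folio_migration_expected_refs(mapping, src))
		return -EAGAIN;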
--
Cheers,
David / dhildenb
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: Andrew Morton @ 2025-04-22 23:41 UTC
To: Shivank Garg
Cc: shaggy, willy, david, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On Tue, 22 Apr 2025 11:40:03 +0000 Shivank Garg <shivankg@amd.com> wrote:
> Rename the previously static folio_expected_refs() to clarify its
> purpose and scope, making it an inline function
> folio_migration_expected_refs() to calculate expected folio references
> during migration. The function is only suitable for folios unmapped from
> page tables.
>
> ...
>
> +/**
> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
"folio_migration_expected_refs"
It's concerning that one particular filesystem needs this - one
suspects that it is doing something wrong, or that the present API
offerings were misdesigned. It would be helpful if the changelogs were
to explain what is special about JFS.
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: Matthew Wilcox @ 2025-04-23 0:36 UTC
To: Andrew Morton
Cc: Shivank Garg, shaggy, david, wangkefeng.wang, jane.chu, ziy,
donettom, apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
> > +/**
> > + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>
> "folio_migration_expected_refs"
Please run make W=1 fs/jfs/ in order to run kernel-doc on this file.
It'll flag this kind of error.
> It's concerning that one particular filesystem needs this - one
> suspects that it is doing something wrong, or that the present API
> offerings were misdesigned. It would be helpful if the changelogs were
> to explain what is special about JFS.
It doesn't surprise me at all. Almost no filesystem implements its own
migrate_folio operation. Without going into too much detail, almost
all filesystems can use filemap_migrate_folio(), buffer_migrate_folio()
or buffer_migrate_folio_norefs(). So this is not an indication that
jfs is doing anything wrong (except maybe it's misdesigned in that the
per-folio metadata caches the address of the folio, but changing that
seems very much too much work to ask someone to do).
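For illustration, those common choices look roughly like this in an aops
table (the helpers named are real mm exports; the table itself is a
sketch):

	static const struct address_space_operations example_fs_aops = {
		/* pagecache-only metadata, no buffer heads: */
		.migrate_folio = filemap_migrate_folio,
		/* block-device-backed filesystems would instead use
		 * buffer_migrate_folio or buffer_migrate_folio_norefs */
	};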
What I do wonder is whether we want to have such a specialised
function existing. We have can_split_folio() in huge_memory.c
which is somewhat more comprehensive and doesn't require the folio to be
unmapped first.
I currently lack the capacity to write pseudo-code illustrating what I
mean, but I'll have a try tomorrow.
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: David Hildenbrand @ 2025-04-23 7:22 UTC
To: Matthew Wilcox, Andrew Morton
Cc: Shivank Garg, shaggy, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 23.04.25 02:36, Matthew Wilcox wrote:
> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>> +/**
>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>
>> "folio_migration_expected_refs"
>
> Please run make W=1 fs/jfs/ in order to run kernel-doc on this file.
> It'll flag this kind of error.
>
>> It's concerning that one particular filesystem needs this - one
>> suspects that it is doing something wrong, or that the present API
>> offerings were misdesigned. It would be helpful if the changelogs were
>> to explain what is special about JFS.
>
> It doesn't surprise me at all. Almost no filesystem implements its own
> migrate_folio operation. Without going into too much detail, almost
> all filesystems can use filemap_migrate_folio(), buffer_migrate_folio()
> or buffer_migrate_folio_norefs(). So this is not an indication that
> jfs is doing anything wrong (except maybe it's misdesigned in that the
> per-folio metadata caches the address of the folio, but changing that
> seems very much too much work to ask someone to do).
>
> What I do wonder is whether we want to have such a specialised
> function existing. We have can_split_folio() in huge_memory.c
> which is somewhat more comprehensive and doesn't require the folio to be
> unmapped first.
I was debating with myself whether we should do the usual "refs from
->private, refs from page table mappings" .. dance, and look up the
mapping from the folio instead of passing it in.
I concluded that for this (migration) purpose the function is good
enough as it is: if abused in wrong context (e.g., still ->private,
still page table mappings), it would not fake that there are no
unexpected references.
Because references from ->private and page tables would be unexpected at
this point.
So I'm fine with this.
A more generic function might be helpful, but in general it is more
prone to races (e.g., page table mappings concurrently going away), so
it gets trickier to document that properly.
--
Cheers,
David / dhildenb
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: David Hildenbrand @ 2025-04-23 7:25 UTC
To: Matthew Wilcox, Andrew Morton
Cc: Shivank Garg, shaggy, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 23.04.25 09:22, David Hildenbrand wrote:
> On 23.04.25 02:36, Matthew Wilcox wrote:
>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>> +/**
>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>
>>> "folio_migration_expected_refs"
>>
>> Please run make W=1 fs/jfs/ in order to run kernel-doc on this file.
>> It'll flag this kind of error.
>>
>>> It's concerning that one particular filesystem needs this - one
>>> suspects that it is doing something wrong, or that the present API
>>> offerings were misdesigned. It would be helpful if the changelogs were
>>> to explain what is special about JFS.
>>
>> It doesn't surprise me at all. Almost no filesystem implements its own
>> migrate_folio operation. Without going into too much detail, almost
>> all filesystems can use filemap_migrate_folio(), buffer_migrate_folio()
>> or buffer_migrate_folio_norefs(). So this is not an indication that
>> jfs is doing anything wrong (except maybe it's misdesigned in that the
>> per-folio metadata caches the address of the folio, but changing that
>> seems very much too much work to ask someone to do).
>>
>> What I do wonder is whether we want to have such a specialised
>> function existing. We have can_split_folio() in huge_memory.c
>> which is somewhat more comprehensive and doesn't require the folio to be
>> unmapped first.
>
> I was debating with myself whether we should do the usual "refs from
> ->private, refs from page table mappings" .. dance, and look up the
> mapping from the folio instead of passing it in.
>
> I concluded that for this (migration) purpose the function is good
> enough as it is: if abused in wrong context (e.g., still ->private,
> still page table mappings), it would not fake that there are no
> unexpected references.
Sorry, I forgot that we still care about the reference from ->private
here. We expect the folio to be unmapped.
--
Cheers,
David / dhildenb
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: Matthew Wilcox @ 2025-04-24 3:19 UTC
To: David Hildenbrand
Cc: Andrew Morton, Shivank Garg, shaggy, wangkefeng.wang, jane.chu,
ziy, donettom, apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
> On 23.04.25 09:22, David Hildenbrand wrote:
> > On 23.04.25 02:36, Matthew Wilcox wrote:
> > > On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
> > > > > +/**
> > > > > + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
> > > >
> > > > "folio_migration_expected_refs"
> > >
> > > What I do wonder is whether we want to have such a specialised
> > > function existing. We have can_split_folio() in huge_memory.c
> > > which is somewhat more comprehensive and doesn't require the folio to be
> > > unmapped first.
> >
> > I was debating with myself whether we should do the usual "refs from
> > ->private, refs from page table mappings" .. dance, and look up the
> > mapping from the folio instead of passing it in.
> >
> > I concluded that for this (migration) purpose the function is good
> > enough as it is: if abused in wrong context (e.g., still ->private,
> > still page table mappings), it would not fake that there are no
> > unexpected references.
>
> Sorry, I forgot that we still care about the reference from ->private here.
> We expect the folio to be unmapped.
Right, so just adding in folio_mapcount() will be a no-op for migration,
but enable its reuse by can_split_folio(). Maybe. Anyway, the way I
explain page refcounts to people (and I need to put this in a document
somewhere):
There are three types of contribution to the refcount:
- Expected. These are deducible from the folio itself, and they're all
findable. You need to figure out what the expected number of
references are to a folio if you're going to try to freeze it.
These can be references from the mapcount, the page cache, the swap
cache, the private data, your call chain.
- Temporary. Someone else has found the folio somehow; perhaps through
the page cache, or by calling GUP or something. They mean you can't
freeze the folio because you don't know who has the reference or how
long they might hold it for.
- Spurious. This is like a temporary reference, but worse because if
you read the code, there should be no way for there to be any temporary
references to the folio. Someone's found a stale pointer to this
folio and has bumped the reference count while they check that the
folio they have is the one they expected to find. They're going
to find out that the pointer they followed is stale and put their
refcount soon, but in the meantime you still can't freeze the folio.
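To make the "expected" bucket concrete, a worked example using the helper
from this patch (assuming a locked, unmapped, order-0 pagecache folio with
PG_private set):

	int refs = 1;			/* base reference */
	refs += folio_nr_pages(folio);	/* pagecache entry: +1 at order 0 */
	refs++;				/* PG_private */
	/* folio_migration_expected_refs() == 3; any folio_ref_count() above
	 * this is a temporary or spurious reference, so a freeze must fail. */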
So I don't love the idea of having a function with the word "expected"
in the name that returns a value which doesn't take into account all
the potential contributors to the expected value. And sure we can keep
adding qualifiers to the function name to indicate how it is to be used,
but at some point I think we should say "It's OK for this to be a little
less efficient so we can understand what it means".
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: Shivank Garg @ 2025-04-24 11:57 UTC
To: Matthew Wilcox, David Hildenbrand
Cc: Andrew Morton, shaggy, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
Hi All,
Thank you for reviewing my patch and providing feedback.
On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>> On 23.04.25 09:22, David Hildenbrand wrote:
>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>> +/**
>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>
>>>>> "folio_migration_expected_refs"
Thank you for catching this, I'll fix it.
I wasn't previously aware of using make W=1 to build kernel-docs and
check for warnings - this is very useful information for me.
I'll expand the changelog to better explain why this is needed for JFS.
>>>>
>>>> What I do wonder is whether we want to have such a specialised
>>>> function existing. We have can_split_folio() in huge_memory.c
>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>> unmapped first.
>>>
>>> I was debating with myself whether we should do the usual "refs from
>>> ->private, refs from page table mappings" .. dance, and look up the
>>> mapping from the folio instead of passing it in.
>>>
>>> I concluded that for this (migration) purpose the function is good
>>> enough as it is: if abused in wrong context (e.g., still ->private,
>>> still page table mappings), it would not fake that there are no
>>> unexpected references.
>>
>> Sorry, I forgot that we still care about the reference from ->private here.
>> We expect the folio to be unmapped.
>
> Right, so just adding in folio_mapcount() will be a no-op for migration,
> but enable its reuse by can_split_folio(). Maybe. Anyway, the way I
> explain page refcounts to people (and I need to put this in a document
> somewhere):
>
> There are three types of contribution to the refcount:
>
> - Expected. These are deducible from the folio itself, and they're all
> findable. You need to figure out what the expected number of
> references are to a folio if you're going to try to freeze it.
> These can be references from the mapcount, the page cache, the swap
> cache, the private data, your call chain.
> - Temporary. Someone else has found the folio somehow; perhaps through
> the page cache, or by calling GUP or something. They mean you can't
> freeze the folio because you don't know who has the reference or how
> long they might hold it for.
> - Spurious. This is like a temporary reference, but worse because if
> you read the code, there should be no way for there to be any temporary
> references to the folio. Someone's found a stale pointer to this
> folio and has bumped the reference count while they check that the
> folio they have is the one they expected to find. They're going
> to find out that the pointer they followed is stale and put their
> refcount soon, but in the meantime you still can't freeze the folio.
>
> So I don't love the idea of having a function with the word "expected"
> in the name that returns a value which doesn't take into account all
> the potential contributors to the expected value. And sure we can keep
> adding qualifiers to the function name to indicate how it is to be used,
> but at some point I think we should say "It's OK for this to be a little
> less efficient so we can understand what it means".
Thank you, Willy, for the detailed explanation about page reference counting.
This has helped me understand the concept much better.
Based on your explanation and the discussion, I'm summarizing the two approaches:
1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs
(or folio_unmapped_base_refs?) to clarify that it does not account for
other potential contributors.
2. Accounting all possible contributors to expected refs:
static inline int folio_expected_refs(struct address_space *mapping,
		struct folio *folio)
{
	int refs = 1;

	if (mapping) {
		if (folio_test_anon(folio))
			refs += folio_test_swapcache(folio) ?
					folio_nr_pages(folio) : 0;
		else
			refs += folio_nr_pages(folio);

		if (folio_test_private(folio))
			refs++;
	}
	/* Counts mapped folios; evaluates as a no-op for folios already
	 * unmapped during migration. */
	refs += folio_mapcount(folio);
	return refs;
}
Please let me know if this approach is acceptable or if you have
other suggestions for improvement.
Best Regards,
Shivank
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
From: David Hildenbrand @ 2025-04-25 7:47 UTC
To: Shivank Garg, Matthew Wilcox
Cc: Andrew Morton, shaggy, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 24.04.25 13:57, Shivank Garg wrote:
> Hi All,
>
> Thank you for reviewing my patch and providing feedback.
>
> On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
>> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>>> On 23.04.25 09:22, David Hildenbrand wrote:
>>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>>> +/**
>>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>>
>>>>>> "folio_migration_expected_refs"
>
> Thank you for catching this, I'll fix it.
>
> I wasn't previously aware of using make W=1 to build kernel-docs and
> check for warnings - this is very useful information for me.
>
> I'll add to changelog to better explain why this is needed for JFS.
>
>>>>>
>>>>> What I do wonder is whether we want to have such a specialised
>>>>> function existing. We have can_split_folio() in huge_memory.c
>>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>>> unmapped first.
>>>>
>>>> I was debating with myself whether we should do the usual "refs from
>>>> ->private, refs from page table mappings" .. dance, and look up the
>>>> mapping from the folio instead of passing it in.
>>>>
>>>> I concluded that for this (migration) purpose the function is good
>>>> enough as it is: if abused in wrong context (e.g., still ->private,
>>>> still page table mappings), it would not fake that there are no
>>>> unexpected references.
>>>
>>> Sorry, I forgot that we still care about the reference from ->private here.
>>> We expect the folio to be unmapped.
>>
>> Right, so just adding in folio_mapcount() will be a no-op for migration,
>> but enable its reuse by can_split_folio(). Maybe. Anyway, the way I
>> explain page refcounts to people (and I need to put this in a document
>> somewhere):
>>
>> There are three types of contribution to the refcount:
>>
>> - Expected. These are deducible from the folio itself, and they're all
>> findable. You need to figure out what the expected number of
>> references are to a folio if you're going to try to freeze it.
>> These can be references from the mapcount, the page cache, the swap
>> cache, the private data, your call chain.
>> - Temporary. Someone else has found the folio somehow; perhaps through
>> the page cache, or by calling GUP or something. They mean you can't
>> freeze the folio because you don't know who has the reference or how
>> long they might hold it for.
>> - Spurious. This is like a temporary reference, but worse because if
>> you read the code, there should be no way for there to be any temporary
>> references to the folio. Someone's found a stale pointer to this
>> folio and has bumped the reference count while they check that the
>> folio they have is the one they expected to find. They're going
>> to find out that the pointer they followed is stale and put their
>> refcount soon, but in the meantime you still can't freeze the folio.
>>
>> So I don't love the idea of having a function with the word "expected"
>> in the name that returns a value which doesn't take into account all
>> the potential contributors to the expected value. And sure we can keep
>> adding qualifiers to the function name to indicate how it is to be used,
>> but at some point I think we should say "It's OK for this to be a little
>> less efficient so we can understand what it means".
>
> Thank you, Willy, for the detailed explanation about page reference counting.
> This has helped me understand the concept much better.
>
> Based on your explanation and the discussion, I'm summarizing the two approaches:
>
> 1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs
> (or folio_unmapped_base_refs?) to clarify that it does not account for
> other potential contributors.
> 2. Accounting all possible contributors to expected refs:
> static inline int folio_expected_refs(struct address_space *mapping,
> 		struct folio *folio)
> {
> 	int refs = 1;
>
> 	if (mapping) {
> 		if (folio_test_anon(folio))
> 			refs += folio_test_swapcache(folio) ?
> 					folio_nr_pages(folio) : 0;
> 		else
> 			refs += folio_nr_pages(folio);
>
> 		if (folio_test_private(folio))
> 			refs++;
> 	}
> 	/* Counts mapped folios; evaluates as a no-op for folios already
> 	 * unmapped during migration. */
> 	refs += folio_mapcount(folio);
> 	return refs;
> }
>
> Please let me know if this approach is acceptable or if you have
> other suggestions for improvement.
A couple of points:
1) Can we name it folio_expected_ref_count()
2) Can we avoid passing in the mapping? Might not be expensive to look it
up again. Below I avoid calling folio_mapping().
3) Can we delegate adding the additional reference to the caller? Will make it
easier to use elsewhere (e.g., no additional reference because we are holding
the page table lock).
4) Can we add kerneldoc, and in particular document the semantics?
Not sure if we should inline this function or put it into mm/utils.c
I'm thinking of something like (completely untested):
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a205020e2a58b..a0ad4ed9a75ff 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2112,6 +2112,61 @@ static inline bool folio_maybe_mapped_shared(struct folio *folio)
return folio_test_large_maybe_mapped_shared(folio);
}
+/**
+ * folio_expected_ref_count - calculate the expected folio refcount
+ * @folio: the folio
+ *
+ * Calculate the expected folio refcount, taking references from the pagecache,
+ * swapcache, PG_private and page table mappings into account. Useful in
+ * combination with folio_ref_count() to detect unexpected references (e.g.,
+ * GUP or other temporary references).
+ *
+ * Does not currently consider references from the LRU cache. If the folio
+ * has already been isolated from the LRU (which is the case during migration
+ * or split), the LRU cache does not apply.
+ *
+ * Calling this function on an unmapped folio -- !folio_mapped() -- that is
+ * locked will return a stable result.
+ *
+ * Calling this function on a mapped folio will not result in a stable result,
+ * because nothing stops additional page table mappings from coming (e.g.,
+ * fork()) or going (e.g., munmap()).
+ *
+ * Calling this function without the folio lock will also not result in a
+ * stable result: for example, the folio might get dropped from the swapcache
+ * concurrently.
+ *
+ * However, even when called without the folio lock or on a mapped folio,
+ * this function can be used to detect unexpected references early (for
+ * example, to decide whether it even makes sense to lock the folio and
+ * unmap it).
+ *
+ * The caller must add any reference (e.g., from folio_try_get()) it might be
+ * holding itself to the result.
+ *
+ * Returns the expected folio refcount.
+ */
+static inline int folio_expected_ref_count(const struct folio *folio)
+{
+ const int order = folio_order(folio);
+ int ref_count = 0;
+
+ if (WARN_ON_ONCE(folio_test_slab(folio)))
+ return 0;
+
+ if (folio_test_anon(folio)) {
+ /* One reference per page from the swapcache. */
+ ref_count += folio_test_swapcache(folio) << order;
+ } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) {
+ /* One reference per page from the pagecache. */
+ ref_count += !!folio->mapping << order;
+ /* One reference from PG_private. */
+ ref_count += folio_test_private(folio);
+ }
+
+ /* One reference per page table mapping. */
+ return ref_count + folio_mapcount(folio);
+}
+
#ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
static inline int arch_make_folio_accessible(struct folio *folio)
{
--
2.49.0
The PAGE_MAPPING_FLAGS can likely go away soon (I have patches for that),
then we only have to test for folio->mapping.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
2025-04-25 7:47 ` David Hildenbrand
@ 2025-04-29 10:57 ` Shivank Garg
2025-04-29 11:31 ` David Hildenbrand
0 siblings, 1 reply; 16+ messages in thread
From: Shivank Garg @ 2025-04-29 10:57 UTC (permalink / raw)
To: David Hildenbrand, Matthew Wilcox
Cc: Andrew Morton, shaggy, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 4/25/2025 1:17 PM, David Hildenbrand wrote:
> On 24.04.25 13:57, Shivank Garg wrote:
>> Hi All,
>>
>> Thank you for reviewing my patch and providing feedback.
>>
>> On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
>>> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>>>> On 23.04.25 09:22, David Hildenbrand wrote:
>>>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>>>> +/**
>>>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>>>
>>>>>>> "folio_migration_expected_refs"
>>
>> Thank you for catching this, I'll fix it.
>>
>> I wasn't previously aware of using make W=1 to build kernel-docs and
>> check for warnings - this is very useful information for me.
>>
>> I'll add to the changelog to better explain why this is needed for JFS.
>>
>>>>>>
>>>>>> What I do wonder is whether we want to have such a specialised
>>>>>> function existing. We have can_split_folio() in huge_memory.c
>>>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>>>> unmapped first.
>>>>>
>>>>> I was debating with myself whether we should do the usual "refs from
>>>>> ->private, refs from page table mappings" .. dance, and look up the
>>>>> mapping from the folio instead of passing it in.
>>>>>
>>>>> I concluded that for this (migration) purpose the function is good
>>>>> enough as it is: if abused in the wrong context (e.g., still ->private,
>>>>> still page table mappings), it would not falsely indicate that there are
>>>>> no unexpected references.
>>>>
>>>> Sorry, I forgot that we still care about the reference from ->private here.
>>>> We expect the folio to be unmapped.
>>>
>>> Right, so just adding in folio_mapcount() will be a no-op for migration,
>>> but enable its reuse by can_split_folio(). Maybe. Anyway, the way I
>>> explain page refcounts to people (and I need to put this in a document
>>> somewhere):
>>>
>>> There are three types of contribution to the refcount:
>>>
>>> - Expected. These are deducible from the folio itself, and they're all
>>> findable. You need to figure out what the expected number of
>>> references to a folio is if you're going to try to freeze it.
>>> These can be references from the mapcount, the page cache, the swap
>>> cache, the private data, your call chain.
>>> - Temporary. Someone else has found the folio somehow; perhaps through
>>> the page cache, or by calling GUP or something. They mean you can't
>>> freeze the folio because you don't know who has the reference or how
>>> long they might hold it for.
>>> - Spurious. This is like a temporary reference, but worse because if
>>> you read the code, there should be no way for there to be any temporary
>>> references to the folio. Someone's found a stale pointer to this
>>> folio and has bumped the reference count while they check that the
>>> folio they have is the one they expected to find. They're going
>>> to find out that the pointer they followed is stale and put their
>>> refcount soon, but in the meantime you still can't freeze the folio.
>>>
>>> So I don't love the idea of having a function with the word "expected"
>>> in the name that returns a value which doesn't take into account all
>>> the potential contributors to the expected value. And sure we can keep
>>> adding qualifiers to the function name to indicate how it is to be used,
>>> but at some point I think we should say "It's OK for this to be a little
>>> less efficient so we can understand what it means".
>>
>> Thank you, Willy, for the detailed explanation about page reference counting.
>> This has helped me understand the concept much better.
>>
>> Based on your explanation and the discussion, I'm summarizing the 2 approaches:
>>
>> 1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs,
>> to clarify it does not account for other potential contributors.
>> or folio_unmapped_base_refs?
>> 2. Accounting all possible contributors to expected refs:
>> folio_expected_refs(mapping, folio)
>> {
>> int refs = 1;
>>
>> if (mapping) {
>> if (folio_test_anon(folio))
>> refs += folio_test_swapcache(folio) ?
>> folio_nr_pages(folio) : 0;
>> else
>> refs += folio_nr_pages(folio);
>>
>> if (folio_test_private(folio))
>> refs++;
>> }
>> refs += folio_mapcount(folio); // takes mapped folios into account and evaluates as a no-op for unmapped folios during migration
>> return refs;
>> }
>>
>> Please let me know if this approach is acceptable or if you have
>> other suggestions for improvement.
>
> A couple of points:
>
> 1) Can we name it folio_expected_ref_count()
>
> 2) Can we avoid passing in the mapping? Might not be expensive to look it
> up again. Below I avoid calling folio_mapping().
>
> 3) Can we delegate adding the additional reference to the caller? Will make it
> easier to use elsewhere (e.g., no additional reference because we are holding
> the page table lock).
>
> 4) Can we add kerneldoc, and in particular document the semantics?
>
> Not sure if we should inline this function or put it into mm/utils.c
>
Hi David,
Thank you for the detailed suggestions. They all make sense to me.
I did not understand a few changes in your patch below:
>
> I'm thinking of something like (completely untested):
>
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a205020e2a58b..a0ad4ed9a75ff 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2112,6 +2112,61 @@ static inline bool folio_maybe_mapped_shared(struct folio *folio)
> return folio_test_large_maybe_mapped_shared(folio);
> }
>
> +/**
> + * folio_expected_ref_count - calculate the expected folio refcount
> + * @folio: the folio
> + *
> + * Calculate the expected folio refcount, taking references from the pagecache,
> + * swapcache, PG_private and page table mappings into account. Useful in
> + * combination with folio_ref_count() to detect unexpected references (e.g.,
> + * GUP or other temporary references).
> + *
> + * Does not currently consider references from the LRU cache. If the folio
> + * has already been isolated from the LRU (which is the case during migration
> + * or split), the LRU cache does not apply.
> + *
> + * Calling this function on an unmapped folio -- !folio_mapped() -- that is
> + * locked will return a stable result.
> + *
> + * Calling this function on a mapped folio will not result in a stable result,
> + * because nothing stops additional page table mappings from coming (e.g.,
> + * fork()) or going (e.g., munmap()).
> + *
> + * Calling this function without the folio lock will also not result in a
> + * stable result: for example, the folio might get dropped from the swapcache
> + * concurrently.
> + *
> + * However, even when called without the folio lock or on a mapped folio,
> + * this function can be used to detect unexpected references early (for
> + * example, to decide whether it even makes sense to lock the folio and
> + * unmap it).
> + *
> + * The caller must add any reference (e.g., from folio_try_get()) it might be
> + * holding itself to the result.
> + *
> + * Returns the expected folio refcount.
> + */
> +static inline int folio_expected_ref_count(const struct folio *folio)
> +{
> + const int order = folio_order(folio);
> + int ref_count = 0;
Why are we not taking the base ref_count as 1, like it's done in the original folio_expected_refs
implementation?
> +
> + if (WARN_ON_ONCE(folio_test_slab(folio)))
> + return 0;
> +
> + if (folio_test_anon(folio)) {
> + /* One reference per page from the swapcache. */
> + ref_count += folio_test_swapcache(folio) << order;
why not use folio_nr_pages() here instead of 1 << order?
something like folio_test_swapcache(folio) * folio_nr_pages(folio).
> + } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) {
> + /* One reference per page from the pagecache. */
> + ref_count += !!folio->mapping << order;
> + /* One reference from PG_private. */
> + ref_count += folio_test_private(folio);
> + }
> +
> + /* One reference per page table mapping. */
> + return ref_count + folio_mapcount(folio);
> +}
> +
> #ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
> static inline int arch_make_folio_accessible(struct folio *folio)
> {
I tested your patch with stress-ng and my move-pages test code. I did not see
any bugs/errors.
Thanks,
Shivank
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
2025-04-29 10:57 ` Shivank Garg
@ 2025-04-29 11:31 ` David Hildenbrand
0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2025-04-29 11:31 UTC (permalink / raw)
To: Shivank Garg, Matthew Wilcox
Cc: Andrew Morton, shaggy, wangkefeng.wang, jane.chu, ziy, donettom,
apopple, jfs-discussion, linux-kernel, linux-mm,
syzbot+8bb6fd945af4e0ad9299
On 29.04.25 12:57, Shivank Garg wrote:
>
>
> On 4/25/2025 1:17 PM, David Hildenbrand wrote:
>> On 24.04.25 13:57, Shivank Garg wrote:
>>> Hi All,
>>>
>>> Thank you for reviewing my patch and providing feedback.
>>>
>>> On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
>>>> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>>>>> On 23.04.25 09:22, David Hildenbrand wrote:
>>>>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>>>>> +/**
>>>>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>>>>
>>>>>>>> "folio_migration_expected_refs"
>>>
>>> Thank you for catching this, I'll fix it.
>>>
>>> I wasn't previously aware of using make W=1 to build kernel-docs and
>>> check for warnings - this is very useful information for me.
>>>
>>> I'll add to the changelog to better explain why this is needed for JFS.
>>>
>>>>>>>
>>>>>>> What I do wonder is whether we want to have such a specialised
>>>>>>> function existing. We have can_split_folio() in huge_memory.c
>>>>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>>>>> unmapped first.
>>>>>>
>>>>>> I was debating with myself whether we should do the usual "refs from
>>>>>> ->private, refs from page table mappings" .. dance, and look up the
>>>>>> mapping from the folio instead of passing it in.
>>>>>>
>>>>>> I concluded that for this (migration) purpose the function is good
>>>>>> enough as it is: if abused in the wrong context (e.g., still ->private,
>>>>>> still page table mappings), it would not falsely indicate that there are
>>>>>> no unexpected references.
>>>>>
>>>>> Sorry, I forgot that we still care about the reference from ->private here.
>>>>> We expect the folio to be unmapped.
>>>>
>>>> Right, so just adding in folio_mapcount() will be a no-op for migration,
>>>> but enable its reuse by can_split_folio(). Maybe. Anyway, the way I
>>>> explain page refcounts to people (and I need to put this in a document
>>>> somewhere):
>>>>
>>>> There are three types of contribution to the refcount:
>>>>
>>>> - Expected. These are deducible from the folio itself, and they're all
>>>> findable. You need to figure out what the expected number of
>>>> references to a folio is if you're going to try to freeze it.
>>>> These can be references from the mapcount, the page cache, the swap
>>>> cache, the private data, your call chain.
>>>> - Temporary. Someone else has found the folio somehow; perhaps through
>>>> the page cache, or by calling GUP or something. They mean you can't
>>>> freeze the folio because you don't know who has the reference or how
>>>> long they might hold it for.
>>>> - Spurious. This is like a temporary reference, but worse because if
>>>> you read the code, there should be no way for there to be any temporary
>>>> references to the folio. Someone's found a stale pointer to this
>>>> folio and has bumped the reference count while they check that the
>>>> folio they have is the one they expected to find. They're going
>>>> to find out that the pointer they followed is stale and put their
>>>> refcount soon, but in the meantime you still can't freeze the folio.
>>>>
>>>> So I don't love the idea of having a function with the word "expected"
>>>> in the name that returns a value which doesn't take into account all
>>>> the potential contributors to the expected value. And sure we can keep
>>>> adding qualifiers to the function name to indicate how it is to be used,
>>>> but at some point I think we should say "It's OK for this to be a little
>>>> less efficient so we can understand what it means".
>>>
>>> Thank you, Willy, for the detailed explanation about page reference counting.
>>> This has helped me understand the concept much better.
>>>
>>> Based on your explanation and the discussion, I'm summarizing the 2 approaches:
>>>
>>> 1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs,
>>> to clarify it does not account for other potential contributors.
>>> or folio_unmapped_base_refs?
>>> 2. Accounting all possible contributors to expected refs:
>>> folio_expected_refs(mapping, folio)
>>> {
>>> int refs = 1;
>>>
>>> if (mapping) {
>>> if (folio_test_anon(folio))
>>> refs += folio_test_swapcache(folio) ?
>>> folio_nr_pages(folio) : 0;
>>> else
>>> refs += folio_nr_pages(folio);
>>>
>>> if (folio_test_private(folio))
>>> refs++;
>>> }
>>> refs += folio_mapcount(folio); // takes mapped folios into account and evaluates as a no-op for unmapped folios during migration
>>> return refs;
>>> }
>>>
>>> Please let me know if this approach is acceptable or if you have
>>> other suggestions for improvement.
>>
>> A couple of points:
>>
>> 1) Can we name it folio_expected_ref_count()
>>
>> 2) Can we avoid passing in the mapping? Might not be expensive to look it
>> up again. Below I avoid calling folio_mapping().
>>
>> 3) Can we delegate adding the additional reference to the caller? Will make it
>> easier to use elsewhere (e.g., no additional reference because we are holding
>> the page table lock).
>>
>> 4) Can we add kerneldoc, and in particular document the semantics?
>>
>> Not sure if we should inline this function or put it into mm/utils.c
>>
>
> Hi David,
>
> Thank you for the detailed suggestions. They all make sense to me.
>
> I did not understand a few changes in your patch below:
>>
>> I'm thinking of something like (completely untested):
>>
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index a205020e2a58b..a0ad4ed9a75ff 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2112,6 +2112,61 @@ static inline bool folio_maybe_mapped_shared(struct folio *folio)
>> return folio_test_large_maybe_mapped_shared(folio);
>> }
>>
>> +/**
>> + * folio_expected_ref_count - calculate the expected folio refcount
>> + * @folio: the folio
>> + *
>> + * Calculate the expected folio refcount, taking references from the pagecache,
>> + * swapcache, PG_private and page table mappings into account. Useful in
>> + * combination with folio_ref_count() to detect unexpected references (e.g.,
>> + * GUP or other temporary references).
>> + *
>> + * Does not currently consider references from the LRU cache. If the folio
>> + * has already been isolated from the LRU (which is the case during migration
>> + * or split), the LRU cache does not apply.
>> + *
>> + * Calling this function on an unmapped folio -- !folio_mapped() -- that is
>> + * locked will return a stable result.
>> + *
>> + * Calling this function on a mapped folio will not result in a stable result,
>> + * because nothing stops additional page table mappings from coming (e.g.,
>> + * fork()) or going (e.g., munmap()).
>> + *
>> + * Calling this function without the folio lock will also not result in a
>> + * stable result: for example, the folio might get dropped from the swapcache
>> + * concurrently.
>> + *
>> + * However, even when called without the folio lock or on a mapped folio,
>> + * this function can be used to detect unexpected references early (for
>> + * example, to decide whether it even makes sense to lock the folio and
>> + * unmap it).
>> + *
>> + * The caller must add any reference (e.g., from folio_try_get()) it might be
>> + * holding itself to the result.
>> + *
>> + * Returns the expected folio refcount.
>> + */
>> +static inline int folio_expected_ref_count(const struct folio *folio)
>> +{
>> + const int order = folio_order(folio);
>> + int ref_count = 0;
>
> Why are we not taking the base ref_count as 1, like it's done in the original folio_expected_refs
> implementation?
The idea is that this is the responsibility of the caller, which will
make this function more versatile.
For example, when we're holding the page table lock and want to check
for unexpected references, we wouldn't be holding any additional
reference from a folio_try_get() like migration code would.
>
>> +
>> + if (WARN_ON_ONCE(folio_test_slab(folio)))
>> + return 0;
>> +
>> + if (folio_test_anon(folio)) {
>> + /* One reference per page from the swapcache. */
>> + ref_count += folio_test_swapcache(folio) << order;
>
> why not use folio_nr_pages() here instead of 1 << order?
> something like folio_test_swapcache(folio) * folio_nr_pages(folio).
A shift is typically cheaper than a multiplication, so it looked like a
low-hanging fruit to use a shift here.
>
>> + } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) {
>> + /* One reference per page from the pagecache. */
>> + ref_count += !!folio->mapping << order;
>> + /* One reference from PG_private. */
>> + ref_count += folio_test_private(folio);
>> + }
>> +
>> + /* One reference per page table mapping. */
>> + return ref_count + folio_mapcount(folio);
>
>> +}
>> +
>> #ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
>> static inline int arch_make_folio_accessible(struct folio *folio)
>> {
>
> I tested your patch with stress-ng and my move-pages test code. I did not see
> any bugs/errors.
Cool! It would be good to get some feedback from Willy on the kerneldoc,
if he's aware of other constraints etc.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-04-29 11:31 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-22 11:40 [PATCH V4 0/2] JFS: Implement migrate_folio for jfs_metapage_aops Shivank Garg
2025-04-22 11:40 ` [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function Shivank Garg
2025-04-22 15:18 ` David Hildenbrand
2025-04-22 15:22 ` Zi Yan
2025-04-22 23:41 ` Andrew Morton
2025-04-23 0:36 ` Matthew Wilcox
2025-04-23 7:22 ` David Hildenbrand
2025-04-23 7:25 ` David Hildenbrand
2025-04-24 3:19 ` Matthew Wilcox
2025-04-24 11:57 ` Shivank Garg
2025-04-25 7:47 ` David Hildenbrand
2025-04-29 10:57 ` Shivank Garg
2025-04-29 11:31 ` David Hildenbrand
2025-04-22 11:40 ` [PATCH V4 2/2] jfs: implement migrate_folio for jfs_metapage_aops Shivank Garg
2025-04-22 15:23 ` David Hildenbrand
2025-04-22 13:59 ` [syzbot] [mm?] WARNING in move_to_new_folio syzbot