linux-mm.kvack.org archive mirror
* [syzbot] [mm?] BUG: Bad page map (7)
@ 2023-09-09 17:12 syzbot
  2023-09-10  3:02 ` Matthew Wilcox
  2023-09-11  7:12 ` Yin Fengwei
  0 siblings, 2 replies; 24+ messages in thread
From: syzbot @ 2023-09-09 17:12 UTC
  To: akpm, fengwei.yin, linux-kernel, linux-mm, syzkaller-bugs, willy

Hello,

syzbot found the following issue on:

HEAD commit:    3f86ed6ec0b3 Merge tag 'arc-6.6-rc1' of git://git.kernel.o..
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=142a0e00680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ff0db7a15ba54ead
dashboard link: https://syzkaller.appspot.com/bug?extid=55cc72f8cc3a549119df
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17ff1fa8680000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1445ba2fa80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/15ea526c030f/disk-3f86ed6e.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/e8f0baca67e5/vmlinux-3f86ed6e.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e39fafbb687d/bzImage-3f86ed6e.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/f82bb81a1d50/mount_0.gz

The issue was bisected to:

commit 617c28ecab22d98a3809370eb6cb50fa24b7bfe1
Author: Yin Fengwei <fengwei.yin@intel.com>
Date:   Wed Aug 2 15:14:05 2023 +0000

    filemap: batch PTE mappings

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13c37c58680000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=10237c58680000
console output: https://syzkaller.appspot.com/x/log.txt?x=17c37c58680000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com
Fixes: 617c28ecab22 ("filemap: batch PTE mappings")

BUG: Bad page map in process syz-executor332  pte:fffff8ce8c120 pmd:79462067
page:ffffea0001cc5cc0 refcount:9 mapcount:-1 mapping:ffff8880774b1b50 index:0x3 pfn:0x73173
head:ffffea0001cc5c00 order:2 entire_mapcount:0 nr_pages_mapped:8388607 pincount:0
memcg:ffff888015e5a000
aops:xfs_address_space_operations ino:244a dentry name:"bus"
flags: 0xfff0000000816c(referenced|uptodate|lru|active|private|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000000 ffffea0001cc5c01 dead000000000122 dead000000000400
raw: 0000000000000001 0000000000000000 00000000fffffffe 0000000000000000
head: 00fff0000000816c ffffea00007c9948 ffff888013245030 ffff8880774b1b50
head: 0000000000000000 ffff888027450e00 00000009ffffffff ffff888015e5a000
page dumped because: bad pte
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x152c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 5036, tgid 5036 (syz-executor332), ts 61415422939, free_ts 21789924659
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1536
 prep_new_page mm/page_alloc.c:1543 [inline]
 get_page_from_freelist+0x31ec/0x3370 mm/page_alloc.c:3183
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4439
 folio_alloc+0x1e/0x60 mm/mempolicy.c:2308
 filemap_alloc_folio+0xde/0x500 mm/filemap.c:979
 ra_alloc_folio mm/readahead.c:468 [inline]
 page_cache_ra_order+0x423/0xcc0 mm/readahead.c:524
 do_sync_mmap_readahead+0x444/0x850
 filemap_fault+0x7d3/0x1710 mm/filemap.c:3294
 __xfs_filemap_fault+0x286/0x960 fs/xfs/xfs_file.c:1354
 __do_fault+0x133/0x4e0 mm/memory.c:4204
 do_read_fault mm/memory.c:4568 [inline]
 do_fault mm/memory.c:4705 [inline]
 do_pte_missing mm/memory.c:3669 [inline]
 handle_pte_fault mm/memory.c:4978 [inline]
 __handle_mm_fault mm/memory.c:5119 [inline]
 handle_mm_fault+0x48d2/0x6200 mm/memory.c:5284
 faultin_page mm/gup.c:956 [inline]
 __get_user_pages+0x6bd/0x15e0 mm/gup.c:1239
 __get_user_pages_locked mm/gup.c:1504 [inline]
 get_dump_page+0x146/0x2b0 mm/gup.c:2018
 dump_user_range+0x126/0x910 fs/coredump.c:913
 elf_core_dump+0x3b75/0x4490 fs/binfmt_elf.c:2142
 do_coredump+0x1b73/0x2ab0 fs/coredump.c:764
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1136 [inline]
 free_unref_page_prepare+0x8c3/0x9f0 mm/page_alloc.c:2312
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2405
 free_contig_range+0x9e/0x150 mm/page_alloc.c:6355
 destroy_args+0x95/0x7c0 mm/debug_vm_pgtable.c:1028
 debug_vm_pgtable+0x4ac/0x540 mm/debug_vm_pgtable.c:1408
 do_one_initcall+0x23d/0x7d0 init/main.c:1232
 do_initcall_level+0x157/0x210 init/main.c:1294
 do_initcalls+0x3f/0x80 init/main.c:1310
 kernel_init_freeable+0x440/0x5d0 init/main.c:1547
 kernel_init+0x1d/0x2a0 init/main.c:1437
 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
addr:0000000020006000 vm_flags:080000d0 anon_vma:0000000000000000 mapping:ffff8880774b1b50 index:5
file:bus fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio
CPU: 1 PID: 5036 Comm: syz-executor332 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_bad_pte+0x581/0x5c0 mm/memory.c:535
 zap_pte_range mm/memory.c:1458 [inline]
 zap_pmd_range mm/memory.c:1573 [inline]
 zap_pud_range mm/memory.c:1602 [inline]
 zap_p4d_range mm/memory.c:1623 [inline]
 unmap_page_range+0x1a76/0x3300 mm/memory.c:1644
 unmap_vmas+0x209/0x3a0 mm/memory.c:1731
 exit_mmap+0x297/0xc50 mm/mmap.c:3210
 __mmput+0x115/0x3c0 kernel/fork.c:1349
 exit_mm+0x21f/0x300 kernel/exit.c:567
 do_exit+0x612/0x2290 kernel/exit.c:861
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff93b1d0eb9
Code: Unable to access opcode bytes at 0x7ff93b1d0e8f.
RSP: 002b:00007ffc50f66f08 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffe5 RBX: 0000000000000003 RCX: 00007ff93b1d0eb9
RDX: 0000000000000002 RSI: 0000000020000300 RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008800000 R11: 0000000000000246 R12: 00000000000f4240
R13: 00007ffc50f67188 R14: 0000000000000001 R15: 00007ffc50f66f50
 </TASK>
BUG: Bad page map in process syz-executor332  pte:fffff8ce8d120 pmd:79462067
page:ffffea0001cc5c80 refcount:9 mapcount:-1 mapping:ffff8880774b1b50 index:0x2 pfn:0x73172
head:ffffea0001cc5c00 order:2 entire_mapcount:0 nr_pages_mapped:8388606 pincount:0
memcg:ffff888015e5a000
aops:xfs_address_space_operations ino:244a dentry name:"bus"
flags: 0xfff0000000816c(referenced|uptodate|lru|active|private|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000000 ffffea0001cc5c01 ffffea0001cc5c90 ffffea0001cc5c90
raw: 0000000000000001 0000000000000000 00000000fffffffe 0000000000000000
head: 00fff0000000816c ffffea00007c9948 ffff888013245030 ffff8880774b1b50
head: 0000000000000000 ffff888027450e00 00000009ffffffff ffff888015e5a000
page dumped because: bad pte
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x152c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 5036, tgid 5036 (syz-executor332), ts 61415422939, free_ts 21789914922
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1536
 prep_new_page mm/page_alloc.c:1543 [inline]
 get_page_from_freelist+0x31ec/0x3370 mm/page_alloc.c:3183
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4439
 folio_alloc+0x1e/0x60 mm/mempolicy.c:2308
 filemap_alloc_folio+0xde/0x500 mm/filemap.c:979
 ra_alloc_folio mm/readahead.c:468 [inline]
 page_cache_ra_order+0x423/0xcc0 mm/readahead.c:524
 do_sync_mmap_readahead+0x444/0x850
 filemap_fault+0x7d3/0x1710 mm/filemap.c:3294
 __xfs_filemap_fault+0x286/0x960 fs/xfs/xfs_file.c:1354
 __do_fault+0x133/0x4e0 mm/memory.c:4204
 do_read_fault mm/memory.c:4568 [inline]
 do_fault mm/memory.c:4705 [inline]
 do_pte_missing mm/memory.c:3669 [inline]
 handle_pte_fault mm/memory.c:4978 [inline]
 __handle_mm_fault mm/memory.c:5119 [inline]
 handle_mm_fault+0x48d2/0x6200 mm/memory.c:5284
 faultin_page mm/gup.c:956 [inline]
 __get_user_pages+0x6bd/0x15e0 mm/gup.c:1239
 __get_user_pages_locked mm/gup.c:1504 [inline]
 get_dump_page+0x146/0x2b0 mm/gup.c:2018
 dump_user_range+0x126/0x910 fs/coredump.c:913
 elf_core_dump+0x3b75/0x4490 fs/binfmt_elf.c:2142
 do_coredump+0x1b73/0x2ab0 fs/coredump.c:764
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1136 [inline]
 free_unref_page_prepare+0x8c3/0x9f0 mm/page_alloc.c:2312
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2405
 free_contig_range+0x9e/0x150 mm/page_alloc.c:6355
 destroy_args+0x95/0x7c0 mm/debug_vm_pgtable.c:1028
 debug_vm_pgtable+0x4ac/0x540 mm/debug_vm_pgtable.c:1408
 do_one_initcall+0x23d/0x7d0 init/main.c:1232
 do_initcall_level+0x157/0x210 init/main.c:1294
 do_initcalls+0x3f/0x80 init/main.c:1310
 kernel_init_freeable+0x440/0x5d0 init/main.c:1547
 kernel_init+0x1d/0x2a0 init/main.c:1437
 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
addr:0000000020007000 vm_flags:080000d0 anon_vma:0000000000000000 mapping:ffff8880774b1b50 index:6
file:bus fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio
CPU: 0 PID: 5036 Comm: syz-executor332 Tainted: G    B              6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_bad_pte+0x581/0x5c0 mm/memory.c:535
 zap_pte_range mm/memory.c:1458 [inline]
 zap_pmd_range mm/memory.c:1573 [inline]
 zap_pud_range mm/memory.c:1602 [inline]
 zap_p4d_range mm/memory.c:1623 [inline]
 unmap_page_range+0x1a76/0x3300 mm/memory.c:1644
 unmap_vmas+0x209/0x3a0 mm/memory.c:1731
 exit_mmap+0x297/0xc50 mm/mmap.c:3210
 __mmput+0x115/0x3c0 kernel/fork.c:1349
 exit_mm+0x21f/0x300 kernel/exit.c:567
 do_exit+0x612/0x2290 kernel/exit.c:861
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff93b1d0eb9
Code: Unable to access opcode bytes at 0x7ff93b1d0e8f.
RSP: 002b:00007ffc50f66f08 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffe5 RBX: 0000000000000003 RCX: 00007ff93b1d0eb9
RDX: 0000000000000002 RSI: 0000000020000300 RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008800000 R11: 0000000000000246 R12: 00000000000f4240
R13: 00007ffc50f67188 R14: 0000000000000001 R15: 00007ffc50f66f50
 </TASK>
BUG: Bad page map in process syz-executor332  pte:fffff8ce8e120 pmd:79462067
page:ffffea0001cc5c40 refcount:9 mapcount:-1 mapping:ffff8880774b1b50 index:0x1 pfn:0x73171
head:ffffea0001cc5c00 order:2 entire_mapcount:0 nr_pages_mapped:8388605 pincount:0
memcg:ffff888015e5a000
aops:xfs_address_space_operations ino:244a dentry name:"bus"
flags: 0xfff0000000816c(referenced|uptodate|lru|active|private|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000202 ffffea0001cc5c01 dead000000000122 fffffffdffffffff
raw: 0000000400000000 0000000000000000 00000000fffffffe 0000000000000000
head: 00fff0000000816c ffffea00007c9948 ffff888013245030 ffff8880774b1b50
head: 0000000000000000 ffff888027450e00 00000009ffffffff ffff888015e5a000
page dumped because: bad pte
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x152c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 5036, tgid 5036 (syz-executor332), ts 61415422939, free_ts 21789904946
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1536
 prep_new_page mm/page_alloc.c:1543 [inline]
 get_page_from_freelist+0x31ec/0x3370 mm/page_alloc.c:3183
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4439
 folio_alloc+0x1e/0x60 mm/mempolicy.c:2308
 filemap_alloc_folio+0xde/0x500 mm/filemap.c:979
 ra_alloc_folio mm/readahead.c:468 [inline]
 page_cache_ra_order+0x423/0xcc0 mm/readahead.c:524
 do_sync_mmap_readahead+0x444/0x850
 filemap_fault+0x7d3/0x1710 mm/filemap.c:3294
 __xfs_filemap_fault+0x286/0x960 fs/xfs/xfs_file.c:1354
 __do_fault+0x133/0x4e0 mm/memory.c:4204
 do_read_fault mm/memory.c:4568 [inline]
 do_fault mm/memory.c:4705 [inline]
 do_pte_missing mm/memory.c:3669 [inline]
 handle_pte_fault mm/memory.c:4978 [inline]
 __handle_mm_fault mm/memory.c:5119 [inline]
 handle_mm_fault+0x48d2/0x6200 mm/memory.c:5284
 faultin_page mm/gup.c:956 [inline]
 __get_user_pages+0x6bd/0x15e0 mm/gup.c:1239
 __get_user_pages_locked mm/gup.c:1504 [inline]
 get_dump_page+0x146/0x2b0 mm/gup.c:2018
 dump_user_range+0x126/0x910 fs/coredump.c:913
 elf_core_dump+0x3b75/0x4490 fs/binfmt_elf.c:2142
 do_coredump+0x1b73/0x2ab0 fs/coredump.c:764
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1136 [inline]
 free_unref_page_prepare+0x8c3/0x9f0 mm/page_alloc.c:2312
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2405
 free_contig_range+0x9e/0x150 mm/page_alloc.c:6355
 destroy_args+0x95/0x7c0 mm/debug_vm_pgtable.c:1028
 debug_vm_pgtable+0x4ac/0x540 mm/debug_vm_pgtable.c:1408
 do_one_initcall+0x23d/0x7d0 init/main.c:1232
 do_initcall_level+0x157/0x210 init/main.c:1294
 do_initcalls+0x3f/0x80 init/main.c:1310
 kernel_init_freeable+0x440/0x5d0 init/main.c:1547
 kernel_init+0x1d/0x2a0 init/main.c:1437
 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
addr:0000000020008000 vm_flags:080000d0 anon_vma:0000000000000000 mapping:ffff8880774b1b50 index:7
file:bus fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio
CPU: 1 PID: 5036 Comm: syz-executor332 Tainted: G    B              6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_bad_pte+0x581/0x5c0 mm/memory.c:535
 zap_pte_range mm/memory.c:1458 [inline]
 zap_pmd_range mm/memory.c:1573 [inline]
 zap_pud_range mm/memory.c:1602 [inline]
 zap_p4d_range mm/memory.c:1623 [inline]
 unmap_page_range+0x1a76/0x3300 mm/memory.c:1644
 unmap_vmas+0x209/0x3a0 mm/memory.c:1731
 exit_mmap+0x297/0xc50 mm/mmap.c:3210
 __mmput+0x115/0x3c0 kernel/fork.c:1349
 exit_mm+0x21f/0x300 kernel/exit.c:567
 do_exit+0x612/0x2290 kernel/exit.c:861
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff93b1d0eb9
Code: Unable to access opcode bytes at 0x7ff93b1d0e8f.
RSP: 002b:00007ffc50f66f08 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffe5 RBX: 0000000000000003 RCX: 00007ff93b1d0eb9
RDX: 0000000000000002 RSI: 0000000020000300 RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008800000 R11: 0000000000000246 R12: 00000000000f4240
R13: 00007ffc50f67188 R14: 0000000000000001 R15: 00007ffc50f66f50
 </TASK>
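
The three reports above show nr_pages_mapped as 8388607, 8388606 and 8388605. One possible reading, which the report itself does not state, is that these are small negative counts seen through a 23-bit mask. The sketch below assumes the v6.5-era values COMPOUND_MAPPED = 0x800000 and FOLIO_PAGES_MAPPED = COMPOUND_MAPPED - 1 from mm/internal.h; the sketch_* names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Assumption: constants as defined in mm/internal.h around v6.5. */
#define SKETCH_COMPOUND_MAPPED    0x800000
#define SKETCH_FOLIO_PAGES_MAPPED (SKETCH_COMPOUND_MAPPED - 1)

/* Mask a raw signed counter the way the page dump is assumed to. */
static uint32_t sketch_dump_value(int32_t raw)
{
	return (uint32_t)raw & SKETCH_FOLIO_PAGES_MAPPED;
}

/*
 * Map a printed value back to a candidate signed count. A value in the
 * upper half of the 23-bit range is read as a small negative number.
 */
static int32_t sketch_signed_pages_mapped(uint32_t printed)
{
	uint32_t low = printed & SKETCH_FOLIO_PAGES_MAPPED;

	if (low & (SKETCH_COMPOUND_MAPPED >> 1))
		return (int32_t)low - SKETCH_COMPOUND_MAPPED;
	return (int32_t)low;
}
```

Under these assumptions, 8388607, 8388606 and 8388605 decode to -1, -2 and -3. That would be consistent with the per-page mapcount:-1 in each report, that is, with pages being unmapped more often than they were mapped.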


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about the bisection process, see: https://goo.gl/tpsmEJ#bisection

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite the bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-09 17:12 [syzbot] [mm?] BUG: Bad page map (7) syzbot
@ 2023-09-10  3:02 ` Matthew Wilcox
  2023-09-10  3:29   ` syzbot
                     ` (2 more replies)
  2023-09-11  7:12 ` Yin Fengwei
  1 sibling, 3 replies; 24+ messages in thread
From: Matthew Wilcox @ 2023-09-10  3:02 UTC
  To: syzbot; +Cc: akpm, fengwei.yin, linux-kernel, linux-mm, syzkaller-bugs

On Sat, Sep 09, 2023 at 10:12:48AM -0700, syzbot wrote:
> commit 617c28ecab22d98a3809370eb6cb50fa24b7bfe1
> Author: Yin Fengwei <fengwei.yin@intel.com>
> Date:   Wed Aug 2 15:14:05 2023 +0000
> 
>     filemap: batch PTE mappings

Hmm ... I don't know if this is the bug, but ...

#syz test

diff --git a/mm/filemap.c b/mm/filemap.c
index 582f5317ff71..580d0b2b1a7c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3506,7 +3506,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		if (count) {
 			set_pte_range(vmf, folio, page, count, addr);
 			folio_ref_add(folio, count);
-			if (in_range(vmf->address, addr, count))
+			if (in_range(vmf->address, addr, count * PAGE_SIZE))
 				ret = VM_FAULT_NOPAGE;
 		}
 
@@ -3520,7 +3520,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	if (count) {
 		set_pte_range(vmf, folio, page, count, addr);
 		folio_ref_add(folio, count);
-		if (in_range(vmf->address, addr, count))
+		if (in_range(vmf->address, addr, count * PAGE_SIZE))
 			ret = VM_FAULT_NOPAGE;
 	}
 

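The unit mismatch that the patch above addresses can be sketched in userspace. The patch changes the length passed to in_range() from a page count to a byte count; the sketch assumes in_range(val, start, len) means start <= val < start + len, and assumes 4 KiB pages. All sketch_* and fault_covered_* names are hypothetical stand-ins, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on x86-64. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical stand-in for in_range(val, start, len):
 * true when start <= val < start + len. Unsigned wraparound lets the
 * single comparison reject val < start as well.
 */
static bool sketch_in_range(uint64_t val, uint64_t start, uint64_t len)
{
	return (val - start) < len;
}

/* Check as written before the patch: the length is a page count. */
static bool fault_covered_old(uint64_t fault, uint64_t addr, unsigned int count)
{
	return sketch_in_range(fault, addr, count);
}

/* Check as written after the patch: the length is converted to bytes. */
static bool fault_covered_new(uint64_t fault, uint64_t addr, unsigned int count)
{
	return sketch_in_range(fault, addr, (uint64_t)count * SKETCH_PAGE_SIZE);
}
```

With illustrative values addr = 0x20005000, count = 4 and a fault address of 0x20006000, the old form reports the fault address as outside the mapped range (0x1000 is not less than 4), while the new form reports it as inside (0x1000 is less than 0x4000). Whether this accounts for the reported bug is a separate question; the sketch only shows the arithmetic.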



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-10  3:02 ` Matthew Wilcox
@ 2023-09-10  3:29   ` syzbot
  2023-09-10  3:40   ` Yin, Fengwei
  2023-09-11  7:24   ` Yin Fengwei
  2 siblings, 0 replies; 24+ messages in thread
From: syzbot @ 2023-09-10  3:29 UTC
  To: akpm, fengwei.yin, linux-kernel, linux-mm, syzkaller-bugs, willy

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: Bad page map

BUG: Bad page map in process syz-executor.0  pte:fffff9b7dc120 pmd:1ce8f067
page:ffffea00019208c0 refcount:9 mapcount:-1 mapping:ffff8880766e5190 index:0x3 pfn:0x64823
head:ffffea0001920800 order:2 entire_mapcount:0 nr_pages_mapped:8388607 pincount:0
memcg:ffff88801d8b4000
aops:xfs_address_space_operations ino:244a dentry name:"bus"
flags: 0xfff0000000816c(referenced|uptodate|lru|active|private|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000000 ffffea0001920801 dead000000000122 dead000000000400
raw: 0000000000000001 0000000000000000 00000000fffffffe 0000000000000000
head: 00fff0000000816c ffffea0001a08288 ffff88807acd1030 ffff8880766e5190
head: 0000000000000000 ffff888019789200 00000009ffffffff ffff88801d8b4000
page dumped because: bad pte
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x152c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 5453, tgid 5452 (syz-executor.0), ts 76204282340, free_ts 15924727179
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1536
 prep_new_page mm/page_alloc.c:1543 [inline]
 get_page_from_freelist+0x31db/0x3360 mm/page_alloc.c:3170
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4426
 folio_alloc+0x1e/0x60 mm/mempolicy.c:2308
 filemap_alloc_folio+0xde/0x500 mm/filemap.c:976
 ra_alloc_folio mm/readahead.c:468 [inline]
 page_cache_ra_order+0x423/0xcc0 mm/readahead.c:524
 do_sync_mmap_readahead+0x444/0x850
 filemap_fault+0x7d3/0x1710 mm/filemap.c:3291
 __xfs_filemap_fault+0x286/0x960 fs/xfs/xfs_file.c:1354
 __do_fault+0x133/0x4e0 mm/memory.c:4204
 do_read_fault mm/memory.c:4568 [inline]
 do_fault mm/memory.c:4705 [inline]
 do_pte_missing mm/memory.c:3669 [inline]
 handle_pte_fault mm/memory.c:4978 [inline]
 __handle_mm_fault mm/memory.c:5119 [inline]
 handle_mm_fault+0x48d2/0x6200 mm/memory.c:5284
 faultin_page mm/gup.c:956 [inline]
 __get_user_pages+0x6bd/0x15e0 mm/gup.c:1239
 __get_user_pages_locked mm/gup.c:1504 [inline]
 get_dump_page+0x146/0x2b0 mm/gup.c:2018
 dump_user_range+0x126/0x910 fs/coredump.c:913
 elf_core_dump+0x3b75/0x4490 fs/binfmt_elf.c:2142
 do_coredump+0x1b73/0x2ab0 fs/coredump.c:764
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1136 [inline]
 free_unref_page_prepare+0x8c3/0x9f0 mm/page_alloc.c:2312
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2405
 free_contig_range+0x9e/0x150 mm/page_alloc.c:6342
 destroy_args+0x95/0x7c0 mm/debug_vm_pgtable.c:1028
 debug_vm_pgtable+0x4ac/0x540 mm/debug_vm_pgtable.c:1408
 do_one_initcall+0x23d/0x7d0 init/main.c:1232
 do_initcall_level+0x157/0x210 init/main.c:1294
 do_initcalls+0x3f/0x80 init/main.c:1310
 kernel_init_freeable+0x440/0x5d0 init/main.c:1547
 kernel_init+0x1d/0x2a0 init/main.c:1437
 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
addr:0000000020006000 vm_flags:080000d0 anon_vma:0000000000000000 mapping:ffff8880766e5190 index:5
file:bus fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio
CPU: 1 PID: 5453 Comm: syz-executor.0 Not tainted 6.5.0-syzkaller-12921-ga3c57ab79a06-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_bad_pte+0x581/0x5c0 mm/memory.c:535
 zap_pte_range mm/memory.c:1458 [inline]
 zap_pmd_range mm/memory.c:1573 [inline]
 zap_pud_range mm/memory.c:1602 [inline]
 zap_p4d_range mm/memory.c:1623 [inline]
 unmap_page_range+0x1a76/0x3300 mm/memory.c:1644
 unmap_vmas+0x209/0x3a0 mm/memory.c:1731
 exit_mmap+0x297/0xc50 mm/mmap.c:3210
 __mmput+0x115/0x3c0 kernel/fork.c:1349
 exit_mm+0x21f/0x300 kernel/exit.c:567
 do_exit+0x612/0x2290 kernel/exit.c:861
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4cbf47cae9
Code: Unable to access opcode bytes at 0x7f4cbf47cabf.
RSP: 002b:00007f4cc01920c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffe5 RBX: 00007f4cbf59bf80 RCX: 00007f4cbf47cae9
RDX: 0000000000000002 RSI: 0000000020000300 RDI: 0000000000000007
RBP: 00007f4cbf4c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008800000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f4cbf59bf80 R15: 00007fff582f4a98
 </TASK>
BUG: Bad page map in process syz-executor.0  pte:fffff9b7dd120 pmd:1ce8f067
page:ffffea0001920880 refcount:9 mapcount:-1 mapping:ffff8880766e5190 index:0x2 pfn:0x64822
head:ffffea0001920800 order:2 entire_mapcount:0 nr_pages_mapped:8388606 pincount:0
memcg:ffff88801d8b4000
aops:xfs_address_space_operations ino:244a dentry name:"bus"
flags: 0xfff0000000816c(referenced|uptodate|lru|active|private|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000000 ffffea0001920801 ffffea0001920890 ffffea0001920890
raw: 0000000000000001 0000000000000000 00000000fffffffe 0000000000000000
head: 00fff0000000816c ffffea0001a08288 ffff88807acd1030 ffff8880766e5190
head: 0000000000000000 ffff888019789200 00000009ffffffff ffff88801d8b4000
page dumped because: bad pte
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x152c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 5453, tgid 5452 (syz-executor.0), ts 76204282340, free_ts 15924721374
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1536
 prep_new_page mm/page_alloc.c:1543 [inline]
 get_page_from_freelist+0x31db/0x3360 mm/page_alloc.c:3170
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4426
 folio_alloc+0x1e/0x60 mm/mempolicy.c:2308
 filemap_alloc_folio+0xde/0x500 mm/filemap.c:976
 ra_alloc_folio mm/readahead.c:468 [inline]
 page_cache_ra_order+0x423/0xcc0 mm/readahead.c:524
 do_sync_mmap_readahead+0x444/0x850
 filemap_fault+0x7d3/0x1710 mm/filemap.c:3291
 __xfs_filemap_fault+0x286/0x960 fs/xfs/xfs_file.c:1354
 __do_fault+0x133/0x4e0 mm/memory.c:4204
 do_read_fault mm/memory.c:4568 [inline]
 do_fault mm/memory.c:4705 [inline]
 do_pte_missing mm/memory.c:3669 [inline]
 handle_pte_fault mm/memory.c:4978 [inline]
 __handle_mm_fault mm/memory.c:5119 [inline]
 handle_mm_fault+0x48d2/0x6200 mm/memory.c:5284
 faultin_page mm/gup.c:956 [inline]
 __get_user_pages+0x6bd/0x15e0 mm/gup.c:1239
 __get_user_pages_locked mm/gup.c:1504 [inline]
 get_dump_page+0x146/0x2b0 mm/gup.c:2018
 dump_user_range+0x126/0x910 fs/coredump.c:913
 elf_core_dump+0x3b75/0x4490 fs/binfmt_elf.c:2142
 do_coredump+0x1b73/0x2ab0 fs/coredump.c:764
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1136 [inline]
 free_unref_page_prepare+0x8c3/0x9f0 mm/page_alloc.c:2312
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2405
 free_contig_range+0x9e/0x150 mm/page_alloc.c:6342
 destroy_args+0x95/0x7c0 mm/debug_vm_pgtable.c:1028
 debug_vm_pgtable+0x4ac/0x540 mm/debug_vm_pgtable.c:1408
 do_one_initcall+0x23d/0x7d0 init/main.c:1232
 do_initcall_level+0x157/0x210 init/main.c:1294
 do_initcalls+0x3f/0x80 init/main.c:1310
 kernel_init_freeable+0x440/0x5d0 init/main.c:1547
 kernel_init+0x1d/0x2a0 init/main.c:1437
 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
addr:0000000020007000 vm_flags:080000d0 anon_vma:0000000000000000 mapping:ffff8880766e5190 index:6
file:bus fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio
CPU: 0 PID: 5453 Comm: syz-executor.0 Tainted: G    B              6.5.0-syzkaller-12921-ga3c57ab79a06-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_bad_pte+0x581/0x5c0 mm/memory.c:535
 zap_pte_range mm/memory.c:1458 [inline]
 zap_pmd_range mm/memory.c:1573 [inline]
 zap_pud_range mm/memory.c:1602 [inline]
 zap_p4d_range mm/memory.c:1623 [inline]
 unmap_page_range+0x1a76/0x3300 mm/memory.c:1644
 unmap_vmas+0x209/0x3a0 mm/memory.c:1731
 exit_mmap+0x297/0xc50 mm/mmap.c:3210
 __mmput+0x115/0x3c0 kernel/fork.c:1349
 exit_mm+0x21f/0x300 kernel/exit.c:567
 do_exit+0x612/0x2290 kernel/exit.c:861
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4cbf47cae9
Code: Unable to access opcode bytes at 0x7f4cbf47cabf.
RSP: 002b:00007f4cc01920c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffe5 RBX: 00007f4cbf59bf80 RCX: 00007f4cbf47cae9
RDX: 0000000000000002 RSI: 0000000020000300 RDI: 0000000000000007
RBP: 00007f4cbf4c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008800000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f4cbf59bf80 R15: 00007fff582f4a98
 </TASK>
BUG: Bad page map in process syz-executor.0  pte:fffff9b7de120 pmd:1ce8f067
page:ffffea0001920840 refcount:9 mapcount:-1 mapping:ffff8880766e5190 index:0x1 pfn:0x64821
head:ffffea0001920800 order:2 entire_mapcount:0 nr_pages_mapped:8388605 pincount:0
memcg:ffff88801d8b4000
aops:xfs_address_space_operations ino:244a dentry name:"bus"
flags: 0xfff0000000816c(referenced|uptodate|lru|active|private|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000202 ffffea0001920801 dead000000000122 fffffffdffffffff
raw: 0000000400000000 0000000000000000 00000000fffffffe 0000000000000000
head: 00fff0000000816c ffffea0001a08288 ffff88807acd1030 ffff8880766e5190
head: 0000000000000000 ffff888019789200 00000009ffffffff ffff88801d8b4000
page dumped because: bad pte
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x152c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 5453, tgid 5452 (syz-executor.0), ts 76204282340, free_ts 15924715505
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1536
 prep_new_page mm/page_alloc.c:1543 [inline]
 get_page_from_freelist+0x31db/0x3360 mm/page_alloc.c:3170
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4426
 folio_alloc+0x1e/0x60 mm/mempolicy.c:2308
 filemap_alloc_folio+0xde/0x500 mm/filemap.c:976
 ra_alloc_folio mm/readahead.c:468 [inline]
 page_cache_ra_order+0x423/0xcc0 mm/readahead.c:524
 do_sync_mmap_readahead+0x444/0x850
 filemap_fault+0x7d3/0x1710 mm/filemap.c:3291
 __xfs_filemap_fault+0x286/0x960 fs/xfs/xfs_file.c:1354
 __do_fault+0x133/0x4e0 mm/memory.c:4204
 do_read_fault mm/memory.c:4568 [inline]
 do_fault mm/memory.c:4705 [inline]
 do_pte_missing mm/memory.c:3669 [inline]
 handle_pte_fault mm/memory.c:4978 [inline]
 __handle_mm_fault mm/memory.c:5119 [inline]
 handle_mm_fault+0x48d2/0x6200 mm/memory.c:5284
 faultin_page mm/gup.c:956 [inline]
 __get_user_pages+0x6bd/0x15e0 mm/gup.c:1239
 __get_user_pages_locked mm/gup.c:1504 [inline]
 get_dump_page+0x146/0x2b0 mm/gup.c:2018
 dump_user_range+0x126/0x910 fs/coredump.c:913
 elf_core_dump+0x3b75/0x4490 fs/binfmt_elf.c:2142
 do_coredump+0x1b73/0x2ab0 fs/coredump.c:764
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1136 [inline]
 free_unref_page_prepare+0x8c3/0x9f0 mm/page_alloc.c:2312
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2405
 free_contig_range+0x9e/0x150 mm/page_alloc.c:6342
 destroy_args+0x95/0x7c0 mm/debug_vm_pgtable.c:1028
 debug_vm_pgtable+0x4ac/0x540 mm/debug_vm_pgtable.c:1408
 do_one_initcall+0x23d/0x7d0 init/main.c:1232
 do_initcall_level+0x157/0x210 init/main.c:1294
 do_initcalls+0x3f/0x80 init/main.c:1310
 kernel_init_freeable+0x440/0x5d0 init/main.c:1547
 kernel_init+0x1d/0x2a0 init/main.c:1437
 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
addr:0000000020008000 vm_flags:080000d0 anon_vma:0000000000000000 mapping:ffff8880766e5190 index:7
file:bus fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio
CPU: 1 PID: 5453 Comm: syz-executor.0 Tainted: G    B              6.5.0-syzkaller-12921-ga3c57ab79a06-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_bad_pte+0x581/0x5c0 mm/memory.c:535
 zap_pte_range mm/memory.c:1458 [inline]
 zap_pmd_range mm/memory.c:1573 [inline]
 zap_pud_range mm/memory.c:1602 [inline]
 zap_p4d_range mm/memory.c:1623 [inline]
 unmap_page_range+0x1a76/0x3300 mm/memory.c:1644
 unmap_vmas+0x209/0x3a0 mm/memory.c:1731
 exit_mmap+0x297/0xc50 mm/mmap.c:3210
 __mmput+0x115/0x3c0 kernel/fork.c:1349
 exit_mm+0x21f/0x300 kernel/exit.c:567
 do_exit+0x612/0x2290 kernel/exit.c:861
 do_group_exit+0x206/0x2c0 kernel/exit.c:1024
 get_signal+0x175d/0x1840 kernel/signal.c:2892
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop+0x6a/0x100 kernel/entry/common.c:168
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:296
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4cbf47cae9
Code: Unable to access opcode bytes at 0x7f4cbf47cabf.
RSP: 002b:00007f4cc01920c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffe5 RBX: 00007f4cbf59bf80 RCX: 00007f4cbf47cae9
RDX: 0000000000000002 RSI: 0000000020000300 RDI: 0000000000000007
RBP: 00007f4cbf4c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008800000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f4cbf59bf80 R15: 00007fff582f4a98
 </TASK>


Tested on:

commit:         a3c57ab7 iov_iter: Kunit tests for page extraction
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16a308d8680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=50ac7dadde9e1c0e
dashboard link: https://syzkaller.appspot.com/bug?extid=55cc72f8cc3a549119df
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1037a92c680000




* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-10  3:02 ` Matthew Wilcox
  2023-09-10  3:29   ` syzbot
@ 2023-09-10  3:40   ` Yin, Fengwei
  2023-09-11  7:24   ` Yin Fengwei
  2 siblings, 0 replies; 24+ messages in thread
From: Yin, Fengwei @ 2023-09-10  3:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: syzbot, akpm, fengwei.yin, linux-kernel, linux-mm, syzkaller-bugs

Hi Matthew,

On Sun, Sep 10, 2023 at 04:02:32AM +0100, Matthew Wilcox wrote:
> On Sat, Sep 09, 2023 at 10:12:48AM -0700, syzbot wrote:
> > commit 617c28ecab22d98a3809370eb6cb50fa24b7bfe1
> > Author: Yin Fengwei <fengwei.yin@intel.com>
> > Date:   Wed Aug 2 15:14:05 2023 +0000
> > 
> >     filemap: batch PTE mappings
> 
> Hmm ... I don't know if this is the bug, but ...
This is Fengwei. Sorry for replying from my private email; I can't access
my company email right now.
Yes, this is a bug, but I think it only impacts performance.

I will look at this regression. Thanks and sorry for the trouble.


Regards
Yin, Fengwei

> 
> #syz test
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 582f5317ff71..580d0b2b1a7c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3506,7 +3506,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  		if (count) {
>  			set_pte_range(vmf, folio, page, count, addr);
>  			folio_ref_add(folio, count);
> -			if (in_range(vmf->address, addr, count))
> +			if (in_range(vmf->address, addr, count * PAGE_SIZE))
>  				ret = VM_FAULT_NOPAGE;
>  		}
>  
> @@ -3520,7 +3520,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  	if (count) {
>  		set_pte_range(vmf, folio, page, count, addr);
>  		folio_ref_add(folio, count);
> -		if (in_range(vmf->address, addr, count))
> +		if (in_range(vmf->address, addr, count * PAGE_SIZE))
>  			ret = VM_FAULT_NOPAGE;
>  	}
>  
> 
> 
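
To illustrate the in_range() change quoted above, here is a minimal userspace
sketch of why the missing PAGE_SIZE factor matters. It assumes that
in_range(val, start, len) means start <= val < start + len, that pages are
4 KiB, and that the fault address is page-aligned. The sketch_* and
fault_hit_* names are made up for this example and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Assumed semantics of in_range(): true if start <= val < start + len. */
bool sketch_in_range(uint64_t val, uint64_t start, uint64_t len)
{
	/* Unsigned wraparound makes val < start compare as "out of range". */
	return (val - start) < len;
}

/* Check as written before the patch: the page count is used as a byte length. */
bool fault_hit_buggy(uint64_t fault_addr, uint64_t addr, unsigned int count)
{
	assert(count > 0);	/* the caller only checks after mapping pages */
	return sketch_in_range(fault_addr, addr, count);
}

/* Check after the patch: the length is converted from pages to bytes. */
bool fault_hit_fixed(uint64_t fault_addr, uint64_t addr, unsigned int count)
{
	assert(count > 0);
	return sketch_in_range(fault_addr, addr, count * SKETCH_PAGE_SIZE);
}
```

With addr = 0x20000000 and count = 4, a fault at 0x20002000 lies inside the
range that was just mapped, but only the fixed check reports it. In this model
the buggy check matches only when the fault address equals addr, so the
function would miss the chance to return VM_FAULT_NOPAGE for the other pages.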




* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-09 17:12 [syzbot] [mm?] BUG: Bad page map (7) syzbot
  2023-09-10  3:02 ` Matthew Wilcox
@ 2023-09-11  7:12 ` Yin Fengwei
  2023-09-11  7:48   ` syzbot
  2023-09-11 13:26   ` Matthew Wilcox
  1 sibling, 2 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-09-11  7:12 UTC (permalink / raw)
  To: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs, willy



On 9/10/23 01:12, syzbot wrote:
> commit 617c28ecab22d98a3809370eb6cb50fa24b7bfe1
> Author: Yin Fengwei <fengwei.yin@intel.com>
> Date:   Wed Aug 2 15:14:05 2023 +0000
> 
>     filemap: batch PTE mappings

#syz test

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index a629b1b9f65a6..2701b47efa8f7 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -168,6 +168,28 @@ static inline void native_pgd_clear(pgd_t *pgd)
        native_set_pgd(pgd, native_make_pgd(0));
 }
 
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+               pte_t *ptep, pte_t pte, unsigned int nr)
+{
+       bool protnone = (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
+                       == _PAGE_PROTNONE;
+
+       page_table_check_ptes_set(mm, ptep, pte, nr);
+
+       for(;;) {
+               native_set_pte(ptep, pte);
+               if (--nr == 0)
+                       break;
+
+               ptep++;
+               if (protnone)
+                       pte = __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
+               else
+                       pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+       }
+}
+#define set_ptes set_ptes
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-10  3:02 ` Matthew Wilcox
  2023-09-10  3:29   ` syzbot
  2023-09-10  3:40   ` Yin, Fengwei
@ 2023-09-11  7:24   ` Yin Fengwei
  2023-09-11  7:32     ` Yin Fengwei
  2 siblings, 1 reply; 24+ messages in thread
From: Yin Fengwei @ 2023-09-11  7:24 UTC (permalink / raw)
  To: Matthew Wilcox, syzbot; +Cc: akpm, linux-kernel, linux-mm, syzkaller-bugs

Hi Matthew,

On 9/10/23 11:02, Matthew Wilcox wrote:
> On Sat, Sep 09, 2023 at 10:12:48AM -0700, syzbot wrote:
>> commit 617c28ecab22d98a3809370eb6cb50fa24b7bfe1
>> Author: Yin Fengwei <fengwei.yin@intel.com>
>> Date:   Wed Aug 2 15:14:05 2023 +0000
>>
>>     filemap: batch PTE mappings
> 
> Hmm ... I don't know if this is the bug, but ...
I do think we should merge your patch here. LKP already noticed some performance
regressions. I suppose this patch can fix some of them.


I root-caused this "bad page map" issue in my local env. It is related to
protnone PTEs on x86_64. If the PTE is not protnone, advancing it by adding
1UL << PFN_PTE_SHIFT is correct. But if the PTE is protnone, we should subtract
1UL << PFN_PTE_SHIFT instead. I only realized this when I saw that pfn_pte()
does pfn ^= protnone_mask().
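
The add-versus-subtract rule can be checked with a small userspace model of
the PFN field alone. This is a sketch under assumptions, not the kernel's
code: it treats the PFN field as a standalone 40-bit value, models inversion
as a bitwise NOT of that field, and uses made-up pfn_field_* names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFN_FIELD_BITS 40	/* assumption: width of the modelled PFN field */
#define PFN_FIELD_MASK ((1ULL << PFN_FIELD_BITS) - 1)

/* Encode a PFN the way it would sit in the field (inverted for protnone). */
uint64_t pfn_field_encode(uint64_t pfn, bool inverted)
{
	assert(pfn <= PFN_FIELD_MASK);
	return (inverted ? ~pfn : pfn) & PFN_FIELD_MASK;
}

/* Recover the PFN from the stored field. */
uint64_t pfn_field_decode(uint64_t field, bool inverted)
{
	return (inverted ? ~field : field) & PFN_FIELD_MASK;
}

/*
 * Step the stored field so that it describes the next page.
 * Because ~(pfn + 1) == ~pfn - 1, an inverted field must be decremented.
 */
uint64_t pfn_field_next(uint64_t field, bool inverted)
{
	return (inverted ? field - 1 : field + 1) & PFN_FIELD_MASK;
}
```

In this model, incrementing an inverted field that encodes PFN 0x1920 decodes
to 0x191f, the previous page, while decrementing it decodes to 0x1921.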


The reproducer calls mmap() with PROT_NONE, then triggers SIGXFSZ, which creates
a core file. That causes GUP with FOLL_FORCE, which creates a protnone PTE.

I submitted a request to syzbot to test the fix that worked in my local env. Thanks.


Regards
Yin, Fengwei

> 
> #syz test
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 582f5317ff71..580d0b2b1a7c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3506,7 +3506,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  		if (count) {
>  			set_pte_range(vmf, folio, page, count, addr);
>  			folio_ref_add(folio, count);
> -			if (in_range(vmf->address, addr, count))
> +			if (in_range(vmf->address, addr, count * PAGE_SIZE))
>  				ret = VM_FAULT_NOPAGE;
>  		}
>  
> @@ -3520,7 +3520,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  	if (count) {
>  		set_pte_range(vmf, folio, page, count, addr);
>  		folio_ref_add(folio, count);
> -		if (in_range(vmf->address, addr, count))
> +		if (in_range(vmf->address, addr, count * PAGE_SIZE))
>  			ret = VM_FAULT_NOPAGE;
>  	}
>  
> 



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11  7:24   ` Yin Fengwei
@ 2023-09-11  7:32     ` Yin Fengwei
  0 siblings, 0 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-09-11  7:32 UTC (permalink / raw)
  To: Matthew Wilcox, syzbot; +Cc: akpm, linux-kernel, linux-mm, syzkaller-bugs



On 9/11/23 15:24, Yin Fengwei wrote:
> Hi Matthew,
> 
> On 9/10/23 11:02, Matthew Wilcox wrote:
>> On Sat, Sep 09, 2023 at 10:12:48AM -0700, syzbot wrote:
>>> commit 617c28ecab22d98a3809370eb6cb50fa24b7bfe1
>>> Author: Yin Fengwei <fengwei.yin@intel.com>
>>> Date:   Wed Aug 2 15:14:05 2023 +0000
>>>
>>>     filemap: batch PTE mappings
>>
>> Hmm ... I don't know if this is the bug, but ...
> I do think we should merge your patch here. LKP already noticed some performance
> regressions. I suppose this patch can fix some of them.
I will verify whether this patch fixes the regressions noticed by LKP and
will keep you updated on any progress. Thanks.


Regards
Yin, Fengwei

> 
> 
> I root-caused this "bad page map" issue in my local env. It is related to
> protnone PTEs on x86_64. If the PTE is not protnone, advancing it by adding
> 1UL << PFN_PTE_SHIFT is correct. But if the PTE is protnone, we should subtract
> 1UL << PFN_PTE_SHIFT instead. I only realized this when I saw that pfn_pte()
> does pfn ^= protnone_mask().
> 
> 
> The reproducer calls mmap() with PROT_NONE, then triggers SIGXFSZ, which creates
> a core file. That causes GUP with FOLL_FORCE, which creates a protnone PTE.
> 
> I submitted a request to syzbot to test the fix that worked in my local env. Thanks.
> 
> 
> Regards
> Yin, Fengwei
> 
>>
>> #syz test
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 582f5317ff71..580d0b2b1a7c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3506,7 +3506,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  		if (count) {
>>  			set_pte_range(vmf, folio, page, count, addr);
>>  			folio_ref_add(folio, count);
>> -			if (in_range(vmf->address, addr, count))
>> +			if (in_range(vmf->address, addr, count * PAGE_SIZE))
>>  				ret = VM_FAULT_NOPAGE;
>>  		}
>>  
>> @@ -3520,7 +3520,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  	if (count) {
>>  		set_pte_range(vmf, folio, page, count, addr);
>>  		folio_ref_add(folio, count);
>> -		if (in_range(vmf->address, addr, count))
>> +		if (in_range(vmf->address, addr, count * PAGE_SIZE))
>>  			ret = VM_FAULT_NOPAGE;
>>  	}
>>  
>>



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11  7:12 ` Yin Fengwei
@ 2023-09-11  7:48   ` syzbot
  2023-09-11 13:26   ` Matthew Wilcox
  1 sibling, 0 replies; 24+ messages in thread
From: syzbot @ 2023-09-11  7:48 UTC (permalink / raw)
  To: akpm, fengwei.yin, linux-kernel, linux-mm, syzkaller-bugs, willy

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com

Tested on:

commit:         0bb80ecc Linux 6.6-rc1
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=174c0ad8680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=13f2a37749f07ab2
dashboard link: https://syzkaller.appspot.com/bug?extid=55cc72f8cc3a549119df
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1421990c680000

Note: testing is done by a robot and is best-effort only.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11  7:12 ` Yin Fengwei
  2023-09-11  7:48   ` syzbot
@ 2023-09-11 13:26   ` Matthew Wilcox
  2023-09-11 14:00     ` syzbot
  2023-09-11 15:34     ` Dave Hansen
  1 sibling, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2023-09-11 13:26 UTC (permalink / raw)
  To: Yin Fengwei; +Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On Mon, Sep 11, 2023 at 03:12:27PM +0800, Yin Fengwei wrote:
>  
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +               pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +       bool protnone = (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
> +                       == _PAGE_PROTNONE;
> +
> +       page_table_check_ptes_set(mm, ptep, pte, nr);
> +
> +       for(;;) {
> +               native_set_pte(ptep, pte);
> +               if (--nr == 0)
> +                       break;
> +
> +               ptep++;
> +               if (protnone)
> +                       pte = __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
> +               else
> +                       pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +       }
> +}
> +#define set_ptes set_ptes

Thanks for figuring this out.  I don't think I would have been able to!

I think this solution probably breaks pgtable-2level configs,
unfortunately.  How about this?  If other architectures decide to adopt
the inverted page table entry in the future, it'll work for them too.

#syz test

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index e9482a11ac52..a89be3e9b032 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -123,9 +123,6 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
 	return val;
 }
 
-static inline bool __pte_needs_invert(u64 val)
-{
-	return false;
-}
+#define __pte_needs_invert(val)	false
 
 #endif /* _ASM_X86_PGTABLE_2LEVEL_H */
diff --git a/arch/x86/include/asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h
index a0c1525f1b6f..f21726add655 100644
--- a/arch/x86/include/asm/pgtable-invert.h
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -17,6 +17,7 @@ static inline bool __pte_needs_invert(u64 val)
 {
 	return val && !(val & _PAGE_PRESENT);
 }
+#define __pte_needs_invert __pte_needs_invert
 
 /* Get a mask to xor with the page table entry to get the correct pfn. */
 static inline u64 protnone_mask(u64 val)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1fba072b3dac..34b12e94b850 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -205,6 +205,10 @@ static inline int pmd_young(pmd_t pmd)
 #define arch_flush_lazy_mmu_mode()	do {} while (0)
 #endif
 
+#ifndef __pte_needs_invert
+#define __pte_needs_invert(pte)	false
+#endif
+
 #ifndef set_ptes
 /**
  * set_ptes - Map consecutive pages to a contiguous range of addresses.
@@ -231,7 +235,10 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		if (--nr == 0)
 			break;
 		ptep++;
-		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+		if (__pte_needs_invert(pte_val(pte)))
+			pte = __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
+		else
+			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
 	}
 	arch_leave_lazy_mmu_mode();
 }



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 13:26   ` Matthew Wilcox
@ 2023-09-11 14:00     ` syzbot
  2023-09-11 15:34     ` Dave Hansen
  1 sibling, 0 replies; 24+ messages in thread
From: syzbot @ 2023-09-11 14:00 UTC (permalink / raw)
  To: akpm, fengwei.yin, linux-kernel, linux-mm, syzkaller-bugs, willy

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com

Tested on:

commit:         0bb80ecc Linux 6.6-rc1
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1010b50c680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=13f2a37749f07ab2
dashboard link: https://syzkaller.appspot.com/bug?extid=55cc72f8cc3a549119df
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=155d6578680000

Note: testing is done by a robot and is best-effort only.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 13:26   ` Matthew Wilcox
  2023-09-11 14:00     ` syzbot
@ 2023-09-11 15:34     ` Dave Hansen
  2023-09-11 16:44       ` Matthew Wilcox
  1 sibling, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2023-09-11 15:34 UTC (permalink / raw)
  To: Matthew Wilcox, Yin Fengwei
  Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On 9/11/23 06:26, Matthew Wilcox wrote:
> @@ -231,7 +235,10 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>  		if (--nr == 0)
>  			break;
>  		ptep++;
> -		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +		if (__pte_needs_invert(pte_val(pte)))
> +			pte = __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
> +		else
> +			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
>  	}
>  	arch_leave_lazy_mmu_mode();
>  }

This is much better than a whole x86 fork of set_ptes().  But it's still
a bit wonky because it exposes the PTE inversion logic to generic code.

Could we do something like this instead?  It'll (probably) end up
repeating the PTE inversion logic each way through the loop, so it's less
efficient than what you have above.  But unless I buggered something, it
"just works" without exposing any of the inversion logic to generic code.

The trick is that pte_pfn() undoes the inversion and then pfn_pte()
re-does it on each trip through the loop.

static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                pte_t *ptep, pte_t pte, unsigned int nr)
{
	pgprot_t prot = pte_pgprot(x);
	unsigned long pfn = pte_pfn(pte);

        page_table_check_ptes_set(mm, ptep, pte, nr);

        arch_enter_lazy_mmu_mode();
        for (;;) {
                set_pte(ptep, pte);
                if (--nr == 0)
                        break;
                ptep++;
		pfn++;
                pte = pfn_pte(pfn, pgprot);
        }
        arch_leave_lazy_mmu_mode();
}

Obviously completely untested. :)
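
As a sanity check on the idea, here is a userspace sketch comparing the two
strategies discussed in this thread: re-encoding each entry from (pfn, prot),
and stepping the raw value up or down. It is a simplified model, not kernel
code. The bit layout (present in bit 0, protnone in bit 8, PFN in bits 12-51)
and every m_* / M_* name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define M_PAGE_SHIFT	12
#define M_PRESENT	(1ULL << 0)
#define M_PROTNONE	(1ULL << 8)
#define M_PFN_MASK	0x000ffffffffff000ULL	/* bits 12..51 */

/* Modelled inversion rule: non-zero and not present. */
bool m_needs_invert(uint64_t val)
{
	return val && !(val & M_PRESENT);
}

/* Build an entry; the PFN field is stored inverted when prot asks for it. */
uint64_t m_pfn_pte(uint64_t pfn, uint64_t prot)
{
	uint64_t field = pfn << M_PAGE_SHIFT;

	if (m_needs_invert(prot))
		field = ~field;
	return (field & M_PFN_MASK) | prot;
}

/* Extract the PFN, undoing the inversion if needed. */
uint64_t m_pte_pfn(uint64_t pte)
{
	uint64_t field = m_needs_invert(pte) ? ~pte : pte;

	return (field & M_PFN_MASK) >> M_PAGE_SHIFT;
}

/* Strategy 1: decode once, then re-encode every entry from (pfn, prot). */
void m_fill_roundtrip(uint64_t *ptes, uint64_t pte, size_t nr)
{
	uint64_t prot = pte & ~M_PFN_MASK;
	uint64_t pfn = m_pte_pfn(pte);

	assert(ptes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		ptes[i] = m_pfn_pte(pfn + i, prot);
}

/* Strategy 2: step the raw value, subtracting when the field is inverted. */
void m_fill_step(uint64_t *ptes, uint64_t pte, size_t nr)
{
	assert(ptes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		ptes[i] = pte;
		if (m_needs_invert(pte))
			pte -= 1ULL << M_PAGE_SHIFT;
		else
			pte += 1ULL << M_PAGE_SHIFT;
	}
}
```

In this model both functions fill identical arrays for present and protnone
entries alike. The difference is only in where the inversion knowledge lives
and how much work each loop iteration does.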



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 15:34     ` Dave Hansen
@ 2023-09-11 16:44       ` Matthew Wilcox
  2023-09-11 16:55         ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2023-09-11 16:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On Mon, Sep 11, 2023 at 08:34:57AM -0700, Dave Hansen wrote:
> On 9/11/23 06:26, Matthew Wilcox wrote:
> > @@ -231,7 +235,10 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> >  		if (--nr == 0)
> >  			break;
> >  		ptep++;
> > -		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> > +		if (__pte_needs_invert(pte_val(pte)))
> > +			pte = __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
> > +		else
> > +			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> >  	}
> >  	arch_leave_lazy_mmu_mode();
> >  }
> 
> This is much better than a whole x86 fork of set_ptes().  But it's still
> a bit wonky because it exposes the PTE inversion logic to generic code.

I saw that as an advantage ... let people know that it exists as a
concept.

> static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>                 pte_t *ptep, pte_t pte, unsigned int nr)
> {
> 	pgprot_t prot = pte_pgprot(x);
> 	unsigned long pfn = pte_pfn(pte);
> 
>         page_table_check_ptes_set(mm, ptep, pte, nr);
> 
>         arch_enter_lazy_mmu_mode();
>         for (;;) {
>                 set_pte(ptep, pte);
>                 if (--nr == 0)
>                         break;
>                 ptep++;
> 		pfn++;
>                 pte = pfn_pte(pfn, pgprot);
>         }
>         arch_leave_lazy_mmu_mode();
> }
> 
> Obviously completely untested. :)

After fixing your two typos, this assembles to 176 bytes more code than
my version.  Not sure that's great.

How about this?  Keeps the inverted knowledge entirely in arch/x86.
Compiles to exactly the same code as the version I sent earlier.

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index d6ad98ca1288..c9781b8b14af 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -955,6 +955,14 @@ static inline int pte_same(pte_t a, pte_t b)
 	return a.pte == b.pte;
 }
 
+static inline pte_t pte_next(pte_t pte)
+{
+	if (__pte_needs_invert(pte_val(pte)))
+		return __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
+	return __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+}
+#define pte_next	pte_next
+
 static inline int pte_present(pte_t a)
 {
 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1fba072b3dac..7a932ed59c27 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -205,6 +205,10 @@ static inline int pmd_young(pmd_t pmd)
 #define arch_flush_lazy_mmu_mode()	do {} while (0)
 #endif
 
+#ifndef pte_next
+#define pte_next(pte)	((pte) + (1UL << PFN_PTE_SHIFT))
+#endif
+
 #ifndef set_ptes
 /**
  * set_ptes - Map consecutive pages to a contiguous range of addresses.
@@ -231,7 +235,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		if (--nr == 0)
 			break;
 		ptep++;
-		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+		pte = pte_next(pte);
 	}
 	arch_leave_lazy_mmu_mode();
 }



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 16:44       ` Matthew Wilcox
@ 2023-09-11 16:55         ` Dave Hansen
  2023-09-11 19:12           ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2023-09-11 16:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On 9/11/23 09:44, Matthew Wilcox wrote:
>> static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>>                 pte_t *ptep, pte_t pte, unsigned int nr)
>> {
>> 	pgprot_t prot = pte_pgprot(x);
>> 	unsigned long pfn = pte_pfn(pte);
>>
>>         page_table_check_ptes_set(mm, ptep, pte, nr);
>>
>>         arch_enter_lazy_mmu_mode();
>>         for (;;) {
>>                 set_pte(ptep, pte);
>>                 if (--nr == 0)
>>                         break;
>>                 ptep++;
>> 		pfn++;
>>                 pte = pfn_pte(pfn, pgprot);
>>         }
>>         arch_leave_lazy_mmu_mode();
>> }
>>
>> Obviously completely untested. 😄
> After fixing your two typos, this assembles to 176 bytes more code than
> my version.  Not sure that's great.

Heh, only two? ;)

Maybe I'm a fool, but 176 bytes of text bloat isn't scaring me off too
much.  I'd much rather have that than another window into x86 goofiness
to maintain.

Does that 176 bytes translate into meaningful performance, or is it just
a bunch of register bit twiddling that the CPU will sail through?



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 16:55         ` Dave Hansen
@ 2023-09-11 19:12           ` Matthew Wilcox
  2023-09-11 20:22             ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2023-09-11 19:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On Mon, Sep 11, 2023 at 09:55:37AM -0700, Dave Hansen wrote:
> On 9/11/23 09:44, Matthew Wilcox wrote:
> > After fixing your two typos, this assembles to 176 bytes more code than
> > my version.  Not sure that's great.
> 
> Maybe I'm a fool, but 176 bytes of text bloat isn't scaring me off too
> much.  I'd much rather have that than another window into x86 goofiness
> to maintain.
> 
> Does that 176 bytes translate into meaningful performance, or is it just
> a bunch of register bit twiddling that the CPU will sail through?

I'm ... not sure how to tell.  It's 1120 bytes vs 944 bytes and crawling
through that much x86 assembly isn't my idea of a great time.  I can
send you objdump -dr for all three options if you like?  Maybe there's
a quick way to compare them that I've never known about.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 19:12           ` Matthew Wilcox
@ 2023-09-11 20:22             ` Dave Hansen
  2023-09-12  4:59               ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2023-09-11 20:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On 9/11/23 12:12, Matthew Wilcox wrote:
> On Mon, Sep 11, 2023 at 09:55:37AM -0700, Dave Hansen wrote:
>> On 9/11/23 09:44, Matthew Wilcox wrote:
>>> After fixing your two typos, this assembles to 176 bytes more code than
>>> my version.  Not sure that's great.
>> Maybe I'm a fool, but 176 bytes of text bloat isn't scaring me off too
>> much.  I'd much rather have that than another window into x86 goofiness
>> to maintain.
>>
>> Does that 176 bytes translate into meaningful performance, or is it just
>> a bunch of register bit twiddling that the CPU will sail through?
> I'm ... not sure how to tell.  It's 1120 bytes vs 944 bytes and crawling
> through that much x86 assembly isn't my idea of a great time.  I can
> send you objdump -dr for all three options if you like?  Maybe there's
> a quick way to compare them that I've never known about.

Working patches would be great if you've got 'em handy, plus your
.config and generally what compiler you're on.

I'll see if there's anything silly happening that's causing the
generated code to blow up.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-11 20:22             ` Dave Hansen
@ 2023-09-12  4:59               ` Matthew Wilcox
  2023-09-12 16:07                 ` Dave Hansen
                                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Matthew Wilcox @ 2023-09-12  4:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On Mon, Sep 11, 2023 at 01:22:51PM -0700, Dave Hansen wrote:
> On 9/11/23 12:12, Matthew Wilcox wrote:
> > On Mon, Sep 11, 2023 at 09:55:37AM -0700, Dave Hansen wrote:
> >> On 9/11/23 09:44, Matthew Wilcox wrote:
> >>> After fixing your two typos, this assembles to 176 bytes more code than
> >>> my version.  Not sure that's great.
> >> Maybe I'm a fool, but 176 bytes of text bloat isn't scaring me off too
> >> much.  I'd much rather have that than another window into x86 goofiness
> >> to maintain.
> >>
> >> Does that 176 bytes translate into meaningful performance, or is it just
> >> a bunch of register bit twiddling that the CPU will sail through?
> > I'm ... not sure how to tell.  It's 1120 bytes vs 944 bytes and crawling
> > through that much x86 assembly isn't my idea of a great time.  I can
> > send you objdump -dr for all three options if you like?  Maybe there's
> > a quick way to compare them that I've never known about.
> 
> Working patches would be great if you've got 'em handy, plus your
> .config and generally what compiler you're on.

gcc (Debian 13.2.0-2) 13.2.0

I don't think there's anything particularly strange about my .config

If you compile this patch as-is, you'll get your preferred code.
Remove the #define DH and you get mine.

I would say that 176 bytes is 3 cachelines of I$, which isn't free,
even if all the insns in it can be executed while the CPU is waiting
for cache misses.  This ought to be a pretty tight loop anyway; we're
just filling in adjacent PTEs.  There may not be many spare cycles
for "free" uops to execute.
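
The cacheline figure follows from simple arithmetic, sketched here. The
64-byte line size is an assumption about typical x86 parts, and the helper
name is made up. The result is a lower bound, since unaligned code can
straddle one more line.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum number of cache lines needed to hold `bytes` bytes of code. */
size_t min_cache_lines(size_t bytes, size_t line_bytes)
{
	assert(line_bytes > 0);
	return (bytes + line_bytes - 1) / line_bytes;	/* round up */
}
```

With the sizes quoted above, 1120 - 944 = 176 bytes, and
min_cache_lines(176, 64) gives 3.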

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index d6ad98ca1288..c9781b8b14af 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -955,6 +955,14 @@ static inline int pte_same(pte_t a, pte_t b)
 	return a.pte == b.pte;
 }
 
+static inline pte_t pte_next(pte_t pte)
+{
+	if (__pte_needs_invert(pte_val(pte)))
+		return __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
+	return __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+}
+#define pte_next	pte_next
+
 static inline int pte_present(pte_t a)
 {
 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1fba072b3dac..25333cf3c865 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -205,6 +205,10 @@ static inline int pmd_young(pmd_t pmd)
 #define arch_flush_lazy_mmu_mode()	do {} while (0)
 #endif
 
+#ifndef pte_next
+#define pte_next(pte)	((pte) + (1UL << PFN_PTE_SHIFT))
+#endif
+
 #ifndef set_ptes
 /**
  * set_ptes - Map consecutive pages to a contiguous range of addresses.
@@ -223,6 +227,11 @@ static inline int pmd_young(pmd_t pmd)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pte, unsigned int nr)
 {
+#define DH
+#ifdef DH
+	pgprot_t prot = pte_pgprot(pte);
+	unsigned long pfn = pte_pfn(pte);
+#endif
 	page_table_check_ptes_set(mm, ptep, pte, nr);
 
 	arch_enter_lazy_mmu_mode();
@@ -231,7 +240,12 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		if (--nr == 0)
 			break;
 		ptep++;
-		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+#ifdef DH
+		pfn++;
+		pte = pfn_pte(pfn, prot);
+#else
+		pte = pte_next(pte);
+#endif
 	}
 	arch_leave_lazy_mmu_mode();
 }



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-12  4:59               ` Matthew Wilcox
@ 2023-09-12 16:07                 ` Dave Hansen
  2023-09-12 18:01                 ` Dave Hansen
  2023-09-14  7:33                 ` Yin Fengwei
  2 siblings, 0 replies; 24+ messages in thread
From: Dave Hansen @ 2023-09-12 16:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On 9/11/23 21:59, Matthew Wilcox wrote:
> I don't think there's anything particularly strange about my .config

I just saw some DEBUG_VM #ifdefs around the area and wondered if any of
them were to blame for the bloat.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-12  4:59               ` Matthew Wilcox
  2023-09-12 16:07                 ` Dave Hansen
@ 2023-09-12 18:01                 ` Dave Hansen
  2023-09-14  7:33                 ` Yin Fengwei
  2 siblings, 0 replies; 24+ messages in thread
From: Dave Hansen @ 2023-09-12 18:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin Fengwei, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On 9/11/23 21:59, Matthew Wilcox wrote:
> On Mon, Sep 11, 2023 at 01:22:51PM -0700, Dave Hansen wrote:
>> On 9/11/23 12:12, Matthew Wilcox wrote:
>>> On Mon, Sep 11, 2023 at 09:55:37AM -0700, Dave Hansen wrote:
>>>> On 9/11/23 09:44, Matthew Wilcox wrote:
>>>>> After fixing your two typos, this assembles to 176 bytes more code than
>>>>> my version.  Not sure that's great.
>>>> Maybe I'm a fool, but 176 bytes of text bloat isn't scaring me off too
>>>> much.  I'd much rather have that than another window into x86 goofiness
>>>> to maintain.
>>>>
>>>> Does that 176 bytes translate into meaningful performance, or is it just
>>>> a bunch of register bit twiddling that the CPU will sail through?
>>> I'm ... not sure how to tell.  It's 1120 bytes vs 944 bytes and crawling
>>> through that much x86 assembly isn't my idea of a great time.  I can
>>> send you objdump -dr for all three options if you like?  Maybe there's
>>> a quick way to compare them that I've never known about.
>> Working patches would be great if you're got 'em handy, plus your
>> .config and generally what compiler you're on.
> gcc (Debian 13.2.0-2) 13.2.0
> 
> I don't think there's anything particularly strange about my .config
> 
> If you compile this patch as-is, you'll get your preferred code.
> Remove the #define DH and you get mine.
> 
> I would say that 176 bytes is 3 cachelines of I$, which isn't free,
> even if all the insns in it can be executed while the CPU is waiting
> for cache misses.  This ought to be a pretty tight loop anyway; we're
> just filling in adjacent PTEs.  There may not be many spare cycles
> for "free" uops to execute.

Thanks for that!

I went poking at it a bit.  One remarkable thing is how many pv_ops
calls there are.  Those are definitely keeping the compiler from helping
us out here too much.

Your version has 9 pv_ops calls while mine has 6.  So mine may have more
instructions in _this_ function, but it could easily be made up for by
call overhead and extra instructions in the pv_ops.

Also, I went looking for a way to poke at set_ptes() and profile it a
bit and get some actual numbers.  It seems like in most cases it would
be limited to use via fault around.  Is there some other way to poke at
it easily?

So, in the end, I see code which is not (as far as I can see) in a hot
path, and (again, to me) there's no compelling performance argument one
way or another.

I still like my version.  *Known* simplicity and uniformity win out in
my book over unknown performance benefits.

But, fixing the bug is the most important thing.  I don't feel strongly
enough about it to NAK your version either.



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-12  4:59               ` Matthew Wilcox
  2023-09-12 16:07                 ` Dave Hansen
  2023-09-12 18:01                 ` Dave Hansen
@ 2023-09-14  7:33                 ` Yin Fengwei
  2023-09-14  8:37                   ` Yin Fengwei
  2023-09-19  1:11                   ` Yin Fengwei
  2 siblings, 2 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-09-14  7:33 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Hansen
  Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

Hi Matthew,

On 9/12/23 12:59, Matthew Wilcox wrote:
> On Mon, Sep 11, 2023 at 01:22:51PM -0700, Dave Hansen wrote:
>> On 9/11/23 12:12, Matthew Wilcox wrote:
>>> On Mon, Sep 11, 2023 at 09:55:37AM -0700, Dave Hansen wrote:
>>>> On 9/11/23 09:44, Matthew Wilcox wrote:
>>>>> After fixing your two typos, this assembles to 176 bytes more code than
>>>>> my version.  Not sure that's great.
>>>> Maybe I'm a fool, but 176 bytes of text bloat isn't scaring me off too
>>>> much.  I'd much rather have that than another window into x86 goofiness
>>>> to maintain.
>>>>
>>>> Does that 176 bytes translate into meaningful performance, or is it just
>>>> a bunch of register bit twiddling that the CPU will sail through?
>>> I'm ... not sure how to tell.  It's 1120 bytes vs 944 bytes and crawling
>>> through that much x86 assembly isn't my idea of a great time.  I can
>>> send you objdump -dr for all three options if you like?  Maybe there's
>>> a quick way to compare them that I've never known about.
>>
>> Working patches would be great if you're got 'em handy, plus your
>> .config and generally what compiler you're on.
> 
> gcc (Debian 13.2.0-2) 13.2.0
> 
> I don't think there's anything particularly strange about my .config
> 
> If you compile this patch as-is, you'll get your preferred code.
> Remove the #define DH and you get mine.
> 
> I would say that 176 bytes is 3 cachelines of I$, which isn't free,
> even if all the insns in it can be executed while the CPU is waiting
> for cache misses.  This ought to be a pretty tight loop anyway; we're
> just filling in adjacent PTEs.  There may not be many spare cycles
> for "free" uops to execute.
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index d6ad98ca1288..c9781b8b14af 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -955,6 +955,14 @@ static inline int pte_same(pte_t a, pte_t b)
>  	return a.pte == b.pte;
>  }
>  
> +static inline pte_t pte_next(pte_t pte)
> +{
> +	if (__pte_needs_invert(pte_val(pte)))
> +		return __pte(pte_val(pte) - (1UL << PFN_PTE_SHIFT));
> +	return __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +}
> +#define pte_next	pte_next
> +
>  static inline int pte_present(pte_t a)
>  {
>  	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 1fba072b3dac..25333cf3c865 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -205,6 +205,10 @@ static inline int pmd_young(pmd_t pmd)
>  #define arch_flush_lazy_mmu_mode()	do {} while (0)
>  #endif
>  
> +#ifndef pte_next
> +#define pte_next(pte)	((pte) + (1UL << PFN_PTE_SHIFT))
> +#endif
> +
>  #ifndef set_ptes
>  /**
>   * set_ptes - Map consecutive pages to a contiguous range of addresses.
> @@ -223,6 +227,11 @@ static inline int pmd_young(pmd_t pmd)
>  static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>  		pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +#define DH
> +#ifdef DH
> +	pgprot_t prot = pte_pgprot(pte);
> +	unsigned long pfn = pte_pfn(pte);
> +#endif
>  	page_table_check_ptes_set(mm, ptep, pte, nr);
>  
>  	arch_enter_lazy_mmu_mode();
> @@ -231,7 +240,12 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>  		if (--nr == 0)
>  			break;
>  		ptep++;
> -		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +#ifdef DH
> +		pfn++;
> +		pte = pfn_pte(pfn, prot);
> +#else
> +		pte = pte_next(pte);
> +#endif
>  	}
>  	arch_leave_lazy_mmu_mode();
>  }

I checked the commit message of 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c:
    The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
    pte/pmd/pud_pfn undo it.
    
    This assume that no code path touches the PFN part of a PTE directly
    without using these primitives.

So maybe we should always use these APIs, even if we add an x86-specific set_ptes()?
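
To make the concern concrete, here is a minimal user-space sketch of the
inversion. It is not the kernel code: the bit layout and the helper names
(mk_pte, get_pfn, get_prot, next_raw, next_via_pfn, next_sign_aware) are
made up for this example. Only the idea comes from the discussion: a
non-present entry stores its PFN bits inverted, so stepping the raw value
forward moves the decoded PFN backwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified layout for illustration only. */
#define PAGE_SHIFT    12
#define PAGE_PRESENT  0x001ULL
#define PAGE_PROTNONE 0x100ULL
#define PFN_MASK      0x000ffffffffff000ULL	/* bits 12..51 hold the PFN */

/* A non-zero, non-present value has its PFN bits stored inverted. */
static int needs_invert(uint64_t val)
{
	return val && !(val & PAGE_PRESENT);
}

static uint64_t invert_mask(uint64_t val)
{
	return needs_invert(val) ? ~0ULL : 0;
}

/* Build an entry through the "primitive": the inversion is applied here. */
static uint64_t mk_pte(uint64_t pfn, uint64_t prot)
{
	uint64_t v = pfn << PAGE_SHIFT;

	assert((v & ~PFN_MASK) == 0);	/* PFN must fit in the field */
	v ^= invert_mask(prot);
	return (v & PFN_MASK) | prot;
}

/* Decode the PFN through the "primitive": the inversion is undone here. */
static uint64_t get_pfn(uint64_t pte)
{
	uint64_t v = pte ^ invert_mask(pte);

	return (v & PFN_MASK) >> PAGE_SHIFT;
}

static uint64_t get_prot(uint64_t pte)
{
	return pte & ~PFN_MASK;
}

/* Touches the PFN bits directly: wrong for inverted entries. */
static uint64_t next_raw(uint64_t pte)
{
	return pte + (1ULL << PAGE_SHIFT);
}

/* Decode, increment, re-encode: always goes through the primitives. */
static uint64_t next_via_pfn(uint64_t pte)
{
	return mk_pte(get_pfn(pte) + 1, get_prot(pte));
}

/* Step the raw value, but in the direction the encoding requires. */
static uint64_t next_sign_aware(uint64_t pte)
{
	if (needs_invert(pte))
		return pte - (1ULL << PAGE_SHIFT);
	return pte + (1ULL << PAGE_SHIFT);
}
```

With these definitions, an entry built as mk_pte(0x1234, PAGE_PROTNONE)
decodes to PFN 0x1233 after next_raw(), while next_via_pfn() and
next_sign_aware() both give 0x1235. For a present entry all three agree.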

I will find a test machine to measure the performance difference of these two
versions by using xfs + will-it-scale. Will keep you guys updated.


Regards
Yin, Fengwei



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-14  7:33                 ` Yin Fengwei
@ 2023-09-14  8:37                   ` Yin Fengwei
  2023-09-19  1:11                   ` Yin Fengwei
  1 sibling, 0 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-09-14  8:37 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Hansen
  Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs



On 9/14/23 15:33, Yin Fengwei wrote:
> Hi Matthew,
> 
> [...]
> 
> I will find a test machine to measure the performance difference of these two
> versions by using xfs + will-it-scale. Will keep you guys updated.
I ran the test from here (https://github.com/antonblanchard/will-it-scale/pull/37)
on an IceLake machine with 48C/96T and 192G RAM.

The host filesystem is ext4 (I can't change it to xfs), so I created a disk image,
formatted it as xfs and mounted it on the test directory.


The test results are as follows:

        Matthew's version    Dave's version    delta
run1    379045929            375241566
run2    377870413            373950068
run3    378623159            371884035
run4    376890127            372391340
avg     378107407            373366752.3       -1.23%
stddev  0.20%                0.40%

run1,2,3,4 use: page_fault4_processes -s 2 -t 96

        Matthew's version    Dave's version    delta
run5    9696280              9599164
run6    9683840              9579984
run7    9684832              9595912
run8    9697936              9617408
avg     9690722              9598117           -0.96%
stddev  0%                   0%

run5,6,7,8 use: page_fault4_processes -s 2 -t 1


Conclusion: Dave's version is a little slower than Matthew's version, but the
difference is very small (about 1%) from what I can tell. Let me know if you have
any questions. Thanks.
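
For anyone who wants to recheck the averages, here is a small sketch of the
arithmetic. The helper names (mean, delta_pct) and the array names are made
up for this example, and taking Matthew's version as the baseline for the
percentage is an assumption, so the result may differ slightly from the
deltas quoted in the table.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of `other` against `base`, in percent. */
static double delta_pct(double base, double other)
{
	return (other - base) / base * 100.0;
}

/* Samples copied from the table above: mw = Matthew's, dh = Dave's. */
static const double mw_t96[] = { 379045929, 377870413, 378623159, 376890127 };
static const double dh_t96[] = { 375241566, 373950068, 371884035, 372391340 };
static const double mw_t1[]  = { 9696280, 9683840, 9684832, 9697936 };
static const double dh_t1[]  = { 9599164, 9579984, 9595912, 9617408 };
```

For example, delta_pct(mean(mw_t96, 4), mean(dh_t96, 4)) gives the relative
slowdown of Dave's version in the 96-process runs; a negative value means
fewer operations than the baseline.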


Regards
Yin, Fengwei




* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-14  7:33                 ` Yin Fengwei
  2023-09-14  8:37                   ` Yin Fengwei
@ 2023-09-19  1:11                   ` Yin Fengwei
  2023-09-19 16:11                     ` Dave Hansen
  1 sibling, 1 reply; 24+ messages in thread
From: Yin Fengwei @ 2023-09-19  1:11 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Hansen
  Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs, Yin Fengwei

Hi Matthew,

On 9/14/23 15:33, Yin Fengwei wrote:
> Hi Matthew,
> 
> [...]
> 
> I will find a test machine to measure the performance difference of these two
> versions by using xfs + will-it-scale. Will keep you guys updated.
I'd like to move this bug fix forward. Based on the test results here:
https://lore.kernel.org/linux-mm/124631ab-eb4c-6584-12d4-f3c91e69c873@intel.com/
there is a very small performance delta between your version and Dave's.

What do you think about merging Dave's version? Or do I need to collect
more data? Thanks.


Regards
Yin, Fengwei




* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-19  1:11                   ` Yin Fengwei
@ 2023-09-19 16:11                     ` Dave Hansen
  2023-09-20  1:29                       ` Yin Fengwei
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2023-09-19 16:11 UTC (permalink / raw)
  To: Yin Fengwei, Matthew Wilcox
  Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On 9/18/23 18:11, Yin Fengwei wrote:
>> I will find a test machine to measure the performance difference of these two
>> versions by using xfs + will-it-scale. Will keep you guys updated.
> I'd like to move this bug fixing forward. Based on the test result here:
> https://lore.kernel.org/linux-mm/124631ab-eb4c-6584-12d4-f3c91e69c873@intel.com/
> There is very small performance delta between your version and Dave's.
> 
> What do you think if we propose to merge Dave's version? Or do I need collect
> more data? Thanks.

I honestly don't feel that strongly about my version versus Matthew's.
I like mine, but I'll happily ack either approach.

The thing I care about the most is getting the bug fixed ... quickly. :)



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-19 16:11                     ` Dave Hansen
@ 2023-09-20  1:29                       ` Yin Fengwei
  2023-09-20  1:47                         ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Yin Fengwei @ 2023-09-20  1:29 UTC (permalink / raw)
  To: Dave Hansen, Matthew Wilcox
  Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs



On 9/20/23 00:11, Dave Hansen wrote:
> On 9/18/23 18:11, Yin Fengwei wrote:
>>> I will find a test machine to measure the performance difference of these two
>>> versions by using xfs + will-it-scale. Will keep you guys updated.
>> I'd like to move this bug fixing forward. Based on the test result here:
>> https://lore.kernel.org/linux-mm/124631ab-eb4c-6584-12d4-f3c91e69c873@intel.com/
>> There is very small performance delta between your version and Dave's.
>>
>> What do you think if we propose to merge Dave's version? Or do I need collect
>> more data? Thanks.
> 
> I honestly don't feel that strongly about my version versus Matthew's.
> I like mine, but I'll happily ack either approach.
> 
> The thing I care about the most is getting the bug fixed ... quickly. :)
Same on my side.

Given that the performance delta is very small, I think we should follow the
commit message of 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c:
    The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
    pte/pmd/pud_pfn undo it.
    
    This assume that no code path touches the PFN part of a PTE directly
    without using these primitives.


Regards
Yin, Fengwei



* Re: [syzbot] [mm?] BUG: Bad page map (7)
  2023-09-20  1:29                       ` Yin Fengwei
@ 2023-09-20  1:47                         ` Matthew Wilcox
  0 siblings, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2023-09-20  1:47 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: Dave Hansen, syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On Wed, Sep 20, 2023 at 09:29:18AM +0800, Yin Fengwei wrote:
> 
> 
> On 9/20/23 00:11, Dave Hansen wrote:
> > On 9/18/23 18:11, Yin Fengwei wrote:
> >>> I will find a test machine to measure the performance difference of these two
> >>> versions by using xfs + will-it-scale. Will keep you guys updated.
> >> I'd like to move this bug fixing forward. Based on the test result here:
> >> https://lore.kernel.org/linux-mm/124631ab-eb4c-6584-12d4-f3c91e69c873@intel.com/
> >> There is very small performance delta between your version and Dave's.
> >>
> >> What do you think if we propose to merge Dave's version? Or do I need collect
> >> more data? Thanks.
> > 
> > I honestly don't feel that strongly about my version versus Matthew's.
> > I like mine, but I'll happily ack either approach.
> > 
> > The thing I care about the most is getting the bug fixed ... quickly. :)
> Same in my side.

I'm just redoing the commit message now.



Thread overview: 24+ messages
2023-09-09 17:12 [syzbot] [mm?] BUG: Bad page map (7) syzbot
2023-09-10  3:02 ` Matthew Wilcox
2023-09-10  3:29   ` syzbot
2023-09-10  3:40   ` Yin, Fengwei
2023-09-11  7:24   ` Yin Fengwei
2023-09-11  7:32     ` Yin Fengwei
2023-09-11  7:12 ` Yin Fengwei
2023-09-11  7:48   ` syzbot
2023-09-11 13:26   ` Matthew Wilcox
2023-09-11 14:00     ` syzbot
2023-09-11 15:34     ` Dave Hansen
2023-09-11 16:44       ` Matthew Wilcox
2023-09-11 16:55         ` Dave Hansen
2023-09-11 19:12           ` Matthew Wilcox
2023-09-11 20:22             ` Dave Hansen
2023-09-12  4:59               ` Matthew Wilcox
2023-09-12 16:07                 ` Dave Hansen
2023-09-12 18:01                 ` Dave Hansen
2023-09-14  7:33                 ` Yin Fengwei
2023-09-14  8:37                   ` Yin Fengwei
2023-09-19  1:11                   ` Yin Fengwei
2023-09-19 16:11                     ` Dave Hansen
2023-09-20  1:29                       ` Yin Fengwei
2023-09-20  1:47                         ` Matthew Wilcox
