On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > Hello, it has been a long time since I sent the previous patch.
> > https://lkml.org/lkml/2015/6/3/37
> >
> > This patch is almost new compared to the previous approach.
> > I think this is simpler, clearer and easier to review.
> >
> > One thing I should note: I had tested this patch and couldn't
> > find any critical problem, so I rebased the patchset onto a recent
> > mmotm (ie, mmotm-2015-10-15-15-20) to send a formal patchset.
> > Unfortunately, I started to see sudden discarding of pages that
> > we shouldn't discard. IOW, an application's valid anonymous page
> > suddenly disappeared.
> >
> > When I look through the THP changes, I think we could lose
> > the dirty bit of the pte between freeze_page and unfreeze_page
> > when we mark it as a migration entry and restore it.
> > So, I added the simple code below without enough consideration
> > and cannot see the problem any more.
> > I hope it's a good hint for finding the right fix for this problem.
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d5ea516ffb54..e881c04f5950 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> >  		if (is_write_migration_entry(swp_entry))
> >  			entry = maybe_mkwrite(entry, vma);
> >
> > +		if (PageDirty(page))
> > +			SetPageDirty(page);
>
> The PageDirty condition was a typo. I didn't add the condition.
> I just added:
>
> 		SetPageDirty(page);

As a first step toward finding this bug, I removed all MADV_FREE-related
code from mmotm-2015-10-15-15-20. IOW, I did git checkout 54bad5da4834
(arm64: add pmd_[dirty|mkclean] for THP), so the tree doesn't have any
of the MADV_FREE core code.

I tested the following workload on my KVM machine.

0. make memcg
1. limit memcg
2. fork several processes
3. each process allocates THP pages and fills them
4. increase the limit of the memcg so that swapoff succeeds
5. swapoff
6. kill all of the processes
7. goto 1

Within a few hours, I hit the following bug.
The detailed boot log and dmesg output are attached.

Initializing cgroup subsys cpu
Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
KERNEL supported cpus:
  Intel GenuineIntel
x86/fpu: Legacy x87 FPU detected.
x86/fpu: Using 'lazy' FPU context switches.
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.
Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [] down_read_trylock+0x9/0x30
PGD 0
Oops: 0000 [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
RIP: 0010:[] [] down_read_trylock+0x9/0x30
RSP: 0018:ffff88007fea3648 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
FS: 00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
Stack:
 ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
 ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
 ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
Call Trace:
 [] page_lock_anon_vma_read+0x60/0x180
 [] rmap_walk+0x1b3/0x3f0
 [] ? finish_task_switch+0x5d/0x1f0
 [] page_referenced+0x1a3/0x220
 [] ? __page_check_address+0x1a0/0x1a0
 [] ? page_get_anon_vma+0xd0/0xd0
 [] ? anon_vma_ctor+0x40/0x40
 [] shrink_page_list+0x5ab/0xde0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x59d/0x740
 [] shrink_zone+0x90/0x250
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_mem_cgroup_pages+0x9d/0x120
 [] try_charge+0x163/0x700
 [] mem_cgroup_do_precharge+0x54/0x70
 [] mem_cgroup_can_attach+0x175/0x1b0
 [] ? kernfs_iattrs.isra.6+0x37/0xd0
 [] ? get_mctgt_type+0x320/0x320
 [] cgroup_migrate+0x149/0x440
 [] cgroup_attach_task+0x7c/0xe0
 [] __cgroup_procs_write.isra.33+0x1d4/0x2b0
 [] cgroup_tasks_write+0x10/0x20
 [] cgroup_file_write+0x38/0xf0
 [] kernfs_fop_write+0x11d/0x170
 [] __vfs_write+0x28/0xe0
 [] ? __fd_install+0x24/0xc0
 [] ? percpu_down_read+0x21/0x50
 [] vfs_write+0xa1/0x170
 [] SyS_write+0x46/0xa0
 [] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7
RIP [] down_read_trylock+0x9/0x30
 RSP
CR2: 0000000000000008
BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
Kernel panic - not syncing: Fatal exception
NULL pointer dereference at 0000000000000008
IP: [] down_read_trylock+0x9/0x30
PGD 0
Oops: 0000 [#2] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 10 PID: 59 Comm: khugepaged Tainted: G D 4.3.0-rc5-mm1-diet-meta+ #1545
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
RIP: 0010:[] [] down_read_trylock+0x9/0x30
RSP: 0018:ffff8800b985f778 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
Stack:
 ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
 ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
 ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
Call Trace:
 [] page_lock_anon_vma_read+0x60/0x180
 [] rmap_walk+0x1b3/0x3f0
 [] ? finish_task_switch+0x5d/0x1f0
 [] page_referenced+0x1a3/0x220
 [] ? __page_check_address+0x1a0/0x1a0
 [] ? page_get_anon_vma+0xd0/0xd0
 [] ? anon_vma_ctor+0x40/0x40
 [] shrink_page_list+0x5ab/0xde0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x59d/0x740
 [] shrink_zone+0x90/0x250
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_mem_cgroup_pages+0x9d/0x120
 [] try_charge+0x163/0x700
 [] ? schedule+0x33/0x80
 [] mem_cgroup_try_charge+0x9f/0x1d0
 [] khugepaged+0x7cc/0x1ac0
 [] ? hrtick_update+0x1/0x70
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? total_mapcount+0x70/0x70
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7
RIP [] down_read_trylock+0x9/0x30
 RSP
CR2: 0000000000000008
---[ end trace e81a82c8122b447e ]---
Shutting down cpus with NMI
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
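
For anyone who wants to try reproducing this, the per-process part of the workload (allocate THP pages and fill them) could look roughly like the sketch below. This is not the actual test program, only an illustration: alloc_thp_region(), fill_region() and the 2 MiB HPAGE_SIZE are hypothetical names and values chosen here, and madvise(MADV_HUGEPAGE) is only a hint, so the kernel may still back the region with small pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB transparent huge pages, as on x86-64. */
#define HPAGE_SIZE (2UL << 20)

/*
 * Map `len` bytes of anonymous memory whose start is aligned to
 * HPAGE_SIZE and hint that it may be backed by THP.
 * `len` must be a non-zero multiple of HPAGE_SIZE.
 * Returns NULL if the mapping fails.
 */
static char *alloc_thp_region(size_t len)
{
	assert(len > 0 && len % HPAGE_SIZE == 0);

	/* Over-allocate by one huge page so the start can be aligned. */
	size_t map_len = len + HPAGE_SIZE;
	char *raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	uintptr_t start = ((uintptr_t)raw + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
	char *aligned = (char *)start;

	/* Trim the unaligned head and the unused tail of the mapping. */
	size_t head = start - (uintptr_t)raw;
	if (head)
		munmap(raw, head);
	size_t tail = map_len - head - len;
	if (tail)
		munmap(aligned + len, tail);

	/* Hint only: a failure here (e.g. THP disabled) is ignored. */
	(void)madvise(aligned, len, MADV_HUGEPAGE);
	return aligned;
}

/* Touch every byte so the pages are really allocated and dirtied. */
static void fill_region(char *p, size_t len, int byte)
{
	memset(p, byte, len);
}
```

Each forked child would call alloc_thp_region() with a size large enough to push the memcg over its limit, call fill_region() on the result, and then sleep until it is killed. The memcg setup, swapoff and kill steps are done from outside and are not shown.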