* [RESEND][PATCH v5 0/3] fix hugepage coredump
@ 2013-04-10 16:17 Naoya Horiguchi
2013-04-10 16:17 ` [RESEND][PATCH v5 1/3] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Naoya Horiguchi
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2013-04-10 16:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
Konstantin Khlebnikov, Michal Hocko, HATAYAMA Daisuke, linux-mm,
linux-kernel, Naoya Horiguchi
I forgot to add Reviewed/Acked. Please ignore my previous post.
Sorry for the noise.
-----
Hi,
Here is the 5th version of the hugepage coredump fix.
I moved the swap entry check in 3/3 to a different place
and expanded the explanation in the comment.
Thanks,
Naoya Horiguchi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [RESEND][PATCH v5 1/3] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
2013-04-10 16:17 [RESEND][PATCH v5 0/3] fix hugepage coredump Naoya Horiguchi
@ 2013-04-10 16:17 ` Naoya Horiguchi
2013-04-10 21:46 ` David Rientjes
2013-04-10 16:17 ` [RESEND][PATCH v5 2/3] fix hugetlb memory check in vma_dump_size() Naoya Horiguchi
2013-04-10 16:17 ` [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
2 siblings, 1 reply; 10+ messages in thread
From: Naoya Horiguchi @ 2013-04-10 16:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
Konstantin Khlebnikov, Michal Hocko, HATAYAMA Daisuke, linux-mm,
linux-kernel, Naoya Horiguchi

Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks to me a serious regression,
so let's fix it.

ChangeLog v3:
 - move 'return 0' into a separate patch

ChangeLog v2:
 - add 'return 0' in hugepage memory check

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@vger.kernel.org
---
 fs/hugetlbfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 * way when do_mmap_pgoff unwinds (may be important on powerpc
 	 * and ia64).
 	 */
-	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;

 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
--
1.7.11.7
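A minimal userspace model of the effect of this one-line change, not kernel code. It assumes the coredump path rejects any vma that carries VM_DONTDUMP before it consults coredump_filter; the flag bit values and both helper names (hugetlb_mmap_flags, vma_is_dumpable) are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real definitions live in the kernel headers. */
#define VM_DONTEXPAND 0x1UL
#define VM_DONTDUMP   0x2UL
#define VM_HUGETLB    0x4UL

/* Flags OR-ed into the vma by hugetlbfs_file_mmap(), before and after the patch. */
static unsigned long hugetlb_mmap_flags(bool patched)
{
	unsigned long flags = VM_HUGETLB | VM_DONTEXPAND;

	if (!patched)
		flags |= VM_DONTDUMP;	/* behavior since commit 314e51b98 */
	return flags;
}

/* Assumed model: a vma with VM_DONTDUMP never reaches the filter checks. */
static bool vma_is_dumpable(unsigned long vm_flags)
{
	return !(vm_flags & VM_DONTDUMP);
}
```

Under this model, an unpatched hugetlbfs mapping is never eligible for the dump regardless of coredump_filter, while a patched one is left to the filter bits to decide.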
* Re: [RESEND][PATCH v5 1/3] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
2013-04-10 16:17 ` [RESEND][PATCH v5 1/3] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Naoya Horiguchi
@ 2013-04-10 21:46 ` David Rientjes
0 siblings, 0 replies; 10+ messages in thread
From: David Rientjes @ 2013-04-10 21:46 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
KOSAKI Motohiro, Konstantin Khlebnikov, Michal Hocko,
HATAYAMA Daisuke, linux-mm, linux-kernel

On Wed, 10 Apr 2013, Naoya Horiguchi wrote:

> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
>
> ChangeLog v3:
> - move 'return 0' into a separate patch
>
> ChangeLog v2:
> - add 'return 0' in hugepage memory check
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Acked-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: stable@vger.kernel.org

Acked-by: David Rientjes <rientjes@google.com>

Stable for 3.7+.
* [RESEND][PATCH v5 2/3] fix hugetlb memory check in vma_dump_size()
2013-04-10 16:17 [RESEND][PATCH v5 0/3] fix hugepage coredump Naoya Horiguchi
2013-04-10 16:17 ` [RESEND][PATCH v5 1/3] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Naoya Horiguchi
@ 2013-04-10 16:17 ` Naoya Horiguchi
2013-04-10 21:49 ` David Rientjes
2013-04-10 16:17 ` [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
2 siblings, 1 reply; 10+ messages in thread
From: Naoya Horiguchi @ 2013-04-10 16:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
Konstantin Khlebnikov, Michal Hocko, HATAYAMA Daisuke, linux-mm,
linux-kernel, Naoya Horiguchi

Documentation/filesystems/proc.txt says about coredump_filter bitmask,

  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
  effected by bit 5-6.

However current code can go into the subsequent flag checks of bit 0-4
for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work
as written in the document.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@vger.kernel.org
---
 fs/binfmt_elf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git v3.9-rc3.orig/fs/binfmt_elf.c v3.9-rc3/fs/binfmt_elf.c
index 3939829..86af964 100644
--- v3.9-rc3.orig/fs/binfmt_elf.c
+++ v3.9-rc3/fs/binfmt_elf.c
@@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
 			goto whole;
 		if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
 			goto whole;
+		return 0;
 	}

 	/* Do not dump I/O mapped devices or special mappings */
--
1.7.11.7
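The fall-through described in this patch can be sketched as a small userspace model, not the kernel function itself. The bit numbers follow the coredump_filter documentation (bit 5 for private hugetlb memory, bit 6 for shared hugetlb memory); dump_vma() and other_filter_checks() are hypothetical names, and the latter is a deliberately crude stand-in for the real bit 0-4 checks that follow the hugetlb block.

```c
#include <stdbool.h>

/* coredump_filter bit numbers for hugetlb memory. */
#define DUMP_HUGETLB_PRIVATE 5
#define DUMP_HUGETLB_SHARED  6

/* Simplified assumption: any of bits 0-4 being set lets the vma through. */
static bool other_filter_checks(unsigned long filter)
{
	return (filter & 0x1fUL) != 0;
}

/* Model of the decision in vma_dump_size(), with and without the patch. */
static bool dump_vma(bool is_hugetlb, bool is_shared,
		     unsigned long filter, bool patched)
{
	if (is_hugetlb) {
		if (is_shared && (filter & (1UL << DUMP_HUGETLB_SHARED)))
			return true;
		if (!is_shared && (filter & (1UL << DUMP_HUGETLB_PRIVATE)))
			return true;
		if (patched)
			return false;	/* the 'return 0' added by this patch */
	}
	/* Without the patch, a hugetlb vma falls through to these checks. */
	return other_filter_checks(filter);
}
```

In this model a private hugetlb vma with only bit 0 set is dumped without the patch and skipped with it, which matches the documented rule that only bits 5-6 affect hugetlb memory.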
* Re: [RESEND][PATCH v5 2/3] fix hugetlb memory check in vma_dump_size()
2013-04-10 16:17 ` [RESEND][PATCH v5 2/3] fix hugetlb memory check in vma_dump_size() Naoya Horiguchi
@ 2013-04-10 21:49 ` David Rientjes
2013-04-11 7:08 ` Michal Hocko
0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2013-04-10 21:49 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
KOSAKI Motohiro, Konstantin Khlebnikov, Michal Hocko,
HATAYAMA Daisuke, linux-mm, linux-kernel

On Wed, 10 Apr 2013, Naoya Horiguchi wrote:

> Documentation/filesystems/proc.txt says about coredump_filter bitmask,
>
> Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
> effected by bit 5-6.
>
> However current code can go into the subsequent flag checks of bit 0-4
> for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work
> as written in the document.
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: stable@vger.kernel.org

Acked-by: David Rientjes <rientjes@google.com>

Stable for 2.6.34+.
* Re: [RESEND][PATCH v5 2/3] fix hugetlb memory check in vma_dump_size()
2013-04-10 21:49 ` David Rientjes
@ 2013-04-11 7:08 ` Michal Hocko
0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2013-04-11 7:08 UTC (permalink / raw)
To: David Rientjes
Cc: Naoya Horiguchi, Andrew Morton, Mel Gorman, Hugh Dickins,
Rik van Riel, KOSAKI Motohiro, Konstantin Khlebnikov,
HATAYAMA Daisuke, linux-mm, linux-kernel

On Wed 10-04-13 14:49:07, David Rientjes wrote:
> On Wed, 10 Apr 2013, Naoya Horiguchi wrote:
>
> > Documentation/filesystems/proc.txt says about coredump_filter bitmask,
> >
> > Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
> > effected by bit 5-6.
> >
> > However current code can go into the subsequent flag checks of bit 0-4
> > for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work
> > as written in the document.
> >
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Reviewed-by: Rik van Riel <riel@redhat.com>
> > Acked-by: Michal Hocko <mhocko@suse.cz>
> > Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> > Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: stable@vger.kernel.org
>
> Acked-by: David Rientjes <rientjes@google.com>
>
> Stable for 2.6.34+.

I think it is only 3.7+ as well because VM_RESERVED stopped use before
(314e51b9).
--
Michal Hocko
SUSE Labs
* [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page()
2013-04-10 16:17 [RESEND][PATCH v5 0/3] fix hugepage coredump Naoya Horiguchi
2013-04-10 16:17 ` [RESEND][PATCH v5 1/3] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Naoya Horiguchi
2013-04-10 16:17 ` [RESEND][PATCH v5 2/3] fix hugetlb memory check in vma_dump_size() Naoya Horiguchi
@ 2013-04-10 16:17 ` Naoya Horiguchi
2013-04-10 16:31 ` KOSAKI Motohiro
` (2 more replies)
2 siblings, 3 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2013-04-10 16:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
Konstantin Khlebnikov, Michal Hocko, HATAYAMA Daisuke, linux-mm,
linux-kernel, Naoya Horiguchi

# I suspended Reviewed and Acked given for the previous version, because
# it has a non-minor change. If you want to restore it, please let me know.
-----
With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access
the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().

The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit layout
between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
which is called in get_dump_page() returns a wrong page from a given address.

The expected behavior is like this:

  absent   is_swap_pte   FOLL_DUMP   Expected behavior
  -------------------------------------------------------------------
  true     false         false       hugetlb_fault
  false    true          false       hugetlb_fault
  false    false         false       return page
  true     false         true        skip page (to avoid allocation)
  false    true          true        hugetlb_fault
  false    false         true        return page

With this patch, we can call hugetlb_fault() and take proper actions
(we wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
hwpoisoned entries,) and as the result we can dump all hugepages except
for hwpoisoned ones.

ChangeLog v5:
 - improve comment and description.

ChangeLog v4:
 - move is_swap_page() to right place.

ChangeLog v3:
 - add comment about using is_swap_pte()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org
---
 mm/hugetlb.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..bf26ee8 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2983,7 +2983,17 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			break;
 		}

-		if (absent ||
+		/*
+		 * We need call hugetlb_fault for both hugepages under migration
+		 * (in which case hugetlb_fault waits for the migration,) and
+		 * hwpoisoned hugepages (in which case we need to prevent the
+		 * caller from accessing to them.) In order to do this, we use
+		 * here is_swap_pte instead of is_hugetlb_entry_migration and
+		 * is_hugetlb_entry_hwpoisoned. This is because it simply covers
+		 * both cases, and because we can't follow correct pages
+		 * directly from any kind of swap entries.
+		 */
+		if (absent || is_swap_pte(huge_ptep_get(pte)) ||
 		    ((flags & FOLL_WRITE) && !pte_write(huge_ptep_get(pte)))) {
 			int ret;

--
1.7.11.7
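The expected-behavior table in this patch description can be encoded as a small decision function. This is only a userspace sketch of the table, not the code in follow_hugetlb_page(): the enum and the name follow_action() are hypothetical, and the FOLL_WRITE / pte_write() condition from the real if statement is left out because the table does not cover it.

```c
#include <stdbool.h>

/* The three outcomes listed in the table. */
enum action { CALL_HUGETLB_FAULT, RETURN_PAGE, SKIP_PAGE };

/*
 * absent    : no entry is present for the address
 * is_swap   : the entry is a swap-type entry (migration or hwpoison)
 * foll_dump : the caller passed FOLL_DUMP (coredump path)
 */
static enum action follow_action(bool absent, bool is_swap, bool foll_dump)
{
	/* Dumping an absent page would only allocate a new one, so skip it. */
	if (absent && foll_dump)
		return SKIP_PAGE;

	/* Absent or swap-type entries are handed to the fault handler. */
	if (absent || is_swap)
		return CALL_HUGETLB_FAULT;

	/* A present, ordinary entry can be followed directly. */
	return RETURN_PAGE;
}
```

The key row is the fifth one: with FOLL_DUMP set and a swap-type entry in place, the sketch returns CALL_HUGETLB_FAULT rather than RETURN_PAGE, which is the case the patch adds so that a hwpoison or migration entry is never treated as a page.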
* Re: [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page()
2013-04-10 16:17 ` [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
@ 2013-04-10 16:31 ` KOSAKI Motohiro
2013-04-10 20:21 ` Michal Hocko
2013-04-10 21:51 ` David Rientjes
2 siblings, 0 replies; 10+ messages in thread
From: KOSAKI Motohiro @ 2013-04-10 16:31 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
Konstantin Khlebnikov, Michal Hocko, HATAYAMA Daisuke, linux-mm,
LKML

On Wed, Apr 10, 2013 at 12:17 PM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> # I suspended Reviewed and Acked given for the previous version, because
> # it has a non-minor change. If you want to restore it, please let me know.
> -----
> With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
> initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
> error happens on a hugepage and the affected processes try to access
> the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
> in get_page().
>
> The reason for this bug is that coredump-related code doesn't recognise
> "hugepage hwpoison entry" with which a pmd entry is replaced when a memory
> error occurs on a hugepage.
> In other words, physical address information is stored in different bit layout
> between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
> which is called in get_dump_page() returns a wrong page from a given address.
>
> The expected behavior is like this:
>
>   absent   is_swap_pte   FOLL_DUMP   Expected behavior
>   -------------------------------------------------------------------
>   true     false         false       hugetlb_fault
>   false    true          false       hugetlb_fault
>   false    false         false       return page
>   true     false         true        skip page (to avoid allocation)
>   false    true          true        hugetlb_fault
>   false    false         true        return page
>
> With this patch, we can call hugetlb_fault() and take proper actions
> (we wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
> hwpoisoned entries,) and as the result we can dump all hugepages except
> for hwpoisoned ones.
>
> ChangeLog v5:
> - improve comment and description.
>
> ChangeLog v4:
> - move is_swap_page() to right place.
>
> ChangeLog v3:
> - add comment about using is_swap_pte()
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* Re: [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page()
2013-04-10 16:17 ` [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
2013-04-10 16:31 ` KOSAKI Motohiro
@ 2013-04-10 20:21 ` Michal Hocko
2013-04-10 21:51 ` David Rientjes
2 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2013-04-10 20:21 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
KOSAKI Motohiro, Konstantin Khlebnikov, HATAYAMA Daisuke, linux-mm,
linux-kernel

On Wed 10-04-13 12:17:49, Naoya Horiguchi wrote:
> # I suspended Reviewed and Acked given for the previous version, because
> # it has a non-minor change. If you want to restore it, please let me know.
> -----
> With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
> initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
> error happens on a hugepage and the affected processes try to access
> the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
> in get_page().
>
> The reason for this bug is that coredump-related code doesn't recognise
> "hugepage hwpoison entry" with which a pmd entry is replaced when a memory
> error occurs on a hugepage.
> In other words, physical address information is stored in different bit layout
> between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
> which is called in get_dump_page() returns a wrong page from a given address.
>
> The expected behavior is like this:
>
>   absent   is_swap_pte   FOLL_DUMP   Expected behavior
>   -------------------------------------------------------------------
>   true     false         false       hugetlb_fault
>   false    true          false       hugetlb_fault
>   false    false         false       return page
>   true     false         true        skip page (to avoid allocation)
>   false    true          true        hugetlb_fault
>   false    false         true        return page
>
> With this patch, we can call hugetlb_fault() and take proper actions
> (we wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
> hwpoisoned entries,) and as the result we can dump all hugepages except
> for hwpoisoned ones.
>
> ChangeLog v5:
> - improve comment and description.
>
> ChangeLog v4:
> - move is_swap_page() to right place.
>
> ChangeLog v3:
> - add comment about using is_swap_pte()
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org

Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks!

> ---
>  mm/hugetlb.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
> index 0d1705b..bf26ee8 100644
> --- v3.9-rc3.orig/mm/hugetlb.c
> +++ v3.9-rc3/mm/hugetlb.c
> @@ -2983,7 +2983,17 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			break;
>  		}
>
> -		if (absent ||
> +		/*
> +		 * We need call hugetlb_fault for both hugepages under migration
> +		 * (in which case hugetlb_fault waits for the migration,) and
> +		 * hwpoisoned hugepages (in which case we need to prevent the
> +		 * caller from accessing to them.) In order to do this, we use
> +		 * here is_swap_pte instead of is_hugetlb_entry_migration and
> +		 * is_hugetlb_entry_hwpoisoned. This is because it simply covers
> +		 * both cases, and because we can't follow correct pages
> +		 * directly from any kind of swap entries.
> +		 */
> +		if (absent || is_swap_pte(huge_ptep_get(pte)) ||
>  		    ((flags & FOLL_WRITE) && !pte_write(huge_ptep_get(pte)))) {
>  			int ret;
>
> --
> 1.7.11.7

--
Michal Hocko
SUSE Labs
* Re: [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page()
2013-04-10 16:17 ` [RESEND][PATCH v5 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
2013-04-10 16:31 ` KOSAKI Motohiro
2013-04-10 20:21 ` Michal Hocko
@ 2013-04-10 21:51 ` David Rientjes
2 siblings, 0 replies; 10+ messages in thread
From: David Rientjes @ 2013-04-10 21:51 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
KOSAKI Motohiro, Konstantin Khlebnikov, Michal Hocko,
HATAYAMA Daisuke, linux-mm, linux-kernel

On Wed, 10 Apr 2013, Naoya Horiguchi wrote:

> # I suspended Reviewed and Acked given for the previous version, because
> # it has a non-minor change. If you want to restore it, please let me know.
> -----
> With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
> initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
> error happens on a hugepage and the affected processes try to access
> the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
> in get_page().
>
> The reason for this bug is that coredump-related code doesn't recognise
> "hugepage hwpoison entry" with which a pmd entry is replaced when a memory
> error occurs on a hugepage.
> In other words, physical address information is stored in different bit layout
> between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
> which is called in get_dump_page() returns a wrong page from a given address.
>
> The expected behavior is like this:
>
>   absent   is_swap_pte   FOLL_DUMP   Expected behavior
>   -------------------------------------------------------------------
>   true     false         false       hugetlb_fault
>   false    true          false       hugetlb_fault
>   false    false         false       return page
>   true     false         true        skip page (to avoid allocation)
>   false    true          true        hugetlb_fault
>   false    false         true        return page
>
> With this patch, we can call hugetlb_fault() and take proper actions
> (we wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
> hwpoisoned entries,) and as the result we can dump all hugepages except
> for hwpoisoned ones.
>
> ChangeLog v5:
> - improve comment and description.
>
> ChangeLog v4:
> - move is_swap_page() to right place.
>
> ChangeLog v3:
> - add comment about using is_swap_pte()
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org

Acked-by: David Rientjes <rientjes@google.com>

Stable for 2.6.34+?