linux-mm.kvack.org archive mirror
* [PATCH] dax: Split pmd map when fallback on COW
@ 2015-11-23 20:05 Toshi Kani
  2015-11-23 20:45 ` Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Toshi Kani @ 2015-11-23 20:05 UTC (permalink / raw)
  To: dan.j.williams
  Cc: kirill.shutemov, willy, ross.zwisler, linux-mm, linux-fsdevel,
	linux-nvdimm, linux-kernel, Toshi Kani

An infinite loop of PMD faults was observed when attempting to
mlock() a private read-only PMD mmap'd range of a DAX file.

__dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
falling back to PTE on COW.  However, __handle_mm_fault()
returns without falling back to handle_pte_fault() because
a PMD map is present in this case.

Change __dax_pmd_fault() to split the PMD map, if present,
before returning with VM_FAULT_FALLBACK.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 43671b6..3405583 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		return VM_FAULT_FALLBACK;
 
 	/* Fall back to PTEs if we're going to COW */
-	if (write && !(vma->vm_flags & VM_SHARED))
+	if (write && !(vma->vm_flags & VM_SHARED)) {
+		split_huge_page_pmd(vma, address, pmd);
 		return VM_FAULT_FALLBACK;
+	}
 	/* If the PMD would extend outside the VMA */
 	if (pmd_addr < vma->vm_start)
 		return VM_FAULT_FALLBACK;

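The failure mode described in the commit message can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: every name below (`toy_mapping`, `toy_pmd_fault()`, `toy_handle_fault()`, `toy_fault_until_resolved()`) is hypothetical, and the model keeps just two facts from the description, namely that the PMD handler always falls back on a private write, and that the top-level handler does not reach the PTE path while a PMD map is still present.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of the fault flow; NOT kernel code.
 * All identifiers here are hypothetical and exist only for illustration. */
enum toy_result { TOY_DONE, TOY_FALLBACK };

struct toy_mapping {
	bool pmd_mapped;	/* a huge (PMD) mapping is installed */
	bool cow_done;		/* the private copy was made at PTE level */
};

/* Models the PMD handler on a private write fault: it cannot COW at
 * PMD granularity, so it always asks the caller to fall back.  With
 * split_on_fallback set, it first removes the PMD mapping. */
static enum toy_result toy_pmd_fault(struct toy_mapping *m, bool split_on_fallback)
{
	if (split_on_fallback)
		m->pmd_mapped = false;
	return TOY_FALLBACK;
}

/* Models the top-level handler: the PTE path is reached only when no
 * PMD mapping is present after the PMD handler has run. */
static void toy_handle_fault(struct toy_mapping *m, bool split_on_fallback)
{
	if (m->pmd_mapped) {
		if (toy_pmd_fault(m, split_on_fallback) != TOY_FALLBACK)
			return;
		if (m->pmd_mapped)
			return;	/* PMD still present: the access faults again */
	}
	m->cow_done = true;	/* PTE-level COW resolves the fault */
}

/* Retries the same faulting access up to max_faults times.  Returns the
 * number of faults taken until it resolved, or -1 if it never did. */
static int toy_fault_until_resolved(bool split_on_fallback, int max_faults)
{
	struct toy_mapping m = { .pmd_mapped = true, .cow_done = false };

	assert(max_faults > 0);
	for (int n = 1; n <= max_faults; n++) {
		toy_handle_fault(&m, split_on_fallback);
		if (m.cow_done)
			return n;
	}
	return -1;
}
```

In this model, `toy_fault_until_resolved(false, 1000)` never resolves, while `toy_fault_until_resolved(true, 1000)` resolves on the first fault, which mirrors the effect the patch is after.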
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] dax: Split pmd map when fallback on COW
  2015-11-23 20:05 [PATCH] dax: Split pmd map when fallback on COW Toshi Kani
@ 2015-11-23 20:45 ` Dan Williams
  2015-11-23 20:45   ` Toshi Kani
  2015-11-23 22:58 ` Toshi Kani
  2015-11-24 17:08 ` Dan Williams
  2 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2015-11-23 20:45 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Kirill A. Shutemov, Matthew Wilcox, Ross Zwisler, Linux MM,
	linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> An infinite loop of PMD faults was observed when attempted to
> mlock() a private read-only PMD mmap'd range of a DAX file.
>
> __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> falling back to PTE on COW.  However, __handle_mm_fault()
> returns without falling back to handle_pte_fault() because
> a PMD map is present in this case.
>
> Change __dax_pmd_fault() to split the PMD map, if present,
> before returning with VM_FAULT_FALLBACK.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>

I thought the patch from Ross already addressed the infinite loop:

https://patchwork.kernel.org/patch/7653731/

> ---
>  fs/dax.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 43671b6..3405583 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>                 return VM_FAULT_FALLBACK;
>
>         /* Fall back to PTEs if we're going to COW */
> -       if (write && !(vma->vm_flags & VM_SHARED))
> +       if (write && !(vma->vm_flags & VM_SHARED)) {
> +               split_huge_page_pmd(vma, address, pmd);
>                 return VM_FAULT_FALLBACK;
> +       }
>         /* If the PMD would extend outside the VMA */
>         if (pmd_addr < vma->vm_start)
>                 return VM_FAULT_FALLBACK;

This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
a complete fix.


* Re: [PATCH] dax: Split pmd map when fallback on COW
  2015-11-23 20:45 ` Dan Williams
@ 2015-11-23 20:45   ` Toshi Kani
  2015-11-23 20:56     ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Toshi Kani @ 2015-11-23 20:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: Kirill A. Shutemov, Matthew Wilcox, Ross Zwisler, Linux MM,
	linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, 2015-11-23 at 12:45 -0800, Dan Williams wrote:
> On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > An infinite loop of PMD faults was observed when attempted to
> > mlock() a private read-only PMD mmap'd range of a DAX file.
> > 
> > __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> > falling back to PTE on COW.  However, __handle_mm_fault()
> > returns without falling back to handle_pte_fault() because
> > a PMD map is present in this case.
> > 
> > Change __dax_pmd_fault() to split the PMD map, if present,
> > before returning with VM_FAULT_FALLBACK.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Cc: Matthew Wilcox <willy@linux.intel.com>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> I thought the patch from Ross already addressed the infinite loop:
> 
> https://patchwork.kernel.org/patch/7653731/

This fixes a different issue.  I hit this one while testing my other patch along
with Ross's patch.

> > ---
> >  fs/dax.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 43671b6..3405583 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> >                 return VM_FAULT_FALLBACK;
> > 
> >         /* Fall back to PTEs if we're going to COW */
> > -       if (write && !(vma->vm_flags & VM_SHARED))
> > +       if (write && !(vma->vm_flags & VM_SHARED)) {
> > +               split_huge_page_pmd(vma, address, pmd);
> >                 return VM_FAULT_FALLBACK;
> > +       }
> >         /* If the PMD would extend outside the VMA */
> >         if (pmd_addr < vma->vm_start)
> >                 return VM_FAULT_FALLBACK;
> 
> This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
> a complete fix.

Well, __dax_pmd_fault() itself depends on CONFIG_TRANSPARENT_HUGEPAGE.

Thanks,
-Toshi


* Re: [PATCH] dax: Split pmd map when fallback on COW
  2015-11-23 20:45   ` Toshi Kani
@ 2015-11-23 20:56     ` Dan Williams
  2015-11-23 21:04       ` Toshi Kani
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2015-11-23 20:56 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Kirill A. Shutemov, Matthew Wilcox, Ross Zwisler, Linux MM,
	linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, Nov 23, 2015 at 12:45 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Mon, 2015-11-23 at 12:45 -0800, Dan Williams wrote:
>> On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
[..]
>> This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
>> a complete fix.
>
> Well, __dax_pmd_fault() itself depends on CONFIG_TRANSPARENT_HUGEPAGE.
>

Indeed it is... I think that's wrong because transparent huge pages
rely on struct page??


* Re: [PATCH] dax: Split pmd map when fallback on COW
  2015-11-23 20:56     ` Dan Williams
@ 2015-11-23 21:04       ` Toshi Kani
  0 siblings, 0 replies; 7+ messages in thread
From: Toshi Kani @ 2015-11-23 21:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: Kirill A. Shutemov, Matthew Wilcox, Ross Zwisler, Linux MM,
	linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, 2015-11-23 at 12:56 -0800, Dan Williams wrote:
> On Mon, Nov 23, 2015 at 12:45 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Mon, 2015-11-23 at 12:45 -0800, Dan Williams wrote:
> > > On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> [..]
> > > This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
> > > a complete fix.
> > 
> > Well, __dax_pmd_fault() itself depends on CONFIG_TRANSPARENT_HUGEPAGE.
> > 
> 
> Indeed it is... I think that's wrong because transparent huge pages
> rely on struct page??

I do not think this issue is related to struct page.  wp_huge_pmd() calls
either do_huge_pmd_wp_page() or dax_pmd_fault().  do_huge_pmd_wp_page() splits a
pmd page when it returns with VM_FAULT_FALLBACK.  So, this change keeps them
consistent on VM_FAULT_FALLBACK.

Thanks,
-Toshi


* Re: [PATCH] dax: Split pmd map when fallback on COW
  2015-11-23 20:05 [PATCH] dax: Split pmd map when fallback on COW Toshi Kani
  2015-11-23 20:45 ` Dan Williams
@ 2015-11-23 22:58 ` Toshi Kani
  2015-11-24 17:08 ` Dan Williams
  2 siblings, 0 replies; 7+ messages in thread
From: Toshi Kani @ 2015-11-23 22:58 UTC (permalink / raw)
  To: dan.j.williams
  Cc: kirill.shutemov, willy, ross.zwisler, linux-mm, linux-fsdevel,
	linux-nvdimm, linux-kernel

On Mon, 2015-11-23 at 13:05 -0700, Toshi Kani wrote:
> An infinite loop of PMD faults was observed when attempted to
> mlock() a private read-only PMD mmap'd range of a DAX file.

Typo: the above description should be (remove "read-only"): 

An infinite loop of PMD faults was observed when attempting to mlock() a private
PMD mmap'd range of a DAX file.

-Toshi

> __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> falling back to PTE on COW.  However, __handle_mm_fault()
> returns without falling back to handle_pte_fault() because
> a PMD map is present in this case.
> 
> Change __dax_pmd_fault() to split the PMD map, if present,
> before returning with VM_FAULT_FALLBACK.

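The corrected description suggests a simple user-space trigger: map a DAX file privately and writable, then mlock() the range. Below is a minimal sketch of such a helper. It rests on several assumptions: the helper name `map_private_and_mlock()` is hypothetical, the caller is assumed to pass a file on a DAX-mounted filesystem, and the kernel is assumed to choose a PMD mapping for the range (which typically needs a 2 MiB aligned address, offset, and length that this sketch does not enforce). On a non-DAX file it only exercises the ordinary fault path.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of `path` as a private, writable
 * mapping and mlock() it.  mlock() on such a mapping populates the pages
 * for writing, so a private mapping is expected to take COW faults.
 * Returns 0 if mlock() succeeded, -1 otherwise (errno describes why). */
static int map_private_and_mlock(const char *path, size_t len)
{
	assert(path != NULL);

	int fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED) {
		int saved = errno;	/* keep mmap()'s errno across close() */
		close(fd);
		errno = saved;
		return -1;
	}
	close(fd);			/* the mapping keeps the file referenced */

	int rc = mlock(addr, len);
	int saved = errno;		/* keep mlock()'s errno across munmap() */
	munmap(addr, len);
	errno = saved;

	return rc == 0 ? 0 : -1;
}
```

A call such as `map_private_and_mlock("/mnt/pmem/file", 2UL << 20)` (the path is a placeholder) would be expected to hang on an affected kernel if the range is PMD mapped, and to return once the PMD map is split on fallback. Note that mlock() can also fail for unrelated reasons such as RLIMIT_MEMLOCK.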

* Re: [PATCH] dax: Split pmd map when fallback on COW
  2015-11-23 20:05 [PATCH] dax: Split pmd map when fallback on COW Toshi Kani
  2015-11-23 20:45 ` Dan Williams
  2015-11-23 22:58 ` Toshi Kani
@ 2015-11-24 17:08 ` Dan Williams
  2 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2015-11-24 17:08 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Kirill A. Shutemov, Matthew Wilcox, Ross Zwisler, Linux MM,
	linux-fsdevel, linux-nvdimm, linux-kernel

On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> An infinite loop of PMD faults was observed when attempted to
> mlock() a private read-only PMD mmap'd range of a DAX file.
>
> __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> falling back to PTE on COW.  However, __handle_mm_fault()
> returns without falling back to handle_pte_fault() because
> a PMD map is present in this case.
>
> Change __dax_pmd_fault() to split the PMD map, if present,
> before returning with VM_FAULT_FALLBACK.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  fs/dax.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 43671b6..3405583 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>                 return VM_FAULT_FALLBACK;
>
>         /* Fall back to PTEs if we're going to COW */
> -       if (write && !(vma->vm_flags & VM_SHARED))
> +       if (write && !(vma->vm_flags & VM_SHARED)) {
> +               split_huge_page_pmd(vma, address, pmd);
>                 return VM_FAULT_FALLBACK;
> +       }

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

I took a closer look at dax's CONFIG_TRANSPARENT_HUGEPAGE interactions
and it turns out THP is a performance enhancement not a functional
dependency.  I.e. a performance enhancement to use a huge_zero_page
where available, but not a requirement.

I'll fold this in with my series to make pmd_trans_huge() return false
for non-huge_zero_page dax mappings, and in that case I'll need to
up-level the call to pmdp_huge_clear_flush_notify() from
__split_huge_page_pmd().

