On Mon, 2015-11-23 at 12:53 -0800, Dan Williams wrote:
> On Mon, Nov 23, 2015 at 12:04 PM, Toshi Kani wrote:
> > The following oops was observed when mmap() with MAP_POPULATE
> > pre-faulted pmd mappings of a DAX file.  follow_trans_huge_pmd()
> > expects that a target address has a struct page.
> >
> >   BUG: unable to handle kernel paging request at ffffea0012220000
> >   follow_trans_huge_pmd+0xba/0x390
> >   follow_page_mask+0x33d/0x420
> >   __get_user_pages+0xdc/0x800
> >   populate_vma_page_range+0xb5/0xe0
> >   __mm_populate+0xc5/0x150
> >   vm_mmap_pgoff+0xd5/0xe0
> >   SyS_mmap_pgoff+0x1c1/0x290
> >   SyS_mmap+0x1b/0x30
> >
> > Fix it by making the PMD pre-fault handling consistent with PTE.
> > After the pre-fault in faultin_page(), follow_page_mask() calls
> > follow_trans_huge_pmd(), which is changed to call follow_pfn_pmd()
> > for VM_PFNMAP or VM_MIXEDMAP.  follow_pfn_pmd() handles FOLL_TOUCH
> > and returns with -EEXIST.
>
> As of 4.4-rc2 DAX pmd mappings are disabled.  So we have time to do
> something more comprehensive in 4.5.

Yes, I noticed during my testing that I could not use pmd...

> > Reported-by: Mauricio Porto
> > Signed-off-by: Toshi Kani
> > Cc: Andrew Morton
> > Cc: Kirill A. Shutemov
> > Cc: Matthew Wilcox
> > Cc: Dan Williams
> > Cc: Ross Zwisler
> > ---
> >  mm/huge_memory.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d5b8920..f56e034 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> [..]
> > @@ -1288,6 +1315,13 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> >  	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
> >  		goto out;
> >
> > +	/* pfn map does not have a struct page */
> > +	if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) {
> > +		ret = follow_pfn_pmd(vma, addr, pmd, flags);
> > +		page = ERR_PTR(ret);
> > +		goto out;
> > +	}
> > +
> >  	page = pmd_page(*pmd);
> >  	VM_BUG_ON_PAGE(!PageHead(page), page);
> >  	if (flags & FOLL_TOUCH) {
>
> I think it is already problematic that dax pmd mappings are getting
> confused with transparent huge pages.

We had the same issue with dax pte mappings [1], and this change extends
the pfn map handling to pmd.  So this problem is not specific to pmd.

[1] https://lkml.org/lkml/2015/6/23/181

> They're more closely related to hugetlbfs pmd mappings in that they
> are mapping an explicit allocation.  I have some pending patches to
> address this dax-pmd vs hugetlb-pmd vs thp-pmd classification that I
> will post shortly.

Not sure which way is better, but I am certainly interested in your
changes.

> By the way, I'm collecting DAX pmd regression tests [1], is this just
> a simple crash upon using MAP_POPULATE?
>
> [1]: https://github.com/pmem/ndctl/blob/master/lib/test-dax-pmd.c

Yes, this issue is easy to reproduce with MAP_POPULATE.  In case it
helps, attached is the test I used for testing the patches.  Sorry, the
code is messy since it was only intended for my internal use...

- The test was originally written for the pte change [1], and the
  comments in test.sh (ex. mlock fail, ok) reflect the results without
  the pte change.
- For the pmd test, I modified test-mmap.c to call posix_memalign()
  before mmap().  By calling free(), the 2MB-aligned address from
  posix_memalign() can be used for mmap().  This keeps the mmap'd
  address aligned on 2MB.
- I created the test file(s) with dd (i.e. all blocks written) in my
  test.
- The other infinite loop issue (fixed by my other patch) was found by
  the test case with option "-LMSr".

Thanks,
-Toshi