linux-mm.kvack.org archive mirror
* VM_PFNMAP and do_no_pfn handler
@ 2006-02-20 14:20 Jes Sorensen
  2006-02-20 15:39 ` Hugh Dickins
  0 siblings, 1 reply; 4+ messages in thread
From: Jes Sorensen @ 2006-02-20 14:20 UTC
  To: linux-mm; +Cc: Linus Torvalds, Carsten Otte, roe, Robin Holt, Jack Steiner

Hi,

I am looking at implementing a do_no_pfn handler similar to
do_no_page, but for pages which are not backed by a struct page. I'd
like to use it for the mspec driver which maps uncached pages to
userland. The reason we need the do_no_pfn handler is to get the first
touch locality of the mapping on NUMA systems.

I have it all working, however I have a question about the VM_PFNMAP
flag. Right now mm/memory.c claims the following above
vm_normal_page():

 * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
 * will have each page table entry just pointing to a raw page frame
 * number, and as far as the VM layer is concerned, those do not have
 * pages associated with them - even if the PFN might point to memory
 * that otherwise is perfectly fine and has a "struct page".
 *
 * The way we recognize those mappings is through the rules set up
 * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
 * and the vm_pgoff will point to the first PFN mapped: thus every
 * page that is a raw mapping will always honor the rule
 *
 *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)

vm_normal_page() then does this:

        if (vma->vm_flags & VM_PFNMAP) {
                unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
                if (pfn == vma->vm_pgoff + off)
                        return NULL;
                if (!is_cow_mapping(vma->vm_flags))
                        return NULL;
        }

Everywhere else it is stated that the VM_PFNMAP flag is only set for
pages without a struct page backing them. In other words, are there any
cases where the above requirement is really needed? Wouldn't it be
sufficient to simply return NULL in vm_normal_page() if VM_PFNMAP is
set?

The problem I have is that the uncached pages in the mspec driver
aren't physically contiguous and the above rule doesn't match for
us. Right now we are safe since the mspec driver doesn't allow cow
mappings, but I fear that something could change in vm_normal_page()
that would make the behavior change underneath us. Alternatively one
could add yet another flag for this, but it seems somewhat overkill
for something which is so similar in behavior?
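
To make the outcomes of the check quoted above concrete, here is a
small userspace model in C. It is only a sketch, not kernel code:
struct vma_model, treated_as_raw_pfn(), MODEL_PAGE_SHIFT and the numeric
flag values are assumptions made for illustration, and is_cow_mapping()
is modeled as "private and may become writable".

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this userspace model only. */
#define VM_SHARED        0x08UL
#define VM_MAYWRITE      0x20UL
#define VM_PFNMAP        0x400UL
#define MODEL_PAGE_SHIFT 12

/* Hypothetical, stripped-down stand-in for the kernel's vma. */
struct vma_model {
	unsigned long vm_start;  /* first virtual address of the mapping */
	unsigned long vm_pgoff;  /* first PFN, per the remap_pfn_range() rule */
	unsigned long vm_flags;
};

/* Modeled as: private mapping that may become writable. */
static bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Returns true where the quoted check would return NULL, i.e. the pfn is
 * treated as a raw page frame with no struct page to look after.
 */
static bool treated_as_raw_pfn(const struct vma_model *vma,
			       unsigned long addr, unsigned long pfn)
{
	assert(addr >= vma->vm_start);

	if (vma->vm_flags & VM_PFNMAP) {
		unsigned long off = (addr - vma->vm_start) >> MODEL_PAGE_SHIFT;

		if (pfn == vma->vm_pgoff + off)
			return true;   /* honors the linear vm_pgoff rule */
		if (!is_cow_mapping(vma->vm_flags))
			return true;   /* non-linear, but COW is impossible */
	}
	return false;                  /* goes on to the normal page lookup */
}
```

In this model a shared VM_PFNMAP mapping with scattered PFNs (the mspec
case) is still treated as raw through the second test. Only a private,
potentially writable mapping whose PFN breaks the linear rule falls
through to the normal struct page lookup.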

Any suggestions? (or rather, what obvious thing did I miss? ;-)

Thanks,
Jes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: VM_PFNMAP and do_no_pfn handler
  2006-02-20 14:20 VM_PFNMAP and do_no_pfn handler Jes Sorensen
@ 2006-02-20 15:39 ` Hugh Dickins
  2006-02-20 15:55   ` Jes Sorensen
  0 siblings, 1 reply; 4+ messages in thread
From: Hugh Dickins @ 2006-02-20 15:39 UTC
  To: Jes Sorensen
  Cc: linux-mm, Linus Torvalds, Carsten Otte, roe, Robin Holt, Jack Steiner

On Mon, 20 Feb 2006, Jes Sorensen wrote:
> 
> I am looking at implementing a do_no_pfn handler similar to
> do_no_page, but for pages which are not backed by a struct page. I'd
> like to use it for the mspec driver which maps uncached pages to
> userland. The reason we need the do_no_pfn handler is to get the first
> touch locality of the mapping on NUMA systems.
> 
> I have it all working, however I have a question about the VM_PFNMAP
> flag. Right now mm/memory.c claims the following above
> vm_normal_page():
> 
>  * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
>  * will have each page table entry just pointing to a raw page frame
>  * number, and as far as the VM layer is concerned, those do not have
>  * pages associated with them - even if the PFN might point to memory
>  * that otherwise is perfectly fine and has a "struct page".
>  *
>  * The way we recognize those mappings is through the rules set up
>  * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
>  * and the vm_pgoff will point to the first PFN mapped: thus every
>  * page that is a raw mapping will always honor the rule
>  *
>  *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
> 
> vm_normal_page() then does this:
> 
>         if (vma->vm_flags & VM_PFNMAP) {
>                 unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
>                 if (pfn == vma->vm_pgoff + off)
>                         return NULL;
>                 if (!is_cow_mapping(vma->vm_flags))
>                         return NULL;
>         }
> 
> Everywhere else it is stated that the VM_PFNMAP flag is only set for
> pages without a struct page backing them. In other words, are there any
> cases where the above requirement is really needed? Wouldn't it be
> sufficient to simply return NULL in vm_normal_page() if VM_PFNMAP is
> set?
> 
> The problem I have is that the uncached pages in the mspec driver
> aren't physically contiguous and the above rule doesn't match for
> us. Right now we are safe since the mspec driver doesn't allow cow
> mappings, but I fear that something could change in vm_normal_page()
> that would make the behavior change underneath us. Alternatively one
> could add yet another flag for this, but it seems somewhat overkill
> for something which is so similar in behavior?
> 
> Any suggestions? (or rather, what obvious thing did I miss? ;-)

I believe you'll be safe for as long as your driver prohibits COW
mappings.  You're not the only one to have VM_PFNMAP areas which
don't follow Linus' vm_pgoff rule: which is why he added the
!is_cow_mapping letout late in 2.6.15-rc.  We cannot change that
lightly.

I think you're worrying too much, unless you anticipate wanting to
extend to COW mappings later.  That would indeed need vm_normal_page
to be changed (and I know what change to make, but Linus hated it!).
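
One way a driver could enforce the "no COW mappings" condition is to
reject such requests at mmap time. The following is a userspace sketch
under assumptions: model_mmap_check() and model_selfcheck() are
hypothetical names, the flag values are placeholders, and a real driver
may express the same restriction differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder flag values for this userspace model only. */
#define VM_SHARED   0x08UL
#define VM_MAYWRITE 0x20UL

/* Modeled as: private mapping that may become writable. */
static bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical mmap-time check: refuse any mapping that could COW. */
static int model_mmap_check(unsigned long vm_flags)
{
	if (is_cow_mapping(vm_flags))
		return -EINVAL;
	return 0;
}

/* Usage example: which flag combinations the check accepts. */
static void model_selfcheck(void)
{
	assert(model_mmap_check(VM_SHARED | VM_MAYWRITE) == 0); /* shared */
	assert(model_mmap_check(VM_MAYWRITE) == -EINVAL);       /* private, may write */
	assert(model_mmap_check(0) == 0);                       /* can never be written */
}
```

As long as every VM_PFNMAP vma of the driver passes a check of this
kind, the !is_cow_mapping test applies to all of its pages, including
those that do not follow the vm_pgoff rule.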

Hugh

* Re: VM_PFNMAP and do_no_pfn handler
  2006-02-20 15:39 ` Hugh Dickins
@ 2006-02-20 15:55   ` Jes Sorensen
  2006-02-20 16:30     ` Hugh Dickins
  0 siblings, 1 reply; 4+ messages in thread
From: Jes Sorensen @ 2006-02-20 15:55 UTC
  To: Hugh Dickins
  Cc: linux-mm, Linus Torvalds, Carsten Otte, roe, Robin Holt, Jack Steiner

>>>>> "Hugh" == Hugh Dickins <hugh@veritas.com> writes:

Hugh> On Mon, 20 Feb 2006, Jes Sorensen wrote:
>> Any suggestions? (or rather, what obvious thing did I miss? ;-)

Hugh> I believe you'll be safe for as long as your driver prohibits
Hugh> COW mappings.  You're not the only one to have VM_PFNMAP areas
Hugh> which don't follow Linus' vm_pgoff rule: which is why he added
Hugh> the !is_cow_mapping letout late in 2.6.15-rc.  We cannot change
Hugh> that lightly.

Hugh> I think you're worrying too much, unless you anticipate wanting
Hugh> to extend to COW mappings later.  That would indeed need
Hugh> vm_normal_page to be changed (and I know what change to make,
Hugh> but Linus hated it!).

Hi Hugh,

Thanks for the explanation. It just struck me that is_cow_mapping()
is a bit of a strange name for a
'this_mapping_really_has_no_struct_page_behind_it_honest()' function.
Is there some reason why we try to look up the struct page for
anything mapped VM_PFNMAP?

I can live with the current situation, but maybe it would be worth
adding some extra explanation to vm_normal_page() then?

I hope to post the changes I have in mind for do_no_pfn() and the
driver within a couple of days for those who are interested.

Cheers,
Jes

* Re: VM_PFNMAP and do_no_pfn handler
  2006-02-20 15:55   ` Jes Sorensen
@ 2006-02-20 16:30     ` Hugh Dickins
  0 siblings, 0 replies; 4+ messages in thread
From: Hugh Dickins @ 2006-02-20 16:30 UTC
  To: Jes Sorensen
  Cc: linux-mm, Linus Torvalds, Carsten Otte, roe, Robin Holt, Jack Steiner

On Mon, 20 Feb 2006, Jes Sorensen wrote:
> 
> Thanks for the explanation. It just struck me that is_cow_mapping()
> is a bit of a strange name for a
> 'this_mapping_really_has_no_struct_page_behind_it_honest()' function.
> Is there some reason why we try to look up the struct page for
> anything mapped VM_PFNMAP?

If it's a Copy-On-Write mapping, then a write fault on a page (or page
frame!) in that mapping will copy the original to an ordinary anonymous
page, which will then be substituted into the mapping in that position.

So although the vma is marked VM_PFNMAP, if it is_cow_mapping, then it
might contain ordinary struct-page-type pages, which have to be dealt
with in the normal way (otherwise they'll get leaked).

(At first we thought this was not a realistic situation; then we found
some apps did it, and we thought they were just being silly; then we
found that some really relied on COW-ing in a PFNMAP area.)
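
The sequence described above can be modeled in a few lines of userspace
C. This is a sketch under assumptions, not the kernel's fault path:
struct pfnmap_model and the model_*() helpers are hypothetical, the vma
is taken to be a private writable VM_PFNMAP mapping, and the anonymous
page's PFN is assumed to lie outside the raw range.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical model: one "PTE" per page of a COW-able VM_PFNMAP vma. */
struct pfnmap_model {
	unsigned long vm_pgoff;         /* first raw PFN of the mapping */
	unsigned long pte_pfn[NPAGES];  /* PFN currently installed per slot */
};

/* Set up the linear layout that remap_pfn_range() is described as creating. */
static void model_init(struct pfnmap_model *m, unsigned long first_pfn)
{
	m->vm_pgoff = first_pfn;
	for (size_t i = 0; i < NPAGES; i++)
		m->pte_pfn[i] = first_pfn + i;
}

/* Write fault: the slot now points at a freshly copied anonymous page. */
static void model_cow_fault(struct pfnmap_model *m, size_t idx,
			    unsigned long anon_pfn)
{
	assert(idx < NPAGES);
	m->pte_pfn[idx] = anon_pfn;
}

/*
 * Count slots that no longer honor the linear rule. In this model these
 * are the ordinary pages that must be handled normally at unmap time.
 */
static size_t model_count_normal_pages(const struct pfnmap_model *m)
{
	size_t n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		if (m->pte_pfn[i] != m->vm_pgoff + i)
			n++;
	return n;
}
```

After one modeled write fault exactly one slot breaks the linear rule.
A check that returned NULL for every VM_PFNMAP page would miss that
slot, which is the leak mentioned above.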

Hugh
