[RFC] Changing VM_PFNMAP assumptions and rules

* [RFC] Changing VM_PFNMAP assumptions and rules
@ 2007-11-09 19:15 Jared Hulbert
  2007-11-11  0:09 ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Jared Hulbert @ 2007-11-09 19:15 UTC (permalink / raw)
  To: Linux Memory Management List

Per conversations regarding XIP from the vm/fs mini-summit a couple
months back I've got a patch to air out.

The basic problem is that the assumptions about PFN mappings stemming
from the rules of remap_pfn_range() aren't always valid.  For example:
what stops one from using vm_insert_pfn() to map PFN's into a vma in
an arbitrary order?  Nothing.  Yet those PFN's cause problems in two
ways.

First, vm_normal_page() won't return NULL for them.  My answer to this is
to simply check pfn_valid(): if the PFN isn't valid then we've got a
proper PFN that can only be a PFN.  If you do have a valid PFN then you
are either (A) a COW'ed PFN that is now a real page or (B) a real page
pretending to be a PFN only.  The thing that makes me nervous is that
my hack doesn't let that page pretend to be a PFN.  I can't figure out
why a page would need/want to pretend to be a PFN, so I don't see
anything wrong with this, but maybe somebody does.

Second, there are a few random BUG_ON() that don't seem to serve any
purpose other than to punish the PFN's that don't abide by
remap_pfn_range() rules.  I just get rid of them.  The problem is I
don't really understand why they are there in the first place so for
all I know I'm horribly breaking spufs or something.

Okay so I haven't tried this out on 2.6.24-rc1 yet, but the same basic
idea worked on 2.6.23 and older.  I just wanted to get feedback on
this approach.  I don't know the vm all that well so I want to make
sure I'm not doing something really stupid that breaks a bunch of code
paths I don't use.

diff --git a/mm/memory.c b/mm/memory.c
index 9791e47..fb962d0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -366,29 +366,19 @@ static inline int is_cow_mapping(unsigned int flags)
  * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
  * will have each page table entry just pointing to a raw page frame
  * number, and as far as the VM layer is concerned, those do not have
- * pages associated with them - even if the PFN might point to memory
- * that otherwise is perfectly fine and has a "struct page".
+ * pages associated with them.
  *
- * The way we recognize those mappings is through the rules set up
- * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
- * and the vm_pgoff will point to the first PFN mapped: thus every
- * page that is a raw mapping will always honor the rule
- *
- *	pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
- *
- * and if that isn't true, the page has been COW'ed (in which case it
- * _does_ have a "struct page" associated with it even if it is in a
- * VM_PFNMAP range).
+ * The old "remap_pfn_range()" rules don't work for all applications.
+ * Each "page" in a PFN mapping either has a page struct backing it
+ * or it doesn't.  If it does then treat it like the page it is; if
+ * it doesn't then it is not a normal page so just return NULL.
  */
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);

 	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
-		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
-		if (pfn == vma->vm_pgoff + off)
-			return NULL;
-		if (!is_cow_mapping(vma->vm_flags))
+		if (!pfn_valid(pfn))
 			return NULL;
 	}

@@ -1212,7 +1202,6 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 	spinlock_t *ptl;

 	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
-	BUG_ON(is_cow_mapping(vma->vm_flags));

 	retval = -ENOMEM;
 	pte = get_locked_pte(mm, addr, &ptl);
@@ -2216,8 +2205,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	vmf.flags = flags;
 	vmf.page = NULL;

-	BUG_ON(vma->vm_flags & VM_PFNMAP);
-
 	if (likely(vma->vm_ops->fault)) {
 		ret = vma->vm_ops->fault(vma, &vmf);
 		if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-09 19:15 [RFC] Changing VM_PFNMAP assumptions and rules Jared Hulbert
@ 2007-11-11  0:09 ` Nick Piggin
  2007-11-12 22:03   ` Jared Hulbert
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2007-11-11  0:09 UTC (permalink / raw)
  To: Jared Hulbert, benh; +Cc: Linux Memory Management List

On Saturday 10 November 2007 06:15, Jared Hulbert wrote:
> Per conversations regarding XIP from the vm/fs mini-summit a couple
> months back I've got a patch to air out.
>
> The basic problem is that the assumptions about PFN mappings stemming
> from the rules of remap_pfn_range() aren't always valid.  For example:
> what stops one from using vm_insert_pfn() to map PFN's into a vma in
> an arbitrary order?  Nothing.  Yet those PFN's cause problems in two
> ways.
>
> First, vm_normal_page() won't return NULL. 

They will, because it isn't allowed to be a COW mapping, and hence it
fails the vm_normal_page() test.

> My answer to this is to 
> simply check if pfn_valid()  if it isn't then we've got a proper PFN
> that can only be a PFN.  If you do have a valid PFN then you are (A) a
> 'cow'ed' PFN that is now a real page or (B) you are a real page
> pretending to be a PFN only.  The thing that makes me nervous is that
> my hack doesn't let that page pretend to be a PFN.  I can't figure out
> why a page would need/want to pretend to be a PFN so I don't see
> anything wrong with this, but maybe somebody does.
>
> Second, there are a few random BUG_ON() that don't seem to serve any
> purpose other than to punish the PFN's that don't abide by
> remap_pfn_range() rules.  I just get rid of them.  The problem is I
> don't really understand why they are there in the first place so for
> all I know I'm horribly breaking spufs or something.

They are perhaps slightly undercommented, but they are definitely
required. And it is to ensure that everything works correctly.


> Okay so I haven't tried this out on 2.6.24-rc1 yet, but the same basic
> idea worked on 2.6.23 and older.  I just wanted to get feedback on
> this approach.  I don't know the vm all that well so I want to make
> sure I'm not doing something really stupid that breaks a bunch of code
> paths I don't use.

You actually can't just use pfn_valid, because there are cases where
you actually *cannot* touch the underlying struct page's mapcount,
flags, etc. I think the only real user is /dev/mem.

So my suggestion to you, if you want to support COW pfnmaps, is to
create a new VM_FLAG type (VM_INVALIDPFNMAP? ;)), which has the
pfn_valid() == COW semantics that you want.

Keep the streamlined fastpath in vm_normal_page()...

  if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_JAREDMAP))) {
    if (vma->vm_flags & VM_JAREDMAP) {
      if (!pfn_valid(pfn))
        return NULL;
    } else {
      unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
      if (pfn == vma->vm_pgoff + off)
        return NULL;
      if (!is_cow_mapping(vma->vm_flags))
        return NULL;
    }
  }

The tests in vm_insert_pfn would just be complementary to your new
scheme..
BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_JAREDMAP)));
BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
BUG_ON((vma->vm_flags & VM_JAREDMAP) && pfn_valid(pfn));

May not work out so easy, but AFAIKS it will work. See how much mileage
that gets you.
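
The three checks above can be modelled in userspace as a single
predicate.  This is only a sketch: the flag bit values, RAM_PFNS and
every model_* name are made up for illustration, and VM_JAREDMAP is the
hypothetical flag from this message, not an existing kernel flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values for this model only, not the kernel's. */
#define VM_SHARED    0x01u
#define VM_MAYWRITE  0x02u
#define VM_PFNMAP    0x04u
#define VM_JAREDMAP  0x08u	/* hypothetical flag proposed in this thread */

/* Assumption: in this model every pfn below RAM_PFNS has a struct page. */
#define RAM_PFNS 0x1000UL

static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < RAM_PFNS;
}

/* Same test as is_cow_mapping(): private mapping that may be written. */
static bool model_is_cow_mapping(unsigned int flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* True when none of the three proposed BUG_ON() conditions would fire. */
static bool insert_pfn_allowed(unsigned int vm_flags, unsigned long pfn)
{
	if (!(vm_flags & (VM_PFNMAP | VM_JAREDMAP)))
		return false;	/* not a raw-PFN mapping at all */
	if ((vm_flags & VM_PFNMAP) && model_is_cow_mapping(vm_flags))
		return false;	/* classic pfnmap must not be COW */
	if ((vm_flags & VM_JAREDMAP) && model_pfn_valid(pfn))
		return false;	/* struct-page-backed pfn needs refcounting */
	return true;
}
```

With these rules a shared VM_PFNMAP mapping accepts any pfn, a COW
VM_PFNMAP mapping accepts none, and a VM_JAREDMAP mapping accepts only
pfns that have no struct page.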

The other thing you might like is to allow pfn_valid(pfn) pfns to go
into these mappings, and you know it is fine to twiddle with the
struct page (eg. if you want to switch between different pfns, which
I know the spufs guys want to). That's not too hard: just take out some
of the assertions. You might have to do a little bit of setup work too,
like increment the page count and mapcount etc. but just so long as you
put that in a mm/memory.c helper rather than your own code, it should
be clean enough.

* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-11  0:09 ` Nick Piggin
@ 2007-11-12 22:03   ` Jared Hulbert
  2007-11-12 22:29     ` Benjamin Herrenschmidt
  2007-11-13 12:08     ` Nick Piggin
  0 siblings, 2 replies; 12+ messages in thread
From: Jared Hulbert @ 2007-11-12 22:03 UTC (permalink / raw)
  To: Nick Piggin; +Cc: benh, Linux Memory Management List

On 11/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Saturday 10 November 2007 06:15, Jared Hulbert wrote:
> > Per conversations regarding XIP from the vm/fs mini-summit a couple
> > months back I've got a patch to air out.
> >
> > The basic problem is that the assumptions about PFN mappings stemming
> > from the rules of remap_pfn_range() aren't always valid.  For example:
> > what stops one from using vm_insert_pfn() to map PFN's into a vma in
> > an arbitrary order?  Nothing.  Yet those PFN's cause problems in two
> > ways.
> >
> > First, vm_normal_page() won't return NULL.
>
> They will, because it isn't allowed to be a COW mapping, and hence it
> fails the vm_normal_page() test.

No.  It doesn't work.  If I have a mapping that doesn't abide by the
pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
rule, which is easy to do with vm_insert_pfn(), it won't return NULL at
if (pfn == vma->vm_pgoff + off)
and, because nobody expects a raw PFN to get this far, sometimes I do
have an is_cow_mapping() VMA whose page is just a PFN.  That means I
also get past the second test and reach print_bad_pte().  At least
that's what I have captured in the past.  Is that bad?

I'm still not sure you quite grasp what I am doing.  You assume the
map only contains PFNs and COW'ed in-core pages.  I'm mixing it all up.
A given page in a file for AXFS is either backed by an uncompressed
XIP'able page or a compressed page that needs to be uncompressed to
RAM to be used (think SquashFS, CramFS, etc.).  So I would have raw
PFNs that are !pfn_valid() by nature, COW'ed pages that come from the
raw PFNs, and in-RAM pages that are backed by a compressed chunk on
Flash.  What's more, the raw PFNs are definitely not in
remap_pfn_range() order.
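
To make the failure concrete, here is a minimal userspace model of the
two rules being compared: the linear-offset rule that the RFC patch
removes from vm_normal_page(), and the pfn_valid() rule it adds.  It is
a sketch under assumptions, not kernel code: the flag values, RAM_PFNS,
the enum and all model_*/classify_* names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT   12
/* Illustrative bit values for this model only, not the kernel's. */
#define VM_SHARED    0x01u
#define VM_MAYWRITE  0x02u
#define VM_PFNMAP    0x04u
/* Assumption: pfns below RAM_PFNS are RAM with a struct page. */
#define RAM_PFNS     0x1000UL

enum pte_kind { RAW_PFN, NORMAL_PAGE, BAD_PTE };

struct vma_model {
	unsigned long vm_start;
	unsigned long vm_pgoff;
	unsigned int vm_flags;
};

static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < RAM_PFNS;
}

static bool model_is_cow_mapping(unsigned int flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Existing rule: linear-offset test, then COW test, then sanity check. */
static enum pte_kind classify_linear_rule(const struct vma_model *vma,
					  unsigned long addr,
					  unsigned long pfn)
{
	if (vma->vm_flags & VM_PFNMAP) {
		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;

		if (pfn == vma->vm_pgoff + off)
			return RAW_PFN;
		if (!model_is_cow_mapping(vma->vm_flags))
			return RAW_PFN;
	}
	/* Stands in for the print_bad_pte() path on an invalid pfn. */
	return model_pfn_valid(pfn) ? NORMAL_PAGE : BAD_PTE;
}

/* Rule from the RFC patch: in a pfnmap, an invalid pfn is a raw PFN. */
static enum pte_kind classify_pfn_valid_rule(const struct vma_model *vma,
					     unsigned long pfn)
{
	if ((vma->vm_flags & VM_PFNMAP) && !model_pfn_valid(pfn))
		return RAW_PFN;
	return model_pfn_valid(pfn) ? NORMAL_PAGE : BAD_PTE;
}
```

In a COW VM_PFNMAP mapping, an out-of-order Flash pfn is classified as
BAD_PTE by the linear rule and as RAW_PFN by the pfn_valid() rule, while
a RAM pfn comes out as NORMAL_PAGE under both.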

> > My answer to this is to
> > simply check if pfn_valid()  if it isn't then we've got a proper PFN
> > that can only be a PFN.  If you do have a valid PFN then you are (A) a
> > 'cow'ed' PFN that is now a real page or (B) you are a real page
> > pretending to be a PFN only.  The thing that makes me nervous is that
> > my hack doesn't let that page pretend to be a PFN.  I can't figure out
> > why a page would need/want to pretend to be a PFN so I don't see
> > anything wrong with this, but maybe somebody does.
> >
> > Second, there are a few random BUG_ON() that don't seem to serve any
> > purpose other than to punish the PFN's that don't abide by
> > remap_pfn_range() rules.  I just get rid of them.  The problem is I
> > don't really understand why they are there in the first place so for
> > all I know I'm horribly breaking spufs or something.
>
> They are perhaps slightly undercommented, but they are definitely
> required. And it is to ensure that everything works correctly.

Help me understand this.  It seems to work fine if we remove these.

> > Okay so I haven't tried this out on 2.6.24-rc1 yet, but the same basic
> > idea worked on 2.6.23 and older.  I just wanted to get feedback on
> > this approach.  I don't know the vm all that well so I want to make
> > sure I'm not doing something really stupid that breaks a bunch of code
> > paths I don't use.
>
> You actually can't just use pfn_valid, because there are cases where
> you actually *cannot* touch the underlying struct page's mapcount,
> flags, etc. I think the only real user is /dev/mem.

Okay, I don't get why, but that's okay.

> So my suggestion to you, if you want to support COW pfnmaps, is to
> create a new VM_FLAG type (VM_INVALIDPFNMAP? ;)), which has the
> pfn_valid() == COW semantics that you want.

I __don't__ want pfn_valid() == COW.  I want pfn_valid() ==
is_real_RAM_page().  That real RAM page is not necessarily COW'ed yet.
Remember, I want a mapping that contains some Flash-backed pages and
some RAM-backed pages.

> Keep the streamlined fastpath in vm_normal_page()...
>
>   if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_JAREDMAP))) {
>     if (vma->vm_flags & VM_JAREDMAP) {
>       if (!pfn_valid(pfn))
>         return NULL;
>     } else {
>       unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
>       if (pfn == vma->vm_pgoff + off)
>         return NULL;
>       if (!is_cow_mapping(vma->vm_flags))
>         return NULL;
>     }
> }

Got it.

> The tests in vm_insert_pfn would just be complementary to your new
> scheme..
> BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_JAREDMAP)));
> BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));

Right.

> BUG_ON((vma->vm_flags & VM_JAREDMAP) && pfn_valid(pfn));

Okay, maybe.  I got to look at this more carefully.

> May not work out so easy, but AFAIKS it will work. See how much mileage
> that gets you.
>
> The other thing you might like is to allow pfn_valid(pfn) pfns to go
> into these mappings, and you know it is fine to twiddle with the
> struct page (eg. if you want to switch between different pfns, which
> I know the spufs guys want to). That's not too hard: just take out some
> of the assertions. You might have to do a little bit of setup work too,
> like increment the page count and mapcount etc. but just so long as you
> put that in a mm/memory.c helper rather than your own code, it should
> be clean enough.

Okay.... I don't understand how to do that.  These PFNs are from an
MTD partition.  They don't have page structs.  I don't mind
having real page structs backing the Flash pages being used here: it
would make it unnecessary to tweak the filemap_xip.c stuff, and
eventually it will be useful for doing read/write XIP stuff.  However,
I just really don't get how to even start that.

I have a page that is at a hardware level read-only.  What kind of
rules can that page live under?  More importantly these PFN's get
mapped in with a call to ioremap() in the mtd drivers.  So once I
figure out how to SPARSE_MEM, hotplug these pages in I've got to hack
the MTD to work with real pages.  Or something like that.  I'm not
ready to take that on yet, I just don't understand it all enough yet.

* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-12 22:03   ` Jared Hulbert
@ 2007-11-12 22:29     ` Benjamin Herrenschmidt
  2007-11-12 23:53       ` Jared Hulbert
  2007-11-13 12:08     ` Nick Piggin
  1 sibling, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-12 22:29 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: Nick Piggin, Linux Memory Management List

On Mon, 2007-11-12 at 14:03 -0800, Jared Hulbert wrote:
> > They will, because it isn't allowed to be a COW mapping, and hence it
> > fails the vm_normal_page() test.
> 
> No.  It doesn't work.  If I have a mapping that doesn't abide by the
> pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
> rule, which is easy to do with vm_insert_pfn(), it won't get to NULL
> at
> if (pfn == vma->vm_pgoff + off)
> and because we don't expect to get this far with a PFN sometimes I do
> have a is_cow_mapping() that is just a PFN.  Which means I fail the
> second test and get to the print_bad_pte().  At least that's what I
> have captured in the past.  Is that bad?

.../...

I've hit that sort of thing in the past. I find vm_normal_page() very
fragile in addition to hard to understand.

Why can't we just have a VM flag that says "no normal pages here, move
along, nothing to see" ? :-)

That reminds me of a related problem: such VMAs can't be reached by
access_process_vm() either, which means you can't access them with gdb.
That means for example that on Cell, an SPU local store cannot be
inspected with gdb. I suspect the DRM with the new TTM has the same
issue.

I was thinking about adding an access() hook to the VM ops for such
special VMAs to be able to provide ptrace with appropriate locking.
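
As a rough userspace sketch of what such a hook could look like: the
structure layout, the signature of access(), and all of the names below
are assumptions made for illustration, not an existing interface.  The
"backing" buffer stands in for memory, such as a device local store,
that has no struct page behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vma_model;

/* Hypothetical hook: copy len bytes at addr to or from buf. */
struct vm_ops_model {
	int (*access)(struct vma_model *vma, unsigned long addr,
		      void *buf, int len, int write);
};

struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	const struct vm_ops_model *vm_ops;
	unsigned char *backing;	/* stands in for e.g. a local store */
};

/* Example hook; returns the number of bytes copied, or -1 on error. */
static int local_store_access(struct vma_model *vma, unsigned long addr,
			      void *buf, int len, int write)
{
	unsigned long off;

	if (len < 0 || addr < vma->vm_start || addr >= vma->vm_end)
		return -1;
	off = addr - vma->vm_start;
	/* Clamp the copy so it never runs past the end of the VMA. */
	if ((unsigned long)len > vma->vm_end - addr)
		len = (int)(vma->vm_end - addr);
	/* A real implementation would take its own lock around the copy. */
	if (write)
		memcpy(vma->backing + off, buf, (size_t)len);
	else
		memcpy(buf, vma->backing + off, (size_t)len);
	return len;
}

/* Model of the debugger path: use the hook when the VMA provides one. */
static int model_access_vma(struct vma_model *vma, unsigned long addr,
			    void *buf, int len, int write)
{
	if (vma->vm_ops && vma->vm_ops->access)
		return vma->vm_ops->access(vma, addr, buf, len, write);
	return -1;	/* no struct page and no hook: nothing to copy */
}
```

The point of the design is that the generic code never needs a struct
page for these VMAs; it only asks the owner of the mapping to do the
copy under whatever locking the owner requires.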

> I'm still not sure you quite grasp what I am doing.  You assume the
> map only contains PFNs and COW'ed in-core pages.  I'm mixing it all up.
> A given page in a file for AXFS is either backed by an uncompressed
> XIP'able page or a compressed page that needs to be uncompressed to
> RAM to be used (think SquashFS, CramFS, etc.).  So I would have raw
> PFNs that are !pfn_valid() by nature, COW'ed pages that come from the
> raw PFNs, and in-RAM pages that are backed by a compressed chunk on
> Flash.  What's more, the raw PFNs are definitely not in
> remap_pfn_range() order.

Your problem is harder than mine as it seems to me that a given VMA can
have both normal and non-normal pages... I'm afraid there is no other
way to deal with that than introducing a PTE flag for those, which means
whacking something in all archs... unless you do provide something
along the lines of pfn_normal() to use here.

> Okay.... I don't understand how to do that.  These PFNs are from an
> MTD partition.  They don't have page structs.  I don't mind
> having real page structs backing the Flash pages being used here: it
> would make it unnecessary to tweak the filemap_xip.c stuff, and
> eventually it will be useful for doing read/write XIP stuff.  However,
> I just really don't get how to even start that.

Having page structs introduces a different kind of problem; I would
recommend not going there unless you really can't do otherwise. It's
been a terrible pain in the neck on Cell with SPEs until I introduced
vm_insert_pfn() to get rid of them.

> I have a page that is at a hardware level read-only.  What kind of
> rules can that page live under?  More importantly these PFN's get
> mapped in with a call to ioremap() in the mtd drivers.  So once I
> figure out how to SPARSE_MEM, hotplug these pages in I've got to hack
> the MTD to work with real pages.  Or something like that.  I'm not
> ready to take that on yet, I just don't understand it all enough yet.

I think vm_normal_page() could use something like pfn_normal() which
isn't quite the same as pfn_valid()... or just use pfn_valid() but in
that case, that would mean removing a bunch of the BUG_ON's indeed.

Ben.

* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-12 22:29     ` Benjamin Herrenschmidt
@ 2007-11-12 23:53       ` Jared Hulbert
  2007-11-13  0:24         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Jared Hulbert @ 2007-11-12 23:53 UTC (permalink / raw)
  To: benh; +Cc: Nick Piggin, Linux Memory Management List

On 11/12/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> On Mon, 2007-11-12 at 14:03 -0800, Jared Hulbert wrote:
> > > They will, because it isn't allowed to be a COW mapping, and hence it
> > > fails the vm_normal_page() test.
> >
> > No.  It doesn't work.  If I have a mapping that doesn't abide by the
> > pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
> > rule, which is easy to do with vm_insert_pfn(), it won't get to NULL
> > at
> > if (pfn == vma->vm_pgoff + off)
> > and because we don't expect to get this far with a PFN sometimes I do
> > have a is_cow_mapping() that is just a PFN.  Which means I fail the
> > second test and get to the print_bad_pte().  At least that's what I
> > have captured in the past.  Is that bad?
>
> .../...
>
> I've hit that sort of thing in the past. I find vm_normal_page() very
> fragile in addition to hard to understand.

Despite the fact that it is only ~50 lines and 2/3 comments, I agree.

> Why can't we just have a VM flag that says "no normal pages here, move
> along, nothing to see" ? :-)

That won't work for me.

> That reminds me of a related problem: such VMAs can't be reached by
> access_process_vm() either, which means you can't access them with gdb.
> That means for example that on Cell, an SPU local store cannot be
> inspected with gdb. I suspect the DRM with the new TTM has the same
> issue.
>
> I was thinking about adding an access() hook to the VM ops for such
> special VMAs to be able to provide ptrace with appropriate locking.

I don't know if I've tried that....

> > I'm still not sure you quite grasp what I am doing.  You assume the
> > map only contains PFNs and COW'ed in-core pages.  I'm mixing it all up.
> > A given page in a file for AXFS is either backed by an uncompressed
> > XIP'able page or a compressed page that needs to be uncompressed to
> > RAM to be used (think SquashFS, CramFS, etc.).  So I would have raw
> > PFNs that are !pfn_valid() by nature, COW'ed pages that come from the
> > raw PFNs, and in-RAM pages that are backed by a compressed chunk on
> > Flash.  What's more, the raw PFNs are definitely not in
> > remap_pfn_range() order.
>
> Your problem is harder than mine as it seems to me that a given VMA can
> have both normal and non-normal pages... I'm afraid there is no other
> way to deal with that than introducing a PTE flag for those, which means
> whacking something in all archs... unless you do provide something
> along the lines of pfn_normal() to use here.

I still don't see why pfn_valid() won't work for me here.  These PFNs
from the MTD drivers are by definition !pfn_valid().  The PTE flag
route sounds nasty.

> > Okay.... I don't understand how to do that.  These PFNs are from an
> > MTD partition.  They don't have page structs.  I don't mind
> > having real page structs backing the Flash pages being used here: it
> > would make it unnecessary to tweak the filemap_xip.c stuff, and
> > eventually it will be useful for doing read/write XIP stuff.  However,
> > I just really don't get how to even start that.
>
> Having page structs introduces a different kind of problem; I would
> recommend not going there unless you really can't do otherwise. It's
> been a terrible pain in the neck on Cell with SPEs until I introduced
> vm_insert_pfn() to get rid of them.

That was you?  Thanks.  It makes AXFS possible too.  Long term I may
want to do this anyway.

> > I have a page that is at a hardware level read-only.  What kind of
> > rules can that page live under?  More importantly these PFN's get
> > mapped in with a call to ioremap() in the mtd drivers.  So once I
> > figure out how to SPARSE_MEM, hotplug these pages in I've got to hack
> > the MTD to work with real pages.  Or something like that.  I'm not
> > ready to take that on yet, I just don't understand it all enough yet.
>
> I think vm_normal_page() could use something like pfn_normal() which
> isn't quite the same as pfn_valid()... or just use pfn_valid() but in
> that case, that would mean removing a bunch of the BUG_ON's indeed.

That's exactly what my original patch does.  Would my patch break
spufs?  Nick said my patch would break /dev/mem I think.

* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-12 23:53       ` Jared Hulbert
@ 2007-11-13  0:24         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-13  0:24 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: Nick Piggin, Linux Memory Management List

On Mon, 2007-11-12 at 15:53 -0800, Jared Hulbert wrote:

> > > I have a page that is at a hardware level read-only.  What kind of
> > > rules can that page live under?  More importantly these PFN's get
> > > mapped in with a call to ioremap() in the mtd drivers.  So once I
> > > figure out how to SPARSE_MEM, hotplug these pages in I've got to hack
> > > the MTD to work with real pages.  Or something like that.  I'm not
> > > ready to take that on yet, I just don't understand it all enough yet.
> >
> > I think vm_normal_page() could use something like pfn_normal() which
> > isn't quite the same as pfn_valid()... or just use pfn_valid() but in
> > that case, that would mean removing a bunch of the BUG_ON's indeed.
> 
> That's exactly what my original patch does.  Would my patch break
> spufs?  Nick said my patch would break /dev/mem I think.

I missed your original patch. Can you resend it to me ? Nick, how would
it break /dev/mem ?

Cheers,
Ben.


* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-12 22:03   ` Jared Hulbert
  2007-11-12 22:29     ` Benjamin Herrenschmidt
@ 2007-11-13 12:08     ` Nick Piggin
  2007-11-14  1:29       ` Jared Hulbert
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2007-11-13 12:08 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: benh, Linux Memory Management List

On Tuesday 13 November 2007 09:03, Jared Hulbert wrote:
> On 11/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > On Saturday 10 November 2007 06:15, Jared Hulbert wrote:
> > > Per conversations regarding XIP from the vm/fs mini-summit a couple
> > > months back I've got a patch to air out.
> > >
> > > The basic problem is that the assumptions about PFN mappings stemming
> > > from the rules of remap_pfn_range() aren't always valid.  For example:
> > > what stops one from using vm_insert_pfn() to map PFN's into a vma in
> > > an arbitrary order?  Nothing.  Yet those PFN's cause problems in two
> > > ways.
> > >
> > > First, vm_normal_page() won't return NULL.
> >
> > They will, because it isn't allowed to be a COW mapping, and hence it
> > fails the vm_normal_page() test.
>
> No.  It doesn't work.  If I have a mapping that doesn't abide by the
> pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
> rule, which is easy to do with vm_insert_pfn(), it won't get to NULL at
> if (pfn == vma->vm_pgoff + off)
> and because we don't expect to get this far with a PFN sometimes I do
> have a is_cow_mapping() that is just a PFN.  Which means I fail the
> second test and get to the print_bad_pte().  At least that's what I
> have captured in the past.  Is that bad?

Well you aren't allowed to put a pfn into an is_cow_mapping() with
vm_insert_pfn(). That's my whole point. Everything should work and
be consistent today. (The real issue is that "work" doesn't work for
you).


> I'm still not sure you quite grasp what I am doing.  You assume the
> map only contains PFNs and COW'ed in-core pages.  I'm mixing it all up.
> A given page in a file for AXFS is either backed by an uncompressed
> XIP'able page or a compressed page that needs to be uncompressed to
> RAM to be used (think SquashFS, CramFS, etc.).  So I would have raw
> PFNs that are !pfn_valid() by nature, COW'ed pages that come from the
> raw PFNs, and in-RAM pages that are backed by a compressed chunk on
> Flash.  What's more, the raw PFNs are definitely not in
> remap_pfn_range() order.

OK, that's fine. So long as your underlying pages (that are pfn_valid())
are able to cope with the refcounting, nothing changes.


> > > My answer to this is to
> > > simply check if pfn_valid()  if it isn't then we've got a proper PFN
> > > that can only be a PFN.  If you do have a valid PFN then you are (A) a
> > > 'cow'ed' PFN that is now a real page or (B) you are a real page
> > > pretending to be a PFN only.  The thing that makes me nervous is that
> > > my hack doesn't let that page pretend to be a PFN.  I can't figure out
> > > why a page would need/want to pretend to be a PFN so I don't see
> > > anything wrong with this, but maybe somebody does.
> > >
> > > Second, there are a few random BUG_ON() that don't seem to serve any
> > > purpose other than to punish the PFN's that don't abide by
> > > remap_pfn_range() rules.  I just get rid of them.  The problem is I
> > > don't really understand why they are there in the first place so for
> > > all I know I'm horribly breaking spufs or something.
> >
> > They are perhaps slightly undercommented, but they are definitely
> > required. And it is to ensure that everything works correctly.
>
> Help me understand this.  It seems to work fine if we remove these.

Oh sure, which is why I say you could do exactly that, but with
*another* VM_flag. Because you'll break subtle things if you change
VM_PFNMAP.


> > > Okay so I haven't tried this out on 2.6.24-rc1 yet, but the same basic
> > > idea worked on 2.6.23 and older.  I just wanted to get feedback on
> > > this approach.  I don't know the vm all that well so I want to make
> > > sure I'm not doing something really stupid that breaks a bunch of code
> > > paths I don't use.
> >
> > You actually can't just use pfn_valid, because there are cases where
> > you actually *cannot* touch the underlying struct page's mapcount,
> > flags, etc. I think the only real user is /dev/mem.
>
> Okay, I don't get why, but that's okay.

/dev/mem gives a window into all memory, and you don't actually want
to take a reference or elevate the mapcount on the actual underlying
pages.

There are also other cases that we may want to use VM_PFNMAP for,
which aren't technically going to break if you refcount them, but it
is suboptimal. Eg. vdso pages -- it might be useful to avoid the
cacheline bouncing of refcounting these.


> > So my suggestion to you, if you want to support COW pfnmaps, is to
> > create a new VM_FLAG type (VM_INVALIDPFNMAP? ;)), which has the
> > pfn_valid() == COW semantics that you want.
>
> I __don't__ want pfn_valid() == COW.  I want pfn_valid() ==
> is_real_RAM_page().  That real RAM page is not necessarily COW'ed yet.
>   Remember I want a mapping that contains some Flash backed pages and
> some RAM backed pages.

Yeah sure OK. The only thing that really matters is pfn_valid() ==
page with a valid struct page, which should be refcounted.


> > BUG_ON((vma->vm_flags & VM_JAREDMAP) && pfn_valid(pfn));
>
> Okay, maybe.  I got to look at this more carefully.

OK, well this would prevent you putting improperly refcounted
pfn_valid() pages into the pagetables with vm_insert_pfn().

Insert the pfn_valid() pages with vm_insert_page(), which I think
should take care of all those issues for you.


> > May not work out so easy, but AFAIKS it will work. See how much mileage
> > that gets you.
> >
> > The other thing you might like is to allow pfn_valid(pfn) pfns to go
> > into these mappings, and you know it is fine to twiddle with the
> > struct page (eg. if you want to switch between different pfns, which
> > I know the spufs guys want to). That's not too hard: just take out some
> > of the assertions. You might have to do a little bit of setup work too,
> > like increment the page count and mapcount etc. but just so long as you
> > put that in a mm/memory.c helper rather than your own code, it should
> > be clean enough.
>
> Okay.... I don't understand how to do that.  These PFN's are from an
> MTD partition.  They don't have page structs.  So I don't mind
> having real page structs backing the Flash pages being used here.  It
> would make it unnecessary to tweak the filemap_xip.c stuff, eventually
> it will be useful for doing read/write XIP stuff.  However, I just
> really don't get how to even start that.

No sorry, I didn't word that very well: so long as the pages you
have which _are_ pfn_valid() do have valid and properly refcounted
struct pages, then inserting them as normal pages into the VM should
be fine.

By properly refcounted, I mean that page->_count isn't 0, and that
you are prepared for the page to be freed when the user mappings go
away *if* you have dropped your own reference. Just common sense
stuff really.
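
To spell out what "properly refcounted" means here, a minimal userspace
sketch follows. It is not kernel code: struct model_page, model_get_page()
and model_put_page() are made-up names, and the single counter only stands
in for page->_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a count and a freed marker. */
struct model_page {
	int count;	/* plays the role of page->_count */
	bool freed;	/* set once the last reference is dropped */
};

/* Take a reference, e.g. when a user mapping of the page is created. */
static void model_get_page(struct model_page *page)
{
	/* A count of 0 would mean the page is not properly refcounted. */
	assert(page->count > 0 && !page->freed);
	page->count++;
}

/* Drop a reference; the page is freed when the last one goes away. */
static void model_put_page(struct model_page *page)
{
	assert(page->count > 0);
	if (--page->count == 0)
		page->freed = true;
}
```

Starting from a count of 1 (the owner's reference), a user mapping takes
the count to 2. If the owner then drops its reference, the page survives
only until the mapping goes away, at which point it is freed.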

When I waffled on about doing a bit of setup work, I'd forgotten
about vm_insert_page(), which should already do just about everything
you need.


> I have a page that is at a hardware level read-only.  What kind of
> rules can that page live under?  More importantly these PFN's get
> mapped in with a call to ioremap() in the mtd drivers.  So once I
> figure out how to SPARSE_MEM, hotplug these pages in I've got to hack
> the MTD to work with real pages.  Or something like that.  I'm not
> ready to take that on yet, I just don't understand it all enough yet.

These pages could live under the !pfn_valid() rules, which, in your
new VM_flag scheme, should not require underlying struct pages. So
hopefully don't need messing with sparsemem?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-14  1:29       ` Jared Hulbert
@ 2007-11-13 17:26         ` Nick Piggin
  2007-11-14 18:52           ` Jared Hulbert
  2007-11-16 23:42           ` Jared Hulbert
  0 siblings, 2 replies; 12+ messages in thread
From: Nick Piggin @ 2007-11-13 17:26 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: benh, Linux Memory Management List

On Wednesday 14 November 2007 12:29, Jared Hulbert wrote:
> > Well you aren't allowed to put a pfn into an is_cow_mapping() with
> > vm_insert_pfn().  That's my whole point.
>
> Why not?

Because it breaks VM_PFNMAP as you saw. *This* is why vm_normal_page()
does actually work correctly with vm_insert_pfn() and VM_PFNMAP today :)
Because they all work together to ensure that the "breakage" of vm_pgoff
by vm_insert_pfn() that you describe isn't actually breakage.
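
A rough userspace model of that interplay, for illustration only. The
names (model_vma, model_pfnmap_pte_is_raw) are made up, 4 KiB pages are
assumed, and the non-COW shortcut reflects my reading of the current
vm_normal_page() rather than anything quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields the rule uses. */
struct model_vma {
	unsigned long vm_start;	/* first user address of the mapping */
	unsigned long vm_pgoff;	/* first PFN, as set by remap_pfn_range() */
	bool is_cow;		/* stands in for is_cow_mapping(vm_flags) */
};

/*
 * True when a pte in a VM_PFNMAP vma should be treated as a raw PFN,
 * i.e. when the lookup would return NULL instead of a struct page.
 */
static bool model_pfnmap_pte_is_raw(const struct model_vma *vma,
				    unsigned long addr, unsigned long pfn)
{
	unsigned long off;

	assert(addr >= vma->vm_start);
	off = (addr - vma->vm_start) >> MODEL_PAGE_SHIFT;

	/* The remap_pfn_range() rule: the PFN is linear in the address. */
	if (pfn == vma->vm_pgoff + off)
		return true;

	/* Assumed: a non-COW mapping can never contain a COW'ed page. */
	return !vma->is_cow;
}
```

In a COW mapping, a PFN inserted out of order fails the linear check and
is taken for a COW'ed page. In a non-COW mapping the order does not
matter, which is why restricting vm_insert_pfn() to non-COW mappings
keeps everything consistent.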

> Maybe I don't understand what this really is.  I want to be 
> able to COW from pfn only pages.  Wouldn't this restriction cramp my
> style?  Or is it that you can't tolerate pfn's in a VM_PFNMAP vma?

Yes, it's simply a question of implementation (and one which is required
for /dev/mem). So all we have to do really is to create a new type of
mapping for you.

And because /dev/mem is out of the picture, so is the requirement of
mapping pfn_valid() pages without refcounting them. The sketch I gave
in the first post *should* be on the right track.

I can write the patch for you if you like, but if you'd like a shot at
it, that would be great!

> > Insert the pfn_valid() pages with vm_insert_page(), which I think
> > should take care of all those issues for you.
>
> Right.  So that's probably what I've been doing indirectly, with
> .nopage/.fault?

If it hasn't been going oops, yes it's probably what's happening.
And that would be a valid thing for you to do -- if you return the
page via fault(), it will get refcounted for you, no need for
vm_insert_page().

> > When I waffled on about doing a bit of setup work, I'd forgotten
> > about vm_insert_page(), which should already do just about everything
> > you need.
>
> So long as I just use vm_insert_page() and don't screw around with
> anything else, I'm good right?

Actually, I have a patch to unify ->fault and ->nopfn which might
make it quite neat for you. From your fault handler, you could
decide either to do the vm_insert_pfn(), or return the struct
page to the generic code, and not worry about vm_insert_page at all.

> > These pages could live under the !pfn_valid() rules, which, in your
> > new VM_flag scheme, should not require underlying struct pages. So
> > hopefully don't need messing with sparsemem?
>
> But say I want to do more, like migrate them and such, won't I want to
> have some kind of page struct?

But most of the complexity of migrating pages goes away if you are
only dealing with pfns that you control, I suspect. Ie. you can
just unmap all pagetables mapping them, and prevent your fault handler
from giving out new references to the pfn until everything is switched
over (or, if that would be too slow, have the fault handler flip a
switch causing the migration to fail/retry).
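
The "flip a switch" variant could look roughly like the sketch below. It
is a single-threaded userspace model with made-up names (model_pfn_slot
and friends), and it ignores the locking a real implementation would
need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one file page backed by a raw pfn. */
struct model_pfn_slot {
	unsigned long pfn;	/* pfn currently backing the page */
	int mapped;		/* number of user ptes pointing at pfn */
	bool migrating;		/* a migration is in progress */
	bool raced;		/* a fault arrived while migrating */
};

/* Start a migration: unmap every pte that points at the old pfn. */
static void model_migrate_begin(struct model_pfn_slot *slot)
{
	slot->mapped = 0;
	slot->migrating = true;
	slot->raced = false;
}

/* Fault handler: hand out the current pfn, flagging any race. */
static unsigned long model_fault(struct model_pfn_slot *slot)
{
	if (slot->migrating)
		slot->raced = true;	/* tells the migration to retry */
	slot->mapped++;
	return slot->pfn;
}

/* Finish a migration; returns false if the caller has to retry. */
static bool model_migrate_finish(struct model_pfn_slot *slot,
				 unsigned long new_pfn)
{
	assert(slot->migrating);
	slot->migrating = false;
	if (slot->raced)
		return false;	/* old pfn was handed out again */
	slot->pfn = new_pfn;
	return true;
}
```

The fault path never blocks here. A migration that loses the race
leaves the old pfn in place and is expected to be retried.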

For your struct page backed pages, if those guys ever are allowed onto
the LRU or into pagecache, or via get_user_pages(), then yes they should
go through the full migration path.


* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-13 12:08     ` Nick Piggin
@ 2007-11-14  1:29       ` Jared Hulbert
  2007-11-13 17:26         ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Jared Hulbert @ 2007-11-14  1:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: benh, Linux Memory Management List

> Well you aren't allowed to put a pfn into an is_cow_mapping() with
> vm_insert_pfn().  That's my whole point.

Why not?  Maybe I don't understand what this really is.  I want to be
able to COW from pfn only pages.  Wouldn't this restriction cramp my
style?  Or is it that you can't tolerate pfn's in a VM_PFNMAP vma?

> Oh sure, which is why I say you could do exactly that, but with
> *another* VM_flag. Because you'll break subtle things if you change
> VM_PFNMAP.

Okay so I'll code that up and see if I get what you are saying here.

> /dev/mem gives a window into all memory, and you don't actually want
> to take a reference or elevate the mapcount on the actual underlying
> pages.
>
> There are also other cases that we may want to use VM_PFNMAP for,
> which aren't technically going to break if you refcount them, but it
> is suboptimal. Eg. vdso pages -- it might be useful to avoid the
> cacheline bouncing of refcounting these.

Okay I see.

> Yeah sure OK. The only thing that really matters is pfn_valid() ==
> page with a valid struct page, which should be refcounted.

That seems clear to me.

> > > BUG_ON((vma->vm_flags & VM_JAREDMAP) && pfn_valid(pfn));
> >
> > Okay, maybe.  I got to look at this more carefully.
>
> OK, well this would prevent you putting improperly refcounted
> pfn_valid() pages into the pagetables with vm_insert_pfn().

Of course now I get it.

> Insert the pfn_valid() pages with vm_insert_page(), which I think
> should take care of all those issues for you.

Right.  So that's probably what I've been doing indirectly, with .nopage/.fault?

> No sorry, I didn't word that very well: so long as the pages you
> have which _are_ pfn_valid() do have valid and properly refcounted
> struct pages, then inserting them as normal pages into the VM should
> be fine.
>
> By properly refcounted, I mean that page->_count isn't 0, and that
> you are prepared for the page to be freed when the user mappings go
> away *if* you have dropped your own reference. Just common sense
> stuff really.
>
> When I waffled on about doing a bit of setup work, I'd forgotten
> about vm_insert_page(), which should already do just about everything
> you need.

So long as I just use vm_insert_page() and don't screw around with
anything else, I'm good right?

> These pages could live under the !pfn_valid() rules, which, in your
> new VM_flag scheme, should not require underlying struct pages. So
> hopefully don't need messing with sparsemem?

But say I want to do more, like migrate them and such, won't I want to
have some kind of page struct?


* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-13 17:26         ` Nick Piggin
@ 2007-11-14 18:52           ` Jared Hulbert
  2007-11-16 23:42           ` Jared Hulbert
  1 sibling, 0 replies; 12+ messages in thread
From: Jared Hulbert @ 2007-11-14 18:52 UTC (permalink / raw)
  To: Nick Piggin; +Cc: benh, Linux Memory Management List

> On Wednesday 14 November 2007 12:29, Jared Hulbert wrote:
> > > Well you aren't allowed to put a pfn into an is_cow_mapping() with
> > > vm_insert_pfn().  That's my whole point.
> >
> > Why not?
>
> Because it breaks VM_PFNMAP as you saw. *This* is why vm_normal_page()
> does actually work correctly with vm_insert_pfn() and VM_PFNMAP today :)
> Because they all work together to ensure that the "breakage" of vm_pgoff
> by vm_insert_pfn() that you describe isn't actually breakage.

oh okay I get it.

> Actually, I have a patch to unify ->fault and ->nopfn which might
> make it quite neat for you. From your fault handler, you could
> decide either to do the vm_insert_pfn(), or return the struct
> page to the generic code, and not worry about vm_insert_page at all.

Where? mm tree?  I saw that in mm tree a while ago, of course I'm
pretty sure the pfn path was very broken.  Assuming it was fixed since
then should I go ahead and develop off that?

> But most of the complexity of migrating pages goes away if you are
> only dealing with pfns that you control, I suspect. Ie. you can
> just unmap all pagetables mapping them, and prevent your fault handler
> from giving out new references to the pfn until everything is switched
> over (or, if that would be too slow, have the fault handler flip a
> switch causing the migration to fail/retry).
>
> For your struct page backed pages, if those guys ever are allowed onto
> the LRU or into pagecache, or via get_user_pages(), then yes they should
> go through the full migration path.

Okay yeah, I suppose if I control the memory, there isn't too much to
be concerned about.


* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-13 17:26         ` Nick Piggin
  2007-11-14 18:52           ` Jared Hulbert
@ 2007-11-16 23:42           ` Jared Hulbert
  2007-11-19  0:17             ` Nick Piggin
  1 sibling, 1 reply; 12+ messages in thread
From: Jared Hulbert @ 2007-11-16 23:42 UTC (permalink / raw)
  To: Nick Piggin; +Cc: benh, Linux Memory Management List

> And because /dev/mem is out of the picture, so is the requirement of
> mapping pfn_valid() pages without refcounting them. The sketch I gave
> in the first post *should* be on the right track.
>
> I can write the patch for you if you like, but if you'd like a shot at
> it, that would be great!


I haven't tested this yet and this mailer is broken, I'm just hoping
to get a little visual review.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 520238c..bc1e627 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -105,6 +105,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_ALWAYSDUMP	0x04000000	/* Always include in core dumps */

 #define VM_CAN_NONLINEAR 0x08000000	/* Has ->fault & does nonlinear pages */
+#define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */

 #ifndef VM_STACK_DEFAULT_FLAGS		/* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
diff --git a/mm/memory.c b/mm/memory.c
index 4bf0b6d..9b3a8ee 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -361,30 +361,46 @@ static inline int is_cow_mapping(unsigned int flags)
 }

 /*
- * This function gets the "struct page" associated with a pte.
+ * This function gets the "struct page" associated with a pte or returns
+ * NULL if no "struct page" is associated with the pte.
  *
- * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
- * will have each page table entry just pointing to a raw page frame
- * number, and as far as the VM layer is concerned, those do not have
- * pages associated with them - even if the PFN might point to memory
- * that otherwise is perfectly fine and has a "struct page".
+ * VM_PFNMAP mappings do not have "struct pages" with exception of COW'ed
+ * pages. A raw PFN mapping will have each page table entry just pointing
+ * to a raw page frame number, and as far as the VM layer is concerned,
+ * those do not have pages associated with them - even if the PFN might
+ * point to memory that otherwise is perfectly fine and has a "struct page".
  *
- * The way we recognize those mappings is through the rules set up
+ * The way we recognize VM_PFNMAP mappings is through the rules set up
  * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
  * and the vm_pgoff will point to the first PFN mapped: thus every
  * page that is a raw mapping will always honor the rule
  *
  *	pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
  *
- * and if that isn't true, the page has been COW'ed (in which case it
- * _does_ have a "struct page" associated with it even if it is in a
- * VM_PFNMAP range).
+ * A call to vm_normal_page() will return NULL for such a page.
+ *
+ * If the page doesn't follow the "remap_pfn_range()" rule in a VM_PFNMAP
+ * then the page has been COW'ed.  A COW'ed page _does_ have a "struct page"
+ * associated with it even if it is in a VM_PFNMAP range.  Calling
+ * vm_normal_page() on such a page will therefore return the "struct page".
+ *
+ * VM_MIXEDMAP mappings can contain pages that either are raw PFN
+ * mappings or normal pages with associated "struct page".  Raw PFN mappings
+ * in a VM_MIXEDMAP do not need to follow the "remap_pfn_range()" rules.
+ * A call to vm_normal_page() with a VM_MIXEDMAP mapping will return the
+ * associated "struct page" or NULL for memory not backed by a "struct page".
  */
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);

-	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
+	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
+		if (vma->vm_flags & VM_MIXEDMAP) {
+			if (!pfn_valid(pfn))
+				return NULL;
+			return pfn_to_page(pfn);
+		}
+
 		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
 		if (pfn == vma->vm_pgoff + off)
 			return NULL;
@@ -1211,8 +1227,9 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 	pte_t *pte, entry;
 	spinlock_t *ptl;

-	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
-	BUG_ON(is_cow_mapping(vma->vm_flags));
+	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
+	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
+	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn));

 	retval = -ENOMEM;
 	pte = get_locked_pte(mm, addr, &ptl);
@@ -2386,8 +2403,9 @@ static noinline int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long pfn;

 	pte_unmap(page_table);
-	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
-	BUG_ON(is_cow_mapping(vma->vm_flags));
+	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
+	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
+	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn));

 	pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK);
 	if (unlikely(pfn == NOPFN_OOM))


* Re: [RFC] Changing VM_PFNMAP assumptions and rules
  2007-11-16 23:42           ` Jared Hulbert
@ 2007-11-19  0:17             ` Nick Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2007-11-19  0:17 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: benh, Linux Memory Management List

On Saturday 17 November 2007 10:42, Jared Hulbert wrote:
> > And because /dev/mem is out of the picture, so is the requirement of
> > mapping pfn_valid() pages without refcounting them. The sketch I gave
> > in the first post *should* be on the right track.
> >
> > I can write the patch for you if you like, but if you'd like a shot at
> > it, that would be great!
>
> I haven't tested this yet and this mailer is broken, I'm just hoping
> to get a little visual review.

No comments, other than, it looks good to me and I wouldn't see any
problems in getting it merged if it is able to solve your problems.

VM_MIXEDMAP is not a bad name, either ;)
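
For anyone following along, the VM_MIXEDMAP lookup described in the
patch comment can be modelled in userspace roughly as below. This is
only a sketch: the RAM window, the model_mem_map array and every
model_* name are made up for illustration, and VM_PFNMAP handling is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RAM_START_PFN	0x100UL	/* made-up first PFN of RAM */
#define MODEL_RAM_PAGES		16UL	/* made-up RAM size in pages */

/* Hypothetical stand-in for struct page. */
struct model_page {
	int unused;
};

/* One struct page per RAM page frame, like a tiny mem_map. */
static struct model_page model_mem_map[MODEL_RAM_PAGES];

/* Only PFNs inside the RAM window have a struct page behind them. */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn >= MODEL_RAM_START_PFN &&
	       pfn < MODEL_RAM_START_PFN + MODEL_RAM_PAGES;
}

static struct model_page *model_pfn_to_page(unsigned long pfn)
{
	assert(model_pfn_valid(pfn));
	return &model_mem_map[pfn - MODEL_RAM_START_PFN];
}

/*
 * Lookup for a pte found in a VM_MIXEDMAP vma: raw PFNs (for example
 * Flash) give NULL, RAM-backed PFNs give their struct page, and the
 * remap_pfn_range() ordering rule plays no part.
 */
static struct model_page *model_mixedmap_normal_page(unsigned long pfn)
{
	if (!model_pfn_valid(pfn))
		return NULL;
	return model_pfn_to_page(pfn);
}
```

A Flash PFN such as 0x9005 lies outside the modelled RAM window and
yields NULL, while PFN 0x103 yields the fourth entry of model_mem_map.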


end of thread, other threads:[~2007-11-19  0:17 UTC | newest]

Thread overview: 12+ messages
2007-11-09 19:15 [RFC] Changing VM_PFNMAP assumptions and rules Jared Hulbert
2007-11-11  0:09 ` Nick Piggin
2007-11-12 22:03   ` Jared Hulbert
2007-11-12 22:29     ` Benjamin Herrenschmidt
2007-11-12 23:53       ` Jared Hulbert
2007-11-13  0:24         ` Benjamin Herrenschmidt
2007-11-13 12:08     ` Nick Piggin
2007-11-14  1:29       ` Jared Hulbert
2007-11-13 17:26         ` Nick Piggin
2007-11-14 18:52           ` Jared Hulbert
2007-11-16 23:42           ` Jared Hulbert
2007-11-19  0:17             ` Nick Piggin
