linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Carsten Otte <cotte@de.ibm.com>
From: Jared Hulbert <jaredeh@gmail.com>
From: Carsten Otte <cotte@de.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: carsteno@de.ibm.com, Jared Hulbert <jaredeh@gmail.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: [rfc][patch 2/4] mm: introduce VM_MIXEDMAP
Date: Wed, 09 Jan 2008 16:14:10 +0100	[thread overview]
Message-ID: <1199891651.28689.23.camel@cotte.boeblingen.de.ibm.com> (raw)
In-Reply-To: <1199891032.28689.9.camel@cotte.boeblingen.de.ibm.com>

mm: introduce VM_MIXEDMAP

Introduce a new type of mapping, VM_MIXEDMAP. This is unlike VM_PFNMAP in
that it can support COW mappings of arbitrary ranges including ranges without
struct page (PFNMAP can only support COW in those cases where the un-COW-ed
translations are mapped linearly in the virtual address).

VM_MIXEDMAP achieves this by refcounting pages with mixedmap_refcount_pte(pte)
being non-zero, and not refcounting !mixedmap_refcount_pte(pte) pages
(which is not an option for VM_PFNMAP, because it needs to avoid refcounting
pfn_valid pages eg. for /dev/mem mappings).

The core VM calls pte_set_norefcount(__pte) for each PTE that is being created
by vm_insert_pfn and do_no_pfn for a vma that has VM_MIXEDMAP set.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
--- 
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -106,6 +106,7 @@ extern unsigned int kobjsize(const void 
 #define VM_ALWAYSDUMP	0x04000000	/* Always include in core dumps */
 
 #define VM_CAN_NONLINEAR 0x08000000	/* Has ->fault & does nonlinear pages */
+#define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
 
 #ifndef VM_STACK_DEFAULT_FLAGS		/* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -361,35 +361,65 @@ static inline int is_cow_mapping(unsigne
 }
 
 /*
- * This function gets the "struct page" associated with a pte.
+ * This function gets the "struct page" associated with a pte or returns
+ * NULL if no "struct page" is associated with the pte.
  *
- * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
- * will have each page table entry just pointing to a raw page frame
- * number, and as far as the VM layer is concerned, those do not have
- * pages associated with them - even if the PFN might point to memory
+ * A raw VM_PFNMAP mapping (ie. one that is not COWed) may not have any "struct
+ * page" backing, and even if they do, they are not refcounted. COWed pages of
+ * a VM_PFNMAP do always have a struct page, and they are normally refcounted
+ * (they are _normal_ pages).
+ *
+ * So a raw PFNMAP mapping will have each page table entry just pointing
+ * to a page frame number, and as far as the VM layer is concerned, those do
+ * not have pages associated with them - even if the PFN might point to memory
  * that otherwise is perfectly fine and has a "struct page".
  *
- * The way we recognize those mappings is through the rules set up
- * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
- * and the vm_pgoff will point to the first PFN mapped: thus every
+ * The way we recognize COWed pages within VM_PFNMAP mappings is through the
+ * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
+ * set, and the vm_pgoff will point to the first PFN mapped: thus every
  * page that is a raw mapping will always honor the rule
  *
  *	pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
  *
- * and if that isn't true, the page has been COW'ed (in which case it
- * _does_ have a "struct page" associated with it even if it is in a
- * VM_PFNMAP range).
+ * A call to vm_normal_page() will return NULL for such a page.
+ *
+ * If the page doesn't follow the "remap_pfn_range()" rule in a VM_PFNMAP
+ * then the page has been COW'ed.  A COW'ed page _does_ have a "struct page"
+ * associated with it even if it is in a VM_PFNMAP range.  Calling
+ * vm_normal_page() on such a page will therefore return the "struct page".
+ *
+ *
+ * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
+ * page" backing, however the difference is that _all_ pages with a struct
+ * page (that is, those where mixedmap_refcount_pte is true) are refcounted
+ * and considered normal pages by the VM. The disadvantage is that pages are
+ * refcounted (which can be slower and simply not an option for some PFNMAP
+ * users). The advantage is that we don't have to follow the strict linearity
+ * rule of PFNMAP mappings in order to support COWable mappings.
+ *
+ * A call to vm_normal_page() with a VM_MIXEDMAP mapping will return the
+ * associated "struct page" or NULL for memory not backed by a "struct page".
+ *
+ *
+ * All other mappings should have a valid struct page, which will be
+ * returned by a call to vm_normal_page().
  */
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
 
-	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
-		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
-		if (pfn == vma->vm_pgoff + off)
-			return NULL;
-		if (!is_cow_mapping(vma->vm_flags))
-			return NULL;
+	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
+		if (vma->vm_flags & VM_MIXEDMAP) {
+			if (!mixedmap_refcount_pte(pte))
+				return NULL;
+			goto out;
+		} else {
+			unsigned long off = (addr-vma->vm_start) >> PAGE_SHIFT;
+			if (pfn == vma->vm_pgoff + off)
+				return NULL;
+			if (!is_cow_mapping(vma->vm_flags))
+				return NULL;
+		}
 	}
 
 	/*
@@ -410,6 +440,7 @@ struct page *vm_normal_page(struct vm_ar
 	 * The PAGE_ZERO() pages and various VDSO mappings can
 	 * cause them to exist.
 	 */
+out:
 	return pfn_to_page(pfn);
 }
 
@@ -1211,8 +1242,10 @@ int vm_insert_pfn(struct vm_area_struct 
 	pte_t *pte, entry;
 	spinlock_t *ptl;
 
-	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
-	BUG_ON(is_cow_mapping(vma->vm_flags));
+	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
+	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
+						(VM_PFNMAP|VM_MIXEDMAP));
+	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
 
 	retval = -ENOMEM;
 	pte = get_locked_pte(mm, addr, &ptl);
@@ -1222,8 +1255,11 @@ int vm_insert_pfn(struct vm_area_struct 
 	if (!pte_none(*pte))
 		goto out_unlock;
 
+
 	/* Ok, finally just insert the thing.. */
 	entry = pfn_pte(pfn, vma->vm_page_prot);
+	if (vma->vm_flags & VM_MIXEDMAP)
+		entry = pte_set_norefcount(entry);
 	set_pte_at(mm, addr, pte, entry);
 	update_mmu_cache(vma, addr, entry);
 
@@ -2386,10 +2422,11 @@ static noinline int do_no_pfn(struct mm_
 	unsigned long pfn;
 
 	pte_unmap(page_table);
-	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
-	BUG_ON(is_cow_mapping(vma->vm_flags));
+	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
+	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
 
 	pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK);
+
 	if (unlikely(pfn == NOPFN_OOM))
 		return VM_FAULT_OOM;
 	else if (unlikely(pfn == NOPFN_SIGBUS))
@@ -2404,6 +2441,8 @@ static noinline int do_no_pfn(struct mm_
 		entry = pfn_pte(pfn, vma->vm_page_prot);
 		if (write_access)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		if (vma->vm_flags & VM_MIXEDMAP)
+			entry = pte_set_norefcount(entry);
 		set_pte_at(mm, address, page_table, entry);
 	}
 	pte_unmap_unlock(page_table, ptl);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2008-01-09 15:14 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-14 13:38 [rfc][patch 1/2] mm: introduce VM_MIXEDMAP mappings Nick Piggin
2007-12-14 13:41 ` [rfc][patch 2/2] xip: support non-struct page memory Nick Piggin
2007-12-14 13:46   ` Carsten Otte
2007-12-15  1:07     ` Jared Hulbert
2007-12-15  1:17       ` Nick Piggin
2007-12-15  6:47         ` Jared Hulbert
2007-12-19 14:04   ` Carsten Otte
2007-12-20  9:23     ` Jared Hulbert
2007-12-21  0:40       ` Nick Piggin
2007-12-20 13:53   ` Carsten Otte
2007-12-20 14:33     ` Carsten Otte
2007-12-20 14:50       ` Carsten Otte
2007-12-20 17:24         ` Jared Hulbert
2007-12-21  0:12           ` Jared Hulbert
2007-12-21  0:56             ` Nick Piggin
2007-12-21  9:56             ` Carsten Otte
2007-12-21  9:49           ` Carsten Otte
2007-12-21  0:50         ` Nick Piggin
2007-12-21 10:02           ` Carsten Otte
2007-12-21 10:14             ` Nick Piggin
2007-12-21 10:17               ` Carsten Otte
2007-12-21 10:23                 ` Nick Piggin
2007-12-21 10:31                   ` Carsten Otte
2007-12-21  0:45       ` Nick Piggin
2007-12-21 10:05         ` Carsten Otte
2007-12-21 10:20           ` Nick Piggin
2007-12-21 10:35             ` Carsten Otte
2007-12-21 10:47               ` Nick Piggin
2007-12-21 19:29                 ` Martin Schwidefsky
2008-01-07  4:43                   ` [rfc][patch] mm: use a pte bit to flag normal pages Nick Piggin
2008-01-07 10:30                     ` Russell King
2008-01-07 11:14                       ` Nick Piggin
2008-01-07 18:49                       ` Jared Hulbert
2008-01-07 19:45                         ` Russell King
2008-01-07 22:52                           ` Jared Hulbert
2008-01-08  2:37                           ` Andi Kleen
2008-01-08  2:49                             ` Nick Piggin
2008-01-08  3:31                               ` Andi Kleen
2008-01-08  3:52                                 ` Nick Piggin
2008-01-08 10:11                           ` Catalin Marinas
2008-01-08 10:52                             ` Russell King
2008-01-08 13:54                               ` Catalin Marinas
2008-01-08 14:08                                 ` Russell King
2008-01-10 13:33                     ` Carsten Otte
2008-01-10 23:18                       ` Nick Piggin
2008-01-08  9:35                 ` [rfc][patch 0/4] VM_MIXEDMAP patchset with s390 backend Carsten Otte
2008-01-08 10:08                   ` Nick Piggin
2008-01-08 11:34                     ` Carsten Otte
2008-01-08 11:55                       ` Nick Piggin
2008-01-08 12:03                         ` Carsten Otte
2008-01-08 13:56                       ` Jörn Engel
2008-01-08 14:51                         ` Carsten Otte
2008-01-08 18:09                           ` Jared Hulbert
2008-01-08 22:12                             ` Nick Piggin
2008-01-09 15:14                   ` [rfc][patch 0/4] VM_MIXEDMAP patchset with s390 backend v2 Carsten Otte
     [not found]                   ` <1199891032.28689.9.camel@cotte.boeblingen.de.ibm.com>
2008-01-09 15:14                     ` [rfc][patch 1/4] include: add callbacks to toggle reference counting for VM_MIXEDMAP pages Carsten Otte, Carsten Otte
2008-01-09 17:31                       ` Martin Schwidefsky
2008-01-09 18:17                       ` Jared Hulbert
2008-01-10  7:59                         ` Carsten Otte
2008-01-10 20:01                           ` Jared Hulbert
2008-01-11  8:45                             ` Carsten Otte
2008-01-13  2:44                               ` Nick Piggin
2008-01-14 11:36                                 ` Carsten Otte
2008-01-16  4:04                                   ` Nick Piggin
2008-01-15 13:05                                 ` Carsten Otte
2008-01-16  4:22                                   ` Nick Piggin
2008-01-16 14:29                                     ` [rft] updated xip patch rollup Nick Piggin
2008-01-17 10:24                                       ` Carsten Otte
2008-01-10 20:23                           ` [rfc][patch 1/4] include: add callbacks to toggle reference counting for VM_MIXEDMAP pages Jared Hulbert
2008-01-11  8:32                             ` Carsten Otte
2008-01-10  0:20                       ` Nick Piggin
2008-01-10  8:06                         ` Carsten Otte
2008-01-09 15:14                     ` Carsten Otte, Jared Hulbert, Carsten Otte [this message]
2008-01-09 15:14                     ` [rfc][patch 3/4] Convert XIP to support non-struct page backed memory Carsten Otte, Nick Piggin
2008-01-09 15:14                     ` [rfc][patch 4/4] s390: remove struct page entries for DCSS memory segments Carsten Otte, Carsten Otte
     [not found]                 ` <1199784196.25114.11.camel@cotte.boeblingen.de.ibm.com>
2008-01-08  9:35                   ` [rfc][patch 1/4] mm: introduce VM_MIXEDMAP Carsten Otte, Jared Hulbert, Carsten Otte
2008-01-08  9:35                   ` [rfc][patch 2/4] xip: support non-struct page memory Carsten Otte, Nick Piggin, Carsten Otte
2008-01-08  9:36                   ` [rfc][patch 3/4] s390: remove sturct page entries for z/VM DCSS memory segments Carsten Otte
2008-01-08  9:36                   ` [rfc][patch 4/4] s390: mixedmap_refcount_pfn implementation using list walk Carsten Otte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1199891651.28689.23.camel@cotte.boeblingen.de.ibm.com \
    --to=cotte@de.ibm.com \
    --cc=carsteno@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jaredeh@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=schwidefsky@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox