From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, nacc@us.ibm.com, ak@suse.de,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
clameter@sgi.com
Subject: [PATCH/RFC 6/11] Shared Policy: Factor alloc_page_pol routine
Date: Mon, 25 Jun 2007 15:53:05 -0400 [thread overview]
Message-ID: <20070625195305.21210.37682.sendpatchset@localhost> (raw)
In-Reply-To: <20070625195224.21210.89898.sendpatchset@localhost>
Shared Mapped File Policy 6/11 Factor alloc_page_pol routine
Against 2.6.22-rc4-mm2
Implement alloc_page_pol() to allocate a page given a policy and
an offset [for interleaving]; no vma nor addr is needed. A
subsequent patch will use this function to allocate page cache
pages based on the policy at a given page offset.
Revise alloc_page_vma() to just call alloc_page_pol() after looking
up the vma policy, to eliminate duplicate code. This change rippled
into the interleaving functions. I was able to eliminate
interleave_nid() by computing the offset at the call sites and
calling [modified] offset_il_node() directly.
Removed the vma arg from offset_il_node(), as it wasn't
used and isn't available when called from alloc_page_pol().
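The static interleaving that offset_il_node() performs can be sketched
in user space (a simplified model, not the kernel code: the allowed
nodes are shown as a plain bitmask instead of a nodemask_t, and the
helper name is hypothetical):

```c
#include <assert.h>

/*
 * Simplified model of MPOL_INTERLEAVE static interleaving: the page
 * offset selects the (off % nnodes)'th set bit of the allowed-node
 * mask, so a given offset always maps to the same node regardless of
 * which task faults the page in.
 */
static unsigned offset_il_node_sketch(unsigned long nodemask,
				      unsigned long off)
{
	unsigned nnodes = __builtin_popcountl(nodemask);
	unsigned target = (unsigned)(off % nnodes);
	unsigned nid, c = 0;

	/* walk the set bits until we reach the target'th allowed node */
	for (nid = 0; ; nid++) {
		if (nodemask & (1UL << nid)) {
			if (c++ == target)
				break;
		}
	}
	return nid;
}
```

With nodes {0, 2, 3} allowed (mask 0xD), offsets 0, 1, 2 map to nodes
0, 2, 3 respectively, and offset 3 wraps back to node 0.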
Note re: alloc_page_vma() -- it can be called with vma == NULL via
read_swap_cache_async() from swapin_readahead(). A page offset
can't be computed in that case, so pages read in by swap
readahead don't/can't follow vma policy. This matches current
behavior.
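The revised vma_addr_to_pgoff() arithmetic in the mm.h hunk below scales
vm_pgoff [kept in base-page units] down into hugepage units when shift ==
HPAGE_SHIFT. A user-space sketch of that arithmetic (shift values are
assumed, 4 KiB base / 2 MiB huge pages; function name is hypothetical):

```c
#include <assert.h>

#define PAGE_SHIFT_SK	12	/* assumed: 4 KiB base pages */
#define HPAGE_SHIFT_SK	21	/* assumed: 2 MiB huge pages */

/*
 * Model of the revised vma_addr_to_pgoff(): the in-vma offset is
 * shifted into "shift-sized page" units, and vm_pgoff (base-page
 * units) is scaled by (shift - PAGE_SHIFT) to match.  For small
 * pages the extra shift is zero, so behavior is unchanged.
 */
static unsigned long vma_addr_to_pgoff_sketch(unsigned long vm_start,
					      unsigned long vm_pgoff,
					      unsigned long addr, int shift)
{
	return ((addr - vm_start) >> shift) +
	       (vm_pgoff >> (shift - PAGE_SHIFT_SK));
}
```

For base pages (shift == 12) this is the old formula; for hugepages,
a vm_pgoff of 1024 base pages contributes 2 hugepage units, so the
result is a hugepage offset in the mapping, not a file page offset.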
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/gfp.h | 3 +
include/linux/hugetlb.h | 9 ++++
include/linux/mempolicy.h | 2 +
include/linux/mm.h | 6 ++-
mm/mempolicy.c | 89 ++++++++++++++++++++++++++--------------------
5 files changed, 71 insertions(+), 38 deletions(-)
Index: Linux/include/linux/gfp.h
===================================================================
--- Linux.orig/include/linux/gfp.h 2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/gfp.h 2007-06-25 14:58:57.000000000 -0400
@@ -192,10 +192,13 @@ alloc_pages(gfp_t gfp_mask, unsigned int
}
extern struct page *alloc_page_vma(gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr);
+struct mempolicy;
+extern struct page *alloc_page_pol(gfp_t, struct mempolicy *, pgoff_t);
#else
#define alloc_pages(gfp_mask, order) \
alloc_pages_node(numa_node_id(), gfp_mask, order)
#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
+#define alloc_page_pol(gfp_mask, pol, off) alloc_pages(gfp_mask, 0)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
Index: Linux/include/linux/hugetlb.h
===================================================================
--- Linux.orig/include/linux/hugetlb.h 2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/hugetlb.h 2007-06-25 14:58:57.000000000 -0400
@@ -14,6 +14,14 @@ static inline int is_vm_hugetlb_page(str
return vma->vm_flags & VM_HUGETLB;
}
+static inline int vma_page_shift(struct vm_area_struct *vma)
+{
+ if (unlikely(is_vm_hugetlb_page(vma)))
+ return HPAGE_SHIFT;
+ else
+ return PAGE_SHIFT;
+}
+
int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
@@ -127,6 +135,7 @@ static inline unsigned long hugetlb_tota
#define HPAGE_MASK PAGE_MASK /* Keep the compiler happy */
#define HPAGE_SIZE PAGE_SIZE
#endif
+#define vma_page_shift(VMA) PAGE_SHIFT
#endif /* !CONFIG_HUGETLB_PAGE */
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h 2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/mempolicy.h 2007-06-25 14:58:57.000000000 -0400
@@ -124,6 +124,8 @@ extern int mpol_parse_options(char *valu
nodemask_t *policy_nodes);
extern struct mempolicy default_policy;
+extern struct mempolicy *get_file_policy(struct task_struct *,
+ struct address_space *, pgoff_t);
extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
unsigned long addr, gfp_t gfp_flags);
extern unsigned slab_node(struct mempolicy *policy);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h 2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/mm.h 2007-06-25 14:58:57.000000000 -0400
@@ -1058,11 +1058,15 @@ extern void setup_per_cpu_pageset(void);
/*
* Address to offset for shared mapping policy lookup.
+ * When used for interleaving hugepagefs pages [when shift
+ * == HPAGE_SHIFT], actually returns hugepage offset in
+ * mapping; NOT file page offset.
*/
static inline pgoff_t vma_addr_to_pgoff(struct vm_area_struct *vma,
unsigned long addr, int shift)
{
- return ((addr - vma->vm_start) >> shift) + vma->vm_pgoff;
+ return ((addr - vma->vm_start) >> shift) +
+ (vma->vm_pgoff >> (shift - PAGE_SHIFT));
}
static inline pgoff_t vma_pgoff_to_addr(struct vm_area_struct *vma,
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-06-25 14:58:25.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-06-25 14:58:57.000000000 -0400
@@ -21,6 +21,7 @@
*
* bind Only allocate memory on a specific set of nodes,
* no fallback.
+//TODO: following still applicable?
* FIXME: memory is allocated starting with the first node
* to the last. It would be better if bind would truly restrict
* the allocation to memory nodes instead
@@ -35,6 +36,7 @@
* use the process policy. This is what Linux always did
* in a NUMA aware kernel and still does by, ahem, default.
*
+//TODO: following needs paragraph rewording. haven't figured out what to say.
* The process policy is applied for most non interrupt memory allocations
* in that process' context. Interrupts ignore the policies and always
* try to allocate on the local CPU. The VMA policy is only applied for memory
@@ -50,15 +52,18 @@
* Same with GFP_DMA allocations.
*
* For shmfs/tmpfs/hugetlbfs shared memory the policy is shared between
- * all users and remembered even when nobody has memory mapped.
+ * all users and remembered even when nobody has memory mapped. Shared
+ * policies handle sub-ranges of the object using a red/black tree.
+ *
+ * For mmap()ed files, the policy is shared between all 'SHARED mappers
+ * and is remembered as long as the inode exists. Private mappings
+ * still use vma policy for COWed pages, but use the shared policy
+ * [default, if none] for initial and read-only faults.
*/
/* Notebook:
- fix mmap readahead to honour policy and enable policy for any page cache
- object
statistics for bigpages
- global policy for page cache? currently it uses process policy. Requires
- first item above.
+ global policy for page cache?
handle mremap for shared memory (currently ignored for the policy)
grows down?
make bind policy root only? It can trigger oom much faster and the
@@ -1135,6 +1140,22 @@ static struct mempolicy * get_vma_policy
return pol;
}
+/*
+ * Return effective policy for file [address_space] at pgoff
+ */
+struct mempolicy *get_file_policy(struct task_struct *task,
+ struct address_space *x, pgoff_t pgoff)
+{
+ struct shared_policy *sp = x->spolicy;
+ struct mempolicy *pol = task->mempolicy;
+
+ if (sp)
+ pol = mpol_shared_policy_lookup(sp, pgoff);
+ if (!pol)
+ pol = &default_policy;
+ return pol;
+}
+
/* Return a zonelist representing a mempolicy */
static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy)
{
@@ -1207,9 +1228,8 @@ unsigned slab_node(struct mempolicy *pol
}
}
-/* Do static interleaving for a VMA with known offset. */
-static unsigned offset_il_node(struct mempolicy *pol,
- struct vm_area_struct *vma, unsigned long off)
+/* Do static interleaving for a policy with known offset. */
+static unsigned offset_il_node(struct mempolicy *pol, pgoff_t off)
{
unsigned nnodes = nodes_weight(pol->v.nodes);
unsigned target = (unsigned)off % nnodes;
@@ -1224,28 +1244,6 @@ static unsigned offset_il_node(struct me
return nid;
}
-/* Determine a node number for interleave */
-static inline unsigned interleave_nid(struct mempolicy *pol,
- struct vm_area_struct *vma, unsigned long addr, int shift)
-{
- if (vma) {
- unsigned long off;
-
- /*
- * for small pages, there is no difference between
- * shift and PAGE_SHIFT, so the bit-shift is safe.
- * for huge pages, since vm_pgoff is in units of small
- * pages, we need to shift off the always 0 bits to get
- * a useful offset.
- */
- BUG_ON(shift < PAGE_SHIFT);
- off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
- off += (addr - vma->vm_start) >> shift;
- return offset_il_node(pol, vma, off);
- } else
- return interleave_nodes(pol);
-}
-
#ifdef CONFIG_HUGETLBFS
/* Return a zonelist suitable for a huge page allocation. */
struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
@@ -1256,7 +1254,8 @@ struct zonelist *huge_zonelist(struct vm
if (pol->policy == MPOL_INTERLEAVE) {
unsigned nid;
- nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
+ nid = offset_il_node(pol,
+ vma_addr_to_pgoff(vma, addr, HPAGE_SHIFT));
return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
}
return zonelist_policy(GFP_HIGHUSER, pol);
@@ -1278,6 +1277,23 @@ static struct page *alloc_page_interleav
return page;
}
+/*
+ * alloc_page_pol() -- allocate a page based on policy,offset.
+ * Used for mmap()ed file policy allocations where policy is based
+ * on file offset rather than a vma,addr pair
+ */
+struct page *alloc_page_pol(gfp_t gfp, struct mempolicy *pol, pgoff_t pgoff)
+{
+ if (unlikely(pol->policy == MPOL_INTERLEAVE)) {
+ unsigned nid;
+
+ nid = offset_il_node(pol, pgoff);
+ return alloc_page_interleave(gfp, 0, nid);
+ }
+ return __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
+}
+EXPORT_SYMBOL(alloc_page_pol);
+
/**
* alloc_page_vma - Allocate a page for a VMA.
*
@@ -1304,16 +1320,15 @@ struct page *
alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
+ pgoff_t pgoff = 0;
cpuset_update_task_memory_state();
- if (unlikely(pol->policy == MPOL_INTERLEAVE)) {
- unsigned nid;
-
- nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
- return alloc_page_interleave(gfp, 0, nid);
+ if (likely(vma)) {
+ int shift = vma_page_shift(vma);
+ pgoff = vma_addr_to_pgoff(vma, addr, shift);
}
- return __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
+ return alloc_page_pol(gfp, pol, pgoff);
}
/**