The NUMA memory allocation support attempts to allocate pages close to the
CPUs that a process is currently running on.  We have a hard time determining
how effective these strategies are, or how fragmented the allocations get if
a process is bounced around between nodes.

This patch adds a new /proc/ entry: nodepages.  It walks the process's
vm_area_structs for all vaddr ranges, then examines the ptes to determine on
which node each virtual address physically resides.

I'm a little worried about just taking the pte from __follow_page() and
dumping it into pte_pfn().  Is there something I should be testing for before
I feed it along?

I've tested it on both NUMA and non-NUMA systems (see the pfn_to_nid()
changes).

The output below is from a 4-quad, 16-proc NUMAQ.  This is a process that
allocates, then faults in, a 256MB chunk of memory, bound to CPU 4 (node 1):

curly:~# cat /proc/378/nodepages
Node 0 pages: 369
Node 1 pages: 65571
Node 2 pages: 0
Node 3 pages: 0

Here is the same thing, bound to CPU 12 (node 3).  It was probably forked on
node 1, before it was bound:

Node 0 pages: 369
Node 1 pages: 2
Node 2 pages: 0
Node 3 pages: 65569

I would imagine that the pages on node 0 are from libc, which was originally
mapped on node 0; the other processes inherit this.

-- 
Dave Hansen
haveblue@us.ibm.com
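
For reference, here is a minimal sketch of the walk described above.  It is
not the patch itself: count_node_pages() and the per-node counts[] array are
made up for illustration, and the pte_present()/pfn_valid() checks are just
my guess at what ought to be tested before handing the pte to pte_pfn().

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <asm/pgtable.h>

/* Walk every vma in @mm and tally how many present pages live on each node. */
static void count_node_pages(struct mm_struct *mm, unsigned long *counts)
{
	struct vm_area_struct *vma;
	unsigned long addr;

	spin_lock(&mm->page_table_lock);
	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
			pgd_t *pgd = pgd_offset(mm, addr);
			pmd_t *pmd;
			pte_t entry, *pte;

			if (pgd_none(*pgd) || pgd_bad(*pgd))
				continue;
			pmd = pmd_offset(pgd, addr);
			if (pmd_none(*pmd) || pmd_bad(*pmd))
				continue;
			pte = pte_offset_map(pmd, addr);
			entry = *pte;
			pte_unmap(pte);

			/* Only count ptes with real memory behind them. */
			if (!pte_present(entry))
				continue;
			if (!pfn_valid(pte_pfn(entry)))
				continue;
			counts[pfn_to_nid(pte_pfn(entry))]++;
		}
	}
	spin_unlock(&mm->page_table_lock);
}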