linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: rientjes@google.com
Cc: drepper@gmail.com, anatol.pomozov@gmail.com, jkosina@suse.cz,
	akpm@linux-foundation.org, xemul@parallels.com,
	paul.gortmaker@windriver.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH] drivers/base/node.c: export physical address range of given node (Re: NUMA node information for pages)
Date: Thu, 10 Apr 2014 21:35:51 -0400	[thread overview]
Message-ID: <5347e2f3.c214e00a.02fa.ffff8c61SMTPIN_ADDED_BROKEN@mx.google.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1404101500280.11995@chino.kir.corp.google.com>

(CC:ed linux-mm)

On Thu, Apr 10, 2014 at 03:06:49PM -0700, David Rientjes wrote:
> On Wed, 9 Apr 2014, Naoya Horiguchi wrote:
> 
> > >  [ And that block_size_bytes file is absolutely horrid, why are we
> > >    exporting all this information in hex and not telling anybody? ]
> > 
> > Indeed, this kind of implicit hex numbers are commonly used in many place.
> > I guess that it's maybe for historical reasons.
> > 
> 
> I think it was meant to be simple to that you could easily add the length 
> to the start, but it should at least prefix this with `0x'.  That code has 
> been around for years, though, so we probably can't fix it now.
> 
> > > I'd much prefer a single change that works for everybody and userspace can 
> > > rely on exporting accurate information as long as sysfs is mounted, and 
> > > not even need to rely on getpagesize() to convert from pfn to physical 
> > > address: just simple {start,end}_phys_addr files added to 
> > > /sys/devices/system/node/nodeN/ for node N.  Online information can 
> > > already be parsed for these ranges from /sys/devices/system/node/online.
> > 
> > OK, so what if some node has multiple address ranges?  I don't think that
> > start(end)_phys_addr simply returns minimum (maximum) possible address is optimal,
> > because users can't know about void range between valid address ranges
> > (non-exist pfn should not belong to any node).
> > Are printing multilined (or comma-separated) ranges preferable for example
> > like below?
> > 
> >   $ cat /sys/devices/system/node/nodeN/phys_addr
> >   0x0-0x80000000
> >   0x100000000-0x180000000
> > 
> 
> What the...?  nodeN should represent the pgdat for that node and a pgdat 
> can only have a single range.

Yes, that's right, but it seems to me that just node_start_pfn and node_end_pfn
is not enough because there can be holes (without any page struct backed) inside
[node_start_pfn, node_end_pfn), and it's not aware of memory hotplug.

> I'm suggesting that 
> /sys/devices/system/node/nodeN/start_phys_addr returns 
> node_start_pfn(N) << PAGE_SHIFT and 
> /sys/devices/system/node/nodeN/end_phys_addr returns
> node_end_pfn(N) << PAGE_SHIFT and prefix them correctly this time with 
> `0x'.

Yes, we should have '0x' prefix for new interfaces.

I wrote a patch, so could you take a look?

I tested !CONFIG_MEMORY_HOTPLUG_SPARSE case too, and it works fine.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Thu, 10 Apr 2014 20:59:59 -0400
Subject: [PATCH] drivers/base/node.c: export physical address range of given
 node

Userspace applications sometimes want to know which node a give page belongs to,
so this patch adds a new interface /sys/devices/system/node/nodeN/phys_addr
which gives the physical address ranges of node N. A reader gets multilined
physical address ranges backed by valid pfn (neither holes within the zone nor
offlined range are not included in the output.)

Here is the example.

  # cat /sys/devices/system/node/node0/phys_addr
  0x1000-0x40000000
  # cat /sys/devices/system/node/node3/phys_addr
  0xc0000000-0xe0000000
  0x100000000-0x120000000
  # echo offline > /sys/devices/system/memory/memory33/state
  # cat /sys/devices/system/node/node3/phys_addr
  0xc0000000-0xe0000000
  0x100000000-0x108000000
  0x110000000-0x120000000

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 drivers/base/node.c    | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memory.h |  4 ++++
 include/linux/mmzone.h |  1 +
 3 files changed, 65 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index bc9f43bf7e29..21e87909ec4f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -18,6 +18,7 @@
 #include <linux/device.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
+#include <linux/memblock.h>
 
 static struct bus_type node_subsys = {
 	.name = "node",
@@ -185,6 +186,64 @@ static ssize_t node_read_vmstat(struct device *dev,
 }
 static DEVICE_ATTR(vmstat, S_IRUGO, node_read_vmstat, NULL);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+static bool valid_mem_section(unsigned long sec_nr)
+{
+	struct memory_block *memblk;
+	if (!present_section_nr(sec_nr))
+		return false;
+	memblk = find_memory_block(__nr_to_section(sec_nr));
+	if (!memblk || is_memblock_offlined(memblk))
+		return false;
+	return true;
+}
+#endif
+
+static ssize_t node_read_phys_addr(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct pglist_data *pgdat = NODE_DATA(dev->id);
+	unsigned long start, end;
+	int n = 0;
+	unsigned long pfn;
+	unsigned long rstart = 0, rend = 0; /* range's start and end */
+
+	start = pgdat->node_start_pfn;
+	end = pgdat_end_pfn(pgdat);
+	pfn = start;
+	while (pfn < end) {
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+		if (unlikely(!valid_mem_section(pfn_to_section_nr(pfn)))) {
+			if (rstart) {
+				n += sprintf(buf + n, "0x%llx-0x%llx\n",
+					     PFN_PHYS(rstart), PFN_PHYS(rend));
+				rstart = 0;
+				rend = 0;
+			}
+			pfn = SECTION_NEXT_BOUNDARY(pfn);
+			continue;
+		}
+#endif
+		if (pfn_valid(pfn)) {
+			if (!rstart)
+				rstart = pfn;
+			rend = pfn + 1;
+		} else {
+			if (rstart)
+				n += sprintf(buf + n, "0x%llx-0x%llx\n",
+					     PFN_PHYS(rstart), PFN_PHYS(rend));
+			rstart = 0;
+			rend = 0;
+		}
+		pfn++;
+	}
+	if (rstart)
+		n += sprintf(buf + n, "0x%llx-0x%llx\n",
+			     PFN_PHYS(rstart), PFN_PHYS(rend));
+	return n;
+}
+static DEVICE_ATTR(phys_addr, S_IRUGO, node_read_phys_addr, NULL);
+
 static ssize_t node_read_distance(struct device *dev,
 			struct device_attribute *attr, char * buf)
 {
@@ -288,6 +347,7 @@ static int register_node(struct node *node, int num, struct node *parent)
 		device_create_file(&node->dev, &dev_attr_numastat);
 		device_create_file(&node->dev, &dev_attr_distance);
 		device_create_file(&node->dev, &dev_attr_vmstat);
+		device_create_file(&node->dev, &dev_attr_phys_addr);
 
 		scan_unevictable_register_node(node);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index bb7384e3c3d8..b68e14f8818c 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -103,6 +103,10 @@ static inline int memory_isolate_notify(unsigned long val, void *v)
 {
 	return 0;
 }
+static inline struct memory_block *find_memory_block(struct mem_section *ms)
+{
+	return NULL;
+}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9b61b9bf81ac..138a7a6684ab 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1084,6 +1084,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
 
 #define SECTION_ALIGN_UP(pfn)	(((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
 #define SECTION_ALIGN_DOWN(pfn)	((pfn) & PAGE_SECTION_MASK)
+#define SECTION_NEXT_BOUNDARY(pfn)	(SECTION_ALIGN_DOWN(pfn) + PAGES_PER_SECTION)
 
 struct page;
 struct page_cgroup;
-- 
1.9.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

       reply	other threads:[~2014-04-11 12:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <87eh1ix7g0.fsf@x240.local.i-did-not-set--mail-host-address--so-tickle-me>
     [not found] ` <533a1563.ad318c0a.6a93.182bSMTPIN_ADDED_BROKEN@mx.google.com>
     [not found]   ` <CAOPLpQc8R2SfTB+=BsMa09tcQ-iBNJHg+tGnPK-9EDH1M47MJw@mail.gmail.com>
     [not found]     ` <5343806c.100cc30a.0461.ffffc401SMTPIN_ADDED_BROKEN@mx.google.com>
     [not found]       ` <alpine.DEB.2.02.1404091734060.1857@chino.kir.corp.google.com>
     [not found]         ` <5345fe27.82dab40a.0831.0af9SMTPIN_ADDED_BROKEN@mx.google.com>
     [not found]           ` <alpine.DEB.2.02.1404101500280.11995@chino.kir.corp.google.com>
2014-04-11  1:35             ` Naoya Horiguchi [this message]
2014-04-11  3:03             ` Naoya Horiguchi
     [not found]             ` <53474709.e59ec20a.3bd5.3b91SMTPIN_ADDED_BROKEN@mx.google.com>
2014-04-11 11:00               ` David Rientjes
2014-04-11 11:43                 ` Naoya Horiguchi
2014-04-11 16:24                 ` Dave Hansen
2014-04-11 22:13                   ` David Rientjes
2014-04-11 22:53                     ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5347e2f3.c214e00a.02fa.ffff8c61SMTPIN_ADDED_BROKEN@mx.google.com \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=anatol.pomozov@gmail.com \
    --cc=drepper@gmail.com \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=rientjes@google.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox