linux-mm.kvack.org archive mirror
* [PATCH 2/3] add dev_to_node()
@ 2006-10-30 14:15 Christoph Hellwig
  2006-10-30 22:33 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2006-10-30 14:15 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-mm

Davem suggested to get the node-affinity information directly from
struct device instead of having the caller extract it from the
pci_dev.  This patch adds dev_to_node() to the topology API for that.
The implementation is rather ugly as we need to compare the bus
operations which we can't do inline in a header without pulling all
kinds of mess in.

Thus provide an out of line dev_to_node for ppc and let everyone else
use the dummy variant in asm-generic.h for now.
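As a rough user-space sketch of the out-of-line approach (not kernel code: the structures, `pci_bus_type` and `to_pci_dev()` below are simplified stand-ins assumed for illustration), every lookup has to check the bus type, recover the containing `pci_dev`, and then ask the PCI bus for its node:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct bus_type { const char *name; };
struct device   { struct bus_type *bus; };
struct pci_bus  { int node; };
struct pci_dev  { struct device dev; struct pci_bus *bus; };

static struct bus_type pci_bus_type = { "pci" };

/* container_of()-style helper: recover the pci_dev embedding a device. */
#define to_pci_dev(d) \
	((struct pci_dev *)((char *)(d) - offsetof(struct pci_dev, dev)))

static int pcibus_to_node(struct pci_bus *bus)
{
	return bus->node;
}

/* Same shape as the ppc version: only PCI devices know their node. */
static int dev_to_node(struct device *dev)
{
	assert(dev != NULL);
	if (dev->bus == &pci_bus_type)
		return pcibus_to_node(to_pci_dev(dev)->bus);
	return -1;	/* unknown bus: no NUMA affinity */
}
```

Each call pays for a bus-type comparison plus two pointer dereferences, which is the per-call cost objected to in the reply.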


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/include/asm-generic/topology.h
===================================================================
--- linux-2.6.orig/include/asm-generic/topology.h	2006-10-10 14:53:52.000000000 +0200
+++ linux-2.6/include/asm-generic/topology.h	2006-10-30 13:42:22.000000000 +0100
@@ -45,11 +45,14 @@
 #define pcibus_to_node(node)	(-1)
 #endif
 
+#ifndef dev_to_node
+#define dev_to_node(dev)	(-1)
+#endif
+
 #ifndef pcibus_to_cpumask
 #define pcibus_to_cpumask(bus)	(pcibus_to_node(bus) == -1 ? \
 					CPU_MASK_ALL : \
 					node_to_cpumask(pcibus_to_node(bus)) \
 				)
 #endif
-
 #endif /* _ASM_GENERIC_TOPOLOGY_H */
Index: linux-2.6/include/asm-powerpc/topology.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/topology.h	2006-10-10 14:53:52.000000000 +0200
+++ linux-2.6/include/asm-powerpc/topology.h	2006-10-30 14:03:44.000000000 +0100
@@ -5,6 +5,7 @@
 
 struct sys_device;
 struct device_node;
+struct device;
 
 #ifdef CONFIG_NUMA
 
@@ -33,6 +34,7 @@
 
 struct pci_bus;
 extern int pcibus_to_node(struct pci_bus *bus);
+int dev_to_node(struct device *dev);
 
 #define pcibus_to_cpumask(bus)	(pcibus_to_node(bus) == -1 ? \
 					CPU_MASK_ALL : \
Index: linux-2.6/arch/powerpc/kernel/pci_64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/pci_64.c	2006-10-23 17:21:43.000000000 +0200
+++ linux-2.6/arch/powerpc/kernel/pci_64.c	2006-10-30 14:02:40.000000000 +0100
@@ -1424,4 +1424,12 @@
 	return phb->node;
 }
 EXPORT_SYMBOL(pcibus_to_node);
+
+int dev_to_node(struct device *dev)
+{
+	if (dev->bus == &pci_bus_type)
+		return pcibus_to_node(to_pci_dev(dev)->bus);
+	return -1;
+}
+EXPORT_SYMBOL(dev_to_node);
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 2/3] add dev_to_node()
  2006-10-30 14:15 [PATCH 2/3] add dev_to_node() Christoph Hellwig
@ 2006-10-30 22:33 ` David Miller
  2006-11-01  0:10   ` Christoph Lameter
  2006-11-04 22:56   ` Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: David Miller @ 2006-10-30 22:33 UTC (permalink / raw)
  To: hch; +Cc: linux-kernel, netdev, linux-mm

> Davem suggested to get the node-affinity information directly from
> struct device instead of having the caller extract it from the
> pci_dev.  This patch adds dev_to_node() to the topology API for that.
> The implementation is rather ugly as we need to compare the bus
> operations which we can't do inline in a header without pulling all
> kinds of mess in.
> 
> Thus provide an out of line dev_to_node for ppc and let everyone else
> use the dummy variant in asm-generic.h for now.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

It may be a bit much to be calling all the way through up to the PCI
layer just to pluck out a simple integer, don't you think?  The PCI
bus pointer comparison is just a symptom of how silly this is.

Especially since this will be used for every packet allocation a
device makes.

So, please add some sanity to this situation and just put the node
into the generic struct device. :-)

* Re: [PATCH 2/3] add dev_to_node()
  2006-10-30 22:33 ` David Miller
@ 2006-11-01  0:10   ` Christoph Lameter
  2006-11-01  0:53     ` David Miller
  2006-11-04 22:56   ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2006-11-01  0:10 UTC (permalink / raw)
  To: David Miller, Christoph Hellwig; +Cc: linux-kernel, netdev, linux-mm

On Mon, 30 Oct 2006, David Miller wrote:

> So, please add some sanity to this situation and just put the node
> into the generic struct device. :-)

Good. Then we can remove the node from the pci structure and get rid of 
pcibus_to_node?

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-01  0:10   ` Christoph Lameter
@ 2006-11-01  0:53     ` David Miller
  2006-11-01  1:58       ` Christoph Lameter
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2006-11-01  0:53 UTC (permalink / raw)
  To: clameter; +Cc: hch, linux-kernel, netdev, linux-mm

> On Mon, 30 Oct 2006, David Miller wrote:
> 
> > So, please add some sanity to this situation and just put the node
> > into the generic struct device. :-)
> 
> Good. Then we can remove the node from the pci structure and get rid of 
> pcibus_to_node?

Yes, that's possible, because the idea is that the arch specific
bus layer code would initialize the node value.  Therefore, there
would be no need for things like pcibus_to_node() any longer.

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-01  0:53     ` David Miller
@ 2006-11-01  1:58       ` Christoph Lameter
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Lameter @ 2006-11-01  1:58 UTC (permalink / raw)
  To: David Miller; +Cc: hch, linux-kernel, netdev, linux-mm

On Tue, 31 Oct 2006, David Miller wrote:

> Yes, that's possible, because the idea is that the arch specific
> bus layer code would initialize the node value.  Therefore, there
> would be no need for things like pcibus_to_node() any longer.

Then let's rename pcibus_to_node to dev_to_node() throughout the kernel. 
Provide a -1 default. Then other device layers that are not based on pci 
will also be able to exploit NUMA locality.

* Re: [PATCH 2/3] add dev_to_node()
  2006-10-30 22:33 ` David Miller
  2006-11-01  0:10   ` Christoph Lameter
@ 2006-11-04 22:56   ` Christoph Hellwig
  2006-11-04 23:06     ` Dave Jones
  2006-11-08  2:40     ` KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2006-11-04 22:56 UTC (permalink / raw)
  To: David Miller; +Cc: hch, linux-kernel, netdev, linux-mm

On Mon, Oct 30, 2006 at 02:33:57PM -0800, David Miller wrote:
> It may be a bit much to be calling all the way through up to the PCI
> layer just to pluck out a simple integer, don't you think?  The PCI
> bus pointer comparison is just a symptom of how silly this is.
> 
> Especially since this will be used for every packet allocation a
> device makes.
> 
> So, please add some sanity to this situation and just put the node
> into the generic struct device. :-)

I was concerned about growing struct device; on smaller systems it
already eats up a lot of memory.  But we can make the node member
conditional on CONFIG_NUMA, as I did in the patch below.

This directly replaces PATCH 2/3 (the one we're replying to); all
others remain unmodified.


Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h	2006-10-29 16:02:38.000000000 +0100
+++ linux-2.6/include/linux/device.h	2006-11-02 12:47:17.000000000 +0100
@@ -347,6 +347,9 @@
 					   BIOS data),reserved for device core*/
 	struct dev_pm_info	power;
 
+#ifdef CONFIG_NUMA
+	int		numa_node;	/* NUMA node this device is close to */
+#endif
 	u64		*dma_mask;	/* dma mask (if dma'able device) */
 	u64		coherent_dma_mask;/* Like dma_mask, but for
 					     alloc_coherent mappings as
@@ -368,6 +371,12 @@
 	void	(*release)(struct device * dev);
 };
 
+#ifdef CONFIG_NUMA
+#define dev_to_node(dev)	((dev)->numa_node)
+#else
+#define dev_to_node(dev)	(-1)
+#endif
+
 static inline void *
 dev_get_drvdata (struct device *dev)
 {
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c	2006-10-23 17:21:44.000000000 +0200
+++ linux-2.6/drivers/base/core.c	2006-11-02 12:48:12.000000000 +0100
@@ -381,6 +381,7 @@
 	INIT_LIST_HEAD(&dev->node);
 	init_MUTEX(&dev->sem);
 	device_init_wakeup(dev, 0);
+	dev->numa_node = -1;
 }
 
 /**
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c	2006-10-23 17:21:46.000000000 +0200
+++ linux-2.6/drivers/pci/probe.c	2006-11-02 12:47:35.000000000 +0100
@@ -846,6 +846,7 @@
 	dev->dev.release = pci_release_dev;
 	pci_dev_get(dev);
 
+	dev->dev.numa_node = pcibus_to_node(bus);
 	dev->dev.dma_mask = &dev->dma_mask;
 	dev->dev.coherent_dma_mask = 0xffffffffull;
 

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-04 22:56   ` Christoph Hellwig
@ 2006-11-04 23:06     ` Dave Jones
  2006-11-04 23:09       ` Christoph Hellwig
  2006-11-04 23:53       ` Christoph Hellwig
  2006-11-08  2:40     ` KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 17+ messages in thread
From: Dave Jones @ 2006-11-04 23:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Miller, linux-kernel, netdev, linux-mm

On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:

This will break the compile for !NUMA if someone ends up doing a bisect
and lands here as a bisect point.

You introduce this nice wrapper..

 > +#ifdef CONFIG_NUMA
 > +#define dev_to_node(dev)	((dev)->numa_node)
 > +#else
 > +#define dev_to_node(dev)	(-1)
 > +#endif
 > +
 >  static inline void *
 >  dev_get_drvdata (struct device *dev)
 >  {


And then don't use it here..

 > Index: linux-2.6/drivers/base/core.c
 > ===================================================================
 > --- linux-2.6.orig/drivers/base/core.c	2006-10-23 17:21:44.000000000 +0200
 > +++ linux-2.6/drivers/base/core.c	2006-11-02 12:48:12.000000000 +0100
 > @@ -381,6 +381,7 @@
 >  	INIT_LIST_HEAD(&dev->node);
 >  	init_MUTEX(&dev->sem);
 >  	device_init_wakeup(dev, 0);
 > +	dev->numa_node = -1;
 >  }
 >  
 >  /**

and here.

 > Index: linux-2.6/drivers/pci/probe.c
 > ===================================================================
 > --- linux-2.6.orig/drivers/pci/probe.c	2006-10-23 17:21:46.000000000 +0200
 > +++ linux-2.6/drivers/pci/probe.c	2006-11-02 12:47:35.000000000 +0100
 > @@ -846,6 +846,7 @@
 >  	dev->dev.release = pci_release_dev;
 >  	pci_dev_get(dev);
 >  
 > +	dev->dev.numa_node = pcibus_to_node(bus);
 >  	dev->dev.dma_mask = &dev->dma_mask;
 >  	dev->dev.coherent_dma_mask = 0xffffffffull;


	Dave


-- 
http://www.codemonkey.org.uk

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-04 23:06     ` Dave Jones
@ 2006-11-04 23:09       ` Christoph Hellwig
  2006-11-04 23:53       ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2006-11-04 23:09 UTC (permalink / raw)
  To: Dave Jones, Christoph Hellwig, David Miller, linux-kernel,
	netdev, linux-mm

On Sat, Nov 04, 2006 at 06:06:48PM -0500, Dave Jones wrote:
> On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:
> 
> This will break the compile for !NUMA if someone ends up doing a bisect
> and lands here as a bisect point.
> 
> You introduce this nice wrapper..

Yes, I'm stupid :)  Updated version will follow ASAP.

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-04 23:06     ` Dave Jones
  2006-11-04 23:09       ` Christoph Hellwig
@ 2006-11-04 23:53       ` Christoph Hellwig
  2006-11-05  8:22         ` David Miller
  2006-11-07  6:25         ` Ravikiran G Thirumalai
  1 sibling, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2006-11-04 23:53 UTC (permalink / raw)
  To: Dave Jones, Christoph Hellwig, David Miller, linux-kernel,
	netdev, linux-mm

On Sat, Nov 04, 2006 at 06:06:48PM -0500, Dave Jones wrote:
> On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:
> 
> This will break the compile for !NUMA if someone ends up doing a bisect
> and lands here as a bisect point.
> 
> You introduce this nice wrapper..

The dev_to_node wrapper is not enough as we can't assign to (-1) for
the non-NUMA case.  So I added a second macro, set_dev_node for that.

The patch below compiles and works on numa and non-NUMA platforms.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h	2006-11-05 00:16:09.000000000 +0100
+++ linux-2.6/include/linux/device.h	2006-11-05 00:39:22.000000000 +0100
@@ -347,6 +347,9 @@
 					   BIOS data),reserved for device core*/
 	struct dev_pm_info	power;
 
+#ifdef CONFIG_NUMA
+	int		numa_node;	/* NUMA node this device is close to */
+#endif
 	u64		*dma_mask;	/* dma mask (if dma'able device) */
 	u64		coherent_dma_mask;/* Like dma_mask, but for
 					     alloc_coherent mappings as
@@ -368,6 +371,14 @@
 	void	(*release)(struct device * dev);
 };
 
+#ifdef CONFIG_NUMA
+#define dev_to_node(dev)	((dev)->numa_node)
+#define set_dev_node(dev, node)	((dev)->numa_node = node)
+#else
+#define dev_to_node(dev)	(-1)
+#define set_dev_node(dev, node)	do { } while (0)
+#endif
+
 static inline void *
 dev_get_drvdata (struct device *dev)
 {
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c	2006-11-05 00:16:09.000000000 +0100
+++ linux-2.6/drivers/base/core.c	2006-11-05 00:40:01.000000000 +0100
@@ -381,6 +381,7 @@
 	INIT_LIST_HEAD(&dev->node);
 	init_MUTEX(&dev->sem);
 	device_init_wakeup(dev, 0);
+	set_dev_node(dev, -1);
 }
 
 /**
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c	2006-11-05 00:16:09.000000000 +0100
+++ linux-2.6/drivers/pci/probe.c	2006-11-05 00:39:55.000000000 +0100
@@ -846,6 +846,7 @@
 	dev->dev.release = pci_release_dev;
 	pci_dev_get(dev);
 
+	set_dev_node(&dev->dev, pcibus_to_node(bus));
 	dev->dev.dma_mask = &dev->dma_mask;
 	dev->dev.coherent_dma_mask = 0xffffffffull;
 

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-04 23:53       ` Christoph Hellwig
@ 2006-11-05  8:22         ` David Miller
  2006-11-06 23:39           ` Christoph Hellwig
  2006-11-07  6:25         ` Ravikiran G Thirumalai
  1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2006-11-05  8:22 UTC (permalink / raw)
  To: hch; +Cc: davej, linux-kernel, netdev, linux-mm

> On Sat, Nov 04, 2006 at 06:06:48PM -0500, Dave Jones wrote:
> > On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:
> > 
> > This will break the compile for !NUMA if someone ends up doing a bisect
> > and lands here as a bisect point.
> > 
> > You introduce this nice wrapper..
> 
> The dev_to_node wrapper is not enough as we can't assign to (-1) for
> the non-NUMA case.  So I added a second macro, set_dev_node for that.
> 
> The patch below compiles and works on numa and non-NUMA platforms.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-05  8:22         ` David Miller
@ 2006-11-06 23:39           ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2006-11-06 23:39 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, netdev, linux-mm

On Sun, Nov 05, 2006 at 12:22:37AM -0800, David Miller wrote:
> Looks good to me.

So what's the right path to get this in?  There's one patch touching
MM code, one adding something to the driver core and then finally a
networking patch depending on the previous two.  Do you want to take
them all and send them in through the networking tree?  Or should
we put the burden on Andrew?

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-04 23:53       ` Christoph Hellwig
  2006-11-05  8:22         ` David Miller
@ 2006-11-07  6:25         ` Ravikiran G Thirumalai
  2006-11-07 10:15           ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Ravikiran G Thirumalai @ 2006-11-07  6:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Jones, David Miller, linux-kernel, netdev, linux-mm,
	Benzi Galili (Benzi@ScaleMP.com),
	Shai Fultheim (Shai@scalex86.org)

On Sun, Nov 05, 2006 at 12:53:23AM +0100, Christoph Hellwig wrote:
> On Sat, Nov 04, 2006 at 06:06:48PM -0500, Dave Jones wrote:
> > On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:
> > 
> > This will break the compile for !NUMA if someone ends up doing a bisect
> > and lands here as a bisect point.
> > 
> > You introduce this nice wrapper..
> 
> The dev_to_node wrapper is not enough as we can't assign to (-1) for
> the non-NUMA case.  So I added a second macro, set_dev_node for that.
> 
> The patch below compiles and works on numa and non-NUMA platforms.
> 
> 

Hi Christoph,
dev_to_node does not work as expected on x86_64 (and i386).  This is because,
with the current x86_64 code, the node value returned by pcibus_to_node is
initialized only after the struct device has been created.

We need the node value initialized before the call to pci_scan_bus_parented,
as the generic devices are allocated and initialized
off pci_scan_child_bus, which gets called from pci_scan_bus_parented.
The following patch does that using the "pci_sysdata" structure introduced
by the PCI domain patches in -mm.
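The ordering problem can be modelled in a few lines. This is a hypothetical simplification: `demo_scan_bus()` stands in for `pci_scan_bus_parented()`, each device copies the bus node at creation time the way the probe code copies `pcibus_to_node(bus)`, and `sd->node` is assumed to start out as -1.

```c
#include <assert.h>

#define DEMO_MAX_DEVS 4

/* Hypothetical, heavily simplified stand-ins for the x86 PCI structures. */
struct pci_sysdata { int node; };
struct device { int numa_node; };
struct pci_bus {
	struct pci_sysdata *sysdata;
	struct device devs[DEMO_MAX_DEVS];
	int ndevs;
};

static int pcibus_to_node(const struct pci_bus *bus)
{
	return bus->sysdata->node;
}

/* Stand-in for pci_scan_bus_parented(): devices copy the node at creation. */
static void demo_scan_bus(struct pci_bus *bus, struct pci_sysdata *sd, int ndevs)
{
	assert(ndevs >= 0 && ndevs <= DEMO_MAX_DEVS);
	bus->sysdata = sd;
	bus->ndevs = ndevs;
	for (int i = 0; i < ndevs; i++)
		bus->devs[i].numa_node = pcibus_to_node(bus);
}

/* Old order: the node is filled in after the scan, too late for the devices. */
static void demo_scan_then_set(struct pci_bus *bus, struct pci_sysdata *sd, int node)
{
	sd->node = -1;
	demo_scan_bus(bus, sd, 2);
	sd->node = node;
}

/* Patched order: the node is known before any device is created. */
static void demo_set_then_scan(struct pci_bus *bus, struct pci_sysdata *sd, int node)
{
	sd->node = node;
	demo_scan_bus(bus, sd, 2);
}
```

Only the second ordering leaves the devices with the right node, which is what moving the `pxm_to_node()` call above `pci_scan_bus_parented()` achieves.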

Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.19-rc4mm2/arch/i386/pci/acpi.c
===================================================================
--- linux-2.6.19-rc4mm2.orig/arch/i386/pci/acpi.c	2006-11-06 11:03:50.000000000 -0800
+++ linux-2.6.19-rc4mm2/arch/i386/pci/acpi.c	2006-11-06 22:04:14.000000000 -0800
@@ -9,6 +9,7 @@ struct pci_bus * __devinit pci_acpi_scan
 {
 	struct pci_bus *bus;
 	struct pci_sysdata *sd;
+	int pxm;
 
 	/* Allocate per-root-bus (not per bus) arch-specific data.
 	 * TODO: leak; this memory is never freed.
@@ -30,15 +31,21 @@ struct pci_bus * __devinit pci_acpi_scan
 	}
 #endif /* CONFIG_PCI_DOMAINS */
 
+	sd->node = -1;
+
+	pxm = acpi_get_pxm(device->handle);
+#ifdef CONFIG_ACPI_NUMA
+	if (pxm >= 0)
+		sd->node = pxm_to_node(pxm);
+#endif
+
 	bus = pci_scan_bus_parented(NULL, busnum, &pci_root_ops, sd);
 	if (!bus)
 		kfree(sd);
 
 #ifdef CONFIG_ACPI_NUMA
 	if (bus != NULL) {
-		int pxm = acpi_get_pxm(device->handle);
 		if (pxm >= 0) {
-			sd->node = pxm_to_node(pxm);
 			printk("bus %d -> pxm %d -> node %d\n",
 				busnum, pxm, sd->node);
 		}

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-07  6:25         ` Ravikiran G Thirumalai
@ 2006-11-07 10:15           ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2006-11-07 10:15 UTC (permalink / raw)
  To: Ravikiran G Thirumalai
  Cc: Christoph Hellwig, Dave Jones, David Miller, linux-kernel,
	netdev, linux-mm, Benzi Galili (Benzi@ScaleMP.com),
	Shai Fultheim (Shai@scalex86.org)

On Mon, Nov 06, 2006 at 10:25:36PM -0800, Ravikiran G Thirumalai wrote:
> On Sun, Nov 05, 2006 at 12:53:23AM +0100, Christoph Hellwig wrote:
> > On Sat, Nov 04, 2006 at 06:06:48PM -0500, Dave Jones wrote:
> > > On Sat, Nov 04, 2006 at 11:56:29PM +0100, Christoph Hellwig wrote:
> > > 
> > > This will break the compile for !NUMA if someone ends up doing a bisect
> > > and lands here as a bisect point.
> > > 
> > > You introduce this nice wrapper..
> > 
> > The dev_to_node wrapper is not enough as we can't assign to (-1) for
> > the non-NUMA case.  So I added a second macro, set_dev_node for that.
> > 
> > The patch below compiles and works on numa and non-NUMA platforms.
> > 
> > 
> 
> Hi Christoph,
> dev_to_node does not work as expected on x86_64 (and i386).  This is because
> node value returned by pcibus_to_node is initialized after a struct device
> is created with current x86_64 code.
> 
> We need the node value initialized before the call to pci_scan_bus_parented,
> as the generic devices are allocated and initialized
> off pci_scan_child_bus, which gets called from pci_scan_bus_parented
> The following patch does that using "pci_sysdata" introduced by the PCI
> domain patches in -mm.

Ah, nice that some non-Cell folks actually care about this patch.  As far
as my knowledge of the x86_64 PCI code goes, that patch looks fine to me.

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-04 22:56   ` Christoph Hellwig
  2006-11-04 23:06     ` Dave Jones
@ 2006-11-08  2:40     ` KAMEZAWA Hiroyuki
  2006-11-10 18:16       ` Christoph Lameter
  1 sibling, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-11-08  2:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: davem, linux-kernel, netdev, linux-mm

Hi, I have a question.

On Sat, 4 Nov 2006 23:56:29 +0100
Christoph Hellwig <hch@lst.de> wrote:
> Index: linux-2.6/include/linux/device.h
> ===================================================================
> --- linux-2.6.orig/include/linux/device.h	2006-10-29 16:02:38.000000000 +0100
> +++ linux-2.6/include/linux/device.h	2006-11-02 12:47:17.000000000 +0100
> @@ -347,6 +347,9 @@
>  					   BIOS data),reserved for device core*/
>  	struct dev_pm_info	power;
>  
> +#ifdef CONFIG_NUMA
> +	int		numa_node;	/* NUMA node this device is close to */
> +#endif

> +	dev->dev.numa_node = pcibus_to_node(bus);

Is this "node" guaranteed to be online?

If the node is not online, NODE_DATA(node) is NULL or uninitialized,
and alloc_pages_node() et al. will then panic.

I suspect there is no code that creates NODE_DATA() for a device-only node.
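One defensive option, sketched here under assumptions (the online map and both `demo_` helpers are hypothetical, and callers are assumed to treat -1 as "no preference"), is to refuse to record a node that is not online:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_NUMNODES 8

/* Hypothetical online map: only nodes 0 and 2 have NODE_DATA() set up. */
static const bool demo_node_online_map[DEMO_MAX_NUMNODES] = {
	[0] = true,
	[2] = true,
};

static bool demo_node_online(int node)
{
	return node >= 0 && node < DEMO_MAX_NUMNODES && demo_node_online_map[node];
}

/*
 * Sanitize a firmware-provided node before storing it in the device:
 * anything that is not an online node becomes -1 ("no affinity").
 */
static int demo_safe_dev_node(int node)
{
	return demo_node_online(node) ? node : -1;
}
```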

-Kame

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-08  2:40     ` KAMEZAWA Hiroyuki
@ 2006-11-10 18:16       ` Christoph Lameter
  2006-11-10 18:28         ` Lee Schermerhorn
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2006-11-10 18:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Hellwig, davem, linux-kernel, netdev, linux-mm

On Wed, 8 Nov 2006, KAMEZAWA Hiroyuki wrote:

> I suspect there is no code that creates NODE_DATA() for a device-only node.

On IA64 we remap nodes with no memory / cpus to the nearest node with 
memory. I think that is sufficient.


* Re: [PATCH 2/3] add dev_to_node()
  2006-11-10 18:16       ` Christoph Lameter
@ 2006-11-10 18:28         ` Lee Schermerhorn
  2006-11-11  0:08           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Lee Schermerhorn @ 2006-11-10 18:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KAMEZAWA Hiroyuki, Christoph Hellwig, davem, linux-kernel,
	netdev, linux-mm

On Fri, 2006-11-10 at 10:16 -0800, Christoph Lameter wrote:
> On Wed, 8 Nov 2006, KAMEZAWA Hiroyuki wrote:
> 
> > I suspect there is no code that creates NODE_DATA() for a device-only node.
> 
> On IA64 we remap nodes with no memory / cpus to the nearest node with 
> memory. I think that is sufficient.

I don't think this happens anymore.  Back in the ~2.6.5 days, when we
would configure our NUMA platforms with 100% of memory interleaved [in
hardware at cache line granularity], the CPUs would move to the
interleaved "pseudo-node" and the memoryless nodes would be removed.
numactl --hardware would show something like this:

# uname -r
2.6.5-7.244-default
# numactl --hardware
available: 1 nodes (0-0)
node 0 size: 65443 MB
node 0 free: 64506 MB

I started seeing different behavior about the time SPARSEMEM went in.
Now, with a 2.6.16 base kernel [same platform, hardware interleaved
memory], I see:

# uname -r
2.6.16.21-0.8-default
# numactl --hardware
available: 5 nodes (0-4)
node 0 size: 0 MB
node 0 free: 0 MB
node 1 size: 0 MB
node 1 free: 0 MB
node 2 size: 0 MB
node 2 free: 0 MB
node 3 size: 0 MB
node 3 free: 0 MB
node 4 size: 65439 MB
node 4 free: 64492 MB
node distances:
node   0   1   2   3   4
  0:  10  17  17  17  14
  1:  17  10  17  17  14
  2:  17  17  10  17  14
  3:  17  17  17  10  14
  4:  14  14  14  14  10

[Aside:  The firmware/SLIT says that the interleaved memory is closer to
all nodes than other nodes' memory.  This has interesting implications
for the "overflow" zone lists...]
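The remapping Christoph describes can be illustrated with the distances and node sizes from the 2.6.16 output above. This is a hypothetical helper, not the IA64 implementation: it simply picks the closest node, by SLIT distance, that has memory.

```c
#include <assert.h>

#define NR_NODES 5

/* SLIT distances and per-node memory (MB) from the numactl output above. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 17, 17, 17, 14 },
	{ 17, 10, 17, 17, 14 },
	{ 17, 17, 10, 17, 14 },
	{ 17, 17, 17, 10, 14 },
	{ 14, 14, 14, 14, 10 },
};
static const long node_size_mb[NR_NODES] = { 0, 0, 0, 0, 65439 };

/* Return the closest node (by SLIT distance) that has memory, or -1. */
static int nearest_node_with_memory(int node)
{
	int best = -1;

	assert(node >= 0 && node < NR_NODES);
	for (int n = 0; n < NR_NODES; n++) {
		if (node_size_mb[n] == 0)
			continue;	/* memoryless node: not a candidate */
		if (best < 0 || node_distance[node][n] < node_distance[node][best])
			best = n;
	}
	return best;
}
```

With this table every memoryless node (0-3) maps to node 4 at distance 14, consistent with the observation that the interleaved memory is the closest memory for all nodes.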

Lee

* Re: [PATCH 2/3] add dev_to_node()
  2006-11-10 18:28         ` Lee Schermerhorn
@ 2006-11-11  0:08           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-11-11  0:08 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: clameter, hch, davem, linux-kernel, netdev, linux-mm

On Fri, 10 Nov 2006 13:28:25 -0500
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> On Fri, 2006-11-10 at 10:16 -0800, Christoph Lameter wrote:
> > On Wed, 8 Nov 2006, KAMEZAWA Hiroyuki wrote:
> > 
> > > I suspect there is no code that creates NODE_DATA() for a device-only node.
> > 
> > On IA64 we remap nodes with no memory / cpus to the nearest node with 
> > memory. I think that is sufficient.
> 
> I don't think this happens anymore.  

In my understanding, from drivers/acpi/numa.c,
a node is created for each pxm found in the SRAT table at boot time.

The node number for a pxm that was not found in SRAT at boot time is "-1".
Please check how acpi_map_pxm_to_node() is used.

If the PCI node id is based on the pxm, checking the return value of
pxm_to_node() would be a good idea.
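A simplified model of the behaviour described above, assuming a flat table filled while walking SRAT (the `demo_` names and table sizes are hypothetical; the real logic lives in drivers/acpi/numa.c): only PXMs seen at boot get a node id, and any later lookup of an unknown PXM returns -1, so callers need to check for it.

```c
#include <assert.h>

#define DEMO_MAX_PXM	256
#define DEMO_MAX_NODES	8

/* Hypothetical PXM -> node table, populated while walking SRAT at boot. */
static int pxm_node_map[DEMO_MAX_PXM];
static int nr_nodes_found;

static void demo_reset_map(void)
{
	for (int i = 0; i < DEMO_MAX_PXM; i++)
		pxm_node_map[i] = -1;
	nr_nodes_found = 0;
}

/* Called once per PXM seen in SRAT: allocate a node id the first time. */
static int demo_srat_map_pxm(int pxm)
{
	assert(pxm >= 0 && pxm < DEMO_MAX_PXM);
	if (pxm_node_map[pxm] < 0 && nr_nodes_found < DEMO_MAX_NODES)
		pxm_node_map[pxm] = nr_nodes_found++;
	return pxm_node_map[pxm];
}

/* Later lookups never create nodes: a PXM absent from SRAT yields -1. */
static int demo_pxm_to_node(int pxm)
{
	if (pxm < 0 || pxm >= DEMO_MAX_PXM)
		return -1;
	return pxm_node_map[pxm];
}
```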

-Kame

end of thread

Thread overview: 17+ messages
2006-10-30 14:15 [PATCH 2/3] add dev_to_node() Christoph Hellwig
2006-10-30 22:33 ` David Miller
2006-11-01  0:10   ` Christoph Lameter
2006-11-01  0:53     ` David Miller
2006-11-01  1:58       ` Christoph Lameter
2006-11-04 22:56   ` Christoph Hellwig
2006-11-04 23:06     ` Dave Jones
2006-11-04 23:09       ` Christoph Hellwig
2006-11-04 23:53       ` Christoph Hellwig
2006-11-05  8:22         ` David Miller
2006-11-06 23:39           ` Christoph Hellwig
2006-11-07  6:25         ` Ravikiran G Thirumalai
2006-11-07 10:15           ` Christoph Hellwig
2006-11-08  2:40     ` KAMEZAWA Hiroyuki
2006-11-10 18:16       ` Christoph Lameter
2006-11-10 18:28         ` Lee Schermerhorn
2006-11-11  0:08           ` KAMEZAWA Hiroyuki
