linux-mm.kvack.org archive mirror
* kmap_kiobuf()
@ 2000-06-28 15:41 David Woodhouse
  2000-06-28 17:44 ` kmap_kiobuf() Stephen C. Tweedie
  2000-06-29 10:52 ` kmap_kiobuf() Stephen C. Tweedie
  0 siblings, 2 replies; 18+ messages in thread
From: David Woodhouse @ 2000-06-28 15:41 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: sct, riel

I think it would be useful to provide a function which can be used to 
obtain a virtually-contiguous VM mapping of the pages of an iobuf.

Currently, to access the pages of an iobuf, you have to kmap() each page
individually. For various purposes, it would be useful to be able to kmap the
whole iobuf contiguously, so that you can guarantee that:

	page_address(iobuf->maplist[n]) + PAGE_SIZE 
		== page_address(iobuf->maplist[n+1])

    (for n such that n < iobuf->nr_pages, obviously. Don't be so pedantic.)

Rather than taking a kiobuf as an argument, the new function might as well 
be more generic:

unsigned long kremap_pages(struct page **maplist, int nr_pages);
void kunmap_pages(struct page **maplist, int nr_pages);

I had a quick look at the code for kmap() and vmalloc() and decided that 
even if I attempted to do it myself, I'd probably bugger it up and a MM 
hacker would have to fix it anyway. So I'm not going to bother.

T'would be useful if someone else could find the time to do so, though.
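
For the avoidance of doubt, I mean something with roughly these semantics.
This is an untested sketch only: it assumes a vmap()-style primitive that
maps an array of struct page into one contiguous chunk of kernel virtual
space, which is not something the current tree has, and I've let the unmap
side take the mapped address back rather than the page list:

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Hypothetical sketch, not an implementation: map nr_pages pages into a
 * single virtually-contiguous kernel range and return its base address,
 * or 0 on failure.  Only the mapping is created and torn down here; the
 * caller keeps its own references to the pages.
 */
unsigned long kremap_pages(struct page **maplist, int nr_pages)
{
        return (unsigned long) vmap(maplist, nr_pages, VM_MAP, PAGE_KERNEL);
}

void kunmap_pages(unsigned long vaddr)
{
        /* vaddr is whatever kremap_pages() returned */
        vunmap((void *) vaddr);
}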


--
dwmw2



* Re: kmap_kiobuf()
  2000-06-28 15:41 kmap_kiobuf() David Woodhouse
@ 2000-06-28 17:44 ` Stephen C. Tweedie
  2000-06-29 10:52 ` kmap_kiobuf() Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-28 17:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-kernel, linux-mm, sct, riel

Hi,

On Wed, Jun 28, 2000 at 04:41:55PM +0100, David Woodhouse wrote:

> I think it would be useful to provide a function which can be used to 
> obtain a virtually-contiguous VM mapping of the pages of an iobuf.

Why?  This is not as straightforward as it seems, so I'm curious as
to the intended use.

> Currently, to access the pages of an iobuf, you have to kmap() each page
> individually. For various purposes, it would be useful to be able to kmap the
> whole iobuf contiguously, so that you can guarantee that:
> 
> 	page_address(iobuf->maplist[n]) + PAGE_SIZE 
> 		== page_address(iobuf->maplist[n+1])

For any moderately large sized kiobuf, that just means that we risk
running out of kmaps.  You need to treat kmaps as a scarce resource;
on PAE36-configured machines we only have 512 of them right now.

For user-space access, the current kiobuf patches already have mmap()
support for kiobufs so that getting a user-contiguous mapping of
kiobufs is already done.  That doesn't have the kmap problem, though,
since we can map things into user page tables without pinning them in
kernel memory.

Cheers,
 Stephen

* Re: kmap_kiobuf()
  2000-06-28 15:41 kmap_kiobuf() David Woodhouse
  2000-06-28 17:44 ` kmap_kiobuf() Stephen C. Tweedie
@ 2000-06-29 10:52 ` Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-29 10:52 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-kernel, linux-mm, sct, riel

Hi,

On Wed, Jun 28, 2000 at 04:41:55PM +0100, David Woodhouse wrote:

> I think it would be useful to provide a function which can be used to 
> obtain a virtually-contiguous VM mapping of the pages of an iobuf.

Perhaps, but I really would rather resist this.  The whole point of
kiobufs is to let you deal cleanly with things which are unaligned by
doing page lookups.  They are specifically intended to _avoid_ nasty
VM tricks.  

Adding kiobuf support for things like memcpy_to/from_kiobuf is
something I will do, but kiobufs are there to help get people out of
the mindset that all buffers are virtually contiguous in the first
place!  

It would be fairly easy to add kiobuf support for this, but it would
make it harder to get the diffs past Linus (who really wants vmalloc
to go away as much as possible).

I'm open to arguments, but "I coded this way in the past and I'd
rather the kernel did ugly things to let me keep coding this way in
the future" isn't the sort of reasoning that helps me to get new
functionality past Linus's sanity filters.  :-)

Cheers,
 Stephen

* Re: kmap_kiobuf()
  2000-06-29  9:34 ` kmap_kiobuf() Stephen C. Tweedie
@ 2000-06-29 13:45   ` Steve Lord
  0 siblings, 0 replies; 18+ messages in thread
From: Steve Lord @ 2000-06-29 13:45 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: lord, David Woodhouse, linux-mm, riel

> Hi,
> 
> On Wed, Jun 28, 2000 at 03:16:42PM -0500, lord@sgi.com wrote:
> 
> > So we are currently using memory managed as an address space to do the
> > caching of metadata. Everything is built up out of single pages, and when we
> > need something bigger we glue it together into a larger chunk of address
> > space.
> 
> What do you mean by "address space"?  If you mean kernel VA, then
> there's a clear risk of fragmenting the kernel's remappable area and
> ending up unable to find contiguous regions at all if you're not
> careful.

Sorry, I used the same term for two different things there: we cache metadata
in a 'struct address_space', and the second sense is the problem issue. XFS
is doing 'bad things' with address space remapping, taking a handful of
previously existing pages and remapping them to appear as one chunk of
memory. We do not usually have that many such mappings in existence at once.
I think it has pretty much been established that this is not going to be
acceptable in the long term - I always knew that was likely.

Running with a bigger PAGE_CACHE_SIZE will help; some of the code in XFS
may be changeable to work without treating the metadata object as a single
chunk of memory, and we may be able to come up with some other tricks too.

Steve




* Re: kmap_kiobuf()
  2000-06-28 20:16 kmap_kiobuf() lord
  2000-06-28 21:22 ` kmap_kiobuf() Benjamin C.R. LaHaise
@ 2000-06-29  9:34 ` Stephen C. Tweedie
  2000-06-29 13:45   ` kmap_kiobuf() Steve Lord
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-29  9:34 UTC (permalink / raw)
  To: lord; +Cc: Stephen C. Tweedie, David Woodhouse, linux-mm, riel

Hi,

On Wed, Jun 28, 2000 at 03:16:42PM -0500, lord@sgi.com wrote:

> So we are currently using memory managed as an address space to do the
> caching of metadata. Everything is built up out of single pages, and when we
> need something bigger we glue it together into a larger chunk of address
> space.

What do you mean by "address space"?  If you mean kernel VA, then
there's a clear risk of fragmenting the kernel's remappable area and
ending up unable to find contiguous regions at all if you're not
careful.

> p.s. Wouldn't the remapping of pages be a way to let modules etc get larger
> arrays of memory after boot time - doing it a few times is not going to
> kill the system.

That's what vmalloc does.  If you mean actually moving physical pages
to clear space, it's really not so simple --- what happens if you
encounter mlock()ed pages?  The kernel also mixes user pages with
pinned kernel data structures which simply cannot be moved, so it's
not straightforward to support that sort of thing.

Cheers,
 Stephen

* Re: kmap_kiobuf()
  2000-06-28 18:45     ` kmap_kiobuf() David Woodhouse
@ 2000-06-29  9:09       ` Stephen C. Tweedie
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-29  9:09 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Stephen C. Tweedie, lord, linux-kernel, linux-mm, riel

Hi,

On Wed, Jun 28, 2000 at 07:45:57PM +0100, David Woodhouse wrote:
> On Wed, 28 Jun 2000, Stephen C. Tweedie wrote:
> 
> > The pinning of user buffers is part of the reason we have kiobufs.
> > But why do you need to pass it to functions expecting kernel buffers?  
> 
> So far I've encountered two places where I've wanted to do this.
> 
> First, in copying a packet from userspace to a PCI card, where I have to
> have interrupts disabled locally (spin_lock_irq()).

How much data is involved?

If it's not too much, then your current scheme looks like the right
way to do this.  You really should try to keep things per-page and not
rely on the pages being contiguous, since using the kernel's vmalloc
area for contiguifying the pages will be enormously expensive on SMP.
I can certainly add kiobuf support routines for kmapping and
memcpy()ing kiobuf pages to the kiobuf core patches to clean the code
a bit.

> If it's really that difficult to map them contiguously into VM, I suppose
> it can stay the way it is - actually I can probably get away without the
> array of virtual addresses by discarding the return value of kmap() and
> using page_address() from within the spinlock, can't I?

Yes, that should work fine.

> Secondly, for the character device access to MTD devices. Almost all
> access to MTD devices uses kernel-space buffers. I don't really want to
> bloat _every_ MTD driver by making it conditionally user/kernel.
> 
> The only exception is the direct chardevice access, for which I'm
> currently using bounce buffers, but would like to just lock down the pages
> and pass a contiguously-mapped VM address instead.

Why does it need to be *contiguous*???  The right way to code this is
most definitely in terms of kiobufs.  That's basically the only way
we'll support user-space direct access.  If I can give you a
memcpy_to_kiovec() and memcpy_from_kiovec() patch, then that gives you
a canonical way of representing buffers from either user or kernel
space without any assumption at all that the pages are contiguous, and
you get direct IO for free.

The whole point of kiobufs is to abstract away the source of the
pages.  You don't have to know whether the pages were originally
kernel or user space.
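
To make that concrete, the per-kiobuf half of such a helper is little more
than the following. This is a rough sketch, not the actual patch: the kiovec
variants would just loop over the array of kiobufs, and error handling and
the iobuf->length bound are ignored:

#include <linux/iobuf.h>
#include <linux/highmem.h>
#include <linux/string.h>

/* Rough sketch only: copy `len' bytes out of a kiobuf into a kernel
 * buffer, kmapping one page at a time so nothing ever has to be
 * virtually contiguous. */
static void memcpy_from_kiobuf(void *to, struct kiobuf *iobuf, int len)
{
        char *dst = to;
        int i, offset = iobuf->offset;

        for (i = 0; i < iobuf->nr_pages && len > 0; i++) {
                int chunk = PAGE_SIZE - offset;
                char *vaddr;

                if (chunk > len)
                        chunk = len;

                vaddr = kmap(iobuf->maplist[i]);
                memcpy(dst, vaddr + offset, chunk);
                kunmap(iobuf->maplist[i]);

                dst += chunk;
                len -= chunk;
                offset = 0;     /* only the first page can be partial */
        }
}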

> I noticed that kmap ptes seem to be allocated from an array of static size,
> which is different to the method used for vmalloc(). Why is this?

vmalloc() and kmap() are meant for completely different purposes.
vmalloc() is designed for long-term persistent regions (such as
loadable modules).  However, it is slow.  kmap() is very fast, but is
designed for transient mappings of individual pages.  The fixed kmap
pte list is used as a ring buffer.  If we kmap a page twice without
wrapping, we can reuse the old virtual address of the page, so it's
pretty fast to repeatedly kmap and kunmap a single page (there's a
spinlock cost but not much more).  

The big advantage of kmap is that we only have to do an SMP TLB IPI
once every wrap of the kmap ring buffer.  vmalloc incurs that cost
every time.

Cheers,
 Stephen

* Re: kmap_kiobuf()
  2000-06-28 20:16 kmap_kiobuf() lord
@ 2000-06-28 21:22 ` Benjamin C.R. LaHaise
  2000-06-29  9:34 ` kmap_kiobuf() Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Benjamin C.R. LaHaise @ 2000-06-28 21:22 UTC (permalink / raw)
  To: lord; +Cc: Stephen C. Tweedie, David Woodhouse, linux-mm, riel

On Wed, 28 Jun 2000 lord@sgi.com wrote:

...
> Ben mentioned large page support as another way to get around this
> problem. Where is that in the grand scheme of things?
...

For filesystems, I meant increasing PAGE_CACHE_SIZE.  I'm planning on
getting this working for 2.5.early.  Of course, this will put more
pressure on the memory allocator which means that it will have to go along
with zoning changes.

Large page support will be a somewhat different beast: using 4MB pages (on
x86) for mapping/io purposes.  The idea there is that the individual pages
would still be put into the page cache, but they would be marked with a
flag as part of a large page (should be fairly similar to how other unices
implement it).  It's really only relevant to the mm subsystem and the
tlb's on machines that support varying page sizes.

> p.s. Wouldn't the remapping of pages be a way to let modules etc get larger
> arrays of memory after boot time - doing it a few times is not going to
> kill the system.

Hrm?  Allocating physically contiguous memory is a problem that requires
big changes to the allocator and the swapper.  Sure, once we get these
fixed, all sorts of things become possible.  But this doesn't help with
the fact that kernel mappings for objects larger than a single page just
aren't possible right now.  Hmm.  How bad would supporting a small number
of fixed higher-order kmaps be?  (and what's Linus' opinion on such a
change?)

		-ben


* Re: kmap_kiobuf()
  2000-06-28 18:06 ` kmap_kiobuf() Stephen C. Tweedie
  2000-06-28 19:06   ` kmap_kiobuf() Manfred Spraul
@ 2000-06-28 21:05   ` Andi Kleen
  1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2000-06-28 21:05 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: lord, Benjamin C.R. LaHaise, David Woodhouse, linux-kernel, linux-mm

On Wed, Jun 28, 2000 at 07:06:12PM +0100, Stephen C. Tweedie wrote:
> Hi,
> 
> On Wed, Jun 28, 2000 at 11:52:40AM -0500, lord@sgi.com wrote:
> > 
> > I am not a VM guy either, Ben, is the cost of the TLB flush mostly in
> > the synchronization between CPUs, or is it just expensive anyway you
> > look at it?
> 
> The TLB IPI is by far the biggest factor here.

In theory it would be possible to do it lazily, associated with the object's
lock (so that the TLB flush is only propagated when some other CPU acquires
the lock of the object in question).  It would probably be rather error-prone,
though.

-Andi


* Re: kmap_kiobuf()
@ 2000-06-28 20:16 lord
  2000-06-28 21:22 ` kmap_kiobuf() Benjamin C.R. LaHaise
  2000-06-29  9:34 ` kmap_kiobuf() Stephen C. Tweedie
  0 siblings, 2 replies; 18+ messages in thread
From: lord @ 2000-06-28 20:16 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: lord, David Woodhouse, linux-mm, riel

> Hi,
> 
> On Wed, Jun 28, 2000 at 10:54:40AM -0500, lord@sgi.com wrote:
> 
> > I always knew it would go down like a ton of bricks, because of the TLB
> > flushing costs. As soon as you have a multi-cpu box this operation gets
> > expensive, the code could be changed to do lazy tlb flushes on unmapping
> > the pages, but you still have the cost every time you set a mapping up.
> 
> That's exactly what kmap() is for --- it does all the lazy tlb
> flushing for you.  Of course, the kmap area can get fragmented so it's
> not a magic solution if you really need contiguous virtual mappings.
> 
> However, kmap caches the virtual mappings for you automatically, so it
> may well be fast enough for you that you can avoid the whole
> contiguous map thing and just kmap pages as you need them.  Is that
> impossible for your code?
> 
> Cheers,
>  Stephen

Hmm, not sure how much kmap helps - it appears to be for mapping a single
page from highmem. The issue with XFS is that we have variable-sized
chunks of metadata (could be up to 64 Kbytes depending on how the filesystem
was built). 

The code was originally written to treat this like a byte array. Some of the
structures are laid out so that we could rework the code to not treat it
as a byte array, since they are basically arrays of smaller records. Some are
run-length-encoded structures (directory leaf blocks being one) where
reworking the code would be a pain to say the least.

So we are currently using memory managed as an address space to do the
caching of metadata. Everything is built up out of single pages, and when we
need something bigger we glue it together into a larger chunk of address
space. This has the nice property that for cached metadata which does
not have special properties at the moment, we can just leave the pages
in the address space. The rest of the vm system is then free to reuse
them out from under us when there is demand for more memory.

Clearly it also has the nasty property of wanting to mess with the address
space map on a regular basis. [ Note that the mapping together of
pages like this is only done when the caller requests it; we can
still use pagebufs without it. ]

So if we do not use pages then we could use other memory from the slab
allocator, and work really hard to ensure it always works. If we go this
route then we now have chunks of memory which we need to manage as our own cache,
otherwise we end up continually re-reading from disk. We introduce another 
caching mechanism into the kernel - yet another beast to fight over memory.

If we do not allow the remapping of the pages then we get into rewriting
lots of XFS, and almost certainly breaking it in the process.

Ben mentioned large page support as another way to get around this
problem. Where is that in the grand scheme of things?

Steve

p.s. Wouldn't the remapping of pages be a way to let modules etc get larger
arrays of memory after boot time - doing it a few times is not going to
kill the system.


* Re: kmap_kiobuf()
  2000-06-28 18:06 ` kmap_kiobuf() Stephen C. Tweedie
@ 2000-06-28 19:06   ` Manfred Spraul
  2000-06-28 21:05   ` kmap_kiobuf() Andi Kleen
  1 sibling, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2000-06-28 19:06 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: lord, Benjamin C.R. LaHaise, David Woodhouse, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, Jun 28, 2000 at 11:52:40AM -0500, lord@sgi.com wrote:
> >
> > I am not a VM guy either, Ben, is the cost of the TLB flush mostly in
> > the synchronization between CPUs, or is it just expensive anyway you
> > look at it?
> 
> The TLB IPI is by far the biggest factor here.
> 
I tried it on my Dual Pentium II/350, 100 MHz FSB:

* an empty IPI returns after ~ 1630 cpu ticks.
* a tlb flush IPI needs ~ 2130 cpu ticks.

The computer was idle, and obviously I only measured the cost as seen
from the primary cpu; I don't know how long the second cpu needs until
it returns from the interrupt.


--
	Manfred

[-- Attachment #2: patch-newperf --]
[-- Type: text/plain, Size: 3609 bytes --]

--- 2.4/drivers/net/dummy.c	Sat Jun 24 11:07:56 2000
+++ build-2.4/drivers/net/dummy.c	Wed Jun 28 20:55:44 2000
@@ -132,17 +132,171 @@
 	dummy_init(dev);
 	return 0;
 }
-
 static struct net_device dev_dummy = {
 		"",
 		0, 0, 0, 0,
 	 	0x0, 0,
 	 	0, 0, 0, NULL, dummy_probe };
 
+/* kernel benchmark hook (C) Manfred Spraul manfreds@colorfullife.com */
+
+int p_shift = -1;
+MODULE_PARM     (p_shift, "1i");
+MODULE_PARM_DESC(p_shift, "Shift for the profile buffer");
+
+#define STAT_TABLELEN		16384
+static unsigned long totals[STAT_TABLELEN];
+static unsigned int overflows;
+
+static unsigned long long stime;
+static void start_measure(void)
+{
+	 __asm__ __volatile__ (
+		".align 64\n\t"
+	 	"pushal\n\t"
+		"cpuid\n\t"
+		"popal\n\t"
+		"rdtsc\n\t"
+		"movl %%eax,(%0)\n\t"
+		"movl %%edx,4(%0)\n\t"
+		: /* no output */
+		: "c"(&stime)
+		: "eax", "edx", "memory" );
+}
+
+static void end_measure(void)
+{
+static unsigned long long etime;
+	__asm__ __volatile__ (
+		"pushal\n\t"
+		"cpuid\n\t"
+		"popal\n\t"
+		"rdtsc\n\t"
+		"movl %%eax,(%0)\n\t"
+		"movl %%edx,4(%0)\n\t"
+		: /* no output */
+		: "c"(&etime)
+		: "eax", "edx", "memory" );
+	{
+		unsigned long time = (unsigned long)(etime-stime);
+		time >>= p_shift;
+		if(time < STAT_TABLELEN) {
+			totals[time]++;
+		} else {
+			overflows++;
+		}
+	}
+}
+
+static void clean_buf(void)
+{
+	memset(totals,0,sizeof(totals));
+	overflows = 0;
+}
+
+static void print_line(unsigned long* array)
+{
+	int i;
+	for(i=0;i<32;i++) {
+		if((i%32)==16)
+			printk(":");
+		printk("%lx ",array[i]); 
+	}
+}
+
+static void print_buf(char* caption)
+{
+	int i, other = 0;
+	printk("Results - %s - shift %d",
+		caption, p_shift);
+
+	for(i=0;i<STAT_TABLELEN;i+=32) {
+		int j;
+		int local = 0;
+		for(j=0;j<32;j++)
+			local += totals[i+j];
+
+		if(local) {
+			printk("\n%3x: ",i);
+			print_line(&totals[i]);
+			other += local;
+		}
+	}
+	printk("\nOverflows: %d.\n",
+		overflows);
+	printk("Sum: %ld\n",other+overflows);
+}
+
+static void return_immediately(void* dummy)
+{
+	return;
+}
+
+/* gross hack */
+static unsigned long mmu_cr4_features;
+
+static void do_flush_tlb(void* dummy)
+{
+	__flush_tlb_all();
+}
+
 int init_module(void)
 {
 	/* Find a name for this unit */
-	int err=dev_alloc_name(&dev_dummy,"dummy%d");
+	int err;
+
+	if(p_shift != -1) {
+		int i;
+		/* empty test measurement: */
+		printk("******** kernel cpu benchmark activated **********\n");
+		clean_buf();
+		schedule_timeout(100);
+		for(i=0;i<100;i++) {
+			start_measure();
+			return_immediately(NULL);
+			return_immediately(NULL);
+			return_immediately(NULL);
+			return_immediately(NULL);
+			end_measure();
+		}
+		print_buf("zero");
+		clean_buf();
+		schedule_timeout(100);
+		for(i=0;i<100;i++) {
+			start_measure();
+			return_immediately(NULL);
+			return_immediately(NULL);
+			smp_call_function(return_immediately,NULL,1,1);
+			return_immediately(NULL);
+			return_immediately(NULL);
+			end_measure();
+		}
+		print_buf("empty smp call");
+		clean_buf();
+		{
+			int tmp;
+			__asm__ __volatile__(
+					"movl %%cr4,%0\n\t"
+					"movl %0,%1\n\t"
+					: "=&r"(tmp)
+					: "m" (mmu_cr4_features));
+		}
+		schedule_timeout(100);
+		for(i=0;i<100;i++) {
+			start_measure();
+			return_immediately(NULL);
+			return_immediately(NULL);
+			smp_call_function(do_flush_tlb,NULL,1,1);
+			return_immediately(NULL);
+			return_immediately(NULL);
+			end_measure();
+		}
+		print_buf("tlb flush");
+		clean_buf();
+	return -EINVAL;
+	}
+
+	err=dev_alloc_name(&dev_dummy,"dummy%d");
 	if(err<0)
 		return err;
 	if (register_netdev(&dev_dummy) != 0)


* Re: kmap_kiobuf()
  2000-06-28 18:07   ` kmap_kiobuf() Stephen C. Tweedie
@ 2000-06-28 18:45     ` David Woodhouse
  2000-06-29  9:09       ` kmap_kiobuf() Stephen C. Tweedie
  0 siblings, 1 reply; 18+ messages in thread
From: David Woodhouse @ 2000-06-28 18:45 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: lord, linux-kernel, linux-mm, riel

On Wed, 28 Jun 2000, Stephen C. Tweedie wrote:

> The pinning of user buffers is part of the reason we have kiobufs.
> But why do you need to pass it to functions expecting kernel buffers?  

So far I've encountered two places where I've wanted to do this.

First, in copying a packet from userspace to a PCI card, where I have to
have interrupts disabled locally (spin_lock_irq()).

Currently, it does:
	lock_iobuf()
	foreach(page)
		kmap(page) (and store the address in an array)
	spin_lock_irq()
	foreach(page or part thereof)
		memcpy_toio() (using the virtadr returned by kmap)
	spin_unlock_irq()
	foreach(page)
		kunmap(page)
	unlock_iobuf()


The memcpy_toio() has to be split into page-sized chunks, and because we
have to do the kmap from outside the spinlock, I have to keep an array of
virtual addresses.

If it's really that difficult to map them contiguously into VM, I suppose
it can stay the way it is - actually I can probably get away without the
array of virtual addresses by discarding the return value of kmap() and
using page_address() from within the spinlock, can't I?
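
i.e. something like this sketch, where lock_iobuf()/unlock_iobuf() are the
placeholder names from the outline above and the card-side names (struct
mycard, card->lock, card->window) are made up; the partial first/last pages
and error handling are ignored:

static int copy_iobuf_to_card(struct mycard *card, struct kiobuf *iobuf)
{
        int i;

        lock_iobuf(iobuf);                      /* pin the user pages */
        for (i = 0; i < iobuf->nr_pages; i++)
                kmap(iobuf->maplist[i]);        /* may sleep, so done here */

        spin_lock_irq(&card->lock);
        for (i = 0; i < iobuf->nr_pages; i++)
                memcpy_toio(card->window + i * PAGE_SIZE,
                            page_address(iobuf->maplist[i]), PAGE_SIZE);
        spin_unlock_irq(&card->lock);

        for (i = 0; i < iobuf->nr_pages; i++)
                kunmap(iobuf->maplist[i]);
        unlock_iobuf(iobuf);

        return 0;
}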


Secondly, for the character device access to MTD devices. Almost all
access to MTD devices uses kernel-space buffers. I don't really want to
bloat _every_ MTD driver by making it conditionally user/kernel.

The only exception is the direct chardevice access, for which I'm
currently using bounce buffers, but would like to just lock down the pages
and pass a contiguously-mapped VM address instead.

Again, if it's really that much of a problem, I can work round it. It just
seemed like the ideal solution, that's all.

> For any moderately large sized kiobuf, that just means that we risk
> running out of kmaps.  You need to treat kmaps as a scarce resource;
> on PAE36-configured machines we only have 512 of them right now.

I noticed that kmap ptes seem to be allocated from an array of static size,
which is different to the method used for vmalloc(). Why is this?

-- 
dwmw2



* Re: kmap_kiobuf()
  2000-06-28 16:06 ` kmap_kiobuf() David Woodhouse
  2000-06-28 16:24   ` kmap_kiobuf() Benjamin C.R. LaHaise
@ 2000-06-28 18:07   ` Stephen C. Tweedie
  2000-06-28 18:45     ` kmap_kiobuf() David Woodhouse
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-28 18:07 UTC (permalink / raw)
  To: David Woodhouse; +Cc: lord, linux-kernel, linux-mm, sct, riel

Hi,

On Wed, Jun 28, 2000 at 05:06:30PM +0100, David Woodhouse wrote:
> 
> MM is not exactly my field - I just know I want to be able to lock down a 
> user's buffer and treat it as if it were in kernel-space, passing its 
> address to functions which expect kernel buffers.

The pinning of user buffers is part of the reason we have kiobufs.
But why do you need to pass it to functions expecting kernel buffers?  

--Stephen

* Re: kmap_kiobuf()
  2000-06-28 16:52 kmap_kiobuf() lord
@ 2000-06-28 18:06 ` Stephen C. Tweedie
  2000-06-28 19:06   ` kmap_kiobuf() Manfred Spraul
  2000-06-28 21:05   ` kmap_kiobuf() Andi Kleen
  0 siblings, 2 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-28 18:06 UTC (permalink / raw)
  To: lord; +Cc: Benjamin C.R. LaHaise, David Woodhouse, linux-kernel, linux-mm

Hi,

On Wed, Jun 28, 2000 at 11:52:40AM -0500, lord@sgi.com wrote:
> 
> I am not a VM guy either, Ben, is the cost of the TLB flush mostly in
> the synchronization between CPUs, or is it just expensive anyway you
> look at it?

The TLB IPI is by far the biggest factor here.

--Stephen

* Re: kmap_kiobuf()
  2000-06-28 15:54 kmap_kiobuf() lord
  2000-06-28 16:06 ` kmap_kiobuf() David Woodhouse
@ 2000-06-28 17:46 ` Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2000-06-28 17:46 UTC (permalink / raw)
  To: lord; +Cc: David Woodhouse, linux-kernel, linux-mm, sct, riel

Hi,

On Wed, Jun 28, 2000 at 10:54:40AM -0500, lord@sgi.com wrote:

> I always knew it would go down like a ton of bricks, because of the TLB
> flushing costs. As soon as you have a multi-cpu box this operation gets
> expensive, the code could be changed to do lazy tlb flushes on unmapping
> the pages, but you still have the cost every time you set a mapping up.

That's exactly what kmap() is for --- it does all the lazy tlb
flushing for you.  Of course, the kmap area can get fragmented so it's
not a magic solution if you really need contiguous virtual mappings.

However, kmap caches the virtual mappings for you automatically, so it
may well be fast enough for you that you can avoid the whole
contiguous map thing and just kmap pages as you need them.  Is that
impossible for your code?

Cheers,
 Stephen

* Re: kmap_kiobuf()
@ 2000-06-28 16:52 lord
  2000-06-28 18:06 ` kmap_kiobuf() Stephen C. Tweedie
  0 siblings, 1 reply; 18+ messages in thread
From: lord @ 2000-06-28 16:52 UTC (permalink / raw)
  To: Benjamin C.R. LaHaise; +Cc: David Woodhouse, lord, linux-kernel, linux-mm

> On Wed, 28 Jun 2000, David Woodhouse wrote:
> 
> > MM is not exactly my field - I just know I want to be able to lock down a 
> > user's buffer and treat it as if it were in kernel-space, passing its 
> > address to functions which expect kernel buffers.
> 
> Then pass in a kiovec (we're planning on adding a rw_kiovec file op!) and
> use kmap/kmap_atomic on individual pages as required.  As to providing
> larger kmaps, I have yet to be convinced that providing primitives for
> dealing with objects larger than PAGE_SIZE is a Good Idea. 
> 
> 		-ben

I agree with trying to minimize things which require TLB flushes; we just
have 112 thousand lines of existing code (OK, lots of comments in that)
which want to use things bigger than a page, and use them in ways which
are sometimes not going to be amenable to rewriting to use an array of pages,
not to mention that rewriting would destabilize the code base.

I am not a VM guy either, Ben, is the cost of the TLB flush mostly in
the synchronization between CPUs, or is it just expensive anyway you
look at it?


Steve


* Re: kmap_kiobuf()
  2000-06-28 16:06 ` kmap_kiobuf() David Woodhouse
@ 2000-06-28 16:24   ` Benjamin C.R. LaHaise
  2000-06-28 18:07   ` kmap_kiobuf() Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Benjamin C.R. LaHaise @ 2000-06-28 16:24 UTC (permalink / raw)
  To: David Woodhouse; +Cc: lord, linux-kernel, linux-mm

On Wed, 28 Jun 2000, David Woodhouse wrote:

> MM is not exactly my field - I just know I want to be able to lock down a 
> user's buffer and treat it as if it were in kernel-space, passing its 
> address to functions which expect kernel buffers.

Then pass in a kiovec (we're planning on adding a rw_kiovec file op!) and
use kmap/kmap_atomic on individual pages as required.  As to providing
larger kmaps, I have yet to be convinced that providing primitives for
dealing with objects larger than PAGE_SIZE is a Good Idea. 
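
For a single page of a kiobuf that boils down to something like the sketch
below (given a struct kiobuf *iobuf; `n', `offset' and consume() are just
placeholders, and KM_USER0 stands in for whichever km_type slot is
appropriate):

        struct page *page = iobuf->maplist[n];
        char *vaddr;

        vaddr = kmap_atomic(page, KM_USER0);    /* per-cpu slot, never sleeps */
        consume(vaddr + offset);                /* placeholder consumer */
        kunmap_atomic(vaddr, KM_USER0);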

		-ben


* Re: kmap_kiobuf()
  2000-06-28 15:54 kmap_kiobuf() lord
@ 2000-06-28 16:06 ` David Woodhouse
  2000-06-28 16:24   ` kmap_kiobuf() Benjamin C.R. LaHaise
  2000-06-28 18:07   ` kmap_kiobuf() Stephen C. Tweedie
  2000-06-28 17:46 ` kmap_kiobuf() Stephen C. Tweedie
  1 sibling, 2 replies; 18+ messages in thread
From: David Woodhouse @ 2000-06-28 16:06 UTC (permalink / raw)
  To: lord; +Cc: linux-kernel, linux-mm, sct, riel

lord@sgi.com said:
>  I always knew it would go down like a ton of bricks, because of the
> TLB flushing costs. As soon as you have a multi-cpu box this operation
> gets expensive, the code could be changed to do lazy tlb flushes on
> unmapping the pages, but you still have the cost every time you set a
> mapping up. 

Aha - is this why kmap uses a pre-allocated set of PTEs? I got about that 
far before deciding I had no clue what was going on and giving up.

MM is not exactly my field - I just know I want to be able to lock down a 
user's buffer and treat it as if it were in kernel-space, passing its 
address to functions which expect kernel buffers.

--
dwmw2



* Re: kmap_kiobuf()
@ 2000-06-28 15:54 lord
  2000-06-28 16:06 ` kmap_kiobuf() David Woodhouse
  2000-06-28 17:46 ` kmap_kiobuf() Stephen C. Tweedie
  0 siblings, 2 replies; 18+ messages in thread
From: lord @ 2000-06-28 15:54 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-kernel, linux-mm, sct, riel

> I think it would be useful to provide a function which can be used to 
> obtain a virtually-contiguous VM mapping of the pages of an iobuf.
> 
> Currently, to access the pages of an iobuf, you have to kmap() each page
> individually. For various purposes, it would be useful to be able to kmap the
> whole iobuf contiguously, so that you can guarantee that:
> 
> 	page_address(iobuf->maplist[n]) + PAGE_SIZE 
> 		== page_address(iobuf->maplist[n+1])
> 
>     (for n such that n < iobuf->nr_pages, obviously. Don't be so pedantic.)
> 
> Rather than taking a kiobuf as an argument, the new function might as well 
> be more generic:
> 
> unsigned long kremap_pages(struct page **maplist, int nr_pages);
> void kunmap_pages(struct page **maplist, int nr_pages);
> 
> I had a quick look at the code for kmap() and vmalloc() and decided that 
> even if I attempted to do it myself, I'd probably bugger it up and a MM 
> hacker would have to fix it anyway. So I'm not going to bother.
> 
> T'would be useful if someone else could find the time to do so, though.
> 
> 
> --
> dwmw2
> 
> 


The XFS port currently has exactly this beast: there is an extension
to let us pass an existing set of pages into the vmalloc_area_pages
function. It uses the existing pages instead of allocating new ones.
We needed something to let us map groups of pages into a single byte array.
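
In outline it looks something like this (hand-waving sketch only, not the
real XFS code; vmalloc_area_pages_existing() is a made-up name standing in
for our modified vmalloc_area_pages()):

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Glue an existing array of pages into one contiguous chunk of kernel
 * address space.  vmalloc_area_pages_existing() is hypothetical: the
 * stock vmalloc_area_pages() changed to install the caller's pages
 * instead of allocating fresh ones. */
void *remap_page_array(struct page **pages, int nr_pages)
{
        unsigned long size = (unsigned long) nr_pages << PAGE_SHIFT;
        struct vm_struct *area;

        area = get_vm_area(size, VM_ALLOC);
        if (!area)
                return NULL;
        if (vmalloc_area_pages_existing((unsigned long) area->addr,
                                        size, pages)) {
                vfree(area->addr);
                return NULL;
        }
        return area->addr;
}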


I always knew it would go down like a ton of bricks, because of the TLB
flushing costs. As soon as you have a multi-cpu box this operation gets
expensive, the code could be changed to do lazy tlb flushes on unmapping
the pages, but you still have the cost every time you set a mapping up.

Steve


Thread overview: 18+ messages
2000-06-28 15:41 kmap_kiobuf() David Woodhouse
2000-06-28 17:44 ` kmap_kiobuf() Stephen C. Tweedie
2000-06-29 10:52 ` kmap_kiobuf() Stephen C. Tweedie
2000-06-28 15:54 kmap_kiobuf() lord
2000-06-28 16:06 ` kmap_kiobuf() David Woodhouse
2000-06-28 16:24   ` kmap_kiobuf() Benjamin C.R. LaHaise
2000-06-28 18:07   ` kmap_kiobuf() Stephen C. Tweedie
2000-06-28 18:45     ` kmap_kiobuf() David Woodhouse
2000-06-29  9:09       ` kmap_kiobuf() Stephen C. Tweedie
2000-06-28 17:46 ` kmap_kiobuf() Stephen C. Tweedie
2000-06-28 16:52 kmap_kiobuf() lord
2000-06-28 18:06 ` kmap_kiobuf() Stephen C. Tweedie
2000-06-28 19:06   ` kmap_kiobuf() Manfred Spraul
2000-06-28 21:05   ` kmap_kiobuf() Andi Kleen
2000-06-28 20:16 kmap_kiobuf() lord
2000-06-28 21:22 ` kmap_kiobuf() Benjamin C.R. LaHaise
2000-06-29  9:34 ` kmap_kiobuf() Stephen C. Tweedie
2000-06-29 13:45   ` kmap_kiobuf() Steve Lord
