Stack & policy

linux-mm.kvack.org archive mirror

* Stack & policy
@ 2000-04-12 14:45 Stelios Xanthakis
  2000-04-12 15:05 ` James Antill
  0 siblings, 1 reply; 4+ messages in thread
From: Stelios Xanthakis @ 2000-04-12 14:45 UTC
  To: linux-mm

Hi 

Some time ago I posted a message about a kernel feature where the
application can request the vma->vm_start of its stack virtual memory area
in order to unmap part of the unused stack (esp - vma->vm_start).

Such a feature is very useful for an alternative programming technique.

gcc and libc provide a function called alloca(x). It allocates x bytes in the
stack frame of the caller, so the space is released automatically as soon as
the function that invoked alloca returns.

alloca has two major advantages:
 - elegant code, since we never have to worry about freeing the allocated space.
 - very fast allocation, since the stack never fragments: allocating is just a
   stack pointer adjustment.
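A minimal sketch of the automatic release, assuming gcc or clang with glibc's <alloca.h>. The names sum_scratch() and repeat_scratch() are hypothetical. Each call takes 16KB with alloca(), so 10000 calls add up to roughly 160MB, far beyond a default 8MB stack; the loop only works because each block disappears when sum_scratch() returns.

```c
#include <alloca.h>
#include <stddef.h>
#include <string.h>

#define SCRATCH_SIZE (16 * 1024)

/* Fill a 16KB alloca() buffer with `fill` and return the sum of its bytes.
 * The buffer lives in this function's stack frame and is released
 * automatically when the function returns; there is no free(). */
unsigned long sum_scratch (unsigned char fill)
{
	unsigned char *buf = alloca (SCRATCH_SIZE);
	unsigned long sum = 0;
	size_t i;

	memset (buf, fill, SCRATCH_SIZE);
	for (i = 0; i < SCRATCH_SIZE; i++)
		sum += buf[i];
	return sum;
}

/* Call sum_scratch() repeatedly; the total alloca() volume is far larger
 * than the stack, which only works because each block is gone on return. */
unsigned long repeat_scratch (int calls)
{
	unsigned long total = 0;
	int n;

	for (n = 0; n < calls; n++)
		total += sum_scratch (1);
	return total;
}
```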

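Here is an example of a nice use of alloca. Suppose a function get_line()
that reads a line of any size from a client (it is not called getline(),
which would clash with the POSIX getline() declared in <stdio.h>).

#include <alloca.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 200
#define MAX_LINE (1024*1024)

/* Read one line (without its '\n') into a malloc()ed string.
 * Returns NULL at EOF, for lines longer than MAX_LINE, or if malloc fails. */
char *get_line (FILE *f)
{
	struct chunk {
		char txt [CHUNK_SIZE];
		struct chunk *next;
	} first_chunk, *cur = &first_chunk;
	int i, j, k, ch = 0;
	char *c;

	for (j = 0;; j++) {
		if (j >= MAX_LINE / CHUNK_SIZE) return NULL;
		/* fill the current chunk until it is full or we hit '\n' or EOF */
		for (i = 0; i < CHUNK_SIZE && (ch = getc (f)) != EOF && ch != '\n'; i++)
			cur->txt [i] = (char) ch;
		if (i < CHUNK_SIZE) break;
		/* chunk full: take another one from the stack, freed on return */
		cur->next = (struct chunk*) alloca (sizeof (struct chunk));
		cur = cur->next;
	}
	if (j == 0 && i == 0 && ch == EOF) return NULL;	/* nothing left */

	c = (char*) malloc (j * CHUNK_SIZE + i + 1);	/* +1 for the '\0' */
	if (c == NULL) return NULL;
	for (cur = &first_chunk, k = 0; k < j; k++, cur = cur->next)
		memcpy (c + k * CHUNK_SIZE, cur->txt, CHUNK_SIZE);
	memcpy (c + j * CHUNK_SIZE, cur->txt, i);
	c [j * CHUNK_SIZE + i] = 0;

	return c;
}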

It's a beauty!
There is no need to free the chunk list, and we can return NULL at any point.
A version using malloc(), on the other hand, would end up as
"Doug Lea's Nightmare of malloc fragmentation" after a while.

There are many similar pieces of code in which stack allocations prove
efficient and result in great code.
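For contrast, a sketch of the same reader with the chunk list on the heap; read_line_heap() and free_chunks() are hypothetical names, and it reads up to a newline or EOF with the same CHUNK_SIZE/MAX_LINE limits. Every exit path now has to walk the list and free() it, which is the bookkeeping alloca() removes, and each long line costs a burst of short-lived malloc()/free() calls.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 200
#define MAX_LINE (1024 * 1024)

struct chunk {
	char txt[CHUNK_SIZE];
	struct chunk *next;
};

/* Free every heap chunk after `first` (which itself lives on the stack). */
static void free_chunks (struct chunk *first)
{
	struct chunk *p = first->next, *next;

	while (p != NULL) {
		next = p->next;
		free (p);
		p = next;
	}
}

/* Read one line (without its '\n') into a malloc()ed string.
 * Returns NULL at EOF, for overlong lines, or on allocation failure. */
char *read_line_heap (FILE *f)
{
	struct chunk first = { .next = NULL }, *cur = &first;
	int i, j, k, ch = 0;
	char *c;

	for (j = 0;; j++) {
		if (j >= MAX_LINE / CHUNK_SIZE) {
			free_chunks (&first);	/* must clean up here... */
			return NULL;
		}
		for (i = 0; i < CHUNK_SIZE && (ch = getc (f)) != EOF && ch != '\n'; i++)
			cur->txt[i] = (char) ch;
		if (i < CHUNK_SIZE)
			break;
		cur->next = malloc (sizeof (struct chunk));
		if (cur->next == NULL) {
			free_chunks (&first);	/* ...and here... */
			return NULL;
		}
		cur = cur->next;
		cur->next = NULL;
	}
	if (j == 0 && i == 0 && ch == EOF)
		return NULL;	/* nothing left to read, no heap chunks yet */

	c = malloc ((size_t) j * CHUNK_SIZE + i + 1);
	if (c != NULL) {
		for (cur = &first, k = 0; k < j; k++, cur = cur->next)
			memcpy (c + k * CHUNK_SIZE, cur->txt, CHUNK_SIZE);
		memcpy (c + j * CHUNK_SIZE, cur->txt, i);
		c[j * CHUNK_SIZE + i] = '\0';
	}
	free_chunks (&first);	/* ...and on the normal path too */
	return c;
}
```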

However, the above example has one weakness. The kernel has a
stack-only-grows policy (and rightly so): if our function reads a very long
line and grows the stack to 1MB, that huge stack segment stays mapped until
the program terminates, even if every later call needs only 300 bytes of
stack.
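The non-shrinking stack is easy to observe on a current Linux system. This sketch assumes /proc/self/status with its VmStk field (the size of the stack mapping, in kB); vmstk_kb() and touch_stack() are hypothetical helpers. After touch_stack() has grown the stack by 2MB and returned, VmStk still reports at least 2048 kB.

```c
#include <alloca.h>
#include <stdio.h>

/* Return the VmStk value (kB) from /proc/self/status, or -1 if unavailable. */
long vmstk_kb (void)
{
	char line[256];
	long kb = -1;
	FILE *f = fopen ("/proc/self/status", "r");

	if (f == NULL)
		return -1;
	while (fgets (line, sizeof line, f) != NULL)
		if (sscanf (line, "VmStk: %ld kB", &kb) == 1)
			break;
	fclose (f);
	return kb;
}

/* Grow the stack by `bytes` with alloca() and touch one byte per 4KB page,
 * so the kernel really has to extend the stack mapping. */
void touch_stack (size_t bytes)
{
	volatile char *p = alloca (bytes);
	size_t off;

	for (off = bytes; off >= 4096; off -= 4096)
		p[off - 1] = 1;	/* highest page first, nearest the previous stack top */
	p[0] = 1;
}
```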

For the same reason, code such as:

void init ()
{
	int tmp [10000];
	...
}

is a shooting offence.

The C programming language implies that automatic variables (and, in our
case, alloca()ed space) are only used until their scope ends. In OS terms,
however, the space for automatic variables is not returned to the OS when the
function that declared them returns; it stays reserved by the program for
future stack requirements.

I propose a way for an application to release part of its unused stack when
it wants to. We can extend the existing prctl() system call with a new option,
PR_GET_STKBOTTOM, which returns the vma->vm_start of the stack area. The
application can then unmap the unused part of its stack.

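An application could then define a macro like this:
--------------------------------
#include <sys/prctl.h>	/* prctl(); PR_* constants come from <linux/prctl.h> */
#include <sys/mman.h>	/* munmap() */

#ifdef PR_GET_STKBOTTOM

#include <asm/page.h>	/* PAGE_SIZE, PAGE_MASK */

#define MIN_UNUSED_STACK (2*PAGE_SIZE)

#define PAGE_DALIGN(x) ((x) & PAGE_MASK)  /* downwards alignment for esp */

/* Unmap the stack below esp, keeping one extra page under esp for the
 * return address and frame of the munmap() call itself. */
#define STACKFIX do {\
	unsigned long sb, esp, top, len;\
	prctl (PR_GET_STKBOTTOM, (unsigned long)&sb, 0, 0, 0);\
	__asm__ ("mov %%esp,%0" : "=r"(esp));\
	top = PAGE_DALIGN(esp) - PAGE_SIZE;\
	len = (sb < top) ? top - sb : 0;\
	if (len >= MIN_UNUSED_STACK) munmap ((void*)sb, len);\
	} while (0)

#else

#define STACKFIX do { } while (0)

#endif
-----------------------------------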

Calling STACKFIX returns the unused stack pages below the stack pointer to the
operating system. It is worth doing occasionally, at strategic points in the
application.

In reply to my previous message, Kanoj pointed out that:
 1. Only if the app touches the stack pages will they be allocated.

 - Indeed, but declaring an automatic variable implies using it. If some
   automatic variables may go unused, the function should be split into
   several functions so that we use 100% of what we declare.

 2. Programs might have multiple stack segments. Pthreads?

 - The kernel does not have to do anything dangerous; it only has to give
   us vma->vm_start. Most programs can use this information to release
   stack, and the authors of a program with multiple stack segments would
   simply not use the feature. I think pthreads should be OK, BTW.

 3. We can get the same info from /proc/pid/maps.

 - That is too slow to be usable in our loops.
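For comparison, the /proc route of point 3 looks roughly like this, assuming the Linux /proc/self/maps format in which current kernels label the main thread's stack mapping [stack]; stack_bounds() is a hypothetical name. Each call costs an fopen(), a line-by-line scan and an sscanf(), which is the overhead objected to above.

```c
#include <stdio.h>
#include <string.h>

/* Find the main thread's stack mapping in /proc/self/maps.
 * On success store its [start, end) addresses and return 0; else return -1. */
int stack_bounds (unsigned long *start, unsigned long *end)
{
	char line[512];
	int found = -1;
	FILE *f = fopen ("/proc/self/maps", "r");

	if (f == NULL)
		return -1;
	/* each line looks like: "start-end perms offset dev inode  pathname" */
	while (fgets (line, sizeof line, f) != NULL) {
		if (strstr (line, "[stack]") != NULL &&
		    sscanf (line, "%lx-%lx", start, end) == 2) {
			found = 0;
			break;
		}
	}
	fclose (f);
	return found;
}
```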

I have a patch that adds the PR_GET_STKBOTTOM option to prctl(). I'm not
very happy that, to get to the stack vm_area, we have to walk the entire
mm->mmap list, even though the stack vma is always the last one (isn't it?).
Is there a faster way to get the vm_start of the last mmap'd area?

Your comments?

Cheers

Stelios
<axanth@tee.gr>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: Stack & policy
  2000-04-12 14:45 Stack & policy Stelios Xanthakis
@ 2000-04-12 15:05 ` James Antill
  2000-04-13  0:35   ` Jamie Lokier
  0 siblings, 1 reply; 4+ messages in thread
From: James Antill @ 2000-04-12 15:05 UTC
  To: axanth; +Cc: linux-mm

> Hi 
> 
> Some time ago I posted a message about a kernel feature where the
> application can request the vma->vm_start of its stack virtual memory area
> in order to unmap part of the unused stack (esp - vma->vm_start).
> 
> Such a feature is very useful for an alternative programming technique.

 Have you seen Jamie and Chuck discussing madvise() flags?
 Just doing madvise(end_stack, cur_stack - end_stack, MADV_DONTNEED)[1]
after a function that uses alloca() or has a large automatic variable should
be a pretty simple addition to gcc (although you might not want to put it
there).

 Those seem like a much better idea to me, as they can also be used with
pthreads (much as I hate pthreads) and other regions of memory that have
similar usage patterns.
 This would also be much more likely to work on other OSes.

[1] I think I have the API correct but I don't have access to it atm.
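On Linux the prototype is int madvise(void *addr, size_t length, int advice), with addr page-aligned, so the range comes first and the advice last. A minimal sketch of releasing an unused range [lo, hi) that way; release_range() is a hypothetical helper. With MADV_DONTNEED on private anonymous memory, which is what the stack is, the next access sees zero-filled pages.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Release the whole pages inside [lo, hi) with MADV_DONTNEED.
 * lo is rounded up and hi rounded down to page boundaries, so partially
 * covered pages are left alone. Returns 0 on success, -1 on error. */
int release_range (void *lo, void *hi)
{
	uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
	uintptr_t start = ((uintptr_t) lo + page - 1) & ~(page - 1);
	uintptr_t end = (uintptr_t) hi & ~(page - 1);

	if (end <= start)
		return 0;	/* nothing page-sized to release */
	return madvise ((void *) start, end - start, MADV_DONTNEED);
}
```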

-- 
James Antill -- james@and.org
"If we can't keep this sort of thing out of the kernel, we might as well
pack it up and go run Solaris." -- Larry McVoy.

* Re: Stack & policy
  2000-04-12 15:05 ` James Antill
@ 2000-04-13  0:35   ` Jamie Lokier
  2000-04-14 13:10     ` Stelios Xanthakis
  0 siblings, 1 reply; 4+ messages in thread
From: Jamie Lokier @ 2000-04-13  0:35 UTC
  To: James Antill; +Cc: axanth, linux-mm

James Antill wrote:
> > Some time ago I posted a message about a kernel feature where the
> > application can request the vma->vm_start of its stack virtual memory area
> > in order to unmap part of the unused stack (esp - vma->vm_start).
> > 
> > Such a feature is very useful for an alternative programming technique.
> 
>  Have you seen Jamie and Chuck discussing madvise() flags?
>  Just doing madvise(end_stack, cur_stack - end_stack, MADV_DONTNEED)[1]
> after a function that uses alloca() or has a large automatic variable should
> be a pretty simple addition to gcc (although you might not want to put it
> there).
> 
>  Those seem like a much better idea to me, as they can also be used with
> pthreads (much as I hate pthreads) and other regions of memory that have
> similar usage patterns.
>  This would also be much more likely to work on other OSes.

You'd use MADV_FREE, as it allows the app to reuse stack pages
immediately without the overhead of them being unmapped, remapped and
rezeroed -- if it reuses them before the kernel finds another use for
them.  The most efficient place to put this call is probably in a
timer signal handler.

You still need to get the base of the mapped region though.  You can
parse /proc/self/maps for this :-)

-- Jamie

* Re: Stack & policy
  2000-04-13  0:35   ` Jamie Lokier
@ 2000-04-14 13:10     ` Stelios Xanthakis
  0 siblings, 0 replies; 4+ messages in thread
From: Stelios Xanthakis @ 2000-04-14 13:10 UTC
  To: Jamie Lokier; +Cc: James Antill, axanth, linux-mm

On Thu, 13 Apr 2000, Jamie Lokier wrote:

> You'd use MADV_FREE, as it allows the app to reuse stack pages
> immediately without the overhead of them being unmapped, remapped and
> rezeroed -- if it reuses them before the kernel finds another use for
> them.  The most efficient place to put this call is probably in a
> timer signal handler.
> 
> You still need to get the base of the mapped region though.  You can
> parse /proc/self/maps for this :-)

/proc/self/maps might not be the best solution because:
 - it is too slow: we need to fopen the file, read all the lines up to the
   last one, and parse them with strtoul().
 - most importantly, the format of /proc files tends to change :) I think
   /proc/net/dev is an example.

On the other hand, the whole idea of unmapping something maintained by the
kernel is very hackish anyway.

We could instead have a dedicated system call, say prune_stack(), that takes
a pointer representing a stack pointer as its argument. When called,
prune_stack() would walk the memory areas of the process looking for the one
with (VM_GROWSDOWN && vm_start <= sp < vm_end). If such an area is found and
its base address is `too far' below the stack pointer passed by the caller,
madvise() is called with MADV_FREE to release what is presumably unused stack.

That would also work in the case of multiple stack segments.

Passing the desired minimum amount of unused stack to keep would also be a
useful hint to the procedure.

/* Sample prototype */
int prune_stack (void *stack_pointer, unsigned int min_unused);
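The /proc/self/maps plus madvise() alternative can already be approximated in user space. In this sketch, which assumes Linux /proc/self/maps and madvise(), prune_stack_user() is a hypothetical stand-in for the proposed prune_stack(): it finds the mapping that contains the given stack pointer and applies MADV_DONTNEED from the mapping's start up to a few pages below that pointer, if at least min_unused bytes would be released. Unlike the proposed system call it cannot check VM_GROWSDOWN, and the MARGIN_PAGES value is an assumption: the margin keeps the pages used by this function's own frame and by its madvise() call from being released.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MARGIN_PAGES 4	/* pages kept below sp for this function's own calls */

/* Find the /proc/self/maps entry containing addr; 0 on success, -1 if none. */
static int mapping_of (uintptr_t addr, uintptr_t *start, uintptr_t *end)
{
	char line[512];
	unsigned long s, e;
	int rc = -1;
	FILE *f = fopen ("/proc/self/maps", "r");

	if (f == NULL)
		return -1;
	/* each line starts with "start-end" in hex */
	while (fgets (line, sizeof line, f) != NULL) {
		if (sscanf (line, "%lx-%lx", &s, &e) == 2 && s <= addr && addr < e) {
			*start = s;
			*end = e;
			rc = 0;
			break;
		}
	}
	fclose (f);
	return rc;
}

/* Release the unused stack below stack_pointer with MADV_DONTNEED if at
 * least min_unused bytes would be freed. Returns the number of bytes
 * released, 0 if nothing was done, or -1 on error. */
long prune_stack_user (void *stack_pointer, unsigned long min_unused)
{
	uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
	uintptr_t sp = (uintptr_t) stack_pointer;
	uintptr_t here = (uintptr_t) &page;	/* an address in our own frame */
	uintptr_t start, end, base, top;

	if (here < sp)
		sp = here;	/* measure from our own frame if it lies lower */
	if (mapping_of (sp, &start, &end) != 0)
		return -1;
	base = sp & ~(page - 1);
	if (base - start <= MARGIN_PAGES * page)
		return 0;
	top = base - MARGIN_PAGES * page;
	if (top - start < min_unused)
		return 0;
	if (madvise ((void *) start, top - start, MADV_DONTNEED) != 0)
		return -1;
	return (long) (top - start);
}
```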

When apps should use prune_stack()

A good place for prune_stack() is the main loop of an application, provided
two conditions are met:
 1. A function that may block is called right after prune_stack().
 2. The functions called in the main loop have unpredictable stack
requirements (a Rayleigh distribution comes to mind :)

For example:

	while (1) {
		fgets (/*command from client*/);
		process_command ();    /* No blocking up here */
		__asm__ ("mov %%esp,%0" : "=r"(sp));	/* read the stack pointer */
		prune_stack (sp, 8*PAGE_SIZE);
	}

min_unused still matters, because the processing functions may have a fixed
baseline stack requirement on top of the unpredictable part.

Called in the wrong place or with a poor min_unused, prune_stack() might
introduce a slowdown. madvise() with MADV_FREE limits the damage, since the
pages are only reclaimed if the kernel needs them; still, wouldn't it be better
to release the bottom pages first? Should that be requested as a separate
MADV_ policy, or is it the default behaviour?

There seem to be 3 alternatives:
 1. Write a prune_stack() generic stack segment pruning system call.
 2. Provide the base address of the main stack segment through prctl() and
   call madvise().
 3. Parse /proc/self/maps and call madvise() (no kernel changes).

Stelios
<sxanth@ceid.upatras.gr>
