Hangs in 2.5.41-mm1

linux-mm.kvack.org archive mirror

* Hangs in 2.5.41-mm1
@ 2002-10-09 18:36 Paul Larson
  2002-10-09 20:17 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Larson @ 2002-10-09 18:36 UTC
  To: linux-mm

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

I'm able to generate a lot of hangs with 2.5.41-mm1.
This is on an 8-way PIII-700 with 16 GB of RAM (PAE enabled).

I got the first one by running LTP for a while, then the attached test
for a bit, and then, at Bill Irwin's suggestion, increasing the amount
of RAM I could use for huge pages:
echo 768 > /proc/sys/vm/nr_hugepages

Doing that (together with the corresponding echo 1610612736 >
/proc/sys/kernel/shmmax) right after a cold boot caused no problems, though.
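The two numbers above line up. Assuming 2 MiB huge pages (the i386 huge page size with PAE enabled; it is 4 MiB without PAE), 768 huge pages back exactly 1610612736 bytes, so a single `-s 1610612736` segment consumes the whole pool. A small sketch of that arithmetic; the helper names are hypothetical and not part of the test or the kernel:

```c
#include <assert.h>

/* Assumption: 2 MiB huge pages, as on i386 with PAE (4 MiB without PAE). */
#define HPAGE_SIZE_PAE (2UL * 1024UL * 1024UL)

/* Bytes backed by a pool of nr_hugepages huge pages. */
unsigned long hugepage_pool_bytes(unsigned long nr_hugepages,
				  unsigned long hpage_bytes)
{
	return nr_hugepages * hpage_bytes;
}

/* Huge pages needed to back a segment of 'bytes' bytes, rounded up. */
unsigned long hugepages_needed(unsigned long bytes, unsigned long hpage_bytes)
{
	assert(hpage_bytes > 0);
	return (bytes + hpage_bytes - 1) / hpage_bytes;
}
```

Under the same assumption, a separate segment of the test's default 64 MB size needs 32 huge pages of its own, so it can only be satisfied while at least 32 pages of the pool are still free.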

I also got it to hang by running the attached test with -s
1610612736 and then starting another instance with no options.

There was no output on the serial console when it hung, and it was
unresponsive to ping, vc switch, and sysrq.

The attached test is an ltp shmem test modified by Bill Irwin to support
the shm huge pages in 2.5.41-mm1.  Compile it with --static.

Thanks,
Paul Larson



[-- Attachment #2: shmt01.c --]
[-- Type: text/x-c, Size: 9751 bytes --]

/*
 *
 *   Copyright (c) International Business Machines  Corp., 2001
 *
 *   This program is free software;  you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   This program is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY;  without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
 *   the GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with this program;  if not, write to the Free Software
 *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

/*
 * Copyright (C) Bull S.A. 1996
 * Level 1,5 Years Bull Confidential and Proprietary Information
 */

/*---------------------------------------------------------------------+
|                           shmem_test_01                              |
| ==================================================================== |
|                                                                      |
| Description:  Simplistic test to verify the shmem system function    |
|               calls.                                                 |
|                                                                      |
|                                                                      |
| Algorithm:    o  Obtain a unique shared memory identifier with       |
|                  shmget ()                                           |
|               o  Map the shared memory segment to the current        |
|                  process with shmat ()                               |
|               o  Index through the shared memory segment             |
|               o  Release the shared memory segment with shmctl ()    |
|                                                                      |
| System calls: The following system calls are tested:                 |
|                                                                      |
|               shmget () - Gets shared memory segments                |
|               shmat () - Attaches a shared memory segment or mapped  |
|                          file to the current process                 |
|               shmctl () - Controls shared memory operations          |
|                                                                      |
| Usage:        shmem_test_01                                          |
|                                                                      |
| To compile:   cc -o shmem_test_01 shmem_test_01.c                    |
|                                                                      |
| Last update:   Ver. 1.2, 2/8/94 00:08:30                           |
|                                                                      |
| Change Activity                                                      |
|                                                                      |
|   Version  Date    Name  Reason                                      |
|    0.1     111593  DJK   Initial version for AIX 4.1                 |
|    1.2     020794  DJK   Moved to "prod" directory                   |
|                                                                      |
+---------------------------------------------------------------------*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/shm.h>

/* Defines
 *
 * MAX_SHMEM_SIZE: maximum shared memory segment size (3GB)
 *
 * DEFAULT_SHMEM_SIZE: default shared memory size, unless specified with
 * -s command line option
 *
 * SHMEM_MODE: shared memory access permissions (permit process to read
 * and write access)
 *
 * USAGE: usage statement
 */
#ifndef SHM_HUGETLB
#define SHM_HUGETLB		04000
#endif
#define SHMADDR			((const void *)0x4000000)
#define MAX_SHMEM_SIZE		(3UL*1024UL*1024UL*1024UL)
#define DEFAULT_SHMEM_SIZE	(64*1024*1024)
#define	SHMEM_MODE		(SHM_R | SHM_W | SHM_HUGETLB | IPC_CREAT)
#define USAGE	"\nUsage: %s [-s shmem_size]\n\n" \
		"\t-s shmem_size  size of shared memory segment (bytes)\n" \
		"\t               (must not exceed 3GB!)\n\n"

/*
 * Function prototypes
 *
 * parse_args (): Parse command line arguments
 * sys_error (): System error message function
 * error (): Error message function
 */
void parse_args (int, char **);
void sys_error (const char *, int);
void error (const char *, int);

/*
 * Global variables
 * 
 * shmem_size: shared memory segment size (in bytes)
 */
unsigned long shmem_size = DEFAULT_SHMEM_SIZE;
const key_t key = 1;

/*---------------------------------------------------------------------+
|                               main                                   |
| ==================================================================== |
|                                                                      |
|                                                                      |
| Function:  Main program  (see prolog for more details)               |
|                                                                      |
| Returns:   (0)  Successful completion                                |
|            (-1) Error occurred                                       |
|                                                                      |
+---------------------------------------------------------------------*/
int main (int argc, char **argv)
{
	int	shmid;		/* (Unique) Shared memory identifier */
	char	*shmptr,	/* Shared memory segment address */
		*ptr,		/* Index into shared memory segment */
		value = 0;	/* Value written into shared memory segment */

	/*
	 * Parse command line arguments and print out program header
	 */
	parse_args (argc, argv);
	printf ("%s: IPC Shared Memory TestSuite program\n", *argv);
    
	/*
	 * Obtain a unique shared memory identifier with shmget ().
	 * Attach the shared memory segment to the process with shmat (), 
	 * index through the shared memory segment, and then release the
	 * shared memory segment with shmctl ().
	 */
	printf ("\n\tGet shared memory segment (%lu bytes)\n", shmem_size);
	if ((shmid = shmget (key, shmem_size, SHMEM_MODE)) < 0)
		sys_error ("shmget failed", __LINE__);

	printf ("\n\tAttach shared memory segment to process\n");
	shmptr = shmat (shmid, SHMADDR, SHM_HUGETLB);
	if (shmptr == (char *) -1)	/* shmat() returns (void *) -1 on failure */
		sys_error ("shmat failed", __LINE__);

	printf ("\n\tIndex through shared memory segment ...\n");
	for (ptr=shmptr; ptr < (shmptr + shmem_size); ptr++)
		*ptr = value++;
	sleep(10);

	printf ("\n\tRelease shared memory\n");
	if (shmctl (shmid, IPC_RMID, 0) < 0)
		sys_error ("shmctl failed", __LINE__);

	/* 
	 * Program completed successfully -- exit
	 */
	printf ("\nsuccessful!\n");

	return (0);
}


/*---------------------------------------------------------------------+
|                             parse_args ()                            |
| ==================================================================== |
|                                                                      |
| Function:  Parse the command line arguments & initialize global      |
|            variables.                                                |
|                                                                      |
| Updates:   (command line options)                                    |
|                                                                      |
|            [-s] size: shared memory segment size                     |
|                                                                      |
+---------------------------------------------------------------------*/
void parse_args (int argc, char **argv)
{
	int	i;
	int	errflag = 0;
	char	*program_name = *argv;
	extern char 	*optarg;	/* Command line option */

	while ((i = getopt(argc, argv, "s:?")) != EOF) {
		switch (i) {
			case 's':
				/* strtoul, not atoi: sizes up to 3GB overflow an int */
				shmem_size = strtoul (optarg, NULL, 10);
				break;
			case '?':
				errflag++;
				break;
		}
	}

	if (shmem_size < 1 || shmem_size > MAX_SHMEM_SIZE)
		errflag++;

	if (errflag) {
		fprintf (stderr, USAGE, program_name);
		exit (2);
	}
}


/*---------------------------------------------------------------------+
|                             sys_error ()                             |
| ==================================================================== |
|                                                                      |
| Function:  Creates system error message and calls error ()           |
|                                                                      |
+---------------------------------------------------------------------*/
void sys_error (const char *msg, int line)
{
	char syserr_msg [256];

	snprintf (syserr_msg, sizeof (syserr_msg), "%s: %s", msg, strerror (errno));
	error (syserr_msg, line);
}


/*---------------------------------------------------------------------+
|                               error ()                               |
| ==================================================================== |
|                                                                      |
| Function:  Prints out message and exits...                           |
|                                                                      |
+---------------------------------------------------------------------*/
void error (const char *msg, int line)
{
	fprintf (stderr, "ERROR [line: %d] %s\n", line, msg);
	exit (-1);
}

* Re: Hangs in 2.5.41-mm1
  2002-10-09 18:36 Hangs in 2.5.41-mm1 Paul Larson
@ 2002-10-09 20:17 ` Andrew Morton
  2002-10-09 20:29   ` Paul Larson
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2002-10-09 20:17 UTC
  To: Paul Larson; +Cc: linux-mm

Paul Larson wrote:
> 
> I'm able to generate a lot of hangs with 2.5.41-mm1.
> This is on a 8-way PIII-700, 16 GB ram (PAE enabled)
> 
> The first one, I got by running ltp for a while, then the attached test
> for a bit, then, at the suggestion of Bill Irwin to increase the amount
> of ram I could be using for huge pages:
> echo 768 > /proc/sys/vm/nr_hugepages

Paul, this is not very clear to me, sorry.

You don't state at which point it hung.  Could you please
carefully spell out the precise sequence of steps which led to
the hang?

> Doing that (and the corresponding echo 1610612736 >
> /proc/sys/kernel/shmmax) after a cold boot gave me no problems though.
> 
> I also got it to hang after runnging the attached test with -s
> 1610612736 and then running another one with no options.

With what settings in /proc, etc?

 
> There was no output on the serial console when it hung, and it was
> unresponsive to ping, vc switch, and sysrq.
> 
> The attached test is an ltp shmem test modified by Bill Irwin to support
> the shm huge pages in 2.5.41-mm1.  Compile it with --static.

OK, great.  I'll try to reproduce this but I would appreciate
some help in understanding what I need to do.  Usually it just
ends up with "it works for me" :(

There is a locks-up-for-ages bug in refill_inactive_zone() - could
be that.  Dunno.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

* Re: Hangs in 2.5.41-mm1
  2002-10-09 20:17 ` Andrew Morton
@ 2002-10-09 20:29   ` Paul Larson
  2002-10-09 21:00     ` William Lee Irwin III
  2002-10-09 21:32     ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Paul Larson @ 2002-10-09 20:29 UTC
  To: Andrew Morton; +Cc: linux-mm

On Wed, 2002-10-09 at 15:17, Andrew Morton wrote:
> Paul Larson wrote:
> > echo 768 > /proc/sys/vm/nr_hugepages
> 
> Paul, this is not very clear to me, sorry.
Sorry about that; let me try to restate it more clearly.  I should add
that these hangs have been somewhat random and hard to reproduce the same
way every time, but if I run this test often enough, I eventually get
it to lock up cold.

Here are the situations where I saw it happen so far under 2.5.41-mm1:

Case 1:
from LTP: runalltests.sh -l /tmp/mm1.log | tee /tmp/mm1.out
shmt01 (attached test from before)
shmt01& (repeated 10 times)
echo 768 > /proc/sys/vm/nr_hugepages
*hang*

Case 2:
cold boot
echo 768 > /proc/sys/vm/nr_hugepages
echo 1610612736 > /proc/sys/kernel/shmmax
shmt01 -s 1610612736&
shmt01 (immediately after starting the previous command)
*hang*

> There is a locks-up-for-ages bug in refill_inactive_zone() - could
> be that.  Dunno.
I'm not aware of that one; do you know of a reliable way to reproduce it?

Thanks,
Paul Larson

* Re: Hangs in 2.5.41-mm1
  2002-10-09 20:29   ` Paul Larson
@ 2002-10-09 21:00     ` William Lee Irwin III
  2002-10-09 21:17       ` Paul Larson
  2002-10-09 21:32     ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: William Lee Irwin III @ 2002-10-09 21:00 UTC
  To: Paul Larson; +Cc: Andrew Morton, linux-mm

On Wed, Oct 09, 2002 at 03:29:28PM -0500, Paul Larson wrote:
> Case 1:
> from ltp, 'runalltests.sh -l /tmp/mm1.log |tee /tmp/mm1.out
> shmt01 (attached test from before)
> shmt01& (repeated 10 times)
> echo 768 > /proc/sys/vm/nr_hugepages
> *hang*
> Case 2:
> cold boot
> echo 768 > /proc/sys/vm/nr_hugepages
> echo 1610612736 > /proc/sys/kernel/shmmax
> shmt01 -s 1610612736&
> shmt01 (immediately after starting the previous command)
> *hang*


You want to check that you still have free hugepages available. It's
passing IPC_CREAT to shmget(), so it's trying to create at least double
the number of hugepages you have configured, or 10 times that in case 1.
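One way to make that check from a test program is to read the free huge page count before creating another segment. The sketch below assumes the kernel reports a `HugePages_Free:` line in `/proc/meminfo`, as later 2.5 and 2.6 kernels do; whether 2.5.41-mm1 uses exactly that label is an assumption, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the HugePages_Free count from the text of /proc/meminfo.
 * Assumes the "HugePages_Free:" label; returns -1 if it is not present.
 */
long hugepages_free(const char *meminfo)
{
	const char *label = "HugePages_Free:";
	const char *p;
	long n;

	assert(meminfo != NULL);
	p = strstr(meminfo, label);
	if (p == NULL)
		return -1;
	if (sscanf(p + strlen(label), "%ld", &n) != 1)
		return -1;
	return n;
}

/* Read /proc/meminfo and return the free huge page count, or -1. */
long read_hugepages_free(void)
{
	char buf[8192];
	size_t len;
	FILE *f = fopen("/proc/meminfo", "r");

	if (f == NULL)
		return -1;
	len = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[len] = '\0';
	return hugepages_free(buf);
}
```

A caller would compare the returned count with the number of huge pages the next `shmget()` needs and skip the run when it is too small.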

Bill

* Re: Hangs in 2.5.41-mm1
  2002-10-09 21:00     ` William Lee Irwin III
@ 2002-10-09 21:17       ` Paul Larson
  2002-10-09 21:29         ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Larson @ 2002-10-09 21:17 UTC
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-mm

I got an oops out of it this time: after running that test several
times, I retried case 2 and got this:

Unable to handle kernel paging request at virtual address 20b17050
 printing eip:
c0133a5b
*pde = 00000000
Oops: 0000

CPU:    3
EIP:    0060:[<c0133a5b>]    Not tainted
EFLAGS: 00010017
EIP is at cache_alloc_refill+0xbb/0x170
eax: 0000000c   ebx: f7ffba88   ecx: 20b17040   edx: 00000000
esi: 00000010   edi: cc16a800   ebp: f7ffba00   esp: f63a1ee4
ds: 0068   es: 0068   ss: 0068
Process crond (pid: 1239, threadinfo=f63a0000 task=f64d6100)
Stack: f7ffba90 00000282 f6b63720 0804f797 cc1dae00 c0133dab f7ffba00 000001d0
       00000001 00000000 00000001 c04a368c f7ffba00 c0158721 f7ffba00 000001d0
       cc1dae00 f6b63720 0804f797 f6baa2c0 c0158e6a cc1dae00 c014dfcf cc1dae00
Call Trace:
 [<c0133dab>] kmem_cache_alloc+0x3b/0x50
 [<c0158721>] alloc_inode+0x31/0x170
 [<c0158e6a>] new_inode+0xa/0x60
 [<c014dfcf>] get_pipe_inode+0xf/0x90
 [<c014e082>] do_pipe+0x32/0x1e0
 [<c01240d9>] sys_rt_sigaction+0x69/0x90
 [<c010c9dd>] sys_pipe+0xd/0x40
 [<c0112d20>] do_page_fault+0x0/0x4a5
 [<c01071d3>] syscall_call+0x7/0xb

Code: 39 41 10 73 06 4e 83 fe ff 75 ba 8b 51 04 8b 01 89 50 04 89

Hopefully that will help a little.
Thanks,
Paul Larson

* Re: Hangs in 2.5.41-mm1
  2002-10-09 21:17       ` Paul Larson
@ 2002-10-09 21:29         ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2002-10-09 21:29 UTC
  To: Paul Larson; +Cc: William Lee Irwin III, linux-mm

Paul Larson wrote:
> 
> I got an oops out of it this time, after running it that test several
> times, I retried case 2 and got this:
> 
> ...
> EIP is at cache_alloc_refill+0xbb/0x170

I seem to be giving this patch to everyone lately.  Hopefully
it will fix that.



--- 2.5.41/mm/slab.c~slab-split-10-list_for_each_fix	Tue Oct  8 15:40:52 2002
+++ 2.5.41-akpm/mm/slab.c	Tue Oct  8 15:40:52 2002
@@ -461,7 +461,7 @@ static kmem_cache_t cache_cache = {
 static struct semaphore	cache_chain_sem;
 static rwlock_t cache_chain_lock = RW_LOCK_UNLOCKED;
 
-#define cache_chain (cache_cache.next)
+struct list_head cache_chain;
 
 /*
  * chicken and egg problem: delay the per-cpu array allocation
@@ -617,6 +617,7 @@ void __init kmem_cache_init(void)
 
 	init_MUTEX(&cache_chain_sem);
 	INIT_LIST_HEAD(&cache_chain);
+	list_add(&cache_cache.next, &cache_chain);
 
 	cache_estimate(0, cache_cache.objsize, 0,
 			&left_over, &cache_cache.num);
@@ -2093,10 +2094,10 @@ static void *s_start(struct seq_file *m,
 	down(&cache_chain_sem);
 	if (!n)
 		return (void *)1;
-	p = &cache_cache.next;
+	p = cache_chain.next;
 	while (--n) {
 		p = p->next;
-		if (p == &cache_cache.next)
+		if (p == &cache_chain)
 			return NULL;
 	}
 	return list_entry(p, kmem_cache_t, next);
@@ -2107,9 +2108,9 @@ static void *s_next(struct seq_file *m, 
 	kmem_cache_t *cachep = p;
 	++*pos;
 	if (p == (void *)1)
-		return &cache_cache;
-	cachep = list_entry(cachep->next.next, kmem_cache_t, next);
-	return cachep == &cache_cache ? NULL : cachep;
+		return list_entry(cache_chain.next, kmem_cache_t, next);
+	return cachep->next.next == &cache_chain ? NULL
+		: list_entry(cachep->next.next, kmem_cache_t, next);
 }
 
 static void s_stop(struct seq_file *m, void *p)
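The patch stops aliasing `cache_chain` to `cache_cache.next` and makes it a standalone `struct list_head`, then adds `cache_cache` to it with `list_add()`. Presumably (going by the patch name `list_for_each_fix`), the problem was that with the macro, `cache_cache`'s own node served as the list head, so a walk over `&cache_chain` never visited `cache_cache` itself. A minimal userspace sketch of the sentinel-head pattern, loosely modeled on the kernel's `list.h`; the names here are hypothetical stand-ins, not the kernel macros:

```c
#include <assert.h>

struct node {
	struct node *next, *prev;
};

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'item' right after 'head', like the kernel's list_add(). */
static void list_add_after(struct node *item, struct node *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Count elements by walking until we come back to the head. */
static int list_count(const struct node *head)
{
	const struct node *p;
	int n = 0;

	for (p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Old scheme: cache_cache's own node doubles as the list head. */
int old_scheme_visible(void)
{
	struct node cache_cache, inode_cache;

	list_init(&cache_cache);		/* INIT_LIST_HEAD(&cache_chain) */
	list_add_after(&inode_cache, &cache_cache);
	return list_count(&cache_cache);	/* cache_cache is never visited */
}

/* New scheme: a standalone head, with cache_cache as an ordinary member. */
int new_scheme_visible(void)
{
	struct node cache_chain, cache_cache, inode_cache;

	list_init(&cache_chain);
	list_add_after(&cache_cache, &cache_chain);
	list_add_after(&inode_cache, &cache_chain);
	assert(cache_chain.prev == &cache_cache);	/* first added ends up last */
	return list_count(&cache_chain);
}
```

With the standalone head, every walk terminates on `&cache_chain` and sees every cache, which is the same termination test the patched `s_start()` and `s_next()` use.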

* Re: Hangs in 2.5.41-mm1
  2002-10-09 20:29   ` Paul Larson
  2002-10-09 21:00     ` William Lee Irwin III
@ 2002-10-09 21:32     ` Andrew Morton
  2002-10-10 15:45       ` Paul Larson
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2002-10-09 21:32 UTC
  To: Paul Larson; +Cc: linux-mm

Paul Larson wrote:
> 
> On Wed, 2002-10-09 at 15:17, Andrew Morton wrote:
> > Paul Larson wrote:
> > > echo 768 > /proc/sys/vm/nr_hugepages
> >
> > Paul, this is not very clear to me, sorry.
> Sorry about that, let me try to restate it better.  First let me add
> though, these have been somewhat random and hard to reproduce the same
> way every time, but if I run this test enough though, I eventually get
> it to lock up cold.
> 
> Here are the situations where I saw it happen so far under 2.5.41-mm1:
> 
> Case 1:
> from ltp, 'runalltests.sh -l /tmp/mm1.log |tee /tmp/mm1.out
> shmt01 (attached test from before)
> shmt01& (repeated 10 times)
> echo 768 > /proc/sys/vm/nr_hugepages
> *hang*
> 
> Case 2:
> cold boot
> echo 768 > /proc/sys/vm/nr_hugepages
> echo 1610612736 > /proc/sys/kernel/shmmax
> shmt01 -s 1610612736&
> shmt01 (immediately after starting the previous command)
> *hang*

OK, thanks.

> > There is a locks-up-for-ages bug in refill_inactive_zone() - could
> > be that.  Dunno.
> I'm not aware of that one, do you know of a reliable way to reproduce that?

You need to torture it.  It happens when there's a huge amount
of mapped memory in a zone and the `swappiness' knob is set low.
We end up doing a ton of scanning of the active list without
actually reclaiming anything.  The fix is to scan only a little, then
fall back and scan the inactive list a bit, letting the scanning
priority increase until it's high enough to trigger reclaim of
mapped memory.
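That escalation can be sketched as a toy model. The decision formula below is an assumption borrowed from the later 2.6 `refill_inactive_zone()` (`distress = 100 >> priority`, `swap_tendency = mapped_ratio / 2 + distress + swappiness`, mapped pages reclaimed once the tendency reaches 100); it is not necessarily what -mm2 does, and the function names are hypothetical. It shows why a low swappiness with mostly mapped memory keeps mapped pages off-limits until the priority has escalated, and why each pass should therefore scan only a bounded slice:

```c
#include <assert.h>

#define DEF_PRIORITY 12	/* scanning starts here; 0 is the most aggressive */

/* Toy reclaim_mapped decision (formula assumed from 2.6 vmscan.c). */
int reclaim_mapped(int mapped_ratio, int swappiness, int priority)
{
	int distress, swap_tendency;

	assert(priority >= 0 && priority <= DEF_PRIORITY);
	distress = 100 >> priority;	/* 0 at DEF_PRIORITY, 100 at priority 0 */
	swap_tendency = mapped_ratio / 2 + distress + swappiness;
	return swap_tendency >= 100;
}

/*
 * Walk the priority down from DEF_PRIORITY, scanning only
 * (zone_pages >> priority) pages per pass.  Returns the first priority at
 * which mapped pages become reclaimable, or -1 if they never do, and stores
 * the total number of pages scanned up to that point in *scanned.
 */
int first_mapped_reclaim_priority(int mapped_ratio, int swappiness,
				  unsigned long zone_pages,
				  unsigned long *scanned)
{
	int priority;

	*scanned = 0;
	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		*scanned += zone_pages >> priority;
		if (reclaim_mapped(mapped_ratio, swappiness, priority))
			return priority;
	}
	return -1;
}
```

In this model, with 90% of a 4096-page zone mapped and swappiness at 0, mapped pages only become reclaimable at priority 0, after 8191 pages have been scanned in total; with swappiness 60 the first pass already qualifies.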

-mm2 will cure all ills ;)

* Re: Hangs in 2.5.41-mm1
  2002-10-09 21:32     ` Andrew Morton
@ 2002-10-10 15:45       ` Paul Larson
  2002-10-10 16:53         ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Larson @ 2002-10-10 15:45 UTC
  To: Andrew Morton; +Cc: linux-mm

On Wed, 2002-10-09 at 16:32, Andrew Morton wrote:
> -mm2 will cure all ills ;)

If only we could be so lucky! :)

Linux-2.5.41-mm2
# echo 768 > /proc/sys/vm/nr_hugepages
# echo 1610612736 > /proc/sys/kernel/shmmax
# ./shmt01
./shmt01: IPC Shared Memory TestSuite program

        Get shared memory segment (67108864 bytes)

        Attach shared memory segment to process

        Index through shared memory segment ...

        Release shared memory

successful!
# ./shmt01 -s 1610612736
./shmt01: IPC Shared Memory TestSuite program

        Get shared memory segment (1610612736 bytes)

        Attach shared memory segment to process

        Index through shared memory segment ...

        Release shared memory

successful!
#
*HANG*

I went back and tried to reproduce it.  I got through the first run of
shmt01, then had typed half of the command for the second run when it
hung.  So, if anything, mm2 appears to be easier to hang than mm1.

-Paul Larson

* Re: Hangs in 2.5.41-mm1
  2002-10-10 15:45       ` Paul Larson
@ 2002-10-10 16:53         ` Andrew Morton
  2002-10-10 17:01           ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2002-10-10 16:53 UTC
  To: Paul Larson, Manfred Spraul; +Cc: linux-mm

Paul Larson wrote:
> 
> On Wed, 2002-10-09 at 16:32, Andrew Morton wrote:
> > -mm2 will cure all ills ;)
> 
> If only we could be so lucky! :)
> 
> Linux-2.5.41-mm2
> # echo 768 > /proc/sys/vm/nr_hugepages
> # echo 1610612736 > /proc/sys/kernel/shmmax
> # ./shmt01
> ./shmt01: IPC Shared Memory TestSuite program
> 
>         Get shared memory segment (67108864 bytes)
> 
>         Attach shared memory segment to process
> 
>         Index through shared memory segment ...
> 
>         Release shared memory
> 
> successful!
> # ./shmt01 -s 1610612736
> ./shmt01: IPC Shared Memory TestSuite program
> 
>         Get shared memory segment (1610612736 bytes)
> 
>         Attach shared memory segment to process
> 
>         Index through shared memory segment ...
> 
>         Release shared memory
> 
> successful!
> #
> *HANG*
> 

This is easy to reproduce; thanks for that.

I took an NMI watchdog hit in the slab code.  It would appear
that the loop in cache_alloc_refill() has gone infinite.

I assume slabp->inuse is >= cachep->num, so we're never
decrementing batchcount and the loop does not terminate.



Program received signal SIGEMT, Emulation trap.
0xc01357c7 in cache_alloc_refill (cachep=0xf7ffc740, flags=464) at mm/slab.c:1580
1580                    if (entry == &l3->slabs_partial) {
(gdb) bt
#0  0xc01357c7 in cache_alloc_refill (cachep=0xf7ffc740, flags=464) at mm/slab.c:1580
#1  0xc0135b1a in kmem_cache_alloc (cachep=0xf7ffc740, flags=464) at mm/slab.c:1670
#2  0xc0159c72 in alloc_inode (sb=0xf7f8a400) at fs/inode.c:99
#3  0xc015a3c5 in new_inode (sb=0xf7f8a400) at fs/inode.c:505
#4  0xc014f7ae in get_pipe_inode () at fs/pipe.c:510
#5  0xc014f867 in do_pipe (fd=0xf6693fb4) at fs/pipe.c:559
#6  0xc010ce01 in sys_pipe (fildes=0xbffff83c) at arch/i386/kernel/sys_i386.c:35
#7  0xc01070f3 in syscall_call () at net/sunrpc/stats.c:204
#8  0x0805c426 in ?? () at net/sunrpc/stats.c:204
#9  0x400177c0 in ?? () at net/sunrpc/stats.c:204
#10 0x0000001c in af_unix_exit () at arch/i386/kernel/cpuid.c:168
Cannot access memory at address 0x1
(gdb) p batchcount
$1 = 6
(gdb) p slabp->inuse
No symbol "slabp" in current context.
(gdb) p cachep->num
$2 = 12
(gdb) p/x *slabp
No symbol "slabp" in current context.
(gdb) p/x *cachep
$3 = {cpudata = {0xc3fe8000, 0xc3fe8200, 0xc3fe8400, 0xc3fe8600}, batchcount = 0x3c, limit = 0x78, lists = {slabs_partial = {
      next = 0xf6ace060, prev = 0xf7ed5000}, slabs_full = {next = 0xc3e67080, prev = 0xf7ff5000}, slabs_free = {
      next = 0xf7ffc768, prev = 0xf7ffc768}, free_objects = 0x20, free_touched = 0x0, next_reap = 0x1f9e4}, objsize = 0x140, 
  flags = 0x2000, num = 0xc, free_limit = 0x138, spinlock = {lock = 0xfe}, gfporder = 0x0, gfpflags = 0x0, colour = 0x5, 
  colour_off = 0x20, colour_next = 0x1, slabp_cache = 0x0, dflags = 0x0, ctor = 0xc0159f3c, dtor = 0x0, name = 0xc0271c05, 
  next = {next = 0xf7ffc738, prev = 0xf7ffc838}}
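The non-terminating loop Andrew suspects can be illustrated with a toy model, using the `batchcount` (6) and `num` (12) values from the gdb session. This is a deliberately simplified, hypothetical loop, not the real `mm/slab.c` code: objects are taken from a partial slab only while `inuse < num`, and the slab leaves the partial list only when its free list is empty. A corrupted slab whose `inuse` already reached `num` while its free list is not empty is never taken from and never moved, so `batchcount` never drops; the model caps the passes to report that:

```c
#include <assert.h>
#include <stddef.h>

/*
 * A slab has 'inuse' objects handed out and 'nfree' entries on its free
 * list; a healthy slab has inuse + nfree == num.
 */
struct toy_slab {
	unsigned int inuse;
	unsigned int nfree;
};

/*
 * Take up to 'batchcount' objects from the single partial slab.  Returns the
 * number of outer-loop passes, or -1 once 'cap' passes are exceeded (our
 * stand-in for "the loop never terminates").
 */
int toy_refill(struct toy_slab *partial, unsigned int num, int batchcount,
	       int cap)
{
	int iters = 0;

	assert(cap > 0);
	while (batchcount > 0) {
		if (++iters > cap)
			return -1;
		if (partial == NULL)
			break;		/* the real code would grow the cache here */
		/* Hand out objects only while the slab claims to have room. */
		while (partial->inuse < num && batchcount > 0) {
			partial->inuse++;
			partial->nfree--;
			batchcount--;
		}
		/* Only a slab with an empty free list leaves the partial list. */
		if (partial->nfree == 0)
			partial = NULL;
	}
	return iters;
}
```

A healthy slab either satisfies the refill or is retired to the full list, while the corrupted one keeps the loop spinning with `batchcount` stuck at 6, which would be consistent with the NMI watchdog hit inside `cache_alloc_refill()`.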

* Re: Hangs in 2.5.41-mm1
  2002-10-10 16:53         ` Andrew Morton
@ 2002-10-10 17:01           ` Andrew Morton
  2002-10-10 18:32             ` William Lee Irwin III
  2002-10-10 18:39             ` Manfred Spraul
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2002-10-10 17:01 UTC
  To: Paul Larson, Manfred Spraul, linux-mm

Andrew Morton wrote:
> 
> ...
> #0  0xc01357c7 in cache_alloc_refill (cachep=0xf7ffc740, flags=464) at mm/slab.c:1580
> #1  0xc0135b1a in kmem_cache_alloc (cachep=0xf7ffc740, flags=464) at mm/slab.c:1670
> #2  0xc0159c72 in alloc_inode (sb=0xf7f8a400) at fs/inode.c:99
> #3  0xc015a3c5 in new_inode (sb=0xf7f8a400) at fs/inode.c:505
> #4  0xc014f7ae in get_pipe_inode () at fs/pipe.c:510
> #5  0xc014f867 in do_pipe (fd=0xf6693fb4) at fs/pipe.c:559
> #6  0xc010ce01 in sys_pipe (fildes=0xbffff83c) at arch/i386/kernel/sys_i386.c:35
> #7  0xc01070f3 in syscall_call () at net/sunrpc/stats.c:204

Or it could be that the inode cache has been corrupted.
Bill, can you review the handling in there?  It'd be a
bit sad if one of the hugetlb privately-kmalloced inodes
were put back onto the inode_cachep slab somehow.

* Re: Hangs in 2.5.41-mm1
  2002-10-10 17:01           ` Andrew Morton
@ 2002-10-10 18:32             ` William Lee Irwin III
  2002-10-10 18:39             ` Manfred Spraul
  1 sibling, 0 replies; 12+ messages in thread
From: William Lee Irwin III @ 2002-10-10 18:32 UTC
  To: Andrew Morton; +Cc: Paul Larson, Manfred Spraul, linux-mm

On Thu, Oct 10, 2002 at 10:01:43AM -0700, Andrew Morton wrote:
> Or it could be that the inode cache has been corrupted.
> Bill, can you review the handling in there?  It'd be a
> bit sad if one of the hugetlb privately-kmalloced inodes
> were put back onto the inode_cachep slab somehow.

ergh, the refcounting down there looks dangerous to say the least.

Fix ETA 2-4 hours depending on what else I need to do.


Bill

* Re: Hangs in 2.5.41-mm1
  2002-10-10 17:01           ` Andrew Morton
  2002-10-10 18:32             ` William Lee Irwin III
@ 2002-10-10 18:39             ` Manfred Spraul
  1 sibling, 0 replies; 12+ messages in thread
From: Manfred Spraul @ 2002-10-10 18:39 UTC
  To: Andrew Morton; +Cc: Paul Larson, linux-mm

Andrew Morton wrote:
> Andrew Morton wrote:
> 
>>...
>>#0  0xc01357c7 in cache_alloc_refill (cachep=0xf7ffc740, flags=464) at mm/slab.c:1580
>>#1  0xc0135b1a in kmem_cache_alloc (cachep=0xf7ffc740, flags=464) at mm/slab.c:1670
>>#2  0xc0159c72 in alloc_inode (sb=0xf7f8a400) at fs/inode.c:99
>>#3  0xc015a3c5 in new_inode (sb=0xf7f8a400) at fs/inode.c:505
>>#4  0xc014f7ae in get_pipe_inode () at fs/pipe.c:510
>>#5  0xc014f867 in do_pipe (fd=0xf6693fb4) at fs/pipe.c:559
>>#6  0xc010ce01 in sys_pipe (fildes=0xbffff83c) at arch/i386/kernel/sys_i386.c:35
>>#7  0xc01070f3 in syscall_call () at net/sunrpc/stats.c:204
> 
> 
> Or it could be that the inode cache has been corrupted.
> Bill, can you review the handling in there?  It'd be a
> bit sad if one of the hugetlb privately-kmalloced inodes
> were put back onto the inode_cachep slab somehow.

Could you try to reproduce it with slab debugging enabled? Slab then checks
for foreign objects and BUG()s when it finds one.
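The kind of check Manfred refers to can be sketched with a toy object pool that knows which addresses it handed out and refuses to take back anything else. This is a hypothetical illustration of the idea, not the kernel's slab debugging code; a hugetlb inode obtained with `kmalloc()` and then freed into `inode_cachep` would be exactly such a foreign object. The sizes mirror `objsize = 0x140` and `num = 0xc` from the gdb dump purely for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_OBJS 12	/* like num = 0xc in the gdb dump */
#define OBJ_SIZE  320	/* like objsize = 0x140 in the gdb dump */

struct toy_pool {
	unsigned char mem[POOL_OBJS * OBJ_SIZE];	/* storage for all objects */
	unsigned char in_use[POOL_OBJS];
};

/* Return the object index for p, or -1 if p is not an object of this pool. */
static int pool_index(const struct toy_pool *pool, const void *p)
{
	uintptr_t base = (uintptr_t)pool->mem;
	uintptr_t addr = (uintptr_t)p;

	if (addr < base || addr >= base + sizeof(pool->mem))
		return -1;		/* outside the pool: a foreign object */
	if ((addr - base) % OBJ_SIZE != 0)
		return -1;		/* inside, but not at an object boundary */
	return (int)((addr - base) / OBJ_SIZE);
}

void *pool_alloc(struct toy_pool *pool)
{
	int i;

	assert(pool != NULL);
	for (i = 0; i < POOL_OBJS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = 1;
			return pool->mem + (size_t)i * OBJ_SIZE;
		}
	}
	return NULL;
}

/* Returns 0 on success, -1 if p is foreign or not currently allocated. */
int pool_free(struct toy_pool *pool, void *p)
{
	int i = pool_index(pool, p);

	if (i < 0 || !pool->in_use[i])
		return -1;	/* a debugging allocator would BUG() here */
	pool->in_use[i] = 0;
	return 0;
}
```

Rejecting the bad free at the point it happens is what makes this style of check useful: the failure shows up in the caller that corrupted the cache rather than later, in an unrelated allocation such as the `pipe()` path in the traces above.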

--
	Manfred
