linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* filemap.c SMP bug in 2.4.0-test*
@ 2000-08-15 22:10 Rik van Riel
  2000-08-16  2:43 ` Linus Torvalds
  2000-08-16  3:26 ` Linus Torvalds
  0 siblings, 2 replies; 6+ messages in thread
From: Rik van Riel @ 2000-08-15 22:10 UTC (permalink / raw)
  To: linux-mm
  Cc: Stephen C. Tweedie, Linus Torvalds, Andrea Arcangeli, Marcelo Tosatti

Hi,

it appears that a debugging check in my VM patch has uncovered
a bug in filemap.c (which could explain the "innd failing"
thread on linux-kernel).

The debugging check (in mm/swap.c::lru_cache_add(), line 232)
checks if the page which is to be added to the page lists is
already on one of the lists. In case it is, a nice backtrace
follows...

from mm/swap.c:
    227 void lru_cache_add(struct page * page)
    228 {
    229         spin_lock(&pagemap_lru_lock);
    230         if (!PageLocked(page))
    231                 BUG();
    232         DEBUG_ADD_PAGE
    233         add_page_to_active_list(page);

from include/mm/swap.h:
    199 #define DEBUG_ADD_PAGE \
    200         if (PageActive(page) || PageInactiveDirty(page) || \
    201                              PageInactiveClean(page)) BUG();

The backtrace I got points to some place deep inside
mm/filemap.c, in code I really didn't touch and I
wouldn't want to touch if this bug wasn't here ;)

>>EIP; c012e370 <lru_cache_add+5c/d4>   <=====
Trace; c021b33e <tvecs+1dde/19f60>
Trace; c021b579 <tvecs+2019/19f60>
Trace; c0127823 <add_to_page_cache_locked+cb/dc>
Trace; c0130a3c <add_to_swap_cache+84/8c>
Trace; c0130d00 <read_swap_cache_async+68/98>
Trace; c0125c8b <handle_mm_fault+143/1c0>
Trace; c0113d33 <do_page_fault+143/3f0>

I've had a few variants of this, but always the
add_to_page_cache* functions were involved...

BTW, in the normal source tree, this situation
could lead to corruption of the lru list. Maybe
this explains the innd problems, maybe not, but
I think we should at least add the debugging
code to vanilla 2.4 as well in order to catch
this bug.

On a related note, the new VM patch seems to be
well-behaved, solid and nicely performant now. ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filemap.c SMP bug in 2.4.0-test*
  2000-08-15 22:10 filemap.c SMP bug in 2.4.0-test* Rik van Riel
@ 2000-08-16  2:43 ` Linus Torvalds
  2000-08-16  2:46   ` Rik van Riel
  2000-08-16  3:26 ` Linus Torvalds
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2000-08-16  2:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, Stephen C. Tweedie, Andrea Arcangeli, Marcelo Tosatti


On Tue, 15 Aug 2000, Rik van Riel wrote:
> 
> The debugging check (in mm/swap.c::lru_cache_add(), line 232)
> checks if the page which is to be added to the page lists is
> already on one of the lists. In case it is, a nice backtrace
> follows...

Why do you think your "PageActive()"/"PageInactiveDirty()"/
"PageInactiveClean()" tests are right?

I don't see any reason to assume that you just don't clear the flags
correctly.

In fact, if this bug really existed in the standard kernel, you'd see
machines locking up left and right. Adding a page to a the LRU list  when
it already is on the LRU list would cause immediate and severe list
corruption. It wouldn't just go silently in the night, it would _scream_. 

I would suggest that you add something like DEBUG_ADD_PAGE to
__free_pages_ok(), and see if somebody frees the page without clearing the
flags. Sounds like a bug in your code.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filemap.c SMP bug in 2.4.0-test*
  2000-08-16  2:43 ` Linus Torvalds
@ 2000-08-16  2:46   ` Rik van Riel
  0 siblings, 0 replies; 6+ messages in thread
From: Rik van Riel @ 2000-08-16  2:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, Stephen C. Tweedie, Andrea Arcangeli, Marcelo Tosatti

On Tue, 15 Aug 2000, Linus Torvalds wrote:
> On Tue, 15 Aug 2000, Rik van Riel wrote:
> > 
> > The debugging check (in mm/swap.c::lru_cache_add(), line 232)
> > checks if the page which is to be added to the page lists is
> > already on one of the lists. In case it is, a nice backtrace
> > follows...
> 
> Why do you think your "PageActive()"/"PageInactiveDirty()"/
> "PageInactiveClean()" tests are right?
> 
> I don't see any reason to assume that you just don't clear the flags
> correctly.
> 
> In fact, if this bug really existed in the standard kernel, you'd see
> machines locking up left and right. Adding a page to a the LRU list  when
> it already is on the LRU list would cause immediate and severe list
> corruption. It wouldn't just go silently in the night, it would _scream_. 
> 
> I would suggest that you add something like DEBUG_ADD_PAGE to
> __free_pages_ok(), and see if somebody frees the page without
> clearing the flags. Sounds like a bug in your code.

This test is in _all_ places where pages are added or
removed from the list and in __free_pages_ok().

The only place where I'm hitting the bug is in
lru_cache_add.

OTOH, I wouldn't mind it if you were to take a look
at my code and potted the bug ;))

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filemap.c SMP bug in 2.4.0-test*
  2000-08-15 22:10 filemap.c SMP bug in 2.4.0-test* Rik van Riel
  2000-08-16  2:43 ` Linus Torvalds
@ 2000-08-16  3:26 ` Linus Torvalds
  2000-08-16  3:40   ` Rik van Riel
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2000-08-16  3:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, Stephen C. Tweedie, Andrea Arcangeli, Marcelo Tosatti


On Tue, 15 Aug 2000, Rik van Riel wrote:
> 
> The backtrace I got points to some place deep inside
> mm/filemap.c, in code I really didn't touch and I
> wouldn't want to touch if this bug wasn't here ;)
> 
> >>EIP; c012e370 <lru_cache_add+5c/d4>   <=====
> Trace; c021b33e <tvecs+1dde/19f60>
> Trace; c021b579 <tvecs+2019/19f60>
> Trace; c0127823 <add_to_page_cache_locked+cb/dc>
> Trace; c0130a3c <add_to_swap_cache+84/8c>
> Trace; c0130d00 <read_swap_cache_async+68/98>
> Trace; c0125c8b <handle_mm_fault+143/1c0>
> Trace; c0113d33 <do_page_fault+143/3f0>

Look at this back-trace again.

In particular, look at which page read_swap_cache_async() adds to the swap
cache.

The code is:

        /*
         * Look for the page in the swap cache.
         */
        found_page = lookup_swap_cache(entry);
        if (found_page)
                goto out_free_swap;

*****   new_page_addr = __get_free_page(GFP_USER);		*******
        if (!new_page_addr)
                goto out_free_swap;     /* Out of memory */
        new_page = virt_to_page(new_page_addr);

        /*
         * Check the swap cache again, in case we stalled above.
         */
        found_page = lookup_swap_cache(entry);
        if (found_page)
                goto out_free_page;
        /*
         * Add it to the swap cache and read its contents.
         */
        lock_page(new_page);
        add_to_swap_cache(new_page, entry);
        rw_swap_page(READ, new_page, wait);
        return new_page;

In short, read_swap_cache_async() allocates a new page that nobody else
has access to. There's no way in hell that page is going to be on any LRU
lists. 

(The page allocation and type switching is silly - it should really do
"new_page = page_cache_alloc()" and not have that "new_page_addr" thing
at all, but that's a silly inefficiency, not a bug).

The bug pretty much has to be in the new page flag handling. No, I don't
see anything wrong in your patch, but we're talking about a code-path that
has it's own very private page that cannot be shared unless there are some
pretty major bugs (if __get_free_page() returns a page that is still in
use somewhere, we're _soo_ screwed).

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filemap.c SMP bug in 2.4.0-test*
  2000-08-16  3:26 ` Linus Torvalds
@ 2000-08-16  3:40   ` Rik van Riel
  2000-08-16  4:09     ` Rik van Riel
  0 siblings, 1 reply; 6+ messages in thread
From: Rik van Riel @ 2000-08-16  3:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, Stephen C. Tweedie, Andrea Arcangeli, Marcelo Tosatti

On Tue, 15 Aug 2000, Linus Torvalds wrote:
> On Tue, 15 Aug 2000, Rik van Riel wrote:
> > 
> > The backtrace I got points to some place deep inside
> > mm/filemap.c, in code I really didn't touch and I
> > wouldn't want to touch if this bug wasn't here ;)
> > 
> > >>EIP; c012e370 <lru_cache_add+5c/d4>   <=====
> > Trace; c021b33e <tvecs+1dde/19f60>
> > Trace; c021b579 <tvecs+2019/19f60>
> > Trace; c0127823 <add_to_page_cache_locked+cb/dc>
> > Trace; c0130a3c <add_to_swap_cache+84/8c>
> > Trace; c0130d00 <read_swap_cache_async+68/98>
> > Trace; c0125c8b <handle_mm_fault+143/1c0>
> > Trace; c0113d33 <do_page_fault+143/3f0>
> 
> Look at this back-trace again.
> 
> In particular, look at which page read_swap_cache_async() adds
> to the swap cache.

> *****   new_page_addr = __get_free_page(GFP_USER);		*******

> In short, read_swap_cache_async() allocates a new page that
> nobody else has access to. There's no way in hell that page is
> going to be on any LRU lists.

*nod*

> The bug pretty much has to be in the new page flag handling. No,
> I don't see anything wrong in your patch, but we're talking
> about a code-path that has it's own very private page that
> cannot be shared unless there are some pretty major bugs (if
> __get_free_page() returns a page that is still in use somewhere,
> we're _soo_ screwed).

I've seen the call trace with sys_write and sys_read
too, so I assume that it's indeed __alloc_pages() which
hands over a page with one of the flags still set.

Question is, how did that thing get on the free list
in the first place?  __free_pages_ok() checks for the
flags and reclaim_page() also checks for all of the
flags (inside the del_page_from_inactive_clean_list
macro)...

I've been looking at this code very closely over the
last week and fail to see any possibility for this
to happen, but where there's a bug there's a way ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filemap.c SMP bug in 2.4.0-test*
  2000-08-16  3:40   ` Rik van Riel
@ 2000-08-16  4:09     ` Rik van Riel
  0 siblings, 0 replies; 6+ messages in thread
From: Rik van Riel @ 2000-08-16  4:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, Stephen C. Tweedie, Andrea Arcangeli, Marcelo Tosatti

On Wed, 16 Aug 2000, Rik van Riel wrote:
> On Tue, 15 Aug 2000, Linus Torvalds wrote:

> > In particular, look at which page read_swap_cache_async() adds
> > to the swap cache.
> 
> > *****   new_page_addr = __get_free_page(GFP_USER);		*******
> 
> > In short, read_swap_cache_async() allocates a new page that
> > nobody else has access to. There's no way in hell that page is
> > going to be on any LRU lists.

> Question is, how did that thing get on the free list
> in the first place?  __free_pages_ok() checks for the
> flags and reclaim_page() also checks for all of the
> flags

OK, I have a vague and highly improbable idea about
this (but no clue about some of the subsystems I'm
going to assume things about).

What if the page we barf on was part of a multi-page
contiguous allocation?

Suppose some subsystem (like nfs) allocates an 8kB
contiguous area, which gets filled with data and mmap()ed
by a user process.

At that moment, _both_ pages are put into the lru list and
flagged as such. Now if the "lower" of the two pages gets
released and the upper is still in the list, a hypothetical
buggy driver (maybe even nfs) would do a __free_pages_ok()
on the DOUBLE page, even though the "higher" page is still
in use (and has the bit set).

That way a page with one of the page list flags set could
slip by the check in __free_pages_ok. I know this is an
improbable theory, but it's the only way I can see which
would bypass the checks in __free_pages_ok (and the one
in reclaim_page)...

[yes, I know I must get some sleep and look at this
stuff when I'm awake ;)]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2000-08-16  4:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-08-15 22:10 filemap.c SMP bug in 2.4.0-test* Rik van Riel
2000-08-16  2:43 ` Linus Torvalds
2000-08-16  2:46   ` Rik van Riel
2000-08-16  3:26 ` Linus Torvalds
2000-08-16  3:40   ` Rik van Riel
2000-08-16  4:09     ` Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox