* [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
@ 2000-10-02 19:35 Rik van Riel
2000-10-02 19:56 ` Andrea Arcangeli
0 siblings, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 19:35 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, linux-mm, Linus Torvalds, Stephen C. Tweedie
Hi,
as you can see below, the highmem bug was already there before
the new VM. However, it may be easier to trigger in the new VM
because we keep the buffer heads on active pages in memory...
(then again, we can't clear the buffer heads on dirty pages
anyway, so maybe the difference in how easy it is to trigger is
very small or nonexistent)
One possible explanation for the problem may be that we use
GFP_ATOMIC (and PF_MEMALLOC is set) in prepare_highmem_swapout().
That means we /could/ eat up the last free pages for creating
bounce buffers in low memory, after which we end up with a bunch
of unflushable, unfreeable pages in low memory (because we can't
allocate bufferheads or read indirect blocks from the swapfile).
Maybe we want to use GFP_SOFT (fail if we have less than pages_min
free pages in the low memory zone) for prepare_highmem_swapout(),
it appears that try_to_swap_out() and shm_swap_core() are already
quite capable of dealing with bounce buffer create failures.
I'd really like to see this bug properly fixed in 2.4...
regards,
Rik
---------- Forwarded message ----------
Date: Fri, 1 Sep 2000 09:27:58 -0700
From: Ying Chen/Almaden/IBM <ying@almaden.ibm.com>
To: Rik van Riel <riel@conectiva.com.br>
Subject: Re: [PATCH] Re: simple FS application that hangs 2.4-test5,
mem mgmt problem or FS buffer cache mgmt problem?
Hi, Rik,
A while back I reported some problems with the buffer cache and probably the memory
mgmt subsystem when I ran high IOPS with SPEC SFS.
I haven't got a chance to go back to the problem and dig out where the
problem is yet.
I recently tried the same thing, i.e., running large IOPS SPEC SFS, against
the test6 UP kernel. I had no problem if I didn't turn HIGHMEM
support on in the kernel. As soon as I turned HIGHMEM support on (I have
2GB memory in my system), I ran into the same problem, i.e., I'd get "Out
of memory" sort of thing from various subsystems, like SCSI or IP, and
eventually my kernel hangs. I don't know if this rings a bell for you or
not. I'll try to locate the problem more accurately in the next few days.
If you have any suggestions on how I might pursue this, let me know.
Thanks a lot!
Ying
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 19:35 [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd) Rik van Riel
@ 2000-10-02 19:56 ` Andrea Arcangeli
2000-10-02 19:59 ` Rik van Riel
2000-10-02 20:06 ` Linus Torvalds
0 siblings, 2 replies; 36+ messages in thread
From: Andrea Arcangeli @ 2000-10-02 19:56 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, linux-mm, Linus Torvalds, Stephen C. Tweedie
On Mon, Oct 02, 2000 at 04:35:43PM -0300, Rik van Riel wrote:
> because we keep the buffer heads on active pages in memory...
A page can be the most active in the VM and never need a bh on it after the
first pagein. Keeping the bh on it means wasting tons of memory for no good
reason.
Andrea
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 19:56 ` Andrea Arcangeli
@ 2000-10-02 19:59 ` Rik van Riel
2000-10-02 20:17 ` Andrea Arcangeli
2000-10-02 21:16 ` Linus Torvalds
2000-10-02 20:06 ` Linus Torvalds
1 sibling, 2 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 19:59 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, linux-mm, Linus Torvalds, Stephen C. Tweedie
On Mon, 2 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 02, 2000 at 04:35:43PM -0300, Rik van Riel wrote:
> > because we keep the buffer heads on active pages in memory...
>
> A page can be the most active in the VM and never need a bh on it
> after the first pagein. Keeping the bh on it means wasting tons
> of memory for no good reason.
Indeed. On the other hand, maybe we /will/ need the buffer
head again soon?
Linus, I remember you saying some time ago that you would
like to keep the buffer heads on a page around so we'd
have them at the point where we need to swap out again.
Is this still your position or should I make some code to
strip the buffer heads of clean, active pages ?
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 19:56 ` Andrea Arcangeli
2000-10-02 19:59 ` Rik van Riel
@ 2000-10-02 20:06 ` Linus Torvalds
2000-10-02 20:16 ` Rik van Riel
2000-10-02 20:25 ` Ingo Molnar
1 sibling, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 20:06 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rik van Riel, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 02, 2000 at 04:35:43PM -0300, Rik van Riel wrote:
> > because we keep the buffer heads on active pages in memory...
>
> A page can be the most active in the VM and never need a bh on it after the
> first pagein. Keeping the bh on it means wasting tons of memory for no good
> reason.
I agree. Most of the time, there's absolutely no point in keeping the
buffer heads around. Most pages (and _especially_ the actively mapped
ones) do not need the buffer heads at all after creation - once they are
uptodate they stay uptodate and we're only interested in the page, not the
buffers used to create it.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 20:06 ` Linus Torvalds
@ 2000-10-02 20:16 ` Rik van Riel
2000-10-02 20:25 ` Ingo Molnar
1 sibling, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 20:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrea Arcangeli, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> On Mon, 2 Oct 2000, Andrea Arcangeli wrote:
> > On Mon, Oct 02, 2000 at 04:35:43PM -0300, Rik van Riel wrote:
> > > because we keep the buffer heads on active pages in memory...
> >
> > A page can be the most active in the VM and never need a bh on it after the
> > first pagein. Keeping the bh on it means wasting tons of memory for no good
> > reason.
>
> I agree. Most of the time, there's absolutely no point in
> keeping the buffer heads around. Most pages (and _especially_
> the actively mapped ones) do not need the buffer heads at all
> after creation - once they are uptodate they stay uptodate and
> we're only interested in the page, not the buffers used to
> create it.
I'll create a patch to strip off the buffer heads from
clean active pages.
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 19:59 ` Rik van Riel
@ 2000-10-02 20:17 ` Andrea Arcangeli
2000-10-02 20:24 ` Rik van Riel
2000-10-02 21:16 ` Linus Torvalds
1 sibling, 1 reply; 36+ messages in thread
From: Andrea Arcangeli @ 2000-10-02 20:17 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, linux-mm, Linus Torvalds, Stephen C. Tweedie
On Mon, Oct 02, 2000 at 04:59:57PM -0300, Rik van Riel wrote:
> Linus, I remember you saying some time ago that you would
> like to keep the buffer heads on a page around so we'd
> have them at the point where we need to swap out again.
That's one of the basic differences between the 2.2.x and 2.4.x
page cache design. We don't reclaim the buffers at I/O completion
time anymore in 2.4.x but we reclaim them only later when we run
low on memory.
Forbidding the bh to be reclaimed when we run low on memory is a bug
and I don't think Linus ever suggested that.
Andrea
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 20:17 ` Andrea Arcangeli
@ 2000-10-02 20:24 ` Rik van Riel
0 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 20:24 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, linux-mm, Linus Torvalds, Stephen C. Tweedie
On Mon, 2 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 02, 2000 at 04:59:57PM -0300, Rik van Riel wrote:
> > Linus, I remember you saying some time ago that you would
> > like to keep the buffer heads on a page around so we'd
> > have them at the point where we need to swap out again.
>
> That's one of the basic differences between the 2.2.x and 2.4.x
> page cache design. We don't reclaim the buffers at I/O completion
> time anymore in 2.4.x but we reclaim them only later when we run
> low on memory.
>
> Forbidding the bh to be reclaimed when we run low on memory is a
> bug and I don't think Linus ever suggested that.
*nod*
How about having the following code in refill_inactive_scan() ?
if (page->buffers && page->mapping)
        try_to_free_buffers(page, 0);
(this will strip the buffer heads of any clean page cache
page ... we don't want to strip buffer head pages because
that would mean throwing away the data from that page)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 20:06 ` Linus Torvalds
2000-10-02 20:16 ` Rik van Riel
@ 2000-10-02 20:25 ` Ingo Molnar
2000-10-02 20:45 ` Rik van Riel
2000-10-02 21:19 ` Linus Torvalds
1 sibling, 2 replies; 36+ messages in thread
From: Ingo Molnar @ 2000-10-02 20:25 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrea Arcangeli, Rik van Riel, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> I agree. Most of the time, there's absolutely no point in keeping the
> buffer heads around. Most pages (and _especially_ the actively mapped
> ones) do not need the buffer heads at all after creation - once they
> are uptodate they stay uptodate and we're only interested in the page,
> not the buffers used to create it.
except for writes, there we cache the block # in the bh and do not have to
call the lowlevel FS repeatedly to calculate the FS position of the page.
This also makes it possible to flush metadata blocks from RAM - otherwise
those metadata blocks would be accessed frequently. Especially in the case
of smaller files (smaller than 100k) there could be much more RAM
allocated to metadata than to the bhs. The write-mark-dirty shortcut also
makes a measurable difference in dbench-type write-intensive workloads. In
pure read-only workloads the bh overhead is definitely there. Maybe we
should separate bhs into 'physical block mapping' and 'IO context' parts?
Ingo
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 20:25 ` Ingo Molnar
@ 2000-10-02 20:45 ` Rik van Riel
2000-10-02 21:21 ` Linus Torvalds
2000-10-02 21:19 ` Linus Torvalds
1 sibling, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 20:45 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Ingo Molnar wrote:
> On Mon, 2 Oct 2000, Linus Torvalds wrote:
>
> > I agree. Most of the time, there's absolutely no point in keeping the
> > buffer heads around. Most pages (and _especially_ the actively mapped
> > ones) do not need the buffer heads at all after creation - once they
> > are uptodate they stay uptodate and we're only interested in the page,
> > not the buffers used to create it.
>
> except for writes, there we cache the block # in the bh and do
> not have to call the lowlevel FS repeatedly to calculate the FS
> position of the page.
Would it be "close enough" to simply clear the buffer heads of
clean pages which make it to the front of the active list ?
Or is there another optimisation we could do to make the
approximation even better ?
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 19:59 ` Rik van Riel
2000-10-02 20:17 ` Andrea Arcangeli
@ 2000-10-02 21:16 ` Linus Torvalds
1 sibling, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 21:16 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrea Arcangeli, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> Linus, I remember you saying some time ago that you would
> like to keep the buffer heads on a page around so we'd
> have them at the point where we need to swap out again.
Only if it actually simplifies the VM and FS code noticeably.
Right now the VFS code already has all the complexity to handle
re-creating the buffer heads, so there's nothing to be gained from wasting
memory on them.
But we could make it an implementation decision to _always_ have the
buffer heads hanging around, and simplify (and possibly speed up) the code
by having that rule. It's not the case now, though.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 20:25 ` Ingo Molnar
2000-10-02 20:45 ` Rik van Riel
@ 2000-10-02 21:19 ` Linus Torvalds
2000-10-02 21:23 ` Rik van Riel
2000-10-02 21:57 ` Ingo Molnar
1 sibling, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 21:19 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrea Arcangeli, Rik van Riel, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Ingo Molnar wrote:
>
> On Mon, 2 Oct 2000, Linus Torvalds wrote:
>
> > I agree. Most of the time, there's absolutely no point in keeping the
> > buffer heads around. Most pages (and _especially_ the actively mapped
> > ones) do not need the buffer heads at all after creation - once they
> > are uptodate they stay uptodate and we're only interested in the page,
> > not the buffers used to create it.
>
> except for writes, there we cache the block # in the bh and do not have to
> call the lowlevel FS repeatedly to calculate the FS position of the page.
Oh, I agree 100%.
Note that this is why I think we should just do it the way we used to
handle it: we keep the buffer heads around "indefinitely" (because we
_may_ need them - we don't know a priori one way or the other), but
because they _do_ potentially use up a lot of memory we do free them in
the normal aging process when we're low on memory.
So if we have "lots" of memory, we basically optimize for speed (leave the
cached mapping around), while if we get low on memory we automatically
optimize for space (get rid of bh's when we don't know that we'll need
them).
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 20:45 ` Rik van Riel
@ 2000-10-02 21:21 ` Linus Torvalds
2000-10-02 21:27 ` Rik van Riel
0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 21:21 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> Would it be "close enough" to simply clear the buffer heads of
> clean pages which make it to the front of the active list ?
>
> Or is there another optimisation we could do to make the
> approximation even better ?
I'd prefer it to be done as part of the LRU aging - we do want to age all
pages, and as part of the aging process we might as well remove buffers
that are lying around.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:19 ` Linus Torvalds
@ 2000-10-02 21:23 ` Rik van Riel
2000-10-02 21:31 ` Linus Torvalds
2000-10-02 21:57 ` Ingo Molnar
1 sibling, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 21:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> So if we have "lots" of memory, we basically optimize for speed
> (leave the cached mapping around), while if we get low on memory
> we automatically optimize for space (get rid of bh's when we
> don't know that we'll need them).
OK, so we want something like the following in
refill_inactive_scan() ?
if (free_shortage() && inactive_shortage() &&
    page->mapping && page->buffers)
        try_to_free_buffers(page, 0);
This would keep the buffer heads around in the background page
scans too and only free them when we really need to.
(but still, I'm not sure if this is aggressive enough or not
quite aggressive enough)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:21 ` Linus Torvalds
@ 2000-10-02 21:27 ` Rik van Riel
0 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 21:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> On Mon, 2 Oct 2000, Rik van Riel wrote:
> >
> > Would it be "close enough" to simply clear the buffer heads of
> > clean pages which make it to the front of the active list ?
> >
> > Or is there another optimisation we could do to make the
> > approximation even better ?
>
> I'd prefer it to be done as part of the LRU aging - we do want
> to age all pages, and as part of the aging process we might as
> well remove buffers that are lying around.
This was what I was proposing ;)
With, maybe, the optimisation that we don't want to do this
if we're simply doing background scanning and we don't have a
free memory shortage yet.
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:23 ` Rik van Riel
@ 2000-10-02 21:31 ` Linus Torvalds
2000-10-02 21:42 ` Rik van Riel
0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 21:31 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> OK, so we want something like the following in
> refill_inactive_scan() ?
>
> if (free_shortage() && inactive_shortage() && page->mapping &&
> page->buffers)
> try_to_free_buffers(page, 0);
That's just nasty.
Why not just do it unconditionally whenever we do the
age_page_down_ageonly(page) too? Simply something like
if (page->buffers)
        try_to_free_buffers(page, 1);
(and yes, I think it should also start background writing - we probably
need the gfp_mask to know whether we can do that).
I hate code that tries to be clever.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:31 ` Linus Torvalds
@ 2000-10-02 21:42 ` Rik van Riel
2000-10-02 21:58 ` Linus Torvalds
0 siblings, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 21:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> On Mon, 2 Oct 2000, Rik van Riel wrote:
> >
> > OK, so we want something like the following in
> > refill_inactive_scan() ?
> >
> > if (free_shortage() && inactive_shortage() && page->mapping &&
> > page->buffers)
> > try_to_free_buffers(page, 0);
>
> That's just nasty.
>
> Why not just do it unconditionally whenever we do the
> age_page_down_ageonly(page) too? Simply something like
>
> if (page->buffers)
> try_to_free_buffers(page, 1);
You will want to add page->mapping too, so we won't be kicking
buffermem data out of memory when we don't need to.
Also, you really want to free the bufferheads on the pages that
are in heavy use (say glibc shared pages) too...
> (and yes, I think it should also start background writing - we
> probably need the gfp_mask to know whether we can do that).
Background writing is done by kupdate / kflushd.
> I hate code that tries to be clever.
*nod*
You're right that my last idea was too complicated ;)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:57 ` Ingo Molnar
@ 2000-10-02 21:52 ` Rik van Riel
2000-10-02 22:53 ` Ingo Molnar
0 siblings, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 21:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Andrea Arcangeli, MM mailing list, Stephen C. Tweedie
On Mon, 2 Oct 2000, Ingo Molnar wrote:
> yep, this would be nice, but i think it will be quite tough to
> balance this properly. There are two kinds of bhs in this aging
> scheme: 'normal' bhs (metadata), and 'virtual' bhs (aliased to a
> page). Freeing a 'normal' bh will get rid of the bh, and will
> (statistically) free the data buffer behind. A 'virtual' bh on
> the other hand has only sizeof(*bh) bytes worth of RAM
> footprint.
This is easy. Normal page aging will take care of the buffermem
pages. Freeing the buffer heads on pagecache pages is the only
thing we need to do in refill_inactive_scan.
> another thing is the complexity of marking a page dirty - right
> now we can assume that page->buffers holds all the blocks. With
> aging we must check whether a bh is there or not,
The code must already be able to handle this. This is nothing new.
> Plus some sort of locking has to be added as well - right now we
> don't have to care about anyone else accessing page->buffers if
> the PG_lock is held - with an aging mechanism this could get
> tougher.
OK, so we'll have:
if (page->buffers && page->mapping && !TryLockPage(page)) {
        try_to_free_buffers(page);
        UnlockPage(page);
}
> > So if we have "lots" of memory, we basically optimize for speed (leave
> > the cached mapping around), while if we get low on memory we
> > automatically optimize for space (get rid of bh's when we don't know
> > that we'll need them).
>
> i'd love to have all the cached objects within the system on a
> global, size-neutral LRU list. (or at least attach a
> last-accessed timestamp to them.) This way we could synchronize
> the pagecache, inode/dentry and buffer-cache LRU lists.
s/LRU/page aging/ ;)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:19 ` Linus Torvalds
2000-10-02 21:23 ` Rik van Riel
@ 2000-10-02 21:57 ` Ingo Molnar
2000-10-02 21:52 ` Rik van Riel
1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2000-10-02 21:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrea Arcangeli, Rik van Riel, MM mailing list, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> > except for writes, there we cache the block # in the bh and do not have to
> > call the lowlevel FS repeatedly to calculate the FS position of the page.
>
> Oh, I agree 100%.
>
> Note that this is why I think we should just do it the way we used to
> handle it: we keep the buffer heads around "indefinitely" (because we
> _may_ need them - we don't know a priori one way or the other), but
> because they _do_ potentially use up a lot of memory we do free them in
> the normal aging process when we're low on memory.
yep, this would be nice, but i think it will be quite tough to balance
this properly. There are two kinds of bhs in this aging scheme: 'normal'
bhs (metadata), and 'virtual' bhs (aliased to a page). Freeing a 'normal'
bh will get rid of the bh, and will (statistically) free the data buffer
behind. A 'virtual' bh on the other hand has only sizeof(*bh) bytes worth
of RAM footprint.
another thing is the complexity of marking a page dirty - right now we can
assume that page->buffers holds all the blocks. With aging we must check
whether a bh is there or not, which further complicates the block_*()
functions in buffer.c. Plus some sort of locking has to be added as well -
right now we don't have to care about anyone else accessing page->buffers
if the PG_lock is held - with an aging mechanism this could get tougher.
(unless the buffer-cache aging mechanism 'knows' about pages and locks
them - this is what my former hash-all-buffers scheme did :-)
but i agree, currently even in the 4k filesystem case the per-page bh
causes +2.0% data-cache RAM footprint. (struct page accounts for ~1.7%)
> So if we have "lots" of memory, we basically optimize for speed (leave
> the cached mapping around), while if we get low on memory we
> automatically optimize for space (get rid of bh's when we don't know
> that we'll need them).
i'd love to have all the cached objects within the system on a global,
size-neutral LRU list. (or at least attach a last-accessed timestamp to
them.) This way we could synchronize the pagecache, inode/dentry and
buffer-cache LRU lists.
Ingo
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:42 ` Rik van Riel
@ 2000-10-02 21:58 ` Linus Torvalds
2000-10-02 22:08 ` Rik van Riel
0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 21:58 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> You will want to add page->mapping too, so we won't be kicking
> buffermem data out of memory when we don't need to.
Fair enough.
> Also, you really want to free the bufferheads on the pages that
> are in heavy use (say glibc shared ages) too...
I don't think they matter that much, but yes, we could just do this all
outside the whole test.
> > (and yes, I think it should also start background writing - we
> > probably need the gfp_mask to know whether we can do that).
>
> Background writing is done by kupdate / kflushd.
.. but there's no reason why we should not do it here.
I'd MUCH rather work towards a setup where kupdate/kflushd goes away
completely, and the work is done as a natural end result of just aging the
pages.
If you look at what kflushd does, you'll notice that it already has a lot
of incestuous relationships with the VM layer. And the VM layer has a lot
of the same with bdflush. That is what I call UGLY and bad design.
Now, look at what bdflush actually _does_. Think about it.
Yeah, it's really aging the pages and writing them out in the background.
In short, it's something that kswapd might as well do.
In fact, if you look at how the VM layer tries to wake up bdflush, you'll
notice that the VM layer really wants to say "please flush more pages
because I'm low on memory". Which is really another way of saying that
kswapd should run more.
It all ties together, and we should make that explicit, instead of having
the current incestuous relationships and saying "oh, the VM layer
shouldn't write out dirty pages, because that's the job of kflushd (but
the VM layer can wake it up because it knows that kflushd needs to be
run)".
Note that the current "flush_dirty_buffers()" should just go away. It has
no advantages compared to having "try_to_free_buffers(x,1)" on the
properly aged LRU queue..
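The direction sketched above, background writing falling out of page aging rather than out of a bdflush-style thread, can be modelled as a single pass over an aged LRU list: old dirty pages get written, old clean pages get freed. This is a toy model under invented names (`page_t`, `scan_lru`, `start_writeback`), not the actual try_to_free_buffers() path:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page: age counts down while the page goes unreferenced. */
typedef struct page_t {
    int age;                 /* 0 means fully aged, reclaimable */
    bool dirty;
    bool freed;
    struct page_t *next;
} page_t;

/* Stand-in for starting async writeback on one page. */
static int writes_started;

static void start_writeback(page_t *p)
{
    p->dirty = false;        /* pretend the IO completed */
    writes_started++;
}

/* One aging pass: background writing happens as a natural side
 * effect of reclaim, with no separate flush daemon needed. */
static void scan_lru(page_t *lru)
{
    for (page_t *p = lru; p; p = p->next) {
        if (p->age > 0) {
            p->age--;                /* not old enough yet */
            continue;
        }
        if (p->dirty)
            start_writeback(p);      /* flush old dirty page */
        else
            p->freed = true;         /* old and clean: reclaim */
    }
}
```

In this model "the VM wants more flushing" and "the VM wants more reclaim" are literally the same operation, which is the point: kswapd running more *is* the flush.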
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:58 ` Linus Torvalds
@ 2000-10-02 22:08 ` Rik van Riel
2000-10-02 22:18 ` Andrea Arcangeli
2000-10-02 22:53 ` Linus Torvalds
0 siblings, 2 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 22:08 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> On Mon, 2 Oct 2000, Rik van Riel wrote:
> >
> > You will want to add page->mapping too, so we won't be kicking
> > buffermem data out of memory when we don't need to.
>
> Fair enough.
>
> > Also, you really want to free the bufferheads on the pages that
> > are in heavy use (say glibc shared pages) too...
>
> I don't think they matter that much, but yes, we could just do this all
> outside the whole test.
>
> > > (and yes, I think it should also start background writing - we
> > > probably need the gfp_mask to know whether we can do that).
> >
> > Background writing is done by kupdate / kflushd.
>
> .. but there's no reason why we should not do it here.
>
> I'd MUCH rather work towards a setup where kupdate/kflushd goes
> away completely, and the work is done as a natural end result of
> just aging the pages.
I agree. But I don't know how much of this can be a 2.4 thing,
considering the __GFP_IO related locking issues ;(
> In fact, if you look at how the VM layer tries to wake up
> bdflush, you'll notice that the VM layer really wants to say
> "please flush more pages because I'm low on memory". Which is
> really another way of saying that kswapd should run more.
*nod*
> Note that the current "flush_dirty_buffers()" should just go
> away. It has no advantages compared to having
> "try_to_free_buffers(x,1)" on the properly aged LRU queue..
Yes it has. The write order in flush_dirty_buffers() is the order
in which the pages were written. This may be different from the
LRU order and could give us slightly better IO performance.
OTOH, having proper write clustering code to do everything from
the LRU queue will be much much better, but that's probably a
2.5 issue ...
Furthermore, we'll need to preserve the data writeback list,
since you really want to write back old data to disk some
time. However, we will want to get rid of flush_dirty_buffers()
for this purpose since it is mostly unsuitable for filesystems
that don't use buffer heads (yet), like XFS with del-alloc,
filesystems with write ordering constraints or network FSes..
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 22:08 ` Rik van Riel
@ 2000-10-02 22:18 ` Andrea Arcangeli
2000-10-02 22:23 ` Rik van Riel
2000-10-02 23:06 ` Linus Torvalds
2000-10-02 22:53 ` Linus Torvalds
1 sibling, 2 replies; 36+ messages in thread
From: Andrea Arcangeli @ 2000-10-02 22:18 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linus Torvalds, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, Oct 02, 2000 at 07:08:20PM -0300, Rik van Riel wrote:
> Yes it has. The write order in flush_dirty_buffers() is the order
> in which the pages were written. This may be different from the
> LRU order and could give us slightly better IO performance.
And it will prevent us from using barriers in the software elevator and in SCSI
hardware to avoid having to wait for I/O completion every time a journaling fs
needs to do ordered writes. The write ordering must remain irrelevant to the
page-LRU order.
Andrea
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 22:18 ` Andrea Arcangeli
@ 2000-10-02 22:23 ` Rik van Riel
2000-10-02 23:06 ` Linus Torvalds
1 sibling, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 22:23 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Tue, 3 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 02, 2000 at 07:08:20PM -0300, Rik van Riel wrote:
> > Yes it has. The write order in flush_dirty_buffers() is the order
> > in which the pages were written. This may be different from the
> > LRU order and could give us slightly better IO performance.
>
> And it will prevent us from using barriers in the software
> elevator and in SCSI hardware to avoid having to wait for I/O
> completion every time a journaling fs needs to do ordered
> writes. The write ordering must remain irrelevant to the
> page-LRU order.
The solution to that is the page->mapping->flush() callback.
The VM doesn't write out any pages itself without going
through that (filesystem-specific) function, where the
filesystem can do the following things:
1) do IO optimisations (IO clustering, delayed allocation)
2) check write ordering constraints
3) write out something else instead if write ordering means
we can't flush this page yet
Note that the VM doesn't /really/ care if the page selected
doesn't become freeable immediately. There are always more
inactive pages around...
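The callback described above amounts to an operations table on the mapping: the VM picks a victim page and asks the filesystem to flush it, and the filesystem is free to cluster, defer, or flush a different page to satisfy ordering. A minimal function-pointer sketch, with invented names (`mapping_ops`, `vm_flush_one`, `ordered_flush`):

```c
#include <stddef.h>

struct page;

/* Per-filesystem operations: the VM never writes a page out
 * directly, it always goes through mapping->flush(). */
struct mapping_ops {
    /* May flush this page, cluster it with neighbours, or flush
     * some other page if write ordering forbids this one.
     * Returns the page it actually flushed. */
    struct page *(*flush)(struct page *page);
};

struct page {
    struct mapping_ops *mapping;
    int dirty;
};

/* VM side: pick an inactive page and let the fs decide what to
 * do; the VM doesn't care if this exact page becomes freeable. */
static struct page *vm_flush_one(struct page *victim)
{
    return victim->mapping->flush(victim);
}

/* Example fs policy: a single write-ordering constraint held in
 * fs-private state. If an earlier page must go out first, flush
 * that one instead of the page the VM asked about. */
static struct page *must_flush_first;

static struct page *ordered_flush(struct page *page)
{
    struct page *target = page;

    if (must_flush_first && must_flush_first->dirty)
        target = must_flush_first;  /* write ordering wins */
    target->dirty = 0;              /* pretend the IO was issued */
    return target;
}
```

The division of labour is the key design point: the VM supplies pressure and a candidate, the filesystem supplies policy, and neither needs to know the other's internals.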
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 21:52 ` Rik van Riel
@ 2000-10-02 22:53 ` Ingo Molnar
2000-10-02 23:01 ` Rik van Riel
0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2000-10-02 22:53 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Andrea Arcangeli, MM mailing list, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
> > yep, this would be nice, but i think it will be quite tough to
> > balance this properly. There are two kinds of bhs in this aging
> > scheme: 'normal' bhs (metadata), and 'virtual' bhs (aliased to a
> > page). Freeing a 'normal' bh will get rid of the bh, and will
> This is easy. Normal page aging will take care of the buffermem pages.
> Freeing the buffer heads on pagecache pages is the only thing we need
> to do in refill_inactive_scan.
to do some sort of aging is of course easy. But to treat a 4kbyte
'metadata bh' the same way as an 80-byte 'cached mapping bh' is IMO
a stretch. This is what i meant by 'tough to balance properly'.
> > another thing is the complexity of marking a page dirty - right
> > now we can assume that page->buffers holds all the blocks. With
> > aging we must check whether a bh is there or not,
>
> The code must already be able to handle this. This is nothing new.
sure this is new. The page->buffers list right now is assumed to stay
constant after being created.
> > i'd love to have all the cached objects within the system on a
> > global, size-neutral LRU list. (or at least attach a
> > last-accessed timestamp to them.) This way we could synchronize
> > the pagecache, inode/dentry and buffer-cache LRU lists.
>
> s/LRU/page aging/ ;)
no - how does this handle the inode/dentry cache? Making everything a page
is a mistake.
Ingo
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 22:08 ` Rik van Riel
2000-10-02 22:18 ` Andrea Arcangeli
@ 2000-10-02 22:53 ` Linus Torvalds
2000-10-02 23:06 ` Rik van Riel
1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 22:53 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> Yes it has. The write order in flush_dirty_buffers() is the order
> in which the pages were written. This may be different from the
> LRU order and could give us slightly better IO performance.
.. or it might not.
Basically, the LRU order will be the same, EXCEPT if you have people
re-writing.
And if you have re-writing going on, you can't really say which order is
better.
I agree that flush_dirty_buffers() is _different_ from using the LRU pages
and try_to_free_buffer(). I don't think either one is obviously "better" -
I suspect you can find cases both ways.
What I do know is that we do need the try_to_free_buffer() approach anyway
from a VM standpoint, so I know that in that sense try_to_free_buffer() is
much superior in that it can do everything we want, and
flush_dirty_buffers() really doesn't cut it in that way.
Note that from a VM standpoint, there are real disadvantages from using
the flush_dirty_buffers() stuff - we may end up doing IO that we should
never have done at all, because flush_dirty_buffers() can write out stuff
that isn't needed from a VM standpoint.
> Furthermore, we'll need to preserve the data writeback list,
> since you really want to write back old data to disk some
> time.
Aging will certainly take care of that. As long as you do the writeback
_before_ you age it.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 22:53 ` Ingo Molnar
@ 2000-10-02 23:01 ` Rik van Riel
2000-10-02 23:10 ` Andrea Arcangeli
2000-10-02 23:29 ` Ingo Molnar
0 siblings, 2 replies; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 23:01 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Andrea Arcangeli, MM mailing list, Stephen C. Tweedie
On Tue, 3 Oct 2000, Ingo Molnar wrote:
> On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> > > yep, this would be nice, but i think it will be quite tough to
> > > balance this properly. There are two kinds of bhs in this aging
> > > scheme: 'normal' bhs (metadata), and 'virtual' bhs (aliased to a
> > > page). Freeing a 'normal' bh will get rid of the bh, and will
>
> > This is easy. Normal page aging will take care of the buffermem pages.
> > Freeing the buffer heads on pagecache pages is the only thing we need
> > to do in refill_inactive_scan.
>
> to do some sort of aging is of course easy. But to treat a
> 4kbyte 'metadata bh' the same way as an 80-byte 'cached
> mapping bh' is IMO a stretch. This is what i meant by 'tough
> to balance properly'.
To do that, you'd need to keep track of whether the buffer
heads (and icache/dcache entries, etc) have been accessed.
In essence an emulated accessed bit on these structures.
(and suddenly balancing is easy ... but is it worth the cost
for these small objects?)
> > > another thing is the complexity of marking a page dirty - right
> > > now we can assume that page->buffers holds all the blocks. With
> > > aging we must check whether a bh is there or not,
> >
> > The code must already be able to handle this. This is nothing new.
>
> sure this is new. The page->buffers list right now is assumed to
> stay constant after being created.
Eeeeeek. So pages /cannot/ lose their buffer heads ???
(I guess that explains why my buffer-head stealing code
is making trouble for the system ;))
> > > i'd love to have all the cached objects within the system on a
> > > global, size-neutral LRU list. (or at least attach a
> > > last-accessed timestamp to them.) This way we could synchronize
> > > the pagecache, inode/dentry and buffer-cache LRU lists.
> >
> > s/LRU/page aging/ ;)
>
> no - how does this handle the inode/dentry cache? Making
> everything a page is a mistake.
Indeed, you're right here... I'll have to think about this a
bit more (but we have the time for 2.5).
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 22:18 ` Andrea Arcangeli
2000-10-02 22:23 ` Rik van Riel
@ 2000-10-02 23:06 ` Linus Torvalds
2000-10-02 23:12 ` Rik van Riel
2000-10-02 23:20 ` Andrea Arcangeli
1 sibling, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 23:06 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rik van Riel, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Tue, 3 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 02, 2000 at 07:08:20PM -0300, Rik van Riel wrote:
> > Yes it has. The write order in flush_dirty_buffers() is the order
> > in which the pages were written. This may be different from the
> > LRU order and could give us slightly better IO performance.
>
> And it will prevent us from using barriers in the software elevator and in SCSI
> hardware to avoid having to wait for I/O completion every time a journaling fs
> needs to do ordered writes. The write ordering must remain irrelevant to the
> page-LRU order.
Note that ordered writes are going to change how we do things _anyway_,
regardless of whether we have flush_dirty_buffers() or use the LRU list.
So that's a non-argument: neither of the two routines can handle ordered
writes at this point.
You could argue that the simple single ordered queue that is currently in
use by flush_dirty_buffers() might be easier to adapt to ordering.
I can tell you already that you'd be wrong to argue that. Exactly because
of the fact that we _need_ the page-oriented flushing regardless of what
we do. So we need to solve the page case anyway. Which means that it will
obviously be easiest to solve just _one_ problem (the page case) than to
solve two problems (the page case _and_ the flush_dirty_buffers() case).
Basically the ordered write case will need extra logic, and we might as
well put the effort in just one place anyway. Note that the page case
isn't necessarily any harder in the end - the simple solution might be
something like just adding a generation count to the buffer head, and
having try_to_free_buffers() just refuse to write stuff out before that
generation has come to pass.
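The generation-count idea can be sketched as: each buffer head records the earliest flush generation at which it may hit disk, and the reclaim path simply refuses buffers whose generation hasn't come to pass. Hypothetical names throughout (`struct buf`, `try_write_buffer`, `current_gen`):

```c
#include <stdbool.h>

/* Toy buffer head with an ordering generation. A journaling fs
 * would bump the global generation between ordered phases
 * (log writes first, data writes second). */
struct buf {
    unsigned long gen;   /* earliest generation this may be written in */
    bool dirty;
};

static unsigned long current_gen;

/* Reclaim-side check: write the buffer only if its generation
 * has come to pass; otherwise report "busy" so the caller moves
 * on to another page instead of blocking. */
static bool try_write_buffer(struct buf *b)
{
    if (b->gen > current_gen)
        return false;        /* ordered write not yet allowed */
    b->dirty = false;        /* pretend we issued the IO */
    return true;
}
```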
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 22:53 ` Linus Torvalds
@ 2000-10-02 23:06 ` Rik van Riel
2000-10-02 23:14 ` Linus Torvalds
0 siblings, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 23:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> On Mon, 2 Oct 2000, Rik van Riel wrote:
> >
> > Yes it has. The write order in flush_dirty_buffers() is the order
> > in which the pages were written. This may be different from the
> > LRU order and could give us slightly better IO performance.
>
> .. or it might not.
>
> Basically, the LRU order will be the same, EXCEPT if you have
> people re-writing.
>
> And if you have re-writing going on, you can't really say which
> order is better.
Agreed.
> > Furthermore, we'll need to preserve the data writeback list,
> > since you really want to write back old data to disk some
> > time.
>
> Aging will certainly take care of that. As long as you do the
> writeback _before_ you age it.
Ummm. Even if you don't have any memory pressure, you'll
still want old data to be written to disk. Currently all
data which is written is committed to disk after 5 seconds
by default.
I wouldn't want to lose this piece of functionality ;)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:01 ` Rik van Riel
@ 2000-10-02 23:10 ` Andrea Arcangeli
2000-10-02 23:29 ` Ingo Molnar
1 sibling, 0 replies; 36+ messages in thread
From: Andrea Arcangeli @ 2000-10-02 23:10 UTC (permalink / raw)
To: Rik van Riel
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, Stephen C. Tweedie
On Mon, Oct 02, 2000 at 08:01:42PM -0300, Rik van Riel wrote:
> Eeeeeek. So pages /cannot/ lose their buffer heads ???
Page cache can definitely lose its page->buffers. page->buffers is protected by
the per-page lock. The test8 locking is completely correct.
Andrea
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:06 ` Linus Torvalds
@ 2000-10-02 23:12 ` Rik van Riel
2000-10-02 23:16 ` Linus Torvalds
2000-10-02 23:20 ` Andrea Arcangeli
1 sibling, 1 reply; 36+ messages in thread
From: Rik van Riel @ 2000-10-02 23:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrea Arcangeli, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> Basically the ordered write case will need extra logic, and we
> might as well put the effort in just one place anyway. Note that
> the page case isn't necessarily any harder in the end - the
> simple solution might be something like just adding a generation
> count to the buffer head, and having try_to_free_buffers() just
> refuse to write stuff out before that generation has come to
> pass.
That is another one of the very wrong (im)possibilities ;)
The VM is doing page aging and should, for page replacement
efficiency, only write out OLD pages. This can conflict with
the write ordering constraints in such a way that the system
will never get around to flushing out the only writable page
we have at that moment -> livelock.
Also, you cannot do try_to_free_buffers() on delayed allocation
pages, simply because these pages haven't been allocated yet
and just don't have any buffer heads attached ...
The idea Stephen and I have to solve this problem is to have
a callback into the filesystem [page->mapping->flush(page)],
so the filesystem can take care of filesystem-specific issues
and the VM subsystem takes care of VM-specific issues.
Without the need for any of the two to know much about each other.
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:06 ` Rik van Riel
@ 2000-10-02 23:14 ` Linus Torvalds
0 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 23:14 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Andrea Arcangeli, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
> >
> > Aging will certainly take care of that. As long as you do the
> > writeback _before_ you age it.
>
> Ummm. Even if you don't have any memory pressure, you'll
> still want old data to be written to disk. Currently all
> data which is written is committed to disk after 5 seconds
> by default.
Oh, no arguments there. But the point is that we have to do that currently
_anyway_ in the current code - we just move _that_ logic to the page aging
code instead.
I'm not really suggesting getting rid of kflushd. I'm more suggesting
thinking of it as a VM process rather than a fs/buffer.c process.
Right now kflushd is pretty tied to the notion of buffers, and doesn't
know what to do with pending NFS writebacks, for example. So NFS has to
have its own timeouts etc.
If you think of it as a VM issue, kflushd quite naturally does the
page->ops->flush() thing instead, and does more than it does today.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:12 ` Rik van Riel
@ 2000-10-02 23:16 ` Linus Torvalds
0 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 23:16 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrea Arcangeli, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
>
> The VM is doing page aging and should, for page replacement
> efficiency, only write out OLD pages. This can conflict with
> the write ordering constraints in such a way that the system
> will never get around to flushing out the only writable page
> we have at that moment -> livelock.
Yeah. In which case the VM layer is _buggy_.
Think about it.
The easy solution is to say that if we tried to write out a page where the
buffers were of a generation that is in the future, we should just move
that page to the head of the LRU queue, and go on with the next one. It
is, after all, "busy".
So you end up getting to the pages that _can_ be written out eventually.
End of story.
If you think that LRU and write ordering constraints cannot live together,
then you're being inflexible.
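The rotation described above, treating a not-yet-writable page as busy and moving it to the head of the LRU so the scan keeps making progress, can be sketched with a toy list (invented names, `lpage` and `reclaim_scan`):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct lpage {
    bool writable_now;   /* false: write ordering says "not yet" */
    bool flushed;
    struct lpage *next;
} lpage;

/* One reclaim pass over the LRU. Pages we cannot write yet are
 * rotated to the head of the queue instead of stalling the scan,
 * so we always reach the pages that _can_ be written out.
 * Returns the new list head. */
static lpage *reclaim_scan(lpage *head)
{
    lpage *rotated = NULL, *done = NULL, **done_tail = &done;

    while (head) {
        lpage *p = head;

        head = head->next;
        if (p->writable_now) {
            p->flushed = true;       /* issue the write */
            p->next = NULL;
            *done_tail = p;          /* keep original scan order */
            done_tail = &p->next;
        } else {
            p->next = rotated;       /* busy: push to the front */
            rotated = p;
        }
    }
    *done_tail = NULL;
    if (!rotated)
        return done;

    /* splice: rotated (busy) pages now precede the rest */
    lpage *tail = rotated;
    while (tail->next)
        tail = tail->next;
    tail->next = done;
    return rotated;
}
```

Because a busy page is merely deferred, never waited on, the livelock concern turns into at worst a bounded amount of extra scanning.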
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:06 ` Linus Torvalds
2000-10-02 23:12 ` Rik van Riel
@ 2000-10-02 23:20 ` Andrea Arcangeli
1 sibling, 0 replies; 36+ messages in thread
From: Andrea Arcangeli @ 2000-10-02 23:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, Ingo Molnar, linux-mm, Stephen C. Tweedie
On Mon, Oct 02, 2000 at 04:06:25PM -0700, Linus Torvalds wrote:
> So that's a non-argument: neither of the two routines can handle ordered
> writes at this point.
Correct.
> You could argue that the simple single ordered queue that is currently in
> use by flush_dirty_buffers() might be easier to adapt to ordering.
Right.
> I can tell you already that you'd be wrong to argue that. Exactly because
> of the fact that we _need_ the page-oriented flushing regardless of what
> we do. So we need to solve the page case anyway. Which means that it will
page oriented flushing isn't my point (that happens when we start to have
memory pressure; I wasn't talking about the low-on-memory scenario). My point
is that the fs can do:
write to the log, mark it dirty and queue it into the FIFO lru
queue the barrier into the LRU
write to the page, mark it dirty and queue it into the same FIFO lru
Now the fs can forget about it, and after 30 seconds kupdate will do both I/Os
in one single SCSI command, in order, by respecting the software and hardware
I/O barriers. That would speed things up.
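The log/barrier/data scheme above can be modelled with a small queue where a barrier entry forbids reordering across it: the elevator may sort freely within a barrier segment but never across one. Toy code with invented names (`q_entry`, `elevator_sort`):

```c
/* A FIFO of queued IO: ordinary writes, plus barrier markers that
 * partition the queue into segments. */
enum entry_kind { ENTRY_WRITE, ENTRY_BARRIER };

struct q_entry {
    enum entry_kind kind;
    int block;               /* block number, for ENTRY_WRITE */
};

/* Elevator pass: sort writes by block number within each barrier
 * segment, but never move a write across a barrier, so the log
 * still reaches disk before the data it orders. */
static void elevator_sort(struct q_entry *q, int n)
{
    int seg_start = 0;

    for (int i = 0; i <= n; i++) {
        if (i == n || q[i].kind == ENTRY_BARRIER) {
            /* insertion-sort the segment [seg_start, i) */
            for (int j = seg_start + 1; j < i; j++) {
                struct q_entry key = q[j];
                int k = j - 1;

                while (k >= seg_start && q[k].block > key.block) {
                    q[k + 1] = q[k];
                    k--;
                }
                q[k + 1] = key;
            }
            seg_start = i + 1;
        }
    }
}
```

This is the property being argued for: the fs queues everything once and forgets about it, and ordering survives any later elevator optimisation without synchronous waits for IO completion.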
> isn't necessarily any harder in the end - the simple solution might be
> something like just adding a generation count to the buffer head, and
> having try_to_free_buffers() just refuse to write stuff out before that
> generation has come to pass.
This looks like a worthwhile idea: it would let us do the sync_page_buffers
thing even while handling ordered writes.
Andrea
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:29 ` Ingo Molnar
@ 2000-10-02 23:25 ` Andrea Arcangeli
2000-10-02 23:32 ` Linus Torvalds
2000-10-03 12:05 ` Ingo Molnar
0 siblings, 2 replies; 36+ messages in thread
From: Andrea Arcangeli @ 2000-10-02 23:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rik van Riel, Linus Torvalds, MM mailing list, Stephen C. Tweedie
On Tue, Oct 03, 2000 at 01:29:27AM +0200, Ingo Molnar wrote:
> it can and does lose them - but only all of them. Aging OTOH is a per-bh
> thing, this kind of granularity is simply not present in the current
> page->buffers handling. This is all i wanted to mention. Not unsolvable,
I'm pretty sure the per-bh thing isn't worth it. And even if it made any
difference with a 1k fs, a 4k blksize is necessary for good performance anyway
for other reasons.
Andrea
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:01 ` Rik van Riel
2000-10-02 23:10 ` Andrea Arcangeli
@ 2000-10-02 23:29 ` Ingo Molnar
2000-10-02 23:25 ` Andrea Arcangeli
1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2000-10-02 23:29 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Andrea Arcangeli, MM mailing list, Stephen C. Tweedie
On Mon, 2 Oct 2000, Rik van Riel wrote:
> > > > another thing is the complexity of marking a page dirty - right
> > > > now we can assume that page->buffers holds all the blocks. With
> > > > aging we must check whether a bh is there or not,
> > >
> > > The code must already be able to handle this. This is nothing new.
> >
> > sure this is new. The page->buffers list right now is assumed to
> > stay constant after being created.
>
> Eeeeeek. So pages /cannot/ lose their buffer heads ???
it can and does lose them - but only all of them. Aging OTOH is a per-bh
thing, this kind of granularity is simply not present in the current
page->buffers handling. This is all i wanted to mention. Not unsolvable,
but needs extra logic.
Ingo
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:25 ` Andrea Arcangeli
@ 2000-10-02 23:32 ` Linus Torvalds
2000-10-03 12:05 ` Ingo Molnar
1 sibling, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2000-10-02 23:32 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Rik van Riel, MM mailing list, Stephen C. Tweedie
On Tue, 3 Oct 2000, Andrea Arcangeli wrote:
> On Tue, Oct 03, 2000 at 01:29:27AM +0200, Ingo Molnar wrote:
> > it can and does lose them - but only all of them. Aging OTOH is a per-bh
> > thing, this kind of granularity is simply not present in the current
> > page->buffers handling. This is all i wanted to mention. Not unsolvable,
>
> I'm pretty sure the per-bh thing isn't worth it. And even if it made any
> difference with a 1k fs, a 4k blksize is necessary for good performance anyway
> for other reasons.
Well, remember that some page sizes are large. A page size is not
necessarily 4k. It could be 64k.
Now, you're probably right that if you want to perform well, a 64k block
is not that large, and most things that do ordered writes might not be too
badly off with even that kind of big ordering granularity. But let's not
take it for granted.
Linus
* Re: [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd)
2000-10-02 23:25 ` Andrea Arcangeli
2000-10-02 23:32 ` Linus Torvalds
@ 2000-10-03 12:05 ` Ingo Molnar
1 sibling, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2000-10-03 12:05 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Rik van Riel, Linus Torvalds, MM mailing list, Stephen C. Tweedie
On Tue, 3 Oct 2000, Andrea Arcangeli wrote:
> > it can and does lose them - but only all of them. Aging OTOH is a per-bh
> > thing, this kind of granularity is simply not present in the current
> > page->buffers handling. This is all i wanted to mention. Not unsolvable,
>
> I'm pretty sure the per-bh thing isn't worth it. And even if it
> would make any difference with a 1k fs, a 4k blksize is necessary
> anyway for good performance, for other reasons.
Well, if those bhs are aged by the normal buffer-cache aging mechanism,
then there is no choice but to age them at bh granularity, not page
granularity. (This is only interesting in the case of 1k filesystems.)
Aging page->buffers at bh granularity creates interesting situations.
Ingo
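[Editorial sketch] The per-bh aging Ingo describes can be sketched as a walk over a page's circular buffer ring. This is a hedged illustration, not the actual 2.4 code: the real `struct buffer_head` is larger, and `b_age` here is a hypothetical stand-in for whatever per-bh age counter the aging mechanism would keep. The only field taken from the kernel is `b_this_page`, which links all bhs mapping one page into a circular list:

```c
#include <assert.h>

/* Stripped-down stand-in for the kernel's buffer_head; only the
 * circular per-page link is real, b_age is hypothetical. */
struct buffer_head {
    struct buffer_head *b_this_page; /* circular list of this page's bhs */
    int b_age;                       /* hypothetical per-bh age counter */
};

/* Age every bh on the page individually (bh granularity).  Return 1
 * only when every bh has aged down to zero, i.e. the whole page has
 * become a freeing candidate.  This is the asymmetry under discussion:
 * aging is per-bh, but freeing can only happen per-page. */
static int age_page_buffers(struct buffer_head *first)
{
    struct buffer_head *bh = first;
    int all_old = 1;

    do {
        if (bh->b_age > 0)
            bh->b_age--;
        if (bh->b_age > 0)
            all_old = 0;
        bh = bh->b_this_page;
    } while (bh != first);

    return all_old;
}
```

The "interesting situations" follow directly: on a 1k filesystem one recently-used bh keeps the other three (still-aging) bhs on its 4k page pinned in memory.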
Thread overview: 36+ messages
2000-10-02 19:35 [highmem bug report against -test5 and -test6] Re: [PATCH] Re: simple FS application that hangs 2.4-test5, mem mgmt problem or FS buffer cache mgmt problem? (fwd) Rik van Riel
2000-10-02 19:56 ` Andrea Arcangeli
2000-10-02 19:59 ` Rik van Riel
2000-10-02 20:17 ` Andrea Arcangeli
2000-10-02 20:24 ` Rik van Riel
2000-10-02 21:16 ` Linus Torvalds
2000-10-02 20:06 ` Linus Torvalds
2000-10-02 20:16 ` Rik van Riel
2000-10-02 20:25 ` Ingo Molnar
2000-10-02 20:45 ` Rik van Riel
2000-10-02 21:21 ` Linus Torvalds
2000-10-02 21:27 ` Rik van Riel
2000-10-02 21:19 ` Linus Torvalds
2000-10-02 21:23 ` Rik van Riel
2000-10-02 21:31 ` Linus Torvalds
2000-10-02 21:42 ` Rik van Riel
2000-10-02 21:58 ` Linus Torvalds
2000-10-02 22:08 ` Rik van Riel
2000-10-02 22:18 ` Andrea Arcangeli
2000-10-02 22:23 ` Rik van Riel
2000-10-02 23:06 ` Linus Torvalds
2000-10-02 23:12 ` Rik van Riel
2000-10-02 23:16 ` Linus Torvalds
2000-10-02 23:20 ` Andrea Arcangeli
2000-10-02 22:53 ` Linus Torvalds
2000-10-02 23:06 ` Rik van Riel
2000-10-02 23:14 ` Linus Torvalds
2000-10-02 21:57 ` Ingo Molnar
2000-10-02 21:52 ` Rik van Riel
2000-10-02 22:53 ` Ingo Molnar
2000-10-02 23:01 ` Rik van Riel
2000-10-02 23:10 ` Andrea Arcangeli
2000-10-02 23:29 ` Ingo Molnar
2000-10-02 23:25 ` Andrea Arcangeli
2000-10-02 23:32 ` Linus Torvalds
2000-10-03 12:05 ` Ingo Molnar