linux-mm.kvack.org archive mirror
* Re: [PATCH] 498+ days uptime
       [not found] <199808262153.OAA13651@cesium.transmeta.com>
@ 1998-08-26 22:49 ` Zlatko Calusic
  1998-08-27 12:07   ` Bernhard Heidegger
  1998-08-28  9:35   ` Stephen C. Tweedie
  0 siblings, 2 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-26 22:49 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel List, Linux-MM List

"H. Peter Anvin" <hpa@transmeta.com> writes:

> > 
> > bdflush yes, but update is not obsolete.
> > 
> > It is still needed if you want to make sure data (and metadata)
> > eventually gets written to disk.
> > 
> > Of course, you can run without update, but then don't bother if you
> > lose file in system crash, even if you edited it and saved it few
> > hours ago. :)
> > 
> > Update is very important if you have lots of RAM in your computer.
> > 
> 
> Oh.  I guess my next question then is "why", as why can't this be done
> by kflushd as well?
> 

To tell you the truth, I'm not sure why, these days.

I thought it was done this way (update running in userspace) so as to
have control over how often buffers get flushed. But I believe the
bdflush program had this functionality, and it is long gone (as you
correctly noticed).

These days, however, we have the sysctl interface, which is usable for
just about anything, and especially for things like this.

Peeking at /proc/sys/vm/bdflush, I can see all the needed variables are
already there, so nothing stops the kernel from (ab)using them.

{atlas} [/proc/sys/vm]# cat bdflush
40      500     64      256     15      3000    500     1884    2
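
As an illustration of how those nine values could be read, here is a
minimal sketch in C. The field names are an assumption, recalled from the
bdflush parameter block in fs/buffer.c of that era; they are not spelled
out in this thread. The HZ value of 100 is likewise assumed.

```c
#include <stdio.h>

/* Field names are assumptions (not given in the thread). */
struct bdflush_params {
    int nfract;      /* percentage of dirty buffers that wakes kflushd */
    int ndirty;      /* max dirty buffers written per wake-up */
    int nrefill;
    int nref_dirt;
    int field5;      /* meaning varied between kernel versions */
    int age_buffer;  /* ticks a data buffer may stay dirty */
    int age_super;   /* ticks a metadata buffer may stay dirty */
    int lav_const;
    int lav_ratio;
};

/* Parse one line in the /proc/sys/vm/bdflush format.
 * Returns 0 on success, -1 if fewer than nine integers were found. */
int parse_bdflush(const char *line, struct bdflush_params *p)
{
    int n = sscanf(line, "%d %d %d %d %d %d %d %d %d",
                   &p->nfract, &p->ndirty, &p->nrefill, &p->nref_dirt,
                   &p->field5, &p->age_buffer, &p->age_super,
                   &p->lav_const, &p->lav_ratio);
    return n == 9 ? 0 : -1;
}

/* Convert clock ticks to seconds, assuming HZ = 100. */
int ticks_to_seconds(int ticks)
{
    return ticks / 100;
}
```

Under those assumptions, the sixth and seventh values shown above (3000
and 500) would correspond to 30 seconds and 5 seconds.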

I'm crossposting this mail to linux-mm where some clever MM people can
be found. Hopefully we can get an explanation of why we still need
update.

Regards,
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
       Linux, WinNT and MS-DOS. The Good, The Bad and The Ugly.
--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

* Re: [PATCH] 498+ days uptime
  1998-08-26 22:49 ` [PATCH] 498+ days uptime Zlatko Calusic
@ 1998-08-27 12:07   ` Bernhard Heidegger
  1998-08-27 12:21     ` Zlatko Calusic
  1998-08-28  9:35   ` Stephen C. Tweedie
  1 sibling, 1 reply; 16+ messages in thread
From: Bernhard Heidegger @ 1998-08-27 12:07 UTC (permalink / raw)
  To: Zlatko.Calusic; +Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

>> "H. Peter Anvin" <hpa@transmeta.com> writes:
>> > 
>> > bdflush yes, but update is not obsolete.
>> > 
>> > It is still needed if you want to make sure data (and metadata)
>> > eventually gets written to disk.
>> > 
>> > Of course, you can run without update, but then don't bother if you
>> > lose file in system crash, even if you edited it and saved it few
>> > hours ago. :)
>> > 
>> > Update is very important if you have lots of RAM in your computer.
>> > 
>> 
>> Oh.  I guess my next question then is "why", as why can't this be done
>> by kflushd as well?
>> 

>> To tell you the truth, I'm not sure why, these days.

>> I thought it was done this way (update running in userspace) so to
>> have control how often buffers get flushed. But, I believe bdflush
>> program had this functionality, and it is long gone (as you correctly
>> noticed).

IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
function (fs/buffer.c) calls sync_old_buffers(), which itself calls
sync_supers() and sync_inodes() before it goes through the dirty buffer
list (to write some dirty buffers); kflushd only writes some dirty
buffers, depending on the sysctl parameters.
If I'm wrong, please feel free to correct me!

Regards
Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                             Fax: ++43/316/820918-99
-----------------------------------------------------------------------------

* Re: [PATCH] 498+ days uptime
  1998-08-27 12:07   ` Bernhard Heidegger
@ 1998-08-27 12:21     ` Zlatko Calusic
  1998-08-27 12:43       ` Bernhard Heidegger
  0 siblings, 1 reply; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-27 12:21 UTC (permalink / raw)
  To: Bernhard Heidegger; +Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

Bernhard Heidegger <bheide@hyperwave.com> writes:

> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> 
> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
> >> > 
> >> > bdflush yes, but update is not obsolete.
> >> > 
> >> > It is still needed if you want to make sure data (and metadata)
> >> > eventually gets written to disk.
> >> > 
> >> > Of course, you can run without update, but then don't bother if you
> >> > lose file in system crash, even if you edited it and saved it few
> >> > hours ago. :)
> >> > 
> >> > Update is very important if you have lots of RAM in your computer.
> >> > 
> >> 
> >> Oh.  I guess my next question then is "why", as why can't this be done
> >> by kflushd as well?
> >> 
> 
> >> To tell you the truth, I'm not sure why, these days.
> 
> >> I thought it was done this way (update running in userspace) so to
> >> have control how often buffers get flushed. But, I believe bdflush
> >> program had this functionality, and it is long gone (as you correctly
> >> noticed).
> 
> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
> and sync_inodes before it goes through the dirty buffer list (to write
> some dirty buffers); the kflushd only writes some dirty buffers dependent
> on the sysctl parameters.
> If I'm wrong, please feel free to correct me!
> 

You are not wrong.

Update flushes metadata blocks every 5 seconds, and data blocks every
30 seconds.

The question is why this functionality can't be integrated into the
kernel, so we don't have to run yet another daemon?

As the parameters are easily controllable through the sysctl interface,
I don't see a reason why update is still needed. Or is it not?
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
	There is an exception to every rule, except this one.

* Re: [PATCH] 498+ days uptime
  1998-08-27 12:21     ` Zlatko Calusic
@ 1998-08-27 12:43       ` Bernhard Heidegger
  1998-08-28  1:03         ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Bernhard Heidegger @ 1998-08-27 12:43 UTC (permalink / raw)
  To: Zlatko.Calusic
  Cc: Bernhard Heidegger, H. Peter Anvin, Linux Kernel List, Linux-MM List

>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

>> Bernhard Heidegger <bheide@hyperwave.com> writes:
>> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
>> 
>> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
>> >> > 
>> >> > bdflush yes, but update is not obsolete.
>> >> > 
>> >> > It is still needed if you want to make sure data (and metadata)
>> >> > eventually gets written to disk.
>> >> > 
>> >> > Of course, you can run without update, but then don't bother if you
>> >> > lose file in system crash, even if you edited it and saved it few
>> >> > hours ago. :)
>> >> > 
>> >> > Update is very important if you have lots of RAM in your computer.
>> >> > 
>> >> 
>> >> Oh.  I guess my next question then is "why", as why can't this be done
>> >> by kflushd as well?
>> >> 
>> 
>> >> To tell you the truth, I'm not sure why, these days.
>> 
>> >> I thought it was done this way (update running in userspace) so to
>> >> have control how often buffers get flushed. But, I believe bdflush
>> >> program had this functionality, and it is long gone (as you correctly
>> >> noticed).
>> 
>> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
>> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
>> and sync_inodes before it goes through the dirty buffer list (to write
>> some dirty buffers); the kflushd only writes some dirty buffers dependent
>> on the sysctl parameters.
>> If I'm wrong, please feel free to correct me!
>> 

>> You are not wrong.

>> Update flushes metadata blocks every 5 seconds, and data block every
>> 30 seconds.

My version of update (from around Slackware 3.4) does the following:
1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush), which will call
    sync_old_buffers() and return
2.) only if bdflush(1,0) fails (it returns < 0) does it fall back to the
    old behavior of sync()ing every 30 seconds

But case 2) should only happen on really old kernels; on newer kernels
(I'm using 2.0.34) bdflush() should never fail.
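
The two cases above can be sketched as a small state machine. This is a
simplified model, not the source of update itself: the callbacks stand in
for the real bdflush() and sync() calls, the 5-second wake-up interval is
taken from the discussion in this thread, and all names (update_state,
update_wakeup, the stub functions) are hypothetical.

```c
/* Hypothetical callback types standing in for the real system calls. */
typedef int (*bdflush_fn)(int func, long data);
typedef void (*sync_fn)(void);

enum { WAKE_INTERVAL = 5, SYNC_INTERVAL = 30 };  /* seconds */

struct update_state {
    int fallback;         /* nonzero once bdflush(1,0) has failed */
    int secs_since_sync;  /* only advances in fallback mode */
};

/* One wake-up of the daemon, assumed to happen every WAKE_INTERVAL
 * seconds. Returns 1 if sync() was invoked on this wake-up, else 0. */
int update_wakeup(struct update_state *st, bdflush_fn do_bdflush,
                  sync_fn do_sync)
{
    if (!st->fallback) {
        if (do_bdflush(1, 0) >= 0)
            return 0;      /* case 1: the kernel did the work */
        st->fallback = 1;  /* case 2: switch to plain sync() */
    }
    st->secs_since_sync += WAKE_INTERVAL;
    if (st->secs_since_sync < SYNC_INTERVAL)
        return 0;
    do_sync();
    st->secs_since_sync = 0;
    return 1;
}

/* Stubs for trying the logic without touching the real system. */
int sync_calls;
int old_kernel_bdflush(int func, long data) { (void)func; (void)data; return -1; }
int new_kernel_bdflush(int func, long data) { (void)func; (void)data; return 0; }
void fake_sync(void) { sync_calls++; }
```

With new_kernel_bdflush the daemon never calls sync(); with
old_kernel_bdflush it ends up calling sync() once per six wake-ups in
this model.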

But as I said, sync_old_buffers() does:
1.) sync_supers(0)
2.) sync_inodes(0)
3.) goes through the dirty buffer list and may flush some buffers

Conclusion: the metadata gets synced every 5 seconds and some buffers may
be flushed.

>> Questions is why can't this functionality be integrated in the kernel, 
>> so we don't have to run yet another daemon?

Good question, but I have another one: IMHO sync_old_buffers() (especially
the for loop) does similar things to kflushd. Why??
Is it possible to reduce the sync_old_buffers() routine to something like:
{
	sync_supers(0);
	sync_inodes(0);
}
??

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                             Fax: ++43/316/820918-99
-----------------------------------------------------------------------------

* Re: [PATCH] 498+ days uptime
  1998-08-27 12:43       ` Bernhard Heidegger
@ 1998-08-28  1:03         ` Eric W. Biederman
  1998-08-28  9:09           ` Bernhard Heidegger
  1998-08-28 21:32           ` Zlatko Calusic
  0 siblings, 2 replies; 16+ messages in thread
From: Eric W. Biederman @ 1998-08-28  1:03 UTC (permalink / raw)
  To: Bernhard Heidegger
  Cc: Zlatko.Calusic, H. Peter Anvin, Linux Kernel List, Linux-MM List

>>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:

>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
>>> Bernhard Heidegger <bheide@hyperwave.com> writes:
>>> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
>>> 
>>> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
>>> >> > 
>>> >> > bdflush yes, but update is not obsolete.
>>> >> > 
>>> >> > It is still needed if you want to make sure data (and metadata)
>>> >> > eventually gets written to disk.
>>> >> > 
>>> >> > Of course, you can run without update, but then don't bother if you
>>> >> > lose file in system crash, even if you edited it and saved it few
>>> >> > hours ago. :)
>>> >> > 
>>> >> > Update is very important if you have lots of RAM in your computer.
>>> >> > 
>>> >> 
>>> >> Oh.  I guess my next question then is "why", as why can't this be done
>>> >> by kflushd as well?
>>> >> 
>>> 
>>> >> To tell you the truth, I'm not sure why, these days.
>>> 
>>> >> I thought it was done this way (update running in userspace) so to
>>> >> have control how often buffers get flushed. But, I believe bdflush
>>> >> program had this functionality, and it is long gone (as you correctly
>>> >> noticed).
>>> 
>>> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
>>> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
>>> and sync_inodes before it goes through the dirty buffer list (to write
>>> some dirty buffers); the kflushd only writes some dirty buffers dependent
>>> on the sysctl parameters.
>>> If I'm wrong, please feel free to correct me!
>>> 

>>> You are not wrong.

>>> Update flushes metadata blocks every 5 seconds, and data block every
>>> 30 seconds.

BH> My version of update (something around Slackware 3.4) does the following:
BH> 1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush) which will call
BH>     sync_old_buffers() and return
BH> 2.) only if the bdflush(1,0) fails (it returns < 0) it returns to the
BH>     old behavior of sync()ing every 30 seconds

BH> But case 2) should only happen on really old kernels; on newer kernels
BH> (I'm using 2.0.34) the bdflush() should never fail.

BH> But as I told, sync_old_buffers() do:
BH> 1.) sync_supers(0)
BH> 2.) sync_inodes(0)
BH> 3.) go through dirty buffer list and may flush some buffers

BH> Conclusion: the meta data get synced every 5 seconds and some buffers may
BH> be flushed.

>>> Questions is why can't this functionality be integrated in the kernel, 
>>> so we don't have to run yet another daemon?

We can do this in a kernel thread, but I don't see the win.

BH> Good question, but I've another one: IMHO sync_old_buffers (especially
BH> the for loop) do similar things as the kflushd. Why??

kflushd removes buffers only when we are low on memory, and then it does
so unconditionally.

bdflush lets buffers sit for 30 seconds and every 5 seconds it checks
for buffers that are at least 30 seconds old and flushes them.

bdflush does most of the work.
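
The age-based pass described above can be modelled roughly as follows.
This is a user-space sketch, not the kernel code: struct buf, mark_dirty
and flush_old are hypothetical names, time is counted in whole seconds,
and the rule that an already-dirty buffer keeps its earlier deadline is a
modelling assumption.

```c
#include <stddef.h>

enum { AGE_BUFFER = 30 };  /* seconds a buffer may stay dirty */

struct buf {
    int dirty;                /* nonzero if the buffer needs writing */
    unsigned long flushtime;  /* time at which it becomes due */
};

/* Mark a buffer dirty. Assumption: re-dirtying an already-dirty
 * buffer does not push its deadline further out. */
void mark_dirty(struct buf *b, unsigned long now)
{
    if (!b->dirty) {
        b->dirty = 1;
        b->flushtime = now + AGE_BUFFER;
    }
}

/* One periodic pass (run every 5 seconds in the description above):
 * write every dirty buffer whose deadline has passed.
 * Returns the number of buffers written. */
size_t flush_old(struct buf *bufs, size_t n, unsigned long now)
{
    size_t i, written = 0;

    for (i = 0; i < n; i++) {
        if (bufs[i].dirty && bufs[i].flushtime <= now) {
            bufs[i].dirty = 0;  /* stands in for issuing the write */
            written++;
        }
    }
    return written;
}
```

In this model a buffer dirtied at time 0 is skipped by a pass at time 5
and written by a pass at time 30.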

BH> Is it possible to reduce the sync_old_buffers() routine to something like:

No.  Major performance problem.

Eric

* Re: [PATCH] 498+ days uptime
  1998-08-28  1:03         ` Eric W. Biederman
@ 1998-08-28  9:09           ` Bernhard Heidegger
  1998-08-28 13:14             ` Eric W. Biederman
  1998-08-28 21:36             ` Zlatko Calusic
  1998-08-28 21:32           ` Zlatko Calusic
  1 sibling, 2 replies; 16+ messages in thread
From: Bernhard Heidegger @ 1998-08-28  9:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bernhard Heidegger, Zlatko.Calusic, H. Peter Anvin,
	Linux Kernel List, Linux-MM List

>>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:

>>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:
>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
>>>> Bernhard Heidegger <bheide@hyperwave.com> writes:
>>>> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
>>>> 
>>>> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
>>>> >> > 
>>>> >> > bdflush yes, but update is not obsolete.
>>>> >> > 
>>>> >> > It is still needed if you want to make sure data (and metadata)
>>>> >> > eventually gets written to disk.
>>>> >> > 
>>>> >> > Of course, you can run without update, but then don't bother if you
>>>> >> > lose file in system crash, even if you edited it and saved it few
>>>> >> > hours ago. :)
>>>> >> > 
>>>> >> > Update is very important if you have lots of RAM in your computer.
>>>> >> > 
>>>> >> 
>>>> >> Oh.  I guess my next question then is "why", as why can't this be done
>>>> >> by kflushd as well?
>>>> >> 
>>>> 
>>>> >> To tell you the truth, I'm not sure why, these days.
>>>> 
>>>> >> I thought it was done this way (update running in userspace) so to
>>>> >> have control how often buffers get flushed. But, I believe bdflush
>>>> >> program had this functionality, and it is long gone (as you correctly
>>>> >> noticed).
>>>> 
>>>> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
>>>> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
>>>> and sync_inodes before it goes through the dirty buffer list (to write
>>>> some dirty buffers); the kflushd only writes some dirty buffers dependent
>>>> on the sysctl parameters.
>>>> If I'm wrong, please feel free to correct me!
>>>> 

>>>> You are not wrong.

>>>> Update flushes metadata blocks every 5 seconds, and data block every
>>>> 30 seconds.

BH> My version of update (something around Slackware 3.4) does the following:
BH> 1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush) which will call
BH> sync_old_buffers() and return
BH> 2.) only if the bdflush(1,0) fails (it returns < 0) it returns to the
BH> old behavior of sync()ing every 30 seconds

BH> But case 2) should only happen on really old kernels; on newer kernels
BH> (I'm using 2.0.34) the bdflush() should never fail.

BH> But as I told, sync_old_buffers() do:
BH> 1.) sync_supers(0)
BH> 2.) sync_inodes(0)
BH> 3.) go through dirty buffer list and may flush some buffers

BH> Conclusion: the meta data get synced every 5 seconds and some buffers may
BH> be flushed.

>>>> Questions is why can't this functionality be integrated in the kernel, 
>>>> so we don't have to run yet another daemon?

>> We can do this in kernel thread but I don't see the win.

I don't have a problem with the user-level approach (that way I can decide
not to start it ;-)

BH> Good question, but I've another one: IMHO sync_old_buffers (especially
BH> the for loop) do similar things as the kflushd. Why??

>> kflushd removes buffers only when we are low on memory, and unconditionally.

>> bdflush lets buffers sit for 30 seconds and every 5 seconds it checks
>> for buffers that are at least 30 seconds old and flushes them.

Ahh, is this bh->b_flushtime?

>> bdflush does most of the work.

Yes, I know :-(

BH> Is it possible to reduce the sync_old_buffers() routine to something like:

>> No.  Major performance problem.

Why?

Imagine an application which has most of the (index) file pages in memory
and many of the pages are dirty. bdflush will flush the pages regularly,
but the pages will get dirty again immediately.
If you can be sure that the power cannot fail, performance should be
much better without bdflush, because kflushd has to write pages only if
the system is running low on memory...

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                             Fax: ++43/316/820918-99
-----------------------------------------------------------------------------

* Re: [PATCH] 498+ days uptime
  1998-08-26 22:49 ` [PATCH] 498+ days uptime Zlatko Calusic
  1998-08-27 12:07   ` Bernhard Heidegger
@ 1998-08-28  9:35   ` Stephen C. Tweedie
  1998-08-28 22:16     ` Zlatko Calusic
  1 sibling, 1 reply; 16+ messages in thread
From: Stephen C. Tweedie @ 1998-08-28  9:35 UTC (permalink / raw)
  To: Zlatko.Calusic; +Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

Hi,

On 27 Aug 1998 00:49:55 +0200, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
said:

> I thought it was done this way (update running in userspace) so to
> have control how often buffers get flushed. But, I believe bdflush
> program had this functionality, and it is long gone (as you correctly
> noticed).

update(8) _is_ the old bdflush program. :)

There are two entirely separate jobs being done.  One is to flush all
buffers which are beyond their dirty timelimit: that job is done by the
bdflush syscall called by update/bdflush every 5 seconds.  The second
job is to trickle back some dirty buffers to disk if we are getting
short of clean buffer space in memory. 

These are completely different jobs.  They select which buffers and how
many buffers to write based on different criteria, and they are woken up
by different events.  That's why we have two daemons.  The fact that one
spends its wait time in user mode and one spends its time in kernel mode
is irrelevant; even if they were both kernel threads we'd still have two
separate jobs needing done.

> I'm crossposting this mail to linux-mm where some clever MM people can
> be found. Hopefully we can get an explanation why do we still need
> update.

Because kflushd does not do the job which update needs to do.  It does a
different job.

--Stephen 

* Re: [PATCH] 498+ days uptime
  1998-08-28  9:09           ` Bernhard Heidegger
@ 1998-08-28 13:14             ` Eric W. Biederman
  1998-08-28 16:03               ` Bernhard Heidegger
  1998-08-28 21:47               ` Zlatko Calusic
  1998-08-28 21:36             ` Zlatko Calusic
  1 sibling, 2 replies; 16+ messages in thread
From: Eric W. Biederman @ 1998-08-28 13:14 UTC (permalink / raw)
  To: Bernhard Heidegger
  Cc: Eric W. Biederman, Zlatko.Calusic, H. Peter Anvin,
	Linux Kernel List, Linux-MM List

>>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:

>>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:

>>> bdflush lets buffers sit for 30 seconds and every 5 seconds it checks
>>> for buffers that are at least 30 seconds old and flushes them.

BH> Ahh, is this bh->b_flushtime?
yes.

>>> bdflush does most of the work.

BH> Yes, I know :-(

BH> Is it possible to reduce the sync_old_buffers() routine to something like:

>>> No.  Major performance problem.

BH> Why?

BH> Imagine an application which has most of the (index) file pages in memory
BH> and many of the pages are dirty. bdflush will flush the pages regularly,
BH> but the pages will get dirty immediately again.
BH> If you can be sure, that the power cannot fail the performance should be
BH> much better without bdflush, because kflushd has to write pages only if
BH> the system is running low on memory...

The performance improvement comes when looking for free memory.  In
most cases bdflush's slow but steady writing of pages keeps buffers
clean.  When the application wants more memory with bdflush running in
the background, usually the pages it needs will be clean (because the I/O
started before the application needed them), so they can just be dropped
out of memory.  Relying on kflushd means nothing is written until an
application needs the memory, and then it must wait until something is
written to disk, which is much slower.

Further:
a) guaranteeing no power failure is hard.
b) generally there is so much data on the disk that you must write it
   sometime, because you can't hold it all in memory.
c) I have trouble imagining a case where a small file would be rewritten
   continually.

Eric 

* Re: [PATCH] 498+ days uptime
  1998-08-28 13:14             ` Eric W. Biederman
@ 1998-08-28 16:03               ` Bernhard Heidegger
  1998-08-28 22:03                 ` Zlatko Calusic
  1998-08-28 21:47               ` Zlatko Calusic
  1 sibling, 1 reply; 16+ messages in thread
From: Bernhard Heidegger @ 1998-08-28 16:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bernhard Heidegger, Zlatko.Calusic, H. Peter Anvin,
	Linux Kernel List, Linux-MM List

>>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:

>>>> No.  Major performance problem.

BH> Why?

BH> Imagine an application which has most of the (index) file pages in memory
BH> and many of the pages are dirty. bdflush will flush the pages regularly,
BH> but the pages will get dirty immediately again.
BH> If you can be sure, that the power cannot fail the performance should be
BH> much better without bdflush, because kflushd has to write pages only if
BH> the system is running low on memory...

>> The performance improvement comes when looking for free memory.  In
>> most cases bdflush's slow but steady writing of pages keeps buffers
>> clean.  When the application wants more memory with bdflush in the
>> background usually the pages it needs will be clean (because the I/O
>> started before the application needed it), so they can just be dropped
>> out of memory.  Relying on kflushd means nothing is written until an
>> application needs the memory and then it must wait until something is
>> written to disk, which is much slower.

>> Further 
>> a) guaranteeing no power failure is hard.

Use a UPS and regularly flush/sync the primary data to disk from
the application.

>> b) generally there is so much data on the disk you must write it
>>    sometime, because you can't hold it all in memory.

That is only a question of how much RAM you can put in your PC.

>> c) I have trouble imagining a case where a small file would be rewritten
>>    continually.

Not really small, but a database application may use btree-based indexes,
where many blocks will get dirty when inserting/deleting data. If you flush
the dirty buffers and the next insertion dirties the same buffer(s), you
have lost performance. (Note: the btree-based indexes are secondary data;
you can rebuild them from scratch if the system fails.)

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                             Fax: ++43/316/820918-99
-----------------------------------------------------------------------------

* Re: [PATCH] 498+ days uptime
  1998-08-28  1:03         ` Eric W. Biederman
  1998-08-28  9:09           ` Bernhard Heidegger
@ 1998-08-28 21:32           ` Zlatko Calusic
  1 sibling, 0 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 21:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bernhard Heidegger, H. Peter Anvin, Linux Kernel List, Linux-MM List

ebiederm@inetnebr.com (Eric W. Biederman) writes:

> >>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:
> 
> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> >>> Bernhard Heidegger <bheide@hyperwave.com> writes:
> >>> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> >>> 
> >>> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
> >>> >> > 
> >>> >> > bdflush yes, but update is not obsolete.
> >>> >> > 
> >>> >> > It is still needed if you want to make sure data (and metadata)
> >>> >> > eventually gets written to disk.
> >>> >> > 
> >>> >> > Of course, you can run without update, but then don't bother if you
> >>> >> > lose file in system crash, even if you edited it and saved it few
> >>> >> > hours ago. :)
> >>> >> > 
> >>> >> > Update is very important if you have lots of RAM in your computer.
> >>> >> > 
> >>> >> 
> >>> >> Oh.  I guess my next question then is "why", as why can't this be done
> >>> >> by kflushd as well?
> >>> >> 
> >>> 
> >>> >> To tell you the truth, I'm not sure why, these days.
> >>> 
> >>> >> I thought it was done this way (update running in userspace) so to
> >>> >> have control how often buffers get flushed. But, I believe bdflush
> >>> >> program had this functionality, and it is long gone (as you correctly
> >>> >> noticed).
> >>> 
> >>> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
> >>> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
> >>> and sync_inodes before it goes through the dirty buffer list (to write
> >>> some dirty buffers); the kflushd only writes some dirty buffers dependent
> >>> on the sysctl parameters.
> >>> If I'm wrong, please feel free to correct me!
> >>> 
> 
> >>> You are not wrong.
> 
> >>> Update flushes metadata blocks every 5 seconds, and data block every
> >>> 30 seconds.
> 
> BH> My version of update (something around Slackware 3.4) does the following:
> BH> 1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush) which will call
> BH>     sync_old_buffers() and return
> BH> 2.) only if the bdflush(1,0) fails (it returns < 0) it returns to the
> BH>     old behavior of sync()ing every 30 seconds
> 
> BH> But case 2) should only happen on really old kernels; on newer kernels
> BH> (I'm using 2.0.34) the bdflush() should never fail.
> 
> BH> But as I told, sync_old_buffers() do:
> BH> 1.) sync_supers(0)
> BH> 2.) sync_inodes(0)
> BH> 3.) go through dirty buffer list and may flush some buffers
> 
> BH> Conclusion: the meta data get synced every 5 seconds and some buffers may
> BH> be flushed.
> 
> >>> Questions is why can't this functionality be integrated in the kernel, 
> >>> so we don't have to run yet another daemon?
> 
> We can do this in kernel thread but I don't see the win.
> 

One daemon less to run.

This should be enough.

You have one less process running, you free some memory, and make
things slightly cleaner.

Not a big win, but small things make people happy. :)
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
       Linux, WinNT and MS-DOS. The Good, The Bad and The Ugly.

* Re: [PATCH] 498+ days uptime
  1998-08-28  9:09           ` Bernhard Heidegger
  1998-08-28 13:14             ` Eric W. Biederman
@ 1998-08-28 21:36             ` Zlatko Calusic
  1 sibling, 0 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 21:36 UTC (permalink / raw)
  To: Bernhard Heidegger
  Cc: Eric W. Biederman, H. Peter Anvin, Linux Kernel List, Linux-MM List

Bernhard Heidegger <bheide@hyperwave.com> writes:

> >>>> Questions is why can't this functionality be integrated in the kernel, 
> >>>> so we don't have to run yet another daemon?
> 
> >> We can do this in kernel thread but I don't see the win.
> 
> I don't have a problem with the user level thing (so I can decide to not
> start it ;-)
> 

You can always tune things to your preference, even with update
functionality in the kernel. If you set the flushing period to, say, 12
hours, it's effectively as if you had killed update. :)
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
			Do vampires get AIDS?

* Re: [PATCH] 498+ days uptime
  1998-08-28 13:14             ` Eric W. Biederman
  1998-08-28 16:03               ` Bernhard Heidegger
@ 1998-08-28 21:47               ` Zlatko Calusic
  1 sibling, 0 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 21:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bernhard Heidegger, H. Peter Anvin, Linux Kernel List, Linux-MM List

ebiederm@inetnebr.com (Eric W. Biederman) writes:

> >>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:
> BH> Imagine an application which has most of the (index) file pages in memory
> BH> and many of the pages are dirty. bdflush will flush the pages regularly,
> BH> but the pages will get dirty immediately again.
> BH> If you can be sure, that the power cannot fail the performance should be
> BH> much better without bdflush, because kflushd has to write pages only if
> BH> the system is running low on memory...
> 
> The performance improvement comes when looking for free memory.  In
> most cases bdflush's slow but steady writing of pages keeps buffers
> clean.  When the application wants more memory with bdflush in the
> background unsually the pages it needs will be clean (because the I/O
> started before the application needed it), so they can just be dropped
> out of memory.  Relying on kflushd means nothing is written until an
> application needs the memory and then it must wait until something is
> written to disk, which is much slower.

Not entirely true. kflushd flushes dirty buffers not when they're all
dirty, but when the percentage of dirty buffers goes above a
threshold. That threshold is tunable; the default value in recent
kernels is 40%.

So even if kflushd didn't run in time, that only means *up* to 40% of
the buffers are dirty. The other 60% or more are clean.

This is the first parameter in /proc/sys/vm/bdflush. It was 60
initially, but was lowered recently (a few months ago, half a year?)
due to problems with buffers at that time.
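
To make the threshold rule concrete, here is a minimal sketch in Python
(not kernel code). It assumes only what is said above: the first field
of /proc/sys/vm/bdflush is the dirty-buffer percentage above which
kflushd is woken. The function name and the buffer counts are
hypothetical, chosen for illustration; the remaining fields are parsed
but deliberately left uninterpreted.

```python
def kflushd_should_wake(bdflush_line, dirty_buffers, total_buffers):
    """Return True if the dirty percentage exceeds the first bdflush field.

    bdflush_line is the text read from /proc/sys/vm/bdflush, e.g.
    "40 500 64 256 15 3000 500 1884 2". Only the first field is used;
    its meaning (dirty-buffer percentage threshold) is taken from the
    discussion above. The other fields are ignored here.
    """
    fields = [int(tok) for tok in bdflush_line.split()]
    dirty_percent_threshold = fields[0]  # hypothetical name for field 1
    if total_buffers == 0:
        return False
    # Integer comparison avoids floating-point rounding:
    # dirty/total > threshold/100  <=>  dirty*100 > threshold*total
    return dirty_buffers * 100 > dirty_percent_threshold * total_buffers


line = "40 500 64 256 15 3000 500 1884 2"
print(kflushd_should_wake(line, 400, 1000))  # exactly 40% dirty
print(kflushd_should_wake(line, 401, 1000))  # just above 40% dirty
```

With the default of 40, at least 60% of the buffers are clean at the
moment the threshold is first crossed, which is the point being made.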

> 
> Further 
> a) garanteeing no power failure is hard.

Here I entirely agree. UPSes cost much more than update/bdflush. :)

> b) generally there is so much data on the disk you must write it
>    sometime, because you can't hold it all in memory.

Right.

> c) I have trouble imagining a case where a small file would be rewritten
>    continually.
> 

It happens. Otherwise we wouldn't need buffers at all, except maybe
to achieve asynchrony. :)

Think of metadata: creating and deleting lots of files in one
directory, and similar operations. Imagine a busy news/proxy server.
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
		   Recursive, adj.; see Recursive.

* Re: [PATCH] 498+ days uptime
  1998-08-28 16:03               ` Bernhard Heidegger
@ 1998-08-28 22:03                 ` Zlatko Calusic
  1998-08-31  8:32                   ` Bernhard Heidegger
  0 siblings, 1 reply; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 22:03 UTC (permalink / raw)
  To: Bernhard Heidegger
  Cc: Eric W. Biederman, H. Peter Anvin, Linux Kernel List, Linux-MM List

Bernhard Heidegger <bheide@hyperwave.com> writes:

> >>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:
> 
> >>>> No.  Major performance problem.
> 
> BH> Why?
> 
> BH> Imagine an application which has most of the (index) file pages in memory
> BH> and many of the pages are dirty. bdflush will flush the pages regularly,
> BH> but the pages will get dirty immediately again.
> BH> If you can be sure, that the power cannot fail the performance should be
> BH> much better without bdflush, because kflushd has to write pages only if
> BH> the system is running low on memory...
> 
> >> The performance improvement comes when looking for free memory.  In
> >> most cases bdflush's slow but steady writing of pages keeps buffers
> >> clean.  When the application wants more memory with bdflush in the
> >> background unsually the pages it needs will be clean (because the I/O
> >> started before the application needed it), so they can just be dropped
> >> out of memory.  Relying on kflushd means nothing is written until an
> >> application needs the memory and then it must wait until something is
> >> written to disk, which is much slower.
> 
> >> Further 
> >> a) garanteeing no power failure is hard.
> 
> Use and UPS and regularly flush/sync the primary data to disk from
> the application

Update/bdflush costs you nothing. A UPS costs you lots of money. Big
difference.

Also, flushing/syncing data to disk doesn't always mean the data
really reached the media. Check your favorite sync(2) manpage. :)

Using a completely synchronous API in applications would considerably
cut performance. Why should your application wait for the disk to
commit buffers when your CPU can do other useful things in the
meantime? Also, don't forget that disk latencies are measured in
milliseconds, whereas modern CPUs work in units of (almost)
nanoseconds.
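
The difference between the two styles can be sketched in Python as
follows. This is an illustration only, not code from any application
discussed here: write_records and its parameters are hypothetical
names. With sync_each=True the program blocks in fsync after every
record; otherwise it lets the kernel buffer the writes and asks for a
single fsync at the end.

```python
import os
import tempfile


def write_records(path, records, sync_each):
    """Write byte records to path and return the number of fsync calls.

    sync_each=True  : synchronous style, wait for the disk per record.
    sync_each=False : buffered style, one fsync after the last record.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    syncs = 0
    try:
        for rec in records:
            os.write(fd, rec)
            if sync_each:
                os.fsync(fd)  # blocks until the kernel reports completion
                syncs += 1
        if not sync_each:
            os.fsync(fd)      # single wait for the whole batch
            syncs += 1
    finally:
        os.close(fd)
    return syncs


workdir = tempfile.mkdtemp()
records = [b"a\n", b"b\n", b"c\n"]
print(write_records(os.path.join(workdir, "sync.dat"), records, True))
print(write_records(os.path.join(workdir, "buffered.dat"), records, False))
```

Both calls leave the same bytes in the file; they differ only in how
many times the application stops to wait for the disk.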

> 
> >> b) generally there is so much data on the disk you must write it
> >>    sometime, because you can't hold it all in memory.
> 
> only a question of how much RAM you can put in your PC

Still requires money. :)

> 
> >> c) I have trouble imagining a case where a small file would be rewritten
> >>    continually.
> 
> Not really small, but a database application may use btree based indexes,
> where many blocks will get dirty when inserting/deleting data. If you flush
> the dirty buffers and the next insertion dirty the same buffer(s) you have
> lost performance (Note: the btree based indexes are secondary data; you
> can rebuild it from scratch if the system fails)
> 

Right, we agree. But performance doesn't go down if you write buffers
every few tens of seconds. That is a LOT of time, if you ask your
application. Some of them never get so old. :)

And (big) databases mostly like to have their own memory management,
because "they know better".
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
	Vi is the God of editors. Emacs is the editor of Gods.

* Re: [PATCH] 498+ days uptime
  1998-08-28  9:35   ` Stephen C. Tweedie
@ 1998-08-28 22:16     ` Zlatko Calusic
  1998-08-30 15:10       ` Stephen C. Tweedie
  0 siblings, 1 reply; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 22:16 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

"Stephen C. Tweedie" <sct@redhat.com> writes:

> Hi,
> 
> On 27 Aug 1998 00:49:55 +0200, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
> said:
> 
> > I thought it was done this way (update running in userspace) so to
> > have control how often buffers get flushed. But, I believe bdflush
> > program had this functionality, and it is long gone (as you correctly
> > noticed).
> 
> update(8) _is_ the old bdflush program. :)

I know. But in those old days, I believe, we had two daemons, update
AND bdflush. They were started from the same binary, but their
functionality was different.

Too bad 1.2.13 can't be compiled on today's setups. :)

> 
> There are two entirely separate jobs being done.  One is to flush all
> buffers which are beyond their dirty timelimit: that job is done by the
> bdflush syscall called by update/bdflush every 5 seconds.  The second
> job is to trickle back some dirty buffers to disk if we are getting
> short of clean buffer space in memory. 
> 
> These are completely different jobs.  They select which buffers and how
> many buffers to write based on different criteria, and they are woken up
> by different events.  That's why we have two daemons.  The fact that one
> spends its wait time in user mode and one spends its time in kernel mode
> is irrelevant; even if they were both kernel threads we'd still have two
> separate jobs needing done.

Right, I agree entirely.

Maybe I should reformulate my question. :)

Why is the former in the userspace?

I believe it is not that hard to code bdflush in the kernel, where we
lose nothing but save a few pages of memory. One less process to run,
as I already pointed out.

You have probably had an opportunity to visit Paul Gortmaker's page,
which is helpful for those with low-memory machines. There you can
find a "few lines of assembly" program that replaces update. I ran
that program for a few years to save a few kilobytes of memory on my
old 386 with 5MB of RAM.

> 
> > I'm crossposting this mail to linux-mm where some clever MM people can
> > be found. Hopefully we can get an explanation why do we still need
> > update.
> 
> Because kflushd does not do the job which update needs to do.  It does a
> different job.
> 

Yep, but allow me one more question, please.

If I happen to get some free time (very unlikely) to code bdflush
completely in the kernel, so that we can get rid of update, which now
runs as a daemon, would you consider it for inclusion in the official
kernel (sending patches to Linus, etc.)?
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
		  It's bad luck to be superstitious.

* Re: [PATCH] 498+ days uptime
  1998-08-28 22:16     ` Zlatko Calusic
@ 1998-08-30 15:10       ` Stephen C. Tweedie
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen C. Tweedie @ 1998-08-30 15:10 UTC (permalink / raw)
  To: Zlatko.Calusic
  Cc: Stephen C. Tweedie, H. Peter Anvin, Linux Kernel List, Linux-MM List

Hi,

On 29 Aug 1998 00:16:34 +0200, Zlatko Calusic
<Zlatko.Calusic@CARNet.hr> said:

[re update/bdflush:]

> Why is the former in the userspace?

Simply because the latter is the only one to have been moved to the
kernel.  That happened because the trigger for bdflush is an internal
kernel wait queue, whereas the trigger for update is a timer.  Timers
can be easily done in user space.
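
As an illustration of how little a timer-driven flusher needs, here is
a minimal user-space sketch in Python. It is not the real update
daemon: update calls the bdflush syscall, which writes only buffers
past their dirty time limit, whereas this sketch substitutes os.sync()
(Unix only), which asks the kernel to write everything. The function
name run_update and its parameters are hypothetical; the 5-second
default is the interval mentioned earlier in this thread.

```python
import os
import time


def run_update(interval_seconds=5.0, iterations=None, flush=None, sleep=None):
    """Sleep for interval_seconds, then flush; repeat.

    iterations=None loops forever, as a daemon would. The flush and
    sleep callables can be replaced, which makes the loop testable
    without touching the disk or actually waiting.
    """
    if flush is None:
        flush = os.sync     # stand-in for the bdflush syscall (assumption)
    if sleep is None:
        sleep = time.sleep
    count = 0
    while iterations is None or count < iterations:
        sleep(interval_seconds)  # the timer is the only trigger
        flush()
        count += 1
    return count


# Dry run: record the calls instead of sleeping and syncing.
calls = []
run_update(5.0, iterations=2,
           flush=lambda: calls.append("flush"),
           sleep=lambda s: calls.append(s))
print(calls)
```

The loop needs no kernel-internal event, only a clock, which is why
this job could stay in user space while the wait-queue-driven one
could not.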

> I believe it is not that hard to code bdflush in the kernel, where we
> lose nothing, but save few pages of memory. One less process to run,
> as I already pointed out.

Dead easy.  It will save memory; it will also, more importantly, save
non-pageable memory (although the kernel thread will still need its
own kernel stack, it will not need the extra page tables which
accompany a user-space process).

--Stephen

* Re: [PATCH] 498+ days uptime
  1998-08-28 22:03                 ` Zlatko Calusic
@ 1998-08-31  8:32                   ` Bernhard Heidegger
  0 siblings, 0 replies; 16+ messages in thread
From: Bernhard Heidegger @ 1998-08-31  8:32 UTC (permalink / raw)
  To: Zlatko.Calusic; +Cc: Bernhard Heidegger, Linux Kernel List, Linux-MM List

>>>>> "Z>" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

Z>> Bernhard Heidegger <bheide@hyperwave.com> writes:
>> >>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:
>> 
>> >>>> No.  Major performance problem.
>> 
BH> Why?
>> 
BH> Imagine an application which has most of the (index) file pages in memory
BH> and many of the pages are dirty. bdflush will flush the pages regularly,
BH> but the pages will get dirty immediately again.
BH> If you can be sure, that the power cannot fail the performance should be
BH> much better without bdflush, because kflushd has to write pages only if
BH> the system is running low on memory...
>> 
>> >> The performance improvement comes when looking for free memory.  In
>> >> most cases bdflush's slow but steady writing of pages keeps buffers
>> >> clean.  When the application wants more memory with bdflush in the
>> >> background unsually the pages it needs will be clean (because the I/O
>> >> started before the application needed it), so they can just be dropped
>> >> out of memory.  Relying on kflushd means nothing is written until an
>> >> application needs the memory and then it must wait until something is
>> >> written to disk, which is much slower.
>> 
>> >> Further 
>> >> a) garanteeing no power failure is hard.
>> 
>> Use and UPS and regularly flush/sync the primary data to disk from
>> the application

Z>> Update/bdflush costs you nothing. UPS costs you lots of money. Big
Z>> difference.

Yes, but on a really big server where performance does matter
(greetings from Godzilla ;-) you will have a UPS anyway...

Z>> Also, flushing/syncing data to disk doesn't always mean data really
Z>> got to media. Check your favorite sync(2) manpage. :)

Correct, but you don't win anything with update/bdflush in this case.

Z>> Using completely synchronous API in applications would consideraly cut
Z>> performances down. Why would your application wait for disk to commit
Z>> buffers, when your CPU can do other useful things in the meantime.
Z>> Also, don't forget that disk latency times are measured in
Z>> milliseconds, where modern CPU's run in units of (almost) nanoseconds.

>> 
>> >> b) generally there is so much data on the disk you must write it
>> >>    sometime, because you can't hold it all in memory.
>> 
>> only a question of how much RAM you can put in your PC

Z>> Still requires money. :)

RAM isn't that expensive anymore

>> 
>> >> c) I have trouble imagining a case where a small file would be rewritten
>> >>    continually.
>> 
>> Not really small, but a database application may use btree based indexes,
>> where many blocks will get dirty when inserting/deleting data. If you flush
>> the dirty buffers and the next insertion dirty the same buffer(s) you have
>> lost performance (Note: the btree based indexes are secondary data; you
>> can rebuild it from scratch if the system fails)
>> 

Z>> Right, we agree. But performance doesn't go down if you write buffers
Z>> every few tens of seconds. That is a LOT of time, if you ask your
Z>> application. Some of them never get so old. :)

Hey, I'm speaking of (database) applications which (should ;-) run
until the earth goes down :-)
I did some measurements with our application and there were peaks
which were 10 times higher than the average time. I will run some
further tests to see whether the overall performance drops. Anyway,
this application is also used interactively, and if you try to get
some data during such a peak you'll have to wait ;-)

Z>> And (big) databases mostly like to have their own memory management,
Z>> because "they know better".

That's another point where I agree with you, but this is a really big task...

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                             Fax: ++43/316/820918-99
-----------------------------------------------------------------------------

end of thread, other threads:[~1998-08-31  8:32 UTC | newest]

Thread overview: 16+ messages
     [not found] <199808262153.OAA13651@cesium.transmeta.com>
1998-08-26 22:49 ` [PATCH] 498+ days uptime Zlatko Calusic
1998-08-27 12:07   ` Bernhard Heidegger
1998-08-27 12:21     ` Zlatko Calusic
1998-08-27 12:43       ` Bernhard Heidegger
1998-08-28  1:03         ` Eric W. Biederman
1998-08-28  9:09           ` Bernhard Heidegger
1998-08-28 13:14             ` Eric W. Biederman
1998-08-28 16:03               ` Bernhard Heidegger
1998-08-28 22:03                 ` Zlatko Calusic
1998-08-31  8:32                   ` Bernhard Heidegger
1998-08-28 21:47               ` Zlatko Calusic
1998-08-28 21:36             ` Zlatko Calusic
1998-08-28 21:32           ` Zlatko Calusic
1998-08-28  9:35   ` Stephen C. Tweedie
1998-08-28 22:16     ` Zlatko Calusic
1998-08-30 15:10       ` Stephen C. Tweedie
