* Re: [PATCH] 498+ days uptime
       [not found] <199808262153.OAA13651@cesium.transmeta.com>
From: Zlatko Calusic @ 1998-08-26 22:49 UTC
To: H. Peter Anvin; +Cc: Linux Kernel List, Linux-MM List

"H. Peter Anvin" <hpa@transmeta.com> writes:

> >
> > bdflush yes, but update is not obsolete.
> >
> > It is still needed if you want to make sure data (and metadata)
> > eventually gets written to disk.
> >
> > Of course, you can run without update, but then don't bother if you
> > lose file in system crash, even if you edited it and saved it few
> > hours ago. :)
> >
> > Update is very important if you have lots of RAM in your computer.
> >
>
> Oh.  I guess my next question then is "why", as why can't this be done
> by kflushd as well?
>

To tell you the truth, I'm not sure why, these days.

I thought it was done this way (update running in userspace) so as to
have control over how often buffers get flushed. But I believe the
bdflush program had this functionality, and it is long gone (as you
correctly noticed).

These days, however, we have the sysctl interface, which is usable for
just about anything, and especially for things like this.

Peeking at /proc/sys/vm/bdflush, I can see all the needed variables are
already there, so nothing stops the kernel from (ab)using them.

{atlas} [/proc/sys/vm]# cat bdflush
40      500     64      256     15      3000    500     1884    2

I'm crossposting this mail to linux-mm, where some clever MM people can
be found. Hopefully we can get an explanation of why we still need
update.

Regards,
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
  Linux, WinNT and MS-DOS. The Good, The Bad and The Ugly.
--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
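The nine numbers printed by `cat bdflush` above can be read into a small structure. The following is only a sketch: the field names are an assumption recalled from fs/buffer.c of that era, not something the message states, and the conversion to seconds assumes a timer frequency of 100 ticks per second (`ASSUMED_HZ`). Under those assumptions, 3000 and 500 would correspond to 30 and 5 seconds.

```c
#include <assert.h>
#include <stdio.h>

#define ASSUMED_HZ 100  /* assumption: 100 timer ticks per second */

/* Field names are assumptions, not taken from the message above. */
struct bdflush_params {
    int nfract;      /* percentage of dirty buffers that wakes the flusher */
    int ndirty;      /* max dirty buffers written per wake-up */
    int nrefill;
    int nref_dirt;
    int slot5;       /* fifth slot; its meaning varied between versions */
    int age_buffer;  /* ticks before a dirty data buffer is written */
    int age_super;   /* ticks before dirty metadata is written */
    int lav_const;
    int lav_ratio;
};

/* Parse one line of nine integers; returns 0 on success, -1 otherwise. */
int parse_bdflush(const char *line, struct bdflush_params *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%d %d %d %d %d %d %d %d %d",
                   &out->nfract, &out->ndirty, &out->nrefill,
                   &out->nref_dirt, &out->slot5, &out->age_buffer,
                   &out->age_super, &out->lav_const, &out->lav_ratio);
    return n == 9 ? 0 : -1;
}

/* Convert a tick count to whole seconds under the ASSUMED_HZ assumption. */
int ticks_to_seconds(int ticks)
{
    return ticks / ASSUMED_HZ;
}
```

Calling `parse_bdflush()` on the line shown above and passing `age_buffer` and `age_super` to `ticks_to_seconds()` gives the two flush intervals in seconds, provided the field order assumed here is right for the kernel in question.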
* Re: [PATCH] 498+ days uptime
From: Bernhard Heidegger @ 1998-08-27 12:07 UTC
To: Zlatko.Calusic; +Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

>> "H. Peter Anvin" <hpa@transmeta.com> writes:
>> >
>> > bdflush yes, but update is not obsolete.
>> >
>> > It is still needed if you want to make sure data (and metadata)
>> > eventually gets written to disk.
>> >
>> > Of course, you can run without update, but then don't bother if you
>> > lose file in system crash, even if you edited it and saved it few
>> > hours ago. :)
>> >
>> > Update is very important if you have lots of RAM in your computer.
>> >
>>
>> Oh.  I guess my next question then is "why", as why can't this be done
>> by kflushd as well?
>>

>> To tell you the truth, I'm not sure why, these days.

>> I thought it was done this way (update running in userspace) so to
>> have control how often buffers get flushed. But, I believe bdflush
>> program had this functionality, and it is long gone (as you correctly
>> noticed).

IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
function (fs/buffer.c) calls sync_old_buffers(), which itself calls
sync_supers and sync_inodes before it goes through the dirty buffer list
(to write some dirty buffers); kflushd only writes some dirty buffers,
depending on the sysctl parameters.
If I'm wrong, please feel free to correct me!

Regards
Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                           Fax: ++43/316/820918-99
-----------------------------------------------------------------------------
* Re: [PATCH] 498+ days uptime
From: Zlatko Calusic @ 1998-08-27 12:21 UTC
To: Bernhard Heidegger; +Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

Bernhard Heidegger <bheide@hyperwave.com> writes:

> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
>
> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
> >> >
> >> > bdflush yes, but update is not obsolete.
> >> >
> >> > It is still needed if you want to make sure data (and metadata)
> >> > eventually gets written to disk.
> >> >
> >> > Of course, you can run without update, but then don't bother if you
> >> > lose file in system crash, even if you edited it and saved it few
> >> > hours ago. :)
> >> >
> >> > Update is very important if you have lots of RAM in your computer.
> >> >
> >>
> >> Oh.  I guess my next question then is "why", as why can't this be done
> >> by kflushd as well?
> >>
>
> >> To tell you the truth, I'm not sure why, these days.
>
> >> I thought it was done this way (update running in userspace) so to
> >> have control how often buffers get flushed. But, I believe bdflush
> >> program had this functionality, and it is long gone (as you correctly
> >> noticed).
>
> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
> and sync_inodes before it goes through the dirty buffer list (to write
> some dirty buffers); the kflushd only writes some dirty buffers dependent
> on the sysctl parameters.
> If I'm wrong, please feel free to correct me!
>

You are not wrong.

Update flushes metadata blocks every 5 seconds, and data blocks every
30 seconds.

The question is why this functionality can't be integrated into the
kernel, so that we don't have to run yet another daemon.

As the parameters are easily controllable through the sysctl interface,
I don't see a reason why update is still needed.

Or is it not?
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
  There is an exception to every rule, except this one.
* Re: [PATCH] 498+ days uptime
From: Bernhard Heidegger @ 1998-08-27 12:43 UTC
To: Zlatko.Calusic
Cc: Bernhard Heidegger, H. Peter Anvin, Linux Kernel List, Linux-MM List

>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

>> Bernhard Heidegger <bheide@hyperwave.com> writes:
>> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
>> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
>> and sync_inodes before it goes through the dirty buffer list (to write
>> some dirty buffers); the kflushd only writes some dirty buffers dependent
>> on the sysctl parameters.
>> If I'm wrong, please feel free to correct me!
>>

>> You are not wrong.

>> Update flushes metadata blocks every 5 seconds, and data block every
>> 30 seconds.

My version of update (something around Slackware 3.4) does the following:
1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush), which will call
    sync_old_buffers() and return
2.) only if bdflush(1,0) fails (it returns < 0) does it fall back to the
    old behavior of sync()ing every 30 seconds

But case 2) should only happen on really old kernels; on newer kernels
(I'm using 2.0.34) the bdflush() should never fail.

But as I said, sync_old_buffers() does:
1.) sync_supers(0)
2.) sync_inodes(0)
3.) go through the dirty buffer list and possibly flush some buffers

Conclusion: the metadata gets synced every 5 seconds and some buffers may
be flushed.

>> Questions is why can't this functionality be integrated in the kernel,
>> so we don't have to run yet another daemon?

Good question, but I have another one: IMHO sync_old_buffers() (especially
the for loop) does much the same thing as kflushd. Why??
Is it possible to reduce the sync_old_buffers() routine to something like:
{
   sync_supers(0);
   sync_inodes(0);
}
??

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                           Fax: ++43/316/820918-99
-----------------------------------------------------------------------------
* Re: [PATCH] 498+ days uptime
From: Eric W. Biederman @ 1998-08-28 1:03 UTC
To: Bernhard Heidegger
Cc: Zlatko.Calusic, H. Peter Anvin, Linux Kernel List, Linux-MM List

>>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:

>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

>>> You are not wrong.

>>> Update flushes metadata blocks every 5 seconds, and data block every
>>> 30 seconds.

BH> My version of update (something around Slackware 3.4) does the following:
BH> 1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush) which will call
BH>     sync_old_buffers() and return
BH> 2.) only if the bdflush(1,0) fails (it returns < 0) it returns to the
BH>     old behavior of sync()ing every 30 seconds

BH> But case 2) should only happen on really old kernels; on newer kernels
BH> (I'm using 2.0.34) the bdflush() should never fail.

BH> But as I told, sync_old_buffers() do:
BH> 1.) sync_supers(0)
BH> 2.) sync_inodes(0)
BH> 3.) go through dirty buffer list and may flush some buffers

BH> Conclusion: the meta data get synced every 5 seconds and some buffers may
BH> be flushed.

>>> Questions is why can't this functionality be integrated in the kernel,
>>> so we don't have to run yet another daemon?

We can do this in a kernel thread, but I don't see the win.

BH> Good question, but I've another one: IMHO sync_old_buffers (especially
BH> the for loop) do similar things as the kflushd. Why??

kflushd removes buffers only when we are low on memory, and unconditionally.
bdflush lets buffers sit for 30 seconds, and every 5 seconds it checks
for buffers that are at least 30 seconds old and flushes them.

bdflush does most of the work.

BH> Is it possible to reduce the sync_old_buffers() routine to something like:

No.  Major performance problem.

Eric
* Re: [PATCH] 498+ days uptime
From: Bernhard Heidegger @ 1998-08-28 9:09 UTC
To: Eric W. Biederman
Cc: Bernhard Heidegger, Zlatko.Calusic, H. Peter Anvin,
	Linux Kernel List, Linux-MM List

>>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:

>>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:
>>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

>>>> Questions is why can't this functionality be integrated in the kernel,
>>>> so we don't have to run yet another daemon?

>> We can do this in kernel thread but I don't see the win.

I don't have a problem with the user-level thing (so I can decide not to
start it ;-)

BH> Good question, but I've another one: IMHO sync_old_buffers (especially
BH> the for loop) do similar things as the kflushd. Why??

>> kflushd removes buffers only when we are low on memory, and unconditionally.
>> bdflush lets buffers sit for 30 seconds and every 5 seconds it checks
>> for buffers that are at least 30 seconds old and flushes them.

Ahh, is this bh->b_flushtime?

>> bdflush does most of the work.

Yes, I know :-(

BH> Is it possible to reduce the sync_old_buffers() routine to something like:

>> No.  Major performance problem.

Why?

Imagine an application which has most of its (index) file pages in memory,
with many of those pages dirty. bdflush will flush the pages regularly,
but the pages will get dirty again immediately.
If you can be sure that the power cannot fail, performance should be
much better without bdflush, because kflushd has to write pages only if
the system is running low on memory...

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                           Fax: ++43/316/820918-99
-----------------------------------------------------------------------------
* Re: [PATCH] 498+ days uptime
From: Eric W. Biederman @ 1998-08-28 13:14 UTC
To: Bernhard Heidegger
Cc: Eric W. Biederman, Zlatko.Calusic, H. Peter Anvin,
	Linux Kernel List, Linux-MM List

>>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:

>>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:

>>> bdflush lets buffers sit for 30 seconds and every 5 seconds it checks
>>> for buffers that are at least 30 seconds old and flushes them.

BH> Ahh, is this bh->b_flushtime?

Yes.

>>> bdflush does most of the work.

BH> Yes, I know :-(

BH> Is it possible to reduce the sync_old_buffers() routine to something like:

>>> No.  Major performance problem.

BH> Why?

BH> Imagine an application which has most of the (index) file pages in memory
BH> and many of the pages are dirty. bdflush will flush the pages regularly,
BH> but the pages will get dirty immediately again.
BH> If you can be sure, that the power cannot fail the performance should be
BH> much better without bdflush, because kflushd has to write pages only if
BH> the system is running low on memory...

The performance improvement comes when looking for free memory.  In
most cases bdflush's slow but steady writing of pages keeps buffers
clean.  When the application wants more memory with bdflush in the
background, usually the pages it needs will be clean (because the I/O
started before the application needed it), so they can just be dropped
out of memory.  Relying on kflushd means nothing is written until an
application needs the memory, and then it must wait until something is
written to disk, which is much slower.

Further
a) guaranteeing no power failure is hard.
b) generally there is so much data on the disk you must write it
   sometime, because you can't hold it all in memory.
c) I have trouble imagining a case where a small file would be rewritten
   continually.

Eric
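The age-based flushing described above (let dirty buffers sit for 30 seconds, check periodically, write the ones that are old enough) can be modelled in a few lines. This is a toy userspace sketch, not kernel code: only the name `b_flushtime` comes from the thread, while `struct buf`, `mark_dirty()`, `flush_old()`, the use of seconds instead of ticks, and the rule that a re-dirtied buffer keeps its original deadline are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define AGE_BUFFER 30  /* seconds a dirty buffer may sit before being written */

/* Toy buffer head; only b_flushtime is a name taken from the thread. */
struct buf {
    int  dirty;        /* nonzero if the buffer has unwritten changes */
    long b_flushtime;  /* earliest time (in seconds) at which to write it */
};

/* Mark a buffer dirty at time `now`.  Assumption: a buffer that is
 * already dirty keeps its original deadline. */
void mark_dirty(struct buf *b, long now)
{
    assert(b != NULL);
    if (!b->dirty) {
        b->dirty = 1;
        b->b_flushtime = now + AGE_BUFFER;
    }
}

/* One periodic pass: "write" every dirty buffer whose deadline has
 * passed.  Returns the number of buffers written. */
size_t flush_old(struct buf *bufs, size_t n, long now)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].dirty && bufs[i].b_flushtime <= now) {
            bufs[i].dirty = 0;  /* stand-in for the actual disk write */
            written++;
        }
    }
    return written;
}
```

Calling `flush_old()` every 5 simulated seconds means a dirty buffer is written somewhere between 30 and 35 seconds after it was first dirtied, which is the "slow but steady" writing that keeps most buffers clean.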
* Re: [PATCH] 498+ days uptime
From: Bernhard Heidegger @ 1998-08-28 16:03 UTC
To: Eric W. Biederman
Cc: Bernhard Heidegger, Zlatko.Calusic, H. Peter Anvin,
	Linux Kernel List, Linux-MM List

>>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:

>>>> No.  Major performance problem.

BH> Why?

BH> Imagine an application which has most of the (index) file pages in memory
BH> and many of the pages are dirty. bdflush will flush the pages regularly,
BH> but the pages will get dirty immediately again.
BH> If you can be sure, that the power cannot fail the performance should be
BH> much better without bdflush, because kflushd has to write pages only if
BH> the system is running low on memory...

>> The performance improvement comes when looking for free memory.  In
>> most cases bdflush's slow but steady writing of pages keeps buffers
>> clean.  When the application wants more memory with bdflush in the
>> background usually the pages it needs will be clean (because the I/O
>> started before the application needed it), so they can just be dropped
>> out of memory.  Relying on kflushd means nothing is written until an
>> application needs the memory and then it must wait until something is
>> written to disk, which is much slower.

>> Further
>> a) guaranteeing no power failure is hard.

Use a UPS and regularly flush/sync the primary data to disk from
the application.

>> b) generally there is so much data on the disk you must write it
>>    sometime, because you can't hold it all in memory.

That is only a question of how much RAM you can put in your PC.

>> c) I have trouble imagining a case where a small file would be rewritten
>>    continually.

Not really small, but a database application may use btree-based indexes,
where many blocks will get dirty when inserting/deleting data. If you flush
the dirty buffers and the next insertion dirties the same buffer(s), you
have lost performance. (Note: the btree-based indexes are secondary data;
you can rebuild them from scratch if the system fails.)

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                           Fax: ++43/316/820918-99
-----------------------------------------------------------------------------
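One way an application could "flush/sync the primary data to disk" itself, as suggested above, is to write the file and then call fsync(2) on it. The sketch below assumes that this is what is meant; the message does not name a specific call, and `save_primary_data()` is a hypothetical helper. It uses only standard POSIX calls (open, write, fsync, close). Note that fsync() asks the kernel to push the data to the device; whether the drive has committed it to the medium is a separate matter.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Write `len` bytes to `path` and ask the kernel to push them to the
 * device.  Returns 0 on success, -1 on any error.  For brevity, an
 * interrupted write (EINTR) is treated as an error, not retried. */
int save_primary_data(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {            /* error, or no progress at all */
            close(fd);
            return -1;
        }
        p += n;                  /* short write: continue with the rest */
        left -= (size_t)n;
    }

    if (fsync(fd) < 0) {         /* flush this file's dirty buffers */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Secondary data such as a rebuildable index would simply not go through this path and could stay dirty in the buffer cache as long as the kernel allows.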
* Re: [PATCH] 498+ days uptime
From: Zlatko Calusic @ 1998-08-28 22:03 UTC
To: Bernhard Heidegger
Cc: Eric W. Biederman, H. Peter Anvin, Linux Kernel List, Linux-MM List

Bernhard Heidegger <bheide@hyperwave.com> writes:

> >>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:
>
> >>>> No.  Major performance problem.
>
> BH> Why?
>
> BH> Imagine an application which has most of the (index) file pages in memory
> BH> and many of the pages are dirty. bdflush will flush the pages regularly,
> BH> but the pages will get dirty immediately again.
> BH> If you can be sure, that the power cannot fail the performance should be
> BH> much better without bdflush, because kflushd has to write pages only if
> BH> the system is running low on memory...
>
> >> The performance improvement comes when looking for free memory.  In
> >> most cases bdflush's slow but steady writing of pages keeps buffers
> >> clean.  When the application wants more memory with bdflush in the
> >> background usually the pages it needs will be clean (because the I/O
> >> started before the application needed it), so they can just be dropped
> >> out of memory.  Relying on kflushd means nothing is written until an
> >> application needs the memory and then it must wait until something is
> >> written to disk, which is much slower.
>
> >> Further
> >> a) guaranteeing no power failure is hard.
>
> Use a UPS and regularly flush/sync the primary data to disk from
> the application

Update/bdflush costs you nothing. A UPS costs you lots of money. Big
difference.

Also, flushing/syncing data to disk doesn't always mean the data really
got to the media. Check your favorite sync(2) manpage. :)

Using a completely synchronous API in applications would considerably
cut performance. Why would your application wait for the disk to commit
buffers when your CPU can do other useful things in the meantime?

Also, don't forget that disk latencies are measured in milliseconds,
whereas modern CPUs run in units of (almost) nanoseconds.

>
> >> b) generally there is so much data on the disk you must write it
> >>    sometime, because you can't hold it all in memory.
>
> only a question of how much RAM you can put in your PC

Still requires money. :)

>
> >> c) I have trouble imagining a case where a small file would be rewritten
> >>    continually.
>
> Not really small, but a database application may use btree based indexes,
> where many blocks will get dirty when inserting/deleting data. If you flush
> the dirty buffers and the next insertion dirty the same buffer(s) you have
> lost performance (Note: the btree based indexes are secondary data; you
> can rebuild it from scratch if the system fails)
>

Right, we agree. But performance doesn't go down if you write buffers
every few tens of seconds. That is a LOT of time, if you ask your
application. Some of them never get that old. :)

And (big) databases mostly like to have their own memory management,
because "they know better".
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
  Vi is the God of editors. Emacs is the editor of Gods.
* Re: [PATCH] 498+ days uptime
From: Bernhard Heidegger @ 1998-08-31 8:32 UTC
To: Zlatko.Calusic; +Cc: Bernhard Heidegger, Linux Kernel List, Linux-MM List

>>>>> "Z>" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

Z>> Bernhard Heidegger <bheide@hyperwave.com> writes:
>> >>>>> ">" == Eric W Biederman <ebiederm@inetnebr.com> writes:
>>
>> >>>> No.  Major performance problem.
>>
BH> Why?
>>
BH> Imagine an application which has most of the (index) file pages in memory
BH> and many of the pages are dirty. bdflush will flush the pages regularly,
BH> but the pages will get dirty immediately again.
BH> If you can be sure, that the power cannot fail the performance should be
BH> much better without bdflush, because kflushd has to write pages only if
BH> the system is running low on memory...
>>
>> >> The performance improvement comes when looking for free memory.  In
>> >> most cases bdflush's slow but steady writing of pages keeps buffers
>> >> clean.  When the application wants more memory with bdflush in the
>> >> background usually the pages it needs will be clean (because the I/O
>> >> started before the application needed it), so they can just be dropped
>> >> out of memory.  Relying on kflushd means nothing is written until an
>> >> application needs the memory and then it must wait until something is
>> >> written to disk, which is much slower.
>>
>> >> Further
>> >> a) guaranteeing no power failure is hard.
>>
>> Use a UPS and regularly flush/sync the primary data to disk from
>> the application

Z>> Update/bdflush costs you nothing. UPS costs you lots of money. Big
Z>> difference.

Yes, but on a really big server where performance does matter (greetings
from Godzilla ;-) you will have a UPS anyway...

Z>> Also, flushing/syncing data to disk doesn't always mean data really
Z>> got to media. Check your favorite sync(2) manpage. :)

Correct, but you don't win anything with update/bdflush in this case.

Z>> Using completely synchronous API in applications would considerably cut
Z>> performances down. Why would your application wait for disk to commit
Z>> buffers, when your CPU can do other useful things in the meantime.

Z>> Also, don't forget that disk latency times are measured in
Z>> milliseconds, where modern CPU's run in units of (almost) nanoseconds.

>>
>> >> b) generally there is so much data on the disk you must write it
>> >>    sometime, because you can't hold it all in memory.
>>
>> only a question of how much RAM you can put in your PC

Z>> Still requires money. :)

RAM isn't that expensive anymore.

>>
>> >> c) I have trouble imagining a case where a small file would be rewritten
>> >>    continually.
>>
>> Not really small, but a database application may use btree based indexes,
>> where many blocks will get dirty when inserting/deleting data. If you flush
>> the dirty buffers and the next insertion dirty the same buffer(s) you have
>> lost performance (Note: the btree based indexes are secondary data; you
>> can rebuild it from scratch if the system fails)
>>

Z>> Right, we agree. But performance doesn't go down if you write buffers
Z>> every few tens of seconds. That is a LOT of time, if you ask your
Z>> application. Some of them never get so old. :)

Hey, I'm speaking of (database) applications which (should ;-) run until
the earth goes down :-)
I did some measurements with our application, and there were peaks which
were 10 times higher than the average time. I will run some further tests
to see whether the overall performance drops. Anyway, this application
is also used interactively, and if you try to get some data during such a
peak you'll have to wait ;-)

Z>> And (big) databases mostly like to have their own memory management,
Z>> because "they know better".

That's another point where I agree with you, but this is a really big
task...

Bernhard

get my pgp key from a public keyserver (keyID=0x62446355)
-----------------------------------------------------------------------------
Bernhard Heidegger                                       bheide@hyperwave.com
                  Hyperwave Software Research & Development
                       Schloegelgasse 9/1, A-8010 Graz
Voice: ++43/316/820918-25                           Fax: ++43/316/820918-99
-----------------------------------------------------------------------------
* Re: [PATCH] 498+ days uptime
  1998-08-28 13:14 ` Eric W. Biederman
  1998-08-28 16:03   ` Bernhard Heidegger
@ 1998-08-28 21:47   ` Zlatko Calusic
  1 sibling, 0 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 21:47 UTC
  To: Eric W. Biederman
  Cc: Bernhard Heidegger, H. Peter Anvin, Linux Kernel List, Linux-MM List

ebiederm@inetnebr.com (Eric W. Biederman) writes:

> >>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:
> BH> Imagine an application which has most of the (index) file pages in memory
> BH> and many of the pages are dirty. bdflush will flush the pages regularly,
> BH> but the pages will get dirty immediately again.
> BH> If you can be sure, that the power cannot fail the performance should be
> BH> much better without bdflush, because kflushd has to write pages only if
> BH> the system is running low on memory...
>
> The performance improvement comes when looking for free memory. In
> most cases bdflush's slow but steady writing of pages keeps buffers
> clean. When the application wants more memory with bdflush in the
> background usually the pages it needs will be clean (because the I/O
> started before the application needed it), so they can just be dropped
> out of memory. Relying on kflushd means nothing is written until an
> application needs the memory and then it must wait until something is
> written to disk, which is much slower.

Not absolutely true.

kflushd flushes dirty buffers not when they're all dirty, but when the
percentage of dirty buffers goes above the threshold. That threshold
is tunable; the default value as of recent kernels is 40%.

So even if kflushd didn't run in time, that only means you have *up*
to 40% dirty buffers. The other 60% or more are clean.

We're speaking here of the first parameter in /proc/sys/vm/bdflush. It
was 60 initially, but was lowered recently (a few months ago, half a
year?) due to problems with buffers at that time.

>
> Further
> a) guaranteeing no power failure is hard.

Here I entirely agree. UPSes cost much more than update/bdflush. :)

> b) generally there is so much data on the disk you must write it
> sometime, because you can't hold it all in memory.

Right.

> c) I have trouble imagining a case where a small file would be rewritten
> continually.
>

It happens. Otherwise we wouldn't need buffers at all. :)
Maybe only to achieve asynchrony.

Think of metadata, and operations like creating/deleting lots of files
in a directory, and similar. Imagine a busy news/proxy server.
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
Recursive, adj.; see Recursive.
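To make the threshold argument in the message above concrete, here is a minimal Python sketch of the wake-up condition as described there: kflushd acts once the share of dirty buffers goes above a tunable percentage (40 by default, per the message). The function and parameter names are hypothetical, chosen for illustration; this is not the kernel's code.

```python
def kflushd_should_wake(dirty: int, total: int, threshold_percent: int = 40) -> bool:
    """Return True when the share of dirty buffers is above the threshold.

    `threshold_percent` stands in for the first value in
    /proc/sys/vm/bdflush; the name is an assumption made for this sketch.
    """
    if total <= 0:
        return False  # no buffers at all, nothing to flush
    # Integer arithmetic avoids rounding surprises: dirty/total > pct/100
    return dirty * 100 > threshold_percent * total


def max_dirty_before_wake(total: int, threshold_percent: int = 40) -> int:
    """Largest number of dirty buffers that does not yet trigger a wake-up."""
    return total * threshold_percent // 100
```

With 1000 buffers and the default of 40, up to 400 buffers can be dirty before the condition fires, which leaves at least 600 clean ones, matching the "other 60% or more are clean" remark.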
* Re: [PATCH] 498+ days uptime
  1998-08-28 9:09 ` Bernhard Heidegger
  1998-08-28 13:14   ` Eric W. Biederman
@ 1998-08-28 21:36   ` Zlatko Calusic
  1 sibling, 0 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 21:36 UTC
  To: Bernhard Heidegger
  Cc: Eric W. Biederman, H. Peter Anvin, Linux Kernel List, Linux-MM List

Bernhard Heidegger <bheide@hyperwave.com> writes:

> >>>> Question is why can't this functionality be integrated in the kernel,
> >>>> so we don't have to run yet another daemon?
>
> >> We can do this in kernel thread but I don't see the win.
>
> I don't have a problem with the user level thing (so I can decide to not
> start it ;-)
>

You can always tune things to your preference, even with update
functionality in the kernel. If you set the flushing period to, say,
12 hours, it's effectively as if you had killed update. :)
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
Do vampires get AIDS?
* Re: [PATCH] 498+ days uptime
  1998-08-28 1:03 ` Eric W. Biederman
  1998-08-28 9:09   ` Bernhard Heidegger
@ 1998-08-28 21:32   ` Zlatko Calusic
  1 sibling, 0 replies; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 21:32 UTC
  To: Eric W. Biederman
  Cc: Bernhard Heidegger, H. Peter Anvin, Linux Kernel List, Linux-MM List

ebiederm@inetnebr.com (Eric W. Biederman) writes:

> >>>>> "BH" == Bernhard Heidegger <bheide@hyperwave.com> writes:
>
> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> >>> Bernhard Heidegger <bheide@hyperwave.com> writes:
> >>> >>>>> ">" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> >>>
> >>> >> "H. Peter Anvin" <hpa@transmeta.com> writes:
> >>> >> >
> >>> >> > bdflush yes, but update is not obsolete.
> >>> >> >
> >>> >> > It is still needed if you want to make sure data (and metadata)
> >>> >> > eventually gets written to disk.
> >>> >> >
> >>> >> > Of course, you can run without update, but then don't bother if you
> >>> >> > lose file in system crash, even if you edited it and saved it few
> >>> >> > hours ago. :)
> >>> >> >
> >>> >> > Update is very important if you have lots of RAM in your computer.
> >>> >> >
> >>> >>
> >>> >> Oh. I guess my next question then is "why", as why can't this be done
> >>> >> by kflushd as well?
> >>> >>
> >>>
> >>> >> To tell you the truth, I'm not sure why, these days.
> >>>
> >>> >> I thought it was done this way (update running in userspace) so to
> >>> >> have control how often buffers get flushed. But, I believe bdflush
> >>> >> program had this functionality, and it is long gone (as you correctly
> >>> >> noticed).
> >>>
> >>> IMHO, update/bdflush (in user space) calls sys_bdflush regularly. This
> >>> function (fs/buffer.c) calls sync_old_buffers() which itself sync_supers
> >>> and sync_inodes before it goes through the dirty buffer list (to write
> >>> some dirty buffers); the kflushd only writes some dirty buffers dependent
> >>> on the sysctl parameters.
> >>> If I'm wrong, please feel free to correct me!
> >>>
>
> >>> You are not wrong.
>
> >>> Update flushes metadata blocks every 5 seconds, and data block every
> >>> 30 seconds.
>
> BH> My version of update (something around Slackware 3.4) does the following:
> BH> 1.) calls bdflush(1,0) (fs/buffer.c:sys_bdflush) which will call
> BH> sync_old_buffers() and return
> BH> 2.) only if the bdflush(1,0) fails (it returns < 0) it returns to the
> BH> old behavior of sync()ing every 30 seconds
>
> BH> But case 2) should only happen on really old kernels; on newer kernels
> BH> (I'm using 2.0.34) the bdflush() should never fail.
>
> BH> But as I told, sync_old_buffers() do:
> BH> 1.) sync_supers(0)
> BH> 2.) sync_inodes(0)
> BH> 3.) go through dirty buffer list and may flush some buffers
>
> BH> Conclusion: the meta data get synced every 5 seconds and some buffers may
> BH> be flushed.
>
> >>> Question is why can't this functionality be integrated in the kernel,
> >>> so we don't have to run yet another daemon?
>
> We can do this in kernel thread but I don't see the win.
>

One daemon less to run. This should be enough.

You have one less process running, you free some memory, and make
things slightly cleaner. Not a big win, but small things make people
happy. :)
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
Linux, WinNT and MS-DOS. The Good, The Bad and The Ugly.
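The behaviour of update that Bernhard describes in the quoted text above (try bdflush(1,0) periodically, and fall back to sync() every 30 seconds only if that call fails) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the real update source: the `bdflush`, `sync` and `sleep` callables are hypothetical stand-ins injected by the caller, the 5-second interval is taken from elsewhere in the thread, and switching to the fallback permanently after the first failure is a guess about how the fallback works.

```python
from typing import Callable


def run_update(bdflush: Callable[[], int],
               sync: Callable[[], None],
               sleep: Callable[[float], None],
               iterations: int,
               flush_interval: float = 5.0,
               sync_interval: float = 30.0) -> None:
    """Simulate the update daemon's main loop for a fixed number of rounds.

    bdflush: stand-in for the bdflush(1, 0) call; negative result = failure.
    sync:    stand-in for a full sync().
    sleep:   stand-in for sleeping, so the loop can be tested without waiting.
    """
    use_bdflush = True
    for _ in range(iterations):
        if use_bdflush and bdflush() < 0:
            # Assumption: a failing call means a really old kernel, so the
            # loop stays in the sync()-based mode from here on.
            use_bdflush = False
        if use_bdflush:
            sleep(flush_interval)
        else:
            sync()
            sleep(sync_interval)
```

Passing the three operations in as callables keeps the sketch free of any real system call, so the control flow can be checked by recording what gets called and in which order.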
* Re: [PATCH] 498+ days uptime
  1998-08-26 22:49 ` [PATCH] 498+ days uptime Zlatko Calusic
  1998-08-27 12:07   ` Bernhard Heidegger
@ 1998-08-28 9:35   ` Stephen C. Tweedie
  1998-08-28 22:16     ` Zlatko Calusic
  1 sibling, 1 reply; 16+ messages in thread
From: Stephen C. Tweedie @ 1998-08-28 9:35 UTC
  To: Zlatko.Calusic
  Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

Hi,

On 27 Aug 1998 00:49:55 +0200, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
said:

> I thought it was done this way (update running in userspace) so to
> have control how often buffers get flushed. But, I believe bdflush
> program had this functionality, and it is long gone (as you correctly
> noticed).

update(8) _is_ the old bdflush program. :)

There are two entirely separate jobs being done. One is to flush all
buffers which are beyond their dirty timelimit: that job is done by the
bdflush syscall called by update/bdflush every 5 seconds. The second
job is to trickle back some dirty buffers to disk if we are getting
short of clean buffer space in memory.

These are completely different jobs. They select which buffers and how
many buffers to write based on different criteria, and they are woken up
by different events. That's why we have two daemons. The fact that one
spends its wait time in user mode and one spends its time in kernel mode
is irrelevant; even if they were both kernel threads we'd still have two
separate jobs needing done.

> I'm crossposting this mail to linux-mm where some clever MM people can
> be found. Hopefully we can get an explanation why do we still need
> update.

Because kflushd does not do the job which update needs to do. It does a
different job.

--Stephen
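The distinction Stephen draws above, two jobs that pick buffers by different criteria, might be pictured with the following Python sketch. Everything in it is hypothetical: the `Buffer` record, both function names, the 30-second age limit (borrowed from figures mentioned elsewhere in the thread) and the oldest-first ordering of the pressure-driven job are assumptions made for illustration, not a description of fs/buffer.c.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Buffer:
    block: int          # block number, used here only as an identifier
    dirty: bool         # True if the buffer holds unwritten changes
    dirtied_at: float   # time (seconds) at which the buffer became dirty


def select_expired(buffers: List[Buffer], now: float,
                   age_limit: float = 30.0) -> List[Buffer]:
    """Timer-driven job: every dirty buffer past its age limit is written."""
    return [b for b in buffers if b.dirty and now - b.dirtied_at > age_limit]


def select_for_pressure(buffers: List[Buffer], max_writes: int) -> List[Buffer]:
    """Pressure-driven job: write back at most max_writes dirty buffers.

    Picking the oldest ones first is an assumption of this sketch.
    """
    dirty = sorted((b for b in buffers if b.dirty), key=lambda b: b.dirtied_at)
    return dirty[:max_writes]
```

The first function ignores how many buffers it returns and looks only at age; the second ignores age limits and is bounded only by a count. That difference in selection, together with the different wake-up events, is the reason given in the message for having two daemons.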
* Re: [PATCH] 498+ days uptime
  1998-08-28 9:35 ` Stephen C. Tweedie
@ 1998-08-28 22:16   ` Zlatko Calusic
  1998-08-30 15:10     ` Stephen C. Tweedie
  0 siblings, 1 reply; 16+ messages in thread
From: Zlatko Calusic @ 1998-08-28 22:16 UTC
  To: Stephen C. Tweedie
  Cc: H. Peter Anvin, Linux Kernel List, Linux-MM List

"Stephen C. Tweedie" <sct@redhat.com> writes:

> Hi,
>
> On 27 Aug 1998 00:49:55 +0200, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
> said:
>
> > I thought it was done this way (update running in userspace) so to
> > have control how often buffers get flushed. But, I believe bdflush
> > program had this functionality, and it is long gone (as you correctly
> > noticed).
>
> update(8) _is_ the old bdflush program. :)

I know. But in those old days, I believe, we had two daemons, update
AND bdflush. They were started from the same binary, but their
functionality was different. Too bad 1.2.13 can't be compiled on
today's setups. :)

>
> There are two entirely separate jobs being done. One is to flush all
> buffers which are beyond their dirty timelimit: that job is done by the
> bdflush syscall called by update/bdflush every 5 seconds. The second
> job is to trickle back some dirty buffers to disk if we are getting
> short of clean buffer space in memory.
>
> These are completely different jobs. They select which buffers and how
> many buffers to write based on different criteria, and they are woken up
> by different events. That's why we have two daemons. The fact that one
> spends its wait time in user mode and one spends its time in kernel mode
> is irrelevant; even if they were both kernel threads we'd still have two
> separate jobs needing done.

Right, I agree entirely. Maybe I should reformulate my question. :)

Why is the former in the userspace?

I believe it is not that hard to code bdflush in the kernel, where we
lose nothing, but save few pages of memory. One less process to run,
as I already pointed out.

You have probably had an opportunity to visit Paul Gortmaker's page,
helpful for those with low-memory machines. There you can find a "few
lines of assembly" program that replaces update. I ran that program
for a few years to save a few kilobytes of memory on my old 386 with
5MB RAM.

>
> > I'm crossposting this mail to linux-mm where some clever MM people can
> > be found. Hopefully we can get an explanation why do we still need
> > update.
>
> Because kflushd does not do the job which update needs to do. It does a
> different job.
>

Yep, but allow me one more question, please.

If I happen to get some free time (very unlikely) to code bdflush
completely in the kernel, so we can get rid of update, now running as
a daemon, would you consider it for inclusion in the official kernel
(sending patches to Linus, etc.)?
--
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
It's bad luck to be superstitious.
* Re: [PATCH] 498+ days uptime
  1998-08-28 22:16 ` Zlatko Calusic
@ 1998-08-30 15:10   ` Stephen C. Tweedie
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen C. Tweedie @ 1998-08-30 15:10 UTC
  To: Zlatko.Calusic
  Cc: Stephen C. Tweedie, H. Peter Anvin, Linux Kernel List, Linux-MM List

Hi,

On 29 Aug 1998 00:16:34 +0200, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
said:

[re update/bdflush:]
> Why is the former in the userspace?

Simply because the latter is the only one to have been moved to the
kernel. That happened because the trigger for bdflush is an internal
kernel wait queue, whereas the trigger for update is a timer. Timers
can be easily done in user space.

> I believe it is not that hard to code bdflush in the kernel, where we
> lose nothing, but save few pages of memory. One less process to run,
> as I already pointed out.

Dead easy. It will save memory; it will also, more importantly, save
non-pageable memory (although the kernel thread will still need its own
kernel stack, it will not need the extra page tables which accompany a
user-space process).

--Stephen
Thread overview: 16+ messages
[not found] <199808262153.OAA13651@cesium.transmeta.com>
1998-08-26 22:49 ` [PATCH] 498+ days uptime Zlatko Calusic
1998-08-27 12:07 ` Bernhard Heidegger
1998-08-27 12:21 ` Zlatko Calusic
1998-08-27 12:43 ` Bernhard Heidegger
1998-08-28 1:03 ` Eric W. Biederman
1998-08-28 9:09 ` Bernhard Heidegger
1998-08-28 13:14 ` Eric W. Biederman
1998-08-28 16:03 ` Bernhard Heidegger
1998-08-28 22:03 ` Zlatko Calusic
1998-08-31 8:32 ` Bernhard Heidegger
1998-08-28 21:47 ` Zlatko Calusic
1998-08-28 21:36 ` Zlatko Calusic
1998-08-28 21:32 ` Zlatko Calusic
1998-08-28 9:35 ` Stephen C. Tweedie
1998-08-28 22:16 ` Zlatko Calusic
1998-08-30 15:10 ` Stephen C. Tweedie