From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Thu, 3 Nov 2005 20:46:06 +0000 (GMT) From: Mel Gorman Subject: Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19 In-Reply-To: Message-ID: References: <4366C559.5090504@yahoo.com.au> <4366D469.2010202@yahoo.com.au> <20051101135651.GA8502@elte.hu> <1130854224.14475.60.camel@localhost><20051101142959.GA9272@elte.hu> <1130856555.14475.77.camel@localhost><20051101150142.GA10636@elte.hu> <1130858580.14475.98.camel@localhost><20051102084946.GA3930@elte.hu> <436880B8.1050207@yahoo.com.au><1130923969.15627.11.camel@localhost> <43688B74.20002@yahoo.com.au><255360000.1130943722@[10.10.2.4]> <4369824E.2020407@yahoo.com.au> <306020000.1131032193@[10.10.2.4]><1131032422.2839.8.camel@laptopd505.fenrus.org> <309420000.1131036740@[10.10.2.4]> <311050000.1131040276@[10.10.2.4]> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Linus Torvalds Cc: "Martin J. Bligh" , Arjan van de Ven , Nick Piggin , Dave Hansen , Ingo Molnar , Andrew Morton , kravetz@us.ibm.com, linux-mm , Linux Kernel Mailing List , lhms , Arjan van de Ven List-ID: On Thu, 3 Nov 2005, Linus Torvalds wrote: > > > On Thu, 3 Nov 2005, Martin J. Bligh wrote: > > > And I suspect that by default, there should be zero of them. Ie you'd have > > > to set them up the same way you now set up a hugetlb area. > > > > So ... if there are 0 by default, and I run for a while and dirty up > > memory, how do I free any pages up to put into them? Not sure how that > > works. > > You don't. > > Just face it - people who want memory hotplug had better know that > beforehand (and let's be honest - in practice it's only going to work in > virtualized environments or in environments where you can insert the new > bank of memory and copy it over and remove the old one with hw support). > > Same as hugetlb. > For HugeTLB, there are cases were the sysadmin won't configure the server because it's a tunable that can badly affect the machine if they get it wrong. In those cases, the users just get small pages, the performance penalty and are told to like it. > Nobody sane _cares_. Nobody sane is asking for these things. Only people > with special needs are asking for it, and they know their needs. > > You have to realize that the first rule of engineering is to work out the > balances. The undeniable fact is, that 99.99% of all users will never care > one whit, and memory management is complex and fragile. End result: the > 0.01% of users will have to do some manual configuration to keep things > simpler for the cases that really matter. > Ok, so lets consider the 99.99% of users then. One two machines, aim9 benchmarks posted during this thread show some improvements on page_test, fork_test and brk_test, the paths you would expect to be hit by these patches. They are very minor improvements but 99.99% of users benefit from this. Aim9 might be considered artifical so somewhere in that 99.99% of users are kernel developers who care about kbuild so here are the timings of "kernel untar ; make defconfig ; make" 2.6.14-rc5-mm1: 1093 seconds 2.6.14-rc5-mm1-mbuddy-v19-withoutdefrag 1089 seconds 2.6.14-rc5-mm1-mbuddy-v19-withdefrag:: 1086 seconds The withoutdefrag mark is with the core of anti-defrag disabled via a configure option. The option to disable was a separate patch produced during this thread. To be really honest, I don't think a configurable page allocator is a great idea. Building kernels is faster with this set of patches which a few people on this list care about. aim9 shows very minor improvements which benefit a very large number of people and 0.01% of people who care about fragmentation get lower fragmentation. Of course, maybe there is something magic with my test machines (or maybe I am willing it faster) so figures from other people wouldn't hurt whether they show gains or regressions. On my machine at least, 99.99% of people are still benefitting. I am going to wait to see if people post figures that show regressions before asking "are you still saying no?" to this set of patches > Because the case that really matters is the sane case. The one where we > - don't change memory (normal) > - only add memory (easy) > - only switch out memory with hardware support (ie the _hardware_ > supports parallel memory, and you can switch out a DIMM without > software ever really even noticing) > - have system maintainers that do strange things, but _know_ that. > > We simply DO NOT CARE about some theoretical "general case", because the > general case is (a) insane and (b) impossible to cater to without > excessive complexity. > > Guys, a kernel developer needs to know when to say NO. > > And we say NO, HELL NO!! to generic software-only memory hotplug. > > If you are running a DB that needs to benchmark well, you damn well KNOW > IT IN ADVANCE, AND YOU TUNE FOR IT. > > Nobody takes a random machine and says "ok, we'll now put our most > performance-critical database on this machine, and oh, btw, you can't > reboot it and tune for it beforehand". And if you have such a person, you > need to learn to IGNORE THE CRAZY PEOPLE. > > When you hear voices in your head that tell you to shoot the pope, do you > do what they say? Same thing goes for customers and managers. They are the > crazy voices in your head, and you need to set them right, not just > blindly do what they ask for. > > Linus > -- Mel Gorman Part-time Phd Student Java Applications Developer University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org