* zone movable patches comments
From: Nick Piggin @ 2007-07-09 7:50 UTC
To: Mel Gorman, Linux Memory Management, Andrew Morton

Hi Mel,

Just had a bit of a look at the zone movable stuff in -mm...

Firstly, would it be possible to list all the dependent patches in that set, or is it just those few that are contiguous in Andrew's series file?

A few comments -- can it be made configurable? I guess there is not much overhead if the zone is not populated, but there has been a fair bit of work towards taking out unneeded zones.

Also, I don't really like the name kernelcore= to specify mem-sizeof movable zone. Could it be renamed and stated in the positive, like movable_mem= or reserve_movable_mem=? And can that option be written up in Documentation?

What is the status of these patches? Are they working and pretty well ready to be merged for 2.6.23?

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: zone movable patches comments
From: KAMEZAWA Hiroyuki @ 2007-07-09 10:30 UTC
To: Nick Piggin; +Cc: Mel Gorman, Linux Memory Management, Andrew Morton

On Mon, 09 Jul 2007 17:50:41 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> A few comments -- can it be made configurable? I guess there is not
> much overhead if the zone is not populated, but there has been a fair
> bit of work towards taking out unneeded zones.
>

Hi, the following is a patch for a configurable zone, which I used in the old days.
- http://marc.info/?l=linux-mm&m=117315623423467&w=2
Would this kind of patch help?

> Also, I don't really like the name kernelcore= to specify mem-sizeof
> movable zone. Could it be renamed and stated in the positive, like
> movable_mem= or reserve_movable_mem=? And can that option be written
> up in Documentation?
>

As far as I remember, before Mel's work, I named the option "kernelcore=" because the "max_dma=", "mem=", ... options are used for specifying the amount of memory from the lower addresses. But I have no strong opinion.

> What is the status of these patches? Are they working and pretty well
> ready to be merged for 2.6.23?
>

At least, it works well in our (ia64/NUMA) environment.

Memo: My thinking after OLS

ZONE_MOVABLE is necessary to guarantee that only movable memory is allocated from some range of physical memory. It is useful, but I know people don't like it.

As another option, I'm now considering specifying a memory range as "for hotremove" by page type, not by zone. This may let us avoid adding a new zone. But I have no concrete idea yet, and it will take some time.

For NUMA node-hotplug, I think that I have to add another boot option. (For example, a boot option for hot-adding *removable nodes* after boot.)

Thanks,
-Kame
* Re: zone movable patches comments 2007-07-09 7:50 zone movable patches comments Nick Piggin 2007-07-09 10:30 ` KAMEZAWA Hiroyuki @ 2007-07-09 11:04 ` Mel Gorman 2007-07-09 11:44 ` KAMEZAWA Hiroyuki ` (2 more replies) 1 sibling, 3 replies; 19+ messages in thread From: Mel Gorman @ 2007-07-09 11:04 UTC (permalink / raw) To: Nick Piggin; +Cc: Linux Memory Management, Andrew Morton On (09/07/07 17:50), Nick Piggin didst pronounce: > Hi Mel, > > Just had a bit of a look at the zone movable stuff in -mm... Great. > Firstly, > would it be possible to list all the dependant patches in that set, or > is it just those few that are contiguous in Andrew's series file? > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch and the few that are contiguous. I'm beginning to test with the following series file add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch create-the-zone_movable-zone.patch create-the-zone_movable-zone-fix.patch create-the-zone_movable-zone-fix-2.patch allow-huge-page-allocations-to-use-gfp_high_movable.patch allow-huge-page-allocations-to-use-gfp_high_movable-fix.patch allow-huge-page-allocations-to-use-gfp_high_movable-fix-2.patch allow-huge-page-allocations-to-use-gfp_high_movable-fix-3.patch handle-kernelcore=-generic.patch handle-kernelcore=-generic-fix.patch There was a minor reject in add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch but otherwise applied smoothly. > A few comments -- can it be made configurable? I guess there is not > much overhead if the zone is not populated, but there has been a fair > bit of work towards taking out unneeded zones. > It could be made configurable as zone_type already has configurable zones. However, as it is that would always be set on distro kernels for CONFIG_HUGETLB_PAGE, is there any point? 
It might make sense for embedded systems, but I've received pushback from Andrew before for trying to introduce config options that affect the allocator.

> Also, I don't really like the name kernelcore= to specify mem-sizeof
> movable zone. Could it be renamed and stated in the positive, like
> movable_mem= or reserve_movable_mem=?

It could but it was named this way for a reason. It was more important that the administrator get the amount of memory for non-movable allocations correct than movable allocations. If the size of ZONE_MOVABLE is wrong, the hugepage pool may not be able to grow as large as desired. If the size of memory usable for non-movable allocations is wrong, it's worse.

> And can that option be written
> up in Documentation?
>

Documentation/kernel-parameters.txt

> What is the status of these patches? Are they working and pretty well
> ready to be merged for 2.6.23?
>

I have not encountered problems with them in a long time. I'm re-testing now using 2.6.22 as a baseline but I believe they are ready for merging to 2.6.23.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: zone movable patches comments
From: KAMEZAWA Hiroyuki @ 2007-07-09 11:44 UTC
To: Mel Gorman; +Cc: Nick Piggin, Linux Memory Management, Andrew Morton

On Mon, 9 Jul 2007 12:04:57 +0100 mel@skynet.ie (Mel Gorman) wrote:
> It could but it was named this way for a reason. It was more important that
> the administrator get the amount of memory for non-movable allocations
> correct than movable allocations. If the size of ZONE_MOVABLE is wrong,
> the hugepage pool may not be able to grow as large as desired. If the size
> of memory usable of non-movable allocations is wrong, it's worse.
>

I'd like to vote for kernelcore= rather than movable_mem= :)

Thanks,
-Kame
* Re: zone movable patches comments 2007-07-09 11:04 ` Mel Gorman 2007-07-09 11:44 ` KAMEZAWA Hiroyuki @ 2007-07-09 12:15 ` Nick Piggin 2007-07-09 13:21 ` Mel Gorman 2007-07-09 17:39 ` Christoph Lameter 2 siblings, 1 reply; 19+ messages in thread From: Nick Piggin @ 2007-07-09 12:15 UTC (permalink / raw) To: Mel Gorman; +Cc: Linux Memory Management, Andrew Morton, kamezawa.hiroyu Mel Gorman wrote: > On (09/07/07 17:50), Nick Piggin didst pronounce: > >>Hi Mel, >> >>Just had a bit of a look at the zone movable stuff in -mm... > > > Great. > > >>Firstly, >>would it be possible to list all the dependant patches in that set, or >>is it just those few that are contiguous in Andrew's series file? >> > > > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch > and the few that are contiguous. I'm beginning to test with the > following series file > > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch > create-the-zone_movable-zone.patch > create-the-zone_movable-zone-fix.patch > create-the-zone_movable-zone-fix-2.patch > allow-huge-page-allocations-to-use-gfp_high_movable.patch > allow-huge-page-allocations-to-use-gfp_high_movable-fix.patch > allow-huge-page-allocations-to-use-gfp_high_movable-fix-2.patch > allow-huge-page-allocations-to-use-gfp_high_movable-fix-3.patch > handle-kernelcore=-generic.patch > handle-kernelcore=-generic-fix.patch > > There was a minor reject in > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch > but otherwise applied smoothly. Thanks. >>A few comments -- can it be made configurable? I guess there is not >>much overhead if the zone is not populated, but there has been a fair >>bit of work towards taking out unneeded zones. >> > > > It could be made configurable as zone_type already has configurable > zones. However, as it is that would always be set on distro kernels for > CONFIG_HUGETLB_PAGE, is there any point? 
It might make sense for embedded > systems but I've received pushback from Andrew before for trying to introduce > config options that affect the allocator before. I think yes it would be a good idea. If it is done for things like ZONE_DMA which is a fairly core bit of kernel, I don't see why it shouldn't be done for this. I'm sure it can be made to look niceish ;) (I haven't looked at Kame's patch yet, though). >>Also, I don't really like the name kernelcore= to specify mem-sizeof >>movable zone. Could it be renamed and stated in the positive, like >>movable_mem= or reserve_movable_mem=? > > > It could but it was named this way for a reason. It was more important that > the administrator get the amount of memory for non-movable allocations > correct than movable allocations. If the size of ZONE_MOVABLE is wrong, > the hugepage pool may not be able to grow as large as desired. If the size > of memory usable of non-movable allocations is wrong, it's worse. kernelcore= has some fairly strong connotations outside the movable zone functionality, however. If you have a 16GB highmem machine, and you want 8GB of movable zone, do you say kernelcore=8GB? Does that give you the other 8GB in kernel addressable memory? :) What if some other functionality is introduced that also wants to reserve a chunk of memory? How do you distinguish between them? Why not just specify in the help text that the admin should boot the kernel without that parameter first to check how much memory they have before using it... If they wanted to break the kernel by doing something silly, then I don't see how kernelcore is really better than reclaimable_mem... >>And can that option be written >>up in Documentation? >> > > > Documentation/kernel-parameters.txt Thanks, I didn't see the kernelcore patches. >>What is the status of these patches? Are they working and pretty well >>ready to be merged for 2.6.23? >> > > > I have not encountered problems with them in a long time. 
I'm re-testing now > using 2.6.22 as a baseline but I believe they are ready for merging to 2.6.23. Cool. Would be nice to see them go upstream! -- SUSE Labs, Novell Inc. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zone movable patches comments 2007-07-09 12:15 ` Nick Piggin @ 2007-07-09 13:21 ` Mel Gorman 2007-07-10 7:57 ` Nick Piggin 2007-07-10 9:08 ` KAMEZAWA Hiroyuki 0 siblings, 2 replies; 19+ messages in thread From: Mel Gorman @ 2007-07-09 13:21 UTC (permalink / raw) To: Nick Piggin; +Cc: Linux Memory Management, Andrew Morton, kamezawa.hiroyu On (09/07/07 22:15), Nick Piggin didst pronounce: > Mel Gorman wrote: > >On (09/07/07 17:50), Nick Piggin didst pronounce: > > > >>Hi Mel, > >> > >>Just had a bit of a look at the zone movable stuff in -mm... > > > > > >Great. > > > > > >>Firstly, > >>would it be possible to list all the dependant patches in that set, or > >>is it just those few that are contiguous in Andrew's series file? > >> > > > > > >add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch > >and the few that are contiguous. I'm beginning to test with the > >following series file > > > >add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch > >create-the-zone_movable-zone.patch > >create-the-zone_movable-zone-fix.patch > >create-the-zone_movable-zone-fix-2.patch > >allow-huge-page-allocations-to-use-gfp_high_movable.patch > >allow-huge-page-allocations-to-use-gfp_high_movable-fix.patch > >allow-huge-page-allocations-to-use-gfp_high_movable-fix-2.patch > >allow-huge-page-allocations-to-use-gfp_high_movable-fix-3.patch > >handle-kernelcore=-generic.patch > >handle-kernelcore=-generic-fix.patch > > > >There was a minor reject in > >add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch > >but otherwise applied smoothly. > > Thanks. > > > >>A few comments -- can it be made configurable? I guess there is not > >>much overhead if the zone is not populated, but there has been a fair > >>bit of work towards taking out unneeded zones. > >> > > > > > >It could be made configurable as zone_type already has configurable > >zones. 
However, as it is that would always be set on distro kernels for
> >CONFIG_HUGETLB_PAGE, is there any point? It might make sense for embedded
> >systems but I've received pushback from Andrew before for trying to
> >introduce
> >config options that affect the allocator before.
>
> I think yes it would be a good idea. If it is done for things like ZONE_DMA
> which is a fairly core bit of kernel, I don't see why it shouldn't be done
> for this. I'm sure it can be made to look niceish ;) (I haven't looked at
> Kame's patch yet, though).
>

I'm pretty sure it can be made to look nice by changing enum zone_type to conditionally define ZONE_MOVABLE and defining __GFP_MOVABLE to be 0 when it doesn't exist. I'll look at Kame's patch before starting in case it's nicer.

>
> >>Also, I don't really like the name kernelcore= to specify mem-sizeof
> >>movable zone. Could it be renamed and stated in the positive, like
> >>movable_mem= or reserve_movable_mem=?
> >
> >
> >It could but it was named this way for a reason. It was more important that
> >the administrator get the amount of memory for non-movable allocations
> >correct than movable allocations. If the size of ZONE_MOVABLE is wrong,
> >the hugepage pool may not be able to grow as large as desired. If the size
> >of memory usable of non-movable allocations is wrong, it's worse.
>
> kernelcore= has some fairly strong connotations outside the movable
> zone functionality, however.
>
> If you have a 16GB highmem machine, and you want 8GB of movable zone,
> do you say kernelcore=8GB?

Yes, but depending on the topology of memory, the kernelcore portion may not be sized exactly as you request. For example, if you have many nodes of different sizes, kernelcore may not spread evenly. Secondly, the movable zone can only use pages from the highest active zone.
To illustrate the "highest" zone problem - let's say I have a 2GB 32-bit x86 machine and I specify kernelcore=512MB, I'll really get a kernelcore of 896MB because ZONE_MOVABLE can only use HIGHMEM pages in this case.

> Does that give you the other 8GB in kernel
> addressable memory? :) What if some other functionality is introduced
> that also wants to reserve a chunk of memory? How do you distinguish
> between them?
>

Right now I wouldn't distinguish between them. So if another user reserved a portion of memory, it may be in kernelcore only, movable only or some combination thereof.

> Why not just specify in the help text that the admin should boot the
> kernel without that parameter first to check how much memory they
> have before using it... If they wanted to break the kernel by doing
> something silly, then I don't see how kernelcore is really better
> than reclaimable_mem...
>

It's simply harder to break a machine by getting kernelcore wrong than it is to get reclaimable_mem wrong. If the available memory to the machine is changed, it will not have unexpected results on the next boot with kernelcore and if you have a cluster with differing amounts of memory in each machine, it'll be easier to have one kernelcore value for all of them than unique reclaimable_mem ones.

> >>And can that option be written
> >>up in Documentation?
> >>
> >
> >
> >Documentation/kernel-parameters.txt
>
> Thanks, I didn't see the kernelcore patches.
>
>
> >>What is the status of these patches? Are they working and pretty well
> >>ready to be merged for 2.6.23?
> >>
> >
> >
> >I have not encountered problems with them in a long time. I'm re-testing
> >now
> >using 2.6.22 as a baseline but I believe they are ready for merging to
> >2.6.23.
>
> Cool. Would be nice to see them go upstream!
>

I agree. The zone at least relaxes some restrictions on sizing the hugepage pool at runtime and it's predictable.
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
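The "highest zone" clamping in Mel's 2GB example can be modelled roughly as follows. This is a minimal single-node sketch under stated assumptions, not the kernel's sizing code: the function name `split_single_node` is hypothetical, and the only rule it encodes is the one stated in the message, that ZONE_MOVABLE can take pages only from the highest zone (here, everything above 896MB of lowmem).

```python
MB = 1024 * 1024


def split_single_node(total, lowmem, kernelcore):
    """Return (effective_kernelcore, zone_movable) in bytes for one node.

    Simplified model (assumption): ZONE_MOVABLE may only take pages from
    the highest zone, i.e. the memory above `lowmem`.
    """
    lowmem = min(lowmem, total)
    # kernelcore cannot be smaller than the zones below the highest one,
    # and it cannot be larger than the memory that actually exists.
    effective = min(max(kernelcore, lowmem), total)
    return effective, total - effective


# Mel's example: 2GB 32-bit x86 with 896MB of lowmem and kernelcore=512MB.
core, movable = split_single_node(2048 * MB, 896 * MB, 512 * MB)
print(core // MB, movable // MB)  # 896 1152
```

In this model a request below the lowmem boundary is silently raised to 896MB, which is the behaviour described for the 32-bit case; multi-node spreading is deliberately left out.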
* Re: zone movable patches comments 2007-07-09 13:21 ` Mel Gorman @ 2007-07-10 7:57 ` Nick Piggin 2007-07-10 9:21 ` Andy Whitcroft 2007-07-10 9:51 ` Mel Gorman 2007-07-10 9:08 ` KAMEZAWA Hiroyuki 1 sibling, 2 replies; 19+ messages in thread From: Nick Piggin @ 2007-07-10 7:57 UTC (permalink / raw) To: Mel Gorman; +Cc: Linux Memory Management, Andrew Morton, kamezawa.hiroyu Mel Gorman wrote: > On (09/07/07 22:15), Nick Piggin didst pronounce: > >>Mel Gorman wrote: >>kernelcore= has some fairly strong connotations outside the movable >>zone functionality, however. >> >>If you have a 16GB highmem machine, and you want 8GB of movable zone, >>do you say kernelcore=8GB? > > > Yes but depending the topology of memory, the kernelcore portion may not > be sized exactly as you request. For example, if you have many nodes of > different sizes, kernelcore may not spread evently. Secondly, the movable > zone can only use pages from the highest active zone. To illustrate the > "highest" zone problem - lets say I have a 2GB 32 bit x86 machine and I > specify kernelcore=512MB, I'll really get a kernelcore of 896MB because > ZONE_MOVABLE can only use HIGHMEM pages in this case. kernelcore suggests some fundamental VM tunable, rather than just a random shot in the dark that roughly relates to the amount of memory you want to reserve for your movable zone. >>Does that give you the other 8GB in kernel >>addressable memory? :) What if some other functionality is introduced >>that also wants to reserve a chunk of memory? How do you distinguish >>between them? >> > > > Right now I wouldn't distinguish between them. So if another user > reserved a portion of memory, it may be in kernelcore only, movable only > or some combination thereof. Does not seem very future proof. >>Why not just specify in the help text that the admin should boot the >>kernel without that parameter first to check how much memory they >>have before using it... 
If they wanted to break the kernel by doing
>>something silly, then I don't see how kernelcore is really better
>>than reclaimable_mem...
>>
>
>
> It's simply harder to break a machine by getting kernelcore wrong than
> it is to get reclaimable_mem wrong. If the available memory to the
> machine is changed, it will not have unexpected results on the next boot
> with kernelcore and if you have a cluster with differing amounts of
> memory in each machine, it'll be easier to have one kernelcore value for
> all of them than unique reclaimable_mem ones.

No I really don't see why kernelcore=toosmall is any better than movable_mem=toobig. And why do you think the admin knows how much memory is enough to run the kernel, or why should that be the same between different sized machines? If you have a huge machine, you need much more addressable kernel memory for the mem_map array before you even think about anything else.

Actually, it is more likely that the admin knows exactly how much memory they need to reserve (eg. for their database's shared memory segment or to hot unplug or whatever), and in that case it is much better to be able to specify movable_mem= and just be given exactly what you asked for and the kernel can be given the rest.

If somebody is playing with this parameter, they definitely know what they are doing and they are not just blindly throwing it out over their cluster because it might be a good idea.

-- 
SUSE Labs, Novell Inc.
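Nick's point that mem_map grows with the machine can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: the helper name `mem_map_bytes` is hypothetical, and the 4096-byte page size and 32-byte struct page size are assumed values that vary with architecture and kernel configuration.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def mem_map_bytes(ram_bytes, page_size=4096, struct_page_size=32):
    """Approximate mem_map size: one struct page per page frame.

    Both defaults are assumptions for illustration, not fixed kernel values.
    """
    return (ram_bytes // page_size) * struct_page_size


for ram in (1 * GiB, 16 * GiB):
    print(ram // GiB, "GiB RAM ->", mem_map_bytes(ram) // MiB, "MiB of mem_map")
```

Under these assumptions a 16GB machine needs sixteen times the mem_map of a 1GB machine, so a single kernelcore value that suits the small machine may be too tight for the large one.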
* Re: zone movable patches comments 2007-07-10 7:57 ` Nick Piggin @ 2007-07-10 9:21 ` Andy Whitcroft 2007-07-10 9:54 ` Yasunori Goto 2007-07-10 9:51 ` Mel Gorman 1 sibling, 1 reply; 19+ messages in thread From: Andy Whitcroft @ 2007-07-10 9:21 UTC (permalink / raw) To: Nick Piggin Cc: Mel Gorman, Linux Memory Management, Andrew Morton, kamezawa.hiroyu Nick Piggin wrote: > Mel Gorman wrote: >> On (09/07/07 22:15), Nick Piggin didst pronounce: >> >>> Mel Gorman wrote: > >>> kernelcore= has some fairly strong connotations outside the movable >>> zone functionality, however. >>> >>> If you have a 16GB highmem machine, and you want 8GB of movable zone, >>> do you say kernelcore=8GB? >> >> >> Yes but depending the topology of memory, the kernelcore portion may not >> be sized exactly as you request. For example, if you have many nodes of >> different sizes, kernelcore may not spread evently. Secondly, the movable >> zone can only use pages from the highest active zone. To illustrate the >> "highest" zone problem - lets say I have a 2GB 32 bit x86 machine and I >> specify kernelcore=512MB, I'll really get a kernelcore of 896MB because >> ZONE_MOVABLE can only use HIGHMEM pages in this case. > > kernelcore suggests some fundamental VM tunable, rather than just > a random shot in the dark that roughly relates to the amount of > memory you want to reserve for your movable zone. > > >>> Does that give you the other 8GB in kernel >>> addressable memory? :) What if some other functionality is introduced >>> that also wants to reserve a chunk of memory? How do you distinguish >>> between them? >>> >> >> >> Right now I wouldn't distinguish between them. So if another user >> reserved a portion of memory, it may be in kernelcore only, movable only >> or some combination thereof. > > Does not seem very future proof. 
> > >>> Why not just specify in the help text that the admin should boot the >>> kernel without that parameter first to check how much memory they >>> have before using it... If they wanted to break the kernel by doing >>> something silly, then I don't see how kernelcore is really better >>> than reclaimable_mem... >>> >> >> >> It's simply harder to break a machine by getting kernelcore wrong than >> it is to get reclaimable_mem wrong. If the available memory to the >> machine is changed, it will not have unexpected results on the next boot >> with kernelcore and if you have a cluster with differing amounts of >> memory in each machine, it'll be easier to have one kernelcore value for >> all of them than unique reclaimable_mem ones. > > No I really don't see why kernelcore=toosmall is any better than > movable_mem=toobig. And why do you think the admin knows how much > memory is enough to run the kernel, or why should that be the same > between different sized machines? If you have a huge machine, you > need much more addressable kernel memory for the mem_map array > before you even think about anything else. > > Actually, it is more likely that the admin knows exactly how much > memory they need to reserve (eg. for their database's shared > memory segment or to hot unplug or whatever), and in that case > it is much better to be able to specify movable_mem= and just be > given exactly what you asked for and the kernel can be given the > rest. > > If somebody is playing with this parameter, they definitely know > what they are doing and they are not just blindly throwing it out > over their cluster because it might be a good idea. It feels very much that there are two usage models. Those who know how much "kernel" memory works for them and want whatever is left usable for their small/huge page workloads, and those who know how much they need for their DB and are happy for the system to have the rest. 
Both seem like valid use cases, and both would have the same underlying implementation: a sized ZONE_MOVABLE.

How about we have two kernel options "kernelcore=" and "movable=" which would both size ZONE_MOVABLE. Both would be the minimum sizes, so the effective differences would be the rounding to whole pageblocks.

-apw
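One way to read Andy's proposal is that both options end up producing a ZONE_MOVABLE size, and they differ only in which side of the split is the guaranteed minimum. The sketch below is an interpretation under stated assumptions, not proposed kernel code: the function names and the 4MB pageblock size are hypothetical.

```python
MB = 1024 * 1024
PAGEBLOCK = 4 * MB  # hypothetical pageblock size, for illustration only


def movable_from_kernelcore(total, kernelcore, pageblock=PAGEBLOCK):
    """kernelcore= is the minimum, so ZONE_MOVABLE is rounded down."""
    spare = max(total - kernelcore, 0)
    return (spare // pageblock) * pageblock


def movable_from_movable(total, movable, pageblock=PAGEBLOCK):
    """movable= is the minimum, so ZONE_MOVABLE is rounded up.

    The result is capped at the memory present, so an oversized request
    cannot be honoured in full.
    """
    rounded = -(-movable // pageblock) * pageblock  # ceiling division
    return min(rounded, (total // pageblock) * pageblock)


total = 1026 * MB
print(movable_from_kernelcore(total, 512 * MB) // MB)  # 512
print(movable_from_movable(total, 514 * MB) // MB)     # 516
```

On a 1026MB machine, kernelcore=512MB and movable=514MB describe the same split on paper, but the first yields a 512MB zone and the second a 516MB zone, which is the pageblock rounding difference mentioned in the message.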
* Re: zone movable patches comments 2007-07-10 9:21 ` Andy Whitcroft @ 2007-07-10 9:54 ` Yasunori Goto 2007-07-10 10:12 ` Andy Whitcroft 0 siblings, 1 reply; 19+ messages in thread From: Yasunori Goto @ 2007-07-10 9:54 UTC (permalink / raw) To: Andy Whitcroft Cc: Nick Piggin, Mel Gorman, Linux Memory Management, Andrew Morton, kamezawa.hiroyu > > No I really don't see why kernelcore=toosmall is any better than > > movable_mem=toobig. And why do you think the admin knows how much > > memory is enough to run the kernel, or why should that be the same > > between different sized machines? If you have a huge machine, you > > need much more addressable kernel memory for the mem_map array > > before you even think about anything else. > > > > Actually, it is more likely that the admin knows exactly how much > > memory they need to reserve (eg. for their database's shared > > memory segment or to hot unplug or whatever), and in that case > > it is much better to be able to specify movable_mem= and just be > > given exactly what you asked for and the kernel can be given the > > rest. If hot-unplug is invoked after bootup, then movable_mem will be useful to specify removable memory size. It is true. However, if hot-add is invoked at first after bootup, movable_mem is not so useful. I think admin expects hot-add memory will be removable zone in many case, because he wish the memory for his application rather than for kernel. But, movable mem can't specify size of hot-add memory in the future. I suppose "kernelcore" is desirable for its case. > > If somebody is playing with this parameter, they definitely know > > what they are doing and they are not just blindly throwing it out > > over their cluster because it might be a good idea. > > It feels very much that there are two usage models. 
Those who know how
> much "kernel" memory works for them and want whatever is left usable for
> their small/huge page workloads, and those who know how much they need
> for their DB and are happy for the system to have the rest. Both seem
> like valid use cases, both would have the same underlying implementation
> a sized ZONE_MOVABLE.
>
> How about we have two kernel options "kernelcore=" and "movable=" which
> would both size ZONE_MOVABLE. Both would be the minimum sizes, so the
> effective differences would be the rounding to whole pageblocks.

I would like to vote for it, for the reasons mentioned above. :-)

Bye.

-- 
Yasunori Goto
* Re: zone movable patches comments 2007-07-10 9:54 ` Yasunori Goto @ 2007-07-10 10:12 ` Andy Whitcroft 0 siblings, 0 replies; 19+ messages in thread From: Andy Whitcroft @ 2007-07-10 10:12 UTC (permalink / raw) To: Yasunori Goto Cc: Nick Piggin, Mel Gorman, Linux Memory Management, Andrew Morton, kamezawa.hiroyu Yasunori Goto wrote: >>> No I really don't see why kernelcore=toosmall is any better than >>> movable_mem=toobig. And why do you think the admin knows how much >>> memory is enough to run the kernel, or why should that be the same >>> between different sized machines? If you have a huge machine, you >>> need much more addressable kernel memory for the mem_map array >>> before you even think about anything else. >>> >>> Actually, it is more likely that the admin knows exactly how much >>> memory they need to reserve (eg. for their database's shared >>> memory segment or to hot unplug or whatever), and in that case >>> it is much better to be able to specify movable_mem= and just be >>> given exactly what you asked for and the kernel can be given the >>> rest. > > If hot-unplug is invoked after bootup, then movable_mem will be > useful to specify removable memory size. It is true. > > However, if hot-add is invoked at first after bootup, > movable_mem is not so useful. > I think admin expects hot-add memory will be removable zone in many > case, because he wish the memory for his application rather than > for kernel. > But, movable mem can't specify size of hot-add memory in the future. > I suppose "kernelcore" is desirable for its case. I would have expected either would interact successfully with hot-remove/hot-add. It makes sense to the administrator to say "I will be removing this much memory" movable_mem=N. For the hot-add case I would have expected a zero sized movable_mem would suffice, the new memory being added to and expanding the zone as it goes. 
I envisioned "kernelcore" and "movable_mem" (that name is nasty btw, can anyone think of a better one?) being minimums. So the expansion of ZONE_MOVABLE on hot-plug of memory fits that semantically. I think what I am saying is you really want movable_mem=, another sane use-case.

>>> If somebody is playing with this parameter, they definitely know
>>> what they are doing and they are not just blindly throwing it out
>>> over their cluster because it might be a good idea.
>> It feels very much that there are two usage models. Those who know how
>> much "kernel" memory works for them and want whatever is left usable for
>> their small/huge page workloads, and those who know how much they need
>> for their DB and are happy for the system to have the rest. Both seem
>> like valid use cases, both would have the same underlying implementation
>> a sized ZONE_MOVABLE.
>>
>> How about we have two kernel options "kernelcore=" and "movable=" which
>> would both size ZONE_MOVABLE. Both would be the minimum sizes, so the
>> effective differences would be the rounding to whole pageblocks.
>
> I would like to vote it due to above mentioned. :-)

-apw
* Re: zone movable patches comments 2007-07-10 7:57 ` Nick Piggin 2007-07-10 9:21 ` Andy Whitcroft @ 2007-07-10 9:51 ` Mel Gorman 2007-07-10 10:16 ` Nick Piggin 1 sibling, 1 reply; 19+ messages in thread From: Mel Gorman @ 2007-07-10 9:51 UTC (permalink / raw) To: Nick Piggin; +Cc: Linux Memory Management, Andrew Morton, kamezawa.hiroyu On (10/07/07 17:57), Nick Piggin didst pronounce: > Mel Gorman wrote: > >On (09/07/07 22:15), Nick Piggin didst pronounce: > > > >>Mel Gorman wrote: > > >>kernelcore= has some fairly strong connotations outside the movable > >>zone functionality, however. > >> > >>If you have a 16GB highmem machine, and you want 8GB of movable zone, > >>do you say kernelcore=8GB? > > > > > >Yes but depending the topology of memory, the kernelcore portion may not > >be sized exactly as you request. For example, if you have many nodes of > >different sizes, kernelcore may not spread evently. Secondly, the movable > >zone can only use pages from the highest active zone. To illustrate the > >"highest" zone problem - lets say I have a 2GB 32 bit x86 machine and I > >specify kernelcore=512MB, I'll really get a kernelcore of 896MB because > >ZONE_MOVABLE can only use HIGHMEM pages in this case. > > kernelcore suggests some fundamental VM tunable, rather than just > a random shot in the dark that roughly relates to the amount of > memory you want to reserve for your movable zone. > It's not a random shot in the dark. If the topology is flat, nodes are all sufficiently large or kernelcore is larger than the "lower" zones, the actual value of kernelcore will be very close to the requested value. > >>Does that give you the other 8GB in kernel > >>addressable memory? :) What if some other functionality is introduced > >>that also wants to reserve a chunk of memory? How do you distinguish > >>between them? > >> > > > > > >Right now I wouldn't distinguish between them. 
So if another user > >reserved a portion of memory, it may be in kernelcore only, movable only > >or some combination thereof. > > Does not seem very future proof. > I don't know what these future people are doing. What zone it will exist in heavily depends on when they reserve their memory. If they are reserving with the bootmem allocator, they are doing it without the awareness of where the zone boundaries and when the zones are being initialised, there is no knowledge of what pages will be free in the future so it cannot be taken into account. If they reserve the memory after the buddy allocator is initialised on the other hand, the zones will already be laid out and they can choose whether to reserve in ZONE_MOVABLE or not. It's as future proof as it can be. As I guess there would be people who did not like how zone movable was laid out in the future, all the decisions on where to place the PFN is in one place find_zone_movable_pfns_for_nodes() so it can be changed in the future. > >>Why not just specify in the help text that the admin should boot the > >>kernel without that parameter first to check how much memory they > >>have before using it... If they wanted to break the kernel by doing > >>something silly, then I don't see how kernelcore is really better > >>than reclaimable_mem... > >> > > > > > >It's simply harder to break a machine by getting kernelcore wrong than > >it is to get reclaimable_mem wrong. If the available memory to the > >machine is changed, it will not have unexpected results on the next boot > >with kernelcore and if you have a cluster with differing amounts of > >memory in each machine, it'll be easier to have one kernelcore value for > >all of them than unique reclaimable_mem ones. > > No I really don't see why kernelcore=toosmall is any better than > movable_mem=toobig. And why do you think the admin knows how much > memory is enough to run the kernel, or why should that be the same > between different sized machines? 
If you have a huge machine, you > need much more addressable kernel memory for the mem_map array > before you even think about anything else. > That's a fair point. > Actually, it is more likely that the admin knows exactly how much > memory they need to reserve (eg. for their database's shared > memory segment or to hot unplug or whatever), and in that case > it is much better to be able to specify movable_mem= and just be > given exactly what you asked for and the kernel can be given the > rest. > Ok, as Andy Whitcroft points out in another mail - there may be two use cases. The case where they know the kernel should at least have this much memory available (use kernelcore) and those who really know their requirements for the database shared memory segment or hot unplug (use movable= or something). > If somebody is playing with this parameter, they definitely know > what they are doing and they are not just blindly throwing it out > over their cluster because it might be a good idea. > Would you be happy if both options exist or do you really feel the kernelcore= option is a bad plan? -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab
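Mel's 2GB example earlier in this message can be restated as a small calculation. The sketch below is a simplified model only, assuming a single node with a flat topology in which the zones below the highest one (DMA and Normal on 32-bit x86) total 896MB. effective_kernelcore_mb() is a hypothetical helper, not a kernel function, and it ignores per-node spreading and pageblock rounding.

```c
#include <assert.h>

/*
 * Simplified model, sizes in MB. Hypothetical helper, not kernel code.
 * ZONE_MOVABLE can only take pages from the highest zone, so the memory
 * in the lower zones is always part of kernelcore.
 */
static unsigned long effective_kernelcore_mb(unsigned long requested_mb,
					     unsigned long lower_zones_mb,
					     unsigned long total_mb)
{
	unsigned long core;

	assert(lower_zones_mb <= total_mb);

	/* kernelcore cannot be smaller than the lower zones */
	core = requested_mb > lower_zones_mb ? requested_mb : lower_zones_mb;

	/* and it cannot exceed the memory that exists */
	return core < total_mb ? core : total_mb;
}
```

Under these assumptions, effective_kernelcore_mb(512, 896, 2048) gives 896, matching the kernelcore=512MB case described above, while a request of 1024 on the same machine is honoured as asked.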
* Re: zone movable patches comments 2007-07-10 9:51 ` Mel Gorman @ 2007-07-10 10:16 ` Nick Piggin 2007-07-10 10:18 ` Nick Piggin 0 siblings, 1 reply; 19+ messages in thread From: Nick Piggin @ 2007-07-10 10:16 UTC (permalink / raw) To: Mel Gorman; +Cc: Linux Memory Management, Andrew Morton, kamezawa.hiroyu Mel Gorman wrote: > On (10/07/07 17:57), Nick Piggin didst pronounce: > >>Mel Gorman wrote: >> >>>On (09/07/07 22:15), Nick Piggin didst pronounce: >>> >>> >>>>Mel Gorman wrote: >> >>>>kernelcore= has some fairly strong connotations outside the movable >>>>zone functionality, however. >>>> >>>>If you have a 16GB highmem machine, and you want 8GB of movable zone, >>>>do you say kernelcore=8GB? >>> >>> >>>Yes but depending the topology of memory, the kernelcore portion may not >>>be sized exactly as you request. For example, if you have many nodes of >>>different sizes, kernelcore may not spread evently. Secondly, the movable >>>zone can only use pages from the highest active zone. To illustrate the >>>"highest" zone problem - lets say I have a 2GB 32 bit x86 machine and I >>>specify kernelcore=512MB, I'll really get a kernelcore of 896MB because >>>ZONE_MOVABLE can only use HIGHMEM pages in this case. >> >>kernelcore suggests some fundamental VM tunable, rather than just >>a random shot in the dark that roughly relates to the amount of >>memory you want to reserve for your movable zone. >> > > > It's not a random shot in the dark. If the topology is flat, nodes are all > sufficiently large or kernelcore is larger than the "lower" zones, the actual > value of kernelcore will be very close to the requested value. OK, so if the admin knows all that, then they know what the movable_mem= parameter will do as well. >>>>Does that give you the other 8GB in kernel >>>>addressable memory? :) What if some other functionality is introduced >>>>that also wants to reserve a chunk of memory? How do you distinguish >>>>between them? 
>>>> >>> >>> >>>Right now I wouldn't distinguish between them. So if another user >>>reserved a portion of memory, it may be in kernelcore only, movable only >>>or some combination thereof. >> >>Does not seem very future proof. >> > > > I don't know what these future people are doing. Exactly! So your parameter should take the form of exactly sizing your zone for the special feature provided by that patch, and not something that is in the form "everybody *else* except this feature should use X MB". > What zone it will exist in > heavily depends on when they reserve their memory. > > If they are reserving with the bootmem allocator, they are doing it > without the awareness of where the zone boundaries and when the zones > are being initialised, there is no knowledge of what pages will be free > in the future so it cannot be taken into account. > > If they reserve the memory after the buddy allocator is initialised on > the other hand, the zones will already be laid out and they can choose > whether to reserve in ZONE_MOVABLE or not. And all that messes with the idea that kernelcore= is supposed to specify the amount of memory available for general kernel allocations. >>Actually, it is more likely that the admin knows exactly how much >>memory they need to reserve (eg. for their database's shared >>memory segment or to hot unplug or whatever), and in that case >>it is much better to be able to specify movable_mem= and just be >>given exactly what you asked for and the kernel can be given the >>rest. >> > > > Ok, as Andy Whitcroft points out in another mail - there may be two use > cases. The case where they know the kernel should at least have this > much memory available (use kernelcore) and those who really know their > requirements for the database share memory segment or hot unplug (use movable= > or something). 
> > >>If somebody is playing with this parameter, they definitely know >>what they are doing and they are not just blindly throwing it out >>over their cluster because it might be a good idea. >> > > > Would you be happy if both options exist or do really feel the > kernelcore= option is a bad plan? I'm not completely against kernelcore=, no. However I do think that should be a general parameter that exists for the core kernel. I guess it would override any other reservations and things, and it would specify the absolute minimum kernelcore. Then if you add a movable_mem= (or something -- I don't know what the exact name should be), then that would also specify the minimum movable memory, although at a lower priority to kernelcore= (and you could have the appropriate warnings and such if they cannot be satisfied). -- SUSE Labs, Novell Inc.
* Re: zone movable patches comments 2007-07-10 10:16 ` Nick Piggin @ 2007-07-10 10:18 ` Nick Piggin 2007-07-10 13:21 ` Mel Gorman 0 siblings, 1 reply; 19+ messages in thread From: Nick Piggin @ 2007-07-10 10:18 UTC (permalink / raw) To: Nick Piggin Cc: Mel Gorman, Linux Memory Management, Andrew Morton, kamezawa.hiroyu, Andy Whitcroft Nick Piggin wrote: > I'm not completely against kernelcore=, no. However I do think that > should be a general parameter that exists for the core kernel. I guess it > would override any other reservations and things, and it would specify the > absolute minimum kernelcore. > > Then if you add a movable_mem= (or something -- I don't know what the > exact name should be), then that would also specify the minimum movable > memory, although at a lower priority to kernelcore= (and you could have > the appropriate warnings and such if they cannot be satisfied). Ah yes, I now read Andy's mail and this is what he is suggesting, so yes it seems like a good idea I think. -- SUSE Labs, Novell Inc.
* Re: zone movable patches comments 2007-07-10 10:18 ` Nick Piggin @ 2007-07-10 13:21 ` Mel Gorman 2007-07-12 12:11 ` Andy Whitcroft 0 siblings, 1 reply; 19+ messages in thread From: Mel Gorman @ 2007-07-10 13:21 UTC (permalink / raw) To: Nick Piggin Cc: Linux Memory Management, Andrew Morton, kamezawa.hiroyu, Andy Whitcroft On (10/07/07 20:18), Nick Piggin didst pronounce: > Nick Piggin wrote: > > >I'm not completely against kernelcore=, no. However I do think that > >should be a general parameter that exists for the core kernel. I guess it > >would override any other reservations and things, and it would specify the > >absolute minimum kernelcore. > > > >Then if you add a movable_mem= (or something -- I don't know what the > >exact name should be), then that would also specify the minimum movable > >memory, although at a lower priority to kernelcore= (and you could have > >the appropriate warnings and such if they cannot be satisfied). > > Ah yes, I now read Andy's mail and this is what he is suggesting, so > yes it seems like a good idea I think. > *beats keyboard with stick* Does something like the following cover it? Tested on a standalone x86 and it seemed to behave as expected. ===== This patch adds a new parameter for sizing ZONE_MOVABLE called movablecore=. kernelcore is used to specify the minimum amount of memory that must be available for all allocation types. movablecore= is used to specify the minimum amount of memory that is used for migratable allocations. The amount of memory used for migratable allocations determines how large the huge page pool could be dynamically resized to at runtime for example. 
Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- Documentation/kernel-parameters.txt | 10 +++++ mm/page_alloc.c | 61 +++++++++++++++++++++++++++++++----- 2 files changed, 64 insertions(+), 7 deletions(-) diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-zonemovable/Documentation/kernel-parameters.txt linux-2.6.22-movablecore/Documentation/kernel-parameters.txt --- linux-2.6.22-zonemovable/Documentation/kernel-parameters.txt 2007-07-09 11:50:18.000000000 +0100 +++ linux-2.6.22-movablecore/Documentation/kernel-parameters.txt 2007-07-10 11:38:04.000000000 +0100 @@ -850,6 +850,16 @@ and is between 256 and 4096 characters. use the HighMem zone if it exists, and the Normal zone if it does not. + movablecore=nn[KMG] [KNL,IA-32,IA-64,PPC,X86-64] This parameter + is similar to kernelcore except it specifies the + amount of memory used for migratable allocations. + If both kernelcore and movablecore is specified, + then kernelcore will be at *least* the specified + value but may be more. If movablecore on its own + is specified, the administrator must be careful + that the amount of memory usable for all allocations + is not too small. 
+ keepinitrd [HW,ARM] kstack=N [IA-32,X86-64] Print N words from the kernel stack diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-zonemovable/mm/page_alloc.c linux-2.6.22-movablecore/mm/page_alloc.c --- linux-2.6.22-zonemovable/mm/page_alloc.c 2007-07-09 11:50:18.000000000 +0100 +++ linux-2.6.22-movablecore/mm/page_alloc.c 2007-07-10 12:31:39.000000000 +0100 @@ -137,6 +137,7 @@ static unsigned long __meminitdata dma_r unsigned long __initdata node_boundary_end_pfn[MAX_NUMNODES]; #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */ unsigned long __initdata required_kernelcore; + unsigned long __initdata required_movablecore; unsigned long __initdata zone_movable_pfn[MAX_NUMNODES]; /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ @@ -2980,6 +2981,18 @@ unsigned long __init find_max_pfn_with_a return max_pfn; } +unsigned long __init early_calculate_totalpages(void) +{ + int i; + unsigned long totalpages = 0; + + for (i = 0; i < nr_nodemap_entries; i++) + totalpages += early_node_map[i].end_pfn - + early_node_map[i].start_pfn; + + return totalpages; +} + /* * Find the PFN the Movable zone begins in each node. Kernel memory * is spread evenly between nodes as long as the nodes have enough @@ -2993,6 +3006,25 @@ void __init find_zone_movable_pfns_for_n unsigned long kernelcore_node, kernelcore_remaining; int usable_nodes = num_online_nodes(); + /* + * If movablecore was specified, calculate what size of + * kernelcore that corresponds so that memory usable for + * any allocation type is evenly spread. If both kernelcore + * and movablecore are specified, then the value of kernelcore + * will be used for required_kernelcore if it's greater than + * what movablecore would have allowed. 
+ */ + if (required_movablecore) { + unsigned long totalpages = early_calculate_totalpages(); + unsigned long corepages; + + required_movablecore = + roundup(required_movablecore, MAX_ORDER_NR_PAGES); + corepages = totalpages - required_movablecore; + + required_kernelcore = max(required_kernelcore, corepages); + } + /* If kernelcore was not specified, there is no ZONE_MOVABLE */ if (!required_kernelcore) return; @@ -3173,26 +3205,41 @@ void __init free_area_init_nodes(unsigne } } -/* - * kernelcore=size sets the amount of memory for use for allocations that - * cannot be reclaimed or migrated. - */ -static int __init cmdline_parse_kernelcore(char *p) +static int __init cmdline_parse_core(char *p, unsigned long *core) { unsigned long long coremem; if (!p) return -EINVAL; coremem = memparse(p, &p); - required_kernelcore = coremem >> PAGE_SHIFT; + *core = coremem >> PAGE_SHIFT; - /* Paranoid check that UL is enough for required_kernelcore */ + /* Paranoid check that UL is enough for the coremem value */ WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX); return 0; } +/* + * kernelcore=size sets the amount of memory for use for allocations that + * cannot be reclaimed or migrated. + */ +static int __init cmdline_parse_kernelcore(char *p) +{ + return cmdline_parse_core(p, &required_kernelcore); +} + +/* + * movablecore=size sets the amount of memory for use for allocations that + * can be reclaimed or migrated. + */ +static int __init cmdline_parse_movablecore(char *p) +{ + return cmdline_parse_core(p, &required_movablecore); +} + early_param("kernelcore", cmdline_parse_kernelcore); +early_param("movablecore", cmdline_parse_movablecore); #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
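In the patch above, cmdline_parse_core() hands the option string to memparse() and shifts the result by PAGE_SHIFT to get a page count. A rough userspace approximation of that step is sketched below. The sketch_ names are hypothetical, only the K, M and G suffixes are handled, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Parse "nn[KMG]" into bytes; userspace stand-in for memparse(). */
static unsigned long long sketch_parse_size(const char *s)
{
	char *end;
	unsigned long long val = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		break;
	default:
		break;
	}
	return val;
}

/* Convert the parsed byte count to whole pages, as the patch does. */
static unsigned long sketch_size_to_pages(const char *s)
{
	return (unsigned long)(sketch_parse_size(s) >> SKETCH_PAGE_SHIFT);
}
```

With these assumptions, movablecore=512M corresponds to sketch_size_to_pages("512M"), which is 131072 pages.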
* Re: zone movable patches comments 2007-07-10 13:21 ` Mel Gorman @ 2007-07-12 12:11 ` Andy Whitcroft 0 siblings, 0 replies; 19+ messages in thread From: Andy Whitcroft @ 2007-07-12 12:11 UTC (permalink / raw) To: Mel Gorman Cc: Nick Piggin, Linux Memory Management, Andrew Morton, kamezawa.hiroyu Mel Gorman wrote: > On (10/07/07 20:18), Nick Piggin didst pronounce: >> Nick Piggin wrote: >> >>> I'm not completely against kernelcore=, no. However I do think that >>> should be a general parameter that exists for the core kernel. I guess it >>> would override any other reservations and things, and it would specify the >>> absolute minimum kernelcore. >>> >>> Then if you add a movable_mem= (or something -- I don't know what the >>> exact name should be), then that would also specify the minimum movable >>> memory, although at a lower priority to kernelcore= (and you could have >>> the appropriate warnings and such if they cannot be satisfied). >> Ah yes, I now read Andy's mail and this is what he is suggesting, so >> yes it seems like a good idea I think. >> > > *beats keyboard with stick* > > Does something like the following cover it? Tested on a standalone x86 > and it seemed to behave as expected. > > ===== > > This patch adds a new parameter for sizing ZONE_MOVABLE called > movablecore=. kernelcore is used to specify the minimum amount of memory that > must be available for all allocation types. movablecore= is used to specify > the minimum amount of memory that is used for migratable allocations. The > amount of memory used for migratable allocations determines how large the > huge page pool could be dynamically resized to at runtime for example. 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > Documentation/kernel-parameters.txt | 10 +++++ > mm/page_alloc.c | 61 +++++++++++++++++++++++++++++++----- > 2 files changed, 64 insertions(+), 7 deletions(-) > > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-zonemovable/Documentation/kernel-parameters.txt linux-2.6.22-movablecore/Documentation/kernel-parameters.txt > --- linux-2.6.22-zonemovable/Documentation/kernel-parameters.txt 2007-07-09 11:50:18.000000000 +0100 > +++ linux-2.6.22-movablecore/Documentation/kernel-parameters.txt 2007-07-10 11:38:04.000000000 +0100 > @@ -850,6 +850,16 @@ and is between 256 and 4096 characters. > use the HighMem zone if it exists, and the Normal > zone if it does not. > > + movablecore=nn[KMG] [KNL,IA-32,IA-64,PPC,X86-64] This parameter > + is similar to kernelcore except it specifies the > + amount of memory used for migratable allocations. > + If both kernelcore and movablecore is specified, > + then kernelcore will be at *least* the specified > + value but may be more. If movablecore on its own > + is specified, the administrator must be careful > + that the amount of memory usable for all allocations > + is not too small. 
> + > keepinitrd [HW,ARM] > > kstack=N [IA-32,X86-64] Print N words from the kernel stack > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-zonemovable/mm/page_alloc.c linux-2.6.22-movablecore/mm/page_alloc.c > --- linux-2.6.22-zonemovable/mm/page_alloc.c 2007-07-09 11:50:18.000000000 +0100 > +++ linux-2.6.22-movablecore/mm/page_alloc.c 2007-07-10 12:31:39.000000000 +0100 > @@ -137,6 +137,7 @@ static unsigned long __meminitdata dma_r > unsigned long __initdata node_boundary_end_pfn[MAX_NUMNODES]; > #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */ > unsigned long __initdata required_kernelcore; > + unsigned long __initdata required_movablecore; > unsigned long __initdata zone_movable_pfn[MAX_NUMNODES]; > > /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ > @@ -2980,6 +2981,18 @@ unsigned long __init find_max_pfn_with_a > return max_pfn; > } > > +unsigned long __init early_calculate_totalpages(void) > +{ > + int i; > + unsigned long totalpages = 0; > + > + for (i = 0; i < nr_nodemap_entries; i++) > + totalpages += early_node_map[i].end_pfn - > + early_node_map[i].start_pfn; > + > + return totalpages; > +} > + > /* > * Find the PFN the Movable zone begins in each node. Kernel memory > * is spread evenly between nodes as long as the nodes have enough > @@ -2993,6 +3006,25 @@ void __init find_zone_movable_pfns_for_n > unsigned long kernelcore_node, kernelcore_remaining; > int usable_nodes = num_online_nodes(); > > + /* > + * If movablecore was specified, calculate what size of > + * kernelcore that corresponds so that memory usable for > + * any allocation type is evenly spread. If both kernelcore > + * and movablecore are specified, then the value of kernelcore > + * will be used for required_kernelcore if it's greater than > + * what movablecore would have allowed. 
> + */ > + if (required_movablecore) { > + unsigned long totalpages = early_calculate_totalpages(); > + unsigned long corepages; > + > + required_movablecore = > + roundup(required_movablecore, MAX_ORDER_NR_PAGES); This roundup is subtle. This ensures that we get at least as much MOVABLE as we requested, which is correct, but perhaps it should be mentioned in the commentary. > + corepages = totalpages - required_movablecore; > + > + required_kernelcore = max(required_kernelcore, corepages); > + } > + > /* If kernelcore was not specified, there is no ZONE_MOVABLE */ > if (!required_kernelcore) > return; > @@ -3173,26 +3205,41 @@ void __init free_area_init_nodes(unsigne > } > } > > -/* > - * kernelcore=size sets the amount of memory for use for allocations that > - * cannot be reclaimed or migrated. > - */ > -static int __init cmdline_parse_kernelcore(char *p) > +static int __init cmdline_parse_core(char *p, unsigned long *core) > { > unsigned long long coremem; > if (!p) > return -EINVAL; > > coremem = memparse(p, &p); > - required_kernelcore = coremem >> PAGE_SHIFT; > + *core = coremem >> PAGE_SHIFT; > > - /* Paranoid check that UL is enough for required_kernelcore */ > + /* Paranoid check that UL is enough for the coremem value */ > WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX); > > return 0; > } > > +/* > + * kernelcore=size sets the amount of memory for use for allocations that > + * cannot be reclaimed or migrated. > + */ > +static int __init cmdline_parse_kernelcore(char *p) > +{ > + return cmdline_parse_core(p, &required_kernelcore); > +} > + > +/* > + * movablecore=size sets the amount of memory for use for allocations that > + * can be reclaimed or migrated. 
> + */ > +static int __init cmdline_parse_movablecore(char *p) > +{ > + return cmdline_parse_core(p, &required_movablecore); > +} > + > early_param("kernelcore", cmdline_parse_kernelcore); > +early_param("movablecore", cmdline_parse_movablecore); > > #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ > Looks like a sane extension to this configurable to me. Acked-by: Andy Whitcroft <apw@shadowen.org> -apw
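The roundup Andy points at, and the way movablecore= is turned into a kernelcore value, can be tried out in userspace. This is a sketch of the logic in the quoted hunk, not the kernel code itself. SKETCH_MAX_ORDER_NR_PAGES is an illustrative value, the function names are hypothetical, and the clamp against totalpages is an addition of this sketch that the posted patch does not contain.

```c
#include <assert.h>

/* Illustrative alignment only; the kernel derives this from MAX_ORDER. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

static unsigned long round_up_pages(unsigned long pages, unsigned long align)
{
	return ((pages + align - 1) / align) * align;
}

/* Returns the kernelcore (in pages) that will actually be enforced. */
static unsigned long kernelcore_from_movablecore(unsigned long totalpages,
						 unsigned long kernelcore,
						 unsigned long movablecore)
{
	unsigned long corepages;

	if (!movablecore)
		return kernelcore;

	/* Rounding up means ZONE_MOVABLE gets at least what was asked for. */
	movablecore = round_up_pages(movablecore, SKETCH_MAX_ORDER_NR_PAGES);

	/* Not in the posted patch: avoid unsigned underflow below. */
	if (movablecore > totalpages)
		movablecore = totalpages;

	corepages = totalpages - movablecore;

	/* kernelcore= wins if it asks for more than movablecore= leaves. */
	return kernelcore > corepages ? kernelcore : corepages;
}
```

For example, with 524288 total pages and a request of 1000 movable pages, the request is rounded up to 1024 and the resulting kernelcore is 523264 pages.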
* Re: zone movable patches comments 2007-07-09 13:21 ` Mel Gorman 2007-07-10 7:57 ` Nick Piggin @ 2007-07-10 9:08 ` KAMEZAWA Hiroyuki 2007-07-10 9:48 ` Andy Whitcroft 1 sibling, 1 reply; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2007-07-10 9:08 UTC (permalink / raw) To: Mel Gorman; +Cc: Nick Piggin, Linux Memory Management, Andrew Morton On Mon, 9 Jul 2007 14:21:41 +0100 mel@skynet.ie (Mel Gorman) wrote: > I'm pretty sure it can be made look nice by changing enum zone_type to > conditionally define ZONE_MOVABLE and define __GFP_MOVABLE to be 0 when > it doesn't exist. I'll look at Kame's patch before starting in case it's > nicer. > This patch is just for sharing idea. I updated mine against 2.6.22-rc6-mm1. just confirmed my system can boot with this. Cheers, -Kame == Includes 2 feature. 1. By defining ZONE_xxx even if they are not configured, we can remove many ifdefs. Instead of #ifdef, is_configurated_zone() func is added. compiler will do enough work to inline it and remove unnecessary codes. 2. This patch makes ZONE_MOVABLE to be configurable. Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- include/linux/gfp.h | 21 ++++++------- include/linux/mmzone.h | 47 ++++++++++++++++-------------- mm/Kconfig | 10 ++++++ mm/page_alloc.c | 75 +++++++++++++++++++++++++------------------------ 4 files changed, 84 insertions(+), 69 deletions(-) Index: linux-2.6.22-rc6-mm1/include/linux/mmzone.h =================================================================== --- linux-2.6.22-rc6-mm1.orig/include/linux/mmzone.h +++ linux-2.6.22-rc6-mm1/include/linux/mmzone.h @@ -178,10 +178,33 @@ enum zone_type { */ ZONE_HIGHMEM, #endif +#ifdef CONFIG_ZONE_MOVABLE ZONE_MOVABLE, - MAX_NR_ZONES +#endif + MAX_NR_ZONES, + /* + * Number for not configured zones. 
+ */ +#ifndef CONFIG_ZONE_DMA + ZONE_DMA, +#endif +#ifndef CONFIG_ZONE_DMA32 + ZONE_DMA32, +#endif +#ifndef CONFIG_HIGHMEM + ZONE_HIGHMEM, +#endif +#ifndef CONFIG_ZONE_MOVABLE + ZONE_MOVABLE, +#endif + MAX_POSSIBLE_ZONES, }; +static inline int is_configured_zone(enum zone_type type) +{ + return (type < MAX_NR_ZONES); +} + /* * When a memory allocation must conform to specific limitations (such * as being suitable for DMA) the caller will pass in hints to the @@ -200,7 +223,7 @@ enum zone_type { + defined(CONFIG_ZONE_DMA32) \ + 1 \ + defined(CONFIG_HIGHMEM) \ - + 1 \ + + defined(CONFIG_ZONE_MOVABLE) \ ) #if __ZONE_COUNT < 2 #define ZONES_SHIFT 0 @@ -546,21 +569,13 @@ extern int movable_zone; static inline int zone_movable_is_highmem(void) { -#if defined(CONFIG_HIGHMEM) && defined(CONFIG_ARCH_POPULATES_NODE_MAP) return movable_zone == ZONE_HIGHMEM; -#else - return 0; -#endif } static inline int is_highmem_idx(enum zone_type idx) { -#ifdef CONFIG_HIGHMEM return (idx == ZONE_HIGHMEM || (idx == ZONE_MOVABLE && zone_movable_is_highmem())); -#else - return 0; -#endif } static inline int is_normal_idx(enum zone_type idx) @@ -576,13 +591,9 @@ static inline int is_normal_idx(enum zon */ static inline int is_highmem(struct zone *zone) { -#ifdef CONFIG_HIGHMEM int zone_idx = zone - zone->zone_pgdat->node_zones; return zone_idx == ZONE_HIGHMEM || (zone_idx == ZONE_MOVABLE && zone_movable_is_highmem()); -#else - return 0; -#endif } static inline int is_normal(struct zone *zone) @@ -592,20 +603,12 @@ static inline int is_normal(struct zone static inline int is_dma32(struct zone *zone) { -#ifdef CONFIG_ZONE_DMA32 return zone == zone->zone_pgdat->node_zones + ZONE_DMA32; -#else - return 0; -#endif } static inline int is_dma(struct zone *zone) { -#ifdef CONFIG_ZONE_DMA return zone == zone->zone_pgdat->node_zones + ZONE_DMA; -#else - return 0; -#endif } /* These two functions are used to setup the per zone pages min values */ Index: linux-2.6.22-rc6-mm1/include/linux/gfp.h 
===================================================================
--- linux-2.6.22-rc6-mm1.orig/include/linux/gfp.h
+++ linux-2.6.22-rc6-mm1/include/linux/gfp.h
@@ -116,21 +116,20 @@ static inline int allocflags_to_migratet

 static inline enum zone_type gfp_zone(gfp_t flags)
 {
-#ifdef CONFIG_ZONE_DMA
-	if (flags & __GFP_DMA)
+	if (is_configured_zone(ZONE_DMA) && (flags & __GFP_DMA))
 		return ZONE_DMA;
-#endif
-#ifdef CONFIG_ZONE_DMA32
-	if (flags & __GFP_DMA32)
+
+	if (is_configured_zone(ZONE_DMA32) && (flags & __GFP_DMA32))
 		return ZONE_DMA32;
-#endif
-	if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
-			(__GFP_HIGHMEM | __GFP_MOVABLE))
+
+	if (is_configured_zone(ZONE_MOVABLE) &&
+	    (flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) == (__GFP_HIGHMEM | __GFP_MOVABLE))
+
 		return ZONE_MOVABLE;
-#ifdef CONFIG_HIGHMEM
-	if (flags & __GFP_HIGHMEM)
+
+	if (is_configured_zone(ZONE_HIGHMEM) && (flags & __GFP_HIGHMEM))
 		return ZONE_HIGHMEM;
-#endif
+
 	return ZONE_NORMAL;
 }

Index: linux-2.6.22-rc6-mm1/mm/Kconfig
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/Kconfig
+++ linux-2.6.22-rc6-mm1/mm/Kconfig
@@ -112,6 +112,16 @@ config SPARSEMEM_EXTREME
 	def_bool y
 	depends on SPARSEMEM && !SPARSEMEM_STATIC

+config ZONE_MOVABLE
+	bool "Create a zone for Movable Pages"
+	depends on ARCH_POPULATES_NODE_MAP
+	help
+	  This option allows you to create a zone only for movable pages.
+	  *movable pages* means which can be target of page migration.
+	  With page migration, you will be able to do "deflag memory" and
+	  "memory unplug". You can do it with usual zones but MOVABLE zones
+	  enables page migration related stuff much easier.
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
Index: linux-2.6.22-rc6-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/page_alloc.c
+++ linux-2.6.22-rc6-mm1/mm/page_alloc.c
@@ -76,35 +76,34 @@ static void __free_pages_ok(struct page
  *
  * TBD: should special case ZONE_DMA32 machines here - in those we normally
  * don't need any ZONE_NORMAL reservation
+ * see zone_variables_init();
  */
-int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
-#ifdef CONFIG_ZONE_DMA
-	 256,
-#endif
-#ifdef CONFIG_ZONE_DMA32
-	 256,
-#endif
-#ifdef CONFIG_HIGHMEM
-	 32,
-#endif
-	 32,
-};
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];

 EXPORT_SYMBOL(totalram_pages);

-static char * const zone_names[MAX_NR_ZONES] = {
-#ifdef CONFIG_ZONE_DMA
-	 "DMA",
-#endif
-#ifdef CONFIG_ZONE_DMA32
-	 "DMA32",
-#endif
-	 "Normal",
-#ifdef CONFIG_HIGHMEM
-	 "HighMem",
-#endif
-	 "Movable",
-};
+static char *zone_names[MAX_POSSIBLE_ZONES];
+static char name_dma[] = "DMA";
+static char name_dma32[] = "DMA32";
+static char name_normal[] = "Normal";
+static char name_highmem[] = "Highmem";
+static char name_movable[] = "Movable";
+
+static inline void __init zone_variables_init(void)
+{
+	zone_names[ZONE_DMA] = name_dma;
+	zone_names[ZONE_DMA32] = name_dma32;
+	zone_names[ZONE_NORMAL] = name_normal;
+	zone_names[ZONE_HIGHMEM] = name_highmem;
+	zone_names[ZONE_MOVABLE] = name_movable;
+	if (is_configured_zone(ZONE_DMA))
+		sysctl_lowmem_reserve_ratio[ZONE_DMA] = 256;
+	if (is_configured_zone(ZONE_DMA32))
+		sysctl_lowmem_reserve_ratio[ZONE_DMA32] = 256;
+	if (is_configured_zone(ZONE_HIGHMEM))
+		sysctl_lowmem_reserve_ratio[ZONE_NORMAL] = 32;
+	/* HIGHMEM and MOVABLE have value 0 */
+}

 int min_free_kbytes = 1024;

@@ -135,8 +134,8 @@ static unsigned long __meminitdata dma_r

   static struct node_active_region __meminitdata early_node_map[MAX_ACTIVE_REGIONS];
   static int __meminitdata nr_nodemap_entries;
-  static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
-  static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+  static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_POSSIBLE_ZONES];
+  static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_POSSIBLE_ZONES];
 #ifdef CONFIG_MEMORY_HOTPLUG_RESERVE
   static unsigned long __meminitdata node_boundary_start_pfn[MAX_NUMNODES];
   static unsigned long __meminitdata node_boundary_end_pfn[MAX_NUMNODES];
@@ -1835,14 +1834,15 @@ void si_meminfo_node(struct sysinfo *val

 	val->totalram = pgdat->node_present_pages;
 	val->freeram = node_page_state(nid, NR_FREE_PAGES);
-#ifdef CONFIG_HIGHMEM
-	val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].present_pages;
-	val->freehigh = zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
+	if (is_configured_zone(ZONE_HIGHMEM)) {
+		val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].present_pages;
+		val->freehigh =
+			zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
 			NR_FREE_PAGES);
-#else
-	val->totalhigh = 0;
-	val->freehigh = 0;
-#endif
+	} else {
+		val->totalhigh = 0;
+		val->freehigh = 0;
+	}
 	val->mem_unit = PAGE_SIZE;
 }
 #endif
@@ -3487,7 +3487,6 @@ void __meminit free_area_init_node(int n
 	calculate_node_totalpages(pgdat, zones_size, zholes_size);

 	alloc_node_mem_map(pgdat);
-
 	free_area_init_core(pgdat, zones_size, zholes_size);
 }

@@ -3871,6 +3870,7 @@ void __init free_area_init_nodes(unsigne
 						early_node_map[i].end_pfn);

 	/* Initialise every node */
+	zone_variables_init();
 	setup_nr_node_ids();
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
@@ -3888,7 +3888,9 @@ static int __init cmdline_parse_kernelco
 	unsigned long long coremem;
 	if (!p)
 		return -EINVAL;
-
+	/* can we use ZONE_MOVABLE ? */
+	if (!is_configured_zone(ZONE_MOVABLE))
+		return 0;
 	coremem = memparse(p, &p);
 	required_kernelcore = coremem >> PAGE_SHIFT;

@@ -3927,6 +3929,7 @@ EXPORT_SYMBOL(contig_page_data);

 void __init free_area_init(unsigned long *zones_size)
 {
+	zone_variables_init();
 	free_area_init_node(0, NODE_DATA(0), zones_size,
 			__pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
 }
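For readers following the patch, the enum trick it relies on can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: the DEMO_* macros stand in for the kernel's CONFIG_* symbols and __GFP_* flags, and demo_gfp_zone() and demo_self_check() are hypothetical names. The point it shows is that every ZONE_xxx name always exists, and an unconfigured zone simply gets a number at or above MAX_NR_ZONES, so is_configured_zone() folds to a constant and the compiler can drop the dead branch without any #ifdef at the call site.

```c
#include <assert.h>

/* Hypothetical build configuration for this sketch: DMA and MOVABLE are
 * "configured"; DMA32 and HIGHMEM are deliberately left undefined. */
#define DEMO_ZONE_DMA 1
#define DEMO_ZONE_MOVABLE 1

enum zone_type {
#ifdef DEMO_ZONE_DMA
	ZONE_DMA,
#endif
#ifdef DEMO_ZONE_DMA32
	ZONE_DMA32,
#endif
	ZONE_NORMAL,
#ifdef DEMO_HIGHMEM
	ZONE_HIGHMEM,
#endif
#ifdef DEMO_ZONE_MOVABLE
	ZONE_MOVABLE,
#endif
	MAX_NR_ZONES,
	/* Unconfigured zones keep their names but are numbered past MAX_NR_ZONES. */
#ifndef DEMO_ZONE_DMA
	ZONE_DMA,
#endif
#ifndef DEMO_ZONE_DMA32
	ZONE_DMA32,
#endif
#ifndef DEMO_HIGHMEM
	ZONE_HIGHMEM,
#endif
#ifndef DEMO_ZONE_MOVABLE
	ZONE_MOVABLE,
#endif
	MAX_POSSIBLE_ZONES
};

/* Same test as in the patch: a zone is configured if it sorts before MAX_NR_ZONES. */
static inline int is_configured_zone(enum zone_type type)
{
	return type < MAX_NR_ZONES;
}

/* Stand-ins for the kernel's __GFP_* bits; the values are arbitrary. */
#define DEMO_GFP_DMA     0x01u
#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_DMA32   0x04u
#define DEMO_GFP_MOVABLE 0x08u

/* Mirrors the shape of the patched gfp_zone(): no #ifdef, only constant tests. */
static inline enum zone_type demo_gfp_zone(unsigned int flags)
{
	if (is_configured_zone(ZONE_DMA) && (flags & DEMO_GFP_DMA))
		return ZONE_DMA;
	if (is_configured_zone(ZONE_DMA32) && (flags & DEMO_GFP_DMA32))
		return ZONE_DMA32;
	if (is_configured_zone(ZONE_MOVABLE) &&
	    (flags & (DEMO_GFP_HIGHMEM | DEMO_GFP_MOVABLE)) ==
	    (DEMO_GFP_HIGHMEM | DEMO_GFP_MOVABLE))
		return ZONE_MOVABLE;
	if (is_configured_zone(ZONE_HIGHMEM) && (flags & DEMO_GFP_HIGHMEM))
		return ZONE_HIGHMEM;
	return ZONE_NORMAL;
}

/* Usage: requests for an unconfigured zone fall through to ZONE_NORMAL. */
static void demo_self_check(void)
{
	assert(demo_gfp_zone(DEMO_GFP_DMA) == ZONE_DMA);
	assert(demo_gfp_zone(DEMO_GFP_DMA32) == ZONE_NORMAL);
	assert(demo_gfp_zone(DEMO_GFP_HIGHMEM) == ZONE_NORMAL);
}
```

With this configuration the three configured zones take values 0 to 2 and MAX_NR_ZONES is 3, so arrays indexed by any ZONE_xxx name have to be sized by MAX_POSSIBLE_ZONES, which is why the patch resizes zone_names and the arch_zone_*_possible_pfn arrays.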
* Re: zone movable patches comments
  2007-07-10  9:08 ` KAMEZAWA Hiroyuki
@ 2007-07-10  9:48   ` Andy Whitcroft
  2007-07-10 11:03     ` KAMEZAWA Hiroyuki
From: Andy Whitcroft @ 2007-07-10 9:48 UTC
To: KAMEZAWA Hiroyuki
Cc: Mel Gorman, Nick Piggin, Linux Memory Management, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> On Mon, 9 Jul 2007 14:21:41 +0100
> mel@skynet.ie (Mel Gorman) wrote:
>> I'm pretty sure it can be made look nice by changing enum zone_type to
>> conditionally define ZONE_MOVABLE and define __GFP_MOVABLE to be 0 when
>> it doesn't exist. I'll look at Kame's patch before starting in case it's
>> nicer.
>>
> This patch is just for sharing idea. I updated mine against 2.6.22-rc6-mm1.
> just confirmed my system can boot with this.
>
> Cheers,
> -Kame
> ==
> Includes 2 feature.
>
> 1. By defining ZONE_xxx even if they are not configured, we can remove many
>    ifdefs.
>    Instead of #ifdef, is_configurated_zone() func is added.
>    compiler will do enough work to inline it and remove unnecessary codes.
>
> 2. This patch makes ZONE_MOVABLE to be configurable.
>
> Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

A nice little trick moving the 'unused' zones after MAX_NR_ZONES. A few
thoughts below, but generally it looks very promising. Lots of nasty
#ifdef's going away is always a cause for cheering.

> ---
>  include/linux/gfp.h    |   21 ++++++-------
>  include/linux/mmzone.h |   47 ++++++++++++++++--------------
>  mm/Kconfig             |   10 ++++++
>  mm/page_alloc.c        |   75 +++++++++++++++++++++++++------------------------
>  4 files changed, 84 insertions(+), 69 deletions(-)
>
> Index: linux-2.6.22-rc6-mm1/include/linux/mmzone.h
> ===================================================================
> --- linux-2.6.22-rc6-mm1.orig/include/linux/mmzone.h
> +++ linux-2.6.22-rc6-mm1/include/linux/mmzone.h
> @@ -178,10 +178,33 @@ enum zone_type {
>  	 */
>  	ZONE_HIGHMEM,
>  #endif
> +#ifdef CONFIG_ZONE_MOVABLE
>  	ZONE_MOVABLE,
> -	MAX_NR_ZONES
> +#endif
> +	MAX_NR_ZONES,
> +	/*
> +	 * Number for not configured zones.
> +	 */
> +#ifndef CONFIG_ZONE_DMA
> +	ZONE_DMA,
> +#endif
> +#ifndef CONFIG_ZONE_DMA32
> +	ZONE_DMA32,
> +#endif
> +#ifndef CONFIG_HIGHMEM
> +	ZONE_HIGHMEM,
> +#endif
> +#ifndef CONFIG_ZONE_MOVABLE
> +	ZONE_MOVABLE,
> +#endif
> +	MAX_POSSIBLE_ZONES,
>  };
>
> +static inline int is_configured_zone(enum zone_type type)
> +{
> +	return (type < MAX_NR_ZONES);
> +}
> +
>  /*
>   * When a memory allocation must conform to specific limitations (such
>   * as being suitable for DMA) the caller will pass in hints to the
> @@ -200,7 +223,7 @@ enum zone_type {
>  	+ defined(CONFIG_ZONE_DMA32)	\
>  	+ 1				\
>  	+ defined(CONFIG_HIGHMEM)	\
> -	+ 1				\
> +	+ defined(CONFIG_ZONE_MOVABLE)	\
>  )
>  #if __ZONE_COUNT < 2
>  #define ZONES_SHIFT 0
> @@ -546,21 +569,13 @@ extern int movable_zone;
>
>  static inline int zone_movable_is_highmem(void)
>  {
> -#if defined(CONFIG_HIGHMEM) && defined(CONFIG_ARCH_POPULATES_NODE_MAP)
>  	return movable_zone == ZONE_HIGHMEM;
> -#else
> -	return 0;
> -#endif
>  }
>
>  static inline int is_highmem_idx(enum zone_type idx)
>  {
> -#ifdef CONFIG_HIGHMEM
>  	return (idx == ZONE_HIGHMEM ||
>  		(idx == ZONE_MOVABLE && zone_movable_is_highmem()));
> -#else
> -	return 0;
> -#endif
>  }
>
>  static inline int is_normal_idx(enum zone_type idx)
> @@ -576,13 +591,9 @@ static inline int is_normal_idx(enum zon
>   */
>  static inline int is_highmem(struct zone *zone)
>  {
> -#ifdef CONFIG_HIGHMEM
>  	int zone_idx = zone - zone->zone_pgdat->node_zones;
>  	return zone_idx == ZONE_HIGHMEM ||
>  		(zone_idx == ZONE_MOVABLE && zone_movable_is_highmem());
> -#else
> -	return 0;
> -#endif
>  }
>
>  static inline int is_normal(struct zone *zone)
> @@ -592,20 +603,12 @@ static inline int is_normal(struct zone
>
>  static inline int is_dma32(struct zone *zone)
>  {
> -#ifdef CONFIG_ZONE_DMA32
>  	return zone == zone->zone_pgdat->node_zones + ZONE_DMA32;

I would have expected all of the is_zonename() checks to include the
is_configured_zone() checks, to allow the optimiser to catch on and
elide the code.

	if (is_configured_zone(ZONE_DMA32))
		return zone == zone->zone_pgdat->node_zones + ZONE_DMA32;
	else
		return 0;

Perhaps a little helper:

static inline int zone_idx_is(int idx, int target)
{
	if (is_configured_zone(target))
		return idx == target;
	else
		return 0;
}

> -#else
> -	return 0;
> -#endif
>  }
>
>  static inline int is_dma(struct zone *zone)
>  {
> -#ifdef CONFIG_ZONE_DMA
>  	return zone == zone->zone_pgdat->node_zones + ZONE_DMA;
> -#else
> -	return 0;
> -#endif
>  }
>
>  /* These two functions are used to setup the per zone pages min values */
> Index: linux-2.6.22-rc6-mm1/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.22-rc6-mm1.orig/include/linux/gfp.h
> +++ linux-2.6.22-rc6-mm1/include/linux/gfp.h
> @@ -116,21 +116,20 @@ static inline int allocflags_to_migratet
>
>  static inline enum zone_type gfp_zone(gfp_t flags)
>  {
> -#ifdef CONFIG_ZONE_DMA
> -	if (flags & __GFP_DMA)
> +	if (is_configured_zone(ZONE_DMA) && (flags & __GFP_DMA))
>  		return ZONE_DMA;
> -#endif
> -#ifdef CONFIG_ZONE_DMA32
> -	if (flags & __GFP_DMA32)
> +
> +	if (is_configured_zone(ZONE_DMA32) && (flags & __GFP_DMA32))
>  		return ZONE_DMA32;
> -#endif
> -	if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
> -			(__GFP_HIGHMEM | __GFP_MOVABLE))
> +
> +	if (is_configured_zone(ZONE_MOVABLE) &&
> +	    (flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) == (__GFP_HIGHMEM | __GFP_MOVABLE))
> +
>  		return ZONE_MOVABLE;
> -#ifdef CONFIG_HIGHMEM
> -	if (flags & __GFP_HIGHMEM)
> +
> +	if (is_configured_zone(ZONE_HIGHMEM) && (flags & __GFP_HIGHMEM))
>  		return ZONE_HIGHMEM;
> -#endif
> +
>  	return ZONE_NORMAL;
>  }
>
> Index: linux-2.6.22-rc6-mm1/mm/Kconfig
> ===================================================================
> --- linux-2.6.22-rc6-mm1.orig/mm/Kconfig
> +++ linux-2.6.22-rc6-mm1/mm/Kconfig
> @@ -112,6 +112,16 @@ config SPARSEMEM_EXTREME
>  	def_bool y
>  	depends on SPARSEMEM && !SPARSEMEM_STATIC
>
> +config ZONE_MOVABLE
> +	bool "Create a zone for Movable Pages"
> +	depends on ARCH_POPULATES_NODE_MAP
> +	help
> +	  This option allows you to create a zone only for movable pages.
> +	  *movable pages* means which can be target of page migration.
> +	  With page migration, you will be able to do "deflag memory" and
> +	  "memory unplug". You can do it with usual zones but MOVABLE zones
> +	  enables page migration related stuff much easier.
> +
>  # eventually, we can have this option just 'select SPARSEMEM'
>  config MEMORY_HOTPLUG
>  	bool "Allow for memory hot-add"
> Index: linux-2.6.22-rc6-mm1/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.22-rc6-mm1.orig/mm/page_alloc.c
> +++ linux-2.6.22-rc6-mm1/mm/page_alloc.c
> @@ -76,35 +76,34 @@ static void __free_pages_ok(struct page
>   *
>   * TBD: should special case ZONE_DMA32 machines here - in those we normally
>   * don't need any ZONE_NORMAL reservation
> + * see zone_variables_init();
>   */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> -#ifdef CONFIG_ZONE_DMA
> -	 256,
> -#endif
> -#ifdef CONFIG_ZONE_DMA32
> -	 256,
> -#endif
> -#ifdef CONFIG_HIGHMEM
> -	 32,
> -#endif
> -	 32,
> -};
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
>
>  EXPORT_SYMBOL(totalram_pages);
>
> -static char * const zone_names[MAX_NR_ZONES] = {
> -#ifdef CONFIG_ZONE_DMA
> -	 "DMA",
> -#endif
> -#ifdef CONFIG_ZONE_DMA32
> -	 "DMA32",
> -#endif
> -	 "Normal",
> -#ifdef CONFIG_HIGHMEM
> -	 "HighMem",
> -#endif
> -	 "Movable",
> -};
> +static char *zone_names[MAX_POSSIBLE_ZONES];
> +static char name_dma[] = "DMA";
> +static char name_dma32[] = "DMA32";
> +static char name_normal[] = "Normal";
> +static char name_highmem[] = "Highmem";
> +static char name_movable[] = "Movable";
> +
> +static inline void __init zone_variables_init(void)
> +{
> +	zone_names[ZONE_DMA] = name_dma;
> +	zone_names[ZONE_DMA32] = name_dma32;
> +	zone_names[ZONE_NORMAL] = name_normal;
> +	zone_names[ZONE_HIGHMEM] = name_highmem;
> +	zone_names[ZONE_MOVABLE] = name_movable;

You are able to always assign these as the array is sized on
MAX_POSSIBLE_ZONES, so I would have thought that these could be
statically initialised, right?

static char * const zone_names[] = {
	[ZONE_DMA] = "DMA",
	[ZONE_DMA32] = "DMA32",
	...
};

And in fact if you were to simply size sysctl_lowmem_reserve_ratio at
MAX_POSSIBLE_ZONES could you not do the same there too? Then you would
not need to introduce zone_variables_init().

int sysctl_lowmem_reserve_ratio[MAX_POSSIBLE_ZONES] = {
	[ZONE_DMA] = 256,
	[ZONE_DMA32] = 256,
	[ZONE_HIGHMEM] = 32
};

> +	if (is_configured_zone(ZONE_DMA))
> +		sysctl_lowmem_reserve_ratio[ZONE_DMA] = 256;
> +	if (is_configured_zone(ZONE_DMA32))
> +		sysctl_lowmem_reserve_ratio[ZONE_DMA32] = 256;
> +	if (is_configured_zone(ZONE_HIGHMEM))
> +		sysctl_lowmem_reserve_ratio[ZONE_NORMAL] = 32;
> +	/* HIGHMEM and MOVABLE have value 0 */
> +}
>
>  int min_free_kbytes = 1024;
>
> @@ -135,8 +134,8 @@ static unsigned long __meminitdata dma_r
>
>    static struct node_active_region __meminitdata early_node_map[MAX_ACTIVE_REGIONS];
>    static int __meminitdata nr_nodemap_entries;
> -  static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
> -  static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> +  static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_POSSIBLE_ZONES];
> +  static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_POSSIBLE_ZONES];
>  #ifdef CONFIG_MEMORY_HOTPLUG_RESERVE
>    static unsigned long __meminitdata node_boundary_start_pfn[MAX_NUMNODES];
>    static unsigned long __meminitdata node_boundary_end_pfn[MAX_NUMNODES];
> @@ -1835,14 +1834,15 @@ void si_meminfo_node(struct sysinfo *val
>
>  	val->totalram = pgdat->node_present_pages;
>  	val->freeram = node_page_state(nid, NR_FREE_PAGES);
> -#ifdef CONFIG_HIGHMEM
> -	val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].present_pages;
> -	val->freehigh = zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
> +	if (is_configured_zone(ZONE_HIGHMEM)) {
> +		val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].present_pages;
> +		val->freehigh =
> +			zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
>  			NR_FREE_PAGES);
> -#else
> -	val->totalhigh = 0;
> -	val->freehigh = 0;
> -#endif
> +	} else {
> +		val->totalhigh = 0;
> +		val->freehigh = 0;
> +	}
>  	val->mem_unit = PAGE_SIZE;
>  }
>  #endif
> @@ -3487,7 +3487,6 @@ void __meminit free_area_init_node(int n
>  	calculate_node_totalpages(pgdat, zones_size, zholes_size);
>
>  	alloc_node_mem_map(pgdat);
> -
>  	free_area_init_core(pgdat, zones_size, zholes_size);
>  }

Whitespace change.

>
> @@ -3871,6 +3870,7 @@ void __init free_area_init_nodes(unsigne
>  						early_node_map[i].end_pfn);
>
>  	/* Initialise every node */
> +	zone_variables_init();
>  	setup_nr_node_ids();
>  	for_each_online_node(nid) {
>  		pg_data_t *pgdat = NODE_DATA(nid);
> @@ -3888,7 +3888,9 @@ static int __init cmdline_parse_kernelco
>  	unsigned long long coremem;
>  	if (!p)
>  		return -EINVAL;
> -
> +	/* can we use ZONE_MOVABLE ? */
> +	if (!is_configured_zone(ZONE_MOVABLE))
> +		return 0;

Will this cause an error to the user? Probably want it to.

>  	coremem = memparse(p, &p);
>  	required_kernelcore = coremem >> PAGE_SHIFT;
>
> @@ -3927,6 +3929,7 @@ EXPORT_SYMBOL(contig_page_data);
>
>  void __init free_area_init(unsigned long *zones_size)
>  {
> +	zone_variables_init();
>  	free_area_init_node(0, NODE_DATA(0), zones_size,
>  			__pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
>  }

-apw
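The two review suggestions above (static initialisation by zone name, and a zone_idx_is() helper) can be tried out together in a small userspace sketch. It is not kernel code and rests on assumptions: the enum is written out by hand for a hypothetical configuration in which ZONE_DMA32 and ZONE_HIGHMEM are not built in, the ratio values follow the suggested initialiser rather than the posted patch, and demo_self_check() is an invented name.

```c
#include <assert.h>
#include <string.h>

/* Hand-expanded enum for a hypothetical config without DMA32 and HIGHMEM:
 * configured zones first, unconfigured ones after MAX_NR_ZONES. */
enum zone_type {
	ZONE_DMA,
	ZONE_NORMAL,
	ZONE_MOVABLE,
	MAX_NR_ZONES,
	ZONE_DMA32,
	ZONE_HIGHMEM,
	MAX_POSSIBLE_ZONES
};

static inline int is_configured_zone(enum zone_type type)
{
	return type < MAX_NR_ZONES;
}

/* The suggested helper: an index can only match a zone that is configured. */
static inline int zone_idx_is(enum zone_type idx, enum zone_type target)
{
	if (is_configured_zone(target))
		return idx == target;
	return 0;
}

/* Sized by MAX_POSSIBLE_ZONES, so every name is a valid index and the
 * table can be initialised statically with designated initialisers. */
static const char * const zone_names[MAX_POSSIBLE_ZONES] = {
	[ZONE_DMA]     = "DMA",
	[ZONE_DMA32]   = "DMA32",
	[ZONE_NORMAL]  = "Normal",
	[ZONE_HIGHMEM] = "HighMem",
	[ZONE_MOVABLE] = "Movable",
};

/* Entries not named here are zero-initialised by the language. */
static int sysctl_lowmem_reserve_ratio[MAX_POSSIBLE_ZONES] = {
	[ZONE_DMA]     = 256,
	[ZONE_DMA32]   = 256,
	[ZONE_HIGHMEM] = 32,
};

/* Usage: slots of unconfigured zones exist but are never matched. */
static void demo_self_check(void)
{
	assert(strcmp(zone_names[ZONE_DMA32], "DMA32") == 0);
	assert(sysctl_lowmem_reserve_ratio[ZONE_DMA] == 256);
	assert(zone_idx_is(ZONE_DMA, ZONE_DMA) == 1);
	assert(zone_idx_is(ZONE_DMA32, ZONE_DMA32) == 0);
}
```

The cost of this layout is a few unused array slots for the unconfigured zones; in exchange no init function has to run before the tables are usable.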
* Re: zone movable patches comments
  2007-07-10  9:48 ` Andy Whitcroft
@ 2007-07-10 11:03   ` KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki @ 2007-07-10 11:03 UTC
To: Andy Whitcroft
Cc: Mel Gorman, Nick Piggin, Linux Memory Management, Andrew Morton

On Tue, 10 Jul 2007 10:48:04 +0100
Andy Whitcroft <apw@shadowen.org> wrote:
> I would have expected all of the is_zonename() checks to include the
> is_configured_zone() checks, to allow the optimiser to catch on and
> elide the code.
>
> 	if (is_configured_zone(ZONE_DMA32))
> 		return zone == zone->zone_pgdat->node_zones + ZONE_DMA32;
> 	else
> 		return 0;
>
> Perhaps a little helper:
>
> static inline int zone_idx_is(int idx, int target)
> {
> 	if (is_configured_zone(target))
> 		return idx == target;
> 	else
> 		return 0;
> }
>
Ah, this looks nice.

> You are able to always assign these as the array is sized on
> MAX_POSSIBLE_ZONES, so I would have thought that these could be
> statically initialised, right?
>
> static char * const zone_names[] = {
> 	[ZONE_DMA] = "DMA",
> 	[ZONE_DMA32] = "DMA32",
> 	...
> };
>
>
> And in fact if you were to simply size sysctl_lowmem_reserve_ratio at
> MAX_POSSIBLE_ZONES could you not do the same there too? Then you would
> not need to introduce zone_variables_init().
>
> int sysctl_lowmem_reserve_ratio[MAX_POSSIBLE_ZONES] = {
> 	[ZONE_DMA] = 256,
> 	[ZONE_DMA32] = 256,
> 	[ZONE_HIGHMEM] = 32
> };
>
Oh, it's simpler. Thank you.

-Kame
* Re: zone movable patches comments
  2007-07-09 11:04 ` Mel Gorman
  2007-07-09 11:44   ` KAMEZAWA Hiroyuki
  2007-07-09 12:15   ` Nick Piggin
@ 2007-07-09 17:39   ` Christoph Lameter
From: Christoph Lameter @ 2007-07-09 17:39 UTC
To: Mel Gorman
Cc: Nick Piggin, Linux Memory Management, Andrew Morton

On Mon, 9 Jul 2007, Mel Gorman wrote:

> > much overhead if the zone is not populated, but there has been a fair
> > bit of work towards taking out unneeded zones.
> >
>
> It could be made configurable as zone_type already has configurable
> zones. However, as it is that would always be set on distro kernels for
> CONFIG_HUGETLB_PAGE, is there any point? It might make sense for embedded
> systems but I've received pushback from Andrew before for trying to introduce
> config options that affect the allocator before.

Well, it could be removed when we get memory compaction, right? It's only
useful to guarantee reclaimable memory in a certain region when we only
have antifrag? The more memory becomes movable, the less need for it.

> It could but it was named this way for a reason. It was more important that
> the administrator get the amount of memory for non-movable allocations
> correct than movable allocations. If the size of ZONE_MOVABLE is wrong,
> the hugepage pool may not be able to grow as large as desired. If the size
> of memory usable of non-movable allocations is wrong, it's worse.

Yeah, that causes concern. The current situation is that the huge page pool
grows until fragmentation makes it impossible to get more. If you
removed ZONE_MOVABLE then that situation would continue to exist.

The guarantee is useful as long as we do not have memory
defragmentation/compaction, because then reclaim can guarantee that the
desired number of higher order pages can be obtained through reclaiming
pages.
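The sizing trade-off in the quoted text comes down to simple arithmetic: kernelcore= fixes the amount of memory for non-movable allocations, and whatever is left can go to ZONE_MOVABLE. The sketch below shows that arithmetic in userspace C under loud assumptions: 4 KiB pages, a single node, and demo_memparse() as a simplified stand-in for the kernel's memparse() (number plus optional K/M/G suffix). The kernel also has to spread kernelcore across nodes, which this sketch ignores; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Simplified stand-in for memparse(): parse a number with an optional
 * K, M or G suffix and advance *end past what was consumed. */
static unsigned long long demo_memparse(const char *s, char **end)
{
	unsigned long long value = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		value <<= 30;
		(*end)++;
		break;
	case 'M': case 'm':
		value <<= 20;
		(*end)++;
		break;
	case 'K': case 'k':
		value <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return value;
}

/* Same conversion as in cmdline_parse_kernelcore(): bytes to pages. */
static unsigned long demo_kernelcore_pages(const char *arg)
{
	char *end;
	unsigned long long coremem = demo_memparse(arg, &end);

	return (unsigned long)(coremem >> DEMO_PAGE_SHIFT);
}

/* Single-node simplification: pages left over for movable allocations. */
static unsigned long demo_movable_pages(unsigned long total_pages,
					const char *kernelcore_arg)
{
	unsigned long required_kernelcore = demo_kernelcore_pages(kernelcore_arg);

	if (required_kernelcore >= total_pages)
		return 0;	/* nothing left for a movable zone */
	return total_pages - required_kernelcore;
}

/* Usage: 1 GiB of RAM (262144 pages) booted with kernelcore=512M. */
static void demo_self_check(void)
{
	assert(demo_kernelcore_pages("512M") == 131072);
	assert(demo_movable_pages(262144, "512M") == 131072);
}
```

An oversized kernelcore only shrinks the movable region, while an undersized one squeezes non-movable allocations, which is the asymmetry the quoted explanation gives for naming the option after the non-movable side.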
end of thread, other threads:[~2007-07-12 12:11 UTC]

Thread overview: 19+ messages
2007-07-09  7:50 zone movable patches comments Nick Piggin
2007-07-09 10:30 ` KAMEZAWA Hiroyuki
2007-07-09 11:04 ` Mel Gorman
2007-07-09 11:44 ` KAMEZAWA Hiroyuki
2007-07-09 12:15 ` Nick Piggin
2007-07-09 13:21 ` Mel Gorman
2007-07-10  7:57 ` Nick Piggin
2007-07-10  9:21 ` Andy Whitcroft
2007-07-10  9:54 ` Yasunori Goto
2007-07-10 10:12 ` Andy Whitcroft
2007-07-10  9:51 ` Mel Gorman
2007-07-10 10:16 ` Nick Piggin
2007-07-10 10:18 ` Nick Piggin
2007-07-10 13:21 ` Mel Gorman
2007-07-12 12:11 ` Andy Whitcroft
2007-07-10  9:08 ` KAMEZAWA Hiroyuki
2007-07-10  9:48 ` Andy Whitcroft
2007-07-10 11:03 ` KAMEZAWA Hiroyuki
2007-07-09 17:39 ` Christoph Lameter