* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Eric W. Biederman @ 2002-04-29 17:40 UTC
To: suparna; +Cc: linux-kernel, marcelo, linux-mm

Suparna Bhattacharya <suparna@in.ibm.com> writes:

> The call to set_page_count(page, 1) in page_alloc.c appears to happen
> only for the first page, for order 1 and higher allocations.
> This leaves the count for the rest of the pages in that block
> uninitialised.

Actually it should be zero.

This is deliberate because high order pages should not be referenced by
their partial pages.

It might make sense to add a PG_large flag and
then in the immediately following struct page add a pointer to the next
page, so you can identify these pages by inspection. Doing something
similar to the PG_skip flag.

Beyond that I get nervous, that people will treat it as endorsement of
doing a high order continuous allocation and then fragmenting the page.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/

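A minimal userspace sketch of the behaviour Eric describes, for readers who want to see why a count-only scan misclassifies the tail pages. Everything here is hypothetical: struct toy_page and the toy_* helpers are illustration names, not kernel code, and only the reference count is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the reference count is modelled. */
struct toy_page {
	int count;
};

/*
 * Model of the behaviour described above for a block of 2^order pages:
 * only the first struct page receives a reference, the rest stay at zero.
 */
void toy_alloc_block(struct toy_page *block, unsigned int order)
{
	size_t n = (size_t)1 << order;
	size_t i;

	for (i = 0; i < n; i++)
		block[i].count = 0;
	block[0].count = 1;
}

/* Number of struct pages in the block that a count-only scan would call free. */
size_t toy_looks_free(const struct toy_page *block, unsigned int order)
{
	size_t n = (size_t)1 << order;
	size_t i, zero = 0;

	for (i = 0; i < n; i++)
		if (block[i].count == 0)
			zero++;
	return zero;
}
```

For an order-2 block, three of the four struct pages have a zero count even though all four are allocated, which is the ambiguity a physical scan of the struct pages runs into.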
* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Suparna Bhattacharya @ 2002-04-30 5:31 UTC
To: Eric W. Biederman; +Cc: linux-kernel, marcelo, linux-mm

On Mon, Apr 29, 2002 at 11:40:21AM -0600, Eric W. Biederman wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> writes:
>
> > The call to set_page_count(page, 1) in page_alloc.c appears to happen
> > only for the first page, for order 1 and higher allocations.
> > This leaves the count for the rest of the pages in that block
> > uninitialised.
>
> Actually it should be zero.
>
> This is deliberate because high order pages should not be referenced by
> their partial pages.

That sounds reasonable provided there is a way to identify the main
page struct corresponding to an area that's part of a higher
order page.

> It might make sense to add a PG_large flag and
> then in the immediately following struct page add a pointer to the next
> page, so you can identify these pages by inspection. Doing something
> similar to the PG_skip flag.

Maybe different solutions could emerge for this in 2.4 and 2.5.

Even a PG_partial flag for the partial pages will enable us to
traverse back to the main page, and vice-versa to determine the
partial pages covered by the main page, without any additional
pointers. Is that an acceptable option for 2.4 ? (That's one
more page flag ...)

It would be good to have a way to determine the order directly
from the page struct, without such traversals, at least in 2.5.

>
> Beyond that I get nervous, that people will treat it as endorsement of
> doing a high order continuous allocation and then fragmenting the page.

I don't think it would amount to such an endorsement. It's just a matter
of replicating the settings from the main page to the partial pages -
which might be considered an alternate protocol, though a little
inefficient for really high orders. However, having the partial page
counts zeroed out probably helps as a safeguard in some situations in
view of the page count sanity checks. Or are there any scenarios where
you foresee a problem/breakage ?

Regards
Suparna

>
> Eric

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Eric W. Biederman @ 2002-04-30 14:05 UTC
To: suparna; +Cc: linux-kernel, marcelo, linux-mm

Suparna Bhattacharya <suparna@in.ibm.com> writes:

> On Mon, Apr 29, 2002 at 11:40:21AM -0600, Eric W. Biederman wrote:
> > Suparna Bhattacharya <suparna@in.ibm.com> writes:
> >
> > > The call to set_page_count(page, 1) in page_alloc.c appears to happen
> > > only for the first page, for order 1 and higher allocations.
> > > This leaves the count for the rest of the pages in that block
> > > uninitialised.
> >
> > Actually it should be zero.
> >
> > This is deliberate because high order pages should not be referenced by
> > their partial pages.
>
> That sounds reasonable provided there is a way to identify the main
> page struct corresponding to an area that's part of a higher
> order page.

Reasonable. For a crash dump you are doing a physical scan through
the struct pages correct?

Actually I have a stupid question. Given the fact that the kernel keeps
most pages in active use why is it worth checking for which pages are not
used?

> > It might make sense to add a PG_large flag and
> > then in the immediately following struct page add a pointer to the next
> > page, so you can identify these pages by inspection. Doing something
> > similar to the PG_skip flag.
>
> Maybe different solutions could emerge for this in 2.4 and 2.5.
>
> Even a PG_partial flag for the partial pages will enable us to
> traverse back to the main page, and vice-versa to determine the
> partial pages covered by the main page, without any additional
> pointers. Is that an acceptable option for 2.4 ? (That's one
> more page flag ...)
>
> It would be good to have a way to determine the order directly
> from the page struct, without such traversals, at least in 2.5.

It is important the page struct be kept small. Especially for very rarely
used features. I can see dedicating a bit that says get all of the information
out of the next struct page that is totally unused. And as part of freeing
the page the first thing we do is clear that bit. But I can't see a justification
for putting any more in the primary struct page.

> >
> > Beyond that I get nervous, that people will treat it as endorsement of
> > doing a high order continuous allocation and then fragmenting the page.
>
> I don't think it would amount to such an endorsement. It's just a matter
> of replicating the settings from the main page to the partial pages -
> which might be considered an alternate protocol, though a little
> inefficient for really high orders. However, having the partial page
> counts zeroed out probably helps as a safeguard in some situations in
> view of the page count sanity checks. Or are there any scenarios where
> you foresee a problem/breakage ?

Using the count on the unused page structs implies you can use them
independently. The page count is only accurate on the initial page
struct. The one that is used.

Eric

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Suparna Bhattacharya @ 2002-04-30 15:08 UTC
To: Eric W. Biederman; +Cc: linux-kernel, marcelo, linux-mm

On Tue, Apr 30, 2002 at 08:05:58AM -0600, Eric W. Biederman wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> writes:
>
> > On Mon, Apr 29, 2002 at 11:40:21AM -0600, Eric W. Biederman wrote:
> > > Suparna Bhattacharya <suparna@in.ibm.com> writes:
> > >
> > > > The call to set_page_count(page, 1) in page_alloc.c appears to happen
> > > > only for the first page, for order 1 and higher allocations.
> > > > This leaves the count for the rest of the pages in that block
> > > > uninitialised.
> > >
> > > Actually it should be zero.
> > >
> > > This is deliberate because high order pages should not be referenced by
> > > their partial pages.
> >
> > That sounds reasonable provided there is a way to identify the main
> > page struct corresponding to an area that's part of a higher
> > order page.
>
> Reasonable. For a crash dump you are doing a physical scan through
> the struct pages correct?

Yes.

>
> Actually I have a stupid question. Given the fact that the kernel keeps
> most pages in active use why is it worth checking for which pages are not
> used?

You mean slabs, mempools etc will cause pages to be in active use ?
Or you mean that page cache will be filled up if there's lots of memory ?

That probably depends on the load on the system ... if you have a
large memory system that's not heavily loaded, then don't we expect
lots of inactive pages ?

>
> > > It might make sense to add a PG_large flag and
> > > then in the immediately following struct page add a pointer to the next
> > > page, so you can identify these pages by inspection. Doing something
> > > similar to the PG_skip flag.
> >
> > Maybe different solutions could emerge for this in 2.4 and 2.5.
> >
> > Even a PG_partial flag for the partial pages will enable us to
> > traverse back to the main page, and vice-versa to determine the
> > partial pages covered by the main page, without any additional
> > pointers. Is that an acceptable option for 2.4 ? (That's one
> > more page flag ...)
> >
> > It would be good to have a way to determine the order directly
> > from the page struct, without such traversals, at least in 2.5.
>
> It is important the page struct be kept small. Especially for very rarely
> used features. I can see dedicating a bit that says get all of the information
> out of the next struct page that is totally unused. And as part of freeing
> the page the first thing we do is clear that bit. But I can't see a justification
> for putting any more in the primary struct page.
>
> > >
> > > Beyond that I get nervous, that people will treat it as endorsement of
> > > doing a high order continuous allocation and then fragmenting the page.
> >
> > I don't think it would amount to such an endorsement. It's just a matter
> > of replicating the settings from the main page to the partial pages -
> > which might be considered an alternate protocol, though a little
> > inefficient for really high orders. However, having the partial page
> > counts zeroed out probably helps as a safeguard in some situations in
> > view of the page count sanity checks. Or are there any scenarios where
> > you foresee a problem/breakage ?
>
> Using the count on the unused page structs implies you can use them
> independently. The page count is only accurate on the initial page
> struct. The one that is used.

Depends on the way we look at it, and how we define the policy.

True, the page count is only accurate for the initial page struct in
terms of determining the number of references to the page, post
allocation. For the other pages, page count is only an indicator of
whether it is in use or not - the only user is the allocator, and the
allocator will only release them when the main page is released.
No one else is supposed to be using/holding those page structs directly.
To make any decisions about reference counting of the page etc, one
should be looking at the first one, and nothing prevents that.

>
> Eric

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Andrew Morton @ 2002-04-30 19:47 UTC
To: suparna; +Cc: Eric W. Biederman, linux-kernel, marcelo, linux-mm

Suparna Bhattacharya wrote:
>
> ...
> > It might make sense to add a PG_large flag and
> > then in the immediately following struct page add a pointer to the next
> > page, so you can identify these pages by inspection. Doing something
> > similar to the PG_skip flag.
>
> Maybe different solutions could emerge for this in 2.4 and 2.5.
>
> Even a PG_partial flag for the partial pages will enable us to
> traverse back to the main page, and vice-versa to determine the
> partial pages covered by the main page, without any additional
> pointers. Is that an acceptable option for 2.4 ? (That's one
> more page flag ...)
>

I'd suggest that you go with the PG_partial thing for the
follow-on pages.

If you have a patch for crashdumps, and that patch is
included in the main kernel, and it happens to rely on the
addition of a new page flag well gee, that's a tiny change.

Plus it only affects code paths in the `order > 0' case,
which are rare.

Plus you can independently use PG_partial to detect when
someone is freeing pages from the wrong part of a higher-order
allocation - that's a feature ;)

An alternative is to just set PG_inuse against _all_ pages
in rmqueue(), and clear PG_inuse against all pages in
__free_pages_ok(). Which seems cleaner, and would fix other
problems, I suspect.

-

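The PG_partial idea above can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct toy_page, TOY_PG_PARTIAL and its bit position, and the toy_* helpers are hypothetical names, and the page array stands in for a contiguous run of struct pages.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PG_PARTIAL (1UL << 0)	/* hypothetical bit position */

/* Hypothetical stand-in for struct page. */
struct toy_page {
	unsigned long flags;
	int count;
};

/* Mark map[first] as the head of a 2^order block and the rest as partial. */
void toy_alloc(struct toy_page *map, size_t first, unsigned int order)
{
	size_t n = (size_t)1 << order;
	size_t i;

	map[first].flags &= ~TOY_PG_PARTIAL;
	map[first].count = 1;
	for (i = 1; i < n; i++) {
		map[first + i].flags |= TOY_PG_PARTIAL;
		map[first + i].count = 0;
	}
}

/* Walk back from any page of a block to its head page. */
size_t toy_head_index(const struct toy_page *map, size_t idx)
{
	while (idx > 0 && (map[idx].flags & TOY_PG_PARTIAL))
		idx--;
	return idx;
}

/* Walk forward from a head page to count the pages it covers. */
size_t toy_block_pages(const struct toy_page *map, size_t nr, size_t head)
{
	size_t n = 1;

	while (head + n < nr && (map[head + n].flags & TOY_PG_PARTIAL))
		n++;
	return n;
}

/* A free request is only valid when it names the head page. */
int toy_free_ok(const struct toy_page *map, size_t idx)
{
	return !(map[idx].flags & TOY_PG_PARTIAL);
}
```

The last helper corresponds to the "freeing pages from the wrong part" check: a free that names a page with the partial bit set can be rejected without consulting the count at all.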
* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Suparna Bhattacharya @ 2002-05-02 8:54 UTC
To: Andrew Morton; +Cc: Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Tue, Apr 30, 2002 at 12:47:24PM -0700, Andrew Morton wrote:
> Suparna Bhattacharya wrote:
> >
> > ...
> > > It might make sense to add a PG_large flag and
> > > then in the immediately following struct page add a pointer to the next
> > > page, so you can identify these pages by inspection. Doing something
> > > similar to the PG_skip flag.
> >
> > Maybe different solutions could emerge for this in 2.4 and 2.5.
> >
> > Even a PG_partial flag for the partial pages will enable us to
> > traverse back to the main page, and vice-versa to determine the
> > partial pages covered by the main page, without any additional
> > pointers. Is that an acceptable option for 2.4 ? (That's one
> > more page flag ...)
> >
>
> I'd suggest that you go with the PG_partial thing for the
> follow-on pages.
>
> If you have a patch for crashdumps, and that patch is
> included in the main kernel, and it happens to rely on the
> addition of a new page flag well gee, that's a tiny change.
>
> Plus it only affects code paths in the `order > 0' case,
> which are rare.
>
> Plus you can independently use PG_partial to detect when
> someone is freeing pages from the wrong part of a higher-order
> allocation - that's a feature ;)

I guess the current check for page count during free should
catch this too in general. Possibly PG_partial would be more
reliable because the page count is more susceptible to
modification as it is touched more often ...

>
> An alternative is to just set PG_inuse against _all_ pages
> in rmqueue(), and clear PG_inuse against all pages in
> __free_pages_ok(). Which seems cleaner, and would fix other
> problems, I suspect.

This works well for us. If no one minds the extra flag, and it
is preferable to the option of initializing page count for
higher order pages, we'll go ahead and do this.

BTW, with PG_inuse, we can detect higher order pages too -
ones which are in use, but have a zero page count, i.e.

	PG_inuse + zero page count == equivalent to == PG_partial.

So it is possible to locate the main page (or initial page) of
the higher order area, just as with PG_partial.

Likewise, with PG_partial, since this would be set and cleared during
alloc/free, we can figure out if a page is in use by checking if
page count is non-zero or this is a partial page, i.e.

	PG_partial | page_count > 0 == equivalent to == PG_inuse.

We can take any one way and define appropriate macros to get both
effects.

Regards
Suparna

>
> -

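The first equivalence above (PG_inuse plus a zero count behaves like PG_partial) can be written out as small predicates. This is a hedged sketch, not the eventual macros: struct toy_page, TOY_PG_INUSE with its bit position, and the toy_page_* helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>

#define TOY_PG_INUSE (1UL << 0)	/* hypothetical bit position */

/* Hypothetical stand-in for struct page. */
struct toy_page {
	unsigned long flags;
	int count;
};

/* Set on every page of an allocated block, head and followers alike. */
int toy_page_inuse(const struct toy_page *p)
{
	return (p->flags & TOY_PG_INUSE) != 0;
}

/* PG_inuse + zero page count is treated as equivalent to PG_partial. */
int toy_page_partial(const struct toy_page *p)
{
	return toy_page_inuse(p) && p->count == 0;
}

/* A page without the flag is free, whatever a count-only scan would say. */
int toy_page_free(const struct toy_page *p)
{
	return !toy_page_inuse(p);
}
```

With these, a scan can separate the two zero-count cases: a follower page of a higher-order block is in use with count 0, while a genuinely free page has count 0 and no flag.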
* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Hugh Dickins @ 2002-05-02 13:08 UTC
To: Suparna Bhattacharya
Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Thu, 2 May 2002, Suparna Bhattacharya wrote:
[ discussion of PG_inuse / PG_partial / PG_large snipped ]

Any of those can handle that job (distinguishing non-0 orders),
but I do believe you want a further PG_ flag for crash dumps.

The pages allocated GFP_HIGHUSER are about as uninteresting
as the free pages: the cases where they're interesting (for
analyzing a kernel crash, as opposed to snooping on a crashed
customer's personal data!) are _very_ rare, but the waste of
space and time putting them in a crash dump is very often
abominable, and of course worse on larger machines.

As someone else noted in this thread, the kernel tries to keep
pages in use anyway, so omitting free pages won't buy you a great
deal on its own. And I think it's to omit free pages that you want
to distinguish the count 0 continuations from the count 0 frees?

PG_highuser? PG_data? Or inverses: PG_internal? PG_dumpable?
I think not PG_highuser, because it's too specific to what just
happens to be the best, but inadequate, test I've found so far.

A first guess is that pages allocated with __GFP_HIGHMEM can be
omitted from a dump, but that works out wrong on vmalloced space
and on highmem pagetables, both of which are important in a dump.
GFP_HIGHUSER test dumps vmalloced pages, and both Andrea's 2.4 or
Ingo's 2.5 highmem pagetables. But (notably in reboot after crash:
dump copied from swap) memory can be full of GFP_USER blockdev pages.

Hugh

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Daniel Phillips @ 2002-05-02 21:13 UTC
To: Hugh Dickins, Suparna Bhattacharya
Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Thursday 02 May 2002 15:08, Hugh Dickins wrote:
> On Thu, 2 May 2002, Suparna Bhattacharya wrote:
> As someone else noted in this thread, the kernel tries to keep
> pages in use anyway, so omitting free pages won't buy you a great
> deal on its own. And I think it's to omit free pages that you want
> to distinguish the count 0 continuations from the count 0 frees?

Then why not count=-1 for the continuation pages?

--
Daniel

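Daniel's suggestion amounts to a three-way classification on the count alone. A minimal sketch, assuming a sentinel of -1 and ignoring the question of how the existing page count sanity checks would have to treat that value; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical sentinel for follower pages of a higher-order block. */
#define TOY_COUNT_CONTINUATION (-1)

enum toy_kind {
	TOY_FREE,		/* count == 0 */
	TOY_REFERENCED,		/* count > 0: order-0 page or head of a block */
	TOY_CONTINUATION	/* count == -1: follower page of a block */
};

/* Classify a page from its count only, with no extra page flag. */
enum toy_kind toy_classify(int count)
{
	if (count == TOY_COUNT_CONTINUATION)
		return TOY_CONTINUATION;
	if (count > 0)
		return TOY_REFERENCED;
	return TOY_FREE;
}
```

The appeal is that no page flag is consumed; the cost, under this sketch's assumptions, is that any code which tests or modifies the count would need to be aware of the sentinel.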
* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Suparna Bhattacharya @ 2002-05-03 12:24 UTC
To: Hugh Dickins
Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Thu, May 02, 2002 at 02:08:50PM +0100, Hugh Dickins wrote:
> On Thu, 2 May 2002, Suparna Bhattacharya wrote:
> [ discussion of PG_inuse / PG_partial / PG_large snipped ]
>
> Any of those can handle that job (distinguishing non-0 orders),
> but I do believe you want a further PG_ flag for crash dumps.
>
> The pages allocated GFP_HIGHUSER are about as uninteresting
> as the free pages: the cases where they're interesting (for
> analyzing a kernel crash, as opposed to snooping on a crashed
> customer's personal data!) are _very_ rare, but the waste of
> space and time putting them in a crash dump is very often
> abominable, and of course worse on larger machines.

Well, we are working on various options to be able to dump
pages selectively, and PG_inuse is by no means the only check.

For example we have an option that tries to exclude non-kernel
pages from the dump based on a simple heuristic of checking the
PG_lru flag (actually exclude LRU pages and unreferenced pages).
This works for vmalloc'ed pages too.

>
> As someone else noted in this thread, the kernel tries to keep
> pages in use anyway, so omitting free pages won't buy you a great
> deal on its own. And I think it's to omit free pages that you want

True, it is only when a system is very lightly loaded (plus not
running for long) and has lots of memory that we'd expect many
free pages. Maybe that's not a very typical situation in a realistic
workload, but one can envision further checks that may be helpful.
At least in a low load situation we don't want to confuse free
pages with kernel pages (in the example I discussed above).

> to distinguish the count 0 continuations from the count 0 frees?
>
> PG_highuser? PG_data? Or inverses: PG_internal? PG_dumpable?
> I think not PG_highuser, because it's too specific to what just
> happens to be the best, but inadequate, test I've found so far.

I wouldn't want this kind of a flag to be specific to dump, but
am really looking at little things that help with a generic page
classification scheme that also addresses the needs for dump.

We would like dump to make its decisions based on a configured
requirement, e.g. depending on the dump level, and adapt or tune our
heuristics without changing the rest of the kernel. The flags
should just indicate the nature of the page - it's up to dump or
any other kind of analyser to decide whether to pick it up
or not. For different kinds of situations and problems one might
need more or less memory to be dumped, also possibly depending on
availability of space.

If ever we introduce anything specifically for dump, it could
be a PG_dumped indicator to help avoid dumping already dumped
pages in a multi-pass selection scheme, but that's something
for later ...

>
> A first guess is that pages allocated with __GFP_HIGHMEM can be
> omitted from a dump, but that works out wrong on vmalloced space
> and on highmem pagetables, both of which are important in a dump.
> GFP_HIGHUSER test dumps vmalloced pages, and both Andrea's 2.4 or
> Ingo's 2.5 highmem pagetables. But (notably in reboot after crash:
> dump copied from swap) memory can be full of GFP_USER blockdev pages.
>
> Hugh

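The split described above, where flags only describe the page and the dump code decides what to keep for a configured level, might look like this in outline. The three dump levels, the TOY_PG_* bit positions and toy_should_dump() are all assumptions made for illustration; the thread does not specify the actual levels or selection code.

```c
#include <assert.h>

#define TOY_PG_LRU   (1UL << 0)	/* hypothetical bit positions */
#define TOY_PG_INUSE (1UL << 1)

/* Hypothetical stand-in for struct page. */
struct toy_page {
	unsigned long flags;
	int count;
};

/* Hypothetical dump levels, from least to most selective. */
enum toy_dump_level {
	TOY_DUMP_ALL,		/* every page */
	TOY_DUMP_USED,		/* skip free pages */
	TOY_DUMP_KERNEL		/* skip free pages and LRU pages */
};

/* Policy lives here; the flags only describe what the page is. */
int toy_should_dump(const struct toy_page *p, enum toy_dump_level level)
{
	int inuse = (p->flags & TOY_PG_INUSE) != 0;
	int lru = (p->flags & TOY_PG_LRU) != 0;

	switch (level) {
	case TOY_DUMP_USED:
		return inuse;
	case TOY_DUMP_KERNEL:
		return inuse && !lru;
	case TOY_DUMP_ALL:
	default:
		return 1;
	}
}
```

Because the decision sits in one function, the heuristic can be tuned (for example to add a multi-pass "already dumped" check) without touching the allocator.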
* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Hugh Dickins @ 2002-05-03 13:46 UTC
To: Suparna Bhattacharya
Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Fri, 3 May 2002, Suparna Bhattacharya wrote:
>
> For example we have an option that tries to exclude non-kernel
> pages from the dump based on a simple heuristic of checking the
> PG_lru flag (actually exclude LRU pages and unreferenced pages).

I hadn't thought of using PG_lru (last thought about it before
anonymous pages were put on the LRU in 2.4.14): good idea,
seems much more appealing than my extra flag for GFP_HIGHUSER.

Hugh

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Suparna Bhattacharya @ 2002-05-07 10:11 UTC
To: Hugh Dickins
Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Fri, May 03, 2002 at 02:46:34PM +0100, Hugh Dickins wrote:
> On Fri, 3 May 2002, Suparna Bhattacharya wrote:
> >
> > For example we have an option that tries to exclude non-kernel
> > pages from the dump based on a simple heuristic of checking the
> > PG_lru flag (actually exclude LRU pages and unreferenced pages).
>
> I hadn't thought of using PG_lru (last thought about it before
> anonymous pages were put on the LRU in 2.4.14): good idea,

Owe that one to Andrew Morton mostly for suggesting a PG_lru check in
the context of a way to identify Anon pages.

Regards
Suparna

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
From: Bharata B Rao @ 2002-05-07 7:34 UTC
To: linux-kernel; +Cc: Suparna, Andrew Morton, Eric W. Biederman, marcelo, linux-mm

On Thu, May 02, 2002 at 02:24:41PM +0530, Suparna Bhattacharya wrote:
> On Tue, Apr 30, 2002 at 12:47:24PM -0700, Andrew Morton wrote:
>
> > An alternative is to just set PG_inuse against _all_ pages
> > in rmqueue(), and clear PG_inuse against all pages in
> > __free_pages_ok(). Which seems cleaner, and would fix other
> > problems, I suspect.
>
> This works well for us. If no one minds the extra flag, and it
> is preferable to the option of initializing page count for
> higher order pages, we'll go ahead and do this.
>

Here is a patch against 2.4.18 kernel which uses PG_inuse page flag.
As has been discussed in this thread, this flag will be used to track
all pages (including partial higher order pages) allocated by the kernel.

Per Andi Kleen's suggestion it is preferable to use __set_bit/__clear_bit
in this case. However __clear_bit doesn't seem to be defined for all
architectures in 2.4, and hence we couldn't use it in arch independent
code as yet. A backport from 2.5 can be contemplated, but even there
definitions seem to be missing for some archs (e.g. mips).

diff -urN -X dontdiff 2418-pure/include/linux/mm.h linux-2.4.18+flg/include/linux/mm.h
--- 2418-pure/include/linux/mm.h	Fri Dec 21 23:12:03 2001
+++ linux-2.4.18+flg/include/linux/mm.h	Tue May  7 12:46:53 2002
@@ -285,6 +285,9 @@
 #define PG_arch_1		13
 #define PG_reserved		14
 #define PG_launder		15	/* written out by VM pressure.. */
+#define PG_inuse		16	/* set for all pages(including higher
+					   order partial-pages) allocated
+					   by the kernel. */
 
 /* Make it prettier to test the above... */
 #define UnlockPage(page)	unlock_page(page)
@@ -301,6 +304,15 @@
 #define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
 #define PageLaunder(page)	test_bit(PG_launder, &(page)->flags)
 #define SetPageLaunder(page)	set_bit(PG_launder, &(page)->flags)
+
+/*
+ * Using the non-atomic version __set_bit as per Andi Kleen's suggestion.
+ * Currently __clear_bit is not available on all architectures in 2.4.
+ */
+#define PageInuse(page)		test_bit(PG_inuse, &(page)->flags)
+#define SetPageInuse(page)	__set_bit(PG_inuse, &(page)->flags)
+/* Replace this by __clear_bit in 2.5 */
+#define ClearPageInuse(page)	clear_bit(PG_inuse, &(page)->flags)
 
 extern void FASTCALL(set_page_dirty(struct page *));
 
diff -urN -X dontdiff 2418-pure/mm/page_alloc.c linux-2.4.18+flg/mm/page_alloc.c
--- 2418-pure/mm/page_alloc.c	Tue Feb 26 01:08:14 2002
+++ linux-2.4.18+flg/mm/page_alloc.c	Thu May  2 15:16:05 2002
@@ -69,6 +69,14 @@
 	free_area_t *area;
 	struct page *base;
 	zone_t *zone;
+	unsigned int i;
+
+	i = 1UL << order;
+	page += i;
+	do {
+		page--;
+		ClearPageInuse(page);
+	} while (--i);
 
 	/* Yes, think what happens when other parts of the kernel take
 	 * a reference to a page in order to pin it for io. -ben
@@ -181,7 +189,7 @@
 static struct page * rmqueue(zone_t *zone, unsigned int order)
 {
 	free_area_t * area = zone->free_area + order;
-	unsigned int curr_order = order;
+	unsigned int i, curr_order = order;
 	struct list_head *head, *curr;
 	unsigned long flags;
 	struct page *page;
@@ -206,6 +214,13 @@
 			page = expand(zone, page, index, order, curr_order, area);
 			spin_unlock_irqrestore(&zone->lock, flags);
 
+			i = 1UL << order;
+			page += i;
+			do {
+				page--;
+				SetPageInuse(page);
+			} while (--i);
+
 			set_page_count(page, 1);
 			if (BAD_RANGE(zone,page))
 				BUG();
@@ -236,6 +251,7 @@
 {
 	struct page * page = NULL;
 	int __freed = 0;
+	unsigned int i;
 
 	if (!(gfp_mask & __GFP_WAIT))
 		goto out;
@@ -264,9 +280,15 @@
 				if (tmp->index == order && memclass(tmp->zone, classzone)) {
 					list_del(entry);
 					current->nr_local_pages--;
-					set_page_count(tmp, 1);
-					page = tmp;
 
+					i = 1UL << order;
+					page = tmp + i;
+					do {
+						page--;
+						SetPageInuse(page);
+					} while (--i);
+
+					set_page_count(page, 1);
 					if (page->buffers)
 						BUG();
 					if (page->mapping)

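The flag-setting loop in the patch walks the block backwards, so that the pointer is back on the first page when the loop exits and the following set_page_count(page, 1) still applies to the head. A userspace sketch of just that loop shape, with hypothetical names (struct toy_page, TOY_PG_INUSE, toy_set_inuse_block) and a plain OR in place of __set_bit:

```c
#include <assert.h>

#define TOY_PG_INUSE (1UL << 0)	/* hypothetical bit position */

/* Hypothetical stand-in for struct page; only the flags word is modelled. */
struct toy_page {
	unsigned long flags;
};

/*
 * Same loop shape as in the patch: start one past the end of the
 * 2^order block, step back one page at a time, and finish on the
 * first page. Returns the pointer as it stands after the loop.
 */
struct toy_page *toy_set_inuse_block(struct toy_page *page, unsigned int order)
{
	unsigned int i = 1U << order;

	page += i;		/* one past the last page of the block */
	do {
		page--;
		page->flags |= TOY_PG_INUSE;
	} while (--i);
	return page;		/* back at the first page */
}
```

The do/while form is safe for order 0 as well, since i starts at 1 and exactly one page is flagged.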
Thread overview: 12+ messages
[not found] <20020429202446.A2326@in.ibm.com>
2002-04-29 17:40 ` [PATCH]Fix: Init page count for all pages during higher order allocs Eric W. Biederman
2002-04-30 5:31 ` Suparna Bhattacharya
2002-04-30 14:05 ` Eric W. Biederman
2002-04-30 15:08 ` Suparna Bhattacharya
2002-04-30 19:47 ` Andrew Morton
2002-05-02 8:54 ` Suparna Bhattacharya
2002-05-02 13:08 ` Hugh Dickins
2002-05-02 21:13 ` Daniel Phillips
2002-05-03 12:24 ` Suparna Bhattacharya
2002-05-03 13:46 ` Hugh Dickins
2002-05-07 10:11 ` Suparna Bhattacharya
2002-05-07 7:34 ` Bharata B Rao