linux-mm.kvack.org archive mirror
* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
       [not found] <20020429202446.A2326@in.ibm.com>
@ 2002-04-29 17:40 ` Eric W. Biederman
  2002-04-30  5:31   ` Suparna Bhattacharya
  0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2002-04-29 17:40 UTC (permalink / raw)
  To: suparna; +Cc: linux-kernel, marcelo, linux-mm

Suparna Bhattacharya <suparna@in.ibm.com> writes:

> The call to set_page_count(page, 1) in page_alloc.c appears to happen 
> only for the first page, for order 1 and higher allocations.
> This leaves the count for the rest of the pages in that block 
> uninitialised.

Actually it should be zero.

This is deliberate because high order pages should not be referenced by
their partial pages.  It might make sense to add a PG_large flag and
then in the immediately following struct page add a pointer to the next
page, so you can identify these pages by inspection.  Doing something
similar to the PG_skip flag.

Beyond that I get nervous, that people will treat it as endorsement of
doing a high order continuous allocation and then fragmenting the page.

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-04-29 17:40 ` [PATCH]Fix: Init page count for all pages during higher order allocs Eric W. Biederman
@ 2002-04-30  5:31   ` Suparna Bhattacharya
  2002-04-30 14:05     ` Eric W. Biederman
  2002-04-30 19:47     ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Suparna Bhattacharya @ 2002-04-30  5:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, marcelo, linux-mm

On Mon, Apr 29, 2002 at 11:40:21AM -0600, Eric W. Biederman wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> writes:
> 
> > The call to set_page_count(page, 1) in page_alloc.c appears to happen 
> > only for the first page, for order 1 and higher allocations.
> > This leaves the count for the rest of the pages in that block 
> > uninitialised.
> 
> Actually it should be zero.
> 
> This is deliberate because high order pages should not be referenced by
> their partial pages.  

That sounds reasonable provided there is a way to identify the main 
page struct corresponding to an area that's part of a higher 
order page. 

> It might make sense to add a PG_large flag and
> then in the immediately following struct page add a pointer to the next
> page, so you can identify these pages by inspection.  Doing something
> similar to the PG_skip flag.

Maybe different solutions could emerge for this in 2.4 and 2.5. 

Even a PG_partial flag for the partial pages will enable us to
traverse back to the main page, and vice-versa to determine the
partial pages covered by the main page, without any additional
pointers. Is that an acceptable option for 2.4 ? (That's one
more page flag ...)

It would be good to have a way to determine the order directly
from the page struct, without such traversals, at least in 2.5. 
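
A minimal user-space sketch of the traversal described above, under stated assumptions: `struct toy_page`, `TOY_PG_PARTIAL`, `toy_main_page()` and `toy_block_pages()` are hypothetical names for illustration, not kernel code. It assumes the flag is set on every follow-on page and never on the main page.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PG_PARTIAL (1UL << 0)   /* hypothetical flag bit */

/* Toy stand-in for struct page; not the real kernel layout. */
struct toy_page {
    unsigned long flags;
    int count;                      /* models the page reference count */
};

/* Walk back from any page of a block to its main (first) page. */
static struct toy_page *toy_main_page(struct toy_page *page)
{
    while (page->flags & TOY_PG_PARTIAL)
        page--;
    return page;
}

/* Count the pages covered by a main page: itself plus the run of
 * partial pages that immediately follows it. */
static size_t toy_block_pages(const struct toy_page *main_page,
                              const struct toy_page *map_end)
{
    const struct toy_page *p = main_page + 1;
    size_t n = 1;

    assert(!(main_page->flags & TOY_PG_PARTIAL));
    while (p < map_end && (p->flags & TOY_PG_PARTIAL)) {
        n++;
        p++;
    }
    return n;
}
```

Both directions cost a walk proportional to the block size, which is the traversal a directly stored order would avoid.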

> 
> Beyond that I get nervous, that people will treat it as endorsement of
> doing a high order continuous allocation and then fragmenting the page.

I don't think it would amount to such an endorsement. It's just a matter
of replicating the settings from the main page to the partial pages - 
which might be considered an alternate protocol, though a little 
inefficient for really high orders. However, having the partial page 
counts zeroed out probably helps as a safeguard in some situations in
view of the page count sanity checks. Or are there any scenarios where 
you foresee a problem/breakage ?

Regards
Suparna

> 
> Eric

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-04-30  5:31   ` Suparna Bhattacharya
@ 2002-04-30 14:05     ` Eric W. Biederman
  2002-04-30 15:08       ` Suparna Bhattacharya
  2002-04-30 19:47     ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2002-04-30 14:05 UTC (permalink / raw)
  To: suparna; +Cc: linux-kernel, marcelo, linux-mm

Suparna Bhattacharya <suparna@in.ibm.com> writes:

> On Mon, Apr 29, 2002 at 11:40:21AM -0600, Eric W. Biederman wrote:
> > Suparna Bhattacharya <suparna@in.ibm.com> writes:
> > 
> > > The call to set_page_count(page, 1) in page_alloc.c appears to happen 
> > > only for the first page, for order 1 and higher allocations.
> > > This leaves the count for the rest of the pages in that block 
> > > uninitialised.
> > 
> > Actually it should be zero.
> > 
> > This is deliberate because high order pages should not be referenced by
> > their partial pages.  
> 
> That sounds reasonable provided there is a way to identify the main 
> page struct corresponding to an area that's part of a higher 
> order page. 

Reasonable.  For a crash dump you are doing a physical scan through
the struct pages correct?

Actually I have a stupid question.  Given the fact that the kernel keeps
most pages in active use why is it worth checking for which pages are not
used?
 
> > It might make sense to add a PG_large flag and
> > then in the immediately following struct page add a pointer to the next
> > page, so you can identify these pages by inspection.  Doing something
> > similar to the PG_skip flag.
> 
> Maybe different solutions could emerge for this in 2.4 and 2.5. 
> 
> Even a PG_partial flag for the partial pages will enable us to
> traverse back to the main page, and vice-versa to determine the
> partial pages covered by the main page, without any additional
> pointers. Is that an acceptable option for 2.4 ? (That's one
> more page flag ...)
> 
> It would be good to have a way to determine the order directly
> from the page struct, without such traversals, at least in 2.5. 

It is important the page struct be kept small.  Especially for very rarely
used features.  I can see dedicating a bit that says get all of the information
out of the next struct page that is totally unused.  And as part of freeing
the page the first thing we do is clear that bit.  But I can't see a justification
for putting any more in the primary struct page.

> > 
> > Beyond that I get nervous, that people will treat it as endorsement of
> > doing a high order continuous allocation and then fragmenting the page.
> 
> I don't think it would amount to such an endorsement. It's just a matter
> of replicating the settings from the main page to the partial pages - 
> which might be considered an alternate protocol, though a little 
> inefficient for really high orders. However, having the partial page 
> counts zeroed out probably helps as a safeguard in some situations in
> view of the page count sanity checks. Or are there any scenarios where 
> you foresee a problem/breakage ?

Using the count on the unused page structs implies you can use them
independently.  The page count is only accurate on the initial page
struct.  The one that is used.

Eric

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-04-30 14:05     ` Eric W. Biederman
@ 2002-04-30 15:08       ` Suparna Bhattacharya
  0 siblings, 0 replies; 12+ messages in thread
From: Suparna Bhattacharya @ 2002-04-30 15:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, marcelo, linux-mm

On Tue, Apr 30, 2002 at 08:05:58AM -0600, Eric W. Biederman wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> writes:
> 
> > On Mon, Apr 29, 2002 at 11:40:21AM -0600, Eric W. Biederman wrote:
> > > Suparna Bhattacharya <suparna@in.ibm.com> writes:
> > > 
> > > > The call to set_page_count(page, 1) in page_alloc.c appears to happen 
> > > > only for the first page, for order 1 and higher allocations.
> > > > This leaves the count for the rest of the pages in that block 
> > > > uninitialised.
> > > 
> > > Actually it should be zero.
> > > 
> > > This is deliberate because high order pages should not be referenced by
> > > their partial pages.  
> > 
> > That sounds reasonable provided there is a way to identify the main 
> > page struct corresponding to an area that's part of a higher 
> > order page. 
> 
> Reasonable.  For a crash dump you are doing a physical scan through
> the struct pages correct?

Yes. 

> 
> Actually I have a stupid question.  Given the fact that the kernel keeps
> most pages in active use why is it worth checking for which pages are not
> used?

You mean slabs, mempools etc will cause pages to be in active use ? Or
you mean that page cache will be filled up if there's lots of memory ?
That probably depends on the load on the system ... if you have a 
large memory system that's not heavily loaded, then don't we expect
lots of inactive pages ?


>  
> > > It might make sense to add a PG_large flag and
> > > then in the immediately following struct page add a pointer to the next
> > > page, so you can identify these pages by inspection.  Doing something
> > > similar to the PG_skip flag.
> > 
> > Maybe different solutions could emerge for this in 2.4 and 2.5. 
> > 
> > Even a PG_partial flag for the partial pages will enable us to
> > traverse back to the main page, and vice-versa to determine the
> > partial pages covered by the main page, without any additional
> > pointers. Is that an acceptable option for 2.4 ? (That's one
> > more page flag ...)
> > 
> > It would be good to have a way to determine the order directly
> > from the page struct, without such traversals, at least in 2.5. 
> 
> It is important the page struct be kept small.  Especially for very rarely
> used features.  I can see dedicating a bit that says get all of the information
> out of the next struct page that is totally unused.  And as part of freeing
> the page the first thing we do is clear that bit.  But I can't see a justification
> for putting any more in the primary struct page.
> 
> > > 
> > > Beyond that I get nervous, that people will treat it as endorsement of
> > > doing a high order continuous allocation and then fragmenting the page.
> > 
> > I don't think it would amount to such an endorsement. It's just a matter
> > of replicating the settings from the main page to the partial pages - 
> > which might be considered an alternate protocol, though a little 
> > inefficient for really high orders. However, having the partial page 
> > counts zeroed out probably helps as a safeguard in some situations in
> > view of the page count sanity checks. Or are there any scenarios where 
> > you foresee a problem/breakage ?
> 
> Using the count on the unused page structs implies you can use them
> independently.  The page count is only accurate on the initial page
> struct.  The one that is used.

Depends on the way we look at it, and how we define the policy.
True, the page count is only accurate for the initial page struct in
terms of determining the number of references to the page, post allocation.
For the other pages, page count is only an indicator of whether it is
in use or not - the only user is the allocator, and the allocator will
only release them when the main page is released. No one else is
supposed to be using/holding those page structs directly.

To make any decisions about reference counting of the page etc, one 
should be looking at the first one, and nothing prevents that. 

> 
> Eric

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-04-30  5:31   ` Suparna Bhattacharya
  2002-04-30 14:05     ` Eric W. Biederman
@ 2002-04-30 19:47     ` Andrew Morton
  2002-05-02  8:54       ` Suparna Bhattacharya
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2002-04-30 19:47 UTC (permalink / raw)
  To: suparna; +Cc: Eric W. Biederman, linux-kernel, marcelo, linux-mm

Suparna Bhattacharya wrote:
> 
> ...
> > It might make sense to add a PG_large flag and
> > then in the immediately following struct page add a pointer to the next
> > page, so you can identify these pages by inspection.  Doing something
> > similar to the PG_skip flag.
> 
> Maybe different solutions could emerge for this in 2.4 and 2.5.
> 
> Even a PG_partial flag for the partial pages will enable us to
> traverse back to the main page, and vice-versa to determine the
> partial pages covered by the main page, without any additional
> pointers. Is that an acceptable option for 2.4 ? (That's one
> more page flag ...)
> 

I'd suggest that you go with the PG_partial thing for the
follow-on pages.

If you have a patch for crashdumps, and that patch is
included in the main kernel, and it happens to rely on the
addition of a new page flag well gee, that's a tiny change.

Plus it only affects code paths in the `order > 0' case,
which are rare.

Plus you can independently use PG_partial to detect when
someone is freeing pages from the wrong part of a higher-order
allocation - that's a feature ;)
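
A minimal sketch of that free-time check, assuming a hypothetical `TOY_PG_PARTIAL` bit and a toy page structure. How a real kernel would react (for example with BUG()) is an assumption left out here; the sketch only returns an error code.

```c
#include <assert.h>

#define TOY_PG_PARTIAL (1UL << 0)   /* hypothetical flag bit */

/* Toy stand-in for struct page; not the real kernel layout. */
struct toy_page {
    unsigned long flags;
    int count;
};

/* Return 0 if the page may be freed as the start of a block,
 * -1 if it is a follow-on page of a higher-order allocation. */
static int toy_free_check(const struct toy_page *page)
{
    if (page->flags & TOY_PG_PARTIAL)
        return -1;
    return 0;
}
```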

An alternative is to just set PG_inuse against _all_ pages
in rmqueue(), and clear PG_inuse against all pages in
__free_pages_ok().  Which seems cleaner, and would fix other
problems, I suspect.

-

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-04-30 19:47     ` Andrew Morton
@ 2002-05-02  8:54       ` Suparna Bhattacharya
  2002-05-02 13:08         ` Hugh Dickins
  2002-05-07  7:34         ` Bharata B Rao
  0 siblings, 2 replies; 12+ messages in thread
From: Suparna Bhattacharya @ 2002-05-02  8:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Tue, Apr 30, 2002 at 12:47:24PM -0700, Andrew Morton wrote:
> Suparna Bhattacharya wrote:
> > 
> > ...
> > > It might make sense to add a PG_large flag and
> > > then in the immediately following struct page add a pointer to the next
> > > page, so you can identify these pages by inspection.  Doing something
> > > similar to the PG_skip flag.
> > 
> > Maybe different solutions could emerge for this in 2.4 and 2.5.
> > 
> > Even a PG_partial flag for the partial pages will enable us to
> > traverse back to the main page, and vice-versa to determine the
> > partial pages covered by the main page, without any additional
> > pointers. Is that an acceptable option for 2.4 ? (That's one
> > more page flag ...)
> > 
> 
> I'd suggest that you go with the PG_partial thing for the
> follow-on pages.
> 
> If you have a patch for crashdumps, and that patch is
> included in the main kernel, and it happens to rely on the
> addition of a new page flag well gee, that's a tiny change.
> 
> Plus it only affects code paths in the `order > 0' case,
> which are rare.
> 
> Plus you can independently use PG_partial to detect when
> someone is freeing pages from the wrong part of a higher-order
> allocation - that's a feature ;)

I guess the current check for page count during free should catch
this too in general. Possibly PG_partial would be more reliable 
because the page count is more susceptible to modification as it
is touched more often ... 

> 
> An alternative is to just set PG_inuse against _all_ pages
> in rmqueue(), and clear PG_inuse against all pages in
> __free_pages_ok().  Which seems cleaner, and would fix other
> problems, I suspect.

This works well for us. If no one minds the extra flag, and it
is preferable to the option of initializing page count for 
higher order pages, we'll go ahead and do this.
 
BTW, with PG_inuse, we can detect higher order pages too - ones
which are in use, but have a zero page count, i.e. PG_inuse +
zero page count == equivalent to == PG_partial. So it is possible
to locate the main page (or initial page) of the higher order
area, just as with PG_partial.

Likewise, with PG_partial, since this would be set and cleared
during alloc/free, we can figure out if a page is in use by
checking if page count is non-zero or this is a partial page,
i.e. (PG_partial || page_count > 0) == equivalent to == PG_inuse.

We can take any one way and define appropriate macros to
get both effects.
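
A minimal sketch of such macros, assuming a toy page structure and hypothetical `TOY_PG_*` bits. Only one of the two flags would actually be stored in a given scheme; the other state is derived from it and the page count.

```c
#include <assert.h>

#define TOY_PG_INUSE   (1UL << 0)   /* hypothetical flag bits */
#define TOY_PG_PARTIAL (1UL << 1)

/* Toy stand-in for struct page; not the real kernel layout. */
struct toy_page {
    unsigned long flags;
    int count;
};

/* Scheme 1: PG_inuse is stored; "partial" is in use with a zero count. */
#define toy_partial_from_inuse(p) \
    ((((p)->flags & TOY_PG_INUSE) != 0) && (p)->count == 0)

/* Scheme 2: PG_partial is stored; "in use" is partial or referenced. */
#define toy_inuse_from_partial(p) \
    ((((p)->flags & TOY_PG_PARTIAL) != 0) || (p)->count > 0)
```

Either derivation relies on the page count of follow-on pages staying at zero for the lifetime of the allocation.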

Regards
Suparna

> 
> -

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-05-02  8:54       ` Suparna Bhattacharya
@ 2002-05-02 13:08         ` Hugh Dickins
  2002-05-02 21:13           ` Daniel Phillips
  2002-05-03 12:24           ` Suparna Bhattacharya
  2002-05-07  7:34         ` Bharata B Rao
  1 sibling, 2 replies; 12+ messages in thread
From: Hugh Dickins @ 2002-05-02 13:08 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Thu, 2 May 2002, Suparna Bhattacharya wrote:
[ discussion of PG_inuse / PG_partial / PG_large snipped ]

Any of those can handle that job (distinguishing non-zero orders),
but I do believe you want a further PG_ flag for crash dumps.

The pages allocated GFP_HIGHUSER are about as uninteresting
as the free pages: the cases where they're interesting (for
analyzing a kernel crash, as opposed to snooping on a crashed
customer's personal data!) are _very_ rare, but the waste of
space and time putting them in a crash dump is very often
abominable, and of course worse on larger machines.

As someone else noted in this thread, the kernel tries to keep
pages in use anyway, so omitting free pages won't buy you a great
deal on its own.  And I think it's to omit free pages that you want
to distinguish the count 0 continuations from the count 0 frees?

PG_highuser? PG_data?  Or inverses: PG_internal? PG_dumpable?
I think not PG_highuser, because it's too specific to what just
happens to be the best, but inadequate, test I've found so far.

A first guess is that pages allocated with __GFP_HIGHMEM can be
omitted from a dump, but that works out wrong on vmalloced space
and on highmem pagetables, both of which are important in a dump.
GFP_HIGHUSER test dumps vmalloced pages, and both Andrea's 2.4 and
Ingo's 2.5 highmem pagetables.  But (notably in reboot after crash:
dump copied from swap) memory can be full of GFP_USER blockdev pages.

Hugh


* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-05-02 13:08         ` Hugh Dickins
@ 2002-05-02 21:13           ` Daniel Phillips
  2002-05-03 12:24           ` Suparna Bhattacharya
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Phillips @ 2002-05-02 21:13 UTC (permalink / raw)
  To: Hugh Dickins, Suparna Bhattacharya
  Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Thursday 02 May 2002 15:08, Hugh Dickins wrote:
> On Thu, 2 May 2002, Suparna Bhattacharya wrote:
> As someone else noted in this thread, the kernel tries to keep
> pages in use anyway, so omitting free pages won't buy you a great
> deal on its own.  And I think it's to omit free pages that you want
> to distinguish the count 0 continuations from the count 0 frees?

Then why not count=-1 for the continuation pages?
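
A minimal sketch of that idea, with hypothetical names and plain integers standing in for page counts. It says nothing about how existing page count checks in the allocator would react to a negative value.

```c
#include <assert.h>

enum toy_page_state { TOY_FREE, TOY_CONTINUATION, TOY_IN_USE };

/* Mark a block of 2^order pages: count 1 on the first page,
 * -1 on every continuation page. */
static void toy_mark_block(int *counts, unsigned int order)
{
    unsigned int i, n = 1U << order;

    counts[0] = 1;
    for (i = 1; i < n; i++)
        counts[i] = -1;
}

/* Classify a page from its count alone, with no extra flag. */
static enum toy_page_state toy_classify(int count)
{
    if (count < 0)
        return TOY_CONTINUATION;
    if (count == 0)
        return TOY_FREE;
    return TOY_IN_USE;
}
```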

-- 
Daniel

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-05-02 13:08         ` Hugh Dickins
  2002-05-02 21:13           ` Daniel Phillips
@ 2002-05-03 12:24           ` Suparna Bhattacharya
  2002-05-03 13:46             ` Hugh Dickins
  1 sibling, 1 reply; 12+ messages in thread
From: Suparna Bhattacharya @ 2002-05-03 12:24 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Thu, May 02, 2002 at 02:08:50PM +0100, Hugh Dickins wrote:
> On Thu, 2 May 2002, Suparna Bhattacharya wrote:
> [ discussion of PG_inuse / PG_partial / PG_large snipped ]
> 
> Any of those can handle that job (distinguishing non-zero orders),
> but I do believe you want a further PG_ flag for crash dumps.
> 
> The pages allocated GFP_HIGHUSER are about as uninteresting
> as the free pages: the cases where they're interesting (for
> analyzing a kernel crash, as opposed to snooping on a crashed
> customer's personal data!) are _very_ rare, but the waste of
> space and time putting them in a crash dump is very often
> abominable, and of course worse on larger machines.

Well, we are working on various options to be able to dump
pages selectively, and PG_inuse is by no means the only check.
For example we have an option that tries to exclude non-kernel
pages from the dump based on a simple heuristic of checking the
PG_lru flag (actually exclude LRU pages and unreferenced pages). 
This works for vmalloc'ed pages too.
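
A minimal sketch of such a selection test, assuming a toy page structure and hypothetical `TOY_PG_*` bits; it is not the actual dump code. The first variant uses the page count as the "referenced" test; the second assumes a PG_inuse style flag as discussed elsewhere in this thread.

```c
#include <assert.h>

#define TOY_PG_LRU   (1UL << 0)   /* hypothetical flag bits */
#define TOY_PG_INUSE (1UL << 1)

/* Toy stand-in for struct page; not the real kernel layout. */
struct toy_page {
    unsigned long flags;
    int count;
};

/* Variant 1: skip LRU pages and pages with a zero count. */
static int toy_dump_by_count(const struct toy_page *page)
{
    if (page->flags & TOY_PG_LRU)
        return 0;
    return page->count > 0;
}

/* Variant 2: skip LRU pages and pages not marked in use. */
static int toy_dump_by_inuse(const struct toy_page *page)
{
    if (page->flags & TOY_PG_LRU)
        return 0;
    return (page->flags & TOY_PG_INUSE) != 0;
}
```

With the count-based variant, the follow-on pages of a higher-order allocation (count zero) look the same as free pages and are skipped; the flag-based variant keeps them.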

> 
> As someone else noted in this thread, the kernel tries to keep
> pages in use anyway, so omitting free pages won't buy you a great
> deal on its own.  And I think it's to omit free pages that you want

True, it is only when a system is very lightly loaded (plus not 
running for long) and has lots of memory that we'd expect 
many free pages. Maybe that's not a very typical situation in a 
realistic workload, but one can envision further checks that may 
be helpful. At least in a low load situation we don't want to 
confuse free pages with kernel pages (in the example I discussed 
above).

> to distinguish the count 0 continuations from the count 0 frees?
> 
> PG_highuser? PG_data?  Or inverses: PG_internal? PG_dumpable?
> I think not PG_highuser, because it's too specific to what just
> happens to be the best, but inadequate, test I've found so far.

I wouldn't want this kind of a flag to be specific to dump, but 
am really looking at little things that help with a generic page 
classification scheme that also addresses the needs for dump.
We would like dump to make its decisions based on a configured
requirement e.g. depending on the dump level, and adapt or tune
our heuristics without changing the rest of the kernel.

The flags should just indicate the nature of the page - it's up to
dump or any other kind of analyser to decide whether to pick it 
up or not. For different kinds of situations and problems one
might need more or less memory to be dumped, also possibly
depending on availability of space.

If ever we introduce anything specifically for dump, it could be
a PG_dumped indicator to help avoid dumping already dumped pages
in a multi-pass selection scheme, but that's something for later
... 

> 
> A first guess is that pages allocated with __GFP_HIGHMEM can be
> omitted from a dump, but that works out wrong on vmalloced space
> and on highmem pagetables, both of which are important in a dump.
> GFP_HIGHUSER test dumps vmalloced pages, and both Andrea's 2.4 and
> Ingo's 2.5 highmem pagetables.  But (notably in reboot after crash:
> dump copied from swap) memory can be full of GFP_USER blockdev pages.
> 
> Hugh

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-05-03 12:24           ` Suparna Bhattacharya
@ 2002-05-03 13:46             ` Hugh Dickins
  2002-05-07 10:11               ` Suparna Bhattacharya
  0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2002-05-03 13:46 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Fri, 3 May 2002, Suparna Bhattacharya wrote:
> 
> For example we have an option that tries to exclude non-kernel
> pages from the dump based on a simple heuristic of checking the
> PG_lru flag (actually exclude LRU pages and unreferenced pages). 

I hadn't thought of using PG_lru (last thought about it before
anonymous pages were put on the LRU in 2.4.14): good idea,
seems much more appealing than my extra flag for GFP_HIGHUSER.

Hugh


* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-05-02  8:54       ` Suparna Bhattacharya
  2002-05-02 13:08         ` Hugh Dickins
@ 2002-05-07  7:34         ` Bharata B Rao
  1 sibling, 0 replies; 12+ messages in thread
From: Bharata B Rao @ 2002-05-07  7:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: Suparna, Andrew Morton, Eric W. Biederman, marcelo, linux-mm

On Thu, May 02, 2002 at 02:24:41PM +0530, Suparna Bhattacharya wrote:
> On Tue, Apr 30, 2002 at 12:47:24PM -0700, Andrew Morton wrote:
> 
> > An alternative is to just set PG_inuse against _all_ pages
> > in rmqueue(), and clear PG_inuse against all pages in
> > __free_pages_ok().  Which seems cleaner, and would fix other
> > problems, I suspect.
> 
> This works well for us. If no one minds the extra flag, and it
> is preferable to the option of initializing page count for 
> higher order pages, we'll go ahead and do this.
>  

Here is a patch against 2.4.18 kernel which uses PG_inuse page flag.
As has been discussed in this thread, this flag will be used to track all 
pages (including partial higher order pages) allocated by the kernel.

Per Andi Kleen's suggestion it is preferable to use __set_bit/__clear_bit
in this case. However __clear_bit doesn't seem to be defined for all
architectures in 2.4, and hence we couldn't use it in arch
independent code as yet. A backport from 2.5 can be contemplated, but
even there definitions seem to be missing for some archs (e.g. mips).


diff -urN -X dontdiff 2418-pure/include/linux/mm.h linux-2.4.18+flg/include/linux/mm.h
--- 2418-pure/include/linux/mm.h	Fri Dec 21 23:12:03 2001
+++ linux-2.4.18+flg/include/linux/mm.h	Tue May  7 12:46:53 2002
@@ -285,6 +285,9 @@
 #define PG_arch_1		13
 #define PG_reserved		14
 #define PG_launder		15	/* written out by VM pressure.. */
+#define PG_inuse		16	/* set for all pages(including higher
+					   order partial-pages) allocated 
+					   by the kernel. */
 
 /* Make it prettier to test the above... */
 #define UnlockPage(page)	unlock_page(page)
@@ -301,6 +304,15 @@
 #define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
 #define PageLaunder(page)	test_bit(PG_launder, &(page)->flags)
 #define SetPageLaunder(page)	set_bit(PG_launder, &(page)->flags)
+
+/* 
+ * Using the non-atomic version __set_bit as per Andi Kleen's suggestion.
+ * Currently __clear_bit is not available on all architectures in 2.4.
+ */
+#define PageInuse(page)		test_bit(PG_inuse, &(page)->flags)
+#define SetPageInuse(page)	__set_bit(PG_inuse, &(page)->flags)
+/* Replace this by __clear_bit in 2.5 */
+#define ClearPageInuse(page)	clear_bit(PG_inuse, &(page)->flags)
 
 extern void FASTCALL(set_page_dirty(struct page *));
 
diff -urN -X dontdiff 2418-pure/mm/page_alloc.c linux-2.4.18+flg/mm/page_alloc.c
--- 2418-pure/mm/page_alloc.c	Tue Feb 26 01:08:14 2002
+++ linux-2.4.18+flg/mm/page_alloc.c	Thu May  2 15:16:05 2002
@@ -69,6 +69,14 @@
 	free_area_t *area;
 	struct page *base;
 	zone_t *zone;
+	unsigned int i;
+
+	i = 1UL << order;
+	page += i;
+	do {
+		page--;
+		ClearPageInuse(page);
+	} while (--i);
 
 	/* Yes, think what happens when other parts of the kernel take 
 	 * a reference to a page in order to pin it for io. -ben
@@ -181,7 +189,7 @@
 static struct page * rmqueue(zone_t *zone, unsigned int order)
 {
 	free_area_t * area = zone->free_area + order;
-	unsigned int curr_order = order;
+	unsigned int i, curr_order = order;
 	struct list_head *head, *curr;
 	unsigned long flags;
 	struct page *page;
@@ -206,6 +214,13 @@
 			page = expand(zone, page, index, order, curr_order, area);
 			spin_unlock_irqrestore(&zone->lock, flags);
 
+			i = 1UL << order;
+			page += i;
+			do {
+				page--;
+				SetPageInuse(page);
+			} while (--i);
+
 			set_page_count(page, 1);
 			if (BAD_RANGE(zone,page))
 				BUG();
@@ -236,6 +251,7 @@
 {
 	struct page * page = NULL;
 	int __freed = 0;
+	unsigned int i;
 
 	if (!(gfp_mask & __GFP_WAIT))
 		goto out;
@@ -264,9 +280,15 @@
 				if (tmp->index == order && memclass(tmp->zone, classzone)) {
 					list_del(entry);
 					current->nr_local_pages--;
-					set_page_count(tmp, 1);
-					page = tmp;
 
+					i = 1UL << order;
+					page = tmp + i;
+					do {
+						page--;
+						SetPageInuse(page);
+					} while (--i);
+
+					set_page_count(page, 1);
 					if (page->buffers)
 						BUG();
 					if (page->mapping)
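
The marking loop used in the hunks above can be modeled in user space as follows. This is a sketch under stated assumptions: `struct toy_page`, `TOY_PG_INUSE` and `toy_mark_inuse()` are hypothetical, and a plain OR stands in for the bit operation. It shows that the pointer, after being advanced past the block and stepped back once per page, ends on the first page again, which is why the later `set_page_count(page, 1)` in the patch still applies to the main page.

```c
#include <assert.h>

#define TOY_PG_INUSE (1UL << 16)    /* mirrors bit 16 in the patch */

/* Toy stand-in for struct page; not the real kernel layout. */
struct toy_page {
    unsigned long flags;
    int count;
};

/* Mark all 2^order pages of a block, walking from last to first.
 * Returns the pointer as the loop leaves it. */
static struct toy_page *toy_mark_inuse(struct toy_page *page,
                                       unsigned int order)
{
    unsigned int i = 1U << order;

    page += i;                      /* one past the last page */
    do {
        page--;
        page->flags |= TOY_PG_INUSE;
    } while (--i);

    return page;                    /* back at the first page */
}
```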

* Re: [PATCH]Fix: Init page count for all pages during higher order allocs
  2002-05-03 13:46             ` Hugh Dickins
@ 2002-05-07 10:11               ` Suparna Bhattacharya
  0 siblings, 0 replies; 12+ messages in thread
From: Suparna Bhattacharya @ 2002-05-07 10:11 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Eric W. Biederman, linux-kernel, marcelo, linux-mm

On Fri, May 03, 2002 at 02:46:34PM +0100, Hugh Dickins wrote:
> On Fri, 3 May 2002, Suparna Bhattacharya wrote:
> > 
> > For example we have an option that tries to exclude non-kernel
> > pages from the dump based on a simple heuristic of checking the
> > PG_lru flag (actually exclude LRU pages and unreferenced pages). 
> 
> I hadn't thought of using PG_lru (last thought about it before
> anonymous pages were put on the LRU in 2.4.14): good idea,

Owe that one to Andrew Morton mostly for suggesting a PG_lru 
check in the context of a way to identify Anon pages.

Regards
Suparna

end of thread, other threads:[~2002-05-07 10:11 UTC | newest]

Thread overview: 12+ messages
     [not found] <20020429202446.A2326@in.ibm.com>
2002-04-29 17:40 ` [PATCH]Fix: Init page count for all pages during higher order allocs Eric W. Biederman
2002-04-30  5:31   ` Suparna Bhattacharya
2002-04-30 14:05     ` Eric W. Biederman
2002-04-30 15:08       ` Suparna Bhattacharya
2002-04-30 19:47     ` Andrew Morton
2002-05-02  8:54       ` Suparna Bhattacharya
2002-05-02 13:08         ` Hugh Dickins
2002-05-02 21:13           ` Daniel Phillips
2002-05-03 12:24           ` Suparna Bhattacharya
2002-05-03 13:46             ` Hugh Dickins
2002-05-07 10:11               ` Suparna Bhattacharya
2002-05-07  7:34         ` Bharata B Rao
