linux-mm.kvack.org archive mirror
* pud_bad vs pud_bad
@ 2009-02-05 18:23 Jeremy Fitzhardinge
  2009-02-05 18:43 ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-05 18:23 UTC (permalink / raw)
  To: William Lee Irwin III, Ingo Molnar
  Cc: Linux Kernel Mailing List, Linux Memory Management List

I'm looking at unifying the 32 and 64-bit versions of pud_bad.

32-bits defines it as:

static inline int pud_bad(pud_t pud)
{
	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
}

and 64 as:

static inline int pud_bad(pud_t pud)
{
	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
}


I'm inclined to go with the 64-bit version, but I'm wondering if there's 
something subtle I'm missing here.

Thoughts?

    J

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 18:23 pud_bad vs pud_bad Jeremy Fitzhardinge
@ 2009-02-05 18:43 ` Ingo Molnar
  2009-02-05 18:54   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 18:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> I'm looking at unifying the 32 and 64-bit versions of pud_bad.
>
> 32-bits defines it as:
>
> static inline int pud_bad(pud_t pud)
> {
> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
> }
>
> and 64 as:
>
> static inline int pud_bad(pud_t pud)
> {
> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
> }
>
>
> I'm inclined to go with the 64-bit version, but I'm wondering if there's 
> something subtle I'm missing here.

Why go with the 64-bit version? The 32-bit check looks more compact and 
should result in smaller code.

	Ingo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 18:43 ` Ingo Molnar
@ 2009-02-05 18:54   ` Jeremy Fitzhardinge
  2009-02-05 19:10     ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-05 18:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> I'm looking at unifying the 32 and 64-bit versions of pud_bad.
>>
>> 32-bits defines it as:
>>
>> static inline int pud_bad(pud_t pud)
>> {
>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
>> }
>>
>> and 64 as:
>>
>> static inline int pud_bad(pud_t pud)
>> {
>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
>> }
>>
>>
>> I'm inclined to go with the 64-bit version, but I'm wondering if there's 
>> something subtle I'm missing here.
>>     
>
> Why go with the 64-bit version? The 32-bit check looks more compact and 
> should result in smaller code.
>   

Well, it's stricter.  But I don't really understand what condition it's 
actually testing for.

    J


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 18:54   ` Jeremy Fitzhardinge
@ 2009-02-05 19:10     ` Ingo Molnar
  2009-02-05 19:26       ` Jeremy Fitzhardinge
  2009-02-05 19:38       ` Hugh Dickins
  0 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 19:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>
>>   
>>> I'm looking at unifying the 32 and 64-bit versions of pud_bad.
>>>
>>> 32-bits defines it as:
>>>
>>> static inline int pud_bad(pud_t pud)
>>> {
>>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
>>> }
>>>
>>> and 64 as:
>>>
>>> static inline int pud_bad(pud_t pud)
>>> {
>>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
>>> }
>>>
>>>
>>> I'm inclined to go with the 64-bit version, but I'm wondering if 
>>> there's something subtle I'm missing here.
>>>     
>>
>> Why go with the 64-bit version? The 32-bit check looks more compact and 
>> should result in smaller code.
>>   
>
> Well, it's stricter.  But I don't really understand what condition it's
> actually testing for.

Well it tests: "beyond the bits covered by PTE_PFN|_PAGE_USER, the rest 
must only be _KERNPG_TABLE".

The _KERNPG_TABLE bits are disjoint from PTE_PFN|_PAGE_USER bits, so this 
makes sense.

But the 32-bit check does the exact same thing but via a single binary 
operation: it checks whether any bits outside of those bits are set - just 
via a simpler test that compiles to more compact code.

So i'd go with the 32-bit version. (unless there are some sign-extension 
complications i'm missing - but i think we got rid of those already.)

	Ingo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:10     ` Ingo Molnar
@ 2009-02-05 19:26       ` Jeremy Fitzhardinge
  2009-02-05 19:31         ` Ingo Molnar
  2009-02-05 19:38       ` Hugh Dickins
  1 sibling, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-05 19:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

Ingo Molnar wrote:
> But the 32-bit check does the exact same thing but via a single binary 
> operation: it checks whether any bits outside of those bits are set - just 
> via a simpler test that compiles to more compact code.
>
> So i'd go with the 32-bit version. (unless there are some sign-extension 
> complications i'm missing - but i think we got rid of those already.)

OK, fair enough.  I wouldn't be surprised if gcc does that transform 
anyway, but we may as well be consistent about it.

    J


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:26       ` Jeremy Fitzhardinge
@ 2009-02-05 19:31         ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 19:31 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> But the 32-bit check does the exact same thing but via a single binary
>> operation: it checks whether any bits outside of those bits are set -
>> just via a simpler test that compiles to more compact code.
>>
>> So i'd go with the 32-bit version. (unless there are some 
>> sign-extension complications i'm missing - but i think we got rid of 
>> those already.)
>
> OK, fair enough.  I wouldn't be surprised if gcc does that transform 
> anyway, but we may as well be consistent about it.

i checked and it doesn't - at least gcc 4.3.2 inserts an extra AND 
instruction. So the 32-bit version is really better. (beyond being more 
readable)

	Ingo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:10     ` Ingo Molnar
  2009-02-05 19:26       ` Jeremy Fitzhardinge
@ 2009-02-05 19:38       ` Hugh Dickins
  2009-02-05 19:49         ` Ingo Molnar
  2009-02-05 20:42         ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 22+ messages in thread
From: Hugh Dickins @ 2009-02-05 19:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, William Lee Irwin III,
	Linux Kernel Mailing List, Linux Memory Management List

On Thu, 5 Feb 2009, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > Ingo Molnar wrote:
> >> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >>   
> >>> I'm looking at unifying the 32 and 64-bit versions of pud_bad.
> >>>
> >>> 32-bits defines it as:
> >>>
> >>> static inline int pud_bad(pud_t pud)
> >>> {
> >>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
> >>> }
> >>>
> >>> and 64 as:
> >>>
> >>> static inline int pud_bad(pud_t pud)
> >>> {
> >>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
> >>> }
> >>>
> >>>
> >>> I'm inclined to go with the 64-bit version, but I'm wondering if 
> >>> there's something subtle I'm missing here.
> >>>     
> >>
> >> Why go with the 64-bit version? The 32-bit check looks more compact and 
> >> should result in smaller code.
> >>   
> >
> > Well, it's stricter.  But I don't really understand what condition it's
> > actually testing for.
> 
> Well it tests: "beyond the bits covered by PTE_PFN|_PAGE_USER, the rest 
> must only be _KERNPG_TABLE".
> 
> The _KERNPG_TABLE bits are disjoint from PTE_PFN|_PAGE_USER bits, so this 
> makes sense.
> 
> But the 32-bit check does the exact same thing but via a single binary 
> operation: it checks whether any bits outside of those bits are set -
> just via a simpler test that compiles to more compact code.

Simpler and more compact, but not as strict: in particular, a value of
0 or 1 is identified as bad by that 64-bit test, but not by the 32-bit.
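
A minimal user-space sketch of that difference, for anyone who wants to
try it. The flag values are stand-ins modelled on the x86 bits (leading
underscores dropped, since such names are reserved outside the kernel),
PTE_PFN_MASK is an assumed 52-bit physical-address field, and
pud_bad_32()/pud_bad_64() are hypothetical names for the two variants
quoted above:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values (assumptions, not copied from a kernel tree). */
#define PAGE_PRESENT	0x001ULL
#define PAGE_RW		0x002ULL
#define PAGE_USER	0x004ULL
#define PAGE_ACCESSED	0x020ULL
#define PAGE_DIRTY	0x040ULL
#define KERNPG_TABLE	(PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED | PAGE_DIRTY)
#define PTE_PFN_MASK	0x000ffffffffff000ULL	/* assumed PFN field */

/* 32-bit variant: bad only if a bit outside the allowed set is set. */
static inline int pud_bad_32(uint64_t val)
{
	return (val & ~(PTE_PFN_MASK | KERNPG_TABLE | PAGE_USER)) != 0;
}

/* 64-bit variant: the non-PFN, non-USER bits must equal KERNPG_TABLE. */
static inline int pud_bad_64(uint64_t val)
{
	return (val & ~(PTE_PFN_MASK | PAGE_USER)) != KERNPG_TABLE;
}
```

With these definitions a raw value of 0 or 1 passes pud_bad_32() but is
rejected by pud_bad_64(), while an entry carrying a page frame address
plus all four KERNPG_TABLE bits passes both.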

I most definitely prefer the stricter 64-bit version.  I thought we'd
gone around this all before, but maybe that was for pmd_bad(): there
too one variant was weaker than the other and we went for the stronger.

However... I forget how the folding works out.  The pgd in the 32-bit
PAE case used to have just the pfn and the present bit set in that
little array of four entries: if pud_bad() ends up getting applied
to that, I guess it will blow up.

If so, my preferred answer would actually be to make those 4 entries
look more like real ptes; but you may think I'm being a bit silly.

Not quite sure why wli is Cc'ed but I've fixed his address:
it's good to see you back, Bill.

Hugh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:38       ` Hugh Dickins
@ 2009-02-05 19:49         ` Ingo Molnar
  2009-02-05 19:58           ` wli
  2009-02-05 20:12           ` Hugh Dickins
  2009-02-05 20:42         ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 19:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jeremy Fitzhardinge, William Lee Irwin III,
	Linux Kernel Mailing List, Linux Memory Management List


* Hugh Dickins <hugh@veritas.com> wrote:

> On Thu, 5 Feb 2009, Ingo Molnar wrote:
> > * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > > Ingo Molnar wrote:
> > >> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > >>   
> > >>> I'm looking at unifying the 32 and 64-bit versions of pud_bad.
> > >>>
> > >>> 32-bits defines it as:
> > >>>
> > >>> static inline int pud_bad(pud_t pud)
> > >>> {
> > >>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
> > >>> }
> > >>>
> > >>> and 64 as:
> > >>>
> > >>> static inline int pud_bad(pud_t pud)
> > >>> {
> > >>> 	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
> > >>> }
> > >>>
> > >>>
> > >>> I'm inclined to go with the 64-bit version, but I'm wondering if 
> > >>> there's something subtle I'm missing here.
> > >>>     
> > >>
> > >> Why go with the 64-bit version? The 32-bit check looks more compact and 
> > >> should result in smaller code.
> > >>   
> > >
> > > Well, it's stricter.  But I don't really understand what condition it's
> > > actually testing for.
> > 
> > Well it tests: "beyond the bits covered by PTE_PFN|_PAGE_USER, the rest 
> > must only be _KERNPG_TABLE".
> > 
> > The _KERNPG_TABLE bits are disjoint from PTE_PFN|_PAGE_USER bits, so this 
> > makes sense.
> > 
> > But the 32-bit check does the exact same thing but via a single binary 
> > operation: it checks whether any bits outside of those bits are set -
> > just via a simpler test that compiles to more compact code.
> 
> Simpler and more compact, but not as strict: in particular, a value of
> 0 or 1 is identified as bad by that 64-bit test, but not by the 32-bit.

yes, indeed you are right - the 64-bit test does not allow the KERNPG_TABLE 
bits to go zero.

Those are the present, rw, accessed and dirty bits. Do they really matter 
that much? If a toplevel entry goes !present or readonly, we notice that 
_fast_, without any checks. If it goes !access or !dirty - does that matter?
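
One way to see what the stricter test adds is to split it into two
conditions. The sketch below is a user-space illustration only: the flag
values are stand-ins (leading underscores dropped), PTE_PFN_MASK is an
assumed field width, and stray_bits(), missing_table_bits() and
pud_bad_strict() are hypothetical helper names, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values (assumptions, not copied from a kernel tree). */
#define PAGE_PRESENT	0x001ULL
#define PAGE_RW		0x002ULL
#define PAGE_USER	0x004ULL
#define PAGE_ACCESSED	0x020ULL
#define PAGE_DIRTY	0x040ULL
#define KERNPG_TABLE	(PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED | PAGE_DIRTY)
#define PTE_PFN_MASK	0x000ffffffffff000ULL	/* assumed PFN field */

/* Lax (32-bit style) condition: a bit outside the allowed set is set. */
static inline int stray_bits(uint64_t val)
{
	return (val & ~(PTE_PFN_MASK | KERNPG_TABLE | PAGE_USER)) != 0;
}

/* Which of present/rw/accessed/dirty are absent from the entry. */
static inline uint64_t missing_table_bits(uint64_t val)
{
	return KERNPG_TABLE & ~val;
}

/* Strict test rewritten as "stray bits, or a table bit is missing". */
static inline int pud_bad_strict(uint64_t val)
{
	return stray_bits(val) || missing_table_bits(val) != 0;
}

/* The 64-bit expression from the thread, for comparison. */
static inline int pud_bad_64(uint64_t val)
{
	return (val & ~(PTE_PFN_MASK | PAGE_USER)) != KERNPG_TABLE;
}
```

Because the KERNPG_TABLE bits do not overlap the PFN or USER bits,
pud_bad_strict() and pud_bad_64() agree on every input. Clearing any one
of the four bits in an otherwise good entry is caught only by the second
condition, which is exactly what the 32-bit form leaves out.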

These checks are done all the time, and even a single instruction can count. 
The bits that are checked are enough to notice random memory corruption.

( albeit these days with large RAM sizes pagetable corruption is quite rare 
  and only happens if it's specifically corrupting the pagetable - and then 
  it's not just a single bit. Most of the memory corruption goes into the 
  pagecache. )

	Ingo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:49         ` Ingo Molnar
@ 2009-02-05 19:58           ` wli
  2009-02-05 20:14             ` Hugh Dickins
  2009-02-05 20:12           ` Hugh Dickins
  1 sibling, 1 reply; 22+ messages in thread
From: wli @ 2009-02-05 19:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, Jeremy Fitzhardinge, Linux Kernel Mailing List,
	Linux Memory Management List

* Hugh Dickins <hugh@veritas.com> wrote:
>> Simpler and more compact, but not as strict: in particular, a value of
>> 0 or 1 is identified as bad by that 64-bit test, but not by the 32-bit.

On Thu, Feb 05, 2009 at 08:49:32PM +0100, Ingo Molnar wrote:
> yes, indeed you are right - the 64-bit test does not allow the KERNPG_TABLE 
> bits to go zero.
> Those are the present, rw, accessed and dirty bits. Do they really matter 
> that much? If a toplevel entry goes !present or readonly, we notice that 
> _fast_, without any checks. If it goes !access or !dirty - does that matter?
> These checks are done all the time, and even a single instruction can count. 
> The bits that are checked are enough to notice random memory corruption.
> ( albeit these days with large RAM sizes pagetable corruption is quite rare 
>   and only happens if it's specifically corrupting the pagetable - and then 
>   it's not just a single bit. Most of the memory corruption goes into the 
>   pagecache. )

The RW bit needs to be allowed to become read-only for hugetlb COW.
Changing it over to the 32-bit method is a bugfix by that token.


-- wli


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:49         ` Ingo Molnar
  2009-02-05 19:58           ` wli
@ 2009-02-05 20:12           ` Hugh Dickins
  1 sibling, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2009-02-05 20:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, William Lee Irwin III,
	Linux Kernel Mailing List, Linux Memory Management List

On Thu, 5 Feb 2009, Ingo Molnar wrote:
> * Hugh Dickins <hugh@veritas.com> wrote:
> > 
> > Simpler and more compact, but not as strict: in particular, a value of
> > 0 or 1 is identified as bad by that 64-bit test, but not by the 32-bit.
> 
> yes, indeed you are right - the 64-bit test does not allow the KERNPG_TABLE 
> bits to go zero.
> 
> Those are the present, rw, accessed and dirty bits. Do they really matter 
> that much? If a toplevel entry goes !present or readonly, we notice that 
> _fast_, without any checks. If it goes !access or !dirty - does that matter?

I've not given it a great deal of thought, why this or that bit.
These p??_bad checks originate from 2.4 or earlier, and by mistake
got weakened somewhere along the way, and last time it was discussed
we agreed to strengthen them (and IIRC Jeremy himself did so).

> 
> These checks are done all the time, and even a single instruction can count. 
> The bits that are checked are enough to notice random memory corruption.

Well, I am surprised that you would be arguing for weakening such
a very simple check.

Hugh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:58           ` wli
@ 2009-02-05 20:14             ` Hugh Dickins
  2009-02-05 20:56               ` wli
  0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2009-02-05 20:14 UTC (permalink / raw)
  To: wli
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Linux Kernel Mailing List,
	Linux Memory Management List

On Thu, 5 Feb 2009, wli@movementarian.org wrote:
> 
> The RW bit needs to be allowed to become read-only for hugetlb COW.
> Changing it over to the 32-bit method is a bugfix by that token.

If there's a bugfix to be made there, of course I'm in favour:
but how come we've never seen such a bug?  hugetlb COW has been
around for a year or two by now, hasn't it?

Hugh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 19:38       ` Hugh Dickins
  2009-02-05 19:49         ` Ingo Molnar
@ 2009-02-05 20:42         ` Jeremy Fitzhardinge
  2009-02-05 20:51           ` Hugh Dickins
  2009-02-05 20:57           ` Ingo Molnar
  1 sibling, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-05 20:42 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ingo Molnar, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

Hugh Dickins wrote:
> However... I forget how the folding works out.  The pgd in the 32-bit
> PAE case used to have just the pfn and the present bit set in that
> little array of four entries: if pud_bad() ends up getting applied
> to that, I guess it will blow up.
>   

Ah, that's a good point.

> If so, my preferred answer would actually be to make those 4 entries
> look more like real ptes; but you may think I'm being a bit silly.
>   

Hardware doesn't allow it.  It will explode (well, trap) if you set 
anything other than P in the top level.
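
As a rough illustration of why that matters for the unification, here is
a user-space sketch of such a top-level entry run through both checks.
The flag values and PTE_PFN_MASK are stand-ins (assumptions), and
pae_top_entry() is a hypothetical helper that builds an entry holding
only an address and the present bit:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values (assumptions, not copied from a kernel tree). */
#define PAGE_PRESENT	0x001ULL
#define PAGE_RW		0x002ULL
#define PAGE_USER	0x004ULL
#define PAGE_ACCESSED	0x020ULL
#define PAGE_DIRTY	0x040ULL
#define KERNPG_TABLE	(PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED | PAGE_DIRTY)
#define PTE_PFN_MASK	0x000ffffffffff000ULL	/* assumed PFN field */

/* Hypothetical: top-level PAE entry with only the address and P set. */
static inline uint64_t pae_top_entry(uint64_t pmd_phys)
{
	return (pmd_phys & PTE_PFN_MASK) | PAGE_PRESENT;
}

/* 32-bit variant from the thread. */
static inline int pud_bad_32(uint64_t val)
{
	return (val & ~(PTE_PFN_MASK | KERNPG_TABLE | PAGE_USER)) != 0;
}

/* 64-bit variant from the thread. */
static inline int pud_bad_64(uint64_t val)
{
	return (val & ~(PTE_PFN_MASK | PAGE_USER)) != KERNPG_TABLE;
}
```

Under these assumptions pud_bad_64(pae_top_entry(addr)) reports the entry
as bad for any addr, because RW, ACCESSED and DIRTY are absent, whereas
pud_bad_32() accepts it. Whether pud_bad() is actually applied to those
entries depends on how the folding works out, which the sketch does not
model.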

By the by, what are the chances we'll be able to deprecate non-PAE 32-bit?

    J


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 20:42         ` Jeremy Fitzhardinge
@ 2009-02-05 20:51           ` Hugh Dickins
  2009-02-05 21:05             ` Jeremy Fitzhardinge
  2009-02-05 20:57           ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2009-02-05 20:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

On Thu, 5 Feb 2009, Jeremy Fitzhardinge wrote:
> Hugh Dickins wrote:
> > However... I forget how the folding works out.  The pgd in the 32-bit
> > PAE case used to have just the pfn and the present bit set in that
> > little array of four entries: if pud_bad() ends up getting applied
> > to that, I guess it will blow up.
> 
> Ah, that's a good point.
> 
> > If so, my preferred answer would actually be to make those 4 entries
> > look more like real ptes; but you may think I'm being a bit silly.
> 
> Hardware doesn't allow it.  It will explode (well, trap) if you set anything
> other than P in the top level.

Oh, interesting, I'd never realized that.

> By the by, what are the chances we'll be able to deprecate non-PAE 32-bit?

I sincerely hope 0!  I shed no tears at losing support for NUMAQ,
but why should we be forced to double all the 32-bit ptes?  You want
us all to be using NX?  Or you just want to cut your test/edit matrix -
that I can well understand!

Hugh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 20:14             ` Hugh Dickins
@ 2009-02-05 20:56               ` wli
  2009-02-05 21:09                 ` Hugh Dickins
  0 siblings, 1 reply; 22+ messages in thread
From: wli @ 2009-02-05 20:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Linux Kernel Mailing List,
	Linux Memory Management List

On Thu, 5 Feb 2009, wli@movementarian.org wrote:
>> The RW bit needs to be allowed to become read-only for hugetlb COW.
>> Changing it over to the 32-bit method is a bugfix by that token.

On Thu, Feb 05, 2009 at 08:14:42PM +0000, Hugh Dickins wrote:
> If there's a bugfix to be made there, of course I'm in favour:
> but how come we've never seen such a bug?  hugetlb COW has been
> around for a year or two by now, hasn't it?

We can tell from the code that a write-protected pte mapping of a
1GB hugetlb page would be flagged as bad. It must not be called on
ptes mapping hugetlb pages if they're not getting flagged.


-- wli


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 20:42         ` Jeremy Fitzhardinge
  2009-02-05 20:51           ` Hugh Dickins
@ 2009-02-05 20:57           ` Ingo Molnar
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 20:57 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Hugh Dickins, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Hugh Dickins wrote:
>> However... I forget how the folding works out.  The pgd in the 32-bit
>> PAE case used to have just the pfn and the present bit set in that
>> little array of four entries: if pud_bad() ends up getting applied
>> to that, I guess it will blow up.
>>   
>
> Ah, that's a good point.
>
>> If so, my preferred answer would actually be to make those 4 entries
>> look more like real ptes; but you may think I'm being a bit silly.
>
> Hardware doesn't allow it.  It will explode (well, trap) if you set  
> anything other than P in the top level.

Yeah. I was the first Linux hacker in history to put an x86 CPU into PAE mode 
under Linux 10+ years ago, and i can attest to the 'explodes way too easily' 
aspect quite emphatically ;-) Took me 3-4 days to bootstrap it.

> By the by, what are the chances we'll be able to deprecate non-PAE 32-bit?

For the next 10 years: pretty much zero.

	Ingo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 20:51           ` Hugh Dickins
@ 2009-02-05 21:05             ` Jeremy Fitzhardinge
  2009-02-05 21:50               ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-05 21:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ingo Molnar, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

Hugh Dickins wrote:
>> Hardware doesn't allow it.  It will explode (well, trap) if you set anything
>> other than P in the top level.
>>     
>
> Oh, interesting, I'd never realized that.
>   

There are some later extensions to reuse some of the bits for things 
like tlb reload policy (I think; I'd have to check to be sure), so 
they're fairly non-pte-like.

>> By the by, what are the chances we'll be able to deprecate non-PAE 32-bit?
>>     
>
> I sincerely hope 0!  I shed no tears at losing support for NUMAQ,
> but why should we be forced to double all the 32-bit ptes?  You want
> us all to be using NX?  Or you just want to cut your test/edit matrix -
> that I can well understand!
>   

Yes, that's the gist of it.  We could simplify things by having only one 
pte format and only have to parameterise with 3/4 level pagetables.  
We'd lose support for non-PAE cpus, including the first Pentium M (which 
is probably still in fairly wide use, unfortunately).

    J


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 20:56               ` wli
@ 2009-02-05 21:09                 ` Hugh Dickins
  0 siblings, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2009-02-05 21:09 UTC (permalink / raw)
  To: wli
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Linux Kernel Mailing List,
	Linux Memory Management List

On Thu, 5 Feb 2009, wli@movementarian.org wrote:
> On Thu, 5 Feb 2009, wli@movementarian.org wrote:
> >> The RW bit needs to be allowed to become read-only for hugetlb COW.
> >> Changing it over to the 32-bit method is a bugfix by that token.
> 
> On Thu, Feb 05, 2009 at 08:14:42PM +0000, Hugh Dickins wrote:
> > If there's a bugfix to be made there, of course I'm in favour:
> > but how come we've never seen such a bug?  hugetlb COW has been
> > around for a year or two by now, hasn't it?
> 
> We can tell from the code that a write-protected pte mapping of a
> 1GB hugetlb page would be flagged as bad. It must not be called on
> ptes mapping hugetlb pages if they're not getting flagged.

Ah, I see what you mean now.  Yes, the hugetlb case goes its own way
and doesn't normally hit those p??_bad() macro/inlines; but we got
caught out in follow_page() a year ago, a bad looked huge or a
huge looked bad, but I forget the details at this instant.

Hugh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 21:05             ` Jeremy Fitzhardinge
@ 2009-02-05 21:50               ` Ingo Molnar
  2009-02-05 22:07                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 21:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Hugh Dickins, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>> I sincerely hope 0!  I shed no tears at losing support for NUMAQ, but why 
>> should we be forced to double all the 32-bit ptes?  You want us all to be 
>> using NX?  Or you just want to cut your test/edit matrix - that I can 
>> well understand!
>
> Yes, that's the gist of it.  We could simplify things by having only one 
> pte format and only have to parameterise with 3/4 level pagetables.  We'd 
> lose support for non-PAE cpus, including the first Pentium M (which is 
> probably still in fairly wide use, unfortunately).

We'd also lose a fair bit of performance (not to mention the pagetable 
footprint doubling that Hugh already mentioned) on 32-bit PAE capable 
systems that don't actually have RAM above 4G physical.

Bad idea really ...

	Ingo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 21:50               ` Ingo Molnar
@ 2009-02-05 22:07                 ` Jeremy Fitzhardinge
  2009-02-05 23:42                   ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-05 22:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

Ingo Molnar wrote:
> We'd also lose a fair bit of performance (not to mention the pagetable 
> footprint doubling that Hugh already mentioned) on 32-bit PAE capable 
> systems that don't actually have RAM above 4G physical.
>   

Why's that?  Do you mean directly from using PAE, or as a side-effect of 
highmem?

    J


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: pud_bad vs pud_bad
  2009-02-05 22:07                 ` Jeremy Fitzhardinge
@ 2009-02-05 23:42                   ` Ingo Molnar
  2009-02-06  0:08                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2009-02-05 23:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Hugh Dickins, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> We'd also lose a fair bit of performance (not to mention the pagetable  
>> footprint doubling that Hugh already mentioned) on 32-bit PAE capable  
>> systems that don't actually have RAM above 4G physical.
>
> Why's that?  Do you mean directly from using PAE, or as a side-effect of 
> highmem?

just the act of using PAE was measured to cause multi-percent slowdown in 
fork() and exec() latencies, etc. The pagetables are twice as large so is 
that really surprising?
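
The size argument can be put in numbers with a small estimate. This is
only a sketch: it assumes 4 KiB page-table pages, 4-byte entries without
PAE and 8-byte entries with PAE, counts last-level tables only, and the
helper names are made up for the illustration:

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SIZE	4096ULL	/* assumed 4 KiB page-table page */

/* Entries that fit in one page-table page for a given entry size. */
static inline uint64_t entries_per_table(uint64_t entry_size)
{
	return PT_PAGE_SIZE / entry_size;
}

/*
 * Last-level page-table pages needed to map `bytes` with 4 KiB pages.
 * Upper levels are ignored in this estimate.
 */
static inline uint64_t pte_pages_needed(uint64_t bytes, uint64_t entry_size)
{
	uint64_t span = entries_per_table(entry_size) * PT_PAGE_SIZE;

	return (bytes + span - 1) / span;
}
```

For 1 GiB of mappings, pte_pages_needed(1ULL << 30, 4) gives 256 pages
and pte_pages_needed(1ULL << 30, 8) gives 512, so fork() has twice as
many page-table pages to allocate and copy for the same address space.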

	Ingo



* Re: pud_bad vs pud_bad
  2009-02-05 23:42                   ` Ingo Molnar
@ 2009-02-06  0:08                     ` Jeremy Fitzhardinge
  2009-02-06  0:50                       ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-06  0:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List

Ingo Molnar wrote:
> just the act of using PAE was measured to cause multi-percent slowdown in 
> fork() and exec() latencies, etc. The pagetables are twice as large so is 
> that really surprising?
>   

Is there a similar slowdown running the CPU in 32 vs 64 bit mode?  Or 
does having more/wider registers mitigate it?

    J



* Re: pud_bad vs pud_bad
  2009-02-06  0:08                     ` Jeremy Fitzhardinge
@ 2009-02-06  0:50                       ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-02-06  0:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Hugh Dickins, William Lee Irwin III, Linux Kernel Mailing List,
	Linux Memory Management List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> just the act of using PAE was measured to cause multi-percent slowdown 
>> in fork() and exec() latencies, etc. The pagetables are twice as large 
>> so is that really surprising?
>>   
>
> Is there a similar slowdown running the CPU in 32 vs 64 bit mode?  Or does 
> having more/wider registers mitigate it?

Yes, of course there's a slowdown on 64-bit kernels in fork() performance, 
mainly related to pte size.

Here are some numbers measured with perfstat. The "fork" binary forks 256 
times, waits for the child tasks and then exits. It is a 32-bit binary, 
statically linked, i.e. very similar layout and function on both 32-bit and 
64-bit kernels.
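
The source of the "fork" binary is not included in this message. The following is only a guess at its core loop, assuming it forks all children first (each child exits immediately) and then reaps them; the function name fork_and_wait() is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork n children that exit immediately, then reap them all.
 * Returns the number of children reaped, or -1 if a fork() failed.
 */
static int fork_and_wait(int n)
{
	int forked = 0, reaped = 0;

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;		/* fork failed: stop forking */
		if (pid == 0)
			_exit(0);	/* child: do nothing, just exit */
		forked++;
	}

	/* Parent: wait for every child that was actually created. */
	while (reaped < forked && wait(NULL) > 0)
		reaped++;

	return forked == n ? reaped : -1;
}
```

A main() that calls fork_and_wait(256) and returns would approximate the described test. Building it as a static 32-bit binary (for example with gcc's -m32 and -static options) keeps the userspace side the same across the three kernels, so the differences come from the kernel's page table handling.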

The results (tabulated a bit, average result of 20 runs):

 $ perfstat -e -3,-4,-5 ./fork

  Performance counter stats for './fork':

        32-bit  32-bit-PAE     64-bit
     ---------  ----------  ---------
     27.367537   30.660090  31.542003  task clock ticks     (msecs)

          5785        5810       5751  pagefaults           (events)
           389         388        388  context switches     (events)
             4           4          4  CPU migrations       (events)
     ---------  ----------  ---------
                    +12.0%     +15.2%  overhead

So PAE is 12.0% slower (the overhead of double the pte size and three page 
table levels), and 64-bit is 15.2% slower (the extra overhead of having four 
page table levels added to the overhead of double the pte size).
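
The overhead row can be rechecked from the task clock figures in the table. This is a small sketch; overhead_percent() is an illustrative helper, not part of perfstat.

```c
/* Relative overhead of `measured` over `baseline`, in percent. */
static double overhead_percent(double baseline, double measured)
{
	return (measured - baseline) / baseline * 100.0;
}
```

Using the 32-bit result (27.367537 msecs) as the baseline, 30.660090 msecs comes out at roughly 12.03% and 31.542003 msecs at roughly 15.25%, in line with the +12.0% and +15.2% shown above.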

Larger ptes do not come for free, and the 64-bit instructions do not mitigate 
the cache-miss overhead and memory bandwidth cost.

	Ingo



end of thread, other threads:[~2009-02-06  0:50 UTC | newest]

Thread overview: 22+ messages
2009-02-05 18:23 pud_bad vs pud_bad Jeremy Fitzhardinge
2009-02-05 18:43 ` Ingo Molnar
2009-02-05 18:54   ` Jeremy Fitzhardinge
2009-02-05 19:10     ` Ingo Molnar
2009-02-05 19:26       ` Jeremy Fitzhardinge
2009-02-05 19:31         ` Ingo Molnar
2009-02-05 19:38       ` Hugh Dickins
2009-02-05 19:49         ` Ingo Molnar
2009-02-05 19:58           ` wli
2009-02-05 20:14             ` Hugh Dickins
2009-02-05 20:56               ` wli
2009-02-05 21:09                 ` Hugh Dickins
2009-02-05 20:12           ` Hugh Dickins
2009-02-05 20:42         ` Jeremy Fitzhardinge
2009-02-05 20:51           ` Hugh Dickins
2009-02-05 21:05             ` Jeremy Fitzhardinge
2009-02-05 21:50               ` Ingo Molnar
2009-02-05 22:07                 ` Jeremy Fitzhardinge
2009-02-05 23:42                   ` Ingo Molnar
2009-02-06  0:08                     ` Jeremy Fitzhardinge
2009-02-06  0:50                       ` Ingo Molnar
2009-02-05 20:57           ` Ingo Molnar
