linux-mm.kvack.org archive mirror
* BUG in x86_64 hugepage support
@ 2006-03-15  1:20 Nishanth Aravamudan
  2006-03-15  4:03 ` Chen, Kenneth W
  0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15  1:20 UTC (permalink / raw)
  To: agl, david, ak; +Cc: linux-mm, discuss

Hello,

While doing some testing of libhugetlbfs, I ran into the following BUGs
on my x86_64 box when checking mprotect with hugepages (running make
func in libhugetlbfs is all it took here) (distro is Ubuntu Dapper, runs
32-bit userspace).

[  633.480724] ----------- [cut here ] --------- [please bite here ] ---------
[  633.480733] Kernel BUG at ...rc6-mm1/arch/x86_64/mm/../../i386/mm/hugetlbpage.c:31
[  633.480736] invalid opcode: 0000 [1] PREEMPT SMP 
[  633.480740] last sysfs file: /block/sdb/sdb1/stat
[  633.480743] CPU 1 
[  633.480745] Modules linked in:
[  633.480750] Pid: 7872, comm: mprotect Not tainted 2.6.16-rc6-mm1 #1
[  633.480753] RIP: 0010:[<ffffffff80188e46>] <ffffffff80188e46>{huge_pte_alloc+230}
[  633.480764] RSP: 0000:ffff81006675dd18  EFLAGS: 00010283
[  633.480767] RAX: 0000000000000001 RBX: ffff8100648ce008 RCX: 0000000000000000
[  633.480772] RDX: ffff81007c7aa560 RSI: 0000000055800000 RDI: ffff8100648e4480
[  633.480776] RBP: ffff81006675dd38 R08: 00000000556c19fc R09: 00000000ffffd178
[  633.480780] R10: ffff81006675c000 R11: 0000000000000246 R12: 0000000000000560
[  633.480784] R13: ffff8100648e4480 R14: ffff8100648e4480 R15: 0000000000000000
[  633.480788] FS:  00002ac688028e10(0000) GS:ffff81007f1b4740(0063) knlGS:00000000556c68e0
[  633.480792] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  633.480795] CR2: 0000000055800000 CR3: 000000006593d000 CR4: 00000000000006e0
[  633.480800] Process mprotect (pid: 7872, threadinfo ffff81006675c000, task ffff8100402fc750)
[  633.480803] Stack: 0000000055800000 0000000000000000 0000000000000000 0000000055800000 
[  633.480809]        ffff81006675dd88 ffffffff801c5383 ffff81006675de68 ffff8100734e3c60 
[  633.480817]        ffff81006675dda8 ffff8100734e3c60 
[  633.480821] Call Trace: <ffffffff801c5383>{hugetlb_fault+51} <ffffffff801085ea>{__handle_mm_fault+90}
[  633.480834]        <ffffffff8010a727>{ia32_setup_sigcontext+327} <ffffffff80173fdb>{notifier_call_chain+43}
[  633.480845]        <ffffffff80173b19>{do_page_fault+1241} <ffffffff80171275>{_spin_unlock_irq+21}
[  633.480854]        <ffffffff80134cc5>{sys_rt_sigprocmask+229} <ffffffff8016c741>{error_exit+0}
[  633.480868] 
[  633.480869] Code: 0f 0b 68 08 a6 4e 80 c2 1f 00 48 8b 5d e8 4c 8b 65 f0 48 89 
[  633.480881] RIP <ffffffff80188e46>{huge_pte_alloc+230} RSP <ffff81006675dd18>
[  633.480888]  ----------- [cut here ] --------- [please bite here ] ---------
[  633.492589] Kernel BUG at ...rc6-mm1/arch/x86_64/mm/../../i386/mm/hugetlbpage.c:31
[  633.492593] invalid opcode: 0000 [2] PREEMPT SMP 
[  633.492597] last sysfs file: /block/sdb/sdb1/stat
[  633.492600] CPU 1 
[  633.492602] Modules linked in:
[  633.492606] Pid: 7873, comm: mprotect Not tainted 2.6.16-rc6-mm1 #1
[  633.492610] RIP: 0010:[<ffffffff80188e46>] <ffffffff80188e46>{huge_pte_alloc+230}
[  633.492620] RSP: 0000:ffff81006675fd18  EFLAGS: 00010283
[  633.492624] RAX: 0000000000000001 RBX: ffff810066745550 RCX: 0000000000000000
[  633.492628] RDX: ffff81006596bab0 RSI: 00002aaaaac00000 RDI: ffff8100648e4480
[  633.492632] RBP: ffff81006675fd38 R08: 0000000000000000 R09: 0000000000000000
[  633.492635] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000ab0
[  633.492639] R13: ffff8100648e4480 R14: ffff8100648e4480 R15: 0000000000000000
[  633.492644] FS:  00002b745df2ce10(0000) GS:ffff81007f1b4740(0000) knlGS:00000000556ac6c0
[  633.492647] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  633.492651] CR2: 00002aaaaac00000 CR3: 000000006670a000 CR4: 00000000000006e0
[  633.492655] Process mprotect (pid: 7873, threadinfo ffff81006675e000, task ffff8100402fd530)
[  633.492658] Stack: 00002aaaaac00000 0000000000000000 0000000000000003 00002aaaaac00000 
[  633.492665]        ffff81006675fd88 ffffffff801c5383 ffff81006675fe68 ffff8100734e3af0 
[  633.492673]        ffff81006675fd78 ffff8100734e3af0 
[  633.492677] Call Trace: <ffffffff801c5383>{hugetlb_fault+51} <ffffffff801085ea>{__handle_mm_fault+90}
[  633.492690]        <ffffffff80173fdb>{notifier_call_chain+43} <ffffffff80173b19>{do_page_fault+1241}
[  633.492701]        <ffffffff80123652>{sys_mprotect+1522} <ffffffff8016c741>{error_exit+0}
[  633.492715] 
[  633.492716] Code: 0f 0b 68 08 a6 4e 80 c2 1f 00 48 8b 5d e8 4c 8b 65 f0 48 89 
[  633.492728] RIP <ffffffff80188e46>{huge_pte_alloc+230} RSP <ffff81006675fd18>

The line in question (arch/i386/mm/hugetlbpage.c:31) in 2.6.16-rc6-mm1
is:

	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));

We are trying to verify that, if the pte was successfully allocated and is
already populated, it is a hugetlb pte.
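
In other words, with the three conditions spelled out (annotations mine):

	/*
	 * pte              -> the pte was successfully allocated
	 * !pte_none(*pte)  -> ...and it is already populated
	 * !pte_huge(*pte)  -> ...but it does not look like a hugetlb pte: BUG
	 */
	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));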

After some discussion with Adam Litke, I added some debugging to see
what pte_val we were getting:

	huge_pte_alloc failed: pte == 800000003d800027

which indicates our flags = 0x27 or 00100111.
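
Assuming the usual i386/x86-64 flag values (_PAGE_PRESENT = 0x001,
_PAGE_RW = 0x002, _PAGE_USER = 0x004, _PAGE_ACCESSED = 0x020,
_PAGE_PSE = 0x080), that breaks down as:

	0x27 = _PAGE_ACCESSED | _PAGE_USER | _PAGE_RW | _PAGE_PRESENT

i.e. _PAGE_PSE is not set.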

On x86_64, pte_huge is defined to be:

	#define __LARGE_PTE (_PAGE_PSE|_PAGE_PRESENT) // __LARGE_PTE = 10000001
	static inline int pte_huge(pte_t pte)           { return (pte_val(pte) & __LARGE_PTE) == __LARGE_PTE; }

Clearly, pte_huge() is going to return 0, as

	(pte_val(pte) & __LARGE_PTE) == 0x1 != __LARGE_PTE

in this case.

I believe the issue occurs due to the following code path:

	sys_mprotect() --> hugetlb_change_protection() --> pte_modify()

On x86_64, that last call is:

	#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY) // upper bits all 1, lower 11 bits = 00001100000
	unsigned long __supported_pte_mask __read_mostly = ~0UL;
	static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
	{
		pte_val(pte) &= _PAGE_CHG_MASK;
		pte_val(pte) |= pgprot_val(newprot);
		pte_val(pte) &= __supported_pte_mask;
		return pte;
	}
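
To make the effect concrete, here is roughly what I believe happens to a
hugetlb pte on this path (flag values assumed as above, and assuming
newprot is an ordinary read/write protection):

	old pte:                 ... | _PAGE_PSE | _PAGE_ACCESSED | _PAGE_USER | _PAGE_RW | _PAGE_PRESENT
	&= _PAGE_CHG_MASK:       ... | _PAGE_ACCESSED              /* _PAGE_PSE, _PAGE_PRESENT cleared */
	|= pgprot_val(newprot):  ... | _PAGE_ACCESSED | _PAGE_USER | _PAGE_RW | _PAGE_PRESENT

which is exactly the 0x...27-style value from the debug output above:
present, but no longer marked huge.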

So, the first &= clears all of the low flag bits of pte_val(pte) except
_PAGE_ACCESSED and _PAGE_DIRTY. By my analysis, this is the problem:
pte_modify() on x86_64 is clearing the bits we check to see if a pte is a
hugetlb one. To see if this might be an accurate analysis, I modified
_PAGE_CHG_MASK as follows:

	-#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
	+#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)

That is, forcing the bits we care about to get set in pte_modify(). This
removed the BUG()s I was seeing in our testing.

This obviously isn't a real solution, though, and I don't know what is :) I
am hoping somebody with a bit more VM (or x86-64) experience can figure
out the right fix. I would appreciate any input, or corrections to my
analysis.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* RE: BUG in x86_64 hugepage support
  2006-03-15  1:20 BUG in x86_64 hugepage support Nishanth Aravamudan
@ 2006-03-15  4:03 ` Chen, Kenneth W
  2006-03-15  4:35   ` Nishanth Aravamudan
  0 siblings, 1 reply; 10+ messages in thread
From: Chen, Kenneth W @ 2006-03-15  4:03 UTC (permalink / raw)
  To: 'Nishanth Aravamudan', agl, david, ak; +Cc: linux-mm, discuss

Nishanth Aravamudan wrote on Tuesday, March 14, 2006 5:20 PM
> While doing some testing of libhugetlbfs, I ran into the following BUGs
> on my x86_64 box when checking mprotect with hugepages (running make
> func in libhugetlbfs is all it took here) (distro is Ubuntu Dapper, runs
> 32-bit userspace).
> 
> So, the first &= results in the lower 11 bits of pte_val(pte) being all
> 0s. By my analysis, this is the problem, pte_modify() on x86_64 is
> clearing the bits we check to see if a pte is a hugetlb one. To see if
> this might be an accurate analysis, I modified _PAGE_CHG_MASK as
> follows:
> 
> 	-#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
> 	+#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)
> 
> That is, forcing the bits we care about to get set in pte_modify(). This
> removed the BUG()s I was seeing in our testing.


I think your analysis looked correct, though I don't think you want to
add _PAGE_PRESENT to _PAGE_CHG_MASK.  The reason is that newprot is
supposed to carry the correct present bit (based on what the new
protection is) and it will be OR'ed in to form the new pte.

I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.

- Ken


* Re: BUG in x86_64 hugepage support
  2006-03-15  4:03 ` Chen, Kenneth W
@ 2006-03-15  4:35   ` Nishanth Aravamudan
  2006-03-15  7:08     ` Chen, Kenneth W
  0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15  4:35 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: agl, david, ak, linux-mm, discuss

On 14.03.2006 [20:03:20 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 5:20 PM
> > While doing some testing of libhugetlbfs, I ran into the following
> > BUGs on my x86_64 box when checking mprotect with hugepages (running
> > make func in libhugetlbfs is all it took here) (distro is Ubuntu
> > Dapper, runs 32-bit userspace).
> > 
> > So, the first &= results in the lower 11 bits of pte_val(pte) being
> > all 0s. By my analysis, this is the problem, pte_modify() on x86_64
> > is clearing the bits we check to see if a pte is a hugetlb one. To
> > see if this might be an accurate analysis, I modified _PAGE_CHG_MASK
> > as follows:
> > 
> > 	-#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
> > 	+#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)
> > 
> > That is, forcing the bits we care about to get set in pte_modify().
> > This removed the BUG()s I was seeing in our testing.
> 
> I think your analysis looked correct, though I don't think you want to
> add _PAGE_PRESENT to _PAGE_CHG_MASK.  The reason is that newprot is
> supposed to carry the correct present bit (based on what the new
> protection is) and it will be OR'ed in to form the new pte.

Thanks for the response!

> I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.

I can try a kernel with just the _PAGE_PSE bit added to _PAGE_CHG_MASK
and see if that helps. I think it will still BUG() in my case, however:
since __LARGE_PTE is 10000001, keeping only the 8th bit (_PAGE_PSE) will
be insufficient. So maybe there is also something wrong with what is
being generated by pgprot_val(newprot)? I will try adding some more
debugging output to see what is happening in pte_modify().

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* RE: BUG in x86_64 hugepage support
  2006-03-15  4:35   ` Nishanth Aravamudan
@ 2006-03-15  7:08     ` Chen, Kenneth W
  2006-03-15  7:30       ` Nishanth Aravamudan
  0 siblings, 1 reply; 10+ messages in thread
From: Chen, Kenneth W @ 2006-03-15  7:08 UTC (permalink / raw)
  To: 'Nishanth Aravamudan'; +Cc: agl, david, ak, linux-mm, discuss

Nishanth Aravamudan wrote on Tuesday, March 14, 2006 8:36 PM
> > I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.
> 
> I can try a kernel with just the _PAGE_PSE bit added to _PAGE_CHG_MASK
> and see if that helps. I think it will still BUG() in my case, however,
> as __LARGE_PTE is 10000001, so only setting the 8th bit will be
> insufficient. So maybe there is also something wrong with what is being
> generated by pgprot_val(newprot)? I will try adding some more debugging
> output to see what is happening in pte_modify.

Forget about preserving the _PAGE_PSE bit, it won't work. I just realized
that _PAGE_PROTNONE clashes with _PAGE_PSE (both use bit 7). Still thinking ...
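
(For reference, in the i386/x86-64 headers both flags are defined as bit 7,
something like:

	#define _PAGE_PSE	0x080	/* 4 MB (or 2 MB) page */
	#define _PAGE_PROTNONE	0x080	/* if not present */

so bit 7 alone cannot tell a huge pte from a PROT_NONE pte.)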

- Ken


* Re: BUG in x86_64 hugepage support
  2006-03-15  7:08     ` Chen, Kenneth W
@ 2006-03-15  7:30       ` Nishanth Aravamudan
  2006-03-15  8:50         ` [discuss] " Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15  7:30 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: agl, david, ak, linux-mm, discuss

On 14.03.2006 [23:08:57 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 8:36 PM
> > > I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.
> > 
> > I can try a kernel with just the _PAGE_PSE bit added to _PAGE_CHG_MASK
> > and see if that helps. I think it will still BUG() in my case, however,
> > as __LARGE_PTE is 10000001, so only setting the 8th bit will be
> > insufficient. So maybe there is also something wrong with what is being
> > generated by pgprot_val(newprot)? I will try adding some more debugging
> > output to see what is happening in pte_modify.
> 
> Forget about preserving the _PAGE_PSE bit, it won't work. I just realized
> that _PAGE_PROTNONE clashes with _PAGE_PSE (both use bit 7). Still thinking ...

And here I was rather proud of the following patch :)

I figured, if we know that we're dealing with a hugepage pte, then we
probably can set _PAGE_PSE in newprot, no? This fixes the BUGs here, but
may have side effects I'm unaware of (and I was unable to test on an
i386 kernel, since all of a sudden -mm decided to stop building with
Ubuntu's biarch gcc -- another topic altogether).

Here's the patch I used.

Description: We currently fail mprotect testing in libhugetlbfs because
the PSE bit in the hugepage PTEs gets unset. In the case where we know
that a filled hugetlb PTE is going to have its protection changed, make
sure it stays a hugetlb PTE by setting the PSE bit in the new protection
flags.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff -urpN 2.6.16-rc6-mm1/mm/hugetlb.c 2.6.16-rc6-mm1-dev/mm/hugetlb.c
--- 2.6.16-rc6-mm1/mm/hugetlb.c	2006-03-14 22:49:44.000000000 -0800
+++ 2.6.16-rc6-mm1-dev/mm/hugetlb.c	2006-03-14 22:51:31.000000000 -0800
@@ -740,6 +740,7 @@ void hugetlb_change_protection(struct vm
 			continue;
 		if (!pte_none(*ptep)) {
 			pte = huge_ptep_get_and_clear(mm, address, ptep);
+			pgprot_val(newprot) |= _PAGE_PSE;
 			pte = pte_modify(pte, newprot);
 			set_huge_pte_at(mm, address, ptep, pte);
 			lazy_mmu_prot_update(pte);

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* [discuss] Re: BUG in x86_64 hugepage support
  2006-03-15  7:30       ` Nishanth Aravamudan
@ 2006-03-15  8:50         ` Jan Beulich
  2006-03-15 10:03           ` Chen, Kenneth W
  2006-03-15 15:13           ` Nishanth Aravamudan
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2006-03-15  8:50 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: david, Kenneth W Chen, linux-mm, Andreas Kleen, agl, discuss

>diff -urpN 2.6.16-rc6-mm1/mm/hugetlb.c 2.6.16-rc6-mm1-dev/mm/hugetlb.c
>--- 2.6.16-rc6-mm1/mm/hugetlb.c	2006-03-14 22:49:44.000000000 -0800
>+++ 2.6.16-rc6-mm1-dev/mm/hugetlb.c	2006-03-14 22:51:31.000000000 -0800
>@@ -740,6 +740,7 @@ void hugetlb_change_protection(struct vm
> 			continue;
> 		if (!pte_none(*ptep)) {
> 			pte = huge_ptep_get_and_clear(mm, address, ptep);
>+			pgprot_val(newprot) |= _PAGE_PSE;
> 			pte = pte_modify(pte, newprot);
> 			set_huge_pte_at(mm, address, ptep, pte);
> 			lazy_mmu_prot_update(pte);

This is architecture-independent code - you shouldn't be using _PAGE_PSE
here. Probably x86-64 (and then likely also i386) should define their own
set_huge_pte_at(), and use that to OR in the needed flag?
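
Something along these lines, perhaps (completely untested, just a sketch of
the idea):

	/* e.g. in arch/i386/mm/hugetlbpage.c, with an x86-64 counterpart */
	void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
			     pte_t *ptep, pte_t pte)
	{
		/* re-assert the arch-specific huge-page bit before installing the pte */
		set_pte_at(mm, addr, ptep, pte_mkhuge(pte));
	}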

Jan


* RE: [discuss] Re: BUG in x86_64 hugepage support
  2006-03-15  8:50         ` [discuss] " Jan Beulich
@ 2006-03-15 10:03           ` Chen, Kenneth W
  2006-03-15 15:14             ` Nishanth Aravamudan
  2006-03-15 15:56             ` Nishanth Aravamudan
  2006-03-15 15:13           ` Nishanth Aravamudan
  1 sibling, 2 replies; 10+ messages in thread
From: Chen, Kenneth W @ 2006-03-15 10:03 UTC (permalink / raw)
  To: 'Jan Beulich', Nishanth Aravamudan
  Cc: david, linux-mm, Andreas Kleen, agl, discuss

Nishanth Aravamudan wrote on Tuesday, March 14, 2006 11:31 PM
> Description: We currently fail mprotect testing in libhugetlbfs because
> the PSE bit in the hugepage PTEs gets unset. In the case where we know
> that a filled hugetlb PTE is going to have its protection changed, make
> sure it stays a hugetlb PTE by setting the PSE bit in the new protection
> flags.

Jan Beulich wrote on Wednesday, March 15, 2006 12:50 AM
> This is architecture independent code - you shouldn't be using
> _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> define their own set_huge_pte_at(), and use that to OR in the
> needed flag?


Yeah, that will do.  i386 and x86_64 should also clean up the pte_mkhuge()
macro.  The unconditional setting of the _PAGE_PRESENT bit is leftover from
the good old days of pre-faulting hugetlb pages.



[patch] fix i386/x86-64 _PAGE_PSE bit when changing page protection

On i386 and x86-64, the pte flag _PAGE_PSE collides with _PAGE_PROTNONE.
The identity of a hugetlb pte is lost when changing page protection via
mprotect; a page fault that occurs later will then trigger a BUG check
in huge_pte_alloc().

The fix is to always make the new pte a hugetlb pte, and also to clean up
legacy code where _PAGE_PRESENT was forced on in the pre-faulting days.


Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>


diff -Nurp linux-2.6.15/include/asm-i386/pgtable.h linux-2.6.15-mm/include/asm-i386/pgtable.h
--- linux-2.6.15/include/asm-i386/pgtable.h	2006-01-02 19:21:10.000000000 -0800
+++ linux-2.6.15-mm/include/asm-i386/pgtable.h	2006-03-15 00:35:03.000000000 -0800
@@ -219,13 +219,12 @@ extern unsigned long pg0[];
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-#define __LARGE_PTE (_PAGE_PSE | _PAGE_PRESENT)
 static inline int pte_user(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_read(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_dirty(pte_t pte)		{ return (pte).pte_low & _PAGE_DIRTY; }
 static inline int pte_young(pte_t pte)		{ return (pte).pte_low & _PAGE_ACCESSED; }
 static inline int pte_write(pte_t pte)		{ return (pte).pte_low & _PAGE_RW; }
-static inline int pte_huge(pte_t pte)		{ return ((pte).pte_low & __LARGE_PTE) == __LARGE_PTE; }
+static inline int pte_huge(pte_t pte)		{ return (pte).pte_low & _PAGE_PSE; }
 
 /*
  * The following only works if pte_present() is not true.
@@ -242,7 +241,7 @@ static inline pte_t pte_mkexec(pte_t pte
 static inline pte_t pte_mkdirty(pte_t pte)	{ (pte).pte_low |= _PAGE_DIRTY; return pte; }
 static inline pte_t pte_mkyoung(pte_t pte)	{ (pte).pte_low |= _PAGE_ACCESSED; return pte; }
 static inline pte_t pte_mkwrite(pte_t pte)	{ (pte).pte_low |= _PAGE_RW; return pte; }
-static inline pte_t pte_mkhuge(pte_t pte)	{ (pte).pte_low |= __LARGE_PTE; return pte; }
+static inline pte_t pte_mkhuge(pte_t pte)	{ (pte).pte_low |= _PAGE_PSE; return pte; }
 
 #ifdef CONFIG_X86_PAE
 # include <asm/pgtable-3level.h>
diff -Nurp linux-2.6.15/include/asm-ia64/pgtable.h linux-2.6.15-mm/include/asm-ia64/pgtable.h
--- linux-2.6.15/include/asm-ia64/pgtable.h	2006-03-15 00:46:18.000000000 -0800
+++ linux-2.6.15-mm/include/asm-ia64/pgtable.h	2006-03-14 21:53:00.000000000 -0800
@@ -314,7 +314,7 @@ ia64_phys_addr_valid (unsigned long addr
 #define pte_mkyoung(pte)	(__pte(pte_val(pte) | _PAGE_A))
 #define pte_mkclean(pte)	(__pte(pte_val(pte) & ~_PAGE_D))
 #define pte_mkdirty(pte)	(__pte(pte_val(pte) | _PAGE_D))
-#define pte_mkhuge(pte)		(__pte(pte_val(pte) | _PAGE_P))
+#define pte_mkhuge(pte)		(__pte(pte_val(pte)))
 
 /*
  * Macro to a page protection value as "uncacheable".  Note that "protection" is really a
diff -Nurp linux-2.6.15/include/asm-x86_64/pgtable.h linux-2.6.15-mm/include/asm-x86_64/pgtable.h
--- linux-2.6.15/include/asm-x86_64/pgtable.h	2006-03-15 00:30:16.000000000 -0800
+++ linux-2.6.15-mm/include/asm-x86_64/pgtable.h	2006-03-15 00:35:55.000000000 -0800
@@ -273,7 +272,7 @@ static inline int pte_dirty(pte_t pte)		
 static inline int pte_young(pte_t pte)		{ return pte_val(pte) & _PAGE_ACCESSED; }
 static inline int pte_write(pte_t pte)		{ return pte_val(pte) & _PAGE_RW; }
 static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
-static inline int pte_huge(pte_t pte)		{ return (pte_val(pte) & __LARGE_PTE) == __LARGE_PTE; }
+static inline int pte_huge(pte_t pte)		{ return pte_val(pte) & _PAGE_PSE; }
 
 static inline pte_t pte_rdprotect(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_USER)); return pte; }
 static inline pte_t pte_exprotect(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_USER)); return pte; }
@@ -285,7 +284,7 @@ static inline pte_t pte_mkexec(pte_t pte
 static inline pte_t pte_mkdirty(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) | _PAGE_DIRTY)); return pte; }
 static inline pte_t pte_mkyoung(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; }
 static inline pte_t pte_mkwrite(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) | _PAGE_RW)); return pte; }
-static inline pte_t pte_mkhuge(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) | __LARGE_PTE)); return pte; }
+static inline pte_t pte_mkhuge(pte_t pte)	{ set_pte(&pte, __pte(pte_val(pte) | _PAGE_PSE)); return pte; }
 
 struct vm_area_struct;
 
diff -Nurp linux-2.6.15/mm/hugetlb.c linux-2.6.15-mm/mm/hugetlb.c
--- linux-2.6.15/mm/hugetlb.c	2006-03-15 00:30:20.000000000 -0800
+++ linux-2.6.15-mm/mm/hugetlb.c	2006-03-14 23:49:55.000000000 -0800
@@ -731,7 +731,7 @@ void hugetlb_change_protection(struct vm
 			continue;
 		if (!pte_none(*ptep)) {
 			pte = huge_ptep_get_and_clear(mm, address, ptep);
-			pte = pte_modify(pte, newprot);
+			pte = pte_mkhuge(pte_modify(pte, newprot));
 			set_huge_pte_at(mm, address, ptep, pte);
 			lazy_mmu_prot_update(pte);
 		}


* Re: [discuss] Re: BUG in x86_64 hugepage support
  2006-03-15  8:50         ` [discuss] " Jan Beulich
  2006-03-15 10:03           ` Chen, Kenneth W
@ 2006-03-15 15:13           ` Nishanth Aravamudan
  1 sibling, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 15:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: david, Kenneth W Chen, linux-mm, Andreas Kleen, agl, discuss

On 15.03.2006 [09:50:17 +0100], Jan Beulich wrote:
> >diff -urpN 2.6.16-rc6-mm1/mm/hugetlb.c 2.6.16-rc6-mm1-dev/mm/hugetlb.c
> >--- 2.6.16-rc6-mm1/mm/hugetlb.c	2006-03-14 22:49:44.000000000 -0800
> >+++ 2.6.16-rc6-mm1-dev/mm/hugetlb.c	2006-03-14 22:51:31.000000000 -0800
> >@@ -740,6 +740,7 @@ void hugetlb_change_protection(struct vm
> > 			continue;
> > 		if (!pte_none(*ptep)) {
> > 			pte = huge_ptep_get_and_clear(mm, address, ptep);
> >+			pgprot_val(newprot) |= _PAGE_PSE;
> > 			pte = pte_modify(pte, newprot);
> > 			set_huge_pte_at(mm, address, ptep, pte);
> > 			lazy_mmu_prot_update(pte);
> 
> This is architecture independent code - you shouldn't be using
> _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> define their own set_huge_pte_at(), and use that to OR in the needed
> flag?

Good point, makes sense to me.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [discuss] Re: BUG in x86_64 hugepage support
  2006-03-15 10:03           ` Chen, Kenneth W
@ 2006-03-15 15:14             ` Nishanth Aravamudan
  2006-03-15 15:56             ` Nishanth Aravamudan
  1 sibling, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 15:14 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Jan Beulich', david, linux-mm, Andreas Kleen, agl, discuss

On 15.03.2006 [02:03:00 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 11:31 PM
> > Description: We currently fail mprotect testing in libhugetlbfs because
> > the PSE bit in the hugepage PTEs gets unset. In the case where we know
> > that a filled hugetlb PTE is going to have its protection changed, make
> > sure it stays a hugetlb PTE by setting the PSE bit in the new protection
> > flags.
> 
> Jan Beulich wrote on Wednesday, March 15, 2006 12:50 AM
> > This is architecture independent code - you shouldn't be using
> > _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> > define their own set_huge_pte_at(), and use that to OR in the
> > needed flag?
> 
> 
> Yeah, that will do.  i386 and x86_64 should also clean up the
> pte_mkhuge() macro.  The unconditional setting of the _PAGE_PRESENT bit
> is leftover from the good old days of pre-faulting hugetlb pages.

Patch looks correct, I'll reboot with it applied and make sure it fixes
the BUGs (and doesn't affect any of the other tests).

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [discuss] Re: BUG in x86_64 hugepage support
  2006-03-15 10:03           ` Chen, Kenneth W
  2006-03-15 15:14             ` Nishanth Aravamudan
@ 2006-03-15 15:56             ` Nishanth Aravamudan
  1 sibling, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 15:56 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Jan Beulich', david, linux-mm, Andreas Kleen, agl, discuss

On 15.03.2006 [02:03:00 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 11:31 PM
> > Description: We currently fail mprotect testing in libhugetlbfs
> > because the PSE bit in the hugepage PTEs gets unset. In the case
> > where we know that a filled hugetlb PTE is going to have its
> > protection changed, make sure it stays a hugetlb PTE by setting the
> > PSE bit in the new protection flags.
> 
> Jan Beulich wrote on Wednesday, March 15, 2006 12:50 AM
> > This is architecture independent code - you shouldn't be using
> > _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> > define their own set_huge_pte_at(), and use that to OR in the
> > needed flag?
> 
> 
> Yeah, that will do.  i386 and x86_64 should also clean up the
> pte_mkhuge() macro.  The unconditional setting of the _PAGE_PRESENT bit
> is leftover from the good old days of pre-faulting hugetlb pages.
> 
> 
> 
> [patch] fix i386/x86-64 _PAGE_PSE bit when changing page protection
> 
> On i386 and x86-64, the pte flag _PAGE_PSE collides with _PAGE_PROTNONE.
> The identity of a hugetlb pte is lost when changing page protection via
> mprotect; a page fault that occurs later will then trigger a BUG check
> in huge_pte_alloc().
> 
> The fix is to always make the new pte a hugetlb pte, and also to clean up
> legacy code where _PAGE_PRESENT was forced on in the pre-faulting days.
> 
> 
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>

I can confirm this fixes the BUGs I was seeing in my x86_64 testing of
libhugetlbfs' mprotect support.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

Thanks,
Nish

