* BUG in x86_64 hugepage support
@ 2006-03-15 1:20 Nishanth Aravamudan
2006-03-15 4:03 ` Chen, Kenneth W
0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 1:20 UTC (permalink / raw)
To: agl, david, ak; +Cc: linux-mm, discuss
Hello,
While doing some testing of libhugetlbfs, I ran into the following BUGs
on my x86_64 box when checking mprotect with hugepages (running make
func in libhugetlbfs is all it took here) (distro is Ubuntu Dapper, runs
32-bit userspace).
[ 633.480724] ----------- [cut here ] --------- [please bite here ] ---------
[ 633.480733] Kernel BUG at ...rc6-mm1/arch/x86_64/mm/../../i386/mm/hugetlbpage.c:31
[ 633.480736] invalid opcode: 0000 [1] PREEMPT SMP
[ 633.480740] last sysfs file: /block/sdb/sdb1/stat
[ 633.480743] CPU 1
[ 633.480745] Modules linked in:
[ 633.480750] Pid: 7872, comm: mprotect Not tainted 2.6.16-rc6-mm1 #1
[ 633.480753] RIP: 0010:[<ffffffff80188e46>] <ffffffff80188e46>{huge_pte_alloc+230}
[ 633.480764] RSP: 0000:ffff81006675dd18 EFLAGS: 00010283
[ 633.480767] RAX: 0000000000000001 RBX: ffff8100648ce008 RCX: 0000000000000000
[ 633.480772] RDX: ffff81007c7aa560 RSI: 0000000055800000 RDI: ffff8100648e4480
[ 633.480776] RBP: ffff81006675dd38 R08: 00000000556c19fc R09: 00000000ffffd178
[ 633.480780] R10: ffff81006675c000 R11: 0000000000000246 R12: 0000000000000560
[ 633.480784] R13: ffff8100648e4480 R14: ffff8100648e4480 R15: 0000000000000000
[ 633.480788] FS: 00002ac688028e10(0000) GS:ffff81007f1b4740(0063) knlGS:00000000556c68e0
[ 633.480792] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 633.480795] CR2: 0000000055800000 CR3: 000000006593d000 CR4: 00000000000006e0
[ 633.480800] Process mprotect (pid: 7872, threadinfo ffff81006675c000, task ffff8100402fc750)
[ 633.480803] Stack: 0000000055800000 0000000000000000 0000000000000000 0000000055800000
[ 633.480809] ffff81006675dd88 ffffffff801c5383 ffff81006675de68 ffff8100734e3c60
[ 633.480817] ffff81006675dda8 ffff8100734e3c60
[ 633.480821] Call Trace: <ffffffff801c5383>{hugetlb_fault+51} <ffffffff801085ea>{__handle_mm_fault+90}
[ 633.480834] <ffffffff8010a727>{ia32_setup_sigcontext+327} <ffffffff80173fdb>{notifier_call_chain+43}
[ 633.480845] <ffffffff80173b19>{do_page_fault+1241} <ffffffff80171275>{_spin_unlock_irq+21}
[ 633.480854] <ffffffff80134cc5>{sys_rt_sigprocmask+229} <ffffffff8016c741>{error_exit+0}
[ 633.480868]
[ 633.480869] Code: 0f 0b 68 08 a6 4e 80 c2 1f 00 48 8b 5d e8 4c 8b 65 f0 48 89
[ 633.480881] RIP <ffffffff80188e46>{huge_pte_alloc+230} RSP <ffff81006675dd18>
[ 633.480888] ----------- [cut here ] --------- [please bite here ] ---------
[ 633.492589] Kernel BUG at ...rc6-mm1/arch/x86_64/mm/../../i386/mm/hugetlbpage.c:31
[ 633.492593] invalid opcode: 0000 [2] PREEMPT SMP
[ 633.492597] last sysfs file: /block/sdb/sdb1/stat
[ 633.492600] CPU 1
[ 633.492602] Modules linked in:
[ 633.492606] Pid: 7873, comm: mprotect Not tainted 2.6.16-rc6-mm1 #1
[ 633.492610] RIP: 0010:[<ffffffff80188e46>] <ffffffff80188e46>{huge_pte_alloc+230}
[ 633.492620] RSP: 0000:ffff81006675fd18 EFLAGS: 00010283
[ 633.492624] RAX: 0000000000000001 RBX: ffff810066745550 RCX: 0000000000000000
[ 633.492628] RDX: ffff81006596bab0 RSI: 00002aaaaac00000 RDI: ffff8100648e4480
[ 633.492632] RBP: ffff81006675fd38 R08: 0000000000000000 R09: 0000000000000000
[ 633.492635] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000ab0
[ 633.492639] R13: ffff8100648e4480 R14: ffff8100648e4480 R15: 0000000000000000
[ 633.492644] FS: 00002b745df2ce10(0000) GS:ffff81007f1b4740(0000) knlGS:00000000556ac6c0
[ 633.492647] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 633.492651] CR2: 00002aaaaac00000 CR3: 000000006670a000 CR4: 00000000000006e0
[ 633.492655] Process mprotect (pid: 7873, threadinfo ffff81006675e000, task ffff8100402fd530)
[ 633.492658] Stack: 00002aaaaac00000 0000000000000000 0000000000000003 00002aaaaac00000
[ 633.492665] ffff81006675fd88 ffffffff801c5383 ffff81006675fe68 ffff8100734e3af0
[ 633.492673] ffff81006675fd78 ffff8100734e3af0
[ 633.492677] Call Trace: <ffffffff801c5383>{hugetlb_fault+51} <ffffffff801085ea>{__handle_mm_fault+90}
[ 633.492690] <ffffffff80173fdb>{notifier_call_chain+43} <ffffffff80173b19>{do_page_fault+1241}
[ 633.492701] <ffffffff80123652>{sys_mprotect+1522} <ffffffff8016c741>{error_exit+0}
[ 633.492715]
[ 633.492716] Code: 0f 0b 68 08 a6 4e 80 c2 1f 00 48 8b 5d e8 4c 8b 65 f0 48 89
[ 633.492728] RIP <ffffffff80188e46>{huge_pte_alloc+230} RSP <ffff81006675fd18>
The line in question (arch/i386/mm/hugetlbpage.c:31) in 2.6.16-rc6-mm1
is:
BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
We are trying to verify that if the pte was succesfully allocated that
it is filled in and that it is a hugetlb pte.
After some discussion with Adam Litke, I added some debugging to see
what pte_val we were getting:
huge_pte_alloc failed: pte == 800000003d800027
which indicates our flags = 0x27 or 00100111.
On x86_64, pte_huge is defined to be:
#define __LARGE_PTE (_PAGE_PSE|_PAGE_PRESENT) // __LARGE_PTE = 10000001
static inline int pte_huge(pte_t pte) { return (pte_val(pte) & __LARGE_PTE) == __LARGE_PTE; }
Clearly, pte_huge() is going to return 0, as
pte_val(pte) & __LARGE_PTE == 0x1 != __LARGE_PTE
in this case.
I believe the issue occurs due to the following code path:
sys_mprotect() --> hugetlb_change_protection() --> pte_modify()
On x86_64, that last call is:
#defined _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY) // upper bits all 1, lower 11 bits = 00001100000
unsigned long __supported_pte_mask __read_mostly = ~0UL;
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
pte_val(pte) &= _PAGE_CHG_MASK;
pte_val(pte) |= pgprot_val(newprot);
pte_val(pte) &= __supported_pte_mask;
return pte;
}
So, the first &= results in the lower 11 bits of pte_val(pte) being all
0s. By my analysis, this is the problem, pte_modify() on x86_64 is
clearing the bits we check to see if a pte is a hugetlb one. To see if
this might be an accurate analysis, I modified _PAGE_CHG_MASK as
follows:
-#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)
That is, forcing the bits we care about to get set in pte_modify(). This
removed the BUG()s I was seeing in our testing.
This obviously isn't a solution, though, but I don't know what is :) I
am hoping somebody with a bit more VM (or x86-64) experience can figure
out the right fix. I would appreciate any input, or corrections to my
analysis.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: BUG in x86_64 hugepage support
2006-03-15 1:20 BUG in x86_64 hugepage support Nishanth Aravamudan
@ 2006-03-15 4:03 ` Chen, Kenneth W
2006-03-15 4:35 ` Nishanth Aravamudan
0 siblings, 1 reply; 10+ messages in thread
From: Chen, Kenneth W @ 2006-03-15 4:03 UTC (permalink / raw)
To: 'Nishanth Aravamudan', agl, david, ak; +Cc: linux-mm, discuss
Nishanth Aravamudan wrote on Tuesday, March 14, 2006 5:20 PM
> While doing some testing of libhugetlbfs, I ran into the following BUGs
> on my x86_64 box when checking mprotect with hugepages (running make
> func in libhugetlbfs is all it took here) (distro is Ubuntu Dapper, runs
> 32-bit userspace).
>
> So, the first &= results in the lower 11 bits of pte_val(pte) being all
> 0s. By my analysis, this is the problem, pte_modify() on x86_64 is
> clearing the bits we check to see if a pte is a hugetlb one. To see if
> this might be an accurate analysis, I modified _PAGE_CHG_MASK as
> follows:
>
> -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
> +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)
>
> That is, forcing the bits we care about to get set in pte_modify(). This
> removed the BUG()s I was seeing in our testing.
I think your analysis looked correct. Though I don't think you want to
add _PAGE_PRESENT to _PAGE_CHG_MASK. The reason being newprot suppose
to have correct present bit (based on what the new protection is) and
it will be or'ed to form new pte.
I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.
- Ken
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: BUG in x86_64 hugepage support
2006-03-15 4:03 ` Chen, Kenneth W
@ 2006-03-15 4:35 ` Nishanth Aravamudan
2006-03-15 7:08 ` Chen, Kenneth W
0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 4:35 UTC (permalink / raw)
To: Chen, Kenneth W; +Cc: agl, david, ak, linux-mm, discuss
On 14.03.2006 [20:03:20 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 5:20 PM
> > While doing some testing of libhugetlbfs, I ran into the following
> > BUGs on my x86_64 box when checking mprotect with hugepages (running
> > make func in libhugetlbfs is all it took here) (distro is Ubuntu
> > Dapper, runs 32-bit userspace).
> >
> > So, the first &= results in the lower 11 bits of pte_val(pte) being
> > all 0s. By my analysis, this is the problem, pte_modify() on x86_64
> > is clearing the bits we check to see if a pte is a hugetlb one. To
> > see if this might be an accurate analysis, I modified _PAGE_CHG_MASK
> > as follows:
> >
> > -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
> > +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)
> >
> > That is, forcing the bits we care about to get set in pte_modify().
> > This removed the BUG()s I was seeing in our testing.
>
> I think your analysis looked correct. Though I don't think you want
> to add _PAGE_PRESENT to _PAGE_CHG_MASK. The reason being newprot
> suppose to have correct present bit (based on what the new protection
> is) and it will be or'ed to form new pte.
Thanks for the response!
> I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.
I can try a kernel with just the _PAGE_PSE bit added to _PAGE_CHG_MASK
and see if that helps. I think it will still BUG() in my case, however,
as __LARGE_PTE is 10000001, so only setting the 8th bit will be
insufficient. So maybe there is also something wrong with what is being
generated by pgprot_val(newprot)? I will try adding some more debugging
output to see what is happening in pte_modify.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: BUG in x86_64 hugepage support
2006-03-15 4:35 ` Nishanth Aravamudan
@ 2006-03-15 7:08 ` Chen, Kenneth W
2006-03-15 7:30 ` Nishanth Aravamudan
0 siblings, 1 reply; 10+ messages in thread
From: Chen, Kenneth W @ 2006-03-15 7:08 UTC (permalink / raw)
To: 'Nishanth Aravamudan'; +Cc: agl, david, ak, linux-mm, discuss
Nishanth Aravamudan wrote on Tuesday, March 14, 2006 8:36 PM
> > I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.
>
> I can try a kernel with just the _PAGE_PSE bit added to _PAGE_CHG_MASK
> and see if that helps. I think it will still BUG() in my case, however,
> as __LARGE_PTE is 10000001, so only setting the 8th bit will be
> insufficient. So maybe there is also something wrong with what is being
> generated by pgprot_val(newprot)? I will try adding some more debugging
> output to see what is happening in pte_modify.
Forget about preserving _PAGE_PSE BIT, It won't work. I just realized that
_PAGE_PROTNONE clashes with _PAGE_PSE (both use bit 7). Still thinking ...
- Ken
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: BUG in x86_64 hugepage support
2006-03-15 7:08 ` Chen, Kenneth W
@ 2006-03-15 7:30 ` Nishanth Aravamudan
2006-03-15 8:50 ` [discuss] " Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 7:30 UTC (permalink / raw)
To: Chen, Kenneth W; +Cc: agl, david, ak, linux-mm, discuss
On 14.03.2006 [23:08:57 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 8:36 PM
> > > I think _PAGE_PSE bit should be in _PAGE_CHG_MASK.
> >
> > I can try a kernel with just the _PAGE_PSE bit added to _PAGE_CHG_MASK
> > and see if that helps. I think it will still BUG() in my case, however,
> > as __LARGE_PTE is 10000001, so only setting the 8th bit will be
> > insufficient. So maybe there is also something wrong with what is being
> > generated by pgprot_val(newprot)? I will try adding some more debugging
> > output to see what is happening in pte_modify.
>
> Forget about preserving _PAGE_PSE BIT, It won't work. I just realized that
> _PAGE_PROTNONE clashes with _PAGE_PSE (both use bit 7). Still thinking ...
And here I was rather proud of the following patch :)
I figured, if we know that we're dealing with a hugepage pte, then we
probably can set _PAGE_PSE in newprot, no? This fixes the BUGs here, but
may have side affects I'm unaware of (and I was unable to test on an
i386 kernel, since all of a sudden -mm decided to stop building with
Ubuntu's biarch gcc -- another topic altogether).
Here's the patch I used.
Description: We currently fail mprotect testing in libhugetlbfs because
the PSE bit in the hugepage PTEs gets unset. In the case where we know
that a filled hugetlb PTE is going to have its protection changed, make
sure it stays a hugetlb PTE by setting the PSE bit in the new protection
flags.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff -urpN 2.6.16-rc6-mm1/mm/hugetlb.c 2.6.16-rc6-mm1-dev/mm/hugetlb.c
--- 2.6.16-rc6-mm1/mm/hugetlb.c 2006-03-14 22:49:44.000000000 -0800
+++ 2.6.16-rc6-mm1-dev/mm/hugetlb.c 2006-03-14 22:51:31.000000000 -0800
@@ -740,6 +740,7 @@ void hugetlb_change_protection(struct vm
continue;
if (!pte_none(*ptep)) {
pte = huge_ptep_get_and_clear(mm, address, ptep);
+ pgprot_val(newprot) |= _PAGE_PSE;
pte = pte_modify(pte, newprot);
set_huge_pte_at(mm, address, ptep, pte);
lazy_mmu_prot_update(pte);
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [discuss] Re: BUG in x86_64 hugepage support
2006-03-15 7:30 ` Nishanth Aravamudan
@ 2006-03-15 8:50 ` Jan Beulich
2006-03-15 10:03 ` Chen, Kenneth W
2006-03-15 15:13 ` Nishanth Aravamudan
0 siblings, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2006-03-15 8:50 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: david, Kenneth W Chen, linux-mm, Andreas Kleen, agl, discuss
>diff -urpN 2.6.16-rc6-mm1/mm/hugetlb.c 2.6.16-rc6-mm1-dev/mm/hugetlb.c
>--- 2.6.16-rc6-mm1/mm/hugetlb.c 2006-03-14 22:49:44.000000000 -0800
>+++ 2.6.16-rc6-mm1-dev/mm/hugetlb.c 2006-03-14 22:51:31.000000000 -0800
>@@ -740,6 +740,7 @@ void hugetlb_change_protection(struct vm
> continue;
> if (!pte_none(*ptep)) {
> pte = huge_ptep_get_and_clear(mm, address, ptep);
>+ pgprot_val(newprot) |= _PAGE_PSE;
> pte = pte_modify(pte, newprot);
> set_huge_pte_at(mm, address, ptep, pte);
> lazy_mmu_prot_update(pte);
This is architecture independent code - you shouldn't be using _PAGE_PSE here. Probably
x86-64 (and then likely also i386) should define their own set_huge_pte_at(), and use that#
to or in the needed flag?
Jan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [discuss] Re: BUG in x86_64 hugepage support
2006-03-15 8:50 ` [discuss] " Jan Beulich
@ 2006-03-15 10:03 ` Chen, Kenneth W
2006-03-15 15:14 ` Nishanth Aravamudan
2006-03-15 15:56 ` Nishanth Aravamudan
2006-03-15 15:13 ` Nishanth Aravamudan
1 sibling, 2 replies; 10+ messages in thread
From: Chen, Kenneth W @ 2006-03-15 10:03 UTC (permalink / raw)
To: 'Jan Beulich', Nishanth Aravamudan
Cc: david, linux-mm, Andreas Kleen, agl, discuss
Nishanth Aravamudan wrote on Tuesday, March 14, 2006 11:31 PM
> Description: We currently fail mprotect testing in libhugetlbfs because
> the PSE bit in the hugepage PTEs gets unset. In the case where we know
> that a filled hugetlb PTE is going to have its protection changed, make
> sure it stays a hugetlb PTE by setting the PSE bit in the new protection
> flags.
Jan Beulich wrote on Wednesday, March 15, 2006 12:50 AM
> This is architecture independent code - you shouldn't be using
> _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> define their own set_huge_pte_at(), and use that# to or in the
> needed flag?
Yeah, that will do. i386, x86_64 should also clean up pte_mkhuge() macro.
The unconditional setting of _PAGE_PRESENT bit was a leftover stuff from
the good'old day of pre-faulting hugetlb page.
[patch] fix i386/x86-64 _PAGE_PSE bit when changing page protection
On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.
The identify of hugetlb pte is lost when changing page protection
via mprotect. A page fault occurs later will trigger a bug check in
huge_pte_alloc().
The fix is to always make new pte a hugetlb pte and also to clean up
legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
diff -Nurp linux-2.6.15/include/asm-i386/pgtable.h linux-2.6.15-mm/include/asm-i386/pgtable.h
--- linux-2.6.15/include/asm-i386/pgtable.h 2006-01-02 19:21:10.000000000 -0800
+++ linux-2.6.15-mm/include/asm-i386/pgtable.h 2006-03-15 00:35:03.000000000 -0800
@@ -219,13 +219,12 @@ extern unsigned long pg0[];
* The following only work if pte_present() is true.
* Undefined behaviour if not..
*/
-#define __LARGE_PTE (_PAGE_PSE | _PAGE_PRESENT)
static inline int pte_user(pte_t pte) { return (pte).pte_low & _PAGE_USER; }
static inline int pte_read(pte_t pte) { return (pte).pte_low & _PAGE_USER; }
static inline int pte_dirty(pte_t pte) { return (pte).pte_low & _PAGE_DIRTY; }
static inline int pte_young(pte_t pte) { return (pte).pte_low & _PAGE_ACCESSED; }
static inline int pte_write(pte_t pte) { return (pte).pte_low & _PAGE_RW; }
-static inline int pte_huge(pte_t pte) { return ((pte).pte_low & __LARGE_PTE) == __LARGE_PTE; }
+static inline int pte_huge(pte_t pte) { return (pte).pte_low & _PAGE_PSE; }
/*
* The following only works if pte_present() is not true.
@@ -242,7 +241,7 @@ static inline pte_t pte_mkexec(pte_t pte
static inline pte_t pte_mkdirty(pte_t pte) { (pte).pte_low |= _PAGE_DIRTY; return pte; }
static inline pte_t pte_mkyoung(pte_t pte) { (pte).pte_low |= _PAGE_ACCESSED; return pte; }
static inline pte_t pte_mkwrite(pte_t pte) { (pte).pte_low |= _PAGE_RW; return pte; }
-static inline pte_t pte_mkhuge(pte_t pte) { (pte).pte_low |= __LARGE_PTE; return pte; }
+static inline pte_t pte_mkhuge(pte_t pte) { (pte).pte_low |= _PAGE_PSE; return pte; }
#ifdef CONFIG_X86_PAE
# include <asm/pgtable-3level.h>
diff -Nurp linux-2.6.15/include/asm-ia64/pgtable.h linux-2.6.15-mm/include/asm-ia64/pgtable.h
--- linux-2.6.15/include/asm-ia64/pgtable.h 2006-03-15 00:46:18.000000000 -0800
+++ linux-2.6.15-mm/include/asm-ia64/pgtable.h 2006-03-14 21:53:00.000000000 -0800
@@ -314,7 +314,7 @@ ia64_phys_addr_valid (unsigned long addr
#define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_A))
#define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D))
#define pte_mkdirty(pte) (__pte(pte_val(pte) | _PAGE_D))
-#define pte_mkhuge(pte) (__pte(pte_val(pte) | _PAGE_P))
+#define pte_mkhuge(pte) (__pte(pte_val(pte)))
/*
* Macro to a page protection value as "uncacheable". Note that "protection" is really a
diff -Nurp linux-2.6.15/include/asm-x86_64/pgtable.h linux-2.6.15-mm/include/asm-x86_64/pgtable.h
--- linux-2.6.15/include/asm-x86_64/pgtable.h 2006-03-15 00:30:16.000000000 -0800
+++ linux-2.6.15-mm/include/asm-x86_64/pgtable.h 2006-03-15 00:35:55.000000000 -0800
@@ -273,7 +272,7 @@ static inline int pte_dirty(pte_t pte)
static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; }
static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW; }
static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; }
-static inline int pte_huge(pte_t pte) { return (pte_val(pte) & __LARGE_PTE) == __LARGE_PTE; }
+static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_PSE; }
static inline pte_t pte_rdprotect(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_USER)); return pte; }
static inline pte_t pte_exprotect(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_USER)); return pte; }
@@ -285,7 +284,7 @@ static inline pte_t pte_mkexec(pte_t pte
static inline pte_t pte_mkdirty(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_DIRTY)); return pte; }
static inline pte_t pte_mkyoung(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; }
static inline pte_t pte_mkwrite(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_RW)); return pte; }
-static inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | __LARGE_PTE)); return pte; }
+static inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_PSE)); return pte; }
struct vm_area_struct;
diff -Nurp linux-2.6.15/mm/hugetlb.c linux-2.6.15-mm/mm/hugetlb.c
--- linux-2.6.15/mm/hugetlb.c 2006-03-15 00:30:20.000000000 -0800
+++ linux-2.6.15-mm/mm/hugetlb.c 2006-03-14 23:49:55.000000000 -0800
@@ -731,7 +731,7 @@ void hugetlb_change_protection(struct vm
continue;
if (!pte_none(*ptep)) {
pte = huge_ptep_get_and_clear(mm, address, ptep);
- pte = pte_modify(pte, newprot);
+ pte = pte_mkhuge(pte_modify(pte, newprot));
set_huge_pte_at(mm, address, ptep, pte);
lazy_mmu_prot_update(pte);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [discuss] Re: BUG in x86_64 hugepage support
2006-03-15 8:50 ` [discuss] " Jan Beulich
2006-03-15 10:03 ` Chen, Kenneth W
@ 2006-03-15 15:13 ` Nishanth Aravamudan
1 sibling, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 15:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: david, Kenneth W Chen, linux-mm, Andreas Kleen, agl, discuss
On 15.03.2006 [09:50:17 +0100], Jan Beulich wrote:
> >diff -urpN 2.6.16-rc6-mm1/mm/hugetlb.c 2.6.16-rc6-mm1-dev/mm/hugetlb.c
> >--- 2.6.16-rc6-mm1/mm/hugetlb.c 2006-03-14 22:49:44.000000000 -0800
> >+++ 2.6.16-rc6-mm1-dev/mm/hugetlb.c 2006-03-14 22:51:31.000000000 -0800
> >@@ -740,6 +740,7 @@ void hugetlb_change_protection(struct vm
> > continue;
> > if (!pte_none(*ptep)) {
> > pte = huge_ptep_get_and_clear(mm, address, ptep);
> >+ pgprot_val(newprot) |= _PAGE_PSE;
> > pte = pte_modify(pte, newprot);
> > set_huge_pte_at(mm, address, ptep, pte);
> > lazy_mmu_prot_update(pte);
>
> This is architecture independent code - you shouldn't be using
> _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> define their own set_huge_pte_at(), and use that# to or in the needed
> flag?
Good point, makes sense to me.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [discuss] Re: BUG in x86_64 hugepage support
2006-03-15 10:03 ` Chen, Kenneth W
@ 2006-03-15 15:14 ` Nishanth Aravamudan
2006-03-15 15:56 ` Nishanth Aravamudan
1 sibling, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 15:14 UTC (permalink / raw)
To: Chen, Kenneth W
Cc: 'Jan Beulich', david, linux-mm, Andreas Kleen, agl, discuss
On 15.03.2006 [02:03:00 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 11:31 PM
> > Description: We currently fail mprotect testing in libhugetlbfs because
> > the PSE bit in the hugepage PTEs gets unset. In the case where we know
> > that a filled hugetlb PTE is going to have its protection changed, make
> > sure it stays a hugetlb PTE by setting the PSE bit in the new protection
> > flags.
>
> Jan Beulich wrote on Wednesday, March 15, 2006 12:50 AM
> > This is architecture independent code - you shouldn't be using
> > _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> > define their own set_huge_pte_at(), and use that# to or in the
> > needed flag?
>
>
> Yeah, that will do. i386, x86_64 should also clean up pte_mkhuge()
> macro. The unconditional setting of _PAGE_PRESENT bit was a leftover
> stuff from the good'old day of pre-faulting hugetlb page.
Patch looks correct, I'll reboot with it applied and make sure it fixes
the BUGs (and doesn't affect any of the other tests).
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [discuss] Re: BUG in x86_64 hugepage support
2006-03-15 10:03 ` Chen, Kenneth W
2006-03-15 15:14 ` Nishanth Aravamudan
@ 2006-03-15 15:56 ` Nishanth Aravamudan
1 sibling, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2006-03-15 15:56 UTC (permalink / raw)
To: Chen, Kenneth W
Cc: 'Jan Beulich', david, linux-mm, Andreas Kleen, agl, discuss
On 15.03.2006 [02:03:00 -0800], Chen, Kenneth W wrote:
> Nishanth Aravamudan wrote on Tuesday, March 14, 2006 11:31 PM
> > Description: We currently fail mprotect testing in libhugetlbfs
> > because the PSE bit in the hugepage PTEs gets unset. In the case
> > where we know that a filled hugetlb PTE is going to have its
> > protection changed, make sure it stays a hugetlb PTE by setting the
> > PSE bit in the new protection flags.
>
> Jan Beulich wrote on Wednesday, March 15, 2006 12:50 AM
> > This is architecture independent code - you shouldn't be using
> > _PAGE_PSE here. Probably x86-64 (and then likely also i386) should
> > define their own set_huge_pte_at(), and use that# to or in the
> > needed flag?
>
>
> Yeah, that will do. i386, x86_64 should also clean up pte_mkhuge()
> macro. The unconditional setting of _PAGE_PRESENT bit was a leftover
> stuff from the good'old day of pre-faulting hugetlb page.
>
>
>
> [patch] fix i386/x86-64 _PAGE_PSE bit when changing page protection
>
> On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.
> The identify of hugetlb pte is lost when changing page protection via
> mprotect. A page fault occurs later will trigger a bug check in
> huge_pte_alloc().
>
> The fix is to always make new pte a hugetlb pte and also to clean up
> legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
>
>
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
I can confirm this fixes the BUGs I was seeing on x86_64 testing of
libhugetlbfs' mprotect support.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-03-15 15:57 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-15 1:20 BUG in x86_64 hugepage support Nishanth Aravamudan
2006-03-15 4:03 ` Chen, Kenneth W
2006-03-15 4:35 ` Nishanth Aravamudan
2006-03-15 7:08 ` Chen, Kenneth W
2006-03-15 7:30 ` Nishanth Aravamudan
2006-03-15 8:50 ` [discuss] " Jan Beulich
2006-03-15 10:03 ` Chen, Kenneth W
2006-03-15 15:14 ` Nishanth Aravamudan
2006-03-15 15:56 ` Nishanth Aravamudan
2006-03-15 15:13 ` Nishanth Aravamudan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox