hugepage patches

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* hugepage patches
@ 2003-01-31 23:15 Andrew Morton
  2003-01-31 23:13 ` David S. Miller
                   ` (13 more replies)
  0 siblings, 14 replies; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:15 UTC (permalink / raw)
  To: David S. Miller, Seth, Rohit, David Mosberger, Anton Blanchard,
	William Lee Irwin III
  Cc: linux-mm

Gentlemen, I have a bunch of patches here which fix various bugs in the
interaction between hugepages and other kernel functions.

The main impact on hugetlbpage impementations is:

- hugepages are no longer PG_reserved

- hugepages must be page_cache_got in the follow_page() function

- need to implement either hugepage_vma()/follow_huge_addr() or
  pmd_huge()/follow_huge_pmd(), depending on whether a page's hugeness can be
  determined via pmd inspection.  Implementations of both schemes for ia32
  are here.

The code is not heavily tested or reviewed at this time.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
@ 2003-01-31 23:13 ` David S. Miller
  2003-01-31 23:36   ` Andrew Morton
  2003-01-31 23:16 ` Andrew Morton
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: David S. Miller @ 2003-01-31 23:13 UTC (permalink / raw)
  To: akpm; +Cc: rohit.seth, davidm, anton, wli, linux-mm

  
   - need to implement either hugepage_vma()/follow_huge_addr() or
     pmd_huge()/follow_huge_pmd(), depending on whether a page's hugeness can be
     determined via pmd inspection.  Implementations of both schemes for ia32
     are here.

Remind me why we can't just look at the PTE?  Why can't
we end up doing something like:

	if (!pmd_is_huge(pmd)) {
		ptep = ...;
		if (pte_is_huge(*ptep)) {
		}
	}

Which is what all these systems besides x86 and PPC-BAT are doing.  I
don't see the real requirement for a full VMA lookup in these cases.
The page tables say fully whether we have huge stuff here or not.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:13 ` David S. Miller
@ 2003-01-31 23:36   ` Andrew Morton
  2003-01-31 23:23     ` David S. Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: rohit.seth, davidm, anton, wli, linux-mm

"David S. Miller" <davem@redhat.com> wrote:
>
> Remind me why we can't just look at the PTE?

Diktat ;)

Linus Torvalds <torvalds@transmeta.com> wrote:
>
> ...
> Your big-page approach makes the assumption that I refuse to make - namely 
> that the "big page" is somehow attached to the page tables, and to the pmd 
> in particular.
> 
> On many architectures, big pages are totally independent of the smaller 
> pages, and don't necessarily have any of the x86 aligment/size 
> restrictions.
> 
> While on an x86, a big page is always the size of a PMD, on a ppc it can
> be any power-of-two size and alignment from 128kB to 256MB. And fixing
> that to a pmd boundary just doesn't work. They have other restrictions
> instead: they are mapped by the "BAT array", and there are 8 of those (and
> I think Linux/PPC uses a few of them for the kernel itself).
> 
> So a portable big-page approach must _not_ tie the big pages to the page
> tables. I don't like big pages particularly, but if I add big page support
> to the kernel I want to at least do it in such a way that other people
> than just Intel can use it.
> 
> Portability means that 
>  - the architecture must be able to set its large pages totally 
>    independently of the page tables. 
>  - the architecture may have other non-size-related limits on the large
>    page areas, like "only 6 large page areas can be allocated per VM"
> 
> and quite frankly, anything that goes in and mucks with the VM deeply is 
> bound to fail, I think. The patch that Intel made (with some input from 
> me) and which I attached to the previous email does this, and has almost 
> zero impact on the "normal" MM code.
> 
> 			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:36   ` Andrew Morton
@ 2003-01-31 23:23     ` David S. Miller
  2003-01-31 23:45       ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: David S. Miller @ 2003-01-31 23:23 UTC (permalink / raw)
  To: akpm; +Cc: rohit.seth, davidm, anton, wli, linux-mm

   "David S. Miller" <davem@redhat.com> wrote:
   >
   > Remind me why we can't just look at the PTE?
   
   Diktat ;)
   
I understand, but give _ME_ a way to use the pagetables if
that is how things are implemented.  Don't force me to do
a VMA lookup if I need not.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:23     ` David S. Miller
@ 2003-01-31 23:45       ` Andrew Morton
  2003-01-31 23:48         ` David S. Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:45 UTC (permalink / raw)
  To: David S. Miller; +Cc: rohit.seth, davidm, anton, wli, linux-mm

"David S. Miller" <davem@redhat.com> wrote:
>
>    From: Andrew Morton <akpm@digeo.com>
>    Date: Fri, 31 Jan 2003 15:36:26 -0800
> 
>    "David S. Miller" <davem@redhat.com> wrote:
>    >
>    > Remind me why we can't just look at the PTE?
>    
>    Diktat ;)
>    
> I understand, but give _ME_ a way to use the pagetables if
> that is how things are implemented.  Don't force me to do
> a VMA lookup if I need not.

I did?  pmd_huge()/follow_huge_pmd().  Patch 2/4.

It might not be 100% appropriate for sparc64 pagetable representation - I
just guessed...
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:45       ` Andrew Morton
@ 2003-01-31 23:48         ` David S. Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David S. Miller @ 2003-01-31 23:48 UTC (permalink / raw)
  To: akpm; +Cc: rohit.seth, davidm, anton, wli, linux-mm

   
   I did?  pmd_huge()/follow_huge_pmd().  Patch 2/4.
   
   It might not be 100% appropriate for sparc64 pagetable representation - I
   just guessed...
   
Oh I see, yes that appears it will work.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
  2003-01-31 23:13 ` David S. Miller
@ 2003-01-31 23:16 ` Andrew Morton
  2003-01-31 23:17 ` Andrew Morton
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davem, rohit.seth, davidm, anton, wli, linux-mm

1/4

Using a futex in a large page causes a kernel lockup in __pin_page() -
because __pin_page's page revalidation uses follow_page(), and follow_page()
doesn't work for hugepages.

The patch fixes up follow_page() to return the appropriate 4k page for
hugepages.

This incurs a vma lookup for each follow_page(), which is considerable
overhead in some situations.  We only _need_ to do this if the architecture
cannot determin a page's hugeness from the contents of the PMD.

So this patch is a "reference" implementation for, say, PPC BAT-based
hugepages.




 arch/i386/mm/hugetlbpage.c |   29 +++++++++++++++++++++++++++++
 include/linux/hugetlb.h    |   18 ++++++++++++++++--
 include/linux/sched.h      |    4 +++-
 mm/memory.c                |    5 +++++
 mm/mmap.c                  |    2 +-
 linux/mm.h                 |    0 
 6 files changed, 54 insertions(+), 4 deletions(-)

diff -puN mm/memory.c~pin_page-fix mm/memory.c
--- 25/mm/memory.c~pin_page-fix	Fri Jan 31 13:32:13 2003
+++ 25-akpm/mm/memory.c	Fri Jan 31 14:29:59 2003
@@ -607,6 +607,11 @@ follow_page(struct mm_struct *mm, unsign
 	pmd_t *pmd;
 	pte_t *ptep, pte;
 	unsigned long pfn;
+	struct vm_area_struct *vma;
+
+	vma = hugepage_vma(mm, address);
+	if (vma)
+		return follow_huge_addr(mm, vma, address, write);
 
 	pgd = pgd_offset(mm, address);
 	if (pgd_none(*pgd) || pgd_bad(*pgd))
diff -puN include/linux/hugetlb.h~pin_page-fix include/linux/hugetlb.h
--- 25/include/linux/hugetlb.h~pin_page-fix	Fri Jan 31 13:32:13 2003
+++ 25-akpm/include/linux/hugetlb.h	Fri Jan 31 14:29:59 2003
@@ -20,16 +20,28 @@ int hugetlb_prefault(struct address_spac
 void huge_page_release(struct page *);
 int hugetlb_report_meminfo(char *);
 int is_hugepage_mem_enough(size_t);
-
+struct page *follow_huge_addr(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long address, int write);
+struct vm_area_struct *hugepage_vma(struct mm_struct *mm,
+					unsigned long address);
 extern int htlbpage_max;
 
+static inline void
+mark_mm_hugetlb(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	if (is_vm_hugetlb_page(vma))
+		mm->used_hugetlb = 1;
+}
+
 #else /* !CONFIG_HUGETLB_PAGE */
+
 static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 {
 	return 0;
 }
 
-#define follow_hugetlb_page(m,v,p,vs,a,b,i)		({ BUG(); 0; })
+#define follow_hugetlb_page(m,v,p,vs,a,b,i)	({ BUG(); 0; })
+#define follow_huge_addr(mm, vma, addr, write)	0
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
 #define zap_hugepage_range(vma, start, len)	BUG()
@@ -37,6 +49,8 @@ static inline int is_vm_hugetlb_page(str
 #define huge_page_release(page)			BUG()
 #define is_hugepage_mem_enough(size)		0
 #define hugetlb_report_meminfo(buf)		0
+#define hugepage_vma(mm, addr)			0
+#define mark_mm_hugetlb(mm, vma)		do { } while (0)
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 
diff -puN arch/i386/mm/hugetlbpage.c~pin_page-fix arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~pin_page-fix	Fri Jan 31 13:32:13 2003
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	Fri Jan 31 14:29:59 2003
@@ -150,6 +150,35 @@ back1:
 	return i;
 }
 
+struct page *
+follow_huge_addr(struct mm_struct *mm,
+	struct vm_area_struct *vma, unsigned long address, int write)
+{
+	unsigned long start = address;
+	int length = 1;
+	int nr;
+	struct page *page;
+
+	nr = follow_hugetlb_page(mm, vma, &page, NULL, &start, &length, 0);
+	if (nr == 1)
+		return page;
+	return NULL;
+}
+
+/*
+ * If virtual address `addr' lies within a huge page, return its controlling
+ * VMA, else NULL.
+ */
+struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
+{
+	if (mm->used_hugetlb) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+		if (vma && is_vm_hugetlb_page(vma))
+			return vma;
+	}
+	return NULL;
+}
+
 void free_huge_page(struct page *page)
 {
 	BUG_ON(page_count(page));
diff -puN mm/mmap.c~pin_page-fix mm/mmap.c
--- 25/mm/mmap.c~pin_page-fix	Fri Jan 31 13:32:13 2003
+++ 25-akpm/mm/mmap.c	Fri Jan 31 13:32:13 2003
@@ -362,6 +362,7 @@ static void vma_link(struct mm_struct *m
 	if (mapping)
 		up(&mapping->i_shared_sem);
 
+	mark_mm_hugetlb(mm, vma);
 	mm->map_count++;
 	validate_mm(mm);
 }
@@ -1427,7 +1428,6 @@ void exit_mmap(struct mm_struct *mm)
 		kmem_cache_free(vm_area_cachep, vma);
 		vma = next;
 	}
-		
 }
 
 /* Insert vm structure into process list sorted by address
diff -puN include/linux/mm.h~pin_page-fix include/linux/mm.h
diff -puN include/linux/sched.h~pin_page-fix include/linux/sched.h
--- 25/include/linux/sched.h~pin_page-fix	Fri Jan 31 13:32:13 2003
+++ 25-akpm/include/linux/sched.h	Fri Jan 31 13:32:13 2003
@@ -203,7 +203,9 @@ struct mm_struct {
 	unsigned long swap_address;
 
 	unsigned dumpable:1;
-
+#ifdef CONFIG_HUGETLB_PAGE
+	int used_hugetlb;
+#endif
 	/* Architecture-specific MM context */
 	mm_context_t context;
 

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
  2003-01-31 23:13 ` David S. Miller
  2003-01-31 23:16 ` Andrew Morton
@ 2003-01-31 23:17 ` Andrew Morton
  2003-01-31 23:18 ` Andrew Morton
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:17 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

2/4



ia32 and others can determine a page's hugeness by inspecting the pmd's value
directly.  No need to perform a VMA lookup against the user's virtual
address.

This patch ifdef's away the VMA-based implementation of
hugepage-aware-follow_page for ia32 and replaces it with a pmd-based
implementation.

The intent is that architectures will implement one or the other.  So the architecture either:

1: Implements hugepage_vma()/follow_huge_addr(), and stubs out
   pmd_huge()/follow_huge_pmd() or

2: Implements pmd_huge()/follow_huge_pmd(), and stubs out
   hugepage_vma()/follow_huge_addr()



 arch/i386/mm/hugetlbpage.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 include/asm-i386/pgtable.h |    5 +++++
 include/linux/hugetlb.h    |    3 +++
 mm/memory.c                |    6 +++++-
 4 files changed, 58 insertions(+), 1 deletion(-)

diff -puN mm/memory.c~pin_page-pmd mm/memory.c
--- 25/mm/memory.c~pin_page-pmd	Fri Jan 31 14:30:01 2003
+++ 25-akpm/mm/memory.c	Fri Jan 31 14:30:01 2003
@@ -618,7 +618,11 @@ follow_page(struct mm_struct *mm, unsign
 		goto out;
 
 	pmd = pmd_offset(pgd, address);
-	if (pmd_none(*pmd) || pmd_bad(*pmd))
+	if (pmd_none(*pmd))
+		goto out;
+	if (pmd_huge(*pmd))
+		return follow_huge_pmd(mm, address, pmd, write);
+	if (pmd_bad(*pmd))
 		goto out;
 
 	ptep = pte_offset_map(pmd, address);
diff -puN include/linux/hugetlb.h~pin_page-pmd include/linux/hugetlb.h
--- 25/include/linux/hugetlb.h~pin_page-pmd	Fri Jan 31 14:30:01 2003
+++ 25-akpm/include/linux/hugetlb.h	Fri Jan 31 14:30:01 2003
@@ -24,6 +24,8 @@ struct page *follow_huge_addr(struct mm_
 			unsigned long address, int write);
 struct vm_area_struct *hugepage_vma(struct mm_struct *mm,
 					unsigned long address);
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+				pmd_t *pmd, int write);
 extern int htlbpage_max;
 
 static inline void
@@ -51,6 +53,7 @@ static inline int is_vm_hugetlb_page(str
 #define hugetlb_report_meminfo(buf)		0
 #define hugepage_vma(mm, addr)			0
 #define mark_mm_hugetlb(mm, vma)		do { } while (0)
+#define follow_huge_pmd(mm, addr, pmd, write)	0
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 
diff -puN include/asm-i386/pgtable.h~pin_page-pmd include/asm-i386/pgtable.h
--- 25/include/asm-i386/pgtable.h~pin_page-pmd	Fri Jan 31 14:30:01 2003
+++ 25-akpm/include/asm-i386/pgtable.h	Fri Jan 31 14:30:01 2003
@@ -177,6 +177,11 @@ extern unsigned long pg0[1024];
 #define pmd_clear(xp)	do { set_pmd(xp, __pmd(0)); } while (0)
 #define	pmd_bad(x)	((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
 
+#ifdef CONFIG_HUGETLB_PAGE
+int pmd_huge(pmd_t pmd);
+#else
+#define pmd_huge(x)	0
+#endif
 
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))
 
diff -puN arch/i386/mm/hugetlbpage.c~pin_page-pmd arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~pin_page-pmd	Fri Jan 31 14:30:01 2003
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	Fri Jan 31 14:30:01 2003
@@ -150,6 +150,7 @@ back1:
 	return i;
 }
 
+#if 0	/* This is just for testing */
 struct page *
 follow_huge_addr(struct mm_struct *mm,
 	struct vm_area_struct *vma, unsigned long address, int write)
@@ -179,6 +180,50 @@ struct vm_area_struct *hugepage_vma(stru
 	return NULL;
 }
 
+int pmd_huge(pmd_t pmd)
+{
+	return 0;
+}
+
+struct page *
+follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+		pmd_t *pmd, int write)
+{
+	return NULL;
+}
+
+#else
+
+struct page *
+follow_huge_addr(struct mm_struct *mm,
+	struct vm_area_struct *vma, unsigned long address, int write)
+{
+	return NULL;
+}
+
+struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
+{
+	return NULL;
+}
+
+int pmd_huge(pmd_t pmd)
+{
+	return !!(pmd_val(pmd) & _PAGE_PSE);
+}
+
+struct page *
+follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+		pmd_t *pmd, int write)
+{
+	struct page *page;
+
+	page = pte_page(*(pte_t *)pmd);
+	if (page)
+		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
+	return page;
+}
+#endif
+
 void free_huge_page(struct page *page)
 {
 	BUG_ON(page_count(page));

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (2 preceding siblings ...)
  2003-01-31 23:17 ` Andrew Morton
@ 2003-01-31 23:18 ` Andrew Morton
  2003-01-31 23:18 ` Andrew Morton
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:18 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

3/4



We currently have a problem when things like ptrace, futexes and direct-io
try to pin user pages.  If the user's address is in a huge page we're
elevting the refcount of a constituent 4k page, not the head page of the
high-order allocation unit.

To solve this, a generic way of handling higher-order pages has been
implemented:

- A higher-order page is called a "compound page".  Chose this because
  "huge page", "large page", "super page", etc all seem to mean different
  things to different people.

- The first (controlling) 4k page of a compound page is referred to as the
  "head" page.

- The remaining pages are tail pages.

All pages have PG_compound set.  All pages have their lru.next pointing at
the head page (even the head page has this).

The head page's lru.prev, if non-zero, holds the address of the compound
page's put_page() function.

The order of the allocation is stored in the first tail page's lru.prev.
This is only for debug at present.  This usage means that zero-order pages
may not be compound.

The above relationships are established for _all_ higher-order pages in the
page allocator.  Which has some cost, but not much - another atomic op during
fork(), mainly.

This functionality is only enabled if CONFIG_HUGETLB_PAGE, although it could
be turned on permanently.  There's a little extra cost in get_page/put_page.




 linux/mm.h         |   35 ++++++++++++++++++++++++++--
 linux/page-flags.h |    7 ++++-
 page_alloc.c       |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 3 deletions(-)

diff -puN include/linux/page-flags.h~compound-pages include/linux/page-flags.h
--- 25/include/linux/page-flags.h~compound-pages	2003-01-30 23:43:18.000000000 -0800
+++ 25-akpm/include/linux/page-flags.h	2003-01-30 23:43:18.000000000 -0800
@@ -72,7 +72,8 @@
 
 #define PG_direct		16	/* ->pte_chain points directly at pte */
 #define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
-#define PG_reclaim		18	/* To be recalimed asap */
+#define PG_reclaim		18	/* To be reclaimed asap */
+#define PG_compound		19	/* Part of a compound page */
 
 /*
  * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -251,6 +252,10 @@ extern void get_full_page_state(struct p
 #define ClearPageReclaim(page)	clear_bit(PG_reclaim, &(page)->flags)
 #define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags)
 
+#define PageCompound(page)	test_bit(PG_compound, &(page)->flags)
+#define SetPageCompound(page)	set_bit(PG_compound, &(page)->flags)
+#define ClearPageCompound(page)	clear_bit(PG_compound, &(page)->flags)
+
 /*
  * The PageSwapCache predicate doesn't use a PG_flag at this time,
  * but it may again do so one day.
diff -puN mm/page_alloc.c~compound-pages mm/page_alloc.c
--- 25/mm/page_alloc.c~compound-pages	2003-01-30 23:43:18.000000000 -0800
+++ 25-akpm/mm/page_alloc.c	2003-01-31 01:47:02.000000000 -0800
@@ -85,6 +85,62 @@ static void bad_page(const char *functio
 	page->mapping = NULL;
 }
 
+#ifndef CONFIG_HUGETLB_PAGE
+#define prep_compound_page(page, order) do { } while (0)
+#define destroy_compound_page(page, order) do { } while (0)
+#else
+/*
+ * Higher-order pages are called "compound pages".  They are structured thusly:
+ *
+ * The first PAGE_SIZE page is called the "head page".
+ *
+ * The remaining PAGE_SIZE pages are called "tail pages".
+ *
+ * All pages have PG_compound set.  All pages have their lru.next pointing at
+ * the head page (even the head page has this).
+ *
+ * The head page's lru.prev, if non-zero, holds the address of the compound
+ * page's put_page() function.
+ *
+ * The order of the allocation is stored in the first tail page's lru.prev.
+ * This is only for debug at present.  This usage means that zero-order pages
+ * may not be compound.
+ */
+static void prep_compound_page(struct page *page, int order)
+{
+	int i;
+	int nr_pages = 1 << order;
+
+	page->lru.prev = NULL;
+	page[1].lru.prev = (void *)order;
+	for (i = 0; i < nr_pages; i++) {
+		struct page *p = page + i;
+
+		SetPageCompound(p);
+		p->lru.next = (void *)page;
+	}
+}
+
+static void destroy_compound_page(struct page *page, int order)
+{
+	int i;
+	int nr_pages = 1 << order;
+
+	if (page[1].lru.prev != (void *)order)
+		bad_page(__FUNCTION__, page);
+
+	for (i = 0; i < nr_pages; i++) {
+		struct page *p = page + i;
+
+		if (!PageCompound(p))
+			bad_page(__FUNCTION__, page);
+		if (p->lru.next != (void *)page)
+			bad_page(__FUNCTION__, page);
+		ClearPageCompound(p);
+	}
+}
+#endif		/* CONFIG_HUGETLB_PAGE */
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -114,6 +170,8 @@ static inline void __free_pages_bulk (st
 {
 	unsigned long page_idx, index;
 
+	if (order)
+		destroy_compound_page(page, order);
 	page_idx = page - base;
 	if (page_idx & ~mask)
 		BUG();
@@ -409,6 +467,12 @@ void free_cold_page(struct page *page)
 	free_hot_cold_page(page, 1);
 }
 
+/*
+ * Really, prep_compound_page() should be called from __rmqueue_bulk().  But
+ * we cheat by calling it from here, in the order > 0 path.  Saves a branch
+ * or two.
+ */
+
 static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
 {
 	unsigned long flags;
@@ -435,6 +499,8 @@ static struct page *buffered_rmqueue(str
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order);
 		spin_unlock_irqrestore(&zone->lock, flags);
+		if (order && page)
+			prep_compound_page(page, order);
 	}
 
 	if (page != NULL) {
diff -puN include/linux/mm.h~compound-pages include/linux/mm.h
--- 25/include/linux/mm.h~compound-pages	2003-01-30 23:43:18.000000000 -0800
+++ 25-akpm/include/linux/mm.h	2003-01-30 23:43:18.000000000 -0800
@@ -208,24 +208,55 @@ struct page {
  * Also, many kernel routines increase the page count before a critical
  * routine so they can be sure the page doesn't go away from under them.
  */
-#define get_page(p)		atomic_inc(&(p)->count)
-#define __put_page(p)		atomic_dec(&(p)->count)
 #define put_page_testzero(p)				\
 	({						\
 		BUG_ON(page_count(page) == 0);		\
 		atomic_dec_and_test(&(p)->count);	\
 	})
+
 #define page_count(p)		atomic_read(&(p)->count)
 #define set_page_count(p,v) 	atomic_set(&(p)->count, v)
+#define __put_page(p)		atomic_dec(&(p)->count)
 
 extern void FASTCALL(__page_cache_release(struct page *));
 
+#ifdef CONFIG_HUGETLB_PAGE
+
+static inline void get_page(struct page *page)
+{
+	if (PageCompound(page))
+		page = (struct page *)page->lru.next;
+	atomic_inc(&page->count);
+}
+
 static inline void put_page(struct page *page)
 {
+	if (PageCompound(page)) {
+		page = (struct page *)page->lru.next;
+		if (page->lru.prev) {	/* destructor? */
+			(*(void (*)(struct page *))page->lru.prev)(page);
+			return;
+		}
+	}
 	if (!PageReserved(page) && put_page_testzero(page))
 		__page_cache_release(page);
 }
 
+#else		/* CONFIG_HUGETLB_PAGE */
+
+static inline void get_page(struct page *page)
+{
+	atomic_inc(&page->count);
+}
+
+static inline void put_page(struct page *page)
+{
+	if (!PageReserved(page) && put_page_testzero(page))
+		__page_cache_release(page);
+}
+
+#endif		/* CONFIG_HUGETLB_PAGE */
+
 /*
  * Multiple processes may "see" the same page. E.g. for untouched
  * mappings of /dev/null, all processes see the same page full of

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (3 preceding siblings ...)
  2003-01-31 23:18 ` Andrew Morton
@ 2003-01-31 23:18 ` Andrew Morton
  2003-02-01  8:58   ` Ingo Oeser
  2003-02-02 10:55 ` Andrew Morton
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-01-31 23:18 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

4/4



The odd thing about hugetlb is that it maintains its own freelist of pages. 
And it has to do that, else it would trivially run out of pages due to buddy
fragmetation.

So we we don't want callers of put_page() to be passing those pages
to __free_pages_ok() on the final put().

So hugetlb installs a destructor in the compound pages to point at
free_huge_page(), which knows how to put these pages back onto the free list.

Also, don't mark hugepages as all PageReserved any more.  That's preenting
callers from doing proper refcounting.  Any code which does a user pagetable
walk and hits part of a hugepage will now handle it transparently.




 arch/i386/mm/hugetlbpage.c    |   22 ++++++++++------------
 arch/ia64/mm/hugetlbpage.c    |    8 ++------
 arch/sparc64/mm/hugetlbpage.c |    7 +------
 3 files changed, 13 insertions(+), 24 deletions(-)

diff -puN arch/i386/mm/hugetlbpage.c~compound-pages-hugetlb arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~compound-pages-hugetlb	Fri Jan 31 14:34:55 2003
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	Fri Jan 31 14:35:16 2003
@@ -46,6 +46,7 @@ static struct page *alloc_hugetlb_page(v
 	htlbpagemem--;
 	spin_unlock(&htlbpage_lock);
 	set_page_count(page, 1);
+	page->lru.prev = (void *)huge_page_release;
 	for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
 		clear_highpage(&page[i]);
 	return page;
@@ -134,6 +135,7 @@ back1:
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -218,8 +220,10 @@ follow_huge_pmd(struct mm_struct *mm, un
 	struct page *page;
 
 	page = pte_page(*(pte_t *)pmd);
-	if (page)
+	if (page) {
 		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
+		get_page(page);
+	}
 	return page;
 }
 #endif
@@ -372,8 +376,8 @@ int try_to_free_low(int count)
 
 int set_hugetlb_mem_size(int count)
 {
-	int j, lcount;
-	struct page *page, *map;
+	int lcount;
+	struct page *page;
 	extern long htlbzone_pages;
 	extern struct list_head htlbpage_freelist;
 
@@ -389,11 +393,6 @@ int set_hugetlb_mem_size(int count)
 			page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -415,7 +414,8 @@ int set_hugetlb_mem_size(int count)
 	return (int) htlbzone_pages;
 }
 
-int hugetlb_sysctl_handler(ctl_table *table, int write, struct file *file, void *buffer, size_t *length)
+int hugetlb_sysctl_handler(ctl_table *table, int write,
+		struct file *file, void *buffer, size_t *length)
 {
 	proc_dointvec(table, write, file, buffer, length);
 	htlbpage_max = set_hugetlb_mem_size(htlbpage_max);
@@ -432,15 +432,13 @@ __setup("hugepages=", hugetlb_setup);
 
 static int __init hugetlb_init(void)
 {
-	int i, j;
+	int i;
 	struct page *page;
 
 	for (i = 0; i < htlbpage_max; ++i) {
 		page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 		if (!page)
 			break;
-		for (j = 0; j < HPAGE_SIZE/PAGE_SIZE; ++j)
-			SetPageReserved(&page[j]);
 		spin_lock(&htlbpage_lock);
 		list_add(&page->list, &htlbpage_freelist);
 		spin_unlock(&htlbpage_lock);
diff -puN arch/ia64/mm/hugetlbpage.c~compound-pages-hugetlb arch/ia64/mm/hugetlbpage.c
--- 25/arch/ia64/mm/hugetlbpage.c~compound-pages-hugetlb	Fri Jan 31 15:04:32 2003
+++ 25-akpm/arch/ia64/mm/hugetlbpage.c	Fri Jan 31 15:06:27 2003
@@ -227,6 +227,7 @@ back1:
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -303,11 +304,6 @@ set_hugetlb_mem_size (int count)
 			page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -327,7 +323,7 @@ set_hugetlb_mem_size (int count)
 		map = page;
 		for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
 			map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
-					1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
+					1 << PG_dirty | 1 << PG_active |
 					1 << PG_private | 1<< PG_writeback);
 			map++;
 		}
diff -puN arch/sparc64/mm/hugetlbpage.c~compound-pages-hugetlb arch/sparc64/mm/hugetlbpage.c
--- 25/arch/sparc64/mm/hugetlbpage.c~compound-pages-hugetlb	Fri Jan 31 15:05:00 2003
+++ 25-akpm/arch/sparc64/mm/hugetlbpage.c	Fri Jan 31 15:06:35 2003
@@ -288,6 +288,7 @@ back1:
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -584,11 +585,6 @@ int set_hugetlb_mem_size(int count)
 			page = alloc_pages(GFP_ATOMIC, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -613,7 +609,6 @@ int set_hugetlb_mem_size(int count)
 			map->flags &= ~(1UL << PG_locked | 1UL << PG_error |
 					1UL << PG_referenced |
 					1UL << PG_dirty | 1UL << PG_active |
-					1UL << PG_reserved |
 					1UL << PG_private | 1UL << PG_writeback);
 			set_page_count(page, 0);
 			map++;

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:18 ` Andrew Morton
@ 2003-02-01  8:58   ` Ingo Oeser
  2003-02-01  9:31     ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Ingo Oeser @ 2003-02-01  8:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

Hi Andrew,

On Fri, Jan 31, 2003 at 03:18:58PM -0800, Andrew Morton wrote:
> Also, don't mark hugepages as all PageReserved any more.  That's preenting
> callers from doing proper refcounting.  Any code which does a user pagetable
> walk and hits part of a hugepage will now handle it transparently.

Heh, that's helping me a lot and makes get_one_user_page very
simple again (and simplify the follow_huge_* stuff even more).

This could help futex slow-path and remove loads of code.

Once this hugetlb stuff settles down a bit, I'll rewrite the
page-walking again to accomodate this. No API changes, just
internal rewrites.

So please tell the linux-mm list, when it's finished and I'll have
sth. ready for -mm in the first week of March[1].

Regards

Ingo Oeser

[1] Important exams in February, sorry.
-- 
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-01  8:58   ` Ingo Oeser
@ 2003-02-01  9:31     ` Andrew Morton
  2003-02-01 10:00       ` William Lee Irwin III
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-01  9:31 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-mm

Ingo Oeser <ingo.oeser@informatik.tu-chemnitz.de> wrote:
>
> Hi Andrew,
> 
> On Fri, Jan 31, 2003 at 03:18:58PM -0800, Andrew Morton wrote:
> > Also, don't mark hugepages as all PageReserved any more.  That's preenting
> > callers from doing proper refcounting.  Any code which does a user pagetable
> > walk and hits part of a hugepage will now handle it transparently.
> 
> Heh, that's helping me a lot and makes get_one_user_page very
> simple again (and simplify the follow_huge_* stuff even more).
> 
> This could help futex slow-path and remove loads of code.
> 
> Once this hugetlb stuff settles down a bit, I'll rewrite the
> page-walking again to accomodate this. No API changes, just
> internal rewrites.

OK...

> So please tell the linux-mm list, when it's finished and I'll have
> sth. ready for -mm in the first week of March[1].

Well I'm thinking of renaming it to hugebugfs.  It should be settled down
shortly.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-01  9:31     ` Andrew Morton
@ 2003-02-01 10:00       ` William Lee Irwin III
  2003-02-01 10:14         ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: William Lee Irwin III @ 2003-02-01 10:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Oeser, linux-mm

On Sat, Feb 01, 2003 at 01:31:36AM -0800, Andrew Morton wrote:
> Well I'm thinking of renaming it to hugebugfs.  It should be settled down
> shortly.

We've had a difference of opinion wrt. the proper mechanism for
referring things to the head of their superpage. I guess in one
sense I could be blamed for not following directions, but I _really_
didn't want to go in the direction of killing ->lru for all time.

There is also other shite I'd _really_ rather not get into publicly.

-- wli
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-01 10:00       ` William Lee Irwin III
@ 2003-02-01 10:14         ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-01 10:14 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: ingo.oeser, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> On Sat, Feb 01, 2003 at 01:31:36AM -0800, Andrew Morton wrote:
> > Well I'm thinking of renaming it to hugebugfs.  It should be settled down
> > shortly.
> 
> We've had a difference of opinion wrt. the proper mechanism for
> referring things to the head of their superpage. I guess in one
> sense I could be blamed for not following directions, but I _really_
> didn't want to go in the direction of killing ->lru for all time.

It's not killed - tons of stuff can be stuck at page[1].

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (4 preceding siblings ...)
  2003-01-31 23:18 ` Andrew Morton
@ 2003-02-02 10:55 ` Andrew Morton
  2003-02-02 10:55 ` Andrew Morton
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:55 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

5/4

get_unmapped_area for hugetlbfs

Having to specify the mapping address is a pain.  Give hugetlbfs files a
file_operations.get_unmapped_area().

The implementation is in hugetlbfs rather than in arch code because it's
probably common to several architectures.  If the architecture has special
needs it can define HAVE_ARCH_HUGETLB_UNMAPPED_AREA and go it alone.  Just
like HAVE_ARCH_UNMAPPED_AREA.



Having to specify the mapping address is a pain.  Give hugetlbfs files a
file_operations.get_unmapped_area().

The implementation is in hugetlbfs rather than in arch code because it's
probably common to several architectures.  If the architecture has special
needs it can define HAVE_ARCH_HUGETLB_UNMAPPED_AREA and go it alone.  Just
like HAVE_ARCH_UNMAPPED_AREA.



 hugetlbfs/inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 44 insertions(+), 2 deletions(-)

diff -puN fs/hugetlbfs/inode.c~hugetlbfs-get_unmapped_area fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~hugetlbfs-get_unmapped_area	2003-02-01 01:13:03.000000000 -0800
+++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-02 01:17:01.000000000 -0800
@@ -74,6 +74,47 @@ static int hugetlbfs_file_mmap(struct fi
 }
 
 /*
+ * Called under down_write(mmap_sem), page_table_lock is not held
+ */
+
+#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
+unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags);
+#else
+static unsigned long
+hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (len > TASK_SIZE)
+		return -ENOMEM;
+
+	if (addr) {
+		addr = ALIGN(addr, HPAGE_SIZE);
+		vma = find_vma(mm, addr);
+		if (TASK_SIZE - len >= addr &&
+		    (!vma || addr + len <= vma->vm_start))
+			return addr;
+	}
+
+	addr = ALIGN(mm->free_area_cache, HPAGE_SIZE);
+
+	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
+		/* At this point:  (!vma || addr < vma->vm_end). */
+		if (TASK_SIZE - len < addr)
+			return -ENOMEM;
+		if (!vma || addr + len <= vma->vm_start)
+			return addr;
+		addr = ALIGN(vma->vm_end, HPAGE_SIZE);
+	}
+}
+#endif
+
+/*
  * Read a page. Again trivial. If it didn't already exist
  * in the page cache, it is zero-filled.
  */
@@ -466,8 +507,9 @@ static struct address_space_operations h
 };
 
 struct file_operations hugetlbfs_file_operations = {
-	.mmap		= hugetlbfs_file_mmap,
-	.fsync		= simple_sync_file,
+	.mmap			= hugetlbfs_file_mmap,
+	.fsync			= simple_sync_file,
+	.get_unmapped_area	= hugetlb_get_unmapped_area,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (5 preceding siblings ...)
  2003-02-02 10:55 ` Andrew Morton
@ 2003-02-02 10:55 ` Andrew Morton
  2003-02-02 19:59   ` William Lee Irwin III
  2003-02-02 10:55 ` Andrew Morton
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:55 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

6/4

hugetlbfs: fix truncate


- Opening a hugetlbfs file O_TRUNC calls the generic vmtruncate() functions
  and nukes the kernel.

  Give S_ISREG hugetlbfs files a inode_operations, and hence a setattr
  which know how to handle these files.

- Don't permit the user to truncate hugetlbfs files to sizes which are not
  a multiple of HPAGE_SIZE.

- We don't support expanding in ftruncate(), so remove that code.




 hugetlbfs/inode.c |   39 ++++++++++++++++-----------------------
 1 files changed, 16 insertions(+), 23 deletions(-)

diff -puN fs/hugetlbfs/inode.c~hugetlbfs-truncate-fix fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~hugetlbfs-truncate-fix	2003-02-02 01:17:04.000000000 -0800
+++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-02 01:17:04.000000000 -0800
@@ -34,6 +34,7 @@ static struct super_operations hugetlbfs
 static struct address_space_operations hugetlbfs_aops;
 struct file_operations hugetlbfs_file_operations;
 static struct inode_operations hugetlbfs_dir_inode_operations;
+static struct inode_operations hugetlbfs_inode_operations;
 
 static struct backing_dev_info hugetlbfs_backing_dev_info = {
 	.ra_pages	= 0,	/* No readahead */
@@ -326,44 +327,29 @@ static void hugetlb_vmtruncate_list(stru
 	}
 }
 
+/*
+ * Expanding truncates are not allowed.
+ */
 static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 {
 	unsigned long pgoff;
 	struct address_space *mapping = inode->i_mapping;
-	unsigned long limit;
 
-	pgoff = (offset + HPAGE_SIZE - 1) >> HPAGE_SHIFT;
+	if (offset > inode->i_size)
+		return -EINVAL;
 
-	if (inode->i_size < offset)
-		goto do_expand;
+	BUG_ON(offset & ~HPAGE_MASK);
+	pgoff = offset >> HPAGE_SHIFT;
 
 	inode->i_size = offset;
 	down(&mapping->i_shared_sem);
-	if (list_empty(&mapping->i_mmap) && list_empty(&mapping->i_mmap_shared))
-		goto out_unlock;
 	if (!list_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
 	if (!list_empty(&mapping->i_mmap_shared))
 		hugetlb_vmtruncate_list(&mapping->i_mmap_shared, pgoff);
-
-out_unlock:
 	up(&mapping->i_shared_sem);
 	truncate_hugepages(mapping, offset);
 	return 0;
-
-do_expand:
-	limit = current->rlim[RLIMIT_FSIZE].rlim_cur;
-	if (limit != RLIM_INFINITY && offset > limit)
-		goto out_sig;
-	if (offset > inode->i_sb->s_maxbytes)
-		goto out;
-	inode->i_size = offset;
-	return 0;
-
-out_sig:
-	send_sig(SIGXFSZ, current, 0);
-out:
-	return -EFBIG;
 }
 
 static int hugetlbfs_setattr(struct dentry *dentry, struct iattr *attr)
@@ -390,7 +376,9 @@ static int hugetlbfs_setattr(struct dent
 		goto out;
 
 	if (ia_valid & ATTR_SIZE) {
-		error = hugetlb_vmtruncate(inode, attr->ia_size);
+		error = -EINVAL;
+		if (!(attr->ia_size & ~HPAGE_MASK))
+			error = hugetlb_vmtruncate(inode, attr->ia_size);
 		if (error)
 			goto out;
 		attr->ia_valid &= ~ATTR_SIZE;
@@ -425,6 +413,7 @@ hugetlbfs_get_inode(struct super_block *
 			init_special_inode(inode, mode, dev);
 			break;
 		case S_IFREG:
+			inode->i_op = &hugetlbfs_inode_operations;
 			inode->i_fop = &hugetlbfs_file_operations;
 			break;
 		case S_IFDIR:
@@ -525,6 +514,10 @@ static struct inode_operations hugetlbfs
 	.setattr	= hugetlbfs_setattr,
 };
 
+static struct inode_operations hugetlbfs_inode_operations = {
+	.setattr	= hugetlbfs_setattr,
+};
+
 static struct super_operations hugetlbfs_ops = {
 	.statfs		= simple_statfs,
 	.drop_inode	= hugetlbfs_drop_inode,

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-02 10:55 ` Andrew Morton
@ 2003-02-02 19:59   ` William Lee Irwin III
  2003-02-02 20:49     ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: William Lee Irwin III @ 2003-02-02 19:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davem, rohit.seth, davidm, anton, linux-mm

On Sun, Feb 02, 2003 at 02:55:46AM -0800, Andrew Morton wrote:
> 6/4
> hugetlbfs: fix truncate
> - Opening a hugetlbfs file O_TRUNC calls the generic vmtruncate() functions
>   and nukes the kernel.
>   Give S_ISREG hugetlbfs files a inode_operations, and hence a setattr
>   which know how to handle these files.
> - Don't permit the user to truncate hugetlbfs files to sizes which are not
>   a multiple of HPAGE_SIZE.
> - We don't support expanding in ftruncate(), so remove that code.

erm, IIRC ftruncate() was the only way to expand the things; without
read() or write() showing up this creates a huge semantic deficit.
When I wake up the rest of the way I'll eventually remember which
debate I lost that introduced an alternative method.

Leaving .setattr out of the non-directory inode ops and/or not having
a non-directory i_ops is a relatively huge omission. Not sure how
anything actually survived that.


-- wli
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-02 19:59   ` William Lee Irwin III
@ 2003-02-02 20:49     ` Andrew Morton
  2003-02-03 15:09       ` Eric W. Biederman
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 20:49 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: davem, rohit.seth, davidm, anton, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> On Sun, Feb 02, 2003 at 02:55:46AM -0800, Andrew Morton wrote:
> > 6/4
> > hugetlbfs: fix truncate
> > - Opening a hugetlbfs file O_TRUNC calls the generic vmtruncate() functions
> >   and nukes the kernel.
> >   Give S_ISREG hugetlbfs files a inode_operations, and hence a setattr
> >   which know how to handle these files.
> > - Don't permit the user to truncate hugetlbfs files to sizes which are not
> >   a multiple of HPAGE_SIZE.
> > - We don't support expanding in ftruncate(), so remove that code.
> 
> erm, IIRC ftruncate() was the only way to expand the things;

Expanding ftruncate would be nice, but the current way of performing
the page instantiation at mmap() time seems sufficient.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-02 20:49     ` Andrew Morton
@ 2003-02-03 15:09       ` Eric W. Biederman
  2003-02-03 21:29         ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Eric W. Biederman @ 2003-02-03 15:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: William Lee Irwin III, davem, rohit.seth, davidm, anton, linux-mm

Andrew Morton <akpm@digeo.com> writes:

> William Lee Irwin III <wli@holomorphy.com> wrote:
> >
> > On Sun, Feb 02, 2003 at 02:55:46AM -0800, Andrew Morton wrote:
> > > 6/4
> > > hugetlbfs: fix truncate
> > > - Opening a hugetlbfs file O_TRUNC calls the generic vmtruncate() functions
> > >   and nukes the kernel.
> > >   Give S_ISREG hugetlbfs files a inode_operations, and hence a setattr
> > >   which know how to handle these files.
> > > - Don't permit the user to truncate hugetlbfs files to sizes which are not
> > >   a multiple of HPAGE_SIZE.
> > > - We don't support expanding in ftruncate(), so remove that code.
> > 
> > erm, IIRC ftruncate() was the only way to expand the things;
> 
> Expanding ftruncate would be nice, but the current way of performing
> the page instantiation at mmap() time seems sufficient.

Having an expanding/shrinking ftruncate will trivially allow posix shared
memory semantics.   

I am trying to digest the idea of a mmap that grows a file.  There isn't
anything else that works that way is there?

It looks like you are removing the limit checking from hugetlbfs, by
removing the expansion code from ftruncate.  And given the fact that
nothing else grows in mmap, I suspect the code will be much easier to
write and maintain if the growth is constrained to happen in ftruncate.

mmap growing a file just sounds totally non-intuitive.  Though I do
agree, allocating that page at the time of growth sounds reasonable.

I may be missing something but it looks like there is not code present
to prevent multiple page allocations at the same time conflicting
when i_size is grown. 

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-03 15:09       ` Eric W. Biederman
@ 2003-02-03 21:29         ` Andrew Morton
  2003-02-04  5:37           ` Eric W. Biederman
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-03 21:29 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: wli, davem, rohit.seth, davidm, anton, linux-mm

ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> > 
> > Expanding ftruncate would be nice, but the current way of performing
> > the page instantiation at mmap() time seems sufficient.
> 
> Having an expanding/shrinking ftruncate will trivially allow posix shared
> memory semantics.   
> 
> I am trying to digest the idea of a mmap that grows a file.  There isn't
> anything else that works that way is there?

Not that I can think of.

> It looks like you are removing the limit checking from hugetlbfs, by
> removing the expansion code from ftruncate.

There was no expansion code.

The code I took out was vestigial.  We can put it all back if we decide to
add a new expand-with-ftruncate feature to hugetlbfs.

>  And given the fact that
> nothing else grows in mmap, I suspect the code will be much easier to
> write and maintain if the growth is constrained to happen in ftruncate.

That would require a fault handler.  We don't have one of those for hugetlbs.
 Probably not hard to add one though.

> I may be missing something but it looks like there is not code present
> to prevent multiple page allocations at the same time conflicting
> when i_size is grown. 

All the mmap code runs under down_write(current->mm->mmap_sem);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-03 21:29         ` Andrew Morton
@ 2003-02-04  5:37           ` Eric W. Biederman
  2003-02-04  5:50             ` William Lee Irwin III
  0 siblings, 1 reply; 48+ messages in thread
From: Eric W. Biederman @ 2003-02-04  5:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: wli, davem, rohit.seth, davidm, anton, linux-mm

Andrew Morton <akpm@digeo.com> writes:

> ebiederm@xmission.com (Eric W. Biederman) wrote:
> >
> > > 
> > > Expanding ftruncate would be nice, but the current way of performing
> > > the page instantiation at mmap() time seems sufficient.
> > 
> > Having an expanding/shrinking ftruncate will trivially allow posix shared
> > memory semantics.   
> > 
> > I am trying to digest the idea of a mmap that grows a file.  There isn't
> > anything else that works that way is there?
> 
> Not that I can think of.
> 
> > It looks like you are removing the limit checking from hugetlbfs, by
> > removing the expansion code from ftruncate.
> 
> There was no expansion code.

inode->i_size was grown, but I admit no huge pages were allocated.
 
> The code I took out was vestigial.  We can put it all back if we decide to
> add a new expand-with-ftruncate feature to hugetlbfs.
> 
> >  And given the fact that
> > nothing else grows in mmap, I suspect the code will be much easier to
> > write and maintain if the growth is constrained to happen in ftruncate.
> 
> That would require a fault handler.  We don't have one of those for hugetlbs.
>  Probably not hard to add one though.

I don't see that ftruncate setting the size would require a fault
handler.  ftruncate just needs to be called before mmap.  But a fault
handler would certainly make the code more like the rest of the mmap
cases.  

With a fault handler I start getting dangerous thoughts of paging
hugetlbfs to swap, probably not a good idea.

> > I may be missing something but it looks like there is not code present
> > to prevent multiple page allocations at the same time conflicting
> > when i_size is grown. 
> 
> All the mmap code runs under down_write(current->mm->mmap_sem);

Last I looked i_size is commonly protected by inode->i_sem.

current->mm->mmap_sem really doesn't provide protection if there is
a shared area between mappings in two different mm's.  Not a problem
if the code is a private mapping but otherwise...

Does hugetlbfs support shared mappings?  If it is exclusively
for private mappings the code makes much more sense than I am
thinking.

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04  5:37           ` Eric W. Biederman
@ 2003-02-04  5:50             ` William Lee Irwin III
  2003-02-04  7:06               ` Eric W. Biederman
  0 siblings, 1 reply; 48+ messages in thread
From: William Lee Irwin III @ 2003-02-04  5:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, davem, rohit.seth, davidm, anton, linux-mm

On Mon, Feb 03, 2003 at 10:37:51PM -0700, Eric W. Biederman wrote:
> current->mm->mmap_sem really doesn't provide protection if there is
> a shared area between mappings in two different mm's.  Not a problem
> if the code is a private mapping but otherwise...
> Does hugetlbfs support shared mappings?  If it is exclusively
> for private mappings the code makes much more sense than I am
> thinking.

It's supposedly for massively shared mappings to reduce PTE overhead.
Well, in theory there's some kind of TLB benefit, but the only thing
ppl really care about is x86 pagetable structure gets rid of L3 space
entirely so you don't burn 12+GB of L3 pagetables for appserver loads.

-- wli
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04  5:50             ` William Lee Irwin III
@ 2003-02-04  7:06               ` Eric W. Biederman
  2003-02-04  7:16                 ` Martin J. Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Eric W. Biederman @ 2003-02-04  7:06 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andrew Morton, davem, rohit.seth, davidm, anton, linux-mm

William Lee Irwin III <wli@holomorphy.com> writes:

> On Mon, Feb 03, 2003 at 10:37:51PM -0700, Eric W. Biederman wrote:
> > current->mm->mmap_sem really doesn't provide protection if there is
> > a shared area between mappings in two different mm's.  Not a problem
> > if the code is a private mapping but otherwise...
> > Does hugetlbfs support shared mappings?  If it is exclusively
> > for private mappings the code makes much more sense than I am
> > thinking.
> 
> It's supposedly for massively shared mappings to reduce PTE overhead.

O.k.  Then the code definitely needs to handle shared mappings..

> Well, in theory there's some kind of TLB benefit, but the only thing
> ppl really care about is x86 pagetable structure gets rid of L3 space
> entirely so you don't burn 12+GB of L3 pagetables for appserver loads.

I am with the group that actually cares more about the TLB benefit.
For HPC loads there is really only one application per machine.  And with
just one page table, the only real advantage is the more efficient use
of the TLB.  

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04  7:06               ` Eric W. Biederman
@ 2003-02-04  7:16                 ` Martin J. Bligh
  2003-02-04 12:40                   ` Eric W. Biederman
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2003-02-04  7:16 UTC (permalink / raw)
  To: Eric W. Biederman, William Lee Irwin III
  Cc: Andrew Morton, davem, rohit.seth, davidm, anton, linux-mm

> O.k.  Then the code definitely needs to handle shared mappings..

Why? we just divided the pagetable size by a factor of 1000, so
the problem is no longer really there ;-)
 
>> Well, in theory there's some kind of TLB benefit, but the only thing
>> ppl really care about is x86 pagetable structure gets rid of L3 space
>> entirely so you don't burn 12+GB of L3 pagetables for appserver loads.
> 
> I am with the group that actually cares more about the TLB benefit.
> For HPC loads there is really only one application per machine.  And with
> just one page table, the only real advantage is the more efficient use
> of the TLB.  

The reason we don't see it much is that we mostly have P3's which only
have 4 entries for large pages. P4's would be much easier to demonstrate
such things on, and I don't think we've really tried very hard on that with
hugetlbfs (earlier Java work by the research group showed impressive
improvements on an earlier implementation).

M.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04  7:16                 ` Martin J. Bligh
@ 2003-02-04 12:40                   ` Eric W. Biederman
  2003-02-04 15:55                     ` Martin J. Bligh
  2003-02-04 21:12                     ` Andrew Morton
  0 siblings, 2 replies; 48+ messages in thread
From: Eric W. Biederman @ 2003-02-04 12:40 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: William Lee Irwin III, Andrew Morton, davem, rohit.seth, davidm,
	anton, linux-mm

"Martin J. Bligh" <mbligh@aracnet.com> writes:

> > O.k.  Then the code definitely needs to handle shared mappings..
> 
> Why? we just divided the pagetable size by a factor of 1000, so
> the problem is no longer really there ;-)

William said one of the cases was to handle massively shared
mappings.  You cannot create a massively shared mapping except by
sharing.

Did I misunderstand what was meant by a massively shared mapping?

I can't imagine it being useful to guys like oracle without MAP_SHARED
support....

> >> Well, in theory there's some kind of TLB benefit, but the only thing
> >> ppl really care about is x86 pagetable structure gets rid of L3 space
> >> entirely so you don't burn 12+GB of L3 pagetables for appserver loads.
> > 
> > I am with the group that actually cares more about the TLB benefit.
> > For HPC loads there is really only one application per machine.  And with
> > just one page table, the only real advantage is the more efficient use
> > of the TLB.  
> 
> The reason we don't see it much is that we mostly have P3's which only
> have 4 entries for large pages. P4's would be much easier to demonstrate
> such things on, and I don't think we've really tried very hard on that with
> hugetlbfs (earlier Java work by the research group showed impressive
> improvements on an earlier implementation).

Cool.  I have no doubt the benefit is there.    Measuring how large it
is will certainly be interesting.

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04 12:40                   ` Eric W. Biederman
@ 2003-02-04 15:55                     ` Martin J. Bligh
  2003-02-05 12:18                       ` Eric W. Biederman
  2003-02-04 21:12                     ` Andrew Morton
  1 sibling, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2003-02-04 15:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: William Lee Irwin III, Andrew Morton, davem, rohit.seth, davidm,
	anton, linux-mm

>> > O.k.  Then the code definitely needs to handle shared mappings..
>> 
>> Why? we just divided the pagetable size by a factor of 1000, so
>> the problem is no longer really there ;-)
> 
> William said one of the cases was to handle massively shared
> mappings.  You cannot create a massively shared mapping except by
> sharing.
> 
> Did I misunderstand what was meant by a massively shared mapping?
> 
> I can't imagine it being useful to guys like oracle without MAP_SHARED
> support....

Create a huge shmem segment. and don't share the pagetables. Without large
pages, it's an enormous waste of space in mindless duplication. With large
pages, it's a much smaller waste of space (no PTEs) in mindless
duplication. 
Still not optimal, but makes the problem manageable.

> Cool.  I have no doubt the benefit is there.    Measuring how large it
> is will certainly be interesting.

See the IBM research group's paper on large page support from last years
OLS. Pretty impressive stuff.

M.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04 15:55                     ` Martin J. Bligh
@ 2003-02-05 12:18                       ` Eric W. Biederman
  0 siblings, 0 replies; 48+ messages in thread
From: Eric W. Biederman @ 2003-02-05 12:18 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: William Lee Irwin III, Andrew Morton, davem, rohit.seth, davidm,
	anton, linux-mm

"Martin J. Bligh" <mbligh@aracnet.com> writes:

> > Did I misunderstand what was meant by a massively shared mapping?
> > 
> > I can't imagine it being useful to guys like oracle without MAP_SHARED
> > support....
> 
> Create a huge shmem segment. and don't share the pagetables. Without large
> pages, it's an enormous waste of space in mindless duplication. With large
> pages, it's a much smaller waste of space (no PTEs) in mindless
> duplication. 
> Still not optimal, but makes the problem manageable.

And this is exactly the mmap(MAP_SHARED) case.  Where a single memory
segment is shared between multiple mm's. 

Eric


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04 12:40                   ` Eric W. Biederman
  2003-02-04 15:55                     ` Martin J. Bligh
@ 2003-02-04 21:12                     ` Andrew Morton
  2003-02-05 12:25                       ` Eric W. Biederman
  1 sibling, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-04 21:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: mbligh, wli, davem, rohit.seth, davidm, anton, linux-mm

ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> I can't imagine it being useful to guys like oracle without MAP_SHARED
> support....

MAP_SHARED is supported.  I haven't tested it much though.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-04 21:12                     ` Andrew Morton
@ 2003-02-05 12:25                       ` Eric W. Biederman
  2003-02-05 19:57                         ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Eric W. Biederman @ 2003-02-05 12:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mbligh, wli, davem, rohit.seth, davidm, anton, linux-mm

Andrew Morton <akpm@digeo.com> writes:

> ebiederm@xmission.com (Eric W. Biederman) wrote:
> >
> > I can't imagine it being useful to guys like oracle without MAP_SHARED
> > support....
> 
> MAP_SHARED is supported.  I haven't tested it much though.

Given that none of the standard kernel idioms to prevent races in
this kind of code are present, I would be very surprised if it
was not racy.

- inode->i_sem is not taken to protect inode->i_size.
- After successfully allocating a page, a test is not made to see if
  another process with the same mapping has allocated the page first.

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-05 12:25                       ` Eric W. Biederman
@ 2003-02-05 19:57                         ` Andrew Morton
  2003-02-05 20:00                           ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-05 19:57 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: mbligh, wli, davem, rohit.seth, davidm, anton, linux-mm

ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> Andrew Morton <akpm@digeo.com> writes:
> 
> > ebiederm@xmission.com (Eric W. Biederman) wrote:
> > >
> > > I can't imagine it being useful to guys like oracle without MAP_SHARED
> > > support....
> > 
> > MAP_SHARED is supported.  I haven't tested it much though.
> 
> Given that none of the standard kernel idioms to prevent races in
> this kind of code are present, I would be very surprised if it
> was not racy.
> 
> - inode->i_sem is not taken to protect inode->i_size.

OK, I'll fix that up.

> - After successfully allocating a page, a test is not made to see if
>   another process with the same mapping has allocated the page first.

In this case, add_to_page_cache() in hugetlb_prefault() will return -EEXIST,
and the page which lost the race will be freed again.

Uh, but we don't establish a pte against the page which got there first. 
I'll fix that up too.  Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-05 19:57                         ` Andrew Morton
@ 2003-02-05 20:00                           ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-05 20:00 UTC (permalink / raw)
  To: ebiederm, mbligh, wli, davem, rohit.seth, davidm, anton, linux-mm

Andrew Morton <akpm@digeo.com> wrote:
>
> > - inode->i_sem is not taken to protect inode->i_size.
> 
> OK, I'll fix that up.
> 
> > - After successfully allocating a page, a test is not made to see if
> >   another process with the same mapping has allocated the page first.
> 
> In this case, add_to_page_cache() in hugetlb_prefault() will return -EEXIST,
> and the page which lost the race will be freed again.
> 
> Uh, but we don't establish a pte against the page which got there first. 
> I'll fix that up too.  Thanks.

No, everything is OK isn't it?  The entire operation (i_size update and
allocate/add_to_page_cache()) is serialised under i_sem.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (6 preceding siblings ...)
  2003-02-02 10:55 ` Andrew Morton
@ 2003-02-02 10:55 ` Andrew Morton
  2003-02-02 10:56 ` Andrew Morton
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:55 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

7/4

hugetlbfs i_size fixes


We're expanding hugetlbfs i_size in the wrong place.  If someone attempts to
mmap more pages than are available, i_size is updated to reflect the
attempted mapping size.

So set i_size only when pages are successfully added to the mapping.

i_size handling at truncate time is still a bit wrong - if the mapping has
pages at (say) page offset 100-200 and the mappng is truncated to (say_ page
offset 50, i_size should be set to zero.  But it is instead set to
50*HPAGE_SIZE.  That's harmless.


 i386/mm/hugetlbpage.c    |    5 +++++
 ia64/mm/hugetlbpage.c    |    0 
 sparc64/mm/hugetlbpage.c |    0 
 x86_64/mm/hugetlbpage.c  |    6 ++++++
 hugetlbfs/inode.c        |    5 -----
 5 files changed, 11 insertions(+), 5 deletions(-)

diff -puN fs/hugetlbfs/inode.c~hugetlbfs-i_size-fix fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~hugetlbfs-i_size-fix	2003-02-01 02:07:22.000000000 -0800
+++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-01 02:07:22.000000000 -0800
@@ -45,7 +45,6 @@ static int hugetlbfs_file_mmap(struct fi
 {
 	struct inode *inode =file->f_dentry->d_inode;
 	struct address_space *mapping = inode->i_mapping;
-	size_t len;
 	int ret;
 
 	if (!capable(CAP_IPC_LOCK))
@@ -66,10 +65,6 @@ static int hugetlbfs_file_mmap(struct fi
 	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
 	vma->vm_ops = &hugetlb_vm_ops;
 	ret = hugetlb_prefault(mapping, vma);
-	len = (vma->vm_end - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
-	if (inode->i_size < len)
-		inode->i_size = len;
-
 	up(&inode->i_sem);
 	return ret;
 }
diff -puN arch/i386/mm/hugetlbpage.c~hugetlbfs-i_size-fix arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~hugetlbfs-i_size-fix	2003-02-01 02:07:22.000000000 -0800
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	2003-02-01 02:07:22.000000000 -0800
@@ -284,6 +284,7 @@ void zap_hugepage_range(struct vm_area_s
 int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = current->mm;
+	struct inode *inode = mapping->host;
 	unsigned long addr;
 	int ret = 0;
 
@@ -307,6 +308,7 @@ int hugetlb_prefault(struct address_spac
 			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 		page = find_get_page(mapping, idx);
 		if (!page) {
+			loff_t i_size;
 			page = alloc_hugetlb_page();
 			if (!page) {
 				ret = -ENOMEM;
@@ -318,6 +320,9 @@ int hugetlb_prefault(struct address_spac
 				free_huge_page(page);
 				goto out;
 			}
+			i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
+			if (i_size > inode->i_size)
+				inode->i_size = i_size;
 		}
 		set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
 	}
diff -puN arch/ia64/mm/hugetlbpage.c~hugetlbfs-i_size-fix arch/ia64/mm/hugetlbpage.c
diff -puN arch/sparc64/mm/hugetlbpage.c~hugetlbfs-i_size-fix arch/sparc64/mm/hugetlbpage.c
diff -puN arch/x86_64/mm/hugetlbpage.c~hugetlbfs-i_size-fix arch/x86_64/mm/hugetlbpage.c
--- 25/arch/x86_64/mm/hugetlbpage.c~hugetlbfs-i_size-fix	2003-02-01 02:07:22.000000000 -0800
+++ 25-akpm/arch/x86_64/mm/hugetlbpage.c	2003-02-01 02:07:22.000000000 -0800
@@ -205,6 +205,7 @@ void zap_hugepage_range(struct vm_area_s
 int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = current->mm;
+	struct inode = mapping->host;
 	unsigned long addr;
 	int ret = 0;
 
@@ -228,6 +229,8 @@ int hugetlb_prefault(struct address_spac
 			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 		page = find_get_page(mapping, idx);
 		if (!page) {
+			loff_t i_size;
+
 			page = alloc_hugetlb_page();
 			if (!page) {
 				ret = -ENOMEM;
@@ -239,6 +242,9 @@ int hugetlb_prefault(struct address_spac
 				free_huge_page(page);
 				goto out;
 			}
+			i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
+			if (i_size > inode->i_size)
+				inode->i_size = i_size;
 		}
 		set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
 	}

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (7 preceding siblings ...)
  2003-02-02 10:55 ` Andrew Morton
@ 2003-02-02 10:56 ` Andrew Morton
  2003-02-02 20:06   ` William Lee Irwin III
  2003-02-02 10:56 ` Andrew Morton
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:56 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

8/4

hugetlbfs cleanups

- Remove quota code.

- Remove extraneous copy-n-paste code from truncate: that's only for
  physically-backed filesystems.

- Whitespace changes.


 hugetlbfs/inode.c |   91 ++++++++----------------------------------------------
 1 files changed, 15 insertions(+), 76 deletions(-)

diff -puN fs/hugetlbfs/inode.c~hugetlbfs-cleanup fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~hugetlbfs-cleanup	2003-02-02 01:17:07.000000000 -0800
+++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-02 01:17:07.000000000 -0800
@@ -120,12 +120,16 @@ static int hugetlbfs_readpage(struct fil
 	return -EINVAL;
 }
 
-static int hugetlbfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
+static int
+hugetlbfs_prepare_write(struct file *file, struct page *page,
+			unsigned offset, unsigned to)
 {
 	return -EINVAL;
 }
 
-static int hugetlbfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)
+static int
+hugetlbfs_commit_write(struct file *file, struct page *page,
+			unsigned offset, unsigned to)
 {
 	return -EINVAL;
 }
@@ -140,28 +144,8 @@ void huge_pagevec_release(struct pagevec
 	pagevec_reinit(pvec);
 }
 
-void truncate_partial_hugepage(struct page *page, unsigned partial)
-{
-	int i;
-	const unsigned piece = partial & (PAGE_SIZE - 1);
-	const unsigned tailstart = PAGE_SIZE - piece;
-	const unsigned whole_pages = partial / PAGE_SIZE;
-	const unsigned last_page_offset = HPAGE_SIZE/PAGE_SIZE - whole_pages;
-
-	for (i = HPAGE_SIZE/PAGE_SIZE - 1; i >= last_page_offset; ++i)
-		memclear_highpage_flush(&page[i], 0, PAGE_SIZE);
-
-	if (!piece)
-		return;
-
-	memclear_highpage_flush(&page[last_page_offset - 1], tailstart, piece);
-}
-
-void truncate_huge_page(struct address_space *mapping, struct page *page)
+void truncate_huge_page(struct page *page)
 {
-	if (page->mapping != mapping)
-		return;
-
 	clear_page_dirty(page);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
@@ -170,52 +154,13 @@ void truncate_huge_page(struct address_s
 
 void truncate_hugepages(struct address_space *mapping, loff_t lstart)
 {
-	const pgoff_t start = (lstart + HPAGE_SIZE - 1) >> HPAGE_SHIFT;
-	const unsigned partial = lstart & (HPAGE_SIZE - 1);
+	const pgoff_t start = lstart >> HPAGE_SHIFT;
 	struct pagevec pvec;
 	pgoff_t next;
 	int i;
 
 	pagevec_init(&pvec, 0);
 	next = start;
-
-	while (pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
-		for (i = 0; i < pagevec_count(&pvec); ++i) {
-			struct page *page = pvec.pages[i];
-			pgoff_t page_index = page->index;
-
-			if (page_index > next)
-				next = page_index;
-
-			++next;
-
-			if (TestSetPageLocked(page))
-				continue;
-
-			if (PageWriteback(page)) {
-				unlock_page(page);
-				continue;
-			}
-
-			truncate_huge_page(mapping, page);
-			unlock_page(page);
-		}
-		huge_pagevec_release(&pvec);
-		cond_resched();
-	}
-
-	if (partial) {
-		struct page *page = find_lock_page(mapping, start - 1);
-		if (page) {
-			wait_on_page_writeback(page);
-			truncate_partial_hugepage(page, partial);
-			unlock_page(page);
-			huge_page_release(page);
-		}
-	}
-
-	next = start;
-
 	while (1) {
 		if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
 			if (next == start)
@@ -228,11 +173,10 @@ void truncate_hugepages(struct address_s
 			struct page *page = pvec.pages[i];
 
 			lock_page(page);
-			wait_on_page_writeback(page);
 			if (page->index > next)
 				next = page->index;
 			++next;
-			truncate_huge_page(mapping, page);
+			truncate_huge_page(page);
 			unlock_page(page);
 		}
 		huge_pagevec_release(&pvec);
@@ -363,13 +307,6 @@ static int hugetlbfs_setattr(struct dent
 	error = security_inode_setattr(dentry, attr);
 	if (error)
 		goto out;
-
-	if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
-	    (ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid))
-		error = DQUOT_TRANSFER(inode, attr) ? -EDQUOT : 0;
-	if (error)
-		goto out;
-
 	if (ia_valid & ATTR_SIZE) {
 		error = -EINVAL;
 		if (!(attr->ia_size & ~HPAGE_MASK))
@@ -401,7 +338,7 @@ hugetlbfs_get_inode(struct super_block *
 		inode->i_blocks = 0;
 		inode->i_rdev = NODEV;
 		inode->i_mapping->a_ops = &hugetlbfs_aops;
-		inode->i_mapping->backing_dev_info = &hugetlbfs_backing_dev_info;
+		inode->i_mapping->backing_dev_info =&hugetlbfs_backing_dev_info;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 		switch (mode & S_IFMT) {
 		default:
@@ -444,7 +381,7 @@ hugetlbfs_mknod(struct inode *dir, struc
 	return error;
 }
 
-static int hugetlbfs_mkdir(struct inode * dir, struct dentry * dentry, int mode)
+static int hugetlbfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 {
 	int retval = hugetlbfs_mknod(dir, dentry, mode | S_IFDIR, 0);
 	if (!retval)
@@ -457,7 +394,8 @@ static int hugetlbfs_create(struct inode
 	return hugetlbfs_mknod(dir, dentry, mode | S_IFREG, 0);
 }
 
-static int hugetlbfs_symlink(struct inode * dir, struct dentry *dentry, const char * symname)
+static int
+hugetlbfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
 {
 	struct inode *inode;
 	int error = -ENOSPC;
@@ -518,7 +456,8 @@ static struct super_operations hugetlbfs
 	.drop_inode	= hugetlbfs_drop_inode,
 };
 
-static int hugetlbfs_fill_super(struct super_block * sb, void * data, int silent)
+static int
+hugetlbfs_fill_super(struct super_block * sb, void * data, int silent)
 {
 	struct inode * inode;
 	struct dentry * root;

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-02 10:56 ` Andrew Morton
@ 2003-02-02 20:06   ` William Lee Irwin III
  0 siblings, 0 replies; 48+ messages in thread
From: William Lee Irwin III @ 2003-02-02 20:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davem, rohit.seth, davidm, anton, linux-mm

On Sun, Feb 02, 2003 at 02:56:09AM -0800, Andrew Morton wrote:
> hugetlbfs cleanups
> - Remove quota code.
> - Remove extraneous copy-n-paste code from truncate: that's only for
>   physically-backed filesystems.
> - Whitespace changes.

quotas wold allow per-user limits on the memory consumed with the stuff.
I guess since I've not pursued it / tested it / etc. out it goes...


-- wli
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (8 preceding siblings ...)
  2003-02-02 10:56 ` Andrew Morton
@ 2003-02-02 10:56 ` Andrew Morton
  2003-02-02 10:56 ` Andrew Morton
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:56 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

9/4

Give all architectures a hugetlb_nopage().


If someone maps a hugetlbfs file, then truncates it, then references the part
of the mapping outside the truncation point, they take a pagefault and we end
up hitting hugetlb_nopage().

We want to prevent this from ever happening.  This patch just makes sure that
all architectures have a goes-BUG hugetlb_nopage() to trap it.



 i386/mm/hugetlbpage.c    |   10 ++++++++--
 ia64/mm/hugetlbpage.c    |   11 +++++++++--
 sparc64/mm/hugetlbpage.c |    8 ++++++++
 x86_64/mm/hugetlbpage.c  |    4 ++--
 4 files changed, 27 insertions(+), 6 deletions(-)

diff -puN arch/i386/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup	2003-02-01 22:35:51.000000000 -0800
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	2003-02-01 22:37:04.000000000 -0800
@@ -26,7 +26,6 @@ static long    htlbpagemem;
 int     htlbpage_max;
 static long    htlbzone_pages;
 
-struct vm_operations_struct hugetlb_vm_ops;
 static LIST_HEAD(htlbpage_freelist);
 static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
 
@@ -472,7 +471,14 @@ int is_hugepage_mem_enough(size_t size)
 	return 1;
 }
 
-static struct page *hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+/*
+ * We cannot handle pagefaults against hugetlb pages at all.  They cause
+ * handle_mm_fault() to try to instantiate regular-sized pages in the
+ * hugegpage VMA.  do_page_fault() is supposed to trap this, so BUG is we get
+ * this far.
+ */
+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
 {
 	BUG();
 	return NULL;
diff -puN arch/ia64/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup arch/ia64/mm/hugetlbpage.c
--- 25/arch/ia64/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup	2003-02-01 22:35:51.000000000 -0800
+++ 25-akpm/arch/ia64/mm/hugetlbpage.c	2003-02-01 22:37:08.000000000 -0800
@@ -18,7 +18,6 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
-static struct vm_operations_struct hugetlb_vm_ops;
 struct list_head htlbpage_freelist;
 spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
 extern long htlbpagemem;
@@ -333,6 +332,14 @@ set_hugetlb_mem_size (int count)
 	return (int) htlbzone_pages;
 }
 
+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+{
+	BUG();
+	return NULL;
+}
+
 static struct vm_operations_struct hugetlb_vm_ops = {
-	.close =	zap_hugetlb_resources
+	.nopage =	hugetlb_nopage,
+	.close =	zap_hugetlb_resources,
 };
diff -puN arch/sparc64/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup arch/sparc64/mm/hugetlbpage.c
--- 25/arch/sparc64/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup	2003-02-01 22:35:51.000000000 -0800
+++ 25-akpm/arch/sparc64/mm/hugetlbpage.c	2003-02-01 22:37:13.000000000 -0800
@@ -619,6 +619,14 @@ int set_hugetlb_mem_size(int count)
 	return (int) htlbzone_pages;
 }
 
+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+{
+	BUG();
+	return NULL;
+}
+
 static struct vm_operations_struct hugetlb_vm_ops = {
+	.nopage = hugetlb_nopage,
 	.close	= zap_hugetlb_resources,
 };
diff -puN arch/x86_64/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup arch/x86_64/mm/hugetlbpage.c
--- 25/arch/x86_64/mm/hugetlbpage.c~hugetlbfs-nopage-cleanup	2003-02-01 22:35:51.000000000 -0800
+++ 25-akpm/arch/x86_64/mm/hugetlbpage.c	2003-02-01 22:37:19.000000000 -0800
@@ -25,7 +25,6 @@ static long    htlbpagemem;
 int     htlbpage_max;
 static long    htlbzone_pages;
 
-struct vm_operations_struct hugetlb_vm_ops;
 static LIST_HEAD(htlbpage_freelist);
 static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
 
@@ -349,7 +348,8 @@ int hugetlb_report_meminfo(char *buf)
 			HPAGE_SIZE/1024);
 }
 
-static struct page * hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
 {
 	BUG();
 	return NULL;

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (9 preceding siblings ...)
  2003-02-02 10:56 ` Andrew Morton
@ 2003-02-02 10:56 ` Andrew Morton
  2003-02-02 10:57 ` Andrew Morton
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:56 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

10/4

Fix hugetlbfs faults


If the underlying mapping was truncated and someone references the
now-unmapped memory the kernel will enter handle_mm_fault() and will start
instantiating PAGE_SIZE pte's inside the hugepage VMA.  Everything goes
generally pear-shaped.

So trap this in handle_mm_fault().  It adds no overhead to non-hugepage
builds.

Another possible fix would be to not unmap the huge pages at all in truncate
- just anonymise them.

But I think we want full ftruncate semantics for hugepages for management
purposes.


 i386/mm/fault.c |    0 
 memory.c        |    4 ++++
 2 files changed, 4 insertions(+)

diff -puN arch/i386/mm/fault.c~hugetlbfs-fault-fix arch/i386/mm/fault.c
diff -puN mm/memory.c~hugetlbfs-fault-fix mm/memory.c
--- 25/mm/memory.c~hugetlbfs-fault-fix	2003-02-01 22:46:48.000000000 -0800
+++ 25-akpm/mm/memory.c	2003-02-01 22:46:48.000000000 -0800
@@ -1447,6 +1447,10 @@ int handle_mm_fault(struct mm_struct *mm
 	pgd = pgd_offset(mm, address);
 
 	inc_page_state(pgfault);
+
+	if (is_vm_hugetlb_page(vma))
+		return VM_FAULT_SIGBUS;	/* mapping truncation does this. */
+
 	/*
 	 * We need the page table lock to synchronize with kswapd
 	 * and the SMP-safe atomic PTE updates.

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (10 preceding siblings ...)
  2003-02-02 10:56 ` Andrew Morton
@ 2003-02-02 10:57 ` Andrew Morton
  2003-02-02 10:57 ` Andrew Morton
  2003-02-02 10:57 ` Andrew Morton
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:57 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

11/4

ia32 hugetlb cleanup

- whitespace

- remove unneeded spinlocking no-op.




 i386/mm/hugetlbpage.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff -puN arch/i386/mm/hugetlbpage.c~hugetlbpage-cleanup arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~hugetlbpage-cleanup	2003-02-01 22:06:04.000000000 -0800
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	2003-02-01 22:06:25.000000000 -0800
@@ -248,7 +248,9 @@ void huge_page_release(struct page *page
 	free_huge_page(page);
 }
 
-void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
+void
+unmap_hugepage_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -258,8 +260,6 @@ void unmap_hugepage_range(struct vm_area
 	BUG_ON(start & (HPAGE_SIZE - 1));
 	BUG_ON(end & (HPAGE_SIZE - 1));
 
-	spin_lock(&htlbpage_lock);
-	spin_unlock(&htlbpage_lock);
 	for (address = start; address < end; address += HPAGE_SIZE) {
 		pte = huge_pte_offset(mm, address);
 		if (pte_none(*pte))
@@ -272,7 +272,9 @@ void unmap_hugepage_range(struct vm_area
 	flush_tlb_range(vma, start, end);
 }
 
-void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long length)
+void
+zap_hugepage_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long length)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spin_lock(&mm->page_table_lock);

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (11 preceding siblings ...)
  2003-02-02 10:57 ` Andrew Morton
@ 2003-02-02 10:57 ` Andrew Morton
  2003-02-02 20:17   ` William Lee Irwin III
  2003-02-02 10:57 ` Andrew Morton
  13 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:57 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

12/4

Fix hugetlb_vmtruncate_list()


This function is quite wrong - has an "=" where it should have an "-" and
confuses PAGE_SIZE and HPAGE_SIZE in its address and file offset arithmetic.




 hugetlbfs/inode.c |   46 ++++++++++++++++++++++++++++++++--------------
 1 files changed, 32 insertions(+), 14 deletions(-)

diff -puN fs/hugetlbfs/inode.c~hugetlb_vmtruncate-fixes fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~hugetlb_vmtruncate-fixes	2003-02-02 01:17:12.000000000 -0800
+++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-02 02:53:49.000000000 -0800
@@ -240,29 +240,47 @@ static void hugetlbfs_drop_inode(struct 
 		hugetlbfs_forget_inode(inode);
 }
 
-static void hugetlb_vmtruncate_list(struct list_head *list, unsigned long pgoff)
+/*
+ * h_pgoff is in HPAGE_SIZE units.
+ * vma->vm_pgoff is in PAGE_SIZE units.
+ */
+static void
+hugetlb_vmtruncate_list(struct list_head *list, unsigned long h_pgoff)
 {
-	unsigned long start, end, length, delta;
 	struct vm_area_struct *vma;
 
 	list_for_each_entry(vma, list, shared) {
-		start = vma->vm_start;
-		end = vma->vm_end;
-		length = end - start;
-
-		if (vma->vm_pgoff >= pgoff) {
-			zap_hugepage_range(vma, start, length);
+		unsigned long h_vm_pgoff;
+		unsigned long v_length;
+		unsigned long h_length;
+		unsigned long v_offset;
+
+		h_vm_pgoff = vma->vm_pgoff << (HPAGE_SHIFT - PAGE_SHIFT);
+		v_length = vma->vm_end - vma->vm_start;
+		h_length = v_length >> HPAGE_SHIFT;
+		v_offset = (h_pgoff - h_vm_pgoff) << HPAGE_SHIFT;
+
+		/*
+		 * Is this VMA fully outside the truncation point?
+		 */
+		if (h_vm_pgoff >= h_pgoff) {
+			zap_hugepage_range(vma, vma->vm_start, v_length);
 			continue;
 		}
 
-		length >>= PAGE_SHIFT;
-		delta = pgoff = vma->vm_pgoff;
-		if (delta >= length)
+		/*
+		 * Is this VMA fully inside the truncaton point?
+		 */
+		if (h_vm_pgoff + (v_length >> HPAGE_SHIFT) <= h_pgoff)
 			continue;
 
-		start += delta << PAGE_SHIFT;
-		length = (length - delta) << PAGE_SHIFT;
-		zap_hugepage_range(vma, start, length);
+		/*
+		 * The VMA straddles the truncation point.  v_offset is the
+		 * offset (in bytes) into the VMA where the point lies.
+		 */
+		zap_hugepage_range(vma,
+				vma->vm_start + v_offset,
+				v_length - v_offset);
 	}
 }
 

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-02 10:57 ` Andrew Morton
@ 2003-02-02 20:17   ` William Lee Irwin III
  0 siblings, 0 replies; 48+ messages in thread
From: William Lee Irwin III @ 2003-02-02 20:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davem, rohit.seth, davidm, anton, linux-mm

On Sun, Feb 02, 2003 at 02:57:20AM -0800, Andrew Morton wrote:
> 12/4
> Fix hugetlb_vmtruncate_list()
> This function is quite wrong - has an "=" where it should have an "-" and
> confuses PAGE_SIZE and HPAGE_SIZE in its address and file offset arithmetic.

AFAICT the = typo and passing in a pgoff shifted the wrong amount were
the bogons here; maybe there's another one somewhere else.
Heavy-handed but correct.


-- wli
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-01-31 23:15 hugepage patches Andrew Morton
                   ` (12 preceding siblings ...)
  2003-02-02 10:57 ` Andrew Morton
@ 2003-02-02 10:57 ` Andrew Morton
  13 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-02 10:57 UTC (permalink / raw)
  To: davem, rohit.seth, davidm, anton, wli, linux-mm

13/4

hugetlb mremap fix

If you attempt tp perform a relocating 4k-aligned mremap and the new address
for the map lands on top of a hugepage VMA, do_mremap() will attempt to
perform a 4k-aligned unmap inside the hugetlb VMA.  The hugetlb layer goes
BUG.

Fix that by trapping the poorly-aligned unmap attempt in do_munmap(). 
do_remap() will then fall through without having done anything to the place
where it tests for a hugetlb VMA.

It would be neater to perform these checks on entry to do_mremap(), but that
would incur another VMA lookup.

Also, if you attempt to perform a 4k-aligned and/or sized munmap() inside a
hugepage VMA the same BUG happens.  This patch fixes that too.

 mmap.c |    5 +++++
 1 files changed, 5 insertions(+)

diff -puN mm/mmap.c~hugetlb-mremap-fix mm/mmap.c
--- 25/mm/mmap.c~hugetlb-mremap-fix	2003-02-02 02:53:56.000000000 -0800
+++ 25-akpm/mm/mmap.c	2003-02-02 02:53:56.000000000 -0800
@@ -1227,6 +1227,11 @@ int do_munmap(struct mm_struct *mm, unsi
 		return 0;
 	/* we have  start < mpnt->vm_end  */

+	if (is_vm_hugetlb_page(mpnt)) {
+		if ((start & ~HPAGE_MASK) || (len & ~HPAGE_MASK))
+			return -EINVAL;
+	}
+
 	/* if it doesn't overlap, we have nothing.. */
 	end = start + len;
 	if (mpnt->vm_start >= end)

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: hugepage patches
@ 2003-02-07 21:49 Seth, Rohit
  2003-02-07 22:00 ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Seth, Rohit @ 2003-02-07 21:49 UTC (permalink / raw)
  To: Andrew Morton, davem, Seth, Rohit, davidm, anton, wli, linux-mm

Andrew,

New allocation of hugepages is an atomic operation.  Partial allocations
of hugepages is not a possibility.  Agree with the initial bug report
though that i_size could be getting mistakenly updated.  Please apply
this patch instead.

thanks,
rohit

--- fs/hugetlbfs/inode.c.7      Fri Feb  7 13:47:39 2003
+++ fs/hugetlbfs/inode.c        Fri Feb  7 13:48:17 2003
@@ -66,7 +66,7 @@
        vma->vm_ops = &hugetlb_vm_ops;
        ret = hugetlb_prefault(mapping, vma);
        len = (vma->vm_end - vma->vm_start) + (vma->vm_pgoff <<
PAGE_SHIFT);
-       if (inode->i_size < len)
+       if (!ret && (inode->i_size < len))
                inode->i_size = len;
 
        up(&inode->i_sem);

> -----Original Message-----
> From: Andrew Morton [mailto:akpm@digeo.com] 
> Sent: Sunday, February 02, 2003 2:56 AM
> To: davem@redhat.com; rohit.seth@intel.com; 
> davidm@napali.hpl.hp.com; anton@samba.org; 
> wli@holomorphy.com; linux-mm@kvack.org
> Subject: Re: hugepage patches
> 
> 
> 7/4
> 
> hugetlbfs i_size fixes
> 
> 
> We're expanding hugetlbfs i_size in the wrong place.  If 
> someone attempts to mmap more pages than are available, 
> i_size is updated to reflect the attempted mapping size.
> 
> So set i_size only when pages are successfully added to the mapping.
> 
> i_size handling at truncate time is still a bit wrong - if 
> the mapping has pages at (say) page offset 100-200 and the 
> mappng is truncated to (say_ page offset 50, i_size should be 
> set to zero.  But it is instead set to 50*HPAGE_SIZE.  That's 
> harmless.
> 
> 
>  i386/mm/hugetlbpage.c    |    5 +++++
>  ia64/mm/hugetlbpage.c    |    0 
>  sparc64/mm/hugetlbpage.c |    0 
>  x86_64/mm/hugetlbpage.c  |    6 ++++++
>  hugetlbfs/inode.c        |    5 -----
>  5 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff -puN fs/hugetlbfs/inode.c~hugetlbfs-i_size-fix 
> fs/hugetlbfs/inode.c
> --- 25/fs/hugetlbfs/inode.c~hugetlbfs-i_size-fix	
> 2003-02-01 02:07:22.000000000 -0800
> +++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-01 
> 02:07:22.000000000 -0800
> @@ -45,7 +45,6 @@ static int hugetlbfs_file_mmap(struct fi
>  {
>  	struct inode *inode =file->f_dentry->d_inode;
>  	struct address_space *mapping = inode->i_mapping;
> -	size_t len;
>  	int ret;
>  
>  	if (!capable(CAP_IPC_LOCK))
> @@ -66,10 +65,6 @@ static int hugetlbfs_file_mmap(struct fi
>  	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>  	vma->vm_ops = &hugetlb_vm_ops;
>  	ret = hugetlb_prefault(mapping, vma);
> -	len = (vma->vm_end - vma->vm_start) + (vma->vm_pgoff << 
> PAGE_SHIFT);
> -	if (inode->i_size < len)
> -		inode->i_size = len;
> -
>  	up(&inode->i_sem);
>  	return ret;
>  }
> diff -puN arch/i386/mm/hugetlbpage.c~hugetlbfs-i_size-fix 
> arch/i386/mm/hugetlbpage.c
> --- 25/arch/i386/mm/hugetlbpage.c~hugetlbfs-i_size-fix	
> 2003-02-01 02:07:22.000000000 -0800
> +++ 25-akpm/arch/i386/mm/hugetlbpage.c	2003-02-01 
> 02:07:22.000000000 -0800
> @@ -284,6 +284,7 @@ void zap_hugepage_range(struct vm_area_s  
> int hugetlb_prefault(struct address_space *mapping, struct 
> vm_area_struct *vma)  {
>  	struct mm_struct *mm = current->mm;
> +	struct inode *inode = mapping->host;
>  	unsigned long addr;
>  	int ret = 0;
>  
> @@ -307,6 +308,7 @@ int hugetlb_prefault(struct address_spac
>  			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
>  		page = find_get_page(mapping, idx);
>  		if (!page) {
> +			loff_t i_size;
>  			page = alloc_hugetlb_page();
>  			if (!page) {
>  				ret = -ENOMEM;
> @@ -318,6 +320,9 @@ int hugetlb_prefault(struct address_spac
>  				free_huge_page(page);
>  				goto out;
>  			}
> +			i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
> +			if (i_size > inode->i_size)
> +				inode->i_size = i_size;
>  		}
>  		set_huge_pte(mm, vma, page, pte, vma->vm_flags 
> & VM_WRITE);
>  	}
> diff -puN arch/ia64/mm/hugetlbpage.c~hugetlbfs-i_size-fix 
> arch/ia64/mm/hugetlbpage.c diff -puN 
> arch/sparc64/mm/hugetlbpage.c~hugetlbfs-i_size-fix 
> arch/sparc64/mm/hugetlbpage.c diff -puN 
> arch/x86_64/mm/hugetlbpage.c~hugetlbfs-i_size-fix 
> arch/x86_64/mm/hugetlbpage.c
> --- 25/arch/x86_64/mm/hugetlbpage.c~hugetlbfs-i_size-fix	
> 2003-02-01 02:07:22.000000000 -0800
> +++ 25-akpm/arch/x86_64/mm/hugetlbpage.c	2003-02-01 
> 02:07:22.000000000 -0800
> @@ -205,6 +205,7 @@ void zap_hugepage_range(struct vm_area_s  
> int hugetlb_prefault(struct address_space *mapping, struct 
> vm_area_struct *vma)  {
>  	struct mm_struct *mm = current->mm;
> +	struct inode = mapping->host;
>  	unsigned long addr;
>  	int ret = 0;
>  
> @@ -228,6 +229,8 @@ int hugetlb_prefault(struct address_spac
>  			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
>  		page = find_get_page(mapping, idx);
>  		if (!page) {
> +			loff_t i_size;
> +
>  			page = alloc_hugetlb_page();
>  			if (!page) {
>  				ret = -ENOMEM;
> @@ -239,6 +242,9 @@ int hugetlb_prefault(struct address_spac
>  				free_huge_page(page);
>  				goto out;
>  			}
> +			i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
> +			if (i_size > inode->i_size)
> +				inode->i_size = i_size;
>  		}
>  		set_huge_pte(mm, vma, page, pte, vma->vm_flags 
> & VM_WRITE);
>  	}
> 
> _
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-07 21:49 Seth, Rohit
@ 2003-02-07 22:00 ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-07 22:00 UTC (permalink / raw)
  To: Seth, Rohit; +Cc: davem, davidm, anton, wli, linux-mm

"Seth, Rohit" <rohit.seth@intel.com> wrote:
>
> Andrew,
> 
> New allocation of hugepages is an atomic operation.  Partial allocations
> of hugepages is not a possibility.

Yes it is?  If you ask hugetlb_prefault() to fault in four pages, and there
are only two pages available then it will instantiate just the two pages.

And updating i_size at the place where we add the page to pagecache makes
some sense..


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: hugepage patches
@ 2003-02-07 22:02 Seth, Rohit
  2003-02-07 22:24 ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Seth, Rohit @ 2003-02-07 22:02 UTC (permalink / raw)
  To: Andrew Morton, Seth, Rohit; +Cc: davem, davidm, anton, wli, linux-mm

The allocated pages will be zapped on the way back from do_mmap_pgoff
for the failure case.

> -----Original Message-----
> From: Andrew Morton [mailto:akpm@digeo.com] 
> Sent: Friday, February 07, 2003 2:00 PM
> To: Seth, Rohit
> Cc: davem@redhat.com; rohit.seth@intel.com; 
> davidm@napali.hpl.hp.com; anton@samba.org; 
> wli@holomorphy.com; linux-mm@kvack.org
> Subject: Re: hugepage patches
> 
> 
> "Seth, Rohit" <rohit.seth@intel.com> wrote:
> >
> > Andrew,
> > 
> > New allocation of hugepages is an atomic operation.  Partial 
> > allocations of hugepages is not a possibility.
> 
> Yes it is?  If you ask hugetlb_prefault() to fault in four 
> pages, and there are only two pages available then it will 
> instantiate just the two pages.
> 
> And updating i_size at the place where we add the page to 
> pagecache makes some sense..
> 
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-07 22:02 Seth, Rohit
@ 2003-02-07 22:24 ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-07 22:24 UTC (permalink / raw)
  To: Seth, Rohit; +Cc: davem, davidm, anton, wli, linux-mm

"Seth, Rohit" <rohit.seth@intel.com> wrote:
>
> The allocated pages will be zapped on the way back from do_mmap_pgoff
> for the failure case.

Bah.  OK.  Why don't we grow i_size in truncate like a real fs?

Ho hum.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: hugepage patches
@ 2003-02-08  1:47 Seth, Rohit
  2003-02-08  2:02 ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Seth, Rohit @ 2003-02-08  1:47 UTC (permalink / raw)
  To: Andrew Morton, davem, Seth, Rohit, davidm, anton, wli, linux-mm

[-- Attachment #1: Type: text/plain, Size: 4120 bytes --]

Andrew,

Will it be possible to have a macro, something like
is_valid_hugepage_addr, that has the arch. specific definition of
checking the validity (like len > TASK_SIZE etc) of any hugepage addr.
It will make the following code more usable across archs. I know we
could have HAVE_ARCH_HUGETLB_UNMAPPED_AREA to have arch specific thing,
but just thought if a small cahnge in existing function could make this
code widely useable.

In addition, HUGE_PAGE_ALIGNMENT sanity check is also needed in
generic_unmapped_area code for MAP_FIXED cases.

I'm attaching a patch.  For i386, the addr parameter to this function is
not modified.  But other archs like ia64 will do that.

thanks,
rohit




> -----Original Message-----
> From: Andrew Morton [mailto:akpm@digeo.com] 
> Sent: Sunday, February 02, 2003 2:55 AM
> To: davem@redhat.com; rohit.seth@intel.com; 
> davidm@napali.hpl.hp.com; anton@samba.org; 
> wli@holomorphy.com; linux-mm@kvack.org
> Subject: Re: hugepage patches
> 
> 
> 5/4
> 
> get_unmapped_area for hugetlbfs
> 
> Having to specify the mapping address is a pain.  Give 
> hugetlbfs files a file_operations.get_unmapped_area().
> 
> The implementation is in hugetlbfs rather than in arch code 
> because it's probably common to several architectures.  If 
> the architecture has special needs it can define 
> HAVE_ARCH_HUGETLB_UNMAPPED_AREA and go it alone.  Just like 
> HAVE_ARCH_UNMAPPED_AREA.
> 
> 
> 
> Having to specify the mapping address is a pain.  Give 
> hugetlbfs files a file_operations.get_unmapped_area().
> 
> The implementation is in hugetlbfs rather than in arch code 
> because it's probably common to several architectures.  If 
> the architecture has special needs it can define 
> HAVE_ARCH_HUGETLB_UNMAPPED_AREA and go it alone.  Just like 
> HAVE_ARCH_UNMAPPED_AREA.
> 
> 
> 
>  hugetlbfs/inode.c |   46 
> ++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff -puN fs/hugetlbfs/inode.c~hugetlbfs-get_unmapped_area 
> fs/hugetlbfs/inode.c
> --- 25/fs/hugetlbfs/inode.c~hugetlbfs-get_unmapped_area	
> 2003-02-01 01:13:03.000000000 -0800
> +++ 25-akpm/fs/hugetlbfs/inode.c	2003-02-02 
> 01:17:01.000000000 -0800
> @@ -74,6 +74,47 @@ static int hugetlbfs_file_mmap(struct fi
>  }
>  
>  /*
> + * Called under down_write(mmap_sem), page_table_lock is not held */
> +
> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> +unsigned long hugetlb_get_unmapped_area(struct file *file, 
> unsigned long addr,
> +		unsigned long len, unsigned long pgoff, 
> unsigned long flags); #else
> +static unsigned long
> +hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, 
> unsigned long flags)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +
> +	if (len & ~HPAGE_MASK)
> +		return -EINVAL;
> +	if (len > TASK_SIZE)
> +		return -ENOMEM;
> +
> +	if (addr) {
> +		addr = ALIGN(addr, HPAGE_SIZE);
> +		vma = find_vma(mm, addr);
> +		if (TASK_SIZE - len >= addr &&
> +		    (!vma || addr + len <= vma->vm_start))
> +			return addr;
> +	}
> +
> +	addr = ALIGN(mm->free_area_cache, HPAGE_SIZE);
> +
> +	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
> +		/* At this point:  (!vma || addr < vma->vm_end). */
> +		if (TASK_SIZE - len < addr)
> +			return -ENOMEM;
> +		if (!vma || addr + len <= vma->vm_start)
> +			return addr;
> +		addr = ALIGN(vma->vm_end, HPAGE_SIZE);
> +	}
> +}
> +#endif
> +
> +/*
>   * Read a page. Again trivial. If it didn't already exist
>   * in the page cache, it is zero-filled.
>   */
> @@ -466,8 +507,9 @@ static struct address_space_operations h
>  };
>  
>  struct file_operations hugetlbfs_file_operations = {
> -	.mmap		= hugetlbfs_file_mmap,
> -	.fsync		= simple_sync_file,
> +	.mmap			= hugetlbfs_file_mmap,
> +	.fsync			= simple_sync_file,
> +	.get_unmapped_area	= hugetlb_get_unmapped_area,
>  };
>  
>  static struct inode_operations hugetlbfs_dir_inode_operations = {
> 
> _




> 

[-- Attachment #2: patch5 --]
[-- Type: application/octet-stream, Size: 2068 bytes --]

--- mm/mmap.c.0	Fri Feb  7 16:34:19 2003
+++ mm/mmap.c	Fri Feb  7 16:40:20 2003
@@ -677,10 +677,13 @@
 unsigned long get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
 {
 	if (flags & MAP_FIXED) {
+		unsigned long ret;
 		if (addr > TASK_SIZE - len)
 			return -ENOMEM;
 		if (addr & ~PAGE_MASK)
 			return -EINVAL;
+		if (is_file_hugepages(file) && (ret = is_valid_hugepage_range(&addr, len, 1)))
+			return ret;
 		return addr;
 	}
 
--- fs/hugetlbfs/inode.c.75	Fri Feb  7 14:36:23 2003
+++ fs/hugetlbfs/inode.c	Fri Feb  7 16:30:58 2003
@@ -87,11 +87,10 @@
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
+	unsigned long ret = 0;
 
-	if (len & ~HPAGE_MASK)
-		return -EINVAL;
-	if (len > TASK_SIZE)
-		return -ENOMEM;
+	if (ret = is_valid_hugepage_range(&addr, len, 0))
+		return ret;
 
 	if (addr) {
 		addr = ALIGN(addr, HPAGE_SIZE);
--- arch/i386/mm/hugetlbpage.c.0	Fri Feb  7 16:11:29 2003
+++ arch/i386/mm/hugetlbpage.c	Fri Feb  7 16:43:46 2003
@@ -88,6 +88,20 @@
 	set_pte(page_table, entry);
 }
 
+unsigned long is_valid_hugepage_range(unsigned long *addrp, unsigned long len, int flag)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (flag) {
+		if (*addr & ~HPAGE_MASK)
+			return -EINVAL;
+		return 0;
+	}
+	if (len > TASK_SIZE)
+		return -ENOMEM;
+	return 0;
+}
+
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma)
 {
--- include/linux/hugetlb.h.0	Fri Feb  7 16:45:22 2003
+++ include/linux/hugetlb.h	Fri Feb  7 16:50:31 2003
@@ -21,6 +21,7 @@
 void hugetlb_release_key(struct hugetlb_key *);
 int hugetlb_report_meminfo(char *);
 int is_hugepage_mem_enough(size_t);
+unsigned long is_valid_hugepage_range(unsigned long *, unsigned long, int);
 
 extern int htlbpage_max;
 
@@ -38,6 +39,7 @@
 #define huge_page_release(page)			BUG()
 #define is_hugepage_mem_enough(size)		0
 #define hugetlb_report_meminfo(buf)		0
+#define is_valid_hugepage_range(addr, len, flg)	0
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-08  1:47 Seth, Rohit
@ 2003-02-08  2:02 ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-08  2:02 UTC (permalink / raw)
  To: Seth, Rohit; +Cc: davem, davidm, anton, wli, linux-mm

"Seth, Rohit" <rohit.seth@intel.com> wrote:
>
> Andrew,
> 
> Will it be possible to have a macro, something like
> is_valid_hugepage_addr, that has the arch. specific definition of
> checking the validity (like len > TASK_SIZE etc) of any hugepage addr.
> It will make the following code more usable across archs. I know we
> could have HAVE_ARCH_HUGETLB_UNMAPPED_AREA to have arch specific thing,
> but just thought if a small cahnge in existing function could make this
> code widely useable.
> 
> In addition, HUGE_PAGE_ALIGNMENT sanity check is also needed in
> generic_unmapped_area code for MAP_FIXED cases.

Oh cripes.  Yes, we need to fix that.

> I'm attaching a patch.  For i386, the addr parameter to this function is
> not modified.  But other archs like ia64 will do that.

OK, but it needs some changes.

- is_valid_hugepage_range() will not compile.  `addrp' vs `addr'

- We should not pass in a flag variable which alters a function's behaviour
  in this manner.  Especially when it has the wonderful name "flag", and no
  supporting commentary!

  Please split this into two separate (and documented) functions.

- A name like "is_valid_hugepage_range" implies that this function is
  purely a predicate.  Yet it is capable of altering part of the caller's
  environment.  Can we have a more appropriate name?

- I've been trying to keep ia64/sparc64/x86_64 as uptodate as I can
  throughout this.  I think we can safely copy the ia32 implementation over
  into there as well, can't we?

  If there's any doubt then probably it's best to just leave the symbol
  undefined, let the arch maintainers curse us ;)

Are you working against Linus's current tree?  A lot has changed in there. 
I'd like to hear if hugetlbfs is working correctly in a non-ia32 kernel.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: hugepage patches
@ 2003-02-08  3:05 Seth, Rohit
  2003-02-08  8:48 ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Seth, Rohit @ 2003-02-08  3:05 UTC (permalink / raw)
  To: Andrew Morton, Seth, Rohit; +Cc: davem, davidm, anton, wli, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1723 bytes --]

> OK, but it needs some changes.
> 
> - is_valid_hugepage_range() will not compile.  `addrp' vs `addr'
> 
> - We should not pass in a flag variable which alters a 
> function's behaviour
>   in this manner.  Especially when it has the wonderful name 
> "flag", and no
>   supporting commentary!
> 
>   Please split this into two separate (and documented) functions.


Attached is the updated patch based on your comments.  

> 
> - A name like "is_valid_hugepage_range" implies that this function is
>   purely a predicate.  Yet it is capable of altering part of 
> the caller's
>   environment.  Can we have a more appropriate name?
> 
> - I've been trying to keep ia64/sparc64/x86_64 as uptodate as I can
>   throughout this.  I think we can safely copy the ia32 
> implementation over
>   into there as well, can't we?

For ia64, there is a separate kernel patch that David Mosberger
maintains.  Linus's tree won't work as is on ia64. Not sure about
x86_64/sparc64.

> 
>   If there's any doubt then probably it's best to just leave 
> the symbol
>   undefined, let the arch maintainers curse us ;)

> 
> Are you working against Linus's current tree?  A lot has 
> changed in there. 
> I'd like to hear if hugetlbfs is working correctly in a 
> non-ia32 kernel.

Yeah, I am working on Linus's 2.5.59 tree. Will download your mm9 to get
my tree updated.  Is there any other patch that you want me to apply
before sending you any more updates.

As far as non-ia32 kernel is concerned, hugetlbfs on ia64 should be
working fine. Though I've not yet tried the 2.5.59 on ia64. 2.5.59 ia64
patch that David maintains has the same level of hugetlb support as i386
tree. 


> 

[-- Attachment #2: patch.750 --]
[-- Type: application/octet-stream, Size: 2635 bytes --]

--- mm/mmap.c.750	Fri Feb  7 18:50:27 2003
+++ mm/mmap.c	Fri Feb  7 18:50:55 2003
@@ -682,7 +682,7 @@
 			return -ENOMEM;
 		if (addr & ~PAGE_MASK)
 			return -EINVAL;
-		if (is_file_hugepages(file) && (ret = is_valid_hugepage_range(&addr, len, 1)))
+		if (is_file_hugepages(file) && (ret = is_align_hugepage_range(addr, len)))
 			return ret;
 		return addr;
 	}
--- include/linux/hugetlb.h.750	Fri Feb  7 18:55:37 2003
+++ include/linux/hugetlb.h	Fri Feb  7 18:49:24 2003
@@ -21,7 +21,8 @@
 void hugetlb_release_key(struct hugetlb_key *);
 int hugetlb_report_meminfo(char *);
 int is_hugepage_mem_enough(size_t);
-unsigned long is_valid_hugepage_range(unsigned long *, unsigned long, int);
+unsigned long chk_align_and_fix_addr(unsigned long *, unsigned long);
+unsigned long is_align_hugepage_range(unsigned long, unsigned long);
 
 extern int htlbpage_max;
 
@@ -39,7 +40,7 @@
 #define huge_page_release(page)			BUG()
 #define is_hugepage_mem_enough(size)		0
 #define hugetlb_report_meminfo(buf)		0
-#define is_valid_hugepage_range(addr, len, flg)		0
+#define is_align_hugepage_range(addr, len)	0
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 
--- arch/i386/mm/hugetlbpage.c.750	Fri Feb  7 18:40:54 2003
+++ arch/i386/mm/hugetlbpage.c	Fri Feb  7 18:59:17 2003
@@ -88,17 +88,24 @@
 	set_pte(page_table, entry);
 }
 
-unsigned long is_valid_hugepage_range(unsigned long *addrp, unsigned long len, int flag)
+/* This function checks for proper alignment for len.  It updates the input addrp parameter so that
+ * it points to valid(and aligned) hugepage address range (For i386 it is just proper alignment).
+ */
+unsigned long chk_align_and_fix_addr(unsigned long *addrp, unsigned long len)
 {
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
-	if (flag) {
-		if (*addr & ~HPAGE_MASK)
-			return -EINVAL;
-		return 0;
-	}
-	if (len > TASK_SIZE)
-		return -ENOMEM;
+	*addrp = ALIGN(*addrp, HPAGE_SIZE);
+	return 0;
+}
+/* This function checks for proper alignement of input addr and len parameters.
+ */
+unsigned long is_align_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
 	return 0;
 }
 
--- fs/hugetlbfs/inode.c.750	Fri Feb  7 18:38:31 2003
+++ fs/hugetlbfs/inode.c	Fri Feb  7 18:46:04 2003
@@ -89,11 +89,10 @@
 	struct vm_area_struct *vma;
 	unsigned long ret = 0;
 
-	if (ret = is_valid_hugepage_range(&addr, len, 0))
+	if (ret = chk_align_and_fix_addr(&addr, len))
 		return ret;
 
 	if (addr) {
-		addr = ALIGN(addr, HPAGE_SIZE);
 		vma = find_vma(mm, addr);
 		if (TASK_SIZE - len >= addr &&
 		    (!vma || addr + len <= vma->vm_start))

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: hugepage patches
  2003-02-08  3:05 Seth, Rohit
@ 2003-02-08  8:48 ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2003-02-08  8:48 UTC (permalink / raw)
  To: Seth, Rohit; +Cc: davem, davidm, anton, wli, linux-mm

"Seth, Rohit" <rohit.seth@intel.com> wrote:
>
> Attached is the updated patch based on your comments.  

Thanks.

The MAP_FIXED alignment fix is clearly needed.

But the chk_align_and_fix_addr() part looks odd.  Bear in mind that MAP_FIXED
requests never make it down into the file_ops.get_unmapped_area() method.

So what is this check/fixup doing in hugetlbfs_get_unmapped_area()?

If we really need some arch-specific check/fixup in there then it will need
to be applied as hugetlbfs_get_unmapped_area() walks through the VMA's.  I
suspect it would be simpler to cut-n-paste the whole function into the arch
code, and work on it there.

My understanding of the ia64 problem is that a certain range of the user's
virtual address space is reserved for hugepages.  Normal size pages may not
be placed there, and huge pages may not be placed elsewhere.

In that case, we still need to put the check in mm/mmap.c for users placing
regular-sized pages inside the hugepage virtual address range with MAP_FIXED.
I thought it was pretty pointless putting that hook into Linus's tree until
the ia64 code which actually implemented the hook was also in his tree.

And the ia64 version of hugetlb_get_unmapped_area() will merely need to
maintain the VMA tree inside address region 4.

So...  I don't see why we need more than the below code, at least until
Linus's ia64 directory is up to date?

> For ia64, there is a separate kernel patch that David Mosberger
> maintains.  Linus's tree won't work as is on ia64. Not sure about
> x86_64/sparc64.

Why isn't David keeping Linus in sync?

> Yeah, I am working on Linus's 2.5.59 tree. Will download your mm9 to get
> my tree updated.  Is there any other patch that you want me to apply
> before sending you any more updates.

I threw -mm9 away.  Signals were very broken in it.  I'll do -mm10 or
2.5.60-mm1 this weekend; please check out the hugepage work in there - there
have been a number of changes.

> As far as non-ia32 kernel is concerned, hugetlbfs on ia64 should be
> working fine. Though I've not yet tried the 2.5.59 on ia64. 2.5.59 ia64
> patch that David maintains has the same level of hugetlb support as i386
> tree. 

OK, thanks.



diff -puN arch/i386/mm/hugetlbpage.c~hugepage-address-validation arch/i386/mm/hugetlbpage.c
--- 25/arch/i386/mm/hugetlbpage.c~hugepage-address-validation	2003-02-08 00:34:42.000000000 -0800
+++ 25-akpm/arch/i386/mm/hugetlbpage.c	2003-02-08 00:34:42.000000000 -0800
@@ -88,6 +88,18 @@ static void set_huge_pte(struct mm_struc
 	set_pte(page_table, entry);
 }
 
+/*
+ * This function checks for proper alignment of input addr and len parameters.
+ */
+int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma)
 {
diff -puN arch/ia64/mm/hugetlbpage.c~hugepage-address-validation arch/ia64/mm/hugetlbpage.c
--- 25/arch/ia64/mm/hugetlbpage.c~hugepage-address-validation	2003-02-08 00:34:42.000000000 -0800
+++ 25-akpm/arch/ia64/mm/hugetlbpage.c	2003-02-08 00:34:42.000000000 -0800
@@ -96,6 +96,18 @@ set_huge_pte (struct mm_struct *mm, stru
 	return;
 }
 
+/*
+ * This function checks for proper alignment of input addr and len parameters.
+ */
+int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma)
 {
diff -puN arch/sparc64/mm/hugetlbpage.c~hugepage-address-validation arch/sparc64/mm/hugetlbpage.c
--- 25/arch/sparc64/mm/hugetlbpage.c~hugepage-address-validation	2003-02-08 00:34:42.000000000 -0800
+++ 25-akpm/arch/sparc64/mm/hugetlbpage.c	2003-02-08 00:34:42.000000000 -0800
@@ -232,6 +232,18 @@ out_error:
 	return -1;
 }
 
+/*
+ * This function checks for proper alignment of input addr and len parameters.
+ */
+int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *vma)
 {
diff -puN arch/x86_64/mm/hugetlbpage.c~hugepage-address-validation arch/x86_64/mm/hugetlbpage.c
--- 25/arch/x86_64/mm/hugetlbpage.c~hugepage-address-validation	2003-02-08 00:34:42.000000000 -0800
+++ 25-akpm/arch/x86_64/mm/hugetlbpage.c	2003-02-08 00:34:42.000000000 -0800
@@ -86,6 +86,18 @@ static void set_huge_pte(struct mm_struc
 	set_pte(page_table, entry);
 }
 
+/*
+ * This function checks for proper alignment of input addr and len parameters.
+ */
+int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
 int
 copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma)
diff -puN include/linux/hugetlb.h~hugepage-address-validation include/linux/hugetlb.h
--- 25/include/linux/hugetlb.h~hugepage-address-validation	2003-02-08 00:34:42.000000000 -0800
+++ 25-akpm/include/linux/hugetlb.h	2003-02-08 00:34:42.000000000 -0800
@@ -26,6 +26,7 @@ struct vm_area_struct *hugepage_vma(stru
 					unsigned long address);
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 				pmd_t *pmd, int write);
+int is_aligned_hugepage_range(unsigned long addr, unsigned long len);
 int pmd_huge(pmd_t pmd);
 
 extern int htlbpage_max;
@@ -56,6 +57,7 @@ static inline int is_vm_hugetlb_page(str
 #define hugepage_vma(mm, addr)			0
 #define mark_mm_hugetlb(mm, vma)		do { } while (0)
 #define follow_huge_pmd(mm, addr, pmd, write)	0
+#define is_aligned_hugepage_range(addr, len)	0
 #define pmd_huge(x)	0
 
 #ifndef HPAGE_MASK
diff -puN mm/mmap.c~hugepage-address-validation mm/mmap.c
--- 25/mm/mmap.c~hugepage-address-validation	2003-02-08 00:34:42.000000000 -0800
+++ 25-akpm/mm/mmap.c	2003-02-08 00:34:42.000000000 -0800
@@ -801,6 +801,13 @@ get_unmapped_area(struct file *file, uns
 			return -ENOMEM;
 		if (addr & ~PAGE_MASK)
 			return -EINVAL;
+		if (is_file_hugepages(file)) {
+			unsigned long ret;
+
+			ret = is_aligned_hugepage_range(addr, len);
+			if (ret)
+				return ret;
+		}
 		return addr;
 	}
 
@@ -1224,8 +1231,10 @@ int do_munmap(struct mm_struct *mm, unsi
 	/* we have  start < mpnt->vm_end  */
 
 	if (is_vm_hugetlb_page(mpnt)) {
-		if ((start & ~HPAGE_MASK) || (len & ~HPAGE_MASK))
-			return -EINVAL;
+		int ret = is_aligned_hugepage_range(start, len);
+
+		if (ret)
+			return ret;
 	}
 
 	/* if it doesn't overlap, we have nothing.. */

_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2003-02-08  8:48 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-31 23:15 hugepage patches Andrew Morton
2003-01-31 23:13 ` David S. Miller
2003-01-31 23:36   ` Andrew Morton
2003-01-31 23:23     ` David S. Miller
2003-01-31 23:45       ` Andrew Morton
2003-01-31 23:48         ` David S. Miller
2003-01-31 23:16 ` Andrew Morton
2003-01-31 23:17 ` Andrew Morton
2003-01-31 23:18 ` Andrew Morton
2003-01-31 23:18 ` Andrew Morton
2003-02-01  8:58   ` Ingo Oeser
2003-02-01  9:31     ` Andrew Morton
2003-02-01 10:00       ` William Lee Irwin III
2003-02-01 10:14         ` Andrew Morton
2003-02-02 10:55 ` Andrew Morton
2003-02-02 10:55 ` Andrew Morton
2003-02-02 19:59   ` William Lee Irwin III
2003-02-02 20:49     ` Andrew Morton
2003-02-03 15:09       ` Eric W. Biederman
2003-02-03 21:29         ` Andrew Morton
2003-02-04  5:37           ` Eric W. Biederman
2003-02-04  5:50             ` William Lee Irwin III
2003-02-04  7:06               ` Eric W. Biederman
2003-02-04  7:16                 ` Martin J. Bligh
2003-02-04 12:40                   ` Eric W. Biederman
2003-02-04 15:55                     ` Martin J. Bligh
2003-02-05 12:18                       ` Eric W. Biederman
2003-02-04 21:12                     ` Andrew Morton
2003-02-05 12:25                       ` Eric W. Biederman
2003-02-05 19:57                         ` Andrew Morton
2003-02-05 20:00                           ` Andrew Morton
2003-02-02 10:55 ` Andrew Morton
2003-02-02 10:56 ` Andrew Morton
2003-02-02 20:06   ` William Lee Irwin III
2003-02-02 10:56 ` Andrew Morton
2003-02-02 10:56 ` Andrew Morton
2003-02-02 10:57 ` Andrew Morton
2003-02-02 10:57 ` Andrew Morton
2003-02-02 20:17   ` William Lee Irwin III
2003-02-02 10:57 ` Andrew Morton
2003-02-07 21:49 Seth, Rohit
2003-02-07 22:00 ` Andrew Morton
2003-02-07 22:02 Seth, Rohit
2003-02-07 22:24 ` Andrew Morton
2003-02-08  1:47 Seth, Rohit
2003-02-08  2:02 ` Andrew Morton
2003-02-08  3:05 Seth, Rohit
2003-02-08  8:48 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox