linux-mm.kvack.org archive mirror
* Re: [PATCH] get_user_pages shortcut for anonymous pages.
@ 2004-04-05 14:24 Martin Schwidefsky
  2004-04-05 21:29 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Schwidefsky @ 2004-04-05 14:24 UTC
  To: akpm; +Cc: linux-mm

Hi Andrew,

> I think this will do the wrong thing if the virtual address
> refers to an anon page which is swapped out.
Oh yes, follow_page returns NULL for swapped out pages.

> You'd need to teach follow_page() to return one of three values:
> page-present, page-not-present-but-used-to-be or
> page-not-present-and-never-was.
Hmm, this would get ugly because follow_page calls
follow_huge_addr and follow_huge_pmd on systems with huge pages. I
really don't want to change follow_page. Instead, I added a check
for pgd_none/pgd_bad and pmd_none/pmd_bad on the page directory
entries needed for the pages in question. After all, the patch is
supposed to prevent the creation of page tables, so why not check
the pgd/pmd slots?
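To make the pgd/pmd argument concrete, here is a user-space model (not kernel code, and not the patch itself) of a sparse three-level table standing in for pgd -> pmd -> pte. The names `model_mm`, `model_untouched` and `model_set_pte` and the 9-bit index width are hypothetical. What it shows: a swapped-out page still has its pte table allocated, so the "untouched" test answers no for it and the normal fault path runs, which covers the swap case without a three-valued follow_page.

```c
#include <stdlib.h>

/*
 * User-space model only: a sparse three-level table standing in for
 * pgd -> pmd -> pte. The 9-bit index width per level is an assumption
 * chosen for illustration, not any particular architecture's layout.
 */
#define MODEL_PAGE_SHIFT 12
#define MODEL_IDX_BITS   9
#define MODEL_ENTRIES    (1UL << MODEL_IDX_BITS)
#define MODEL_IDX_MASK   (MODEL_ENTRIES - 1)

enum pte_state { PTE_NONE, PTE_PRESENT, PTE_SWAPPED };

struct pte_table { enum pte_state pte[MODEL_ENTRIES]; };
struct pmd_table { struct pte_table *pmd[MODEL_ENTRIES]; };
struct model_mm  { struct pmd_table *pgd[MODEL_ENTRIES]; };

static unsigned long pgd_idx(unsigned long addr)
{
	return (addr >> (MODEL_PAGE_SHIFT + 2 * MODEL_IDX_BITS)) & MODEL_IDX_MASK;
}

static unsigned long pmd_idx(unsigned long addr)
{
	return (addr >> (MODEL_PAGE_SHIFT + MODEL_IDX_BITS)) & MODEL_IDX_MASK;
}

static unsigned long pte_idx(unsigned long addr)
{
	return (addr >> MODEL_PAGE_SHIFT) & MODEL_IDX_MASK;
}

/*
 * Same idea as untouched_anonymous_page(): report 1 only when no pte
 * table covers addr, i.e. nothing in that range was ever faulted in.
 */
static int model_untouched(const struct model_mm *mm, unsigned long addr)
{
	const struct pmd_table *pmd = mm->pgd[pgd_idx(addr)];

	if (!pmd)
		return 1;
	if (!pmd->pmd[pmd_idx(addr)])
		return 1;
	return 0;
}

/*
 * Record a pte state, allocating the intermediate levels on demand the
 * way a page fault would. Returns 0 on success, -1 if allocation fails.
 */
static int model_set_pte(struct model_mm *mm, unsigned long addr,
			 enum pte_state state)
{
	struct pmd_table **pmdp = &mm->pgd[pgd_idx(addr)];
	struct pte_table **ptep;

	if (!*pmdp && !(*pmdp = calloc(1, sizeof(**pmdp))))
		return -1;
	ptep = &(*pmdp)->pmd[pmd_idx(addr)];
	if (!*ptep && !(*ptep = calloc(1, sizeof(**ptep))))
		return -1;
	(*ptep)->pte[pte_idx(addr)] = state;
	return 0;
}
```

Note the conservative side of this check: a never-used pte slot that shares a pte table with a used one still reports "touched" and goes through handle_mm_fault. The check only avoids allocating new pmd/pte tables, which is what matters for the core-dump case.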

diff -urN linux-2.6/mm/memory.c linux-2.6-bigcore/mm/memory.c
--- linux-2.6/mm/memory.c	Sun Apr  4 05:36:58 2004
+++ linux-2.6-bigcore/mm/memory.c	Mon Apr  5 16:06:10 2004
@@ -688,6 +688,32 @@
 }
 
 
+static inline int
+untouched_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
+			 unsigned long address)
+{
+	pgd_t *pgd;
+	pmd_t *pmd;
+
+	/* Check if the vma is for an anonymous mapping. */
+	if (vma->vm_ops && vma->vm_ops->nopage)
+		return 0;
+
+	/* Check if page directory entry exists. */
+	pgd = pgd_offset(mm, address);
+	if (pgd_none(*pgd) || pgd_bad(*pgd))
+		return 1;
+
+	/* Check if page middle directory entry exists. */
+	pmd = pmd_offset(pgd, address);
+	if (pmd_none(*pmd) || pmd_bad(*pmd))
+		return 1;
+
+	/* There is a pte slot for 'address' in 'mm'. */
+	return 0;
+}
+
+
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, int len, int write, int force,
 		struct page **pages, struct vm_area_struct **vmas)
@@ -750,6 +776,18 @@
 			struct page *map;
 			int lookup_write = write;
 			while (!(map = follow_page(mm, start, lookup_write))) {
+				/*
+				 * Shortcut for anonymous pages. We don't want
+				 * to force the creation of page tables for
+				 * insanely big anonymously mapped areas that
+				 * nobody touched so far. This is important
+				 * for doing a core dump for these mappings.
+				 */
+				if (!lookup_write &&
+				    untouched_anonymous_page(mm, vma, start)) {
+					map = ZERO_PAGE(start);
+					break;
+				}
 				spin_unlock(&mm->page_table_lock);
 				switch (handle_mm_fault(mm,vma,start,write)) {
 				case VM_FAULT_MINOR:


blue skies,
  Martin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org

* Re: [PATCH] get_user_pages shortcut for anonymous pages.
@ 2004-04-06  7:24 Martin Schwidefsky
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Schwidefsky @ 2004-04-06  7:24 UTC
  To: Andrew Morton; +Cc: linux-mm




> OK..  I'm not sure that this patch makes sense though.  I mean, if your
> test had gone and dirtied all these pages rather than forcing the coredump
> code to do it, we'd still exhaust all physical memory with pagetables,
> assuming you have enough swapspace.  So I don't see we're gaining much?

Well, if the test had tried to dirty all these pages, it would have
run out of memory long before the available real memory was filled up
with page tables. After bigcore finished, I had a core file of 2
terabytes. What we gain with the patch is that a system can no longer
be "crashed" by a wild store of a process to a memory location below
the stack. Consider a store to (current stack - 1TB). The stack vma is
extended to include this address because of VM_GROWSDOWN. If such a
process dies (which is likely for a process that misbehaves like this),
the ELF core dumper will cause the system to hang because of too many
page tables. I know that this can easily be circumvented with ulimit.
This is why I asked whether I am wasting my time with this.
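Some rough arithmetic shows why forcing a 1TB or 2TB stack vma through handle_mm_fault can swamp a 256MB machine. This sketch assumes an x86-64-like geometry (4 KiB pages and 4 KiB pte tables holding 512 entries, so one pte table maps 2 MiB); the s390-64 layout of this kernel generation may differ, and the upper-level tables are ignored. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed geometry (x86-64-like); s390-64 may use different sizes. */
#define MODEL_PAGE_SIZE    4096ULL
#define MODEL_PTRS_PER_PTE 512ULL

/* Last-level page-table pages needed to map `span` bytes, rounded up. */
static uint64_t pte_pages_for(uint64_t span)
{
	const uint64_t per_table = MODEL_PAGE_SIZE * MODEL_PTRS_PER_PTE; /* 2 MiB */

	return (span + per_table - 1) / per_table;
}

/* Memory consumed by those pte tables alone, in bytes. */
static uint64_t pte_bytes_for(uint64_t span)
{
	return pte_pages_for(span) * MODEL_PAGE_SIZE;
}
```

Under these assumptions, 1 TiB needs 524,288 pte tables (2 GiB) and 2 TiB needs 4 GiB: 8 and 16 times the RAM of the 256MB test machine before a single data page is counted.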

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Phone: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com


* [PATCH] get_user_pages shortcut for anonymous pages.
@ 2004-04-02 14:17 Martin Schwidefsky
  2004-04-05  5:59 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Schwidefsky @ 2004-04-02 14:17 UTC
  To: linux-mm

Hi,
Did anybody else stumble over the bigcore test case in gdb on a
64-bit architecture? On s390-64 with no ulimit, the bigcore test in
fact crashes the kernel. The system is still pingable, but it doesn't
do anything because every single page is used for page tables. The
bigcore process is not terminated because the system thinks that
there is enough swap space left to free some pages and continue.
But this isn't true, because it's all page tables.

I can't solve the real problem (too many page-table pages), but I
have a patch that helps with the bigcore test. bigcore creates a lot
of page tables because the ELF core dumper uses get_user_pages to
get the page frames for all vmas of the process. get_user_pages
does a lookup for each page with follow_page and, if the page
doesn't exist, uses handle_mm_fault to force the page in if possible.
It's handle_mm_fault that allocates the page middle directories and
the page tables.

To prevent that, I added a check to get_user_pages that finds out
whether the vma in question is an anonymous mapping and whether the
caller of get_user_pages only wants to read from the pages. If both
are true (and follow_page returned NULL), it just returns ZERO_PAGE
without going through handle_mm_fault.

I tested this on a 256MB machine, and bigcore successfully created
a 2TB sparse file that gdb could read. Is this something worth
pursuing, or am I just wasting my time?
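The "sparse file" result follows from how the ELF core dumper of this era treats the zero page: as far as we can tell from 2.6-era fs/binfmt_elf.c, when get_user_pages hands back ZERO_PAGE, the dumper seeks over that page instead of writing it, so the hole costs no disk blocks. Below is a user-space sketch of that idea; `dump_pages_sparse()` is a hypothetical helper, and a caller-supplied buffer stands in for ZERO_PAGE.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define DUMP_PAGE_SIZE 4096

/*
 * Write npages pages to fd in order. Any entry that points at zero_page
 * (standing in for ZERO_PAGE) is skipped with lseek(), leaving a hole
 * instead of writing a page of zeros. Returns 0 on success, -1 on error.
 */
static int dump_pages_sparse(int fd, const unsigned char *const *pages,
			     size_t npages, const unsigned char *zero_page)
{
	off_t end;

	for (size_t i = 0; i < npages; i++) {
		if (pages[i] == zero_page) {
			if (lseek(fd, DUMP_PAGE_SIZE, SEEK_CUR) == (off_t)-1)
				return -1;
			continue;
		}
		/* Short writes are treated as errors to keep the sketch small. */
		if (write(fd, pages[i], DUMP_PAGE_SIZE) != DUMP_PAGE_SIZE)
			return -1;
	}

	/* Seeking alone does not extend the file; set the final size explicitly. */
	end = lseek(fd, 0, SEEK_CUR);
	if (end == (off_t)-1 || ftruncate(fd, end) != 0)
		return -1;
	return 0;
}
```

Whether the holes actually save disk space depends on the filesystem supporting sparse files; the file size is the same either way.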

blue skies,
  Martin.

diff -urN linux-2.6/mm/memory.c linux-2.6-bigcore/mm/memory.c
--- linux-2.6/mm/memory.c	Fri Apr  2 11:05:27 2004
+++ linux-2.6-bigcore/mm/memory.c	Fri Apr  2 11:08:08 2004
@@ -750,6 +750,18 @@
 			struct page *map;
 			int lookup_write = write;
 			while (!(map = follow_page(mm, start, lookup_write))) {
+				/*
+				 * Shortcut for anonymous pages. We don't want
+				 * to force the creation of page tables for
+				 * insanely big anonymously mapped areas that
+				 * nobody touched so far. This is important
+				 * for doing a core dump for these mappings.
+				 */
+				if (!lookup_write &&
+				    (!vma->vm_ops || !vma->vm_ops->nopage)) {
+					map = ZERO_PAGE(start);
+					break;
+				}
 				spin_unlock(&mm->page_table_lock);
 				switch (handle_mm_fault(mm,vma,start,write)) {
 				case VM_FAULT_MINOR:

Thread overview: 5+ messages
2004-04-05 14:24 [PATCH] get_user_pages shortcut for anonymous pages Martin Schwidefsky
2004-04-05 21:29 ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2004-04-06  7:24 Martin Schwidefsky
2004-04-02 14:17 Martin Schwidefsky
2004-04-05  5:59 ` Andrew Morton
