* 2.5.53-mm2
@ 2002-12-29 0:52 Andrew Morton
2003-01-02 4:53 ` 2.5.53-mm2 William Lee Irwin III
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2002-12-29 0:52 UTC
To: lkml, linux-mm
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.53/2.5.53-mm2/
Mainly stability work:
. If pte_chain_alloc() fails to allocate GFP_ATOMIC memory, the kernel
oopses. This is a long-standing rmap problem. Present also in the 2.4
rmap patches and, as far as I know, production Red Hat kernels.
So it is clearly a very rare problem but it is not acceptable to have
an unchecked kmalloc in the core of the 2.5 VM.
The approach which I took was to change the page_add_rmap() API to
require the caller to pass in a preallocated pte_chain. And change
all callers to allocate their pte_chains with GFP_KERNEL.
This change is fairly ugly, but every other hare-brained scheme I
could come up with had holes. This one adds maybe 20 instructions
to pagefaults and works...
The swapoff path has not yet been converted - this can still oops.
The locking isn't quite right yet when shared pagetables are enabled.
. If radix_tree_insert() fails to allocate GFP_ATOMIC memory, a system call
will return -ENOMEM, resulting in application failure.
This was fixed by implementing a reservation API within the slab allocator.
Before taking locks the caller of radix_tree_insert will ask slab to preallocate
sufficient objects in this CPU's slab head array to guarantee that the
allocation of up to seven (on ia32) radix_tree_nodes cannot fail.
This permitted the removal of the radix tree mempool. That's a 130 kbyte
saving (260k on 64-bit).
. Some aggressive pruning of various system-wide memory reserve settings:
- The page reservation limits in the page allocator have been reduced
from ~256 pages per zone to ~4 pages per zone.
- The preallocation levels in the slab head arrays (which were ridiculously
large) have been reduced from 32k-128k to, typically, a single page.
- The per-cpu-pages head arrays in the page allocator have been reduced
from ~64 pages to 2 pages.
The net effect of these changes is to remove almost all of the kernel's
reserved memory buffers. Instead of maintaining several megabytes of
free memory the kernel will only maintain some tens of kilobytes.
And guess what? Everything still works.
I won't be submitting these changes - they are here for robustness testing.
But it certainly does indicate that the settings of these thresholds need
to be reviewed. And that there don't appear to be any low-on-memory deadlocks
in the VM (with ext2, at least..)
. An updated dcache_rcu patch which should fix a rename-related race which
Al Viro noted.
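The preallocation idea behind the pte_chain and radix-tree items above can be sketched in userspace C. This is only an illustration under stated assumptions: struct chain_node, struct page_stub, chain_prealloc(), add_rmap() and map_one() are hypothetical names, malloc() stands in for a GFP_KERNEL allocation, and the locked flag stands in for a real lock. It is not the code in the patches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pte_chain: one link in a reverse-mapping list. */
struct chain_node {
	struct chain_node *next;
	void *pte;
};

struct page_stub {
	struct chain_node *chain;	/* head of the reverse-mapping list */
	int locked;			/* stands in for the lock protecting it */
};

/* May fail (and in a kernel, may sleep): call before taking any lock. */
static struct chain_node *chain_prealloc(void)
{
	return malloc(sizeof(struct chain_node));
}

/* Cannot fail: the caller hands in the node, so nothing is allocated here. */
static void add_rmap(struct page_stub *page, void *pte, struct chain_node *spare)
{
	assert(spare != NULL);
	page->locked = 1;		/* "atomic" section begins */
	spare->pte = pte;
	spare->next = page->chain;
	page->chain = spare;
	page->locked = 0;		/* "atomic" section ends */
}

/* Caller pattern: allocate first, fail cleanly, then do the locked work. */
static int map_one(struct page_stub *page, void *pte)
{
	struct chain_node *spare = chain_prealloc();

	if (!spare)
		return -1;		/* nothing has been modified yet */
	add_rmap(page, pte, spare);
	return 0;
}
```

The point of the arrangement is that the only step which can fail happens before any state is changed, so the failure can be reported as an ordinary error rather than discovered inside the locked region.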
Changes since 2.5.53-mm1:
+linus.patch
Latest -BK
+aic-bounce.patch
aic7xxx highmem IO fix
+misc.patch
triviata
+devfs-fix.patch
A partial fix for a CONFIG_DEVFS=y boot problem
+copy_page_range-cleanup.patch
Small cleanups, partly to ease the maintenance of the shared pagetable
diff.
+pte_chain_alloc-fix.patch
Infrastructure for handling pte_chain_alloc() failures.
+page_add_rmap-rework.patch
Handle pte_chain_alloc() failures.
shpte-ng.patch
Lots of changes to handle pte_chain_alloc() failures.
+slab-preallocation.patch
Add an API to slab to reserve objects in the per-CPU head arrays.
+slab-export-tuning.patch
Export the slab head-array tuning functions.
+rat-preallocation.patch
Add a reservation API to the radix_tree code.
+use-rat-preallocation.patch
Use the reservation API to avoid radix_tree allocation failures.
+teeny-mem-limits.patch
Remove most of the page allocator page reserves.
+smaller-head-arrays.patch
Remove most of the slab memory reserves.
+remove-hugetlb-syscalls.patch
Remove the hugetlb system calls. hugetlbfs is suitable.
All 72 patches:
linus.patch
cset-1.951-to-1.1030.txt.gz
kgdb.patch
aic-bounce.patch
rcf.patch
run-child-first after fork
ga2.patch
don't call console drivers on non-online CPUs
misc.patch
misc fixes
devfs-fix.patch
dio-return-partial-result.patch
aio-direct-io-infrastructure.patch
AIO support for raw/O_DIRECT
deferred-bio-dirtying.patch
bio dirtying infrastructure
aio-direct-io.patch
AIO support for raw/O_DIRECT
aio-dio-debug.patch
dio-reduce-context-switch-rate.patch
Reduced wakeup rate in direct-io code
cputimes_stat.patch
Restore per-cpu time accounting, with a config option
reduce-random-context-switch-rate.patch
Reduce context switch rate due to the random driver
inlines-net.patch
rbtree-iosched.patch
rbtree-based IO scheduler
deadsched-fix.patch
deadline scheduler fix
quota-smp-locks.patch
Subject: [PATCH] Quota SMP locks
copy_page_range-cleanup.patch
copy_page_range: minor cleanup
pte_chain_alloc-fix.patch
page_add_rmap-rework.patch
shpte-ng.patch
pagetable sharing for ia32
slab-preallocation.patch
slab-export-tuning.patch
rat-preallocation.patch
use-rat-preallocation.patch
teeny-mem-limits.patch
smaller-head-arrays.patch
ptrace-flush.patch
Subject: [PATCH] ptrace on 2.5.44
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
pentium-II.patch
Pentium-II support bits
rcu-stats.patch
RCU statistics reporting
auto-unplug.patch
self-unplugging request queues
less-unplugging.patch
Remove most of the blk_run_queues() calls
ext3-fsync-speedup.patch
Clean up ext3_sync_file()
lockless-current_kernel_time.patch
Lockless current_kernel_time()
scheduler-tunables.patch
scheduler tunables
dio-always-kmalloc.patch
direct-io: dynamically allocate struct dio
file-nr-doc-fix.patch
Docs: fix explanation of file-nr
set_page_dirty_lock.patch
fix set_page_dirty vs truncate&free races
remove-memshared.patch
Remove /proc/meminfo:MemShared
bin2bcd.patch
BIN_TO_BCD consolidation
log_buf_size.patch
move LOG_BUF_SIZE to header/config
semtimedop-update.patch
Enable semtimedop for ia64 32-bit emulation.
drain_local_pages.patch
add drain_local_pages() for CONFIG_SOFTWARE_SUSPEND
htlb-2.patch
hugetlb: fix MAP_FIXED handling
kmalloc_percpu.patch
kmalloc_percpu -- stripped down version
config_page_offset.patch
Configurable kernel/user memory split
config_hz.patch
CONFIGurable HZ
dont-aligns-vmas.patch
Don't cacheline-align vm_area_struct
remove-swappable.patch
remove task_struct.swappable
remove-hugetlb-syscalls.patch
Subject: [hugetlb] remove hugetlb syscalls
wli-01_numaq_io.patch
(undescribed patch)
wli-02_do_sak.patch
(undescribed patch)
wli-03_proc_super.patch
(undescribed patch)
wli-06_uml_get_task.patch
(undescribed patch)
wli-07_numaq_mem_map.patch
(undescribed patch)
wli-08_numaq_pgdat.patch
(undescribed patch)
wli-09_has_stopped_jobs.patch
(undescribed patch)
wli-10_inode_wait.patch
(undescribed patch)
wli-11_pgd_ctor.patch
(undescribed patch)
wli-12_pidhash_size.patch
(undescribed patch)
wli-13_rmap_nrpte.patch
(undescribed patch)
dcache_rcu-2.patch
dcache_rcu-2-2.5.51.patch
dcache_rcu-3.patch
dcache_rcu-3-2.5.51.patch
page-walk-api.patch
page-walk-scsi.patch
page-walk-api-update.patch
pagewalk API update
gup-check-valid.patch
valid page test in get_user_pages()
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: 2.5.53-mm2
2002-12-29 0:52 2.5.53-mm2 Andrew Morton
@ 2003-01-02 4:53 ` William Lee Irwin III
2003-01-02 5:25 ` 2.5.53-mm2 William Lee Irwin III
0 siblings, 1 reply; 3+ messages in thread
From: William Lee Irwin III @ 2003-01-02 4:53 UTC
To: Andrew Morton; +Cc: lkml, linux-mm
On Sat, Dec 28, 2002 at 04:52:20PM -0800, Andrew Morton wrote:
> wli-11_pgd_ctor.patch
> (undescribed patch)
A moment's reflection on the subject suggests to me it's worthwhile to
generalize pgd_ctor support so it works (without #ifdefs!) on both PAE
and non-PAE. This tiny tweak is actually more noticeably beneficial
on non-PAE systems but only really because pgd_alloc() is more visible;
the most likely reason it's less visible on PAE is "other overhead".
It looks particularly nice since it removes more code than it adds.
Touch tested on NUMA-Q (PAE). OFTC #kn testers testing the non-PAE case.
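For readers unfamiliar with the "without #ifdefs" trick, here is a minimal userspace sketch of the idea: branch on a compile-time constant in ordinary C and let the compiler drop the dead arm. LEVEL_ENTRIES, USER_SLOTS, TOTAL_SLOTS, kernel_template and top_ctor() are hypothetical stand-ins chosen for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout constants; a real build takes them from headers. */
#ifndef LEVEL_ENTRIES
#define LEVEL_ENTRIES 1		/* 1: middle level folded away, >1: real middle level */
#endif
#define USER_SLOTS  3
#define TOTAL_SLOTS 4

/* Stand-in for the template whose kernel slots every new table copies. */
static unsigned long kernel_template[TOTAL_SLOTS] = { 0, 0, 0, 0xC0000000UL };

/*
 * One constructor for both layouts. The condition is a constant
 * expression, so both arms are type-checked in every configuration
 * while the optimizer can discard the arm that is never taken.
 */
static void top_ctor(unsigned long *top)
{
	assert(top != NULL);
	if (LEVEL_ENTRIES == 1)
		memset(top, 0, USER_SLOTS * sizeof(*top));
	memcpy(top + USER_SLOTS, kernel_template + USER_SLOTS,
	       (TOTAL_SLOTS - USER_SLOTS) * sizeof(*top));
}
```

Compared with #ifdef blocks, this keeps a single definition of each function, at the cost of requiring every symbol referenced in either arm to exist in both configurations.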
arch/i386/mm/init.c | 36 +++++++++++++----------
arch/i386/mm/pgtable.c | 58 ++++++++++++--------------------------
include/asm-i386/pgtable-3level.h | 2 -
include/asm-i386/pgtable.h | 13 +-------
4 files changed, 41 insertions(+), 68 deletions(-)
diff -urpN mm3-2.5.53-1/arch/i386/mm/init.c mm3-2.5.53-2/arch/i386/mm/init.c
--- mm3-2.5.53-1/arch/i386/mm/init.c 2003-01-01 18:49:19.000000000 -0800
+++ mm3-2.5.53-2/arch/i386/mm/init.c 2003-01-01 18:51:17.000000000 -0800
@@ -504,32 +504,36 @@ void __init mem_init(void)
#endif
}
-#if CONFIG_X86_PAE
#include <linux/slab.h>
-kmem_cache_t *pae_pmd_cachep;
-kmem_cache_t *pae_pgd_cachep;
+kmem_cache_t *pmd_cache;
+kmem_cache_t *pgd_cache;
-void pae_pmd_ctor(void *, kmem_cache_t *, unsigned long);
-void pae_pgd_ctor(void *, kmem_cache_t *, unsigned long);
+void pmd_ctor(void *, kmem_cache_t *, unsigned long);
+void pgd_ctor(void *, kmem_cache_t *, unsigned long);
void __init pgtable_cache_init(void)
{
+ if (PTRS_PER_PMD > 1) {
+ pmd_cache = kmem_cache_create("pae_pmd",
+ PTRS_PER_PMD*sizeof(pmd_t),
+ 0,
+ SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
+ pmd_ctor,
+ NULL);
+
+ if (!pmd_cache)
+ panic("pgtable_cache_init(): cannot create pmd cache");
+ }
+
/*
* PAE pgds must be 16-byte aligned:
*/
- pae_pmd_cachep = kmem_cache_create("pae_pmd", 4096, 0,
- SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, pae_pmd_ctor, NULL);
-
- if (!pae_pmd_cachep)
- panic("init_pae(): cannot allocate pae_pmd SLAB cache");
-
- pae_pgd_cachep = kmem_cache_create("pae_pgd", 32, 0,
- SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, pae_pgd_ctor, NULL);
- if (!pae_pgd_cachep)
- panic("init_pae(): Cannot alloc pae_pgd SLAB cache");
+ pgd_cache = kmem_cache_create("pgd", PTRS_PER_PGD*sizeof(pgd_t), 0,
+ SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, pgd_ctor, NULL);
+ if (!pgd_cache)
+ panic("pgtable_cache_init(): Cannot create pgd cache");
}
-#endif
/* Put this after the callers, so that it cannot be inlined */
static int do_test_wp_bit(void)
diff -urpN mm3-2.5.53-1/arch/i386/mm/pgtable.c mm3-2.5.53-2/arch/i386/mm/pgtable.c
--- mm3-2.5.53-1/arch/i386/mm/pgtable.c 2003-01-01 18:49:19.000000000 -0800
+++ mm3-2.5.53-2/arch/i386/mm/pgtable.c 2003-01-01 18:51:17.000000000 -0800
@@ -166,19 +166,20 @@ struct page *pte_alloc_one(struct mm_str
return pte;
}
-#if CONFIG_X86_PAE
+extern kmem_cache_t *pmd_cache;
+extern kmem_cache_t *pgd_cache;
-extern kmem_cache_t *pae_pmd_cachep;
-
-void pae_pmd_ctor(void *__pmd, kmem_cache_t *pmd_cache, unsigned long flags)
+void pmd_ctor(void *__pmd, kmem_cache_t *pmd_cache, unsigned long flags)
{
clear_page(__pmd);
}
-void pae_pgd_ctor(void *__pgd, kmem_cache_t *pgd_cache, unsigned long flags)
+void pgd_ctor(void *__pgd, kmem_cache_t *pgd_cache, unsigned long flags)
{
pgd_t *pgd = __pgd;
+ if (PTRS_PER_PMD == 1)
+ memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
memcpy(pgd + USER_PTRS_PER_PGD,
swapper_pg_dir + USER_PTRS_PER_PGD,
(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
@@ -187,59 +188,38 @@ void pae_pgd_ctor(void *__pgd, kmem_cach
pgd_t *pgd_alloc(struct mm_struct *mm)
{
int i;
- pgd_t *pgd = kmem_cache_alloc(pae_pgd_cachep, SLAB_KERNEL);
+ pgd_t *pgd = kmem_cache_alloc(pgd_cache, SLAB_KERNEL);
- if (!pgd)
+ if (PTRS_PER_PMD == 1)
+ return pgd;
+ else if (!pgd)
return NULL;
for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
- pmd_t *pmd = kmem_cache_alloc(pae_pmd_cachep, SLAB_KERNEL);
+ pmd_t *pmd = kmem_cache_alloc(pmd_cache, SLAB_KERNEL);
if (!pmd)
goto out_oom;
- else if ((unsigned long)pmd & ~PAGE_MASK) {
- printk("kmem_cache_alloc did wrong! death ensues!\n");
- goto out_oom;
- }
set_pgd(pgd + i, __pgd(1 + __pa((unsigned long long)((unsigned long)pmd))));
}
return pgd;
out_oom:
for (i--; i >= 0; --i)
- kmem_cache_free(pae_pmd_cachep, (void *)__va(pgd_val(pgd[i])-1));
- kmem_cache_free(pae_pgd_cachep, (void *)pgd);
+ kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
+ kmem_cache_free(pgd_cache, (void *)pgd);
return NULL;
}
void pgd_free(pgd_t *pgd)
{
int i;
- for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
- kmem_cache_free(pae_pmd_cachep, (void *)__va(pgd_val(pgd[i])-1));
- set_pgd(pgd + i, __pgd(0));
- }
- kmem_cache_free(pae_pgd_cachep, (void *)pgd);
-}
-
-#else
-pgd_t *pgd_alloc(struct mm_struct *mm)
-{
- pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
-
- if (pgd) {
- memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
- memcpy(pgd + USER_PTRS_PER_PGD,
- swapper_pg_dir + USER_PTRS_PER_PGD,
- (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+ if (PTRS_PER_PMD > 1) {
+ for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
+ kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
+ set_pgd(pgd + i, __pgd(0));
+ }
}
- return pgd;
-}
-void pgd_free(pgd_t *pgd)
-{
- free_page((unsigned long)pgd);
+ kmem_cache_free(pgd_cache, (void *)pgd);
}
-
-#endif /* CONFIG_X86_PAE */
-
diff -urpN mm3-2.5.53-1/include/asm-i386/pgtable-3level.h mm3-2.5.53-2/include/asm-i386/pgtable-3level.h
--- mm3-2.5.53-1/include/asm-i386/pgtable-3level.h 2002-12-23 21:21:07.000000000 -0800
+++ mm3-2.5.53-2/include/asm-i386/pgtable-3level.h 2003-01-01 18:51:17.000000000 -0800
@@ -106,6 +106,4 @@ static inline pmd_t pfn_pmd(unsigned lon
return __pmd(((unsigned long long)page_nr << PAGE_SHIFT) | pgprot_val(pgprot));
}
-extern struct kmem_cache_s *pae_pgd_cachep;
-
#endif /* _I386_PGTABLE_3LEVEL_H */
diff -urpN mm3-2.5.53-1/include/asm-i386/pgtable.h mm3-2.5.53-2/include/asm-i386/pgtable.h
--- mm3-2.5.53-1/include/asm-i386/pgtable.h 2003-01-01 18:49:21.000000000 -0800
+++ mm3-2.5.53-2/include/asm-i386/pgtable.h 2003-01-01 18:51:17.000000000 -0800
@@ -41,22 +41,13 @@ extern unsigned long empty_zero_page[102
#ifndef __ASSEMBLY__
#if CONFIG_X86_PAE
# include <asm/pgtable-3level.h>
-
-/*
- * Need to initialise the X86 PAE caches
- */
-extern void pgtable_cache_init(void);
-
#else
# include <asm/pgtable-2level.h>
+#endif
-/*
- * No page table caches to initialise
- */
-#define pgtable_cache_init() do { } while (0)
+void pgtable_cache_init(void);
#endif
-#endif
#define __beep() asm("movb $0x3,%al; outb %al,$0x61")
* Re: 2.5.53-mm2
2003-01-02 4:53 ` 2.5.53-mm2 William Lee Irwin III
@ 2003-01-02 5:25 ` William Lee Irwin III
0 siblings, 0 replies; 3+ messages in thread
From: William Lee Irwin III @ 2003-01-02 5:25 UTC
To: Andrew Morton, lkml, linux-mm
On Sat, Dec 28, 2002 at 04:52:20PM -0800, Andrew Morton wrote:
>> wli-11_pgd_ctor.patch
On Wed, Jan 01, 2003 at 08:53:27PM -0800, William Lee Irwin III wrote:
> A moment's reflection on the subject suggests to me it's worthwhile to
> generalize pgd_ctor support so it works (without #ifdefs!) on both PAE
> and non-PAE. This tiny tweak is actually more noticeably beneficial
> on non-PAE systems but only really because pgd_alloc() is more visible;
> the most likely reason it's less visible on PAE is "other overhead".
> It looks particularly nice since it removes more code than it adds.
> Touch tested on NUMA-Q (PAE). OFTC #kn testers testing the non-PAE case.
For those needing more interpretation, this is essentially a reinstatement
of the 2.4.x-style pgd/pmd cache optimization in a leak-free and accounted
(in /proc/slabinfo) manner.
The point of the optimization is that these initializations are expensive
in cache terms when taken in a single shot and, in the PAE case, amount to
a full L1 cache flush, since they traverse almost an entire 16K.
No rigorous benchmarking has been done yet.
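A rough userspace sketch of why a constructor-based cache avoids that cost: the constructor runs once when an object is first created, and a freed object goes back to the cache still in its constructed state, so the next allocation touches no memory at all. struct obj_cache, cache_alloc(), cache_free() and CACHE_DEPTH are hypothetical names, and the 4096-byte object size is taken from the old "pae_pmd" cache in the patch; the slab allocator itself is far more involved than this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OBJ_SIZE    4096	/* one page-sized table, as in the "pae_pmd" cache */
#define CACHE_DEPTH 8		/* hypothetical number of objects kept for reuse */

struct obj_cache {
	void *slots[CACHE_DEPTH];	/* constructed objects awaiting reuse */
	int nr;
	unsigned long ctor_calls;	/* how often the expensive init ran */
};

static void *cache_alloc(struct obj_cache *c)
{
	void *obj;

	if (c->nr > 0)
		return c->slots[--c->nr];	/* already constructed: no memset */

	obj = malloc(OBJ_SIZE);
	if (obj) {
		memset(obj, 0, OBJ_SIZE);	/* the constructor: runs once per object */
		c->ctor_calls++;
	}
	return obj;
}

/* Contract: the caller returns the object in its constructed (all-zero) state. */
static void cache_free(struct obj_cache *c, void *obj)
{
	assert(obj != NULL);
	if (c->nr < CACHE_DEPTH)
		c->slots[c->nr++] = obj;
	else
		free(obj);
}
```

Assuming three user pmds per pgd (the usual 3G/1G split, which is an assumption here), an uncached pgd_alloc() would zero 3 * 4096 = 12288 bytes each time, which is where the "almost an entire 16K" figure comes from.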
Bill