* [PATCH 0/5] mm improvements
@ 2004-02-04 9:39 Nick Piggin
2004-02-04 9:40 ` [PATCH 1/5] " Nick Piggin
` (5 more replies)
0 siblings, 6 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 9:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
Patches against 2.6.2-rc3-mm1.
Please test / review / comment.
1/5: vm-no-rss-limit.patch
Remove broken RSS limiting. Simple problem, Rik is onto it.
2/5: vm-dont-rotate-active-list.patch
Nikita's patch to keep more page ordering info in the active list.
It should also improve system time due to less useless scanning.
Helps swapping loads significantly.
3/5: vm-lru-info.patch
Keep more referenced info in the active list. Should also improve
system time in some cases. Helps swapping loads significantly.
4/5: vm-fix-shrink-zone.patch
The most significant part of this patch changes active / inactive
balancing. This improves a non-swapping kbuild by a few percent. Helps
swapping significantly.
It also contains a number of other small fixes which have little
measurable impact on kbuild.
5/5: vm-tune-throttle.patch
Try to allocate a bit harder before giving up / throttling on
writeout.
* [PATCH 1/5] mm improvements
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
@ 2004-02-04 9:40 ` Nick Piggin
2004-02-04 19:45 ` Rik van Riel
2004-02-04 9:40 ` [PATCH 2/5] " Nick Piggin
` (4 subsequent siblings)
5 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 9:40 UTC (permalink / raw)
To: Andrew Morton, linux-mm
[-- Attachment #1: Type: text/plain, Size: 120 bytes --]
Nick Piggin wrote:
> 1/5: vm-no-rss-limit.patch
> Remove broken RSS limiting. Simple problem, Rik is onto it.
>
[-- Attachment #2: vm-no-rss-limit.patch --]
[-- Type: text/plain, Size: 6472 bytes --]
linux-2.6-npiggin/include/linux/init_task.h | 2 --
linux-2.6-npiggin/include/linux/sched.h | 1 -
linux-2.6-npiggin/include/linux/swap.h | 4 ++--
linux-2.6-npiggin/kernel/sys.c | 8 --------
linux-2.6-npiggin/mm/rmap.c | 18 +-----------------
linux-2.6-npiggin/mm/vmscan.c | 12 ++++--------
6 files changed, 7 insertions(+), 38 deletions(-)
diff -puN include/linux/init_task.h~vm-no-rss-limit include/linux/init_task.h
--- linux-2.6/include/linux/init_task.h~vm-no-rss-limit 2004-02-04 14:09:43.000000000 +1100
+++ linux-2.6-npiggin/include/linux/init_task.h 2004-02-04 14:09:43.000000000 +1100
@@ -2,7 +2,6 @@
#define _LINUX__INIT_TASK_H
#include <linux/file.h>
-#include <linux/resource.h>
#define INIT_FILES \
{ \
@@ -43,7 +42,6 @@
.mmlist = LIST_HEAD_INIT(name.mmlist), \
.cpu_vm_mask = CPU_MASK_ALL, \
.default_kioctx = INIT_KIOCTX(name.default_kioctx, name), \
- .rlimit_rss = RLIM_INFINITY \
}
#define INIT_SIGNALS(sig) { \
diff -puN include/linux/sched.h~vm-no-rss-limit include/linux/sched.h
--- linux-2.6/include/linux/sched.h~vm-no-rss-limit 2004-02-04 14:09:43.000000000 +1100
+++ linux-2.6-npiggin/include/linux/sched.h 2004-02-04 14:09:43.000000000 +1100
@@ -206,7 +206,6 @@ struct mm_struct {
unsigned long arg_start, arg_end, env_start, env_end;
unsigned long rss, total_vm, locked_vm;
unsigned long def_flags;
- unsigned long rlimit_rss;
unsigned long saved_auxv[40]; /* for /proc/PID/auxv */
diff -puN include/linux/swap.h~vm-no-rss-limit include/linux/swap.h
--- linux-2.6/include/linux/swap.h~vm-no-rss-limit 2004-02-04 14:09:43.000000000 +1100
+++ linux-2.6-npiggin/include/linux/swap.h 2004-02-04 14:09:43.000000000 +1100
@@ -179,7 +179,7 @@ extern int vm_swappiness;
/* linux/mm/rmap.c */
#ifdef CONFIG_MMU
-int FASTCALL(page_referenced(struct page *, int *));
+int FASTCALL(page_referenced(struct page *));
struct pte_chain *FASTCALL(page_add_rmap(struct page *, pte_t *,
struct pte_chain *));
void FASTCALL(page_remove_rmap(struct page *, pte_t *));
@@ -188,7 +188,7 @@ int FASTCALL(try_to_unmap(struct page *)
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
#else
-#define page_referenced(page, _x) TestClearPageReferenced(page)
+#define page_referenced(page) TestClearPageReferenced(page)
#define try_to_unmap(page) SWAP_FAIL
#endif /* CONFIG_MMU */
diff -puN kernel/sys.c~vm-no-rss-limit kernel/sys.c
--- linux-2.6/kernel/sys.c~vm-no-rss-limit 2004-02-04 14:09:43.000000000 +1100
+++ linux-2.6-npiggin/kernel/sys.c 2004-02-04 14:09:43.000000000 +1100
@@ -1478,14 +1478,6 @@ asmlinkage long sys_setrlimit(unsigned i
if (retval)
return retval;
- /* The rlimit is specified in bytes, convert to pages for mm. */
- if (resource == RLIMIT_RSS && current->mm) {
- unsigned long pages = RLIM_INFINITY;
- if (new_rlim.rlim_cur != RLIM_INFINITY)
- pages = new_rlim.rlim_cur >> PAGE_SHIFT;
- current->mm->rlimit_rss = pages;
- }
-
*old_rlim = new_rlim;
return 0;
}
diff -puN mm/rmap.c~vm-no-rss-limit mm/rmap.c
--- linux-2.6/mm/rmap.c~vm-no-rss-limit 2004-02-04 14:09:43.000000000 +1100
+++ linux-2.6-npiggin/mm/rmap.c 2004-02-04 14:09:43.000000000 +1100
@@ -104,7 +104,6 @@ pte_chain_encode(struct pte_chain *pte_c
/**
* page_referenced - test if the page was referenced
* @page: the page to test
- * @rsslimit: set if the process(es) using the page is(are) over RSS limit.
*
* Quick test_and_clear_referenced for all mappings to a page,
* returns the number of processes which referenced the page.
@@ -112,13 +111,9 @@ pte_chain_encode(struct pte_chain *pte_c
*
* If the page has a single-entry pte_chain, collapse that back to a PageDirect
* representation. This way, it's only done under memory pressure.
- *
- * The pte_chain_lock() is sufficient to pin down mm_structs while we examine
- * them.
*/
-int page_referenced(struct page *page, int *rsslimit)
+int page_referenced(struct page * page)
{
- struct mm_struct * mm;
struct pte_chain *pc;
int referenced = 0;
@@ -132,17 +127,10 @@ int page_referenced(struct page *page, i
pte_t *pte = rmap_ptep_map(page->pte.direct);
if (ptep_test_and_clear_young(pte))
referenced++;
-
- mm = ptep_to_mm(pte);
- if (mm->rss > mm->rlimit_rss)
- *rsslimit = 1;
rmap_ptep_unmap(pte);
} else {
int nr_chains = 0;
- /* We clear it if any task using the page is under its limit. */
- *rsslimit = 1;
-
/* Check all the page tables mapping this page. */
for (pc = page->pte.chain; pc; pc = pte_chain_next(pc)) {
int i;
@@ -154,10 +142,6 @@ int page_referenced(struct page *page, i
p = rmap_ptep_map(pte_paddr);
if (ptep_test_and_clear_young(p))
referenced++;
-
- mm = ptep_to_mm(p);
- if (mm->rss < mm->rlimit_rss)
- *rsslimit = 0;
rmap_ptep_unmap(p);
nr_chains++;
}
diff -puN mm/vmscan.c~vm-no-rss-limit mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-no-rss-limit 2004-02-04 14:09:43.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-04 14:09:43.000000000 +1100
@@ -252,7 +252,6 @@ shrink_list(struct list_head *page_list,
LIST_HEAD(ret_pages);
struct pagevec freed_pvec;
int pgactivate = 0;
- int over_rsslimit = 0;
int ret = 0;
cond_resched();
@@ -279,8 +278,8 @@ shrink_list(struct list_head *page_list,
goto keep_locked;
pte_chain_lock(page);
- referenced = page_referenced(page, &over_rsslimit);
- if (referenced && page_mapping_inuse(page) && !over_rsslimit) {
+ referenced = page_referenced(page);
+ if (referenced && page_mapping_inuse(page)) {
/* In active use or really unfreeable. Activate it. */
pte_chain_unlock(page);
goto activate_locked;
@@ -601,7 +600,6 @@ refill_inactive_zone(struct zone *zone,
long mapped_ratio;
long distress;
long swap_tendency;
- int over_rsslimit = 0;
lru_add_drain();
pgmoved = 0;
@@ -662,15 +660,13 @@ refill_inactive_zone(struct zone *zone,
list_del(&page->lru);
if (page_mapped(page)) {
pte_chain_lock(page);
- if (page_mapped(page) &&
- page_referenced(page, &over_rsslimit) &&
- !over_rsslimit) {
+ if (page_mapped(page) && page_referenced(page)) {
pte_chain_unlock(page);
list_add(&page->lru, &l_active);
continue;
}
pte_chain_unlock(page);
- if (!reclaim_mapped && !over_rsslimit) {
+ if (!reclaim_mapped) {
list_add(&page->lru, &l_active);
continue;
}
_
* [PATCH 2/5] mm improvements
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
2004-02-04 9:40 ` [PATCH 1/5] " Nick Piggin
@ 2004-02-04 9:40 ` Nick Piggin
2004-02-04 10:10 ` Andrew Morton
2004-02-04 9:41 ` [PATCH 3/5] " Nick Piggin
` (3 subsequent siblings)
5 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 9:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 246 bytes --]
Nick Piggin wrote:
> 2/5: vm-dont-rotate-active-list.patch
> Nikita's patch to keep more page ordering info in the active list.
> Also should improve system time due to less useless scanning
> Helps swapping loads significantly.
>
[-- Attachment #2: vm-dont-rotate-active-list.patch --]
[-- Type: text/plain, Size: 9436 bytes --]
linux-2.6-npiggin/include/linux/mmzone.h | 6 +
linux-2.6-npiggin/mm/page_alloc.c | 20 ++++
linux-2.6-npiggin/mm/vmscan.c | 144 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 51 deletions(-)
diff -puN include/linux/mmzone.h~vm-dont-rotate-active-list include/linux/mmzone.h
--- linux-2.6/include/linux/mmzone.h~vm-dont-rotate-active-list 2004-02-04 14:09:44.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mmzone.h 2004-02-04 14:09:44.000000000 +1100
@@ -149,6 +149,12 @@ struct zone {
unsigned long zone_start_pfn;
/*
+ * dummy page used as place holder during scanning of
+ * active_list in refill_inactive_zone()
+ */
+ struct page *scan_page;
+
+ /*
* rarely used fields:
*/
char *name;
diff -puN mm/page_alloc.c~vm-dont-rotate-active-list mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-dont-rotate-active-list 2004-02-04 14:09:44.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c 2004-02-04 14:09:44.000000000 +1100
@@ -1213,6 +1213,9 @@ void __init memmap_init_zone(struct page
memmap_init_zone((start), (size), (nid), (zone), (start_pfn))
#endif
+/* dummy pages used to scan active lists */
+static struct page scan_pages[MAX_NUMNODES][MAX_NR_ZONES];
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -1235,6 +1238,7 @@ static void __init free_area_init_core(s
struct zone *zone = pgdat->node_zones + j;
unsigned long size, realsize;
unsigned long batch;
+ struct page *scan_page;
zone_table[NODEZONE(nid, j)] = zone;
realsize = size = zones_size[j];
@@ -1289,6 +1293,22 @@ static void __init free_area_init_core(s
atomic_set(&zone->refill_counter, 0);
zone->nr_active = 0;
zone->nr_inactive = 0;
+
+ /* initialize dummy page used for scanning */
+ scan_page = &scan_pages[nid][j];
+ zone->scan_page = scan_page;
+ memset(scan_page, 0, sizeof *scan_page);
+ scan_page->flags =
+ (1 << PG_locked) |
+ (1 << PG_error) |
+ (1 << PG_lru) |
+ (1 << PG_active) |
+ (1 << PG_reserved);
+ set_page_zone(scan_page, j);
+ page_cache_get(scan_page);
+ INIT_LIST_HEAD(&scan_page->list);
+ list_add(&scan_page->lru, &zone->active_list);
+
if (!size)
continue;
diff -puN mm/vmscan.c~vm-dont-rotate-active-list mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-dont-rotate-active-list 2004-02-04 14:09:44.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-04 14:09:44.000000000 +1100
@@ -45,14 +45,15 @@
int vm_swappiness = 60;
static long total_memory;
+#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
+
#ifdef ARCH_HAS_PREFETCH
#define prefetch_prev_lru_page(_page, _base, _field) \
do { \
if ((_page)->lru.prev != _base) { \
struct page *prev; \
\
- prev = list_entry(_page->lru.prev, \
- struct page, lru); \
+ prev = lru_to_page(&(_page)->lru); \
prefetch(&prev->_field); \
} \
} while (0)
@@ -66,8 +67,7 @@ static long total_memory;
if ((_page)->lru.prev != _base) { \
struct page *prev; \
\
- prev = list_entry(_page->lru.prev, \
- struct page, lru); \
+ prev = lru_to_page(&(_page)->lru); \
prefetchw(&prev->_field); \
} \
} while (0)
@@ -262,7 +262,7 @@ shrink_list(struct list_head *page_list,
int may_enter_fs;
int referenced;
- page = list_entry(page_list->prev, struct page, lru);
+ page = lru_to_page(page_list);
list_del(&page->lru);
if (TestSetPageLocked(page))
@@ -507,8 +507,7 @@ shrink_cache(const int nr_pages, struct
while (nr_scan++ < nr_to_process &&
!list_empty(&zone->inactive_list)) {
- page = list_entry(zone->inactive_list.prev,
- struct page, lru);
+ page = lru_to_page(&zone->inactive_list);
prefetchw_prev_lru_page(page,
&zone->inactive_list, flags);
@@ -546,7 +545,7 @@ shrink_cache(const int nr_pages, struct
* Put back any unfreeable pages.
*/
while (!list_empty(&page_list)) {
- page = list_entry(page_list.prev, struct page, lru);
+ page = lru_to_page(&page_list);
if (TestSetPageLRU(page))
BUG();
list_del(&page->lru);
@@ -567,6 +566,39 @@ done:
return ret;
}
+
+/* move pages from @page_list to the @spot, that should be somewhere on the
+ * @zone->active_list */
+static int
+spill_on_spot(struct zone *zone,
+ struct list_head *page_list, struct list_head *spot,
+ struct pagevec *pvec)
+{
+ struct page *page;
+ int moved;
+
+ moved = 0;
+ while (!list_empty(page_list)) {
+ page = lru_to_page(page_list);
+ prefetchw_prev_lru_page(page, page_list, flags);
+ if (TestSetPageLRU(page))
+ BUG();
+ BUG_ON(!PageActive(page));
+ list_move(&page->lru, spot);
+ moved++;
+ if (!pagevec_add(pvec, page)) {
+ zone->nr_active += moved;
+ moved = 0;
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ return moved;
+}
+
+
+
/*
* This moves pages from the active list to the inactive list.
*
@@ -593,37 +625,18 @@ refill_inactive_zone(struct zone *zone,
int nr_pages = nr_pages_in;
LIST_HEAD(l_hold); /* The pages which were snipped off */
LIST_HEAD(l_inactive); /* Pages to go onto the inactive_list */
- LIST_HEAD(l_active); /* Pages to go onto the active_list */
+ LIST_HEAD(l_ignore); /* Pages to be returned to the active_list */
+ LIST_HEAD(l_active); /* Pages to go onto the head of the
+ * active_list */
+
struct page *page;
+ struct page *scan;
struct pagevec pvec;
int reclaim_mapped = 0;
long mapped_ratio;
long distress;
long swap_tendency;
- lru_add_drain();
- pgmoved = 0;
- spin_lock_irq(&zone->lru_lock);
- while (nr_pages && !list_empty(&zone->active_list)) {
- page = list_entry(zone->active_list.prev, struct page, lru);
- prefetchw_prev_lru_page(page, &zone->active_list, flags);
- if (!TestClearPageLRU(page))
- BUG();
- list_del(&page->lru);
- if (page_count(page) == 0) {
- /* It is currently in pagevec_release() */
- SetPageLRU(page);
- list_add(&page->lru, &zone->active_list);
- } else {
- page_cache_get(page);
- list_add(&page->lru, &l_hold);
- pgmoved++;
- }
- nr_pages--;
- }
- zone->nr_active -= pgmoved;
- spin_unlock_irq(&zone->lru_lock);
-
/*
* `distress' is a measure of how much trouble we're having reclaiming
* pages. 0 -> no problems. 100 -> great trouble.
@@ -655,10 +668,53 @@ refill_inactive_zone(struct zone *zone,
if (swap_tendency >= 100)
reclaim_mapped = 1;
+ scan = zone->scan_page;
+ lru_add_drain();
+ pgmoved = 0;
+ spin_lock_irq(&zone->lru_lock);
+ if (reclaim_mapped) {
+ /*
+ * When scanning active_list with !reclaim_mapped mapped
+ * inactive pages are left behind zone->scan_page. If zone is
+ * switched to reclaim_mapped mode reset zone->scan_page to
+ * the end of inactive list so that inactive mapped pages are
+ * re-scanned.
+ */
+ list_move_tail(&scan->lru, &zone->active_list);
+ }
+ while (nr_pages && zone->active_list.prev != zone->active_list.next) {
+ /*
+ * if head of active list reached---wrap to the tail
+ */
+ if (scan->lru.prev == &zone->active_list)
+ list_move_tail(&scan->lru, &zone->active_list);
+ page = lru_to_page(&scan->lru);
+ prefetchw_prev_lru_page(page, &zone->active_list, flags);
+ if (!TestClearPageLRU(page))
+ BUG();
+ list_del(&page->lru);
+ if (page_count(page) == 0) {
+ /* It is currently in pagevec_release() */
+ SetPageLRU(page);
+ list_add(&page->lru, &zone->active_list);
+ } else {
+ page_cache_get(page);
+ list_add(&page->lru, &l_hold);
+ pgmoved++;
+ }
+ nr_pages--;
+ }
+ zone->nr_active -= pgmoved;
+ spin_unlock_irq(&zone->lru_lock);
+
while (!list_empty(&l_hold)) {
- page = list_entry(l_hold.prev, struct page, lru);
+ page = lru_to_page(&l_hold);
list_del(&page->lru);
if (page_mapped(page)) {
+ /*
+ * probably it would be useful to transfer dirty bit
+ * from pte to the @page here.
+ */
pte_chain_lock(page);
if (page_mapped(page) && page_referenced(page)) {
pte_chain_unlock(page);
@@ -667,7 +723,7 @@ refill_inactive_zone(struct zone *zone,
}
pte_chain_unlock(page);
if (!reclaim_mapped) {
- list_add(&page->lru, &l_active);
+ list_add(&page->lru, &l_ignore);
continue;
}
}
@@ -687,7 +743,7 @@ refill_inactive_zone(struct zone *zone,
pgmoved = 0;
spin_lock_irq(&zone->lru_lock);
while (!list_empty(&l_inactive)) {
- page = list_entry(l_inactive.prev, struct page, lru);
+ page = lru_to_page(&l_inactive);
prefetchw_prev_lru_page(page, &l_inactive, flags);
if (TestSetPageLRU(page))
BUG();
@@ -714,23 +770,9 @@ refill_inactive_zone(struct zone *zone,
spin_lock_irq(&zone->lru_lock);
}
- pgmoved = 0;
- while (!list_empty(&l_active)) {
- page = list_entry(l_active.prev, struct page, lru);
- prefetchw_prev_lru_page(page, &l_active, flags);
- if (TestSetPageLRU(page))
- BUG();
- BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->active_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
+ pgmoved = spill_on_spot(zone, &l_active, &zone->active_list, &pvec);
zone->nr_active += pgmoved;
- pgmoved = 0;
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
+ pgmoved = spill_on_spot(zone, &l_ignore, &scan->lru, &pvec);
zone->nr_active += pgmoved;
spin_unlock_irq(&zone->lru_lock);
pagevec_release(&pvec);
_
* [PATCH 3/5] mm improvements
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
2004-02-04 9:40 ` [PATCH 1/5] " Nick Piggin
2004-02-04 9:40 ` [PATCH 2/5] " Nick Piggin
@ 2004-02-04 9:41 ` Nick Piggin
2004-02-04 15:28 ` Rik van Riel
2004-02-04 9:42 ` [PATCH 4/5] " Nick Piggin
` (2 subsequent siblings)
5 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 9:41 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-mm
[-- Attachment #1: Type: text/plain, Size: 190 bytes --]
Nick Piggin wrote:
> 3/5: vm-lru-info.patch
> Keep more referenced info in the active list. Should also improve
> system time in some cases. Helps swapping loads significantly.
>
[-- Attachment #2: vm-lru-info.patch --]
[-- Type: text/plain, Size: 1385 bytes --]
When refill_inactive_zone is running !reclaim_mapped, it clears a mapped
page's referenced bits and then puts the page back at the head of the
active list. Referenced and non-referenced mapped pages are treated the
same, so the "referenced" information is lost.
This patch leaves the referenced bits uncleared during !reclaim_mapped.
It improves heavy swapping performance significantly.
linux-2.6-npiggin/mm/vmscan.c | 14 ++++++++++----
1 files changed, 10 insertions(+), 4 deletions(-)
diff -puN mm/vmscan.c~vm-lru-info mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-lru-info 2004-02-04 14:09:45.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-04 14:09:45.000000000 +1100
@@ -711,6 +711,16 @@ refill_inactive_zone(struct zone *zone,
page = lru_to_page(&l_hold);
list_del(&page->lru);
if (page_mapped(page)) {
+
+ /*
+ * Don't clear page referenced if we're not going
+ * to use it.
+ */
+ if (!reclaim_mapped) {
+ list_add(&page->lru, &l_ignore);
+ continue;
+ }
+
/*
* probably it would be useful to transfer dirty bit
* from pte to the @page here.
@@ -722,10 +732,6 @@ refill_inactive_zone(struct zone *zone,
continue;
}
pte_chain_unlock(page);
- if (!reclaim_mapped) {
- list_add(&page->lru, &l_ignore);
- continue;
- }
}
/*
* FIXME: need to consider page_count(page) here if/when we
_
* [PATCH 4/5] mm improvements
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
` (2 preceding siblings ...)
2004-02-04 9:41 ` [PATCH 3/5] " Nick Piggin
@ 2004-02-04 9:42 ` Nick Piggin
2004-02-04 10:11 ` Andrew Morton
2004-02-04 9:42 ` [PATCH 5/5] " Nick Piggin
2004-02-04 13:25 ` [PATCH 0/5] " Nikita Danilov
5 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 9:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 331 bytes --]
Nick Piggin wrote:
> 4/5: vm-fix-shrink-zone.patch
> Most significant part of this patch changes active / inactive
> balancing. This improves non swapping kbuild by a few %. Helps
> swapping significantly.
>
> It also contains a number of other small fixes which have little
> measurable impact on kbuild.
>
[-- Attachment #2: vm-fix-shrink-zone.patch --]
[-- Type: text/plain, Size: 9951 bytes --]
This patch helps high kbuild loads (swapping) significantly.
It also takes 2-3 seconds off a single-threaded, non-swapping
make bzImage on a 64MB system, and improves light and medium swapping
performance as well.
* Improve precision in shrink_slab by doing a multiply first (see the
arithmetic note after this list).
* Calculate nr_scanned correctly instead of approximating it with max_scan.
* In shrink_cache, loop again if (nr_taken == 0) or
(nr_freed <= 0 && list_empty(&page_list)) instead of terminating.
Use max_scan to determine termination.
* In shrink_zone, scan the active list more aggressively at low and
medium imbalances. This gives improvements to kbuild at no and low
swapping loads.
* Scan the active list more aggressively at high loads, but cap the
amount of scanning that can be done. This helps high swapping loads.
* The more aggressive scanning helps by making better use of the inactive
list to provide reclaim information.
* Calculate max_scan after we have refilled the inactive list.
* In try_to_free_pages, put even pressure on the slab even if we have
reclaimed enough pages from the LRU.
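The shrink_slab precision point, with made-up numbers for illustration:

	/*
	 * old: delta = 4 * (scanned / shrinker->seeks);
	 * new: delta = 4 * scanned / shrinker->seeks;
	 *
	 * With scanned = 150 and seeks = 4:
	 *   old: 4 * (150 / 4) = 4 * 37 = 148	(truncates early)
	 *   new: (4 * 150) / 4 = 600 / 4 = 150	(truncates once, at the end)
	 */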
linux-2.6-npiggin/mm/vmscan.c | 137 ++++++++++++++++++++----------------------
1 files changed, 66 insertions(+), 71 deletions(-)
diff -puN mm/vmscan.c~vm-fix-shrink-zone mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-fix-shrink-zone 2004-02-04 14:09:45.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-04 14:09:45.000000000 +1100
@@ -137,7 +137,7 @@ EXPORT_SYMBOL(remove_shrinker);
*
* We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
*/
-static int shrink_slab(long scanned, unsigned int gfp_mask)
+static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
{
struct shrinker *shrinker;
long pages;
@@ -149,7 +149,7 @@ static int shrink_slab(long scanned, uns
list_for_each_entry(shrinker, &shrinker_list, list) {
unsigned long long delta;
- delta = 4 * (scanned / shrinker->seeks);
+ delta = 4 * scanned / shrinker->seeks;
delta *= (*shrinker->shrinker)(0, gfp_mask);
do_div(delta, pages + 1);
shrinker->nr += delta;
@@ -245,8 +245,7 @@ static void handle_write_error(struct ad
* shrink_list returns the number of reclaimed pages
*/
static int
-shrink_list(struct list_head *page_list, unsigned int gfp_mask,
- int *max_scan, int *nr_mapped)
+shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_mapped)
{
struct address_space *mapping;
LIST_HEAD(ret_pages);
@@ -481,13 +480,15 @@ keep:
*/
static int
shrink_cache(const int nr_pages, struct zone *zone,
- unsigned int gfp_mask, int max_scan, int *nr_mapped)
+ unsigned int gfp_mask, int max_scan, int *nr_scanned)
{
LIST_HEAD(page_list);
struct pagevec pvec;
int nr_to_process;
int ret = 0;
+ *nr_scanned = 0;
+
/*
* Try to ensure that we free `nr_pages' pages in one pass of the loop.
*/
@@ -498,8 +499,9 @@ shrink_cache(const int nr_pages, struct
pagevec_init(&pvec, 1);
lru_add_drain();
+again:
spin_lock_irq(&zone->lru_lock);
- while (max_scan > 0 && ret < nr_pages) {
+ while (*nr_scanned < max_scan && ret < nr_pages) {
struct page *page;
int nr_taken = 0;
int nr_scan = 0;
@@ -529,18 +531,19 @@ shrink_cache(const int nr_pages, struct
zone->pages_scanned += nr_taken;
spin_unlock_irq(&zone->lru_lock);
+ *nr_scanned += nr_scan;
if (nr_taken == 0)
- goto done;
+ goto again;
- max_scan -= nr_scan;
mod_page_state(pgscan, nr_scan);
- nr_freed = shrink_list(&page_list, gfp_mask,
- &max_scan, nr_mapped);
+ nr_freed = shrink_list(&page_list, gfp_mask, nr_scanned);
ret += nr_freed;
+
if (nr_freed <= 0 && list_empty(&page_list))
- goto done;
+ goto again;
spin_lock_irq(&zone->lru_lock);
+
/*
* Put back any unfreeable pages.
*/
@@ -561,7 +564,6 @@ shrink_cache(const int nr_pages, struct
}
}
spin_unlock_irq(&zone->lru_lock);
-done:
pagevec_release(&pvec);
return ret;
}
@@ -570,9 +572,8 @@ done:
/* move pages from @page_list to the @spot, that should be somewhere on the
* @zone->active_list */
static int
-spill_on_spot(struct zone *zone,
- struct list_head *page_list, struct list_head *spot,
- struct pagevec *pvec)
+spill_on_spot(struct zone *zone, struct list_head *page_list,
+ struct list_head *spot, struct pagevec *pvec)
{
struct page *page;
int moved;
@@ -793,41 +794,47 @@ refill_inactive_zone(struct zone *zone,
* direct reclaim.
*/
static int
-shrink_zone(struct zone *zone, int max_scan, unsigned int gfp_mask,
- const int nr_pages, int *nr_mapped, struct page_state *ps)
+shrink_zone(struct zone *zone, unsigned int gfp_mask,
+ int nr_pages, int *nr_scanned, struct page_state *ps, int priority)
{
- unsigned long ratio;
+ unsigned long imbalance;
+ unsigned long nr_refill_inact;
+ unsigned long max_scan;
/*
* Try to keep the active list 2/3 of the size of the cache. And
* make sure that refill_inactive is given a decent number of pages.
*
- * The "ratio+1" here is important. With pagecache-intensive workloads
- * the inactive list is huge, and `ratio' evaluates to zero all the
- * time. Which pins the active list memory. So we add one to `ratio'
- * just to make sure that the kernel will slowly sift through the
- * active list.
+ * Keeping imbalance > 0 is important. With pagecache-intensive loads
+ * the inactive list is huge, and imbalance evaluates to zero all the
+ * time which would pin the active list memory.
*/
- ratio = (unsigned long)nr_pages * zone->nr_active /
- ((zone->nr_inactive | 1) * 2);
- atomic_add(ratio+1, &zone->refill_counter);
- if (atomic_read(&zone->refill_counter) > SWAP_CLUSTER_MAX) {
- int count;
-
- /*
- * Don't try to bring down too many pages in one attempt.
- * If this fails, the caller will increase `priority' and
- * we'll try again, with an increased chance of reclaiming
- * mapped memory.
- */
- count = atomic_read(&zone->refill_counter);
- if (count > SWAP_CLUSTER_MAX * 4)
- count = SWAP_CLUSTER_MAX * 4;
- atomic_set(&zone->refill_counter, 0);
- refill_inactive_zone(zone, count, ps);
+ if (zone->nr_active >= zone->nr_inactive*4)
+ /* ratio will be >= 2 */
+ imbalance = 8*nr_pages;
+ else if (zone->nr_active >= zone->nr_inactive*2)
+ /* 1 < ratio < 2 */
+ imbalance = 4*nr_pages*zone->nr_active / (zone->nr_inactive*2);
+ else
+ imbalance = nr_pages / 2;
+
+ imbalance++;
+
+ nr_refill_inact = atomic_read(&zone->refill_counter) + imbalance;
+ if (nr_refill_inact > SWAP_CLUSTER_MAX) {
+ refill_inactive_zone(zone, nr_refill_inact, ps);
+ nr_refill_inact = 0;
}
- return shrink_cache(nr_pages, zone, gfp_mask,
- max_scan, nr_mapped);
+ atomic_set(&zone->refill_counter, nr_refill_inact);
+
+ /*
+ * Now pull pages from the inactive list
+ */
+ max_scan = zone->nr_inactive >> priority;
+ if (max_scan < nr_pages * 2)
+ max_scan = nr_pages * 2;
+
+ return shrink_cache(nr_pages, zone, gfp_mask, max_scan, nr_scanned);
}
/*
@@ -856,8 +863,7 @@ shrink_caches(struct zone **zones, int p
for (i = 0; zones[i] != NULL; i++) {
int to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX);
struct zone *zone = zones[i];
- int nr_mapped = 0;
- int max_scan;
+ int nr_scanned;
if (zone->free_pages < zone->pages_high)
zone->temp_priority = priority;
@@ -865,16 +871,9 @@ shrink_caches(struct zone **zones, int p
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
- /*
- * If we cannot reclaim `nr_pages' pages by scanning twice
- * that many pages then fall back to the next zone.
- */
- max_scan = zone->nr_inactive >> priority;
- if (max_scan < to_reclaim * 2)
- max_scan = to_reclaim * 2;
- ret += shrink_zone(zone, max_scan, gfp_mask,
- to_reclaim, &nr_mapped, ps);
- *total_scanned += max_scan + nr_mapped;
+ ret += shrink_zone(zone, gfp_mask,
+ to_reclaim, &nr_scanned, ps, priority);
+ *total_scanned += nr_scanned;
if (ret >= nr_pages)
break;
}
@@ -920,6 +919,15 @@ int try_to_free_pages(struct zone **zone
get_page_state(&ps);
nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
gfp_mask, nr_pages, &ps);
+
+ if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
+ shrink_slab(total_scanned, gfp_mask);
+ if (reclaim_state) {
+ nr_reclaimed += reclaim_state->reclaimed_slab;
+ reclaim_state->reclaimed_slab = 0;
+ }
+ }
+
if (nr_reclaimed >= nr_pages) {
ret = 1;
goto out;
@@ -935,13 +943,6 @@ int try_to_free_pages(struct zone **zone
/* Take a nap, wait for some writeback to complete */
blk_congestion_wait(WRITE, HZ/10);
- if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
- shrink_slab(total_scanned, gfp_mask);
- if (reclaim_state) {
- nr_reclaimed += reclaim_state->reclaimed_slab;
- reclaim_state->reclaimed_slab = 0;
- }
- }
}
if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
out_of_memory();
@@ -989,8 +990,7 @@ static int balance_pgdat(pg_data_t *pgda
for (i = 0; i < pgdat->nr_zones; i++) {
struct zone *zone = pgdat->node_zones + i;
- int nr_mapped = 0;
- int max_scan;
+ int nr_scanned;
int to_reclaim;
int reclaimed;
@@ -1005,16 +1005,11 @@ static int balance_pgdat(pg_data_t *pgda
continue;
}
zone->temp_priority = priority;
- max_scan = zone->nr_inactive >> priority;
- if (max_scan < to_reclaim * 2)
- max_scan = to_reclaim * 2;
- if (max_scan < SWAP_CLUSTER_MAX)
- max_scan = SWAP_CLUSTER_MAX;
- reclaimed = shrink_zone(zone, max_scan, GFP_KERNEL,
- to_reclaim, &nr_mapped, ps);
+ reclaimed = shrink_zone(zone, GFP_KERNEL,
+ to_reclaim, &nr_scanned, ps, priority);
if (i < ZONE_HIGHMEM) {
reclaim_state->reclaimed_slab = 0;
- shrink_slab(max_scan + nr_mapped, GFP_KERNEL);
+ shrink_slab(nr_scanned, GFP_KERNEL);
reclaimed += reclaim_state->reclaimed_slab;
}
to_free -= reclaimed;
_
* [PATCH 5/5] mm improvements
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
` (3 preceding siblings ...)
2004-02-04 9:42 ` [PATCH 4/5] " Nick Piggin
@ 2004-02-04 9:42 ` Nick Piggin
2004-02-04 10:03 ` Nick Piggin
2004-02-04 10:18 ` Andrew Morton
2004-02-04 13:25 ` [PATCH 0/5] " Nikita Danilov
5 siblings, 2 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 9:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 138 bytes --]
Nick Piggin wrote:
> 5/5: vm-tune-throttle.patch
> Try to allocate a bit harder before giving up / throttling on
> writeout.
>
[-- Attachment #2: vm-tune-throttle.patch --]
[-- Type: text/plain, Size: 3115 bytes --]
This patch causes try_to_free_pages to call wakeup_bdflush even if it has
reclaimed the required number of pages on the first scan.
It allows two scans at the two lowest priorities before breaking out or
doing a blk_congestion_wait, for both try_to_free_pages and balance_pgdat.
linux-2.6-npiggin/mm/vmscan.c | 38 +++++++++++++++++++++-----------------
1 files changed, 21 insertions(+), 17 deletions(-)
diff -puN mm/vmscan.c~vm-tune-throttle mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-tune-throttle 2004-02-04 14:09:46.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-04 14:09:46.000000000 +1100
@@ -930,22 +930,33 @@ int try_to_free_pages(struct zone **zone
if (nr_reclaimed >= nr_pages) {
ret = 1;
+ if (gfp_mask & __GFP_FS)
+ wakeup_bdflush(total_scanned);
goto out;
}
+
+ /* Don't stall on the first run - it might be bad luck */
+ if (likely(priority == DEF_PRIORITY))
+ continue;
+
+ /* Let the caller handle it */
if (!(gfp_mask & __GFP_FS))
- break; /* Let the caller handle it */
+ goto out;
+
/*
- * Try to write back as many pages as we just scanned. Not
- * sure if that makes sense, but it's an attempt to avoid
- * creating IO storms unnecessarily
+ * Try to write back as many pages as we just scanned.
+ * Not sure if that makes sense, but it's an attempt
+ * to avoid creating IO storms unnecessarily
*/
wakeup_bdflush(total_scanned);
/* Take a nap, wait for some writeback to complete */
blk_congestion_wait(WRITE, HZ/10);
}
- if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
+
+ if (!(gfp_mask & __GFP_NORETRY))
out_of_memory();
+
out:
for (i = 0; zones[i] != 0; i++)
zones[i]->prev_priority = zones[i]->temp_priority;
@@ -1004,6 +1015,7 @@ static int balance_pgdat(pg_data_t *pgda
if (to_reclaim <= 0)
continue;
}
+ all_zones_ok = 0;
zone->temp_priority = priority;
reclaimed = shrink_zone(zone, GFP_KERNEL,
to_reclaim, &nr_scanned, ps, priority);
@@ -1017,16 +1029,6 @@ static int balance_pgdat(pg_data_t *pgda
continue;
if (zone->pages_scanned > zone->present_pages * 2)
zone->all_unreclaimable = 1;
- /*
- * If this scan failed to reclaim `to_reclaim' or more
- * pages, we're getting into trouble. Need to scan
- * some more, and throttle kswapd. Note that this zone
- * may now have sufficient free pages due to freeing
- * activity by some other process. That's OK - we'll
- * pick that info up on the next pass through the loop.
- */
- if (reclaimed < to_reclaim)
- all_zones_ok = 0;
}
if (nr_pages && to_free > 0)
continue; /* swsusp: need to do more work */
@@ -1034,9 +1036,11 @@ static int balance_pgdat(pg_data_t *pgda
break; /* kswapd: all done */
/*
* OK, kswapd is getting into trouble. Take a nap, then take
- * another pass across the zones.
+ * another pass across the zones. Don't stall on the first
+ * pass.
*/
- blk_congestion_wait(WRITE, HZ/10);
+ if (priority < DEF_PRIORITY)
+ blk_congestion_wait(WRITE, HZ/10);
}
for (i = 0; i < pgdat->nr_zones; i++) {
_
* Re: [PATCH 5/5] mm improvements
2004-02-04 9:42 ` [PATCH 5/5] " Nick Piggin
@ 2004-02-04 10:03 ` Nick Piggin
2004-02-04 10:18 ` Andrew Morton
1 sibling, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 10:03 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
Nick Piggin wrote:
> Nick Piggin wrote:
>
>> 5/5: vm-tune-throttle.patch
>> Try to allocate a bit harder before giving up / throttling on
>> writeout.
>>
>
That would be "try to free a bit harder"
* Re: [PATCH 2/5] mm improvements
2004-02-04 9:40 ` [PATCH 2/5] " Nick Piggin
@ 2004-02-04 10:10 ` Andrew Morton
2004-02-04 10:15 ` Nick Piggin
2004-02-04 15:27 ` Rik van Riel
0 siblings, 2 replies; 40+ messages in thread
From: Andrew Morton @ 2004-02-04 10:10 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-mm
Nick Piggin <piggin@cyberone.com.au> wrote:
>
> > 2/5: vm-dont-rotate-active-list.patch
> > Nikita's patch to keep more page ordering info in the active list.
> > Also should improve system time due to less useless scanning
> > Helps swapping loads significantly.
It bugs me that this improvement is also applicable to 2.4. If it makes
the same improvement there, we're still behind.
* Re: [PATCH 4/5] mm improvements
2004-02-04 9:42 ` [PATCH 4/5] " Nick Piggin
@ 2004-02-04 10:11 ` Andrew Morton
2004-02-04 10:19 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Andrew Morton @ 2004-02-04 10:11 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-mm
Nick Piggin <piggin@cyberone.com.au> wrote:
>
> + if (zone->nr_active >= zone->nr_inactive*4)
> + /* ratio will be >= 2 */
> + imbalance = 8*nr_pages;
> + else if (zone->nr_active >= zone->nr_inactive*2)
> + /* 1 < ratio < 2 */
> + imbalance = 4*nr_pages*zone->nr_active / (zone->nr_inactive*2);
This can cause a divide-by-zero, yes?
* Re: [PATCH 2/5] mm improvements
2004-02-04 10:10 ` Andrew Morton
@ 2004-02-04 10:15 ` Nick Piggin
2004-02-04 15:27 ` Rik van Riel
1 sibling, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 10:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>> > 2/5: vm-dont-rotate-active-list.patch
>> > Nikita's patch to keep more page ordering info in the active list.
>> > Also should improve system time due to less useless scanning
>> > Helps swapping loads significantly.
>>
>
>It bugs me that this improvement is also applicable to 2.4. if it makes
>the same improvement there, we're still behind.
>
>
>
Yeah that bugs me too. If someone wants to backport it to 2.4 it
would be interesting to measure. I'm not sure if it will get
included, but if it did and if it helps as much as it helped 2.6
then it makes my job a lot harder.
* Re: [PATCH 5/5] mm improvements
2004-02-04 9:42 ` [PATCH 5/5] " Nick Piggin
2004-02-04 10:03 ` Nick Piggin
@ 2004-02-04 10:18 ` Andrew Morton
2004-02-04 10:22 ` Nick Piggin
1 sibling, 1 reply; 40+ messages in thread
From: Andrew Morton @ 2004-02-04 10:18 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-mm
Nick Piggin <piggin@cyberone.com.au> wrote:
>
> It allows two scans at the two lowest priorities before breaking out or
> doing a blk_congestion_wait, for both try_to_free_pages and balance_pgdat.
This seems to be fairly equivalent to simply subtracting one from
DEF_PRIORITY.
* Re: [PATCH 4/5] mm improvements
2004-02-04 10:11 ` Andrew Morton
@ 2004-02-04 10:19 ` Nick Piggin
0 siblings, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 10:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>+ if (zone->nr_active >= zone->nr_inactive*4)
>> + /* ratio will be >= 2 */
>> + imbalance = 8*nr_pages;
>> + else if (zone->nr_active >= zone->nr_inactive*2)
>> + /* 1 < ratio < 2 */
>> + imbalance = 4*nr_pages*zone->nr_active / (zone->nr_inactive*2);
>>
>
>This can cause a divide-by-zero, yes?
>
>
Yes. Sorry. I guess just adding 1 to the divisor should be fine.
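Something like this, say (untested sketch, not a posted patch):

	else if (zone->nr_active >= zone->nr_inactive*2)
		/* 1 < ratio < 2; the +1 keeps the divisor non-zero */
		imbalance = 4*nr_pages*zone->nr_active /
				(zone->nr_inactive*2 + 1);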
* Re: [PATCH 5/5] mm improvements
2004-02-04 10:18 ` Andrew Morton
@ 2004-02-04 10:22 ` Nick Piggin
0 siblings, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-04 10:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>It allows two scans at the two lowest priorities before breaking out or
>> doing a blk_congestion_wait, for both try_to_free_pages and balance_pgdat.
>>
>
>This seems to be fairly equivalent to simply subtracting one from
>DEF_PRIORITY.
>
>
Sort of - except in the case where nr_inactive >> priority
is less than nr_pages*2. Oh and this also allows another
shot at refilling the inactive list.
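With some illustrative numbers (assuming DEF_PRIORITY is 12, as in 2.6):

	/*
	 * max_scan = zone->nr_inactive >> priority;
	 * if (max_scan < nr_pages * 2)
	 *	max_scan = nr_pages * 2;
	 *
	 * With nr_inactive = 50000 and nr_pages = 32:
	 *   priority 12: 50000 >> 12 = 12  -> clamped up to 64
	 *   priority 11: 50000 >> 11 = 24  -> still clamped to 64
	 *
	 * In that range, subtracting one from DEF_PRIORITY changes nothing,
	 * while an extra pass at the same priority does.
	 */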
* Re: [PATCH 0/5] mm improvements
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
` (4 preceding siblings ...)
2004-02-04 9:42 ` [PATCH 5/5] " Nick Piggin
@ 2004-02-04 13:25 ` Nikita Danilov
2004-02-04 13:53 ` Hugh Dickins
5 siblings, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-04 13:25 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-mm
Nick Piggin writes:
> Patches against 2.6.2-rc3-mm1.
> Please test / review / comment.
Hello, Nick,
I composed a new patch that may be worth trying:
ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.02.04/p12-dont-unmap-on-pageout.patch
It avoids (if possible) unmapping a dirty page before calling
->writepage(). The intention is to avoid minor page faults for pages
under writeback.
To this end a new function, mm/rmap.c:page_is_dirty(), is added that scans
the page's ptes and transfers their dirtiness to the struct page
itself. page_is_dirty() is called by shrink_list(), and the page is unmapped
only if page_is_dirty() found all ptes clean.
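Roughly, it looks like this (a simplified sketch only, modeled on the
page_referenced() walk from patch 1/5 -- see the actual patch for the
real thing; the chain-walk details here are from memory):

	/* caller holds pte_chain_lock(page) */
	static int page_is_dirty(struct page *page)
	{
		struct pte_chain *pc;
		int dirty = 0;

		if (PageDirect(page)) {
			pte_t *pte = rmap_ptep_map(page->pte.direct);

			if (ptep_test_and_clear_dirty(pte))
				dirty = 1;
			rmap_ptep_unmap(pte);
		} else {
			/* check all the page tables mapping this page */
			for (pc = page->pte.chain; pc; pc = pte_chain_next(pc)) {
				int i;

				for (i = pte_chain_idx(pc); i < NRPTE; i++) {
					pte_t *p = rmap_ptep_map(pc->ptes[i]);

					if (ptep_test_and_clear_dirty(p))
						dirty = 1;
					rmap_ptep_unmap(p);
				}
			}
		}
		if (dirty)
			set_page_dirty(page); /* transfer dirtiness to the page */
		return dirty;
	}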
A few points:
1. I only gave it light testing (compared with other patches in the
"extra" series).
2. dont-unmap-on-pageout logically depends on check-pte-dirty, and
textually on skip-writepage patches.
3. for some unimportant reasons the patches were produced with "diff -b",
and may hence require "patch -l" to apply.
4. I found that shmem_writepage() has a BUG_ON(page_mapped(page))
check. Its removal had no effect, and I am not sure why the check was
there at all.
Nikita.
* Re: [PATCH 0/5] mm improvements
2004-02-04 13:25 ` [PATCH 0/5] " Nikita Danilov
@ 2004-02-04 13:53 ` Hugh Dickins
2004-02-04 14:03 ` Nikita Danilov
2004-02-04 18:33 ` Andrew Morton
0 siblings, 2 replies; 40+ messages in thread
From: Hugh Dickins @ 2004-02-04 13:53 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Nick Piggin, Andrew Morton, linux-mm
On Wed, 4 Feb 2004, Nikita Danilov wrote:
>
> 4. I found that shmem_writepage() has BUG_ON(page_mapped(page))
> check. Its removal had no effect, and I am not sure why the check was
> there at all.
Sorry, that BUG_ON is there for very good reason. It's no disgrace
that your testing didn't notice the effect of passing a mapped page
down to shmem_writepage, but it is a serious breakage of tmpfs.
I'd have to sit here thinking awhile to remember if there are further
reasons why it's a no-no. But the reason that springs to mind is it
breaks the semantics of a tmpfs file mapped shared into different mms.
shmem_writepage changes the tmpfs-file identity of the page to swap
identity: so if it's unmapped later, the instances would then become
private (to be COWed) instead of shared.
If you go the writepage-while-mapped route (more general gotchas?
I forget), you'll have to make an exception for shmem_writepage.
Hugh
* Re: [PATCH 0/5] mm improvements
2004-02-04 13:53 ` Hugh Dickins
@ 2004-02-04 14:03 ` Nikita Danilov
2004-02-04 15:03 ` Hugh Dickins
2004-02-04 18:33 ` Andrew Morton
1 sibling, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-04 14:03 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nick Piggin, Andrew Morton, linux-mm
Hugh Dickins writes:
> On Wed, 4 Feb 2004, Nikita Danilov wrote:
> >
> > 4. I found that shmem_writepage() has BUG_ON(page_mapped(page))
> > check. Its removal had no effect, and I am not sure why the check was
> > there at all.
>
> Sorry, that BUG_ON is there for very good reason. It's no disgrace
> that your testing didn't notice the effect of passing a mapped page
> down to shmem_writepage, but it is a serious breakage of tmpfs.
>
> I'd have to sit here thinking awhile to remember if there are further
> reasons why it's a no-no. But the reason that springs to mind is it
> breaks the semantics of a tmpfs file mapped shared into different mms.
> shmem_writepage changes the tmpfs-file identity of the page to swap
> identity: so if it's unmapped later, the instances would then become
> private (to be COWed) instead of shared.
Ah, I see.
>
> If you go the writepage-while-mapped route (more general gotchas?
> I forget), you'll have to make an exception for shmem_writepage.
Maybe one can just call try_to_unmap() from shmem_writepage()?
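Something like this, perhaps (completely untested, just to illustrate the
idea; the "redirty" bail-out label is hypothetical):

	--- a/mm/shmem.c
	+++ b/mm/shmem.c
	@@ shmem_writepage()
	-	BUG_ON(page_mapped(page));
	+	if (page_mapped(page)) {
	+		pte_chain_lock(page);
	+		try_to_unmap(page);
	+		pte_chain_unlock(page);
	+		if (page_mapped(page))
	+			goto redirty;	/* unmap failed, keep file identity */
	+	}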
>
> Hugh
>
Nikita.
* Re: [PATCH 0/5] mm improvements
2004-02-04 14:03 ` Nikita Danilov
@ 2004-02-04 15:03 ` Hugh Dickins
2004-02-04 15:19 ` Nikita Danilov
0 siblings, 1 reply; 40+ messages in thread
From: Hugh Dickins @ 2004-02-04 15:03 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Nick Piggin, Andrew Morton, linux-mm
On Wed, 4 Feb 2004, Nikita Danilov wrote:
> Hugh Dickins writes:
> > If you go the writepage-while-mapped route (more general gotchas?
> > I forget), you'll have to make an exception for shmem_writepage.
>
> May be one can just call try_to_unmap() from shmem_writepage()?
That sounds much cleaner. But I've not yet found what tree your
p12-dont-unmap-on-pageout.patch applies to, so cannot judge it.
Hugh
* Re: [PATCH 0/5] mm improvements
2004-02-04 15:03 ` Hugh Dickins
@ 2004-02-04 15:19 ` Nikita Danilov
2004-02-05 2:13 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-04 15:19 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nick Piggin, Andrew Morton, linux-mm
Hugh Dickins writes:
> On Wed, 4 Feb 2004, Nikita Danilov wrote:
> > Hugh Dickins writes:
> > > If you go the writepage-while-mapped route (more general gotchas?
> > > I forget), you'll have to make an exception for shmem_writepage.
> >
> > May be one can just call try_to_unmap() from shmem_writepage()?
>
> That sounds much cleaner. But I've not yet found what tree your
> p12-dont-unmap-on-pageout.patch applies to, so cannot judge it.
The whole series at
ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.02.04/
applies to 2.6.2-rc2.
I just updated p12-dont-unmap-on-pageout.patch in-place.
To apply it one has to apply skip-writepage and check-pte-dirty first.
>
> Hugh
>
Nikita.
* Re: [PATCH 2/5] mm improvements
2004-02-04 10:10 ` Andrew Morton
2004-02-04 10:15 ` Nick Piggin
@ 2004-02-04 15:27 ` Rik van Riel
2004-02-05 2:18 ` Nick Piggin
1 sibling, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2004-02-04 15:27 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nick Piggin, linux-mm
On Wed, 4 Feb 2004, Andrew Morton wrote:
> Nick Piggin <piggin@cyberone.com.au> wrote:
> > > 2/5: vm-dont-rotate-active-list.patch
> > > Nikita's patch to keep more page ordering info in the active list.
> > > Also should improve system time due to less useless scanning
> > > Helps swapping loads significantly.
>
> It bugs me that this improvement is also applicable to 2.4. if it makes
> the same improvement there, we're still behind.
I suspect 2.4 won't see the gains from this, since active/inactive
list location is hardly relevant for mapped pages there, due to the
page table scanning algorithm.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
* Re: [PATCH 3/5] mm improvements
2004-02-04 9:41 ` [PATCH 3/5] " Nick Piggin
@ 2004-02-04 15:28 ` Rik van Riel
2004-02-04 16:45 ` Nikita Danilov
0 siblings, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2004-02-04 15:28 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-mm
On Wed, 4 Feb 2004, Nick Piggin wrote:
> Nick Piggin wrote:
>
> > 3/5: vm-lru-info.patch
> > Keep more referenced info in the active list. Should also improve
> > system time in some cases. Helps swapping loads significantly.
I suspect this is one of the more important ones in this
batch of patches...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
* Re: [PATCH 3/5] mm improvements
2004-02-04 15:28 ` Rik van Riel
@ 2004-02-04 16:45 ` Nikita Danilov
2004-02-04 18:53 ` Andrew Morton
2004-02-05 2:10 ` Nick Piggin
0 siblings, 2 replies; 40+ messages in thread
From: Nikita Danilov @ 2004-02-04 16:45 UTC (permalink / raw)
To: Rik van Riel; +Cc: Nick Piggin, Andrew Morton, linux-mm
Rik van Riel writes:
> On Wed, 4 Feb 2004, Nick Piggin wrote:
> > Nick Piggin wrote:
> >
> > > 3/5: vm-lru-info.patch
> > > Keep more referenced info in the active list. Should also improve
> > > system time in some cases. Helps swapping loads significantly.
>
> I suspect this is one of the more important ones in this
> batch of patches...
I don't understand how this works. This patch just parks mapped pages on
the "ignored" segment of the active list, where they rest until
reclaim_mapped mode is entered.
This only makes a difference for the pages that were page_referenced():
1. they are moved to the ignored segment rather than to the head of the
active list.
2. their referenced bit is not cleared
Now, as "ignored" segment is not scanned in !reclaim_mode, (2) would
only make a difference when VM rapidly oscillates between reclaim_mapped
and !reclaim_mapped, because after a long period of !reclaim_mapped
operation preserved referenced bit on a page only means "this page has
been referenced in the past, but not necessary recently".
And if (1) affects performance significantly, that something rotten in
the idea of treating mapped pages preferentially by the replacement, and
the same effect can be achieved by simply increasing vm_swappiness.
Nick, can you test what will be an effect of doing something like
	while (!list_empty(&l_hold)) {
		page = lru_to_page(&l_hold);
		list_del(&page->lru);
		if (page_mapped(page)) {
			int referenced;

			referenced = page_referenced(page);
			if (!reclaim_mapped) {
				list_add(&page->lru, &l_ignore);
				continue;
			}
			pte_chain_lock(page);
			if (page_mapped(page) && referenced) {
				pte_chain_unlock(page);
				list_add(&page->lru, &l_active);
				continue;
			}
			pte_chain_unlock(page);
		}
		...
i.e., by clearing the referenced bit before moving the page to l_ignore?
>
Nikita.
* Re: [PATCH 0/5] mm improvements
2004-02-04 13:53 ` Hugh Dickins
2004-02-04 14:03 ` Nikita Danilov
@ 2004-02-04 18:33 ` Andrew Morton
2004-02-04 20:54 ` Hugh Dickins
1 sibling, 1 reply; 40+ messages in thread
From: Andrew Morton @ 2004-02-04 18:33 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nikita, piggin, linux-mm
Hugh Dickins <hugh@veritas.com> wrote:
>
> > 4. I found that shmem_writepage() has BUG_ON(page_mapped(page))
> > check. Its removal had no effect, and I am not sure why the check was
> > there at all.
>
> Sorry, that BUG_ON is there for very good reason. It's no disgrace
> that your testing didn't notice the effect of passing a mapped page
> down to shmem_writepage, but it is a serious breakage of tmpfs.
hm. Can't I force writepage-of-a-mapped-page with msync()?
* Re: [PATCH 3/5] mm improvements
2004-02-04 16:45 ` Nikita Danilov
@ 2004-02-04 18:53 ` Andrew Morton
2004-02-05 2:10 ` Nick Piggin
1 sibling, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2004-02-04 18:53 UTC (permalink / raw)
To: Nikita Danilov; +Cc: riel, piggin, linux-mm
Nikita Danilov <Nikita@Namesys.COM> wrote:
>
> Rik van Riel writes:
> > On Wed, 4 Feb 2004, Nick Piggin wrote:
> > > Nick Piggin wrote:
> > >
> > > > 3/5: vm-lru-info.patch
> > > > Keep more referenced info in the active list. Should also improve
> > > > system time in some cases. Helps swapping loads significantly.
> >
> > I suspect this is one of the more important ones in this
> > batch of patches...
>
> I don't understand how this works. This patch just parks mapped pages on
> the "ignored" segment of the active list, where they rest until
> reclaim_mapped mode is entered.
>
> This only makes a difference for the pages that were page_referenced():
>
> 1. they are moved to the ignored segment rather than to the head of the
> active list.
>
> 2. their referenced bit is not cleared
>
> Now, as "ignored" segment is not scanned in !reclaim_mode, (2) would
> only make a difference when VM rapidly oscillates between reclaim_mapped
> and !reclaim_mapped, because after a long period of !reclaim_mapped
> operation preserved referenced bit on a page only means "this page has
> been referenced in the past, but not necessary recently".
Yes, reclaim_mapped shouldn't change at all frequently, unless the
zone->prev_priority thing is broken. prev_priority is supposed to remember
the reclaim_mapped state between successive scan attempts so we go straight
into doing the right thing.
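For reference, the logic is roughly (simplified from refill_inactive_zone()):

	distress = 100 >> zone->prev_priority;	/* 0 -> no reclaim trouble */
	mapped_ratio = (ps->nr_mapped * 100) / total_memory;
	swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
	if (swap_tendency >= 100)
		reclaim_mapped = 1;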
It _was_ working - I instrumented and tested that when it went in. It was
a bit tricky to get right.
* Re: [PATCH 1/5] mm improvements
2004-02-04 9:40 ` [PATCH 1/5] " Nick Piggin
@ 2004-02-04 19:45 ` Rik van Riel
2004-02-09 7:00 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Rik van Riel @ 2004-02-04 19:45 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-mm
On Wed, 4 Feb 2004, Nick Piggin wrote:
> > 1/5: vm-no-rss-limit.patch
> > Remove broken RSS limiting. Simple problem, Rik is onto it.
> >
Does the patch below fix the performance problem with the
rss limit patch?
===== fs/exec.c 1.103 vs edited =====
--- 1.103/fs/exec.c Mon Jan 19 01:35:50 2004
+++ edited/fs/exec.c Wed Feb 4 14:38:10 2004
@@ -1117,6 +1117,11 @@
retval = init_new_context(current, bprm.mm);
if (retval < 0)
goto out_mm;
+ if (likely(current->mm)) {
+ bprm.mm->rlimit_rss = current->mm->rlimit_rss;
+ } else {
+ bprm.mm->rlimit_rss = init_mm.rlimit_rss;
+ }
bprm.argc = count(argv, bprm.p / sizeof(void *));
if ((retval = bprm.argc) < 0)
* Re: [PATCH 0/5] mm improvements
2004-02-04 18:33 ` Andrew Morton
@ 2004-02-04 20:54 ` Hugh Dickins
2004-02-04 21:04 ` Andrew Morton
0 siblings, 1 reply; 40+ messages in thread
From: Hugh Dickins @ 2004-02-04 20:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nikita, piggin, linux-mm
On Wed, 4 Feb 2004, Andrew Morton wrote:
> Hugh Dickins <hugh@veritas.com> wrote:
> >
> > Sorry, that BUG_ON is there for very good reason. It's no disgrace
> > that your testing didn't notice the effect of passing a mapped page
> > down to shmem_writepage, but it is a serious breakage of tmpfs.
>
> hm. Can't I force writepage-of-a-mapped-page with msync()?
I hope not, __filemap_fdatawrite still starts off with:
if (mapping->backing_dev_info->memory_backed)
return 0;
Once upon a time you did have vmscan.c calling ->writepages, with rather
the effect that Nikita is trying for. It was that writepages which
led me to insert the BUG_ON and give tmpfs a dummy writepages.
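(For anyone following along: the "dummy writepages" is just an
->writepages method that writes nothing, so a ->writepages call never
pushes tmpfs pages into ->writepage. A reconstructed sketch, not a
quote from any tree:)

	static int shmem_writepages(struct address_space *mapping,
				    struct writeback_control *wbc)
	{
		/* tmpfs pages reach swap only via vmscan's ->writepage */
		return 0;
	}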
Later on you dropped the ->writepages from vmscan.c:
do you remember why? It would be useful info for Nikita.
Hugh
* Re: [PATCH 0/5] mm improvements
2004-02-04 20:54 ` Hugh Dickins
@ 2004-02-04 21:04 ` Andrew Morton
0 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2004-02-04 21:04 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nikita, piggin, linux-mm
Hugh Dickins <hugh@veritas.com> wrote:
>
> On Wed, 4 Feb 2004, Andrew Morton wrote:
> > Hugh Dickins <hugh@veritas.com> wrote:
> > >
> > > Sorry, that BUG_ON is there for very good reason. It's no disgrace
> > > that your testing didn't notice the effect of passing a mapped page
> > > down to shmem_writepage, but it is a serious breakage of tmpfs.
> >
> > hm. Can't I force writepage-of-a-mapped-page with msync()?
>
> I hope not, __filemap_fdatawrite still starts off with:
>
> if (mapping->backing_dev_info->memory_backed)
> return 0;
Sigh. ->memory_backed is a crock. It is excessively overloaded and needs
to be split up into several things which really mean something.
> Once upon a time you did have vmscan.c calling ->writepages, with rather
> the effect that Nikita is trying for. It was that writepages which
> led me to insert the BUG_ON and give tmpfs a dummy writepages.
> Later on you dropped the ->writepages from vmscan.c:
> do you remember why? It would be useful info for Nikita.
I'd need to trawl the changelogs to remember the exact reason. I had the
standalone a_ops->vm_writeback thing in there, which was able to do
writearound against the targeted page. IIRC it was causing some
difficulties, and as a big effort was underway to minimise the amount of
writeout via vmscan _anyway_, I decided to toss it all out and stick
with page-at-a-time writepage.
* Re: [PATCH 3/5] mm improvements
2004-02-04 16:45 ` Nikita Danilov
2004-02-04 18:53 ` Andrew Morton
@ 2004-02-05 2:10 ` Nick Piggin
1 sibling, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 2:10 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Rik van Riel, Andrew Morton, linux-mm
Nikita Danilov wrote:
>Rik van Riel writes:
> > On Wed, 4 Feb 2004, Nick Piggin wrote:
> > > Nick Piggin wrote:
> > >
> > > > 3/5: vm-lru-info.patch
> > > > Keep more referenced info in the active list. Should also improve
> > > > system time in some cases. Helps swapping loads significantly.
> >
> > I suspect this is one of the more important ones in this
> > batch of patches...
>
>I don't understand how this works. This patch just parks mapped pages on
>the "ignored" segment of the active list, where they rest until
>reclaim_mapped mode is entered.
>
>This only makes a difference for the pages that were page_referenced():
>
>1. they are moved to the ignored segment rather than to the head of the
>active list.
>
>2. their referenced bit is not cleared
>
>
It treats all mapped pages in the same manner. Without this
patch, referenced mapped pages are distinctly disadvantaged
vs unreferenced mapped pages.
Even if reclaim_mapped is only flipped once every few
seconds it can make a big impact. On a 64MB machine that
is swapping heavily, you probably take 10 seconds to
reclaim 64MB. It is of critical importance that we keep
as much hotness information as possible.
It shows on the benchmarks too. It provides nearly as
much improvement as your patch alone for a make -j16,
i.e. over 20%:
http://www.kerneltrap.org/~npiggin/vm/2/
Also, when you're heavily swapping, everything slows down
to such an extent that "hot" pages are no longer touched
thousands of times per second, but maybe a few times every
few seconds. If you're continually clearing this information,
as soon as reclaim_mapped is triggered, all your hot pages
get evicted.
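To make that concrete, the refill-loop behaviour under discussion is
roughly the following (a sketch of the idea only; l_ignored is an
illustrative name for the "ignored" segment, not necessarily what the
patch calls it):

	if (page_mapped(page) && !reclaim_mapped) {
		/*
		 * Park the page without calling page_referenced(), so
		 * neither the pte young bits nor PageReferenced get
		 * cleared: the hotness information is still intact
		 * when reclaim_mapped mode is finally entered.
		 */
		list_add(&page->lru, &l_ignored);
		continue;
	}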
* Re: [PATCH 0/5] mm improvements
2004-02-04 15:19 ` Nikita Danilov
@ 2004-02-05 2:13 ` Nick Piggin
2004-02-05 14:03 ` Nikita Danilov
0 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 2:13 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nikita Danilov wrote:
>Hugh Dickins writes:
> > On Wed, 4 Feb 2004, Nikita Danilov wrote:
> > > Hugh Dickins writes:
> > > > If you go the writepage-while-mapped route (more general gotchas?
> > > > I forget), you'll have to make an exception for shmem_writepage.
> > >
> > > May be one can just call try_to_unmap() from shmem_writepage()?
> >
> > That sounds much cleaner. But I've not yet found what tree your
> > p12-dont-unmap-on-pageout.patch applies to, so cannot judge it.
>
>Whole
>ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.02.04/
>applies to the 2.6.2-rc2.
>
>I just updated p12-dont-unmap-on-pageout.patch in-place.
>
>
>
Sure, I can give this a try. It makes sense.
* Re: [PATCH 2/5] mm improvements
2004-02-04 15:27 ` Rik van Riel
@ 2004-02-05 2:18 ` Nick Piggin
0 siblings, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 2:18 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-mm
Rik van Riel wrote:
>On Wed, 4 Feb 2004, Andrew Morton wrote:
>
>
>>Nick Piggin <piggin@cyberone.com.au> wrote:
>>
>>
>>> > 2/5: vm-dont-rotate-active-list.patch
>>> > Nikita's patch to keep more page ordering info in the active list.
>>> > Also should improve system time due to less useless scanning
>>> > Helps swapping loads significantly.
>>>
>>>
>>It bugs me that this improvement is also applicable to 2.4. if it makes
>>the same improvement there, we're still behind.
>>
>>
>
>I suspect 2.4 won't see the gains from this, since active/inactive
>list location is hardly relevant for mapped pages there, due to the
>page table scanning algorithm.
>
>
>
Yeah you're right.
* Re: [PATCH 0/5] mm improvements
2004-02-05 2:13 ` Nick Piggin
@ 2004-02-05 14:03 ` Nikita Danilov
2004-02-05 15:11 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-05 14:03 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm
[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 1692 bytes --]
Nick Piggin writes:
>
>
> Nikita Danilov wrote:
>
> >Hugh Dickins writes:
> > > On Wed, 4 Feb 2004, Nikita Danilov wrote:
> > > > Hugh Dickins writes:
> > > > > If you go the writepage-while-mapped route (more general gotchas?
> > > > > I forget), you'll have to make an exception for shmem_writepage.
> > > >
> > > > May be one can just call try_to_unmap() from shmem_writepage()?
> > >
> > > That sounds much cleaner. But I've not yet found what tree your
> > > p12-dont-unmap-on-pageout.patch applies to, so cannot judge it.
> >
> >Whole
> >ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.02.04/
> >applies to the 2.6.2-rc2.
> >
> >I just updated p12-dont-unmap-on-pageout.patch in-place.
> >
> >
> >
>
> Sure, I can give this a try. It makes sense.
>
To my surprise I have just found that the patch at
ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.02.04/p10-trasnfer-dirty-on-refill.patch
[yes, I know there is a typo in the name]
improves performance quite measurably. It implements a suggestion
made in a comment in refill_inactive_zone():
	/*
	 * probably it would be useful to transfer dirty bit
	 * from pte to the @page here.
	 */
To do this, the page_is_dirty() function is used (the same one used by
dont-unmap-on-pageout.patch); it is implemented in
check-pte-dirty.patch.
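In other words, while refill_inactive_zone() takes pages off the
active list, the pte dirty bits are folded into the struct page,
roughly like this (a sketch based on the description above, not the
patch itself):

	/* for each page being moved off the active list: */
	if (page_is_dirty(page) > 0)	/* any pte dirty bit set? */
		set_page_dirty(page);	/* visible to ->writepages() now */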
I ran
$ time build.sh 10 11
(attached) and got the following elapsed times:
without patch: 3818.320, with patch: 3368.690 (an 11% improvement).
As I see it, early transfer of dirtiness to the struct page allows more
write-back to be done through ->writepages(), which is a much more
efficient path than single-page ->writepage.
Nikita.
[-- Attachment #2: build.sh --]
[-- Type: text/plain, Size: 726 bytes --]
#!/bin/bash
# Usage: build.sh <nr-trees> <parallelism>
# Clone, unpack and build <nr-trees> copies of a kernel tree in
# parallel, as a swapping / page-reclaim workload.
nr=$1
pl=$2
path=/usr/src/linux-2.5.59-mm6/
s=$(seq 1 $nr)
emit()
{
	echo $*
	xtermset -T "$*"	# show the current phase in the xterm title
}
emit Removing
rm -fr [0-9]* linux* 2>/dev/null
emit Copying
cp -r $path . 2>/dev/null
emit Cloning
for i in $s; do
	bk clone linux-2.5.59-mm6 $i >/dev/null 2>/dev/null &
done
wait
emit Unpacking
for i in $s; do
	cd $i
	bk -r get -q &
	cd ..
done
wait
emit Cleaning
for i in $s; do
	cd $i
	make mrproper >/dev/null 2>/dev/null &
	cd ..
done
wait
emit Building
for i in $s; do
	cd $i
	cp ../.config .
	yes | make oldconfig >/dev/null 2>/dev/null
	make -j$pl bzImage >/dev/null 2>/dev/null &
	cd ..
done
wait
emit done.
* Re: [PATCH 0/5] mm improvements
2004-02-05 14:03 ` Nikita Danilov
@ 2004-02-05 15:11 ` Nick Piggin
2004-02-05 15:15 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 15:11 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nikita Danilov wrote:
>To my surprise I have just found that
>
>ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.02.04/p10-trasnfer-dirty-on-refill.patch
>
>[yes, I know there is a typo in the name.]
>
>patch improves performance quite measurably. It implements a suggestion
>made in the comment in refill_inactive_zone():
>
> /*
> * probably it would be useful to transfer dirty bit
> * from pte to the @page here.
> */
>
>To do this page_is_dirty() function is used (the same one as used by
>dont-unmap-on-pageout.patch), which is implemented in
>check-pte-dirty.patch.
>
>I ran
>
>$ time build.sh 10 11
>
>(attached) and get following elapsed time:
>
>without patch: 3818.320, with patch: 3368.690 (11% improvement).
>
>
That looks nice. I promise I will test your new patches, but
can you tell me if I've misread this patch?
2004.02.04/p0f-check-pte-dirty.patch:
function page_is_dirty:
if not PageDirect, then for each pte:
+ pte_dirty = page_pte_is_dirty(page, pte_paddr);
+ if (pte_dirty != 0)
+ ret = pte_dirty;
Won't this leave ret in a random state? Should it be ret++?
Nick
* Re: [PATCH 0/5] mm improvements
2004-02-05 15:11 ` Nick Piggin
@ 2004-02-05 15:15 ` Nick Piggin
2004-02-05 15:20 ` Nikita Danilov
0 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 15:15 UTC (permalink / raw)
To: Nick Piggin; +Cc: Nikita Danilov, Hugh Dickins, Andrew Morton, linux-mm
Nick Piggin wrote:
>
> + pte_dirty = page_pte_is_dirty(page, pte_paddr);
> + if (pte_dirty != 0)
> + ret = pte_dirty;
>
>
> Won't this leave ret in a random state? Should it be ret++?
>
... must be past my bedtime.
* Re: [PATCH 0/5] mm improvements
2004-02-05 15:15 ` Nick Piggin
@ 2004-02-05 15:20 ` Nikita Danilov
2004-02-05 15:33 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-05 15:20 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nick Piggin writes:
>
>
> Nick Piggin wrote:
>
> >
> > + pte_dirty = page_pte_is_dirty(page, pte_paddr);
> > + if (pte_dirty != 0)
> > + ret = pte_dirty;
> >
> >
> > Won't this leave ret in a random state? Should it be ret++?
> >
>
> ... must be past my bedtime.
>
Well, @ret handling is kind of obscure in this function, right.
Nikita.
* Re: [PATCH 0/5] mm improvements
2004-02-05 15:20 ` Nikita Danilov
@ 2004-02-05 15:33 ` Nick Piggin
2004-02-05 15:46 ` Nikita Danilov
0 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 15:33 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nikita Danilov wrote:
>Nick Piggin writes:
> >
> >
> > Nick Piggin wrote:
> >
> > >
> > > + pte_dirty = page_pte_is_dirty(page, pte_paddr);
> > > + if (pte_dirty != 0)
> > > + ret = pte_dirty;
> > >
> > >
> > > Won't this leave ret in a random state? Should it be ret++?
> > >
> >
> > ... must be past my bedtime.
> >
>
>Well, @ret handling is kind of obscure in this function, right.
>
>
Yeah... I think it actually could be:
pte_dirty = page_pte_is_dirty(page, pte_paddr);
if (pte_dirty < 0) {
ret = pte_dirty;
goto out;
}
ret += pte_dirty; /* if (pte_dirty) ret++; */
...
out:
return ret;
Right?
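Spelled out in full, the control flow being proposed would look
something like this (a schematic reconstruction only: PageDirect, the
pte_chain layout and page_pte_is_dirty() returning < 0 on failure are
read off the quoted fragments, not copied from the real patch):

	static int page_is_dirty(struct page *page)
	{
		int ret = 0;
		struct pte_chain *pc;

		if (PageDirect(page))
			return page_pte_is_dirty(page, page->pte.direct);

		for (pc = page->pte.chain; pc != NULL; pc = pc->next) {
			int pte_dirty = page_pte_is_dirty(page, pc->pte_paddr);

			if (pte_dirty < 0) {
				ret = pte_dirty;	/* propagate the error */
				goto out;
			}
			ret += pte_dirty;		/* count dirty ptes */
		}
	out:
		return ret;
	}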
* Re: [PATCH 0/5] mm improvements
2004-02-05 15:33 ` Nick Piggin
@ 2004-02-05 15:46 ` Nikita Danilov
2004-02-05 15:56 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-05 15:46 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nick Piggin writes:
>
>
> Nikita Danilov wrote:
>
> >Nick Piggin writes:
> > >
> > >
> > > Nick Piggin wrote:
> > >
> > > >
> > > > + pte_dirty = page_pte_is_dirty(page, pte_paddr);
> > > > + if (pte_dirty != 0)
> > > > + ret = pte_dirty;
> > > >
> > > >
> > > > Won't this leave ret in a random state? Should it be ret++?
> > > >
> > >
> > > ... must be past my bedtime.
> > >
> >
> >Well, @ret handling is kind of obscure in this function, right.
> >
> >
>
> Yeah... I think it actually could be:
>
> pte_dirty = page_pte_is_dirty(page, pte_paddr);
> if (pte_dirty < 0) {
> ret = pte_dirty;
> goto out;
> }
> ret += pte_dirty; /* if (pte_dirty) ret++; */
>
You mean so as to return the number of dirty ptes, rather than just +1?
This may be useful.
> ...
>
> out:
> return ret;
>
> Right?
>
Nikita.
* Re: [PATCH 0/5] mm improvements
2004-02-05 15:46 ` Nikita Danilov
@ 2004-02-05 15:56 ` Nick Piggin
2004-02-05 16:03 ` Nikita Danilov
0 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 15:56 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nikita Danilov wrote:
>Nick Piggin writes:
> >
> > Yeah... I think it actually could be:
> >
> > pte_dirty = page_pte_is_dirty(page, pte_paddr);
> > if (pte_dirty < 0) {
> > ret = pte_dirty;
> > goto out;
> > }
> > ret += pte_dirty; /* if (pte_dirty) ret++; */
> >
>
>You mean so as to return number of dirty pte's, rather than just +1?
>This may be useful.
>
That wasn't my immediate problem; the += there is just instead
of an 'if'. The main thing I'm worried about is that you don't
seem to be handling the error case correctly.
* Re: [PATCH 0/5] mm improvements
2004-02-05 15:56 ` Nick Piggin
@ 2004-02-05 16:03 ` Nikita Danilov
2004-02-05 16:09 ` Nick Piggin
0 siblings, 1 reply; 40+ messages in thread
From: Nikita Danilov @ 2004-02-05 16:03 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nick Piggin writes:
>
[...]
>
> That wasn't my immediate problem, but rather than an 'if'.
>
> The main thing I'm worried about is you seem to be not
> handling the error case correctly.
Take a look at the for-loop conditions.
>
Nikita.
* Re: [PATCH 0/5] mm improvements
2004-02-05 16:03 ` Nikita Danilov
@ 2004-02-05 16:09 ` Nick Piggin
0 siblings, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2004-02-05 16:09 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Andrew Morton, linux-mm
Nikita Danilov wrote:
>Nick Piggin writes:
> >
>
>[...]
>
> >
> > That wasn't my immediate problem, but rather than an 'if'.
> >
> > The main thing I'm worried about is you seem to be not
> > handling the error case correctly.
>
>Take a look at the for-loops conditions.
>
>
Bah, sorry for the noise. I'd prefer my version, with the
conditions removed from the for loops, but it doesn't matter.
* Re: [PATCH 1/5] mm improvements
2004-02-04 19:45 ` Rik van Riel
@ 2004-02-09 7:00 ` Nick Piggin
2004-02-09 21:56 ` Rik van Riel
0 siblings, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2004-02-09 7:00 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-mm
Rik van Riel wrote:
> On Wed, 4 Feb 2004, Nick Piggin wrote:
>
>> > 1/5: vm-no-rss-limit.patch
>> > Remove broken RSS limiting. Simple problem, Rik is onto it.
>> >
>
>
> Does the patch below fix the performance problem with the
> rss limit patch ?
>
>
Sorry I missed this, Rik. The rss-limit patch is now too old
to apply to the -mm tree because of one of my patches.
To fix this you need to be able to check the rss limit before
clearing referenced bits, and possibly not clear the referenced
bit at all.
It's obviously inefficient to check the ptes twice, so probably
just doing it once would be OK; you'd just need to do something
like:
if (referenced && dont_clear_referenced)
SetPageReferenced(page);
at the end.
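For instance (a sketch only: dont_clear_referenced and the
for_each_pte_of_page() walk are made-up stand-ins for whatever a
reworked rss-limit patch would actually use):

	int referenced = 0;

	for_each_pte_of_page(page, pte) {
		if (ptep_test_and_clear_young(pte))
			referenced++;
	}
	/*
	 * The young bits had to be cleared to count them; if the
	 * caller asked us not to lose that information, stash it
	 * in the page flag instead.
	 */
	if (referenced && dont_clear_referenced)
		SetPageReferenced(page);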
* Re: [PATCH 1/5] mm improvements
2004-02-09 7:00 ` Nick Piggin
@ 2004-02-09 21:56 ` Rik van Riel
0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2004-02-09 21:56 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-mm
On Mon, 9 Feb 2004, Nick Piggin wrote:
> Sorry I missed this, Rik. The rss-limit patch is now too old
> to apply to the -mm tree because of one of my patches.
No problem, I'll cook a new one.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Thread overview: 40+ messages
2004-02-04 9:39 [PATCH 0/5] mm improvements Nick Piggin
2004-02-04 9:40 ` [PATCH 1/5] " Nick Piggin
2004-02-04 19:45 ` Rik van Riel
2004-02-09 7:00 ` Nick Piggin
2004-02-09 21:56 ` Rik van Riel
2004-02-04 9:40 ` [PATCH 2/5] " Nick Piggin
2004-02-04 10:10 ` Andrew Morton
2004-02-04 10:15 ` Nick Piggin
2004-02-04 15:27 ` Rik van Riel
2004-02-05 2:18 ` Nick Piggin
2004-02-04 9:41 ` [PATCH 3/5] " Nick Piggin
2004-02-04 15:28 ` Rik van Riel
2004-02-04 16:45 ` Nikita Danilov
2004-02-04 18:53 ` Andrew Morton
2004-02-05 2:10 ` Nick Piggin
2004-02-04 9:42 ` [PATCH 4/5] " Nick Piggin
2004-02-04 10:11 ` Andrew Morton
2004-02-04 10:19 ` Nick Piggin
2004-02-04 9:42 ` [PATCH 5/5] " Nick Piggin
2004-02-04 10:03 ` Nick Piggin
2004-02-04 10:18 ` Andrew Morton
2004-02-04 10:22 ` Nick Piggin
2004-02-04 13:25 ` [PATCH 0/5] " Nikita Danilov
2004-02-04 13:53 ` Hugh Dickins
2004-02-04 14:03 ` Nikita Danilov
2004-02-04 15:03 ` Hugh Dickins
2004-02-04 15:19 ` Nikita Danilov
2004-02-05 2:13 ` Nick Piggin
2004-02-05 14:03 ` Nikita Danilov
2004-02-05 15:11 ` Nick Piggin
2004-02-05 15:15 ` Nick Piggin
2004-02-05 15:20 ` Nikita Danilov
2004-02-05 15:33 ` Nick Piggin
2004-02-05 15:46 ` Nikita Danilov
2004-02-05 15:56 ` Nick Piggin
2004-02-05 16:03 ` Nikita Danilov
2004-02-05 16:09 ` Nick Piggin
2004-02-04 18:33 ` Andrew Morton
2004-02-04 20:54 ` Hugh Dickins
2004-02-04 21:04 ` Andrew Morton