From: Andrew Morton <akpm@osdl.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: miquels@cistron.nl, linux-mm@kvack.org
Subject: Re: Keeping mmap'ed files in core regression in 2.6.7-rc
Date: Tue, 15 Jun 2004 21:23:36 -0700
Message-ID: <20040615212336.17d0a396.akpm@osdl.org>
In-Reply-To: <40CFC67D.6020205@yahoo.com.au>
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> >
> > shrink_zone() will free arbitrarily large amounts of memory as the scanning
> > priority increases. Probably it shouldn't.
> >
> >
>
> Especially for kswapd, I think, because it can end up fighting with
> memory allocators and thinking it is getting into trouble. It should
> probably just keep puttering along quietly.
>
> I have a few experimental patches that magnify this problem, so I'll
> be looking at fixing it soon. The tricky part will be trying to
> maintain a similar prev_priority / temp_priority balance.
hm, I don't see why. Why not simply bail from shrink_list() as soon as
we've reclaimed SWAP_CLUSTER_MAX pages?
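Something like this, as a rough userspace sketch (not against any tree;
the 10-pages-per-pass stand-in and the loop shape are only for
illustration, not the real 2.6.7 code):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32

/* Stand-in for one shrink_list() pass; pretend it reclaims 10 pages. */
static int shrink_list_pass(void)
{
	return 10;
}

int main(void)
{
	int reclaimed = 0;
	int passes = 0;

	/* Bail as soon as a full cluster's worth has been reclaimed. */
	while (reclaimed < SWAP_CLUSTER_MAX) {
		reclaimed += shrink_list_pass();
		passes++;
	}
	printf("bailed after %d passes, %d pages reclaimed\n",
			passes, reclaimed);
	return 0;
}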
I got bored of shrink_zone() bugs and rewrote it again yesterday. Haven't
tested it much. I really hate struct scan_control btw ;)
We've been futzing with the scan rates of the inactive and active lists far
too much, and it's still not right (Anton reports interrupt-off times of over
a second).
- We have this logic in there from 2.4.early (at least) which tries to keep
the inactive list 1/3rd the size of the active list. Or something.
I really cannot see any logic behind this, so toss it out and change the
arithmetic in there so that all pages on both lists have equal scan
rates (see the sketch after this list).
- Chunk the work up so we never hold interrupts off for more than 32 pages'
worth of scanning.
- Make the per-zone scan-count accumulators unsigned long rather than
atomic_t.
Mainly because atomic_t's could conceivably overflow, but also because
access to these counters is racy-by-design anyway.
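To make the arithmetic concrete, here is a throwaway userspace
calculation (the zone sizes below are invented); it shows that scanning
each list at size >> priority touches the same fraction of both lists
per pass, whatever the active/inactive ratio:

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32

int main(void)
{
	/* Hypothetical zone; sizes chosen only for illustration. */
	unsigned long nr_active = 12000;
	unsigned long nr_inactive = 3000;
	int priority;

	for (priority = 12; priority >= 8; priority--) {
		/* Each list is scanned at size >> priority (plus one). */
		unsigned long scan_active = (nr_active >> priority) + 1;
		unsigned long scan_inactive = (nr_inactive >> priority) + 1;

		printf("prio %2d: scan %4lu active + %4lu inactive, "
				"in chunks of at most %d pages\n",
				priority, scan_active, scan_inactive,
				SWAP_CLUSTER_MAX);
	}
	return 0;
}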
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
25-akpm/include/linux/mmzone.h | 4 +-
25-akpm/mm/page_alloc.c | 4 +-
25-akpm/mm/vmscan.c | 70 ++++++++++++++++++-----------------------
3 files changed, 35 insertions(+), 43 deletions(-)
diff -puN mm/vmscan.c~vmscan-scan-sanity mm/vmscan.c
--- 25/mm/vmscan.c~vmscan-scan-sanity 2004-06-15 02:19:01.485627112 -0700
+++ 25-akpm/mm/vmscan.c 2004-06-15 02:49:29.317754392 -0700
@@ -789,54 +789,46 @@ refill_inactive_zone(struct zone *zone,
}
/*
- * Scan `nr_pages' from this zone. Returns the number of reclaimed pages.
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
static void
shrink_zone(struct zone *zone, struct scan_control *sc)
{
- unsigned long scan_active, scan_inactive;
- int count;
-
- scan_inactive = (zone->nr_active + zone->nr_inactive) >> sc->priority;
+ unsigned long nr_active;
+ unsigned long nr_inactive;
/*
- * Try to keep the active list 2/3 of the size of the cache. And
- * make sure that refill_inactive is given a decent number of pages.
- *
- * The "scan_active + 1" here is important. With pagecache-intensive
- * workloads the inactive list is huge, and `ratio' evaluates to zero
- * all the time. Which pins the active list memory. So we add one to
- * `scan_active' just to make sure that the kernel will slowly sift
- * through the active list.
+ * Add one to `nr_to_scan' just to make sure that the kernel will
+ * slowly sift through the active list.
*/
- if (zone->nr_active >= 4*(zone->nr_inactive*2 + 1)) {
- /* Don't scan more than 4 times the inactive list scan size */
- scan_active = 4*scan_inactive;
- } else {
- unsigned long long tmp;
-
- /* Cast to long long so the multiply doesn't overflow */
-
- tmp = (unsigned long long)scan_inactive * zone->nr_active;
- do_div(tmp, zone->nr_inactive*2 + 1);
- scan_active = (unsigned long)tmp;
- }
-
- atomic_add(scan_active + 1, &zone->nr_scan_active);
- count = atomic_read(&zone->nr_scan_active);
- if (count >= SWAP_CLUSTER_MAX) {
- atomic_set(&zone->nr_scan_active, 0);
- sc->nr_to_scan = count;
- refill_inactive_zone(zone, sc);
- }
+ zone->nr_scan_active += (zone->nr_active >> sc->priority) + 1;
+ nr_active = zone->nr_scan_active;
+ if (nr_active >= SWAP_CLUSTER_MAX)
+ zone->nr_scan_active = 0;
+ else
+ nr_active = 0;
+
+ zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
+ nr_inactive = zone->nr_scan_inactive;
+ if (nr_inactive >= SWAP_CLUSTER_MAX)
+ zone->nr_scan_inactive = 0;
+ else
+ nr_inactive = 0;
+
+ while (nr_active || nr_inactive) {
+ if (nr_active) {
+ sc->nr_to_scan = min(nr_active,
+ (unsigned long)SWAP_CLUSTER_MAX);
+ nr_active -= sc->nr_to_scan;
+ refill_inactive_zone(zone, sc);
+ }
 
- atomic_add(scan_inactive, &zone->nr_scan_inactive);
- count = atomic_read(&zone->nr_scan_inactive);
- if (count >= SWAP_CLUSTER_MAX) {
- atomic_set(&zone->nr_scan_inactive, 0);
- sc->nr_to_scan = count;
- shrink_cache(zone, sc);
+ if (nr_inactive) {
+ sc->nr_to_scan = min(nr_inactive,
+ (unsigned long)SWAP_CLUSTER_MAX);
+ nr_inactive -= sc->nr_to_scan;
+ shrink_cache(zone, sc);
+ }
}
}
diff -puN include/linux/mmzone.h~vmscan-scan-sanity include/linux/mmzone.h
--- 25/include/linux/mmzone.h~vmscan-scan-sanity 2004-06-15 02:49:35.705783264 -0700
+++ 25-akpm/include/linux/mmzone.h 2004-06-15 02:49:48.283871104 -0700
@@ -118,8 +118,8 @@ struct zone {
spinlock_t lru_lock;
struct list_head active_list;
struct list_head inactive_list;
- atomic_t nr_scan_active;
- atomic_t nr_scan_inactive;
+ unsigned long nr_scan_active;
+ unsigned long nr_scan_inactive;
unsigned long nr_active;
unsigned long nr_inactive;
int all_unreclaimable; /* All pages pinned */
diff -puN mm/page_alloc.c~vmscan-scan-sanity mm/page_alloc.c
--- 25/mm/page_alloc.c~vmscan-scan-sanity 2004-06-15 02:50:04.404420408 -0700
+++ 25-akpm/mm/page_alloc.c 2004-06-15 02:50:53.752918296 -0700
@@ -1482,8 +1482,8 @@ static void __init free_area_init_core(s
zone_names[j], realsize, batch);
INIT_LIST_HEAD(&zone->active_list);
INIT_LIST_HEAD(&zone->inactive_list);
- atomic_set(&zone->nr_scan_active, 0);
- atomic_set(&zone->nr_scan_inactive, 0);
+ zone->nr_scan_active = 0;
+ zone->nr_scan_inactive = 0;
zone->nr_active = 0;
zone->nr_inactive = 0;
if (!size)
_
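For anyone who wants to poke at the new accumulator behaviour outside
the kernel, here is a rough userspace re-creation (the zone size,
priority and pass count are all arbitrary); it shows that work is only
ever issued in batches of at most SWAP_CLUSTER_MAX pages:

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32
#define min(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
	/* Invented zone: 100 inactive pages, scanned at priority 12. */
	unsigned long nr_inactive = 100;
	unsigned long nr_scan_inactive = 0;	/* per-zone accumulator */
	int pass;

	for (pass = 1; pass <= 100; pass++) {
		unsigned long nr;

		/* Accumulate a little work each pass, as the patch does. */
		nr_scan_inactive += (nr_inactive >> 12) + 1;
		if (nr_scan_inactive < SWAP_CLUSTER_MAX)
			continue;
		nr = nr_scan_inactive;
		nr_scan_inactive = 0;

		/* Drain the accumulated work in bounded batches. */
		while (nr) {
			unsigned long batch = min(nr,
					(unsigned long)SWAP_CLUSTER_MAX);
			nr -= batch;
			printf("pass %3d: batch of %lu pages\n", pass, batch);
		}
	}
	return 0;
}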