* [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list
2010-08-31 17:37 [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V3 Mel Gorman
@ 2010-08-31 17:37 ` Mel Gorman
2010-08-31 18:17 ` Christoph Lameter
2010-08-31 23:27 ` KOSAKI Motohiro
2010-08-31 17:37 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
2010-08-31 17:37 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2 siblings, 2 replies; 21+ messages in thread
From: Mel Gorman @ 2010-08-31 17:37 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux Kernel List, linux-mm, Rik van Riel, Johannes Weiner,
Minchan Kim, Christoph Lameter, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Mel Gorman
When allocating a page, the system uses NR_FREE_PAGES counters to determine
if watermarks would remain intact after the allocation was made. This
check is made without interrupts disabled or the zone lock held and so is
race-prone by nature. Unfortunately, when pages are being freed in batch,
the counters are updated before the pages are added on the list. During this
window, the counters are misleading as the pages do not exist yet. When
under significant pressure on systems with large numbers of CPUs, it's
possible for processes to make progress even though they should have been
stalled. This is particularly problematic if a number of the processes are
using GFP_ATOMIC as the min watermark can be accidentally breached and in
extreme cases, the system can livelock.
This patch updates the counters after the pages have been added to the
list. This makes the allocator more cautious with respect to preserving
the watermarks and mitigates livelock possibilities.
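
To illustrate the ordering issue described above, here is a minimal user-space sketch. It is not kernel code: struct toy_zone, observe() and both free_bulk_* helpers are hypothetical names, and a single-threaded model can only record what a lockless reader could have seen at each step, not reproduce the race itself.

```c
#include <assert.h>

/* Hypothetical toy zone: pages really on the free list vs. the counter. */
struct toy_zone {
	long on_list;		/* pages actually linked on the free list */
	long nr_free;		/* value a lockless watermark check would read */
	long worst_excess;	/* largest (nr_free - on_list) ever observable */
};

/* Record how far the counter runs ahead of the list at this instant. */
static void observe(struct toy_zone *z)
{
	long excess = z->nr_free - z->on_list;

	if (excess > z->worst_excess)
		z->worst_excess = excess;
}

/* Old order: bump the counter first, then link the pages one by one. */
static void free_bulk_counter_first(struct toy_zone *z, int count)
{
	assert(count >= 0);
	z->nr_free += count;
	observe(z);
	while (count--) {
		z->on_list++;
		observe(z);
	}
}

/* Patched order: link the pages first, bump the counter last. */
static void free_bulk_counter_last(struct toy_zone *z, int count)
{
	int freed = count;

	assert(count >= 0);
	while (count--) {
		z->on_list++;
		observe(z);
	}
	z->nr_free += freed;
	observe(z);
}
```

In this model the counter-first order lets the counter overstate the list by the full batch size, while the counter-last order never overstates it; the counter can only lag behind, which errs on the cautious side.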
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/page_alloc.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a9649f4..97d74a0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -588,12 +588,12 @@ static void free_pcppages_bulk(struct zone *zone, int count,
{
int migratetype = 0;
int batch_free = 0;
+ int freed = count;
spin_lock(&zone->lock);
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
- __mod_zone_page_state(zone, NR_FREE_PAGES, count);
while (count) {
struct page *page;
struct list_head *list;
@@ -621,6 +621,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
trace_mm_page_pcpu_drain(page, 0, page_private(page));
} while (--count && --batch_free && !list_empty(list));
}
+ __mod_zone_page_state(zone, NR_FREE_PAGES, freed);
spin_unlock(&zone->lock);
}
@@ -631,8 +632,8 @@ static void free_one_page(struct zone *zone, struct page *page, int order,
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
- __mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
__free_one_page(page, zone, order, migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
spin_unlock(&zone->lock);
}
--
1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list
2010-08-31 17:37 ` [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list Mel Gorman
@ 2010-08-31 18:17 ` Christoph Lameter
2010-09-01 7:10 ` Mel Gorman
2010-08-31 23:27 ` KOSAKI Motohiro
1 sibling, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-08-31 18:17 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro
I already did a
Reviewed-by: Christoph Lameter <cl@linux.com>
I believe?
* Re: [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list
2010-08-31 18:17 ` Christoph Lameter
@ 2010-09-01 7:10 ` Mel Gorman
0 siblings, 0 replies; 21+ messages in thread
From: Mel Gorman @ 2010-09-01 7:10 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro
On Tue, Aug 31, 2010 at 01:17:44PM -0500, Christoph Lameter wrote:
>
> I already did a
>
> Reviewed-by: Christoph Lameter <cl@linux.com>
>
> I believe?
>
You did and I omitted it. It's included now. Thanks
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list
2010-08-31 17:37 ` [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list Mel Gorman
2010-08-31 18:17 ` Christoph Lameter
@ 2010-08-31 23:27 ` KOSAKI Motohiro
1 sibling, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-08-31 23:27 UTC (permalink / raw)
To: Mel Gorman
Cc: kosaki.motohiro, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, Christoph Lameter,
KAMEZAWA Hiroyuki
> When allocating a page, the system uses NR_FREE_PAGES counters to determine
> if watermarks would remain intact after the allocation was made. This
> check is made without interrupts disabled or the zone lock held and so is
> race-prone by nature. Unfortunately, when pages are being freed in batch,
> the counters are updated before the pages are added on the list. During this
> window, the counters are misleading as the pages do not exist yet. When
> under significant pressure on systems with large numbers of CPUs, it's
> possible for processes to make progress even though they should have been
> stalled. This is particularly problematic if a number of the processes are
> using GFP_ATOMIC as the min watermark can be accidentally breached and in
> extreme cases, the system can livelock.
>
> This patch updates the counters after the pages have been added to the
> list. This makes the allocator more cautious with respect to preserving
> the watermarks and mitigates livelock possibilities.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-08-31 17:37 [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V3 Mel Gorman
2010-08-31 17:37 ` [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list Mel Gorman
@ 2010-08-31 17:37 ` Mel Gorman
2010-08-31 18:20 ` Christoph Lameter
` (2 more replies)
2010-08-31 17:37 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2 siblings, 3 replies; 21+ messages in thread
From: Mel Gorman @ 2010-08-31 17:37 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux Kernel List, linux-mm, Rik van Riel, Johannes Weiner,
Minchan Kim, Christoph Lameter, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Mel Gorman
Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as
it is cheaper than scanning a number of lists. To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained both
periodically and when the delta is above a threshold. On large CPU systems,
the difference between the estimated and real value of NR_FREE_PAGES can be
very high. If NR_FREE_PAGES is much higher than the number of real free pages
in buddy, the VM can allocate pages below min watermark, at worst reducing
the real number of pages to zero. Even if the OOM killer kills some victim
for freeing memory, it may not free memory if the exit path requires a new
page resulting in livelock.
This patch introduces zone_nr_free_pages() to take a slightly more accurate
estimate of NR_FREE_PAGES while kswapd is awake. The estimate is not perfect
and may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.
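
As a rough illustration of the drift described above, the following user-space sketch models a counter whose updates are batched per CPU and folded into the global value only once a threshold is crossed. All names (struct toy_counter, toy_mod(), toy_read_cheap(), toy_read_exact(), TOY_NCPUS) are hypothetical, and the model ignores concurrency entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NCPUS 4	/* hypothetical CPU count for the model */

struct toy_counter {
	long global;		/* what the cheap read returns */
	long diff[TOY_NCPUS];	/* per-CPU deltas not yet folded in */
	long threshold;		/* fold once |diff| exceeds this */
};

/* Apply a delta on one CPU, folding into the global value past the threshold. */
static void toy_mod(struct toy_counter *c, size_t cpu, long delta)
{
	long x;

	assert(cpu < TOY_NCPUS);
	x = c->diff[cpu] + delta;
	if (x > c->threshold || x < -c->threshold) {
		c->global += x;
		x = 0;
	}
	c->diff[cpu] = x;
}

/* Cheap read: ignores whatever is still parked in the per-CPU deltas. */
static long toy_read_cheap(const struct toy_counter *c)
{
	return c->global;
}

/* Costlier read: walks every CPU's delta, as the proposed slow path does. */
static long toy_read_exact(const struct toy_counter *c)
{
	long sum = c->global;
	size_t i;

	for (i = 0; i < TOY_NCPUS; i++)
		sum += c->diff[i];
	return sum;
}
```

With a threshold of 32 and four CPUs each taking 30 pages, the cheap read still reports the starting value while the exact read is 120 pages lower. The gap grows with the number of CPUs, which is the situation the patch targets.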
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/mmzone.h | 13 +++++++++++++
mm/mmzone.c | 29 +++++++++++++++++++++++++++++
mm/page_alloc.c | 4 ++--
mm/vmstat.c | 15 ++++++++++++++-
4 files changed, 58 insertions(+), 3 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e6e626..3984c4e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -284,6 +284,13 @@ struct zone {
unsigned long watermark[NR_WMARK];
/*
+ * When free pages are below this point, additional steps are taken
+ * when reading the number of free pages to avoid per-cpu counter
+ * drift allowing watermarks to be breached
+ */
+ unsigned long percpu_drift_mark;
+
+ /*
* We don't know if the memory that we're going to allocate will be freeable
* or/and it will be released eventually, so to avoid totally wasting several
* GB of ram we must reserve some of the lower zone memory (otherwise we risk
@@ -441,6 +448,12 @@ static inline int zone_is_oom_locked(const struct zone *zone)
return test_bit(ZONE_OOM_LOCKED, &zone->flags);
}
+#ifdef CONFIG_SMP
+unsigned long zone_nr_free_pages(struct zone *zone);
+#else
+#define zone_nr_free_pages(zone) zone_page_state(zone, NR_FREE_PAGES)
+#endif /* CONFIG_SMP */
+
/*
* The "priority" of VM scanning is how much of the queues we will scan in one
* go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
diff --git a/mm/mmzone.c b/mm/mmzone.c
index f5b7d17..69ecbe9 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -87,3 +87,32 @@ int memmap_valid_within(unsigned long pfn,
return 1;
}
#endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
+
+#ifdef CONFIG_SMP
+/* Called when a more accurate view of NR_FREE_PAGES is needed */
+unsigned long zone_nr_free_pages(struct zone *zone)
+{
+ unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
+
+ /*
+ * While kswapd is awake, it is considered the zone is under some
+ * memory pressure. Under pressure, there is a risk that
+ * per-cpu-counter-drift will allow the min watermark to be breached
+ * potentially causing a live-lock. While kswapd is awake and
+ * free pages are low, get a better estimate for free pages
+ */
+ if (nr_free_pages < zone->percpu_drift_mark &&
+ !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ struct per_cpu_pageset *pset;
+
+ pset = per_cpu_ptr(zone->pageset, cpu);
+ nr_free_pages += pset->vm_stat_diff[NR_FREE_PAGES];
+ }
+ }
+
+ return nr_free_pages;
+}
+#endif /* CONFIG_SMP */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 97d74a0..bbaa959 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1462,7 +1462,7 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
{
/* free_pages my go negative - that's OK */
long min = mark;
- long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
+ long free_pages = zone_nr_free_pages(z) - (1 << order) + 1;
int o;
if (alloc_flags & ALLOC_HIGH)
@@ -2424,7 +2424,7 @@ void show_free_areas(void)
" all_unreclaimable? %s"
"\n",
zone->name,
- K(zone_page_state(zone, NR_FREE_PAGES)),
+ K(zone_nr_free_pages(zone)),
K(min_wmark_pages(zone)),
K(low_wmark_pages(zone)),
K(high_wmark_pages(zone)),
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f389168..696cab2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -138,11 +138,24 @@ static void refresh_zone_stat_thresholds(void)
int threshold;
for_each_populated_zone(zone) {
+ unsigned long max_drift, tolerate_drift;
+
threshold = calculate_threshold(zone);
for_each_online_cpu(cpu)
per_cpu_ptr(zone->pageset, cpu)->stat_threshold
= threshold;
+
+ /*
+ * Only set percpu_drift_mark if there is a danger that
+ * NR_FREE_PAGES reports the low watermark is ok when in fact
+ * the min watermark could be breached by an allocation
+ */
+ tolerate_drift = low_wmark_pages(zone) - min_wmark_pages(zone);
+ max_drift = num_online_cpus() * threshold;
+ if (max_drift > tolerate_drift)
+ zone->percpu_drift_mark = high_wmark_pages(zone) +
+ max_drift;
}
}
@@ -813,7 +826,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
"\n scanned %lu"
"\n spanned %lu"
"\n present %lu",
- zone_page_state(zone, NR_FREE_PAGES),
+ zone_nr_free_pages(zone),
min_wmark_pages(zone),
low_wmark_pages(zone),
high_wmark_pages(zone),
--
1.7.1
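
The arithmetic in the mm/vmstat.c hunk above can be tried out in isolation with a small sketch. toy_drift_mark() is a hypothetical helper, the watermark and threshold values used with it are made up, and returning 0 for "no mark needed" is this sketch's own convention (an unsigned free-page count is never below 0, so a mark of 0 never triggers the slow path).

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the calculation added to
 * refresh_zone_stat_thresholds(): if the worst-case per-cpu drift exceeds
 * the gap between the low and min watermarks, place the mark at the high
 * watermark plus that drift. Otherwise return 0 (no mark).
 */
static unsigned long toy_drift_mark(unsigned long min_wmark,
				    unsigned long low_wmark,
				    unsigned long high_wmark,
				    unsigned int online_cpus,
				    unsigned long threshold)
{
	unsigned long tolerate_drift, max_drift;

	assert(low_wmark >= min_wmark);
	tolerate_drift = low_wmark - min_wmark;
	max_drift = (unsigned long)online_cpus * threshold;

	if (max_drift > tolerate_drift)
		return high_wmark + max_drift;
	return 0;
}
```

For example, with watermarks of 100/125/150 pages, 64 CPUs and a threshold of 125, the worst-case drift is 8000 pages and the mark lands at 8150. With 2 CPUs and a threshold of 10, the drift of 20 fits inside the 25-page gap and no mark is set.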
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-08-31 17:37 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
@ 2010-08-31 18:20 ` Christoph Lameter
2010-08-31 23:37 ` KOSAKI Motohiro
2010-09-02 0:43 ` Christoph Lameter
2 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-08-31 18:20 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro
Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-08-31 17:37 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
2010-08-31 18:20 ` Christoph Lameter
@ 2010-08-31 23:37 ` KOSAKI Motohiro
2010-09-01 7:24 ` Mel Gorman
2010-09-02 0:43 ` Christoph Lameter
2 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-08-31 23:37 UTC (permalink / raw)
To: Mel Gorman
Cc: kosaki.motohiro, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, Christoph Lameter,
KAMEZAWA Hiroyuki
> +#ifdef CONFIG_SMP
> +/* Called when a more accurate view of NR_FREE_PAGES is needed */
> +unsigned long zone_nr_free_pages(struct zone *zone)
> +{
> + unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> +
> + /*
> + * While kswapd is awake, it is considered the zone is under some
> + * memory pressure. Under pressure, there is a risk that
> + * per-cpu-counter-drift will allow the min watermark to be breached
> + * potentially causing a live-lock. While kswapd is awake and
> + * free pages are low, get a better estimate for free pages
> + */
> + if (nr_free_pages < zone->percpu_drift_mark &&
> + !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + struct per_cpu_pageset *pset;
> +
> + pset = per_cpu_ptr(zone->pageset, cpu);
> + nr_free_pages += pset->vm_stat_diff[NR_FREE_PAGES];
If my understanding is correct, we hold no lock when reading pset->vm_stat_diff.
That means nr_free_pages can go negative in a very rare race. Is a boundary
check necessary?
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-08-31 23:37 ` KOSAKI Motohiro
@ 2010-09-01 7:24 ` Mel Gorman
2010-09-01 7:33 ` KOSAKI Motohiro
0 siblings, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2010-09-01 7:24 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, Christoph Lameter,
KAMEZAWA Hiroyuki
On Wed, Sep 01, 2010 at 08:37:41AM +0900, KOSAKI Motohiro wrote:
> > +#ifdef CONFIG_SMP
> > +/* Called when a more accurate view of NR_FREE_PAGES is needed */
> > +unsigned long zone_nr_free_pages(struct zone *zone)
> > +{
> > + unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> > +
> > + /*
> > + * While kswapd is awake, it is considered the zone is under some
> > + * memory pressure. Under pressure, there is a risk that
> > + * per-cpu-counter-drift will allow the min watermark to be breached
> > + * potentially causing a live-lock. While kswapd is awake and
> > + * free pages are low, get a better estimate for free pages
> > + */
> > + if (nr_free_pages < zone->percpu_drift_mark &&
> > + !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
> > + int cpu;
> > +
> > + for_each_online_cpu(cpu) {
> > + struct per_cpu_pageset *pset;
> > +
> > + pset = per_cpu_ptr(zone->pageset, cpu);
> > + nr_free_pages += pset->vm_stat_diff[NR_FREE_PAGES];
>
> If my understanding is correct, we hold no lock when reading pset->vm_stat_diff.
> That means nr_free_pages can go negative in a very rare race. Is a boundary
> check necessary?
>
True, well spotted.
How about the following? It records a delta and checks if delta is negative
and would cause underflow.
unsigned long zone_nr_free_pages(struct zone *zone)
{
	unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
	long delta = 0;

	/*
	 * While kswapd is awake, it is considered the zone is under some
	 * memory pressure. Under pressure, there is a risk that
	 * per-cpu-counter-drift will allow the min watermark to be breached
	 * potentially causing a live-lock. While kswapd is awake and
	 * free pages are low, get a better estimate for free pages
	 */
	if (nr_free_pages < zone->percpu_drift_mark &&
			!waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
		int cpu;

		for_each_online_cpu(cpu) {
			struct per_cpu_pageset *pset;

			pset = per_cpu_ptr(zone->pageset, cpu);
			delta += pset->vm_stat_diff[NR_FREE_PAGES];
		}
	}

	/* Watch for underflow */
	if (delta < 0 && abs(delta) > nr_free_pages)
		delta = -nr_free_pages;

	return nr_free_pages + delta;
}
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-01 7:24 ` Mel Gorman
@ 2010-09-01 7:33 ` KOSAKI Motohiro
2010-09-01 20:16 ` Christoph Lameter
0 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-09-01 7:33 UTC (permalink / raw)
To: Mel Gorman
Cc: kosaki.motohiro, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, Christoph Lameter,
KAMEZAWA Hiroyuki
> On Wed, Sep 01, 2010 at 08:37:41AM +0900, KOSAKI Motohiro wrote:
> > > +#ifdef CONFIG_SMP
> > > +/* Called when a more accurate view of NR_FREE_PAGES is needed */
> > > +unsigned long zone_nr_free_pages(struct zone *zone)
> > > +{
> > > + unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> > > +
> > > + /*
> > > + * While kswapd is awake, it is considered the zone is under some
> > > + * memory pressure. Under pressure, there is a risk that
> > > + * per-cpu-counter-drift will allow the min watermark to be breached
> > > + * potentially causing a live-lock. While kswapd is awake and
> > > + * free pages are low, get a better estimate for free pages
> > > + */
> > > + if (nr_free_pages < zone->percpu_drift_mark &&
> > > + !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
> > > + int cpu;
> > > +
> > > + for_each_online_cpu(cpu) {
> > > + struct per_cpu_pageset *pset;
> > > +
> > > + pset = per_cpu_ptr(zone->pageset, cpu);
> > > + nr_free_pages += pset->vm_stat_diff[NR_FREE_PAGES];
> >
> > If my understanding is correct, we hold no lock when reading pset->vm_stat_diff.
> > That means nr_free_pages can go negative in a very rare race. Is a boundary
> > check necessary?
> >
>
> True, well spotted.
>
> How about the following? It records a delta and checks if delta is negative
> and would cause underflow.
>
> unsigned long zone_nr_free_pages(struct zone *zone)
> {
> unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> long delta = 0;
>
> /*
> * While kswapd is awake, it is considered the zone is under some
> * memory pressure. Under pressure, there is a risk that
> * per-cpu-counter-drift will allow the min watermark to be breached
> * potentially causing a live-lock. While kswapd is awake and
> * free pages are low, get a better estimate for free pages
> */
> if (nr_free_pages < zone->percpu_drift_mark &&
> !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
> int cpu;
>
> for_each_online_cpu(cpu) {
> struct per_cpu_pageset *pset;
>
> pset = per_cpu_ptr(zone->pageset, cpu);
> delta += pset->vm_stat_diff[NR_FREE_PAGES];
> }
> }
>
> /* Watch for underflow */
> if (delta < 0 && abs(delta) > nr_free_pages)
> delta = -nr_free_pages;
>
> return nr_free_pages + delta;
> }
Looks good to me :)
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Thanks.
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-01 7:33 ` KOSAKI Motohiro
@ 2010-09-01 20:16 ` Christoph Lameter
2010-09-01 20:34 ` Mel Gorman
0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-09-01 20:16 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Mel Gorman, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki
On Wed, 1 Sep 2010, KOSAKI Motohiro wrote:
> > How about the following? It records a delta and checks if delta is negative
> > and would cause underflow.
> >
> > unsigned long zone_nr_free_pages(struct zone *zone)
> > {
> > unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> > long delta = 0;
> >
> > /*
> > * While kswapd is awake, it is considered the zone is under some
> > * memory pressure. Under pressure, there is a risk that
> > * per-cpu-counter-drift will allow the min watermark to be breached
> > * potentially causing a live-lock. While kswapd is awake and
> > * free pages are low, get a better estimate for free pages
> > */
> > if (nr_free_pages < zone->percpu_drift_mark &&
> > !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
> > int cpu;
> >
> > for_each_online_cpu(cpu) {
> > struct per_cpu_pageset *pset;
> >
> > pset = per_cpu_ptr(zone->pageset, cpu);
> > delta += pset->vm_stat_diff[NR_FREE_PAGES];
> > }
> > }
> >
> > /* Watch for underflow */
> > if (delta < 0 && abs(delta) > nr_free_pages)
> > delta = -nr_free_pages;
Not sure what the point here is. If the delta is going below zero then
there was a concurrent operation updating the counters negatively while
we summed up the counters. It is then safe to assume a value of zero. We
cannot really be more accurate than that.
so
	if (delta < 0)
		delta = 0;
would be correct. See also handling of counter underflow in
vmstat.h:zone_page_state(). As I have said before: I would rather have the
counter handling in one place to avoid creating differences in counter
handling.
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-01 20:16 ` Christoph Lameter
@ 2010-09-01 20:34 ` Mel Gorman
2010-09-02 0:24 ` Christoph Lameter
0 siblings, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2010-09-01 20:34 UTC (permalink / raw)
To: Christoph Lameter
Cc: KOSAKI Motohiro, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki
On Wed, Sep 01, 2010 at 03:16:59PM -0500, Christoph Lameter wrote:
> On Wed, 1 Sep 2010, KOSAKI Motohiro wrote:
>
> > > How about the following? It records a delta and checks if delta is negative
> > > and would cause underflow.
> > >
> > > unsigned long zone_nr_free_pages(struct zone *zone)
> > > {
> > > unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> > > long delta = 0;
> > >
> > > /*
> > > * While kswapd is awake, it is considered the zone is under some
> > > * memory pressure. Under pressure, there is a risk that
> > > * per-cpu-counter-drift will allow the min watermark to be breached
> > > * potentially causing a live-lock. While kswapd is awake and
> > > * free pages are low, get a better estimate for free pages
> > > */
> > > if (nr_free_pages < zone->percpu_drift_mark &&
> > > !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
> > > int cpu;
> > >
> > > for_each_online_cpu(cpu) {
> > > struct per_cpu_pageset *pset;
> > >
> > > pset = per_cpu_ptr(zone->pageset, cpu);
> > > delta += pset->vm_stat_diff[NR_FREE_PAGES];
> > > }
> > > }
> > >
> > > /* Watch for underflow */
> > > if (delta < 0 && abs(delta) > nr_free_pages)
> > > delta = -nr_free_pages;
>
> Not sure what the point here is. If the delta is going below zero then
> there was a concurrent operation updating the counters negatively while
> we summed up the counters.
The point is if the negative delta is greater than the current value of
nr_free_pages then nr_free_pages would underflow when delta is applied to it.
> It is then safe to assume a value of zero. We
> cannot really be more accurate than that.
>
> so
>
> if (delta < 0)
> delta = 0;
>
> would be correct.
Let's say the reading at the start for nr_free_pages is 120 and the delta is
-20, then the estimated true value of nr_free_pages is 100. If we used your
logic, the estimate would be 120. Maybe I'm missing what you're saying.
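
The 120 / -20 case can be checked with a small user-space sketch. Both helpers are hypothetical stand-ins: apply_zeroing_negative() models the quoted "if (delta < 0) delta = 0;" suggestion, and apply_clamped() models the underflow check from the proposed zone_nr_free_pages(), written with explicit casts so the signed/unsigned mixing is visible.

```c
#include <assert.h>
#include <limits.h>

/* Models "if (delta < 0) delta = 0;": every negative delta is discarded. */
static unsigned long apply_zeroing_negative(unsigned long nr_free, long delta)
{
	if (delta < 0)
		delta = 0;
	return nr_free + (unsigned long)delta;
}

/* Models the underflow check: clamp only when the sum would drop below zero. */
static unsigned long apply_clamped(unsigned long nr_free, long delta)
{
	assert(delta != LONG_MIN);	/* -delta below must not overflow */
	if (delta < 0 && (unsigned long)(-delta) > nr_free)
		delta = -(long)nr_free;
	/* Unsigned arithmetic is modular, so adding a negative delta subtracts. */
	return nr_free + (unsigned long)delta;
}
```

For a reading of 120 and a delta of -20, the first helper yields 120 and the second yields 100. When the delta is larger in magnitude than the reading (say 10 and -20), the second helper yields 0 instead of wrapping.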
> See also handling of counter underflow in
> vmstat.h:zone_page_state().
I'm not seeing the relation. zone_nr_free_pages() is trying to
reconcile the reading from zone_page_state() with the contents of
vm_stat_diff[].
> As I have said before: I would rather have the
> counter handling in one place to avoid creating differences in counter
> handling.
>
And I'd rather not hurt the paths for every counter unnecessarily
without good cause. I can move zone_nr_free_pages() to mm/vmstat.c if
you'd prefer?
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-01 20:34 ` Mel Gorman
@ 2010-09-02 0:24 ` Christoph Lameter
2010-09-02 0:26 ` KOSAKI Motohiro
0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-09-02 0:24 UTC (permalink / raw)
To: Mel Gorman
Cc: KOSAKI Motohiro, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki
On Wed, 1 Sep 2010, Mel Gorman wrote:
> > > > if (delta < 0 && abs(delta) > nr_free_pages)
> > > > delta = -nr_free_pages;
> >
> > Not sure what the point here is. If the delta is going below zero then
> > there was a concurrent operation updating the counters negatively while
> > we summed up the counters.
>
> The point is if the negative delta is greater than the current value of
> nr_free_pages then nr_free_pages would underflow when delta is applied to it.
Ok. then
	nr_free_pages += delta;
	if (nr_free_pages < 0)
		nr_free_pages = 0;
> > would be correct.
>
> Let's say the reading at the start for nr_free_pages is 120 and the delta is
> -20, then the estimated true value of nr_free_pages is 100. If we used your
> logic, the estimate would be 120. Maybe I'm missing what you're saying.
Well yes, the sum of the counter needs to be checked, not just the sum of
the deltas. This is the same as the counter determination in vmstat.h.
> > See also handling of counter underflow in
> > vmstat.h:zone_page_state().
>
> I'm not seeing the relation. zone_nr_free_pages() is trying to
> reconcile the reading from zone_page_state() with the contents of
> vm_stat_diff[].
Both are determinations of a counter value. The global or zone counters
can also temporarily go below zero due to deferred updates. If
this happens then 0 will be returned(!). zone_nr_free_pages() needs to work
in the same way.
> > As I have said before: I would rather have the
> > counter handling in one place to avoid creating differences in counter
> > handling.
> >
>
> And I'd rather not hurt the paths for every counter unnecessarily
> without good cause. I can move zone_nr_free_pages() to mm/vmstat.c if
> you'd prefer?
Generalize it on the way please to work with any counter?
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-02 0:24 ` Christoph Lameter
@ 2010-09-02 0:26 ` KOSAKI Motohiro
2010-09-02 0:39 ` Christoph Lameter
0 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-09-02 0:26 UTC (permalink / raw)
To: Christoph Lameter
Cc: kosaki.motohiro, Mel Gorman, Andrew Morton, Linux Kernel List,
linux-mm, Rik van Riel, Johannes Weiner, Minchan Kim,
KAMEZAWA Hiroyuki
> On Wed, 1 Sep 2010, Mel Gorman wrote:
>
> > > > > if (delta < 0 && abs(delta) > nr_free_pages)
> > > > > delta = -nr_free_pages;
> > >
> > > Not sure what the point here is. If the delta is going below zero then
> > > there was a concurrent operation updating the counters negatively while
> > > we summed up the counters.
> >
> > The point is if the negative delta is greater than the current value of
> > nr_free_pages then nr_free_pages would underflow when delta is applied to it.
>
> Ok. then
>
> nr_free_pages += delta;
> if (nr_free_pages < 0)
> nr_free_pages = 0;
nr_free_pages is unsigned. This wouldn't work ;)
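To illustrate the point about unsigned arithmetic, here is a minimal userspace sketch (not kernel code). The helper names apply_delta_naive() and apply_delta_clamped() are hypothetical and do not appear in the patches; they only show why a "< 0" test on an unsigned value cannot catch the underflow, and one possible way to clamp instead.

```c
#include <assert.h>
#include <limits.h>

/*
 * Naive version: add the signed delta straight into the unsigned count.
 * A negative delta larger than the count wraps around to a huge value,
 * and a later "nr_free_pages < 0" test can never be true for an
 * unsigned type.
 */
static unsigned long apply_delta_naive(unsigned long nr_free_pages, long delta)
{
	nr_free_pages += (unsigned long)delta;
	return nr_free_pages;
}

/* Hypothetical alternative: compare magnitudes first, then clamp at zero. */
static unsigned long apply_delta_clamped(unsigned long nr_free_pages, long delta)
{
	unsigned long magnitude;

	if (delta >= 0)
		return nr_free_pages + (unsigned long)delta;

	magnitude = 0UL - (unsigned long)delta;	/* |delta|, also safe for LONG_MIN */
	if (magnitude > nr_free_pages)
		return 0;
	return nr_free_pages - magnitude;
}

/* Usage: 10 free pages with a pending delta of -20. */
void example_unsigned_clamp(void)
{
	assert(apply_delta_naive(10, -20) == ULONG_MAX - 9);	/* wrapped around */
	assert(apply_delta_clamped(10, -20) == 0);		/* clamped at zero */
	assert(apply_delta_clamped(10, -3) == 7);		/* ordinary case */
}
```

The same effect can be had by doing the arithmetic in a signed long and converting to unsigned only after clipping, which is the approach zone_page_state() takes.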
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-02 0:26 ` KOSAKI Motohiro
@ 2010-09-02 0:39 ` Christoph Lameter
2010-09-02 0:54 ` Christoph Lameter
0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-09-02 0:39 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Mel Gorman, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki
On Thu, 2 Sep 2010, KOSAKI Motohiro wrote:
> > nr_free_pages += delta;
> > if (nr_free_pages < 0)
> > nr_free_pages = 0;
>
> nr_free_pages is unsigned. This wouldn't work ;)
The VM counters are signed and must be signed; otherwise the deferred
update scheme would cause disasters. For treatment in the page allocator
these may be converted to unsigned.
The effect needs to be the same as retrieving a global or
zone ZVC counter, which is currently implemented in the following way:
static inline unsigned long zone_page_state(struct zone *zone,
					enum zone_stat_item item)
{
	long x = atomic_long_read(&zone->vm_stat[item]);
#ifdef CONFIG_SMP
	if (x < 0)
		x = 0;
#endif
	return x;
}
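As a rough illustration of why the shared counter can go negative under the deferred update scheme, here is a single-threaded userspace model. It is only a sketch: struct zone_model, model_mod_state(), model_page_state() and the threshold of 32 are assumptions made for the example, not the kernel's definitions.

```c
#include <assert.h>

#define MODEL_NR_CPUS	2
#define MODEL_THRESHOLD	32	/* hypothetical per-CPU fold threshold */

struct zone_model {
	long vm_stat;				/* shared counter, signed */
	long vm_stat_diff[MODEL_NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/* Accumulate locally; fold into the shared counter once past the threshold. */
static void model_mod_state(struct zone_model *zone, int cpu, long delta)
{
	long x = zone->vm_stat_diff[cpu] + delta;

	if (x > MODEL_THRESHOLD || x < -MODEL_THRESHOLD) {
		zone->vm_stat += x;
		x = 0;
	}
	zone->vm_stat_diff[cpu] = x;
}

/* Same clipping rule as the zone_page_state() shown above. */
static unsigned long model_page_state(const struct zone_model *zone)
{
	long x = zone->vm_stat;

	if (x < 0)
		x = 0;
	return (unsigned long)x;
}

void example_deferred_update(void)
{
	struct zone_model zone = { .vm_stat = 30 };

	model_mod_state(&zone, 0, 20);	/* CPU 0 frees 20 pages: stays local */
	model_mod_state(&zone, 1, -40);	/* CPU 1 takes 40 pages: folded at once */

	assert(zone.vm_stat == -10);		/* shared counter is negative */
	assert(zone.vm_stat_diff[0] == 20);	/* +20 is still pending on CPU 0 */
	assert(model_page_state(&zone) == 0);	/* the reader sees 0, not -10 */
}
```

In this model the real total is 30 + 20 - 40 = 10, yet the shared counter alone reads -10. With an unsigned counter the same sequence would wrap to a very large value instead of a small negative one.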
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-02 0:39 ` Christoph Lameter
@ 2010-09-02 0:54 ` Christoph Lameter
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-09-02 0:54 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Mel Gorman, Andrew Morton, Linux Kernel List, linux-mm,
Rik van Riel, Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki
On Wed, 1 Sep 2010, Christoph Lameter wrote:
> The effect needs to be the same as retrieving a global or
> zone ZVC counter. Which is currently implemented in the following way:
>
> static inline unsigned long zone_page_state(struct zone *zone,
> enum zone_stat_item item)
> {
> long x = atomic_long_read(&zone->vm_stat[item]);
> #ifdef CONFIG_SMP
> if (x < 0)
> x = 0;
> #endif
> return x;
> }
>
Here is a patch that defines a snapshot function that works in the same
way:
Subject: Add a snapshot function for vm statistics
Add a snapshot function that can more accurately determine
the current value of a zone counter.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h 2010-09-01 19:45:23.506071189 -0500
+++ linux-2.6/include/linux/vmstat.h 2010-09-01 19:53:02.978979081 -0500
@@ -170,6 +170,28 @@
 	return x;
 }
 
+/*
+ * More accurate version that also considers the currently pending
+ * deltas. For that we need to loop over all cpus to find the current
+ * deltas. There is no synchronization so the result cannot be
+ * exactly accurate either.
+ */
+static inline unsigned long zone_page_state_snapshot(struct zone *zone,
+					enum zone_stat_item item)
+{
+	int cpu;
+	long x = atomic_long_read(&zone->vm_stat[item]);
+
+#ifdef CONFIG_SMP
+	for_each_online_cpu(cpu)
+		x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];
+
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
+}
+
 extern unsigned long global_reclaimable_pages(void);
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-08-31 17:37 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
2010-08-31 18:20 ` Christoph Lameter
2010-08-31 23:37 ` KOSAKI Motohiro
@ 2010-09-02 0:43 ` Christoph Lameter
2010-09-02 0:49 ` KOSAKI Motohiro
2010-09-02 8:51 ` Mel Gorman
2 siblings, 2 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-09-02 0:43 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro
On Tue, 31 Aug 2010, Mel Gorman wrote:
> +#ifdef CONFIG_SMP
> +/* Called when a more accurate view of NR_FREE_PAGES is needed */
> +unsigned long zone_nr_free_pages(struct zone *zone)
> +{
> + unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
You cannot call zone_page_state here because zone_page_state clips the
counter at zero. The nr_free_pages needs to reflect the unclipped state
and then the deltas need to be added. Then the clipping at zero can be
done.
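A small worked example may make the ordering issue concrete. This is a userspace sketch with hypothetical helper names (estimate_clip_first(), estimate_clip_last()) and made-up numbers; it only compares the two orders of operation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Clip the shared counter first, then add the per-CPU deltas. */
static unsigned long estimate_clip_first(long counter, const long *diffs, size_t n)
{
	long x = counter < 0 ? 0 : counter;
	size_t i;

	for (i = 0; i < n; i++)
		x += diffs[i];
	return x < 0 ? 0 : (unsigned long)x;
}

/* Add the deltas to the unclipped counter, then clip once at the end. */
static unsigned long estimate_clip_last(long counter, const long *diffs, size_t n)
{
	long x = counter;
	size_t i;

	for (i = 0; i < n; i++)
		x += diffs[i];
	return x < 0 ? 0 : (unsigned long)x;
}

void example_clip_order(void)
{
	const long diffs[] = { 20, 0 };	/* pending per-CPU deltas */
	long counter = -10;		/* shared counter, temporarily negative */

	assert(estimate_clip_first(counter, diffs, 2) == 20);	/* 10 pages too many */
	assert(estimate_clip_last(counter, diffs, 2) == 10);	/* -10 + 20 */
}
```

Clipping first throws away the negative part of the counter, so the estimate comes out too high, which is the unsafe direction for a watermark check.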
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-02 0:43 ` Christoph Lameter
@ 2010-09-02 0:49 ` KOSAKI Motohiro
2010-09-02 8:51 ` Mel Gorman
1 sibling, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-09-02 0:49 UTC (permalink / raw)
To: Christoph Lameter
Cc: kosaki.motohiro, Mel Gorman, Andrew Morton, Linux Kernel List,
linux-mm, Rik van Riel, Johannes Weiner, Minchan Kim,
KAMEZAWA Hiroyuki
> On Tue, 31 Aug 2010, Mel Gorman wrote:
>
> > +#ifdef CONFIG_SMP
> > +/* Called when a more accurate view of NR_FREE_PAGES is needed */
> > +unsigned long zone_nr_free_pages(struct zone *zone)
> > +{
> > + unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
>
> You cannot call zone_page_state here because zone_page_state clips the
> counter at zero. The nr_free_pages needs to reflect the unclipped state
> and then the deltas need to be added. Then the clipping at zero can be
> done.
Good spotting. You are right.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
2010-09-02 0:43 ` Christoph Lameter
2010-09-02 0:49 ` KOSAKI Motohiro
@ 2010-09-02 8:51 ` Mel Gorman
1 sibling, 0 replies; 21+ messages in thread
From: Mel Gorman @ 2010-09-02 8:51 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro
On Wed, Sep 01, 2010 at 07:43:41PM -0500, Christoph Lameter wrote:
> On Tue, 31 Aug 2010, Mel Gorman wrote:
>
> > +#ifdef CONFIG_SMP
> > +/* Called when a more accurate view of NR_FREE_PAGES is needed */
> > +unsigned long zone_nr_free_pages(struct zone *zone)
> > +{
> > + unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
>
> You cannot call zone_page_state here because zone_page_state clips the
> counter at zero. The nr_free_pages needs to reflect the unclipped state
> and then the deltas need to be added. Then the clipping at zero can be
> done.
>
Good point. This justifies the use of a generic helper that is co-located
with vmstat.h. I've taken your zone_page_state_snapshot() patch, am using
the helper to take a more accurate reading of NR_FREE_PAGES and preparing
for a test. Thanks Christoph.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails
2010-08-31 17:37 [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V3 Mel Gorman
2010-08-31 17:37 ` [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list Mel Gorman
2010-08-31 17:37 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
@ 2010-08-31 17:37 ` Mel Gorman
2010-08-31 18:26 ` Christoph Lameter
2 siblings, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2010-08-31 17:37 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux Kernel List, linux-mm, Rik van Riel, Johannes Weiner,
Minchan Kim, Christoph Lameter, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, Mel Gorman
When under significant memory pressure, a process enters direct reclaim
and immediately afterwards tries to allocate a page. If it fails and no
further progress is made, it's possible the system will go OOM. However,
on systems with large amounts of memory, it's possible that a significant
number of pages are on per-cpu lists and inaccessible to the calling
process. This leads to a process entering direct reclaim more often than
it should, increasing the pressure on the system and compounding the problem.
This patch notes that if direct reclaim is making progress but
allocations are still failing, the system is already under heavy
pressure. In this case, it drains the per-cpu lists and tries the
allocation a second time before continuing.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/page_alloc.c | 20 ++++++++++++++++----
1 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bbaa959..750e1dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1847,6 +1847,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	struct reclaim_state reclaim_state;
 	struct task_struct *p = current;
+	bool drained = false;
 
 	cond_resched();
 
@@ -1865,14 +1866,25 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	cond_resched();
 
-	if (order != 0)
-		drain_all_pages();
+	if (unlikely(!(*did_some_progress)))
+		return NULL;
 
-	if (likely(*did_some_progress))
-		page = get_page_from_freelist(gfp_mask, nodemask, order,
+retry:
+	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
 					alloc_flags, preferred_zone,
 					migratetype);
+
+	/*
+	 * If an allocation failed after direct reclaim, it could be because
+	 * pages are pinned on the per-cpu lists. Drain them and try again
+	 */
+	if (!page && !drained) {
+		drain_all_pages();
+		drained = true;
+		goto retry;
+	}
+
 	return page;
 }
--
1.7.1
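The control flow of the patch above can be sketched in userspace as follows. This is a simplified model under stated assumptions: struct pool_model, model_try_alloc(), model_drain_all() and model_alloc_after_reclaim() are hypothetical stand-ins for the free lists, get_page_from_freelist(), drain_all_pages() and __alloc_pages_direct_reclaim(), and they ignore zones, orders and locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

struct pool_model {
	int free_list;	/* pages the allocator can see */
	int percpu;	/* pages parked on per-CPU lists */
	int drains;	/* how many times the lists were drained */
};

/* Stand-in for get_page_from_freelist(): succeeds if a page is visible. */
static bool model_try_alloc(struct pool_model *pool)
{
	if (pool->free_list <= 0)
		return false;
	pool->free_list--;
	return true;
}

/* Stand-in for drain_all_pages(): move per-CPU pages to the free list. */
static void model_drain_all(struct pool_model *pool)
{
	pool->free_list += pool->percpu;
	pool->percpu = 0;
	pool->drains++;
}

/* Try once; on failure drain exactly once and retry, as in the patch. */
static bool model_alloc_after_reclaim(struct pool_model *pool, bool did_some_progress)
{
	bool drained = false;

	if (!did_some_progress)
		return false;
retry:
	if (model_try_alloc(pool))
		return true;
	if (!drained) {
		model_drain_all(pool);
		drained = true;
		goto retry;
	}
	return false;
}

void example_drain_retry(void)
{
	struct pool_model pool = { .free_list = 0, .percpu = 3, .drains = 0 };

	assert(model_alloc_after_reclaim(&pool, true));	/* succeeds on the retry */
	assert(pool.drains == 1);
	assert(pool.free_list == 2);
}
```

The drained flag bounds the loop to a single retry, so the relatively expensive drain is paid only when the first attempt after reclaim has already failed.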
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails
2010-08-31 17:37 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
@ 2010-08-31 18:26 ` Christoph Lameter
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-08-31 18:26 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Linux Kernel List, linux-mm, Rik van Riel,
Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro
Reviewed-by: Christoph Lameter <cl@linux.com>
^ permalink raw reply [flat|nested] 21+ messages in thread