* [PATCH] mm: fix negative nr_isolated counts
From: Hugh Dickins @ 2015-02-11 7:06 UTC
To: Andrew Morton; +Cc: Vlastimil Babka, linux-mm
The vmstat interfaces are good at hiding negative counts (at least
when CONFIG_SMP); but if you peer behind the curtain, you find that
nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
more negative: so they can absorb larger and larger numbers of isolated
pages, yet still appear to be zero.
I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
but I guess it's there for a good reason, in which case we ought to get
too_many_isolated() working again.
The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
to call acct_isolated() to increment them.
Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.18+
---
mm/compaction.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- v3.19/mm/compaction.c 2015-02-08 18:54:22.000000000 -0800
+++ linux/mm/compaction.c 2015-02-10 22:25:04.613907871 -0800
@@ -1015,8 +1015,10 @@ static isolate_migrate_t isolate_migrate
 		low_pfn = isolate_migratepages_block(cc, low_pfn, end_pfn,
 								isolate_mode);
 
-		if (!low_pfn || cc->contended)
+		if (!low_pfn || cc->contended) {
+			acct_isolated(zone, cc);
 			return ISOLATE_ABORT;
+		}
 
 		/*
 		 * Either we isolated something and proceed with migration. Or
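
[Editor's sketch] The imbalance described in the commit message can be modelled in userspace. This is not kernel code: `model_zone`, `model_cc`, `model_acct_isolated()`, `model_putback()` and `model_abort_once()` are hypothetical stand-ins that only mirror the roles of the NR_ISOLATED counters, cc->migratepages, acct_isolated() and putback_movable_pages() as the message describes them. It assumes that every aborted attempt puts back exactly the pages it isolated.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these types exist in the kernel. */
struct model_zone {
	long nr_isolated;	/* stands in for NR_ISOLATED_ANON/FILE */
};

struct model_cc {
	long nr_migratepages;	/* pages sitting on the migrate list */
};

/* Account freshly isolated pages: the counter goes up. */
static void model_acct_isolated(struct model_zone *zone,
				const struct model_cc *cc)
{
	zone->nr_isolated += cc->nr_migratepages;
}

/* Put isolated pages back: the counter goes down, the list empties. */
static void model_putback(struct model_zone *zone, struct model_cc *cc)
{
	zone->nr_isolated -= cc->nr_migratepages;
	cc->nr_migratepages = 0;
}

/*
 * One aborted isolation attempt. With fixed == false the abort path
 * skips the accounting step (the behaviour the patch removes); with
 * fixed == true it accounts before returning, as the patch does.
 * Returns the counter value after the pages have been put back.
 */
static long model_abort_once(struct model_zone *zone, long isolated,
			     bool fixed)
{
	struct model_cc cc = { .nr_migratepages = isolated };

	if (fixed)
		model_acct_isolated(zone, &cc);
	model_putback(zone, &cc);
	assert(cc.nr_migratepages == 0);
	return zone->nr_isolated;
}
```

In this model each unfixed abort lowers the counter by the number of pages that were isolated, so repeated aborts drift further below zero, while the fixed path leaves the counter where it started.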
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] mm: fix negative nr_isolated counts
From: Vlastimil Babka @ 2015-02-11 9:58 UTC
To: Hugh Dickins, Andrew Morton; +Cc: linux-mm, Joonsoo Kim
On 02/11/2015 08:06 AM, Hugh Dickins wrote:
> The vmstat interfaces are good at hiding negative counts (at least
> when CONFIG_SMP); but if you peer behind the curtain, you find that
> nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
> more negative: so they can absorb larger and larger numbers of isolated
> pages, yet still appear to be zero.
>
> I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
> but I guess it's there for a good reason, in which case we ought to get
> too_many_isolated() working again.
>
> The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
> putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
> to call acct_isolated() to increment them.
>
> Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Ccing Joonsoo for completeness, as it seems he contributed to this part
[1] (to fix another bug of mine, not trying to dismiss responsibility)
But yeah it looks correct. Thanks for finding and fixing!
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[1] https://lkml.org/lkml/2014/9/29/60
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v3.18+
> ---
>
> mm/compaction.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- v3.19/mm/compaction.c 2015-02-08 18:54:22.000000000 -0800
> +++ linux/mm/compaction.c 2015-02-10 22:25:04.613907871 -0800
> @@ -1015,8 +1015,10 @@ static isolate_migrate_t isolate_migrate
> low_pfn = isolate_migratepages_block(cc, low_pfn, end_pfn,
> isolate_mode);
>
> - if (!low_pfn || cc->contended)
> + if (!low_pfn || cc->contended) {
> + acct_isolated(zone, cc);
> return ISOLATE_ABORT;
> + }
>
> /*
> * Either we isolated something and proceed with migration. Or
>
* Re: [PATCH] mm: fix negative nr_isolated counts
From: Joonsoo Kim @ 2015-02-12 7:10 UTC
To: Vlastimil Babka; +Cc: Hugh Dickins, Andrew Morton, linux-mm
On Wed, Feb 11, 2015 at 10:58:13AM +0100, Vlastimil Babka wrote:
> On 02/11/2015 08:06 AM, Hugh Dickins wrote:
> >The vmstat interfaces are good at hiding negative counts (at least
> >when CONFIG_SMP); but if you peer behind the curtain, you find that
> >nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
> >more negative: so they can absorb larger and larger numbers of isolated
> >pages, yet still appear to be zero.
> >
> >I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
> >but I guess it's there for a good reason, in which case we ought to get
> >too_many_isolated() working again.
> >
> >The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
> >putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
> >to call acct_isolated() to increment them.
> >
> >Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
>
> Ccing Joonsoo for completeness, as it seems he contributed to this
> part [1] (to fix another bug of mine, not trying to dismiss
> responsibility)
>
> But yeah it looks correct. Thanks for finding and fixing!
Yes, it looks correct to me.
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> [1] https://lkml.org/lkml/2014/9/29/60
>
> >Signed-off-by: Hugh Dickins <hughd@google.com>
> >Cc: stable@vger.kernel.org # v3.18+
* Re: [PATCH] mm: fix negative nr_isolated counts
From: Andrew Morton @ 2015-02-11 21:09 UTC
To: Hugh Dickins; +Cc: Vlastimil Babka, linux-mm, Joonsoo Kim
On Tue, 10 Feb 2015 23:06:09 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:
> The vmstat interfaces are good at hiding negative counts (at least
> when CONFIG_SMP); but if you peer behind the curtain, you find that
> nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
> more negative: so they can absorb larger and larger numbers of isolated
> pages, yet still appear to be zero.
>
> I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
> but I guess it's there for a good reason, in which case we ought to get
> too_many_isolated() working again.
>
> The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
> putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
> to call acct_isolated() to increment them.
So if I'm understanding this correctly, shrink_inactive_list()'s call
to congestion_wait() basically never happens?
If so I'm pretty reluctant to merge this up until it has had plenty of
careful testing - there's a decent chance that it will make the kernel
behave worse.
> Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v3.18+
And why -stable? What user-visible problem is the bug causing?
* Re: [PATCH] mm: fix negative nr_isolated counts
From: Vlastimil Babka @ 2015-02-12 8:18 UTC
To: Andrew Morton, Hugh Dickins; +Cc: linux-mm, Joonsoo Kim, Rik van Riel
On 02/11/2015 10:09 PM, Andrew Morton wrote:
> On Tue, 10 Feb 2015 23:06:09 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:
>
>> The vmstat interfaces are good at hiding negative counts (at least
>> when CONFIG_SMP); but if you peer behind the curtain, you find that
>> nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
>> more negative: so they can absorb larger and larger numbers of isolated
>> pages, yet still appear to be zero.
>>
>> I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
>> but I guess it's there for a good reason, in which case we ought to get
>> too_many_isolated() working again.
>>
>> The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
>> putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
>> to call acct_isolated() to increment them.
>
> So if I'm understanding this correctly, shrink_inactive_list()'s call
> to congestion_wait() basically never happens?
I think so, the more the counters go negative, the less chance of
congestion_wait() to happen from there.
> If so I'm pretty reluctant to merge this up until it has had plenty of
> careful testing - there's a decent chance that it will make the kernel
> behave worse.
You mean "worse" by letting shrink_inactive_list() call
congestion_wait() again, as it used to before 3.18, since 2009 it seems?
Maybe it's not needed anymore, but it IMHO shouldn't get disabled by
accident, but properly evaluated and removed. Hugh's patch just fixes
the accidental disable.
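
[Editor's sketch] Why a negative counter suppresses the throttle can be shown with a small userspace model. It rests on two assumptions taken from this thread rather than from the kernel source: that the counter read reports negative values as zero (the "hiding" Hugh describes for CONFIG_SMP), and that the throttle test boils down to comparing isolated pages against inactive pages. The real too_many_isolated() has further conditions that are left out here, and all `model_*` names are hypothetical.

```c
#include <stdbool.h>

/* Clamped read: a negative raw counter is reported as zero. */
static unsigned long model_page_state(long raw)
{
	return raw < 0 ? 0UL : (unsigned long)raw;
}

/* Simplified throttle test: too many isolated if isolated > inactive. */
static bool model_too_many_isolated(long raw_isolated, long raw_inactive)
{
	return model_page_state(raw_isolated) >
	       model_page_state(raw_inactive);
}

/*
 * Raw counter value when `isolated` pages are really isolated but the
 * counter has already drifted `leak` pages below its true value.
 */
static long model_raw_after(long leak, long isolated)
{
	return isolated - leak;
}
```

With no drift, 500 isolated pages against 100 inactive ones trip the test. Once the counter has drifted by 1000, the same 500 isolated pages read as zero and the test never fires, which matches the observation that the further the counters go negative, the less likely congestion_wait() is to be reached.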
>> Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> Cc: stable@vger.kernel.org # v3.18+
>
> And why -stable? What user-visible problem is the bug causing?
>
Commit 35cd78156c "vmscan: throttle direct reclaim when too many pages
are isolated already" by Rik seems to have introduced this
congestion_wait() based on too_many_isolated(). The bug it was fixing:
"When way too many processes go into direct reclaim, it is possible
for all of the pages to be taken off the LRU. One result of this is that
the next process in the page reclaim code thinks there are no
reclaimable pages left and triggers an out of memory kill."
So either this is now prevented by something else and
too_many_isolated() could go away, or we should restore its
functionality. Any idea, Rik?
* Re: [PATCH] mm: fix negative nr_isolated counts
From: Rik van Riel @ 2015-02-12 15:12 UTC
To: Vlastimil Babka, Andrew Morton, Hugh Dickins
Cc: linux-mm, Joonsoo Kim, Rafael Aquini
On 02/12/2015 03:18 AM, Vlastimil Babka wrote:
> On 02/11/2015 10:09 PM, Andrew Morton wrote:
>>> Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> Cc: stable@vger.kernel.org # v3.18+
>>
>> And why -stable? What user-visible problem is the bug causing?
>>
>
> Commit 35cd78156c "vmscan: throttle direct reclaim when too many
> pages are isolated already" by Rik seems to have introduced this
> congestion_wait() based on too_many_isolated(). The bug it was
> fixing:
>
> "When way too many processes go into direct reclaim, it is possible
> for all of the pages to be taken off the LRU. One result of this is
> that the next process in the page reclaim code thinks there are no
> reclaimable pages left and triggers an out of memory kill."
>
> So either this is now prevented by something else and
> too_many_isolated() could go away, or we should restore its
> functionality. Any idea, Rik?
I don't think that bug is prevented.
I have seen reports of OOM kills happening while the system
still has a lot of reclaimable page cache pages.
This might actually help explain that bug...
--
All rights reversed