* freepage accounting bug with CMA/migrate isolation
From: Mike Kravetz @ 2018-07-24 4:24 UTC
To: linux-mm; +Cc: Vlastimil Babka, 'Joonsoo Kim', Laura Abbott
With v4.17, I can see an issue like those addressed in commits 3c605096d315
("mm/page_alloc: restrict max order of merging on isolated pageblock")
and d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and
other pageblocks"). After running a CMA stress test for a while, I see:
MemTotal: 8168384 kB
MemFree: 8457232 kB
MemAvailable: 9204844 kB
If I let the test run, MemFree and MemAvailable will continue to grow.
I am certain the issue is with pageblocks of migratetype ISOLATED. If
I disable all special 'is_migrate_isolate' checks in freepage accounting,
the issue goes away. Further, I am pretty sure the issue has to do with
pageblock merging and/or page orders spanning pageblocks. If I make
pageblock_order equal MAX_ORDER-1, the issue also goes away.
Just looking for suggestions on where/how to debug. I've been hacking on
this without much success.
--
Mike Kravetz
* Re: freepage accounting bug with CMA/migrate isolation
From: Laura Abbott @ 2018-07-24 22:38 UTC
To: Mike Kravetz, linux-mm; +Cc: Vlastimil Babka, 'Joonsoo Kim'
On 07/23/2018 09:24 PM, Mike Kravetz wrote:
> With v4.17, I can see an issue like those addressed in commits 3c605096d315
> ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> and d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and
> other pageblocks"). After running a CMA stress test for a while, I see:
> MemTotal: 8168384 kB
> MemFree: 8457232 kB
> MemAvailable: 9204844 kB
> If I let the test run, MemFree and MemAvailable will continue to grow.
>
> I am certain the issue is with pageblocks of migratetype ISOLATED. If
> I disable all special 'is_migrate_isolate' checks in freepage accounting,
> the issue goes away. Further, I am pretty sure the issue has to do with
> pageblock merging and/or page orders spanning pageblocks. If I make
> pageblock_order equal MAX_ORDER-1, the issue also goes away.
>
> Just looking for suggestions on where/how to debug. I've been hacking on
> this without much success.
> --
> Mike Kravetz
>
If you revert d883c6cf3b39 ("Revert "mm/cma: manage the memory of the CMA
area by using the ZONE_MOVABLE""), do you still see the issue? I thought
there was another isolation edge case which was fixed by that series.
Thanks,
Laura
* Re: freepage accounting bug with CMA/migrate isolation
From: Mike Kravetz @ 2018-07-25 0:46 UTC
To: Laura Abbott, linux-mm; +Cc: Vlastimil Babka, 'Joonsoo Kim'
On 07/24/2018 03:38 PM, Laura Abbott wrote:
> On 07/23/2018 09:24 PM, Mike Kravetz wrote:
>> With v4.17, I can see an issue like those addressed in commits 3c605096d315
>> ("mm/page_alloc: restrict max order of merging on isolated pageblock")
>> and d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and
>> other pageblocks"). After running a CMA stress test for a while, I see:
>> MemTotal: 8168384 kB
>> MemFree: 8457232 kB
>> MemAvailable: 9204844 kB
>> If I let the test run, MemFree and MemAvailable will continue to grow.
>>
>> I am certain the issue is with pageblocks of migratetype ISOLATED. If
>> I disable all special 'is_migrate_isolate' checks in freepage accounting,
>> the issue goes away. Further, I am pretty sure the issue has to do with
>> pageblock merging and/or page orders spanning pageblocks. If I make
>> pageblock_order equal MAX_ORDER-1, the issue also goes away.
>>
>> Just looking for suggestions on where/how to debug. I've been hacking on
>> this without much success.
>> --
>> Mike Kravetz
>>
>
> If you revert d883c6cf3b39 ("Revert "mm/cma: manage the memory of the CMA
> area by using the ZONE_MOVABLE""), do you still see the issue? I thought
> there was another isolation edge case which was fixed by that series.
>
Thanks Laura,
Reverting that patch certainly seems to help, although I'm guessing there
is still some accounting issue even with the patch reverted.
Right after boot,
MemTotal: 8168380 kB
MemFree: 7233360 kB
MemAvailable: 7317704 kB
After stress testing for a couple of hours,
MemTotal: 8168380 kB
MemFree: 7848468 kB
MemAvailable: 7634856 kB
While looking at the code, I did not like the way set_migratetype_isolate
may 'isolate' more than pageblock_nr_pages if there is a free page larger
than pageblock_order. This seems to work because alloc_contig_range always
aligns to MAX_ORDER-1. But I'd like to change this and see if it helps.
--
Mike Kravetz
* Re: freepage accounting bug with CMA/migrate isolation
From: Vlastimil Babka @ 2018-07-26 12:28 UTC
To: Mike Kravetz, linux-mm; +Cc: 'Joonsoo Kim', Laura Abbott
On 07/24/2018 06:24 AM, Mike Kravetz wrote:
> With v4.17, I can see an issue like those addressed in commits 3c605096d315
> ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> and d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and
> other pageblocks"). After running a CMA stress test for a while, I see:
> MemTotal: 8168384 kB
> MemFree: 8457232 kB
> MemAvailable: 9204844 kB
> If I let the test run, MemFree and MemAvailable will continue to grow.
>
> I am certain the issue is with pageblocks of migratetype ISOLATED. If
> I disable all special 'is_migrate_isolate' checks in freepage accounting,
> the issue goes away.
That means you count isolated pages as freepages, right?
> Further, I am pretty sure the issue has to do with
> pageblock merging and/or page orders spanning pageblocks. If I make
> pageblock_order equal MAX_ORDER-1, the issue also goes away.
Interesting, that should only matter in __free_one_page(). Do you have
page guards enabled?
> Just looking for suggestions on where/how to debug. I've been hacking on
> this without much success.
> --
> Mike Kravetz
>
* Re: freepage accounting bug with CMA/migrate isolation
From: Vlastimil Babka @ 2018-07-26 13:30 UTC
To: Mike Kravetz, linux-mm; +Cc: 'Joonsoo Kim', Laura Abbott
On 07/24/2018 06:24 AM, Mike Kravetz wrote:
> With v4.17, I can see an issue like those addressed in commits 3c605096d315
> ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> and d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and
> other pageblocks"). After running a CMA stress test for a while, I see:
> MemTotal: 8168384 kB
> MemFree: 8457232 kB
> MemAvailable: 9204844 kB
> If I let the test run, MemFree and MemAvailable will continue to grow.
>
> I am certain the issue is with pageblocks of migratetype ISOLATED. If
> I disable all special 'is_migrate_isolate' checks in freepage accounting,
> the issue goes away. Further, I am pretty sure the issue has to do with
> pageblock merging and/or page orders spanning pageblocks. If I make
> pageblock_order equal MAX_ORDER-1, the issue also goes away.
>
> Just looking for suggestions on where/how to debug. I've been hacking on
> this without much success.
Maybe I'm wrong or it's something else, but I think that
unset_migratetype_isolate() is wrong and can lead to overcounting freepages.
The scenario is (with pageblock_order = MAX_ORDER - 2):
- MAX_ORDER-1 block is already free when isolated, thus there is no
merging that could be limited to pageblock_order, and the free page
remains > pageblock_order
- unset_migratetype_isolate() is called on first pageblock of the pair,
goes via the "if (PageBuddy(page))" path, isolates the page to free it,
thus adding 2^(MAX_ORDER - 1) pages to the freepage counter.
- zone lock is dropped, somebody else allocates and splits the
MAX_ORDER-1 free page. Since the first pageblock is already marked
!ISOLATE, the free pages left after splitting are put on !ISOLATE
freelists, that includes pages from the second pageblock (perhaps a
whole pageblock_order page).
- unset_migratetype_isolate() is called on second pageblock of the pair
and increments the freepage counter again for all free pages in the
second block (move_freepages_block() doesn't check if it's really moving
them from ISOLATED freelists) so they get accounted twice.
Not sure if your stress test can trigger this so frequently, but it's
possible? A fix would have to either 1) force a split to <= pageblock_order
in start_isolate_page_range(), or 2) make unset_migratetype_isolate()
convert all pageblocks at once if it finds a free page > pageblock_order.
Maybe there are more pieces subtly broken with free pages >
pageblock_order, so it would have to be 1) though.
* Re: freepage accounting bug with CMA/migrate isolation
From: Mike Kravetz @ 2018-07-26 16:50 UTC
To: Vlastimil Babka, linux-mm; +Cc: 'Joonsoo Kim', Laura Abbott
On 07/26/2018 05:28 AM, Vlastimil Babka wrote:
> On 07/24/2018 06:24 AM, Mike Kravetz wrote:
>> With v4.17, I can see an issue like those addressed in commits 3c605096d315
>> ("mm/page_alloc: restrict max order of merging on isolated pageblock")
>> and d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and
>> other pageblocks"). After running a CMA stress test for a while, I see:
>> MemTotal: 8168384 kB
>> MemFree: 8457232 kB
>> MemAvailable: 9204844 kB
>> If I let the test run, MemFree and MemAvailable will continue to grow.
>>
>> I am certain the issue is with pageblocks of migratetype ISOLATED. If
>> I disable all special 'is_migrate_isolate' checks in freepage accounting,
>> the issue goes away.
>
> That means you count isolated pages as freepages, right?
Yes, I know it is not correct, but I just wanted to eliminate the
isolated pageblock special case for experimentation.
>> Further, I am pretty sure the issue has to do with
>> pageblock merging and/or page orders spanning pageblocks. If I make
>> pageblock_order equal MAX_ORDER-1, the issue also goes away.
>
> Interesting, that should only matter in __free_one_page(). Do you have
> page guards enabled?
Nope, no page guards.
Do note that in this case, I added back all the special 'is_migrate_isolate'
checks. So this is just stock 4.17 with the change to make pageblock_order
equal MAX_ORDER-1.
>> Just looking for suggestions on where/how to debug. I've been hacking on
>> this without much success.
As mentioned in my reply to Laura, I noticed that move_freepages_block()
can move more than a pageblock of pages. This is the case where the
page_order of the (first) free page is > pageblock_order. This should only
happen in the set_migratetype_isolate case, as unset has that check you
added. This generally 'works' as alloc_contig_range rounds up to
MAX_ORDER(-1), so set and unset migrate isolate tend to balance out. But I
am wondering if there might be some kind of race where someone could mess
with those pageblocks (and freepage counts) while we drop the zone lock.
I'm trying to put together a quick hack to test this theory, but it is
more complicated than I first thought. :)
--
Mike Kravetz
* Re: freepage accounting bug with CMA/migrate isolation
From: Vlastimil Babka @ 2018-07-26 21:26 UTC
To: Mike Kravetz, linux-mm; +Cc: 'Joonsoo Kim', Laura Abbott
On 07/26/2018 06:50 PM, Mike Kravetz wrote:
> On 07/26/2018 05:28 AM, Vlastimil Babka wrote:
>>> Just looking for suggestions on where/how to debug. I've been hacking on
>>> this without much success.
>
> As mentioned in my reply to Laura, I noticed that move_freepages_block()
> can move more than a pageblock of pages. This is the case where page_order
> of the (first) free page is > pageblock_order. Should only happen in the
> set_migratetype_isolate case as unset has that check you added. This
Hmm, not sure which "check I added" you mean. In
unset_migratetype_isolate()?
> generally 'works' as alloc_contig_range rounds up to MAX_ORDER(-1). So,
> set and unset migrate isolate tend to balance out. But, I am wondering
> if there might be some kind of race where someone could mess with those
> pageblocks (and freepage counts) while we drop the zone lock. Trying to
Yeah, see my other mail for such a race when we drop the zone lock in
unset_migratetype_isolate(). set_migratetype_isolate() would also have
this problem (which would result in *less* freepages counted), but if we
move MAX_ORDER-1 pages to MIGRATE_ISOLATE freelist, then nobody can mess
with them while the zone is locked, as they are isolated.
unset_migratetype_isolate() has no such luck.
> put together a quick hack to test this theory, but it is more complicated
> than I first thought. :)
>
Thread overview: 7 messages
2018-07-24 4:24 freepage accounting bug with CMA/migrate isolation Mike Kravetz
2018-07-24 22:38 ` Laura Abbott
2018-07-25 0:46 ` Mike Kravetz
2018-07-26 12:28 ` Vlastimil Babka
2018-07-26 16:50 ` Mike Kravetz
2018-07-26 21:26 ` Vlastimil Babka
2018-07-26 13:30 ` Vlastimil Babka