* Re: v5.1-rc5 s390x WARNING
2019-04-18 13:54 ` Mel Gorman
@ 2019-04-18 14:37 ` Matthew Wilcox
2019-04-18 15:25 ` Mel Gorman
2019-04-19 8:41 ` Li Wang
2019-04-19 12:51 ` Vlastimil Babka
2 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2019-04-18 14:37 UTC (permalink / raw)
To: Mel Gorman; +Cc: Vlastimil Babka, Li Wang, Minchan Kim, linux-mm
On Thu, Apr 18, 2019 at 02:54:52PM +0100, Mel Gorman wrote:
> > > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
We lost a character here? "_irect_" should surely be "_direct_"
> ---8<---
> mm, page_alloc: Always use a captured page regardless of compaction result
>
> During the development of commit 5e1f0f098b46 ("mm, compaction: capture
> a page under direct compaction"), a paranoid check was added to ensure
> that if a captured page was available after compaction that it was
> consistent with the final state of compaction. The intent was to catch
> serious programming bugs such as using a stale page pointer and causing
> corruption problems.
>
> However, it is possible to get a captured page even if compaction was
> unsuccessful if an interrupt triggered and happened to free pages in
> interrupt context that got merged into a suitable high-order page. It's
> highly unlikely but Li Wang did report the following warning on s390
>
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
... so it probably needs to be corrected here.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: v5.1-rc5 s390x WARNING
2019-04-18 14:37 ` Matthew Wilcox
@ 2019-04-18 15:25 ` Mel Gorman
2019-04-19 12:53 ` Vlastimil Babka
0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2019-04-18 15:25 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Vlastimil Babka, Li Wang, Minchan Kim, linux-mm
On Thu, Apr 18, 2019 at 07:37:12AM -0700, Matthew Wilcox wrote:
> On Thu, Apr 18, 2019 at 02:54:52PM +0100, Mel Gorman wrote:
> > > > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
>
> We lost a character here? "_irect_" should surely be "_direct_"
>
It confused me too but that was the bug report so I preserved the message
I was given.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: v5.1-rc5 s390x WARNING
2019-04-18 15:25 ` Mel Gorman
@ 2019-04-19 12:53 ` Vlastimil Babka
0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-04-19 12:53 UTC (permalink / raw)
To: Mel Gorman, Matthew Wilcox; +Cc: Li Wang, Minchan Kim, linux-mm
On 4/18/19 5:25 PM, Mel Gorman wrote:
> On Thu, Apr 18, 2019 at 07:37:12AM -0700, Matthew Wilcox wrote:
>> On Thu, Apr 18, 2019 at 02:54:52PM +0100, Mel Gorman wrote:
>>>>> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
>>
>> We lost a character here? "_irect_" should surely be "_direct_"
>>
>
> It confused me too but that was the bug report so I preserved the message
> I was given.
I think it was a result of manual wrapping fixup. See how module list is
wrapped around the same column, and the spaces at the same column in the
register dump. Btw this line has the function name fine:
[ 1422.124092] Krnl PSW : 0704e00180000000 00000000002779ba
(__alloc_pages_direct_compact+0x182/0x190)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: v5.1-rc5 s390x WARNING
2019-04-18 13:54 ` Mel Gorman
2019-04-18 14:37 ` Matthew Wilcox
@ 2019-04-19 8:41 ` Li Wang
2019-04-19 8:52 ` Mel Gorman
2019-04-19 12:51 ` Vlastimil Babka
2 siblings, 1 reply; 9+ messages in thread
From: Li Wang @ 2019-04-19 8:41 UTC (permalink / raw)
To: Mel Gorman; +Cc: Vlastimil Babka, Minchan Kim, linux-mm
[-- Attachment #1: Type: text/plain, Size: 4632 bytes --]
On Thu, Apr 18, 2019 at 9:55 PM Mel Gorman <mgorman@techsingularity.net>
wrote:
> On Wed, Apr 17, 2019 at 10:54:38AM +0200, Vlastimil Babka wrote:
> > On 4/17/19 10:35 AM, Li Wang wrote:
> > > Hi there,
> > >
> > > I catched this warning on v5.1-rc5(s390x). It was trggiered in fork &
> malloc & memset stress test, but the reproduced rate is very low. I'm
> working on find a stable reproducer for it.
> > >
> > > Anyone can have a look first?
> > >
> > > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777
> __alloc_pages_irect_compact+0x182/0x190
> >
> > This means compaction was either skipped or deferred, yet it captured a
> > page. We have some registers with value 1 and 2, which is
> > COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be one of those.
> > Probably COMPACT_SKIPPED. I think a race is possible:
> >
> > - compact_zone_order() sets up current->capture_control
> > - compact_zone() calls compaction_suitable() which returns
> > COMPACT_SKIPPED, so it also returns
> > - interrupt comes and its processing happens to free a page that forms
> > high-order page, since 'current' isn't changed during interrupt (IIRC?)
> > the capture_control is still active and the page is captured
> > - compact_zone_order() does *capture = capc.page
> >
> > What do you think, Mel, does it look plausible?
>
> It's plausible, just extremely unlikely. I think the most likely result
> was that a page filled the per-cpu lists and a bunch of pages got freed
> in a batch from interrupt context.
>
> > Not sure whether we want
> > to try avoiding this scenario, or just remove the warning and be
> > grateful for the successful capture :)
> >
>
> Avoiding the scenario is pointless because it's not wrong. The check was
> initially meant to catch serious programming errors such as using a
> stale page pointer so I think the right patch is below. Li Wang, how
> reproducible is this and would you be willing to test it?
>
It's not easy to reproduce that again. I just saw only once during the OOM
phase that occurred on my s390x platform.
Sure, I run the stress test against a new kernel(build with this patch
applied) for many rounds, so far so good.
>
> ---8<---
> mm, page_alloc: Always use a captured page regardless of compaction result
>
> During the development of commit 5e1f0f098b46 ("mm, compaction: capture
> a page under direct compaction"), a paranoid check was added to ensure
> that if a captured page was available after compaction that it was
> consistent with the final state of compaction. The intent was to catch
> serious programming bugs such as using a stale page pointer and causing
> corruption problems.
>
> However, it is possible to get a captured page even if compaction was
> unsuccessful if an interrupt triggered and happened to free pages in
> interrupt context that got merged into a suitable high-order page. It's
> highly unlikely but Li Wang did report the following warning on s390
>
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777
> __alloc_pages_irect_compact+0x182/0x190
> [ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver
> nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390
> des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs
> libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm
> ccwgroup fsm dm_mirror dm_region_hash dm_log dm_mod
> [ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted
> 5.1.0-rc 5 #1
>
> This patch simply removes the check entirely instead of trying to be
> clever about pages freed from interrupt context. If a serious programming
> error was introduced, it is highly likely to be caught by prep_new_page()
> instead.
>
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct
> compaction")
> Reported-by: Li Wang <liwang@redhat.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
> mm/page_alloc.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..cfaba3889fa2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3773,11 +3773,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask,
> unsigned int order,
> memalloc_noreclaim_restore(noreclaim_flag);
> psi_memstall_leave(&pflags);
>
> - if (*compact_result <= COMPACT_INACTIVE) {
> - WARN_ON_ONCE(page);
> - return NULL;
> - }
> -
> /*
> * At least in one zone compaction wasn't deferred or skipped, so
> let's
> * count a compaction stall
>
--
Regards,
Li Wang
[-- Attachment #2: Type: text/html, Size: 6184 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: v5.1-rc5 s390x WARNING
2019-04-19 8:41 ` Li Wang
@ 2019-04-19 8:52 ` Mel Gorman
0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2019-04-19 8:52 UTC (permalink / raw)
To: Li Wang; +Cc: Vlastimil Babka, Minchan Kim, linux-mm
On Fri, Apr 19, 2019 at 04:41:24PM +0800, Li Wang wrote:
> > Avoiding the scenario is pointless because it's not wrong. The check was
> > initially meant to catch serious programming errors such as using a
> > stale page pointer so I think the right patch is below. Li Wang, how
> > reproducible is this and would you be willing to test it?
> >
>
> It's not easy to reproduce that again. I just saw only once during the OOM
> phase that occurred on my s390x platform.
>
> Sure, I run the stress test against a new kernel(build with this patch
> applied) for many rounds, so far so good.
>
I think the patch is safe enough and have sent it on to Andrew. Thanks
for reporting and testing this, it's appreciated.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: v5.1-rc5 s390x WARNING
2019-04-18 13:54 ` Mel Gorman
2019-04-18 14:37 ` Matthew Wilcox
2019-04-19 8:41 ` Li Wang
@ 2019-04-19 12:51 ` Vlastimil Babka
2 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-04-19 12:51 UTC (permalink / raw)
To: Mel Gorman; +Cc: Li Wang, Minchan Kim, linux-mm
On 4/18/19 3:54 PM, Mel Gorman wrote:
> On Wed, Apr 17, 2019 at 10:54:38AM +0200, Vlastimil Babka wrote:
>> On 4/17/19 10:35 AM, Li Wang wrote:
>>> Hi there,
>>>
>>> I catched this warning on v5.1-rc5(s390x). It was trggiered in fork & malloc & memset stress test, but the reproduced rate is very low. I'm working on find a stable reproducer for it.
>>>
>>> Anyone can have a look first?
>>>
>>> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
>>
>> This means compaction was either skipped or deferred, yet it captured a
>> page. We have some registers with value 1 and 2, which is
>> COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be one of those.
>> Probably COMPACT_SKIPPED. I think a race is possible:
>>
>> - compact_zone_order() sets up current->capture_control
>> - compact_zone() calls compaction_suitable() which returns
>> COMPACT_SKIPPED, so it also returns
>> - interrupt comes and its processing happens to free a page that forms
>> high-order page, since 'current' isn't changed during interrupt (IIRC?)
>> the capture_control is still active and the page is captured
>> - compact_zone_order() does *capture = capc.page
>>
>> What do you think, Mel, does it look plausible?
>
> It's plausible, just extremely unlikely. I think the most likely result
> was that a page filled the per-cpu lists and a bunch of pages got freed
> in a batch from interrupt context.
Sure, good point. Per-cpu lists make the scenario even more rare, but
once it's full, there's a higher change the batch free from the
interrupt will result in high-order page being formed.
>> Not sure whether we want
>> to try avoiding this scenario, or just remove the warning and be
>> grateful for the successful capture :)
>>
>
> Avoiding the scenario is pointless because it's not wrong. The check was
> initially meant to catch serious programming errors such as using a
> stale page pointer so I think the right patch is below. Li Wang, how
> reproducible is this and would you be willing to test it?
>
> ---8<---
> mm, page_alloc: Always use a captured page regardless of compaction result
>
> During the development of commit 5e1f0f098b46 ("mm, compaction: capture
> a page under direct compaction"), a paranoid check was added to ensure
> that if a captured page was available after compaction that it was
> consistent with the final state of compaction. The intent was to catch
> serious programming bugs such as using a stale page pointer and causing
> corruption problems.
>
> However, it is possible to get a captured page even if compaction was
> unsuccessful if an interrupt triggered and happened to free pages in
> interrupt context that got merged into a suitable high-order page. It's
> highly unlikely but Li Wang did report the following warning on s390
>
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
> [ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
> nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390
> des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs
> libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm
> ccwgroup fsm dm_mirror dm_region_hash dm_log dm_mod
> [ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1
>
> This patch simply removes the check entirely instead of trying to be
> clever about pages freed from interrupt context. If a serious programming
> error was introduced, it is highly likely to be caught by prep_new_page()
> instead.
>
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Reported-by: Li Wang <liwang@redhat.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Yup, no need for a Cc: stable on a very rare WARN_ON_ONCE. So the AI
will pick it anyway...
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/page_alloc.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..cfaba3889fa2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3773,11 +3773,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> memalloc_noreclaim_restore(noreclaim_flag);
> psi_memstall_leave(&pflags);
>
> - if (*compact_result <= COMPACT_INACTIVE) {
> - WARN_ON_ONCE(page);
> - return NULL;
> - }
> -
> /*
> * At least in one zone compaction wasn't deferred or skipped, so let's
> * count a compaction stall
>
^ permalink raw reply [flat|nested] 9+ messages in thread