linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* v5.1-rc5 s390x WARNING
@ 2019-04-17  8:35 Li Wang
  2019-04-17  8:54 ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Li Wang @ 2019-04-17  8:35 UTC (permalink / raw)
  To: Mel Gorman, Vlastimil Babka, Minchan Kim; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 2781 bytes --]

Hi there,

I catched this warning on v5.1-rc5(s390x). It was trggiered in fork &
malloc & memset stress test, but the reproduced rate is very low. I'm
working on find a stable reproducer for it.

Anyone can have a look first?

[ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777
__alloc_pages_irect_compact+0x182/0x190
[ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver
 nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390
des_s390 des_g
eneric sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs
libcrc32c d
asd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm ccwgroup
fsm dm_mi
rror dm_region_hash dm_log dm_mod
[ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not
tainted 5.1.0-rc 5 #1
[ 1422.124089] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[ 1422.124092] Krnl PSW : 0704e00180000000 00000000002779ba
(__alloc_pages_direct_compact+0x182/0x190)
[ 1422.124096]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3
CC:2 PM:0 RI: 0 EA:3
[ 1422.124100] Krnl GPRS: 0000000000000000 000003e00226fc24
000003d081bdf200 000 0000000000001
[ 1422.124103]            000000000027789a 0000000000000000
0000000000000001 000 000000006ee03
[ 1422.124107]            000003e00226fc28 0000000000000cc0
0000000000000240 000 0000000000002
[ 1422.124156]            0000000000400000 0000000000753cb0
000000000027789a 000 003e00226fa28
[ 1422.124163] Krnl Code: 00000000002779ac: e320f0a80002        ltg
 %r2,168( %r15)
[ 1422.124163]            00000000002779b2: a784fff4            brc
 8,27799a
[ 1422.124163]           #00000000002779b6: a7f40001            brc
 15,2779b 8
[ 1422.124163]           >00000000002779ba: a7290000            lghi
 %r2,0
[ 1422.124163]            00000000002779be: a7f4fff0            brc
 15,27799 e
[ 1422.124163]            00000000002779c2: 0707                bcr
 0,%r7
[ 1422.124163]            00000000002779c4: 0707                bcr
 0,%r7
[ 1422.124163]            00000000002779c6: 0707                bcr
 0,%r7
[ 1422.124194] Call Trace:
[ 1422.124196] ([<000000000027789a>]
__alloc_pages_direct_compact+0x62/0x190)
[ 1422.124198]  [<0000000000278618>]
__alloc_pages_nodemask+0x728/0x1148
[ 1422.124201]  [<0000000000126bb2>] crst_table_alloc+0x32/0x68
[ 1422.124203]  [<0000000000135888>] mm_init+0x118/0x308
[ 1422.124204]  [<0000000000137e60>]
copy_process.part.49+0x1820/0x1d90
[ 1422.124205]  [<000000000013865c>] _do_fork+0x114/0x3b8
[ 1422.124206]  [<0000000000138aa4>] __s390x_sys_clone+0x44/0x58
[ 1422.124210]  [<0000000000739a90>] system_call+0x288/0x2a8
[ 1422.124210] Last Breaking-Event-Address:
[ 1422.124212]  [<00000000002779b6>]
__alloc_pages_direct_compact+0x17e/0x190
[ 1422.124213] ---[ end trace 36649eaa36968eaa ]---

-- 
Regards,
Li Wang

[-- Attachment #2: Type: text/html, Size: 4620 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-17  8:35 v5.1-rc5 s390x WARNING Li Wang
@ 2019-04-17  8:54 ` Vlastimil Babka
  2019-04-18 13:54   ` Mel Gorman
  0 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2019-04-17  8:54 UTC (permalink / raw)
  To: Li Wang, Mel Gorman, Minchan Kim; +Cc: linux-mm

On 4/17/19 10:35 AM, Li Wang wrote:
> Hi there,
> 
> I catched this warning on v5.1-rc5(s390x). It was trggiered in fork & malloc & memset stress test, but the reproduced rate is very low. I'm working on find a stable reproducer for it. 
> 
> Anyone can have a look first?
> 
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190

This means compaction was either skipped or deferred, yet it captured a
page. We have some registers with value 1 and 2, which is
COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be one of those.
Probably COMPACT_SKIPPED. I think a race is possible:

- compact_zone_order() sets up current->capture_control
- compact_zone() calls compaction_suitable() which returns
COMPACT_SKIPPED, so it also returns
- interrupt comes and its processing happens to free a page that forms
high-order page, since 'current' isn't changed during interrupt (IIRC?)
the capture_control is still active and the page is captured
- compact_zone_order() does *capture = capc.page

What do you think, Mel, does it look plausible? Not sure whether we want
to try avoiding this scenario, or just remove the warning and be
grateful for the successful capture :)

> [ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver 
>  nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390 des_g 
> eneric sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs libcrc32c d 
> asd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm ccwgroup fsm dm_mi 
> rror dm_region_hash dm_log dm_mod                                                
> [ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1                                                                             
> [ 1422.124089] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)                      
> [ 1422.124092] Krnl PSW : 0704e00180000000 00000000002779ba (__alloc_pages_direct_compact+0x182/0x190)                                                           
> [ 1422.124096]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI: 0 EA:3                                                                           
> [ 1422.124100] Krnl GPRS: 0000000000000000 000003e00226fc24 000003d081bdf200 000 0000000000001                                                                    
> [ 1422.124103]            000000000027789a 0000000000000000 0000000000000001 000 000000006ee03                                                                    
> [ 1422.124107]            000003e00226fc28 0000000000000cc0 0000000000000240 000 0000000000002                                                                    
> [ 1422.124156]            0000000000400000 0000000000753cb0 000000000027789a 000 003e00226fa28                                                                    
> [ 1422.124163] Krnl Code: 00000000002779ac: e320f0a80002        ltg     %r2,168( %r15)                                                                            
> [ 1422.124163]            00000000002779b2: a784fff4            brc     8,27799a                                                                                  
> [ 1422.124163]           #00000000002779b6: a7f40001            brc     15,2779b 8                                                                                
> [ 1422.124163]           >00000000002779ba: a7290000            lghi    %r2,0    
> [ 1422.124163]            00000000002779be: a7f4fff0            brc     15,27799 e                                                                                
> [ 1422.124163]            00000000002779c2: 0707                bcr     0,%r7    
> [ 1422.124163]            00000000002779c4: 0707                bcr     0,%r7    
> [ 1422.124163]            00000000002779c6: 0707                bcr     0,%r7    
> [ 1422.124194] Call Trace:                                                       
> [ 1422.124196] ([<000000000027789a>] __alloc_pages_direct_compact+0x62/0x190)    
> [ 1422.124198]  [<0000000000278618>] __alloc_pages_nodemask+0x728/0x1148         
> [ 1422.124201]  [<0000000000126bb2>] crst_table_alloc+0x32/0x68                  
> [ 1422.124203]  [<0000000000135888>] mm_init+0x118/0x308                         
> [ 1422.124204]  [<0000000000137e60>] copy_process.part.49+0x1820/0x1d90          
> [ 1422.124205]  [<000000000013865c>] _do_fork+0x114/0x3b8                        
> [ 1422.124206]  [<0000000000138aa4>] __s390x_sys_clone+0x44/0x58                 
> [ 1422.124210]  [<0000000000739a90>] system_call+0x288/0x2a8                     
> [ 1422.124210] Last Breaking-Event-Address:                                      
> [ 1422.124212]  [<00000000002779b6>] __alloc_pages_direct_compact+0x17e/0x190    
> [ 1422.124213] ---[ end trace 36649eaa36968eaa ]---                              
> 
> -- 
> Regards,
> Li Wang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-17  8:54 ` Vlastimil Babka
@ 2019-04-18 13:54   ` Mel Gorman
  2019-04-18 14:37     ` Matthew Wilcox
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Mel Gorman @ 2019-04-18 13:54 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Li Wang, Minchan Kim, linux-mm

On Wed, Apr 17, 2019 at 10:54:38AM +0200, Vlastimil Babka wrote:
> On 4/17/19 10:35 AM, Li Wang wrote:
> > Hi there,
> > 
> > I catched this warning on v5.1-rc5(s390x). It was trggiered in fork & malloc & memset stress test, but the reproduced rate is very low. I'm working on find a stable reproducer for it. 
> > 
> > Anyone can have a look first?
> > 
> > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
> 
> This means compaction was either skipped or deferred, yet it captured a
> page. We have some registers with value 1 and 2, which is
> COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be one of those.
> Probably COMPACT_SKIPPED. I think a race is possible:
> 
> - compact_zone_order() sets up current->capture_control
> - compact_zone() calls compaction_suitable() which returns
> COMPACT_SKIPPED, so it also returns
> - interrupt comes and its processing happens to free a page that forms
> high-order page, since 'current' isn't changed during interrupt (IIRC?)
> the capture_control is still active and the page is captured
> - compact_zone_order() does *capture = capc.page
> 
> What do you think, Mel, does it look plausible?

It's plausible, just extremely unlikely. I think the most likely result
was that a page filled the per-cpu lists and a bunch of pages got freed
in a batch from interrupt context.

> Not sure whether we want
> to try avoiding this scenario, or just remove the warning and be
> grateful for the successful capture :)
> 

Avoiding the scenario is pointless because it's not wrong. The check was
initially meant to catch serious programming errors such as using a
stale page pointer so I think the right patch is below. Li Wang, how
reproducible is this and would you be willing to test it?

---8<---
mm, page_alloc: Always use a captured page regardless of compaction result

During the development of commit 5e1f0f098b46 ("mm, compaction: capture
a page under direct compaction"), a paranoid check was added to ensure
that if a captured page was available after compaction that it was
consistent with the final state of compaction. The intent was to catch
serious programming bugs such as using a stale page pointer and causing
corruption problems.

However, it is possible to get a captured page even if compaction was
unsuccessful if an interrupt triggered and happened to free pages in
interrupt context that got merged into a suitable high-order page. It's
highly unlikely but Li Wang did report the following warning on s390

[ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
[ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
 nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390
 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs
 libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm
 ccwgroup fsm dm_mirror dm_region_hash dm_log dm_mod
[ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1

This patch simply removes the check entirely instead of trying to be
clever about pages freed from interrupt context. If a serious programming
error was introduced, it is highly likely to be caught by prep_new_page()
instead.

Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Reported-by: Li Wang <liwang@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d96ca5bc555b..cfaba3889fa2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3773,11 +3773,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	memalloc_noreclaim_restore(noreclaim_flag);
 	psi_memstall_leave(&pflags);
 
-	if (*compact_result <= COMPACT_INACTIVE) {
-		WARN_ON_ONCE(page);
-		return NULL;
-	}
-
 	/*
 	 * At least in one zone compaction wasn't deferred or skipped, so let's
 	 * count a compaction stall


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-18 13:54   ` Mel Gorman
@ 2019-04-18 14:37     ` Matthew Wilcox
  2019-04-18 15:25       ` Mel Gorman
  2019-04-19  8:41     ` Li Wang
  2019-04-19 12:51     ` Vlastimil Babka
  2 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2019-04-18 14:37 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Vlastimil Babka, Li Wang, Minchan Kim, linux-mm

On Thu, Apr 18, 2019 at 02:54:52PM +0100, Mel Gorman wrote:
> > > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190

We lost a character here?  "_irect_" should surely be "_direct_"

> ---8<---
> mm, page_alloc: Always use a captured page regardless of compaction result
> 
> During the development of commit 5e1f0f098b46 ("mm, compaction: capture
> a page under direct compaction"), a paranoid check was added to ensure
> that if a captured page was available after compaction that it was
> consistent with the final state of compaction. The intent was to catch
> serious programming bugs such as using a stale page pointer and causing
> corruption problems.
> 
> However, it is possible to get a captured page even if compaction was
> unsuccessful if an interrupt triggered and happened to free pages in
> interrupt context that got merged into a suitable high-order page. It's
> highly unlikely but Li Wang did report the following warning on s390
> 
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190

... so it probably needs to be corrected here.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-18 14:37     ` Matthew Wilcox
@ 2019-04-18 15:25       ` Mel Gorman
  2019-04-19 12:53         ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2019-04-18 15:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Vlastimil Babka, Li Wang, Minchan Kim, linux-mm

On Thu, Apr 18, 2019 at 07:37:12AM -0700, Matthew Wilcox wrote:
> On Thu, Apr 18, 2019 at 02:54:52PM +0100, Mel Gorman wrote:
> > > > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
> 
> We lost a character here?  "_irect_" should surely be "_direct_"
> 

It confused me too but that was the bug report so I preserved the message
I was given.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-18 13:54   ` Mel Gorman
  2019-04-18 14:37     ` Matthew Wilcox
@ 2019-04-19  8:41     ` Li Wang
  2019-04-19  8:52       ` Mel Gorman
  2019-04-19 12:51     ` Vlastimil Babka
  2 siblings, 1 reply; 9+ messages in thread
From: Li Wang @ 2019-04-19  8:41 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Vlastimil Babka, Minchan Kim, linux-mm

[-- Attachment #1: Type: text/plain, Size: 4632 bytes --]

On Thu, Apr 18, 2019 at 9:55 PM Mel Gorman <mgorman@techsingularity.net>
wrote:

> On Wed, Apr 17, 2019 at 10:54:38AM +0200, Vlastimil Babka wrote:
> > On 4/17/19 10:35 AM, Li Wang wrote:
> > > Hi there,
> > >
> > > I catched this warning on v5.1-rc5(s390x). It was trggiered in fork &
> malloc & memset stress test, but the reproduced rate is very low. I'm
> working on find a stable reproducer for it.
> > >
> > > Anyone can have a look first?
> > >
> > > [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777
> __alloc_pages_irect_compact+0x182/0x190
> >
> > This means compaction was either skipped or deferred, yet it captured a
> > page. We have some registers with value 1 and 2, which is
> > COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be one of those.
> > Probably COMPACT_SKIPPED. I think a race is possible:
> >
> > - compact_zone_order() sets up current->capture_control
> > - compact_zone() calls compaction_suitable() which returns
> > COMPACT_SKIPPED, so it also returns
> > - interrupt comes and its processing happens to free a page that forms
> > high-order page, since 'current' isn't changed during interrupt (IIRC?)
> > the capture_control is still active and the page is captured
> > - compact_zone_order() does *capture = capc.page
> >
> > What do you think, Mel, does it look plausible?
>
> It's plausible, just extremely unlikely. I think the most likely result
> was that a page filled the per-cpu lists and a bunch of pages got freed
> in a batch from interrupt context.
>
> > Not sure whether we want
> > to try avoiding this scenario, or just remove the warning and be
> > grateful for the successful capture :)
> >
>
> Avoiding the scenario is pointless because it's not wrong. The check was
> initially meant to catch serious programming errors such as using a
> stale page pointer so I think the right patch is below. Li Wang, how
> reproducible is this and would you be willing to test it?
>

It's not easy to reproduce that again. I just saw only once during the OOM
phase that occurred on my s390x platform.

Sure, I run the stress test against a new kernel(build with this patch
applied) for many rounds, so far so good.


>
> ---8<---
> mm, page_alloc: Always use a captured page regardless of compaction result
>
> During the development of commit 5e1f0f098b46 ("mm, compaction: capture
> a page under direct compaction"), a paranoid check was added to ensure
> that if a captured page was available after compaction that it was
> consistent with the final state of compaction. The intent was to catch
> serious programming bugs such as using a stale page pointer and causing
> corruption problems.
>
> However, it is possible to get a captured page even if compaction was
> unsuccessful if an interrupt triggered and happened to free pages in
> interrupt context that got merged into a suitable high-order page. It's
> highly unlikely but Li Wang did report the following warning on s390
>
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777
> __alloc_pages_irect_compact+0x182/0x190
> [ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver
>  nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390
>  des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs
>  libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm
>  ccwgroup fsm dm_mirror dm_region_hash dm_log dm_mod
> [ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted
> 5.1.0-rc 5 #1
>
> This patch simply removes the check entirely instead of trying to be
> clever about pages freed from interrupt context. If a serious programming
> error was introduced, it is highly likely to be caught by prep_new_page()
> instead.
>
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct
> compaction")
> Reported-by: Li Wang <liwang@redhat.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/page_alloc.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..cfaba3889fa2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3773,11 +3773,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask,
> unsigned int order,
>         memalloc_noreclaim_restore(noreclaim_flag);
>         psi_memstall_leave(&pflags);
>
> -       if (*compact_result <= COMPACT_INACTIVE) {
> -               WARN_ON_ONCE(page);
> -               return NULL;
> -       }
> -
>         /*
>          * At least in one zone compaction wasn't deferred or skipped, so
> let's
>          * count a compaction stall
>


-- 
Regards,
Li Wang

[-- Attachment #2: Type: text/html, Size: 6184 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-19  8:41     ` Li Wang
@ 2019-04-19  8:52       ` Mel Gorman
  0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2019-04-19  8:52 UTC (permalink / raw)
  To: Li Wang; +Cc: Vlastimil Babka, Minchan Kim, linux-mm

On Fri, Apr 19, 2019 at 04:41:24PM +0800, Li Wang wrote:
> > Avoiding the scenario is pointless because it's not wrong. The check was
> > initially meant to catch serious programming errors such as using a
> > stale page pointer so I think the right patch is below. Li Wang, how
> > reproducible is this and would you be willing to test it?
> >
> 
> It's not easy to reproduce that again. I just saw only once during the OOM
> phase that occurred on my s390x platform.
> 
> Sure, I run the stress test against a new kernel(build with this patch
> applied) for many rounds, so far so good.
> 

I think the patch is safe enough and have sent it on to Andrew. Thanks
for reporting and testing this, it's appreciated.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-18 13:54   ` Mel Gorman
  2019-04-18 14:37     ` Matthew Wilcox
  2019-04-19  8:41     ` Li Wang
@ 2019-04-19 12:51     ` Vlastimil Babka
  2 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-04-19 12:51 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Li Wang, Minchan Kim, linux-mm

On 4/18/19 3:54 PM, Mel Gorman wrote:
> On Wed, Apr 17, 2019 at 10:54:38AM +0200, Vlastimil Babka wrote:
>> On 4/17/19 10:35 AM, Li Wang wrote:
>>> Hi there,
>>>
>>> I catched this warning on v5.1-rc5(s390x). It was trggiered in fork & malloc & memset stress test, but the reproduced rate is very low. I'm working on find a stable reproducer for it. 
>>>
>>> Anyone can have a look first?
>>>
>>> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
>>
>> This means compaction was either skipped or deferred, yet it captured a
>> page. We have some registers with value 1 and 2, which is
>> COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be one of those.
>> Probably COMPACT_SKIPPED. I think a race is possible:
>>
>> - compact_zone_order() sets up current->capture_control
>> - compact_zone() calls compaction_suitable() which returns
>> COMPACT_SKIPPED, so it also returns
>> - interrupt comes and its processing happens to free a page that forms
>> high-order page, since 'current' isn't changed during interrupt (IIRC?)
>> the capture_control is still active and the page is captured
>> - compact_zone_order() does *capture = capc.page
>>
>> What do you think, Mel, does it look plausible?
> 
> It's plausible, just extremely unlikely. I think the most likely result
> was that a page filled the per-cpu lists and a bunch of pages got freed
> in a batch from interrupt context.

Sure, good point. Per-cpu lists make the scenario even more rare, but
once it's full, there's a higher change the batch free from the
interrupt will result in high-order page being formed.

>> Not sure whether we want
>> to try avoiding this scenario, or just remove the warning and be
>> grateful for the successful capture :)
>>
> 
> Avoiding the scenario is pointless because it's not wrong. The check was
> initially meant to catch serious programming errors such as using a
> stale page pointer so I think the right patch is below. Li Wang, how
> reproducible is this and would you be willing to test it?
> 
> ---8<---
> mm, page_alloc: Always use a captured page regardless of compaction result
> 
> During the development of commit 5e1f0f098b46 ("mm, compaction: capture
> a page under direct compaction"), a paranoid check was added to ensure
> that if a captured page was available after compaction that it was
> consistent with the final state of compaction. The intent was to catch
> serious programming bugs such as using a stale page pointer and causing
> corruption problems.
> 
> However, it is possible to get a captured page even if compaction was
> unsuccessful if an interrupt triggered and happened to free pages in
> interrupt context that got merged into a suitable high-order page. It's
> highly unlikely but Li Wang did report the following warning on s390
> 
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
> [ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
>  nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390
>  des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs
>  libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm
>  ccwgroup fsm dm_mirror dm_region_hash dm_log dm_mod
> [ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1
> 
> This patch simply removes the check entirely instead of trying to be
> clever about pages freed from interrupt context. If a serious programming
> error was introduced, it is highly likely to be caught by prep_new_page()
> instead.
> 
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Reported-by: Li Wang <liwang@redhat.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Yup, no need for a Cc: stable on a very rare WARN_ON_ONCE. So the AI
will pick it anyway...

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..cfaba3889fa2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3773,11 +3773,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>  	memalloc_noreclaim_restore(noreclaim_flag);
>  	psi_memstall_leave(&pflags);
>  
> -	if (*compact_result <= COMPACT_INACTIVE) {
> -		WARN_ON_ONCE(page);
> -		return NULL;
> -	}
> -
>  	/*
>  	 * At least in one zone compaction wasn't deferred or skipped, so let's
>  	 * count a compaction stall
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: v5.1-rc5 s390x WARNING
  2019-04-18 15:25       ` Mel Gorman
@ 2019-04-19 12:53         ` Vlastimil Babka
  0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-04-19 12:53 UTC (permalink / raw)
  To: Mel Gorman, Matthew Wilcox; +Cc: Li Wang, Minchan Kim, linux-mm

On 4/18/19 5:25 PM, Mel Gorman wrote:
> On Thu, Apr 18, 2019 at 07:37:12AM -0700, Matthew Wilcox wrote:
>> On Thu, Apr 18, 2019 at 02:54:52PM +0100, Mel Gorman wrote:
>>>>> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_irect_compact+0x182/0x190
>>
>> We lost a character here?  "_irect_" should surely be "_direct_"
>>
> 
> It confused me too but that was the bug report so I preserved the message
> I was given.

I think it was a result of manual wrapping fixup. See how module list is
wrapped around the same column, and the spaces at the same column in the
register dump. Btw this line has the function name fine:

[ 1422.124092] Krnl PSW : 0704e00180000000 00000000002779ba
(__alloc_pages_direct_compact+0x182/0x190)


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-04-19 12:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17  8:35 v5.1-rc5 s390x WARNING Li Wang
2019-04-17  8:54 ` Vlastimil Babka
2019-04-18 13:54   ` Mel Gorman
2019-04-18 14:37     ` Matthew Wilcox
2019-04-18 15:25       ` Mel Gorman
2019-04-19 12:53         ` Vlastimil Babka
2019-04-19  8:41     ` Li Wang
2019-04-19  8:52       ` Mel Gorman
2019-04-19 12:51     ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox