linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: 김재원 <jaewon31.kim@samsung.com>
To: Vlastimil Babka <vbabka@suse.cz>, Baoquan He <bhe@redhat.com>,
	Jaewon Kim <jaewon31.kim@gmail.com>
Cc: 김재원 <jaewon31.kim@samsung.com>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"minchan@kernel.org" <minchan@kernel.org>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	이용택 <ytk.lee@samsung.com>, 김철민 <cmlaika.kim@samsung.com>
Subject: RE:(2) [PATCH v2] page_alloc: consider highatomic reserve in wmartermark fast
Date: Tue, 16 Jun 2020 16:30:02 +0900	[thread overview]
Message-ID: <20200616073002epcms1p12a14cc85b0aebf8ba88bedee7495b34f@epcms1p1> (raw)
In-Reply-To: <26a6a2c2-56e6-1a34-84ac-8ff9f74c34ce@suse.cz>

>On 6/14/20 1:17 AM, Baoquan He wrote:
>> On 06/13/20 at 10:08pm, Jaewon Kim wrote:
>> ...
>>> > > This is an example of ALLOC_HARDER allocation failure.
>>> > >
>>> > > <4>[ 6207.637280]  [3:  Binder:9343_3:22875] Binder:9343_3: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
>>> > > <4>[ 6207.637311]  [3:  Binder:9343_3:22875] Call trace:
>>> > > <4>[ 6207.637346]  [3:  Binder:9343_3:22875] [<ffffff8008f40f8c>] dump_stack+0xb8/0xf0
>>> > > <4>[ 6207.637356]  [3:  Binder:9343_3:22875] [<ffffff8008223320>] warn_alloc+0xd8/0x12c
>>> > > <4>[ 6207.637365]  [3:  Binder:9343_3:22875] [<ffffff80082245e4>] __alloc_pages_nodemask+0x120c/0x1250
>>> > > <4>[ 6207.637374]  [3:  Binder:9343_3:22875] [<ffffff800827f6e8>] new_slab+0x128/0x604
>>> > > <4>[ 6207.637381]  [3:  Binder:9343_3:22875] [<ffffff800827b0cc>] ___slab_alloc+0x508/0x670
>>> > > <4>[ 6207.637387]  [3:  Binder:9343_3:22875] [<ffffff800827ba00>] __kmalloc+0x2f8/0x310
>>> > > <4>[ 6207.637396]  [3:  Binder:9343_3:22875] [<ffffff80084ac3e0>] context_struct_to_string+0x104/0x1cc
>>> > > <4>[ 6207.637404]  [3:  Binder:9343_3:22875] [<ffffff80084ad8fc>] security_sid_to_context_core+0x74/0x144
>>> > > <4>[ 6207.637412]  [3:  Binder:9343_3:22875] [<ffffff80084ad880>] security_sid_to_context+0x10/0x18
>>> > > <4>[ 6207.637421]  [3:  Binder:9343_3:22875] [<ffffff800849bd80>] selinux_secid_to_secctx+0x20/0x28
>>> > > <4>[ 6207.637430]  [3:  Binder:9343_3:22875] [<ffffff800849109c>] security_secid_to_secctx+0x3c/0x70
>>> > > <4>[ 6207.637442]  [3:  Binder:9343_3:22875] [<ffffff8008bfe118>] binder_transaction+0xe68/0x454c
>>> > > <4>[ 6207.637569]  [3:  Binder:9343_3:22875] Mem-Info:
>>> > > <4>[ 6207.637595]  [3:  Binder:9343_3:22875] active_anon:102061 inactive_anon:81551 isolated_anon:0
>>> > > <4>[ 6207.637595]  [3:  Binder:9343_3:22875]  active_file:59102 inactive_file:68924 isolated_file:64
>>> > > <4>[ 6207.637595]  [3:  Binder:9343_3:22875]  unevictable:611 dirty:63 writeback:0 unstable:0
>>> > > <4>[ 6207.637595]  [3:  Binder:9343_3:22875]  slab_reclaimable:13324 slab_unreclaimable:44354
>>> > > <4>[ 6207.637595]  [3:  Binder:9343_3:22875]  mapped:83015 shmem:4858 pagetables:26316 bounce:0
>>> > > <4>[ 6207.637595]  [3:  Binder:9343_3:22875]  free:2727 free_pcp:1035 free_cma:178
>>> > > <4>[ 6207.637616]  [3:  Binder:9343_3:22875] Node 0 active_anon:408244kB inactive_anon:326204kB active_file:236408kB inactive_file:275696kB unevictable:2444kB isolated(anon):0kB isolated(file):256kB mapped:332060kB dirty:252kB writeback:0kB shmem:19432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>>> > > <4>[ 6207.637627]  [3:  Binder:9343_3:22875] Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB
>> 
>> Checked this mem info, wondering why there's no 'reserved_highatomic'
>> printing in all these examples.
> 
>Yeah, it better be printed, especially after it's included in watermark
>calculation, so we're less confused by reports of allocation failure where
>watermarks are seemingly ok.
> 


Hello Vlastimil and Baoquan

The log in previous mail was captured from kernel based on v4.14.
After adding the reserved_highatomic log, I finally got a new log below
Let me change description in next patch.

There seems be reserved_highatomic:32768KB and actually 14232kB free.

[ 4738.329298] kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null)
[ 4738.329325] kswapd0 cpuset=/ mems_allowed=0
[ 4738.329339] CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug #1
[ 4738.329350] Call trace: 
[ 4738.329366] [<0000000000000000>] dump_backtrace+0x0/0x248
[ 4738.329377] [<0000000000000000>] show_stack+0x18/0x20
[ 4738.329390] [<0000000000000000>] __dump_stack+0x20/0x28
[ 4738.329398] [<0000000000000000>] dump_stack+0x68/0x90
[ 4738.329409] [<0000000000000000>] warn_alloc+0x104/0x198
[ 4738.329417] [<0000000000000000>] __alloc_pages_nodemask+0xdc0/0xdf0
[ 4738.329427] [<0000000000000000>] zs_malloc+0x148/0x3d0
[ 4738.329438] [<0000000000000000>] zram_bvec_rw+0x410/0x798
[ 4738.329446] [<0000000000000000>] zram_rw_page+0x88/0xdc
[ 4738.329455] [<0000000000000000>] bdev_write_page+0x70/0xbc
[ 4738.329463] [<0000000000000000>] __swap_writepage+0x58/0x37c
[ 4738.329469] [<0000000000000000>] swap_writepage+0x40/0x4c
[ 4738.329478] [<0000000000000000>] shrink_page_list+0xc30/0xf48
[ 4738.329486] [<0000000000000000>] shrink_inactive_list+0x2b0/0x61c
[ 4738.329494] [<0000000000000000>] shrink_node_memcg+0x23c/0x618
[ 4738.329501] [<0000000000000000>] shrink_node+0x1c8/0x304
[ 4738.329509] [<0000000000000000>] kswapd+0x680/0x7c4
[ 4738.329518] [<0000000000000000>] kthread+0x110/0x120
[ 4738.329527] [<0000000000000000>] ret_from_fork+0x10/0x18
[ 4738.329538] Mem-Info:
[ 4738.329574] active_anon:111826 inactive_anon:65557 isolated_anon:0\x0a active_file:44260 inactive_file:83422 isolated_file:0\x0a unevictable:4158 dirty:117 writeback:0 unstable:0\x0a            slab_reclaimable:13943 slab_unreclaimable:43315\x0a mapped:102511 shmem:3299 pagetables:19566 bounce:0\x0a free:3510 free_pcp:553 free_cma:0
[ 4738.329593] Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB d           irty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 4738.329603] Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768k           B unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB        [ 4738.329607] lowmem_reserve[]: 0 0
[ 4738.329615] Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB


>...
> 
>>> > >       /*
>>> > >        * Fast check for order-0 only. If this fails then the reserves
>>> > > @@ -3598,9 +3604,12 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
>>> > >        * the caller is !atomic then it'll uselessly search the free
>>> > >        * list. That corner case is then slower but it is harmless.
>>> >
>>> > Do we need remove or adjust the code comment at this place? So Mel have
>>> > foreseen the corner case, just reclaiming to unreserve the highatomic
>>> > might be ignored.
>>> >
>>> 
>>> Hello thank you for your comment.
>>> 
>>> My previous mail to Vlastimil Babka seems to have html.
>>> Let me put  again here because I also think the comment should be changed.
>>> I'd like to hear opinions from the open source community.
>> 
>> Yeah, your replying mail to Vlastimil looks a little messy on format, I
>> didn't scroll down to check.
>> 
>>> 
>>> Additionally actually I think we need accurate counting of highatomic
>>> free after this patch.
>>> 
>>> ----------------------------------------------------------------------------------------
>>> 
>>> As Mel also agreed with me in v1 mail thread, this highatomic reserved should
>>> be considered even in watermark fast.
>>> 
>>> The comment, I think, may need to be changed. Prior to this patch, non
>>> highatomic
>>> allocation may do useless search, but it also can take ALL non highatomic free.
>>> 
>>> With this patch, non highatomic allocation will NOT do useless search. Rather,
>>> it may be required direct reclamation even when there are some non
>>> high atomic free.
>>> 
>>> i.e)
>>> In following situation, watermark check fails (9MB - 8MB < 4MB) though there are
>>> enough free (9MB - 4MB > 4MB). If this is really matter, we need to
>>> count highatomic
>>> free accurately.
>> 
>> Seems to make sense. We only use zone->nr_reserved_highatomic to account
>> the reserved highatomic, don't track how much have been used for
>> highatomic allocation. But not sure if this will happen often, or just a
>> very rare case, whether taking that into account will impact anything.
> 
>Unfortunately there's a problem when trying to account free pages of a migrate
>type exactly, as e.g. during reserve_highatomic_pageblock(), some pages might be
>in pcplist of other cpu with other migratetype, and once they are freed, the
>buddy merging will merge the different migratetypes and distort the accounting.
>Fixing this for all migratetypes would have performance overhead, so we only do
>that for MIGRATE_ISOLATE which is not that frequent (and it took a while to
>eliminate all corner cases), and CMA which doesn't change pageblocks dynamically.

AFAIK we do not account free cma in pcp either. But yes accurate check could be 
overhead. For example, __mod_zone_freepage_state should account highatomic free 
as cma free. And we may see some incorrect accounting issue.

> 
>So either we live with the possible overreclaim due to inaccurate counting per
>your example above, or we instead let order-0 atomic allocations use highatomic
>reserves.
>

Additionally regarding existing Mel's comment, let me remove some of it if you
don't mind.

        /*
         * Fast check for order-0 only. If this fails then the reserves
-        * need to be calculated. There is a corner case where the check
-        * passes but only the high-order atomic reserve are free. If
-        * the caller is !atomic then it'll uselessly search the free
-        * list. That corner case is then slower but it is harmless.
+        * need to be calculated.
         */

I will prepare v3 patch.
Thank you again.


  parent reply	other threads:[~2020-06-16  7:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20200612085027epcas1p46383a7eda792eabbd1e74cd08fe988c9@epcas1p4.samsung.com>
2020-06-13  2:51 ` Jaewon Kim
2020-06-12 14:34   ` Vlastimil Babka
2020-06-13  4:16     ` Jaewon Kim
2020-06-15 11:25       ` Vlastimil Babka
2020-06-13  9:42   ` Baoquan He
2020-06-13 13:08     ` Jaewon Kim
2020-06-13 23:17       ` Baoquan He
2020-06-15 11:36         ` Vlastimil Babka
     [not found]         ` <CGME20200612085027epcas1p46383a7eda792eabbd1e74cd08fe988c9@epcms1p1>
2020-06-16  7:30           ` 김재원 [this message]
2020-06-16 14:17             ` (2) " Baoquan He
2020-06-16 15:46               ` Jaewon Kim
2020-06-17  4:03                 ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200616073002epcms1p12a14cc85b0aebf8ba88bedee7495b34f@epcms1p1 \
    --to=jaewon31.kim@samsung.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cmlaika.kim@samsung.com \
    --cc=hannes@cmpxchg.org \
    --cc=jaewon31.kim@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mgorman@techsingularity.net \
    --cc=minchan@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=ytk.lee@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox