Date: Tue, 3 Mar 2020 15:26:23 -0500
From: Johannes Weiner
To: Shakeel Butt
Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg
Subject: Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
Message-ID: <20200303202623.GA68565@cmpxchg.org>
References: <0a37bb7d-18a7-c43c-52a5-f13a34decf69@i-love.sakura.ne.jp>

On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote:
> On Tue, Mar 3, 2020 at 9:47 AM Yang Shi wrote:
> >
> > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa wrote:
> > >
> > > Hello, Naresh.
> > >
> > > > [   98.003346] WARNING: CPU: 2 PID: 340 at include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > >
> > > This is
> > >
> > > /**
> > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > >  * @memcg: memcg to charge.
> > >  *
> > >  * This function marks the beginning of the remote memcg charging scope. All the
> > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > >  * given memcg.
> > >  *
> > >  * NOTE: This function is not nesting safe.
> > >  */
> > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > {
> > > 	WARN_ON_ONCE(current->active_memcg);
> > > 	current->active_memcg = memcg;
> > > }
> > >
> > > which is about memcg. Redirecting to linux-mm.
> >
> > Isn't this triggered by ("loop: use worker per cgroup instead of
> > kworker") in linux-next, which converted the loop driver to use a
> > worker per cgroup, so it may have multiple workers working at the
> > same time?
> >
> > So they may share the same "current", which may cause a kind of
> > nested call to memalloc_use_memcg().
> >
> > Could you please try the debug patch below? This is not the proper
> > fix, but it may help us narrow down the problem.
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index c49257a..1cc1cdc 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -320,6 +320,10 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> >   */
> >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> >  {
> > +	if ((current->flags & PF_KTHREAD) &&
> > +	    current->active_memcg)
> > +		return;
> > +
> >  	WARN_ON_ONCE(current->active_memcg);
> >  	current->active_memcg = memcg;
> >  }
>
> Maybe it's time to make memalloc_use_memcg() nesting safe.

Yes, I think so.
The stack trace:

[   98.137605] alloc_page_buffers+0x210/0x288
[   98.141799] __getblk_gfp+0x1d4/0x400
[   98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8
[   98.150628] ext4_mb_init_cache+0x25c/0x9b0
[   98.154821] ext4_mb_init_group+0x270/0x390
[   98.159014] ext4_mb_good_group+0x264/0x270
[   98.163208] ext4_mb_regular_allocator+0x480/0x798
[   98.168011] ext4_mb_new_blocks+0x958/0x10f8
[   98.172294] ext4_ext_map_blocks+0xec8/0x1618
[   98.176660] ext4_map_blocks+0x1b8/0x8a0
[   98.180592] ext4_writepages+0x830/0xf10
[   98.184523] do_writepages+0xb4/0x198
[   98.188195] __filemap_fdatawrite_range+0x170/0x1c8
[   98.193086] filemap_write_and_wait_range+0x40/0xb0
[   98.197974] ext4_punch_hole+0x4a4/0x660
[   98.201907] ext4_fallocate+0x294/0x1190
[   98.205839] loop_process_work+0x690/0x1100
[   98.210032] loop_workfn+0x2c/0x110
[   98.213529] process_one_work+0x3e0/0x648
[   98.217546] worker_thread+0x70/0x670
[   98.221217] kthread+0x1b8/0x1c0
[   98.224452] ret_from_fork+0x10/0x18

The loop kworker is instantiating cache pages on behalf of whoever
queued the io request, but if the page already exists, the buffers
should be allocated on behalf of whoever already owns the page. Nesting
makes sense.

Since the only difference between the use and the unuse function is the
warning when we nest, we can remove the unuse and do something like:

	old = memalloc_use_memcg(memcg);
	...
	memalloc_use_memcg(old);

What do you think? Patch below. It should go in before Dan's patches,
and they in turn need a small update to save and restore active_memcg.

(Since loop's use is from a kworker, it's unlikely that there is an
outer scope. But it's probably best to keep this simple and robust.)
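To make the stacking semantics concrete, here's a minimal userspace
model of the proposed API (illustrative only: a plain global stands in
for current->active_memcg, and the structs are mock-ups, not the kernel
types):

	#include <assert.h>
	#include <stddef.h>

	struct mem_cgroup { int id; };

	/* Mock of current->active_memcg. */
	static struct mem_cgroup *active_memcg;

	/* Stacking version: install the override, hand back the old one. */
	static struct mem_cgroup *memalloc_use_memcg(struct mem_cgroup *memcg)
	{
		struct mem_cgroup *old = active_memcg;

		active_memcg = memcg;
		return old;
	}

	int main(void)
	{
		struct mem_cgroup submitter = { 1 }, page_owner = { 2 };
		struct mem_cgroup *outer, *inner;

		/* Outer scope: the loop worker charges the IO submitter. */
		outer = memalloc_use_memcg(&submitter);
		assert(active_memcg == &submitter);

		/* Inner scope: alloc_page_buffers() charges the page owner. */
		inner = memalloc_use_memcg(&page_owner);
		assert(active_memcg == &page_owner);

		memalloc_use_memcg(inner);	/* back to the submitter */
		assert(active_memcg == &submitter);

		memalloc_use_memcg(outer);	/* back to no override */
		assert(active_memcg == NULL);
		return 0;
	}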
---
From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Tue, 3 Mar 2020 15:15:39 -0500
Subject: [PATCH] mm: support nesting memalloc_use_memcg()

The memalloc_use_memcg() function to override the default memcg
accounting context currently doesn't nest. But the patches to make the
loop driver cgroup-aware will end up nesting:

[   98.137605] alloc_page_buffers+0x210/0x288
[   98.141799] __getblk_gfp+0x1d4/0x400
[   98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8
[   98.150628] ext4_mb_init_cache+0x25c/0x9b0
[   98.154821] ext4_mb_init_group+0x270/0x390
[   98.159014] ext4_mb_good_group+0x264/0x270
[   98.163208] ext4_mb_regular_allocator+0x480/0x798
[   98.168011] ext4_mb_new_blocks+0x958/0x10f8
[   98.172294] ext4_ext_map_blocks+0xec8/0x1618
[   98.176660] ext4_map_blocks+0x1b8/0x8a0
[   98.180592] ext4_writepages+0x830/0xf10
[   98.184523] do_writepages+0xb4/0x198
[   98.188195] __filemap_fdatawrite_range+0x170/0x1c8
[   98.193086] filemap_write_and_wait_range+0x40/0xb0
[   98.197974] ext4_punch_hole+0x4a4/0x660
[   98.201907] ext4_fallocate+0x294/0x1190
[   98.205839] loop_process_work+0x690/0x1100
[   98.210032] loop_workfn+0x2c/0x110
[   98.213529] process_one_work+0x3e0/0x648
[   98.217546] worker_thread+0x70/0x670
[   98.221217] kthread+0x1b8/0x1c0
[   98.224452] ret_from_fork+0x10/0x18

where loop_process_work() sets the memcg override to the memcg that
submitted the IO request, and alloc_page_buffers() sets the override to
the memcg that instantiated the cache page, which may differ.

Make memalloc_use_memcg() return the old memcg and convert existing
users to a stacking model. Delete the unused memalloc_unuse_memcg().

Signed-off-by: Johannes Weiner
---
 fs/buffer.c                          |  6 +++---
 fs/notify/fanotify/fanotify.c        |  5 +++--
 fs/notify/inotify/inotify_fsnotify.c |  5 +++--
 include/linux/sched/mm.h             | 33 ++++++++++++----------------
 4 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d8c7242426bb..54d5df14bd36 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 	struct buffer_head *bh, *head;
 	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
 	long offset;
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg, *oldmemcg;
 
 	if (retry)
 		gfp |= __GFP_NOFAIL;
 
 	memcg = get_mem_cgroup_from_page(page);
-	memalloc_use_memcg(memcg);
+	oldmemcg = memalloc_use_memcg(memcg);
 
 	head = NULL;
 	offset = PAGE_SIZE;
@@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 		set_bh_page(bh, page, offset);
 	}
 out:
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 	mem_cgroup_put(memcg);
 	return head;
 /*
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 5778d1347b35..cb596ad002d8 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct fanotify_event *event = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
+	struct mem_cgroup *oldmemcg;
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		gfp |= __GFP_RETRY_MAYFAIL;
 
 	/* Whoever is interested in the event, pays for the allocation. */
-	memalloc_use_memcg(group->memcg);
+	oldmemcg = memalloc_use_memcg(group->memcg);
 
 	if (fanotify_is_perm_event(mask)) {
 		struct fanotify_perm_event *pevent;
@@ -334,7 +335,7 @@ init: __maybe_unused
 		event->path.dentry = NULL;
 	}
 out:
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 	return event;
 }
 
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d510223d302c..776ce66aaa47 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group,
 	int ret;
 	int len = 0;
 	int alloc_len = sizeof(struct inotify_event_info);
+	struct mem_cgroup *oldmemcg;
 
 	if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info)))
 		return 0;
@@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group,
 	 * trigger OOM killer in the target monitoring memcg as it may have
 	 * security repercussion.
 	 */
-	memalloc_use_memcg(group->memcg);
+	oldmemcg = memalloc_use_memcg(group->memcg);
 	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 
 	if (unlikely(!event)) {
 		/*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index c49257a3b510..ced06c12daf7 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
  * given memcg.
  *
- * NOTE: This function is not nesting safe.
- */
-static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
-{
-	WARN_ON_ONCE(current->active_memcg);
-	current->active_memcg = memcg;
-}
-
-/**
- * memalloc_unuse_memcg - Ends the remote memcg charging scope.
+ * NOTE: This function can nest. Users must save the return value and
+ * reset the previous value after their own charging scope is over:
  *
- * This function marks the end of the remote memcg charging scope started by
- * memalloc_use_memcg().
+ *	old = memalloc_use_memcg(memcg);
+ *	// ... allocations ...
+ *	memalloc_use_memcg(old);
  */
-static inline void memalloc_unuse_memcg(void)
+static inline struct mem_cgroup *__must_check
+memalloc_use_memcg(struct mem_cgroup *memcg)
 {
-	current->active_memcg = NULL;
+	struct mem_cgroup *old = current->active_memcg;
+
+	current->active_memcg = memcg;
+	return old;
 }
 #else
-static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void memalloc_unuse_memcg(void)
+static inline struct mem_cgroup *__must_check
+memalloc_use_memcg(struct mem_cgroup *memcg)
 {
+	return NULL;
 }
 #endif
 
-- 
2.24.1
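(Purely illustrative -- roughly the kind of save/restore Dan's loop
patches would then need on top of this; the worker struct and its memcg
field below are stand-ins, not his actual code:)

	static void loop_process_work(struct loop_worker *worker)
	{
		struct mem_cgroup *oldmemcg;

		/* Charge the IO to whoever submitted it... */
		oldmemcg = memalloc_use_memcg(worker->memcg);

		/* ... process the queued requests ... */

		/* ... and restore whatever scope was active before. */
		memalloc_use_memcg(oldmemcg);
	}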