Date: Mon, 19 Apr 2021 11:23:56 +1000
From: Dave Chinner
To: Bharata B Rao
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	aneesh.kumar@linux.ibm.com
Subject: Re: High kmalloc-32 slab cache consumption with 10k containers
Message-ID: <20210419012356.GZ1990290@dread.disaster.area>
References: <20210405054848.GA1077931@in.ibm.com>
	<20210406222807.GD1990290@dread.disaster.area>
	<20210416044439.GB1749436@in.ibm.com>
In-Reply-To: <20210416044439.GB1749436@in.ibm.com>

On Fri, Apr 16, 2021 at 10:14:39AM +0530, Bharata B Rao wrote:
> On Wed, Apr 07, 2021 at 08:28:07AM +1000, Dave Chinner wrote:
> > On Mon, Apr 05, 2021 at 11:18:48AM +0530, Bharata B Rao wrote:
> > >
> > > As an alternative approach, I have this below hack that does lazy
> > > list_lru creation.
> > > The memcg-specific list is created and initialized
> > > only when there is a request to add an element to that particular
> > > list. Though I am not sure about the full impact of this change
> > > on the owners of the lists and also the performance impact of this,
> > > the overall savings look good.
> >
> > Avoiding memory allocation in list_lru_add() was one of the main
> > reasons for up-front static allocation of memcg lists. We cannot do
> > memory allocation while callers are holding multiple spinlocks in
> > core system algorithms (e.g. dentry_kill -> retain_dentry ->
> > d_lru_add -> list_lru_add), let alone while holding an internal
> > spinlock.
> >
> > Putting a GFP_ATOMIC allocation inside 3-4 nested spinlocks in a
> > path we know might have memory demand in the *hundreds of GB* range
> > gets a NACK from me. It's a great idea, but it's just not a
>
> I do understand that GFP_ATOMIC allocations are really not preferable
> but want to point out that the allocations in the range of hundreds of
> GBs get reduced to tens of MBs when we do lazy list_lru head
> allocations under GFP_ATOMIC.

That does not make GFP_ATOMIC allocations safe or desirable. In
general, using GFP_ATOMIC outside of interrupt context indicates
something is being done incorrectly. Especially if it can be
triggered from userspace, which is likely in this particular case...

> As shown earlier, this is what I see in my experimental setup with
> 10k containers:
>
> Number of kmalloc-32 allocations
> 		Before		During		After
> W/o patch	178176		3442409472	388933632
> W/ patch	190464		468992		468992

So now we have an additional half million GFP_ATOMIC allocations
when we currently have none. That's not an improvement, that rings
loud alarm bells.

> This does really depend and vary on the type of the container and
> the number of mounts it does, but I suspect we are looking
> at GFP_ATOMIC allocations in the MB range. Also the number of
> GFP_ATOMIC slab allocation requests matters, I suppose.
They are slab allocations, which means every single one of them
could require a new slab backing page (pages!) to be allocated.
Hence the likely memory demand might be a lot higher than the
optimal case you are considering here...

> There are other users of list_lru, but I was just looking at the
> dentry and inode list_lru use cases. It appears to me that for both
> dentry and inode, we can tolerate the failure from list_lru_add
> due to GFP_ATOMIC allocation failure. The failure to add dentry
> or inode to the lru list means that they won't be retained in
> the lru list, but would be freed immediately. Is this understanding
> correct?

No. Both retain_dentry() and iput_final() would currently leak
objects that fail insertion into the LRU. They don't check for
insertion success at all.

But, really, this is irrelevant - GFP_ATOMIC usage is the problem,
and allowing it to fail doesn't avoid the problems that unbound
GFP_ATOMIC allocation can have on the stability of the rest of the
system when low on memory. Being able to handle a GFP_ATOMIC memory
allocation failure doesn't change the fact that you should not be
doing GFP_ATOMIC allocation in the first place...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com