From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0615EC433EF for ; Wed, 16 Mar 2022 03:09:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 63C628D0002; Tue, 15 Mar 2022 23:09:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5EC068D0001; Tue, 15 Mar 2022 23:09:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B38B8D0002; Tue, 15 Mar 2022 23:09:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 38EFF8D0001 for ; Tue, 15 Mar 2022 23:09:01 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id E37FCD57 for ; Wed, 16 Mar 2022 03:09:00 +0000 (UTC) X-FDA: 79248767640.04.66C7310 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by imf30.hostedemail.com (Postfix) with ESMTP id 7FCBE8000F for ; Wed, 16 Mar 2022 03:08:58 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=hsiangkao@linux.alibaba.com;NM=1;PH=DS;RN=9;SR=0;TI=SMTPD_---0V7KSVpu_1647400131; Received: from B-P7TQMD6M-0146.local(mailfrom:hsiangkao@linux.alibaba.com fp:SMTPD_---0V7KSVpu_1647400131) by smtp.aliyun-inc.com(127.0.0.1); Wed, 16 Mar 2022 11:08:53 +0800 Date: Wed, 16 Mar 2022 11:08:51 +0800 From: Gao Xiang To: Dave Chinner Cc: Roman Gushchin , Matthew Wilcox , Stephen Brennan , lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Gautham Ananthakrishna , khlebnikov@yandex-team.ru Subject: Re: [LSF/MM TOPIC] Better handling of negative dentries Message-ID: Mail-Followup-To: Dave Chinner , Roman Gushchin , Matthew Wilcox , Stephen Brennan , lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Gautham Ananthakrishna , khlebnikov@yandex-team.ru References: <20220316025223.GR661808@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220316025223.GR661808@dread.disaster.area> X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 7FCBE8000F X-Stat-Signature: babd5h15z5k48py1ezstnj8ear38wncg Authentication-Results: imf30.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf30.hostedemail.com: domain of hsiangkao@linux.alibaba.com designates 115.124.30.131 as permitted sender) smtp.mailfrom=hsiangkao@linux.alibaba.com X-Rspam-User: X-HE-Tag: 1647400138-697556 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000026, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote: > On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote: > > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote: > > > > > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox wrote: > > > > > > > > The number of negative dentries is effectively constrained only by memory > > > > size. Systems which do not experience significant memory pressure for > > > > an extended period can build up millions of negative dentries which > > > > clog the dcache. That can have different symptoms, such as inotify > > > > taking a long time [1], high memory usage [2] and even just poor lookup > > > > performance [3]. We've also seen problems with cgroups being pinned > > > > by negative dentries, though I think we now reparent those dentries to > > > > their parent cgroup instead. > > > > > > Yes, it should be fixed already. > > > > > > > > > > > We don't have a really good solution yet, and maybe some focused > > > > brainstorming on the problem would lead to something that actually works. > > > > > > I’d be happy to join this discussion. And in my opinion it’s going beyond negative dentries: there are other types of objects which tend to grow beyond any reasonable limits if there is no memory pressure. > > > > +1, we once had a similar issue as well, and agree that is not only > > limited to negative dentries but all too many LRU-ed dentries and inodes. > > Yup, any discussion solely about managing buildup of negative > dentries doesn't acknowledge that it is just a symptom of larger > problems that need to be addressed. > > > Limited the total number may benefit to avoid shrink spiking for servers. > > No, we don't want to set hard limits on object counts - that's just > asking for systems that need frequent hand tuning and are impossible > to get right under changing workloads. Caches need to auto size > according to workload's working set to find a steady state balance, > not be bound by artitrary limits. > > But even cache sizing isn't the problem here - it's just another > symptom. > > > > A perfect example when it happens is when a machine is almost > > > idle for some period of time. Periodically running processes > > > creating various kernel objects (mostly vfs cache) which over > > > time are filling significant portions of the total memory. And > > > when the need for memory arises, we realize that the memory is > > > heavily fragmented and it’s costly to reclaim it back. > > Yup, the underlying issue here is that memory reclaim does nothing > to manage long term build-up of single use cached objects when > *there is no memory pressure*. There's of idle time and spare > resources to manage caches sanely, but we don't. e.g. there is no > periodic rotation of caches that could lead to detection and reclaim > of single use objects (say over a period of minutes) and hence > prevent them from filling up all of memory unnecessarily and > creating transient memory reclaim and allocation latency spikes when > memory finally fills up. > > IOWs, negative dentries getting out of hand and shrinker spikes are > both a symptom of the same problem: while memory allocation is free, > memory reclaim does nothing to manage cache aging. Hence we only > find out we've got a badly aged cache when we finally realise > it has filled all of memory, and then we have heaps of work to do > before memory can be made available for allocation again.... > > And then if you're going to talk memory reclaim, the elephant in the > room is the lack of integration between shrinkers and the main > reclaim infrastructure. There's no priority determination, there's > no progress feedback, there's no mechanism to allow shrinkers to > throttle reclaim rather than have the reclaim infrastructure wind up > priority and OOM kill when a shrinker cannot make progress quickly, > etc. Then there's direct reclaim hammering shrinkers with unbound > concurrency so individual shrinkers have no chance of determining > how much memory pressure there really is by themselves, not to > mention the lock contention problems that unbound reclaim > concurrency on things like LRU lists can cause. And, of course, > memcg based reclaim is still only tacked onto the side of the > shrinker infrastructure... Yeah, it's really a generic problem between objects and shrinkers. Some intelligent detection and feedback loop (even without memory pressure) would be much better than hardcoded numbers. Actually I remembered such topic has been raised for times, hoping for some progress.. Thanks, Gao Xiang > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com