From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 15 Apr 2024 09:59:18 +1000
From: Dave Chinner <david@fromorbit.com>
To: lipeifeng@oppo.com
Cc: akpm@linux-foundation.org, zhengqi.arch@bytedance.com,
	roman.gushchin@linux.dev, muchun.song@linux.dev, 21cnbao@gmail.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/shrinker: add SHRINKER_NO_DIRECT_RECLAIM
References: <20240413015410.30951-1-lipeifeng@oppo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240413015410.30951-1-lipeifeng@oppo.com>
On Sat, Apr 13, 2024 at 09:54:10AM +0800, lipeifeng@oppo.com wrote:
> From: Peifeng Li
>
> In the case of insufficient memory, threads will be in direct_reclaim to
> reclaim memory, direct_reclaim will call shrink_slab to
> run sequentially each shrinker callback. If there is lock contention in
> the shrinker callback, such as a spinlock, mutex_lock and so on, threads
> are likely to be stuck in direct_reclaim for a long time, even if the
> free memory has reached the high watermark of the zone, resulting in
> poor performance of threads.

That's always been a problem. That's a shrinker implementation problem,
not a shrinker infrastructure problem.

> Example 1: shrinker callback may wait for spinlock
> static unsigned long mb_cache_shrink(struct mb_cache *cache,
> 				     unsigned long nr_to_scan)
> {
> 	struct mb_cache_entry *entry;
> 	unsigned long shrunk = 0;
>
> 	spin_lock(&cache->c_list_lock);
> 	while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> 		entry = list_first_entry(&cache->c_list,
> 					 struct mb_cache_entry, e_list);
> 		if (test_bit(MBE_REFERENCED_B, &entry->e_flags) ||
> 		    atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
> 			clear_bit(MBE_REFERENCED_B, &entry->e_flags);
> 			list_move_tail(&entry->e_list, &cache->c_list);
> 			continue;
> 		}
> 		list_del_init(&entry->e_list);
> 		cache->c_entry_count--;
> 		spin_unlock(&cache->c_list_lock);
> 		__mb_cache_entry_free(cache, entry);
> 		shrunk++;
> 		cond_resched();
> 		spin_lock(&cache->c_list_lock);
> 	}
> 	spin_unlock(&cache->c_list_lock);
>
> 	return shrunk;
> }

Yeah, we learnt a -long- time ago that using global locks in shrinkers
that have -unbounded concurrency- is a really bad idea. This is just a
poorly implemented shrinker because it doesn't take memory reclaim
concurrency into account.

This is, for example, why list_lru exists and is tightly tied into the
SHRINKER_NUMA_AWARE infrastructure - it gets rid of the need for global
locks on the reclaim lists that shrinkers traverse.
> Example 2: shrinker callback may wait for mutex lock
> static
> unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s,
> 		struct shrink_control *sc)
> {
> 	struct kbase_context *kctx;
> 	struct kbase_mem_phy_alloc *alloc;
> 	struct kbase_mem_phy_alloc *tmp;
> 	unsigned long freed = 0;
>
> 	kctx = container_of(s, struct kbase_context, reclaim);
>
> 	// MTK add to prevent false alarm
> 	lockdep_off();

That's just -broken-. If shrinkers are called from a context where they
can't take locks because they might deadlock, then they must either use
trylocks and abort (i.e. SHRINK_STOP) or use context flags provided by
the allocation context (e.g. GFP_NOFS, memalloc_nofs_save()) to tell
reclaim that context-specific subsystem locks are held and the shrinker
should not attempt to take them and/or run in this context.

> 	mutex_lock(&kctx->jit_evict_lock);

That's also wrong. Shrinkers must be non-blocking, otherwise they cause
memory reclaim latencies that result in unpredictable memory allocation
latencies, and that makes anyone running applications with
latency-specific SLAs very unhappy.

IOWs, this is a subsystem shrinker that is very poorly implemented and
needs to be fixed before we do anything else.

> In mobile-phone, threads are likely to be stuck in shrinker callback
> during direct_reclaim, with example like the following:
>  <...>-2806 [004] ..... 866458.339840: mm_shrink_slab_start:
> 		dynamic_mem_shrink_scan+0x0/0xb8 ... priority 2
>  <...>-2806 [004] ..... 866459.339933: mm_shrink_slab_end:
> 		dynamic_mem_shrink_scan+0x0/0xb8 ...

Yup, that's exactly the problem with blocking shrinkers - they can screw
the whole system over because they stop memory allocation in its tracks.
Shrinkers must be non-blocking.

> For the above reason, the patch introduces SHRINKER_NO_DIRECT_RECLAIM that
> allows driver to set shrinker callback not to be called in direct_reclaim
> unless sc->priority is 0.

No, that's fundamentally flawed, too.
Firstly, it doesn't avoid deadlocks, nor does it avoid lock contention
under heavy memory pressure - it just hides these problems until we are
critically low on memory. Which will happen much faster, because we
aren't reclaiming memory from caches that hold memory that needs to be
reclaimed. This isn't good.

Further, it bypasses the mechanism we use to defer shrinker work to a
context where it can be executed safely (i.e. kswapd). Shrinkers that
cannot run in the current context are supposed to return SHRINK_STOP to
tell the shrink_slab infrastructure to accumulate the work for the next
context that can run the reclaim, rather than execute it themselves.
This allows kswapd to do the reclaim work instead of direct reclaim. It
also ensures that all the memory pressure being applied to the shrinkers
is actually actioned, so we keep all the caches and memory usage in
relative balance.

IOWs, the choice of running the shrinker or not is controlled by two
things:

1. the shrinker implementation itself, and
2. the reclaim context flags provided by the allocation that needs
   reclaim to be performed.

Long story short: if a shrinker is causing direct reclaim problems
because of poor locking design, latency and/or context-specific
deadlocks, then the subsystem and its shrinker need to be fixed. We
should not be skipping direct reclaim just because a shrinker is really
poorly implemented.

-Dave.
-- 
Dave Chinner
david@fromorbit.com