Message-ID: <0e7b16ce-19f9-0c70-4a94-f05cbfee613a@bytedance.com>
Date: Tue, 8 Aug 2023 14:32:55 +0800
From: Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH v4 44/48] mm: shrinker: add a secondary array for
 shrinker_info::{map, nr_deferred}
To: Dave Chinner
Cc: akpm@linux-foundation.org, tkhai@ya.ru, vbabka@suse.cz,
 roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org,
 paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org,
 senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org,
 muchun.song@linux.dev, simon.horman@corigine.com, dlemoal@kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
 kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net,
 cluster-devel@redhat.com, linux-nfs@vger.kernel.org,
 linux-mtd@lists.infradead.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
 dm-devel@redhat.com, linux-raid@vger.kernel.org,
 linux-bcache@vger.kernel.org, virtualization@lists.linux-foundation.org,
 linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
 linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, Muchun Song
References: <20230807110936.21819-1-zhengqi.arch@bytedance.com>
 <20230807110936.21819-45-zhengqi.arch@bytedance.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi Dave,

On 2023/8/8 10:12, Dave Chinner wrote:
> On Mon, Aug 07, 2023 at 07:09:32PM +0800, Qi Zheng wrote:
>> Currently, we maintain two linear arrays per node per memcg, which are
>> shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
>> them when the shrinker_nr_max is exceeded, that is, allocate a new array,
>> and then copy the old array to the new array, and finally free the old
>> array by RCU.
>>
>> For shrinker_info::map, we do set_bit() under the RCU lock, so we may set
>> the value into the old map which is about to be freed. This may cause the
>> value set to be lost. The current solution is not to copy the old map when
>> resizing, but to set all the corresponding bits in the new map to 1. This
>> solves the data loss problem, but brings the overhead of more pointless
>> loops while doing memcg slab shrink.
>>
>> For shrinker_info::nr_deferred, we will only modify it under the read lock
>> of shrinker_rwsem, so it will not run concurrently with the resizing. But
>> after we make memcg slab shrink lockless, there will be the same data loss
>> problem as shrinker_info::map, and we can't work around it like the map.
>>
>> For such resizable arrays, the most straightforward idea is to change it
>> to xarray, like we did for list_lru [1].
>> We need to do xa_store() in the
>> list_lru_add()-->set_shrinker_bit(), but this will cause memory
>> allocation, and the list_lru_add() doesn't accept failure. A possible
>> solution is to pre-allocate, but the location of pre-allocation is not
>> well determined.
>
> So you implemented a two level array that preallocates leaf
> nodes to work around it? It's remarkably complex for what it does,

Yes, here I have implemented a two level array like the following:

 +---------------+--------+--------+-----+
 | shrinker_info | unit 0 | unit 1 | ... | (secondary array)
 +---------------+--------+--------+-----+
      ^
      |
 +---------------+-----+
 | nr_deferred[] | map | (leaf array)
 +---------------+-----+
 (shrinker_info_unit)

The leaf array is never freed unless the memcg is destroyed. The
secondary array will be resized every time the shrinker id exceeds
shrinker_nr_max.

> I can't help but think a radix tree using a special holder for
> nr_deferred values of zero would end up being simpler...

I tried. If the shrinker uses list_lru, then we can preallocate the xa
node where the list_lru_one is pre-allocated. But for other types of
shrinkers, the location of pre-allocation is not easy to determine
(such as deferred_split_shrinker). And we can't force all memcg aware
shrinkers to use list_lru, so I gave up on using xarray and implemented
the above two-level array.

>
>> Therefore, this commit chooses to introduce a secondary array for
>> shrinker_info::{map, nr_deferred}, so that we only need to copy this
>> secondary array every time the size is resized. Then even if we get the
>> old secondary array under the RCU lock, the found map and nr_deferred
>> are still valid, so no data is lost.
>
> I don't understand what you are trying to describe here. If we get
> the old array, then don't we get either a stale nr_deferred value,
> or the update we do gets lost because the next shrinker lookup will
> find the new array and so the deferred value stored to the old one
> is never seen again?
As shown above, the leaf array will not be freed when the shrinker_info
is expanded, so the shrinker_info_unit can be indexed from both the old
and the new shrinker_info->unit[x]. So the updated nr_deferred and map
will not be lost.

>
>>
>> [1]. https://lore.kernel.org/all/20220228122126.37293-13-songmuchun@bytedance.com/
>>
>> Signed-off-by: Qi Zheng
>> Reviewed-by: Muchun Song
>> ---
> .....
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index a27779ed3798..1911c06b8af5 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -12,15 +12,50 @@ DECLARE_RWSEM(shrinker_rwsem);
>>  #ifdef CONFIG_MEMCG
>>  static int shrinker_nr_max;
>>
>> -/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
>> -static inline int shrinker_map_size(int nr_items)
>> +static inline int shrinker_unit_size(int nr_items)
>>  {
>> -	return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
>> +	return (DIV_ROUND_UP(nr_items, SHRINKER_UNIT_BITS) * sizeof(struct shrinker_info_unit *));
>>  }
>>
>> -static inline int shrinker_defer_size(int nr_items)
>> +static inline void shrinker_unit_free(struct shrinker_info *info, int start)
>>  {
>> -	return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
>> +	struct shrinker_info_unit **unit;
>> +	int nr, i;
>> +
>> +	if (!info)
>> +		return;
>> +
>> +	unit = info->unit;
>> +	nr = DIV_ROUND_UP(info->map_nr_max, SHRINKER_UNIT_BITS);
>> +
>> +	for (i = start; i < nr; i++) {
>> +		if (!unit[i])
>> +			break;
>> +
>> +		kvfree(unit[i]);
>> +		unit[i] = NULL;
>> +	}
>> +}
>> +
>> +static inline int shrinker_unit_alloc(struct shrinker_info *new,
>> +				      struct shrinker_info *old, int nid)
>> +{
>> +	struct shrinker_info_unit *unit;
>> +	int nr = DIV_ROUND_UP(new->map_nr_max, SHRINKER_UNIT_BITS);
>> +	int start = old ? DIV_ROUND_UP(old->map_nr_max, SHRINKER_UNIT_BITS) : 0;
>> +	int i;
>> +
>> +	for (i = start; i < nr; i++) {
>> +		unit = kvzalloc_node(sizeof(*unit), GFP_KERNEL, nid);
>
> A unit is 576 bytes.
> Why is this using kvzalloc_node()?

Ah, will use kzalloc_node() in the next version.

>
>> +		if (!unit) {
>> +			shrinker_unit_free(new, start);
>> +			return -ENOMEM;
>> +		}
>> +
>> +		new->unit[i] = unit;
>> +	}
>> +
>> +	return 0;
>>  }
>>
>>  void free_shrinker_info(struct mem_cgroup *memcg)
>> @@ -32,6 +67,7 @@ void free_shrinker_info(struct mem_cgroup *memcg)
>>  	for_each_node(nid) {
>>  		pn = memcg->nodeinfo[nid];
>>  		info = rcu_dereference_protected(pn->shrinker_info, true);
>> +		shrinker_unit_free(info, 0);
>>  		kvfree(info);
>>  		rcu_assign_pointer(pn->shrinker_info, NULL);
>>  	}
>
> Why is this safe? The info and maps are looked up by RCU, so why is
> freeing them without a RCU grace period expiring safe?

The free_shrinker_info() will be called in alloc_shrinker_info() and
mem_cgroup_css_free().

In alloc_shrinker_info(), it will only be called in the error path, so
the shrinker_info_unit and shrinker_info can be safely freed.

In mem_cgroup_css_free(), when we get here, the traversal of this memcg
has ended and it will not be found again. That is to say, the
corresponding shrink_slab() is also over, so the shrinker_info_unit and
shrinker_info can also be safely freed here.

>
> Yes, it was safe to do this when it was all under a semaphore, but
> now the lookup and use is under RCU, so this freeing isn't
> serialised against lookups anymore...
>
>
>> @@ -40,28 +76,27 @@ void free_shrinker_info(struct mem_cgroup *memcg)
>>  int alloc_shrinker_info(struct mem_cgroup *memcg)
>>  {
>>  	struct shrinker_info *info;
>> -	int nid, size, ret = 0;
>> -	int map_size, defer_size = 0;
>> +	int nid, ret = 0;
>> +	int array_size = 0;
>>
>>  	down_write(&shrinker_rwsem);
>> -	map_size = shrinker_map_size(shrinker_nr_max);
>> -	defer_size = shrinker_defer_size(shrinker_nr_max);
>> -	size = map_size + defer_size;
>> +	array_size = shrinker_unit_size(shrinker_nr_max);
>>  	for_each_node(nid) {
>> -		info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
>> -		if (!info) {
>> -			free_shrinker_info(memcg);
>> -			ret = -ENOMEM;
>> -			break;
>> -		}
>> -		info->nr_deferred = (atomic_long_t *)(info + 1);
>> -		info->map = (void *)info->nr_deferred + defer_size;
>> +		info = kvzalloc_node(sizeof(*info) + array_size, GFP_KERNEL, nid);
>> +		if (!info)
>> +			goto err;
>>  		info->map_nr_max = shrinker_nr_max;
>> +		if (shrinker_unit_alloc(info, NULL, nid))
>> +			goto err;
>
> That's going to now do a lot of small memory allocation when we have
> lots of shrinkers active....
>
>> @@ -150,17 +175,34 @@ static int expand_shrinker_info(int new_id)
>>  	return ret;
>>  }
>>
>> +static inline int shriner_id_to_index(int shrinker_id)
>
> shrinker_id_to_index

Will fix.

>
>> +{
>> +	return shrinker_id / SHRINKER_UNIT_BITS;
>> +}
>> +
>> +static inline int shriner_id_to_offset(int shrinker_id)
>
> shrinker_id_to_offset

Will fix.

>
>> +{
>> +	return shrinker_id % SHRINKER_UNIT_BITS;
>> +}
>
> ....
>> @@ -209,26 +251,31 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
>>  				   struct mem_cgroup *memcg)
>>  {
>>  	struct shrinker_info *info;
>> +	struct shrinker_info_unit *unit;
>>
>>  	info = shrinker_info_protected(memcg, nid);
>> -	return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
>> +	unit = info->unit[shriner_id_to_index(shrinker->id)];
>> +	return atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
>>  }
>>
>>  static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
>>  				  struct mem_cgroup *memcg)
>>  {
>>  	struct shrinker_info *info;
>> +	struct shrinker_info_unit *unit;
>>
>>  	info = shrinker_info_protected(memcg, nid);
>> -	return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
>> +	unit = info->unit[shriner_id_to_index(shrinker->id)];
>> +	return atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
>>  }
>>
>>  void reparent_shrinker_deferred(struct mem_cgroup *memcg)
>>  {
>> -	int i, nid;
>> +	int nid, index, offset;
>>  	long nr;
>>  	struct mem_cgroup *parent;
>>  	struct shrinker_info *child_info, *parent_info;
>> +	struct shrinker_info_unit *child_unit, *parent_unit;
>>
>>  	parent = parent_mem_cgroup(memcg);
>>  	if (!parent)
>> @@ -239,9 +286,13 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
>>  	for_each_node(nid) {
>>  		child_info = shrinker_info_protected(memcg, nid);
>>  		parent_info = shrinker_info_protected(parent, nid);
>> -		for (i = 0; i < child_info->map_nr_max; i++) {
>> -			nr = atomic_long_read(&child_info->nr_deferred[i]);
>> -			atomic_long_add(nr, &parent_info->nr_deferred[i]);
>> +		for (index = 0; index < shriner_id_to_index(child_info->map_nr_max); index++) {
>> +			child_unit = child_info->unit[index];
>> +			parent_unit = parent_info->unit[index];
>> +			for (offset = 0; offset < SHRINKER_UNIT_BITS; offset++) {
>> +				nr = atomic_long_read(&child_unit->nr_deferred[offset]);
>> +				atomic_long_add(nr, &parent_unit->nr_deferred[offset]);
>> +			}
>>  		}
>>  	}
>>  	up_read(&shrinker_rwsem);
>> @@ -407,7 +458,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>  {
>>  	struct shrinker_info *info;
>>  	unsigned long ret, freed = 0;
>> -	int i;
>> +	int offset, index = 0;
>>
>>  	if (!mem_cgroup_online(memcg))
>>  		return 0;
>> @@ -419,56 +470,63 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>  	if (unlikely(!info))
>>  		goto unlock;
>>
>> -	for_each_set_bit(i, info->map, info->map_nr_max) {
>> -		struct shrink_control sc = {
>> -			.gfp_mask = gfp_mask,
>> -			.nid = nid,
>> -			.memcg = memcg,
>> -		};
>> -		struct shrinker *shrinker;
>> +	for (; index < shriner_id_to_index(info->map_nr_max); index++) {
>> +		struct shrinker_info_unit *unit;
>
> This adds another layer of indent to shrink_slab_memcg(). Please
> factor it first so that the code ends up being readable. Doing that
> first as a separate patch will also make the actual algorithm
> changes in this patch be much more obvious - this huge hunk of
> diff is pretty much impossible to review...

OK, I will send this patch together with PATCH v4 01/02/03/43 as a
single cleanup patchset.

Thanks,
Qi

>
> -Dave.