From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 977EDC7EE23 for ; Sat, 25 Feb 2023 21:28:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A31B16B0071; Sat, 25 Feb 2023 16:28:44 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9E2A26B0073; Sat, 25 Feb 2023 16:28:44 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8A9C86B0074; Sat, 25 Feb 2023 16:28:44 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 76D046B0071 for ; Sat, 25 Feb 2023 16:28:44 -0500 (EST) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 379B41A0247 for ; Sat, 25 Feb 2023 21:28:44 +0000 (UTC) X-FDA: 80507103768.13.6DF3F7E Received: from forward500b.mail.yandex.net (forward500b.mail.yandex.net [178.154.239.144]) by imf13.hostedemail.com (Postfix) with ESMTP id A9EC52000E for ; Sat, 25 Feb 2023 21:28:41 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=ya.ru header.s=mail header.b=vm+H5gme; spf=pass (imf13.hostedemail.com: domain of tkhai@ya.ru designates 178.154.239.144 as permitted sender) smtp.mailfrom=tkhai@ya.ru; dmarc=pass (policy=none) header.from=ya.ru ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1677360522; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=SHc+R/cTRGNnSQKTHxpZJZ2zZcUrDIgedP9JZjwRsMM=; b=72mcVPb5qSFT3GdqbIf9JomTAxS9zwl1/fbwpXIvBUwMC6nQ38ajHeoRBLttxC+QDQ46fZ DmijS9Lk+5wGOwDGjXFZ2/nstuaUCuXAXN84j8U1h7jWTfWz68zpi08FTWe5f685uOpR3y rzzEvh30/CgcPHULnqS+fx/Xb/bVUH8= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=ya.ru header.s=mail header.b=vm+H5gme; spf=pass (imf13.hostedemail.com: domain of tkhai@ya.ru designates 178.154.239.144 as permitted sender) smtp.mailfrom=tkhai@ya.ru; dmarc=pass (policy=none) header.from=ya.ru ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1677360522; a=rsa-sha256; cv=none; b=FSYVug/qv+AM2qbpXlle8xvI+CQfcJk9b1D7xyFt8sdlZdvTVrvuwSrqkbv44IZMXRBfZp t12wGqCu/lkhFOcoSUYH5hn7uUkR9rOGIaXYsnHM7SyMvwHCrYSSDE//tWIkbHfprSkaKl CP+CBqdxfDHfuecJlgxE2qsGxWW9xiM= Received: from sas8-838e1e461505.qloud-c.yandex.net (sas8-838e1e461505.qloud-c.yandex.net [IPv6:2a02:6b8:c1b:28d:0:640:838e:1e46]) by forward500b.mail.yandex.net (Yandex) with ESMTP id B688E5EA02; Sun, 26 Feb 2023 00:28:38 +0300 (MSK) Received: by sas8-838e1e461505.qloud-c.yandex.net (smtp/Yandex) with ESMTPSA id WSZIpPQaBeA1-JjOV2dp5; Sun, 26 Feb 2023 00:28:36 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ya.ru; s=mail; t=1677360516; bh=SHc+R/cTRGNnSQKTHxpZJZ2zZcUrDIgedP9JZjwRsMM=; h=From:In-Reply-To:Cc:Date:References:To:Subject:Message-ID; b=vm+H5gmeFIFVO1g601VkmuMwvWq2lerqoWIWIJe+1mVEdodWzFpQsZVZfx50L02NJ yllPTGOqLLMyFMv3YRi0jzqE/AyX12e2dO+DcGpUnyJJaA8I10pqvH2aLf6bM/W6Bs WrwiUjYsDVSu/myvPzvb5VKQ+mItChHR+lic+4QY= Message-ID: <8dab3b27-7282-f8bc-7d04-ca63c9b872cf@ya.ru> Date: Sun, 26 Feb 2023 00:28:31 +0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [PATCH v2 2/7] mm: vmscan: make global slab shrink lockless To: Qi Zheng , Sultan Alsawaf Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com, dave@stgolabs.net, penguin-kernel@i-love.sakura.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20230223132725.11685-1-zhengqi.arch@bytedance.com> <20230223132725.11685-3-zhengqi.arch@bytedance.com> <8049b6ed-435f-b518-f947-5516a514aec2@bytedance.com> <1aa70926-af39-0ce1-ae23-d86deb74d1c6@ya.ru> <74c4cf95-9506-98b3-9fc0-0814f63d5d7f@bytedance.com> <5663b349-8f6f-874a-eb9b-63d3179dcab7@ya.ru> <2ba86f45-f0a5-3a85-4aa6-f8beb50491b3@bytedance.com> Content-Language: en-US From: Kirill Tkhai In-Reply-To: <2ba86f45-f0a5-3a85-4aa6-f8beb50491b3@bytedance.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Stat-Signature: zekgpmsjn67odhcjw1t7rxzt8yay4kuw X-Rspam-User: X-Rspamd-Queue-Id: A9EC52000E X-Rspamd-Server: rspam06 X-HE-Tag: 1677360521-74930 X-HE-Meta: U2FsdGVkX1/1Qq2TM0zddKokPXvWrDkq7Q++GxSgGfLZAWuDzpO8vL8sx1330rdJbvtAB43C0s6FajeBE2b+q0q8RwgF60AqOEYKciBZqh+V6lFBkdzo6FjpaFU5dnme176PcfEayhxVLWUS//rssM8n1++b/IL660D1E8+sXEdKKhnHoKQvtPiCSZwLR0NYUPEmpMAGLGPMes7HsYCaeKt7FRTxzV9/lEp4jQxDPkTEVJCrT/l9a0JfLSDSjDSqfq/kdZjco+AuuTV3/4KxaKJHoH5Rm4Oa373f9VsbHzPoTcOh6ZJi8ELdh/lgxUlhK5HFISxoHwgzT/6Z5eE4+5xhgXcYYguNDjdx/GX3c9hoHdgoOcpREjgpkj/MbjOSd+nn4Cmctg/r561NGhdURuAlSW9BEL7imIugkt9vuA3Nnx3bIdphpjA3ot4cBkIyn4b80lRMW02GbZJ4jIsLRivbHWejhqBhR8tw0DJut9fY46ostd9YcRHWHWrlutcEQ/M9YTwK6pczsvyd9DV93whkwezl1lyQb8Gs94uChtkBkrF5SXdawNSCZbMrymYg66PFmeI5qfjj8Bmm52BlhE1HEbnoX5NM/0YzwTI8kpN6kAI9ibkPc3Xzf4YbkSPTwPcwhvePK48YEIFNLO3HfXc0Rbu3zfLhgHFyMsuLi1mhWDMAK1Jbo7hIaFYEv2JlmimLNoC0gvg1crtT2JIjibWYq3/XCPIhoMCs6YyYCrY6m1X5xUMi9xk7cNINFQAiSI+rU2oNli3Eo0xKT4p9a/H79Fgy/VD+8p9qDVzANSnNfk2hDvL52A4uixv0z7Z25EmEJ3ZTVrT3hnhfRX7sr0/F6zI8bmzBxGEstLSqZ5ixce+UKHakvNABsFHEyEqAYptSkaYp9GK3iv0veXc9M0O9LmqwNoMnzmj/nDkovIcarw+xLjIMRn6ZVfszriQNAeRlnRiKUlM1ubmlXko wOQrpJag Alcs/Ra5vPHWQnVH7atE9u+vW+BwtPaqzreDSdK8ocfoqSEPRa1FKrd+V2PW4MvzNNcrGNUHCby0ox1HntYVl9EK5jV2ECZpMAtgLqNcFcD+ZHdjOuRc7cMftj5161dornFbC+X8KVy5H/tfAwn8Ozn8RmXmq906XfcDMxrPu6h69FGW5LYHo3C3g4FE91dJ7do2ofXGhYO1vWnVtlJexbWT4wOgGod/ahSigtvJojb0BHw8kGb2Gkij22fIk3J/3PjCf X-Bogosity: Ham, tests=bogofilter, spamicity=0.000016, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 25.02.2023 19:37, Qi Zheng wrote: > > > On 2023/2/26 00:17, Kirill Tkhai wrote: >> On 25.02.2023 18:57, Qi Zheng wrote: >>> > <...> >>> How about this? >>>>> >>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>> index ffddbd204259..9d8c53075298 100644 >>>>> --- a/mm/vmscan.c >>>>> +++ b/mm/vmscan.c >>>>> @@ -1012,7 +1012,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, >>>>>                                    int priority) >>>>>    { >>>>>           unsigned long ret, freed = 0; >>>>> -       struct shrinker *shrinker; >>>>> +       struct shrinker *shrinker = NULL; >>>>>           int srcu_idx, generation; >>>>> >>>>>           /* >>>>> @@ -1025,11 +1025,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, >>>>>           if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) >>>>>                   return shrink_slab_memcg(gfp_mask, nid, memcg, priority); >>>>> >>>>> +again: >>>>>           srcu_idx = srcu_read_lock(&shrinker_srcu); >>>>> >>>>>           generation = atomic_read(&shrinker_srcu_generation); >>>>> -       list_for_each_entry_srcu(shrinker, &shrinker_list, list, >>>>> -                                srcu_read_lock_held(&shrinker_srcu)) { >>>>> +       if (!shrinker) >>>>> +               shrinker = list_entry_rcu(shrinker_list.next, struct shrinker, list); >>>>> +       else >>>>> +               shrinker = list_entry_rcu(shrinker->list.next, struct shrinker, list); >>>>> +       list_for_each_entry_from_rcu(shrinker, &shrinker_list, list) { >>>>>                   struct shrink_control sc = { >>>>>                           .gfp_mask = gfp_mask, >>>>>                           .nid = nid, >>>>> @@ -1042,8 +1046,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, >>>>>                   freed += ret; >>>>> >>>>>                   if (atomic_read(&shrinker_srcu_generation) != generation) { >>>>> -                       freed = freed ? : 1; >>>>> -                       break; >>>>> +                       srcu_read_unlock(&shrinker_srcu, srcu_idx); >> >> After SRCU in unlocked we can't believe @shrinker anymore. So, above list_entry_rcu(shrinker->list.next) >> dereferences some random memory. > > Indeed. > >> >>>>> +                       cond_resched(); >>>>> +                       goto again; >>>>>                   } >>>>>           } >>>>> >>>>>> >>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>> index 27ef9946ae8a..0b197bba1257 100644 >>>>>> --- a/mm/vmscan.c >>>>>> +++ b/mm/vmscan.c >>>>>> @@ -204,6 +204,7 @@ static void set_task_reclaim_state(struct task_struct *task, >>>>>>     LIST_HEAD(shrinker_list); >>>>>>     DEFINE_MUTEX(shrinker_mutex); >>>>>>     DEFINE_SRCU(shrinker_srcu); >>>>>> +static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0); >>>>>>       #ifdef CONFIG_MEMCG >>>>>>     static int shrinker_nr_max; >>>>>> @@ -782,6 +783,7 @@ void unregister_shrinker(struct shrinker *shrinker) >>>>>>         debugfs_entry = shrinker_debugfs_remove(shrinker); >>>>>>         mutex_unlock(&shrinker_mutex); >>>>>>     +    atomic_inc(&shrinker_srcu_generation); >>>>>>         synchronize_srcu(&shrinker_srcu); >>>>>>           debugfs_remove_recursive(debugfs_entry); >>>>>> @@ -799,6 +801,7 @@ EXPORT_SYMBOL(unregister_shrinker); >>>>>>      */ >>>>>>     void synchronize_shrinkers(void) >>>>>>     { >>>>>> +    atomic_inc(&shrinker_srcu_generation); >>>>>>         synchronize_srcu(&shrinker_srcu); >>>>>>     } >>>>>>     EXPORT_SYMBOL(synchronize_shrinkers); >>>>>> @@ -908,18 +911,19 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, >>>>>>     { >>>>>>         struct shrinker_info *info; >>>>>>         unsigned long ret, freed = 0; >>>>>> -    int srcu_idx; >>>>>> -    int i; >>>>>> +    int srcu_idx, generation; >>>>>> +    int i = 0; >>>>>>           if (!mem_cgroup_online(memcg)) >>>>>>             return 0; >>>>>> - >>>>>> +again: >>>>>>         srcu_idx = srcu_read_lock(&shrinker_srcu); >>>>>>         info = shrinker_info_srcu(memcg, nid); >>>>>>         if (unlikely(!info)) >>>>>>             goto unlock; >>>>>>     -    for_each_set_bit(i, info->map, info->map_nr_max) { >>>>>> +    generation = atomic_read(&shrinker_srcu_generation); >>>>>> +    for_each_set_bit_from(i, info->map, info->map_nr_max) { >>>>>>             struct shrink_control sc = { >>>>>>                 .gfp_mask = gfp_mask, >>>>>>                 .nid = nid, >>>>>> @@ -965,6 +969,11 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, >>>>>>                     set_shrinker_bit(memcg, nid, i); >>>>>>             } >>>>>>             freed += ret; >>>>>> + >>>>>> +        if (atomic_read(&shrinker_srcu_generation) != generation) { >>>>>> +            srcu_read_unlock(&shrinker_srcu, srcu_idx); >>>>> >>>>> Maybe we can add the following code here, so as to avoid repeating the >>>>> current id and avoid triggering softlockup: >>>>> >>>>>               i++; >> >> This is OK. >> >>>>>               cond_resched(); >> >> Possible, existing cond_resched() in do_shrink_slab() is enough. > > Yeah. > > I will add this patch in the next version. May I mark you as the author > of this patch? I think, yes >> >>> And this. :) >>> >>> Thanks, >>> Qi >>> >>>>> >>>>> Thanks, >>>>> Qi >>>>> >>>>>> +            goto again; >>>>>> +        } >>>>>>         } >>>>>>     unlock: >>>>>>         srcu_read_unlock(&shrinker_srcu, srcu_idx); >>>>>> @@ -1004,7 +1013,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, >>>>>>     { >>>>>>         unsigned long ret, freed = 0; >>>>>>         struct shrinker *shrinker; >>>>>> -    int srcu_idx; >>>>>> +    int srcu_idx, generation; >>>>>>           /* >>>>>>          * The root memcg might be allocated even though memcg is disabled >>>>>> @@ -1017,6 +1026,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, >>>>>>             return shrink_slab_memcg(gfp_mask, nid, memcg, priority); >>>>>>           srcu_idx = srcu_read_lock(&shrinker_srcu); >>>>>> +    generation = atomic_read(&shrinker_srcu_generation); >>>>>>           list_for_each_entry_srcu(shrinker, &shrinker_list, list, >>>>>>                      srcu_read_lock_held(&shrinker_srcu)) { >>>>>> @@ -1030,6 +1040,11 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, >>>>>>             if (ret == SHRINK_EMPTY) >>>>>>                 ret = 0; >>>>>>             freed += ret; >>>>>> + >>>>>> +        if (atomic_read(&shrinker_srcu_generation) != generation) { >>>>>> +            freed = freed ? : 1; >>>>>> +            break; >>>>>> +        } >>>>>>         } >>>>>>           srcu_read_unlock(&shrinker_srcu, srcu_idx); >>>>>> >>>>>> >>>>> >>>> >>> >> >