Subject: Re: [PATCH 2/5] mm: vmscan: make memcg slab shrink lockless
From: Kirill Tkhai <tkhai@ya.ru>
Date: Wed, 22 Feb 2023 23:05:43 +0300
To: Qi Zheng
Cc: sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin, Muchun Song, David Hildenbrand, Yang Shi
Message-ID: <23fde2e7-e9e2-2b0d-dfd8-1a654bc5503c@ya.ru>
References: <20230220091637.64865-1-zhengqi.arch@bytedance.com> <20230220091637.64865-3-zhengqi.arch@bytedance.com> <39cf7fa7-5dd6-0424-23de-1c33e948a201@bytedance.com>

On 22.02.2023 11:21, Qi Zheng
wrote:
>
>
> On 2023/2/22 16:16, Qi Zheng wrote:
>> Hi Kirill,
>>
>> On 2023/2/22 05:43, Kirill Tkhai wrote:
>>> On 20.02.2023 12:16, Qi Zheng wrote:
>>>> Like global slab shrink, since commit 1cd0bd06093c<...>
>>>>   static bool cgroup_reclaim(struct scan_control *sc)
>>>> @@ -891,15 +905,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>   {
>>>>       struct shrinker_info *info;
>>>>       unsigned long ret, freed = 0;
>>>> +    int srcu_idx;
>>>>       int i;
>>>>
>>>>       if (!mem_cgroup_online(memcg))
>>>>           return 0;
>>>>
>>>> -    if (!down_read_trylock(&shrinker_rwsem))
>>>> -        return 0;
>>>> -
>>>> -    info = shrinker_info_protected(memcg, nid);
>>>> +    srcu_idx = srcu_read_lock(&shrinker_srcu);
>>>> +    info = shrinker_info_srcu(memcg, nid);
>>>>       if (unlikely(!info))
>>>>           goto unlock;
>>>
>>> There is a shrinker_nr_max dereference under this hunk. It's not in the patch:
>>>
>>>          for_each_set_bit(i, info->map, shrinker_nr_max) {
>>>
>>> Since shrinker_nr_max may grow in parallel, this leads to access beyond allocated memory :(
>>
>> Oh, indeed.
>>
>>>
>>> It looks like we should save the size of info->map as a new member of struct shrinker_info.
>>
>> Agree, then we only traverse info->map_size each time. Like below:
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index b6eda2ab205d..f1b5d2803007 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -97,6 +97,7 @@ struct shrinker_info {
>>          struct rcu_head rcu;
>>          atomic_long_t *nr_deferred;
>>          unsigned long *map;
>> +       int map_size;

Sure, like this. The only thing I want to say (after rethinking) is that although "size" was my suggestion, the name "size" now makes me think of a size in bytes. Possibly something like map_nr_max would be better here.
>>   };
>>
>>   struct lruvec_stats_percpu {
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f94bfe540675..dd07eb107915 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -239,14 +239,20 @@ static void free_shrinker_info_rcu(struct rcu_head *head)
>>          kvfree(container_of(head, struct shrinker_info, rcu));
>>   }
>>
>> -static int expand_one_shrinker_info(struct mem_cgroup *memcg,
>> -                                   int map_size, int defer_size,
>> -                                   int old_map_size, int old_defer_size)
>> +static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_nr_max,
>> +                                   int old_nr_max)
>>   {
>> +       int map_size, defer_size, old_map_size, old_defer_size;
>>          struct shrinker_info *new, *old;
>>          struct mem_cgroup_per_node *pn;
>>          int nid;
>> -       int size = map_size + defer_size;
>> +       int size;
>> +
>> +       map_size = shrinker_map_size(new_nr_max);
>> +       defer_size = shrinker_defer_size(new_nr_max);
>> +       old_map_size = shrinker_map_size(shrinker_nr_max);
>> +       old_defer_size = shrinker_defer_size(shrinker_nr_max);
>
> Perhaps these should still be calculated outside the loop as before.

Yeah, for me it's also better.
>> +       size = map_size + defer_size;
>>
>>          for_each_node(nid) {
>>                  pn = memcg->nodeinfo[nid];
>> @@ -261,6 +267,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
>>
>>                  new->nr_deferred = (atomic_long_t *)(new + 1);
>>                  new->map = (void *)new->nr_deferred + defer_size;
>> +               new->map_size = new_nr_max;
>>
>>                  /* map: set all old bits, clear all new bits */
>>                  memset(new->map, (int)0xff, old_map_size);
>> @@ -310,6 +317,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>                  }
>>                  info->nr_deferred = (atomic_long_t *)(info + 1);
>>                  info->map = (void *)info->nr_deferred + defer_size;
>> +               info->map_size = shrinker_nr_max;
>>                  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>          }
>>          mutex_unlock(&shrinker_mutex);
>> @@ -327,8 +335,6 @@ static int expand_shrinker_info(int new_id)
>>   {
>>          int ret = 0;
>>          int new_nr_max = new_id + 1;
>> -       int map_size, defer_size = 0;
>> -       int old_map_size, old_defer_size = 0;
>>          struct mem_cgroup *memcg;
>>
>>          if (!need_expand(new_nr_max))
>> @@ -339,15 +345,9 @@ static int expand_shrinker_info(int new_id)
>>
>>          lockdep_assert_held(&shrinker_mutex);
>>
>> -       map_size = shrinker_map_size(new_nr_max);
>> -       defer_size = shrinker_defer_size(new_nr_max);
>> -       old_map_size = shrinker_map_size(shrinker_nr_max);
>> -       old_defer_size = shrinker_defer_size(shrinker_nr_max);
>> -
>>          memcg = mem_cgroup_iter(NULL, NULL, NULL);
>>          do {
>> -               ret = expand_one_shrinker_info(memcg, map_size, defer_size,
>> -                                              old_map_size, old_defer_size);
>> +               ret = expand_one_shrinker_info(memcg, new_nr_max, shrinker_nr_max);
>>                  if (ret) {
>>                          mem_cgroup_iter_break(NULL, memcg);
>>                          goto out;
>> @@ -912,7 +912,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>          if (unlikely(!info))
>>                  goto unlock;
>>
>> -       for_each_set_bit(i, info->map, shrinker_nr_max) {
>> +       for_each_set_bit(i, info->map, info->map_size) {
>>                  struct shrink_control sc = {
>>                          .gfp_mask = gfp_mask,
>>                          .nid = nid,
>>
>> I will send the v2.
>>
>> Thanks,
>> Qi
>>
>>>
>>>> @@ -949,14 +962,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>                   set_shrinker_bit(memcg, nid, i);
>>>>           }
>>>>           freed += ret;
>>>> -
>>>> -        if (rwsem_is_contended(&shrinker_rwsem)) {
>>>> -            freed = freed ? : 1;
>>>> -            break;
>>>> -        }
>>>>       }
>>>>   unlock:
>>>> -    up_read(&shrinker_rwsem);
>>>> +    srcu_read_unlock(&shrinker_srcu, srcu_idx);
>>>>       return freed;
>>>>   }
>>>>   #else /* CONFIG_MEMCG */
>>>
>>
>