From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77BA9C61DA4 for ; Thu, 23 Feb 2023 04:37:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D870A6B0074; Wed, 22 Feb 2023 23:37:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D35E86B0075; Wed, 22 Feb 2023 23:37:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BD6096B0078; Wed, 22 Feb 2023 23:37:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id A97296B0074 for ; Wed, 22 Feb 2023 23:37:30 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 86169A0F38 for ; Thu, 23 Feb 2023 04:37:30 +0000 (UTC) X-FDA: 80497297860.15.CD35110 Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) by imf04.hostedemail.com (Postfix) with ESMTP id 8FD6E40005 for ; Thu, 23 Feb 2023 04:37:28 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=jzTvOXJ6; spf=pass (imf04.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1677127048; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=bgEUJI1FbPYHDyZTL+1aW43M1EL7Z5ld07eE1dMmj9I=; b=TVpGeW0l20pqRGkbJLnkW6dPpSMnZKgaEWE7kd6K2oTJY0fyqokk8eXSxpxecgCYjAWDdc ZAhRw44rdH4K6zOG9oiI7Ui3t/iWH2mDcDjlyyGIcklOK6QdujQkqKCYlqaj7ppIgwZpYl 6C8Hv42lTsuUq0hQdF0E+HjGtQd7ydI= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=jzTvOXJ6; spf=pass (imf04.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1677127048; a=rsa-sha256; cv=none; b=x48ZNIglz7TOZtnBcJdpbdsBGpoBuejiqhcKgrhR0zB7yXXVtfgXViBizos9luzmq1BuLb tmTxahvmzZ0dluXYXuyJJoOuwuI/wiC3y9QFES8ui1jD9jR2sCEUEEfXVzA+AuYo6UKY4A VwvwOZ1ZGTAc4puzIRonRQMG068rFzQ= Received: by mail-pj1-f48.google.com with SMTP id qa18-20020a17090b4fd200b0023750b675f5so3186566pjb.3 for ; Wed, 22 Feb 2023 20:37:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; t=1677127047; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=bgEUJI1FbPYHDyZTL+1aW43M1EL7Z5ld07eE1dMmj9I=; b=jzTvOXJ6rPYIaMB7arzhn/vdCZ3Nirr0e9iwyS8dc50XLFgECg0anfkCbeFhYAWm5Q SzSA55KxTB7pSx5sHwJMsU2XkmwPM6W0nu9igVyVH+D8UxroXEhD4gupD+bp7cIO3daA ZfL2wAnEq2MEb0ww4KVv08NUR+KH0SDiCcs5epaTkv/0SeyYZhtHOKb5jPH6GzD9+CXf aBWMOKq9ruA/poVC0KNQrNCb/R0hzk/xC7tN5LkY/g0GhHvkDWV6PeSGvh/EoQhlZMep KJiw2u+1/SBr4urHhFgEPccJBy4hkcJSrYaOoYJIjdQ693nPcCH60yMof6iRaZzvJHIv jjxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1677127047; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bgEUJI1FbPYHDyZTL+1aW43M1EL7Z5ld07eE1dMmj9I=; b=H3Mdc2xBbKNP4zZbaXNS5ahwot+n5nzAhgny5a9J4efOw0o18mAdRDMD5m0zv8Lhx3 PUO1ZdBHtZZTSY+4vkFXL4SZ69YmyCQNzO2d5ZaJt5h07EhjeUKP+/IcV9xrZkdd7HKa 30qaS/ekT47EucBPoFEVdj1/gi9L/qHjqTDHCi3d0Mw/Mw1Jl1g9kMRtXaK5T/0zBUpG 0D/ZLxjQjNOxMXBE+1npTt2k2pfTMVyFca5Tk4afydX7fIYsXUSBbW/hCK1MLMLERFs+ tDlgg9bKgay9y5JkbTIYvYq7qre/Pj9eQaBjXgRwjbEm09HGmuzZEE5bqrJEWJ2iCCgh l6/Q== X-Gm-Message-State: AO0yUKVvpUQPYpCdrlo1ec5yTuL9azSKKUNjMM6XGtv6oe3jHcQ2Bds+ dcaKqo25YEr1AI2aWP1/1DokA0ugvD+jG9WR X-Google-Smtp-Source: AK7set+ENtMToE9BflcUsG3cSpXaHfuxXF93DgKsQa7jBdodIEfy25+u20zZPnr4l1O1SdzOgIBdcw== X-Received: by 2002:a17:902:d48d:b0:19a:a815:2853 with SMTP id c13-20020a170902d48d00b0019aa8152853mr11741156plg.1.1677127047388; Wed, 22 Feb 2023 20:37:27 -0800 (PST) Received: from [10.70.252.135] ([139.177.225.229]) by smtp.gmail.com with ESMTPSA id u5-20020a17090282c500b0019c2d664869sm859813plz.298.2023.02.22.20.37.21 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 22 Feb 2023 20:37:27 -0800 (PST) Message-ID: Date: Thu, 23 Feb 2023 12:37:19 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.7.2 Subject: Re: [PATCH 2/5] mm: vmscan: make memcg slab shrink lockless Content-Language: en-US To: Kirill Tkhai Cc: sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Johannes Weiner , Shakeel Butt , Michal Hocko , Roman Gushchin , Muchun Song , David Hildenbrand , Yang Shi References: <20230220091637.64865-1-zhengqi.arch@bytedance.com> <20230220091637.64865-3-zhengqi.arch@bytedance.com> <39cf7fa7-5dd6-0424-23de-1c33e948a201@bytedance.com> <23fde2e7-e9e2-2b0d-dfd8-1a654bc5503c@ya.ru> From: Qi Zheng In-Reply-To: <23fde2e7-e9e2-2b0d-dfd8-1a654bc5503c@ya.ru> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: h73wuaea4x4bwfj7hpdxi5yzynsh6bwz X-Rspamd-Queue-Id: 8FD6E40005 X-HE-Tag: 1677127048-53145 X-HE-Meta: U2FsdGVkX18ee2/ZYZFApbJOXVL3Tr4nbBv0xuMralT9y7F4ZinqWgKtbnE/m9dP+iUhoKBjSxTPhfngkHe45pv4tfMR4AO9j3q5O4FcmzCAXCXprKV58SodRTiHtLdN90tsZNVymxJ2ihYYHNAXvnUf9ZRZqJYjofWrZEV9pbRaSFFyIJOVJ73Z3hf123Gi0uZsbb4B8/H0hNLWT4LtQXjPVJfGKb6jqFGtzrplUeeZ2zqZyQ9bW7GHaj5CAOwRMUlq7TacoxZ/i9CQB8ueKuPEZSKt6ctc4dyhnfwL6iW8csDAyu3tqX46XA/lAyLy694kaA4jEP9nlrveqiJvYpGZvFFD7dUAktTtOcEM3lKP6SiTHVhaRy5Has1QBh2vR0uiTc5ZalkKSH4Bh4TjqxcGWejFeIBxL8y8eX2y43JtYp0ueHvds7x8MuEQjMVTJNUdXaVnKvGOrOvKTkoPk1EZlFnvnMscZcV1mPa5+MnHAnMcazNtC/730vb25FCa4ulWSWT4al+Aru8SFv3ci3j483fACoN2Pi3dnmDMTcuE0/mb0yN4lth5XvLWVLD3FvfdJsJj7xChBek8RN8rPnyBcOxi80lD1USIk5mL48CbeRIRfQelUWG5/Y3HNdQ7VzuCWWuhYj5C1R8OFHsXYoeCfldQ49yJsVe2+q4rGfXQKujySOSHrz2IfkMk6XHX4mCyg23FhREEKUxCgqbh2UmhY8Pjkf6Ihv0hw8f5Yj/wjICFALSlPAoXeMB0s04pBZ3IbgAutVGSQiRCM9X1tkJ+McQg88b7ugTAbhPe8haneNxb61kvjHAT5+pkIhwUO86lPNmxTJyDtPb5quw+HbrjrlpEo2HITpRijouhaTuwuVqDmcCXMXKesGbFdu6LwgnfrM5um0nxLZQxtFaD8w1pN6nJDAr78x0NtxxPPVumGekmxQPNnoqJ4h+nXxrOPUe1p2C9rlIoDXTZeN8 g+RNYMVV 4quHwCcE0C3ER4Y2VxOPzU29/YfCwFxAmD8ad+w8AbtIQDjtiepqxj168ruMeONSuHO0df1AzFy4GllwaBVSiK7kKjhnlzYuqGN7vlFGlFrrn27a43CLaRX/YFXf95HrKMuym09Rtke5U+gUUd6DETG/Oa2BZBiTEIuMPfsOv2HZ8T8C9xiHXmqZGsKdREc53nZFsIgnYvxVdwsnUzTMIK/Zn5LxlRYufyeZj8bDDnobOwd0FT+MDkA5TVTkCiYHrJnFeMWsUOieF72NHmv70fyxyebjDgR/D/1FipSucOT+QU5BfjOOn0aAozE2pCiezWIzVngKI6S9iiyCD/qvN3mDk2QTwxGtT+UocRrN8/izWh5FUKlyW1zr+80CyPkNbG63t2zmZsIKjfM6VCFKHk5KLDQEghAGQkuBq2XJTPWcr6X/ihfEDGIJrgGjUHpBWuf5w X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2023/2/23 04:05, Kirill Tkhai wrote: > On 22.02.2023 11:21, Qi Zheng wrote: >> >> >> On 2023/2/22 16:16, Qi Zheng wrote: >>> Hi Kirill, >>> >>> On 2023/2/22 05:43, Kirill Tkhai wrote: >>>> On 20.02.2023 12:16, Qi Zheng wrote: >>>>> Like global slab shrink, since commit 1cd0bd06093c<...> >>>>>   static bool cgroup_reclaim(struct scan_control *sc) >>>>> @@ -891,15 +905,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, >>>>>   { >>>>>       struct shrinker_info *info; >>>>>       unsigned long ret, freed = 0; >>>>> +    int srcu_idx; >>>>>       int i; >>>>>       if (!mem_cgroup_online(memcg)) >>>>>           return 0; >>>>> -    if (!down_read_trylock(&shrinker_rwsem)) >>>>> -        return 0; >>>>> - >>>>> -    info = shrinker_info_protected(memcg, nid); >>>>> +    srcu_idx = srcu_read_lock(&shrinker_srcu); >>>>> +    info = shrinker_info_srcu(memcg, nid); >>>>>       if (unlikely(!info)) >>>>>           goto unlock; >>>> >>>> There is shrinker_nr_max dereference under this hunk. It's not in the patch: >>>> >>>>          for_each_set_bit(i, info->map, shrinker_nr_max) { >>>> >>>> Since shrinker_nr_max may grow in parallel, this leads to access beyond allocated memory :( >>> >>> Oh, indeed. >>> >>>> >>>> It looks like we should save size of info->map as a new member of struct shrinker_info. >>> >>> Agree, then we only traverse info->map_size each time. Like below: >>> >>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >>> index b6eda2ab205d..f1b5d2803007 100644 >>> --- a/include/linux/memcontrol.h >>> +++ b/include/linux/memcontrol.h >>> @@ -97,6 +97,7 @@ struct shrinker_info { >>>          struct rcu_head rcu; >>>          atomic_long_t *nr_deferred; >>>          unsigned long *map; >>> +       int map_size; > > Sure, like this. The only thing (after rethinking) I want to say is that despite "size" was > may suggestion, now it makes me think that name "size" is about size in bytes. > > Possible, something like map_nr_max would be better here. Agree. Will change to it. > >>>   }; >>> >>>   struct lruvec_stats_percpu { >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index f94bfe540675..dd07eb107915 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -239,14 +239,20 @@ static void free_shrinker_info_rcu(struct rcu_head *head) >>>          kvfree(container_of(head, struct shrinker_info, rcu)); >>>   } >>> >>> -static int expand_one_shrinker_info(struct mem_cgroup *memcg, >>> -                                   int map_size, int defer_size, >>> -                                   int old_map_size, int old_defer_size) >>> +static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_nr_max, >>> +                                   int old_nr_max) >>>   { >>> +       int map_size, defer_size, old_map_size, old_defer_size; >>>          struct shrinker_info *new, *old; >>>          struct mem_cgroup_per_node *pn; >>>          int nid; >>> -       int size = map_size + defer_size; >>> +       int size; >>> + >>> +       map_size = shrinker_map_size(new_nr_max); >>> +       defer_size = shrinker_defer_size(new_nr_max); >>> +       old_map_size = shrinker_map_size(shrinker_nr_max); >>> +       old_defer_size = shrinker_defer_size(shrinker_nr_max); >> >> Perhaps these should still be calculated outside the loop as before. > > Yeah, for me it's also better. > >>> +       size = map_size + defer_size; >>> >>>          for_each_node(nid) { >>>                  pn = memcg->nodeinfo[nid]; >>> @@ -261,6 +267,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg, >>> >>>                  new->nr_deferred = (atomic_long_t *)(new + 1); >>>                  new->map = (void *)new->nr_deferred + defer_size; >>> +               new->map_size = new_nr_max; >>> >>>                  /* map: set all old bits, clear all new bits */ >>>                  memset(new->map, (int)0xff, old_map_size); >>> @@ -310,6 +317,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg) >>>                  } >>>                  info->nr_deferred = (atomic_long_t *)(info + 1); >>>                  info->map = (void *)info->nr_deferred + defer_size; >>> +               info->map_size = shrinker_nr_max; >>>                  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info); >>>          } >>>          mutex_unlock(&shrinker_mutex); >>> @@ -327,8 +335,6 @@ static int expand_shrinker_info(int new_id) >>>   { >>>          int ret = 0; >>>          int new_nr_max = new_id + 1; >>> -       int map_size, defer_size = 0; >>> -       int old_map_size, old_defer_size = 0; >>>          struct mem_cgroup *memcg; >>> >>>          if (!need_expand(new_nr_max)) >>> @@ -339,15 +345,9 @@ static int expand_shrinker_info(int new_id) >>> >>>          lockdep_assert_held(&shrinker_mutex); >>> >>> -       map_size = shrinker_map_size(new_nr_max); >>> -       defer_size = shrinker_defer_size(new_nr_max); >>> -       old_map_size = shrinker_map_size(shrinker_nr_max); >>> -       old_defer_size = shrinker_defer_size(shrinker_nr_max); >>> - >>>          memcg = mem_cgroup_iter(NULL, NULL, NULL); >>>          do { >>> -               ret = expand_one_shrinker_info(memcg, map_size, defer_size, >>> -                                              old_map_size, old_defer_size); >>> +               ret = expand_one_shrinker_info(memcg, new_nr_max, shrinker_nr_max); >>>                  if (ret) { >>>                          mem_cgroup_iter_break(NULL, memcg); >>>                          goto out; >>> @@ -912,7 +912,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, >>>          if (unlikely(!info)) >>>                  goto unlock; >>> >>> -       for_each_set_bit(i, info->map, shrinker_nr_max) { >>> +       for_each_set_bit(i, info->map, info->map_size) { >>>                  struct shrink_control sc = { >>>                          .gfp_mask = gfp_mask, >>>                          .nid = nid, >>> >>> I will send the v2. >>> >>> Thanks, >>> Qi >>> >>>> >>>>> @@ -949,14 +962,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, >>>>>                   set_shrinker_bit(memcg, nid, i); >>>>>           } >>>>>           freed += ret; >>>>> - >>>>> -        if (rwsem_is_contended(&shrinker_rwsem)) { >>>>> -            freed = freed ? : 1; >>>>> -            break; >>>>> -        } >>>>>       } >>>>>   unlock: >>>>> -    up_read(&shrinker_rwsem); >>>>> +    srcu_read_unlock(&shrinker_srcu, srcu_idx); >>>>>       return freed; >>>>>   } >>>>>   #else /* CONFIG_MEMCG */ >>>> >>> >> > -- Thanks, Qi