From: Qi Zheng
Date: Sun, 26 Feb 2023 00:37:34 +0800
Subject: Re: [PATCH v2 2/7] mm: vmscan: make global slab shrink lockless
To: Kirill Tkhai, Sultan Alsawaf
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, shakeelb@google.com,
 mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev,
 david@redhat.com, shy828301@gmail.com, dave@stgolabs.net,
 penguin-kernel@i-love.sakura.ne.jp, paulmck@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <2ba86f45-f0a5-3a85-4aa6-f8beb50491b3@bytedance.com>
In-Reply-To: <5663b349-8f6f-874a-eb9b-63d3179dcab7@ya.ru>
References: <20230223132725.11685-1-zhengqi.arch@bytedance.com>
 <20230223132725.11685-3-zhengqi.arch@bytedance.com>
 <8049b6ed-435f-b518-f947-5516a514aec2@bytedance.com>
 <1aa70926-af39-0ce1-ae23-d86deb74d1c6@ya.ru>
 <74c4cf95-9506-98b3-9fc0-0814f63d5d7f@bytedance.com>
 <5663b349-8f6f-874a-eb9b-63d3179dcab7@ya.ru>

On 2023/2/26 00:17, Kirill Tkhai wrote:
> On 25.02.2023 18:57, Qi Zheng wrote:
>> <...>
>> How about this?
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index ffddbd204259..9d8c53075298 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1012,7 +1012,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>>>                                   int priority)
>>>>  {
>>>>         unsigned long ret, freed = 0;
>>>> -       struct shrinker *shrinker;
>>>> +       struct shrinker *shrinker = NULL;
>>>>         int srcu_idx, generation;
>>>>
>>>>         /*
>>>> @@ -1025,11 +1025,15 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>>>         if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>>>>                 return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>>>>
>>>> +again:
>>>>         srcu_idx = srcu_read_lock(&shrinker_srcu);
>>>>
>>>>         generation = atomic_read(&shrinker_srcu_generation);
>>>> -       list_for_each_entry_srcu(shrinker, &shrinker_list, list,
>>>> -                                srcu_read_lock_held(&shrinker_srcu)) {
>>>> +       if (!shrinker)
>>>> +               shrinker = list_entry_rcu(shrinker_list.next, struct shrinker, list);
>>>> +       else
>>>> +               shrinker = list_entry_rcu(shrinker->list.next, struct shrinker, list);
>>>> +       list_for_each_entry_from_rcu(shrinker, &shrinker_list, list) {
>>>>                 struct shrink_control sc = {
>>>>                         .gfp_mask = gfp_mask,
>>>>                         .nid = nid,
>>>> @@ -1042,8 +1046,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>>>                 freed += ret;
>>>>
>>>>                 if (atomic_read(&shrinker_srcu_generation) != generation) {
>>>> -                       freed = freed ? : 1;
>>>> -                       break;
>>>> +                       srcu_read_unlock(&shrinker_srcu, srcu_idx);
>
> After SRCU is unlocked we can't trust @shrinker anymore. So, the
> list_entry_rcu(shrinker->list.next) above dereferences some random memory.

Indeed.
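
To spell out why the index-based restart in shrink_slab_memcg() does not
have this problem, here is a rough userspace sketch. It is not kernel
code and it is single-threaded: the "concurrent" unregister is faked by
the visitor callback, and the SRCU calls only appear as comments. All
names in it (slot_present, generation, slots_reset, slot_unregister,
walk_slots, drop_slot_five_at_two) are made up for illustration. The
point is only that the walker carries an integer index across the
unlock, and every lookup after "goto again" is redone from the table
rather than through a pointer saved before the unlock.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 8

/* Hypothetical stand-ins for the shrinker bitmap and the generation counter. */
static bool slot_present[NR_SLOTS];
static int generation;

/* Mark every slot as registered and reset the counter. */
static void slots_reset(void)
{
	for (int i = 0; i < NR_SLOTS; i++)
		slot_present[i] = true;
	generation = 0;
}

/* Models an unregister: remove the slot, then bump the generation. */
static void slot_unregister(int id)
{
	assert(id >= 0 && id < NR_SLOTS);
	slot_present[id] = false;
	generation++;
}

/* Example visitor: while slot 2 is being processed, slot 5 goes away. */
static void drop_slot_five_at_two(int id)
{
	if (id == 2)
		slot_unregister(5);
}

/*
 * Visit every present slot once. If the generation changes during a
 * visit, leave the read section and restart from the next index. Only
 * the integer i survives the restart; slot_present[] is re-read.
 */
static int walk_slots(void (*visit)(int id))
{
	int visited = 0;
	int seen;
	int i = 0;

again:
	/* srcu_read_lock() would be taken here */
	seen = generation;
	for (; i < NR_SLOTS; i++) {
		if (!slot_present[i])
			continue;
		visit(i);
		visited++;
		if (generation != seen) {
			/* srcu_read_unlock() would be called here */
			i++;	/* do not repeat the current id */
			goto again;
		}
	}
	/* srcu_read_unlock() would be called here */
	return visited;
}
```

After slots_reset(), walk_slots(drop_slot_five_at_two) visits slots 0-4,
6 and 7 once each and never touches slot 5. A list walk has no such
index to fall back on, which is why the saved @shrinker pointer cannot
be reused after the unlock.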

>
>>>> +                       cond_resched();
>>>> +                       goto again;
>>>>                 }
>>>>         }
>>>>
>>>>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 27ef9946ae8a..0b197bba1257 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -204,6 +204,7 @@ static void set_task_reclaim_state(struct task_struct *task,
>>>>>  LIST_HEAD(shrinker_list);
>>>>>  DEFINE_MUTEX(shrinker_mutex);
>>>>>  DEFINE_SRCU(shrinker_srcu);
>>>>> +static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0);
>>>>>
>>>>>  #ifdef CONFIG_MEMCG
>>>>>  static int shrinker_nr_max;
>>>>> @@ -782,6 +783,7 @@ void unregister_shrinker(struct shrinker *shrinker)
>>>>>      debugfs_entry = shrinker_debugfs_remove(shrinker);
>>>>>      mutex_unlock(&shrinker_mutex);
>>>>>
>>>>> +    atomic_inc(&shrinker_srcu_generation);
>>>>>      synchronize_srcu(&shrinker_srcu);
>>>>>
>>>>>      debugfs_remove_recursive(debugfs_entry);
>>>>> @@ -799,6 +801,7 @@ EXPORT_SYMBOL(unregister_shrinker);
>>>>>   */
>>>>>  void synchronize_shrinkers(void)
>>>>>  {
>>>>> +    atomic_inc(&shrinker_srcu_generation);
>>>>>      synchronize_srcu(&shrinker_srcu);
>>>>>  }
>>>>>  EXPORT_SYMBOL(synchronize_shrinkers);
>>>>> @@ -908,18 +911,19 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>>  {
>>>>>      struct shrinker_info *info;
>>>>>      unsigned long ret, freed = 0;
>>>>> -    int srcu_idx;
>>>>> -    int i;
>>>>> +    int srcu_idx, generation;
>>>>> +    int i = 0;
>>>>>
>>>>>      if (!mem_cgroup_online(memcg))
>>>>>          return 0;
>>>>> -
>>>>> +again:
>>>>>      srcu_idx = srcu_read_lock(&shrinker_srcu);
>>>>>      info = shrinker_info_srcu(memcg, nid);
>>>>>      if (unlikely(!info))
>>>>>          goto unlock;
>>>>>
>>>>> -    for_each_set_bit(i, info->map, info->map_nr_max) {
>>>>> +    generation = atomic_read(&shrinker_srcu_generation);
>>>>> +    for_each_set_bit_from(i, info->map, info->map_nr_max) {
>>>>>          struct shrink_control sc = {
>>>>>              .gfp_mask = gfp_mask,
>>>>>              .nid = nid,
>>>>> @@ -965,6 +969,11 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>>                  set_shrinker_bit(memcg, nid, i);
>>>>>          }
>>>>>          freed += ret;
>>>>> +
>>>>> +        if (atomic_read(&shrinker_srcu_generation) != generation) {
>>>>> +            srcu_read_unlock(&shrinker_srcu, srcu_idx);
>>>>
>>>> Maybe we can add the following code here, so as to avoid repeating the
>>>> current id and avoid triggering a softlockup:
>>>>
>>>>              i++;
>
> This is OK.
>
>>>>              cond_resched();
>
> Possibly, the existing cond_resched() in do_shrink_slab() is enough.

Yeah.

I will add this patch in the next version. May I mark you as the author
of this patch?

Thanks,
Qi

>
>> And this. :)
>>
>> Thanks,
>> Qi
>>
>>>>
>>>> Thanks,
>>>> Qi
>>>>
>>>>> +            goto again;
>>>>> +        }
>>>>>      }
>>>>>  unlock:
>>>>>      srcu_read_unlock(&shrinker_srcu, srcu_idx);
>>>>> @@ -1004,7 +1013,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>>>>  {
>>>>>      unsigned long ret, freed = 0;
>>>>>      struct shrinker *shrinker;
>>>>> -    int srcu_idx;
>>>>> +    int srcu_idx, generation;
>>>>>
>>>>>      /*
>>>>>       * The root memcg might be allocated even though memcg is disabled
>>>>> @@ -1017,6 +1026,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>>>>          return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>>>>>
>>>>>      srcu_idx = srcu_read_lock(&shrinker_srcu);
>>>>> +    generation = atomic_read(&shrinker_srcu_generation);
>>>>>
>>>>>      list_for_each_entry_srcu(shrinker, &shrinker_list, list,
>>>>>                   srcu_read_lock_held(&shrinker_srcu)) {
>>>>> @@ -1030,6 +1040,11 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>>>>          if (ret == SHRINK_EMPTY)
>>>>>              ret = 0;
>>>>>          freed += ret;
>>>>> +
>>>>> +        if (atomic_read(&shrinker_srcu_generation) != generation) {
>>>>> +            freed = freed ? : 1;
>>>>> +            break;
>>>>> +        }
>>>>>      }
>>>>>
>>>>>      srcu_read_unlock(&shrinker_srcu, srcu_idx);
>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
Thanks,
Qi