From: Qi Zheng
To: Kirill Tkhai, Mike Rapoport
Cc: Andrew Morton, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com, sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@i-love.sakura.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/8] make slab shrink lockless
Date: Tue, 28 Feb 2023 18:08:18 +0800
Message-ID: <25c644e2-70a0-9544-47db-46dd88b993d3@bytedance.com>
References: <20230226144655.79778-1-zhengqi.arch@bytedance.com> <20230226115100.7e12bda7931dd65dbabcebe3@linux-foundation.org>

On 2023/2/28 03:20, Kirill Tkhai wrote:
> On 27.02.2023 18:08, Mike Rapoport wrote:
>> Hi,
>>
>> On Mon, Feb 27, 2023 at 09:31:51PM +0800, Qi Zheng wrote:
>>>
>>> On 2023/2/27 03:51, Andrew Morton wrote:
>>>> On Sun, 26 Feb 2023 22:46:47 +0800 Qi Zheng wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> This patch series aims to make slab shrink lockless.
>>>>
>>>> What an awesome changelog.
>>>>
>>>>> 2. Survey
>>>>> =========
>>>>
>>>> Especially this part.
>>>>
>>>> Looking through all the prior efforts and at this patchset I am not
>>>> immediately seeing any statements about the overall effect upon
>>>> real-world workloads. For a good example, does this patchset
>>>> measurably improve throughput or energy consumption on your servers?
>>>
>>> Hi Andrew,
>>>
>>> I re-tested on the following physical machine:
>>>
>>> Architecture:        x86_64
>>> CPU(s):              96
>>> On-line CPU(s) list: 0-95
>>> Model name:          Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
>>>
>>> I found that the cause I gave for the hotspot in the cover letter was
>>> wrong. The down_read_trylock() hotspot is not caused by trylock
>>> failures, but simply by the atomic operation (cmpxchg) itself, which
>>> significantly reduces IPC (instructions per cycle).
>>
>> ...
>>
>>> Then we can use the following perf command to view hotspots:
>>>
>>> perf top -U -F 999
>>>
>>> 1) Before applying this patchset:
>>>
>>>   32.31%  [kernel]  [k] down_read_trylock
>>>   19.40%  [kernel]  [k] pv_native_safe_halt
>>>   16.24%  [kernel]  [k] up_read
>>>   15.70%  [kernel]  [k] shrink_slab
>>>    4.69%  [kernel]  [k] _find_next_bit
>>>    2.62%  [kernel]  [k] shrink_node
>>>    1.78%  [kernel]  [k] shrink_lruvec
>>>    0.76%  [kernel]  [k] do_shrink_slab
>>>
>>> 2) After applying this patchset:
>>>
>>>   27.83%  [kernel]  [k] _find_next_bit
>>>   16.97%  [kernel]  [k] shrink_slab
>>>   15.82%  [kernel]  [k] pv_native_safe_halt
>>>    9.58%  [kernel]  [k] shrink_node
>>>    8.31%  [kernel]  [k] shrink_lruvec
>>>    5.64%  [kernel]  [k] do_shrink_slab
>>>    3.88%  [kernel]  [k] mem_cgroup_iter
>>>
>>> 2. At the same time, we use the following perf command to capture IPC
>>> information:
>>>
>>> perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10
>>>
>>> 1) Before applying this patchset:
>>>
>>>  Performance counter stats for 'system wide' (5 runs):
>>>
>>>    454187219766      cycles        test                          ( +-  1.84% )
>>>     78896433101      instructions  test  #  0.17 insn per cycle  ( +-  0.44% )
>>>
>>>    10.0020430 +- 0.0000366 seconds time elapsed  ( +-  0.00% )
>>>
>>> 2) After applying this patchset:
>>>
>>>  Performance counter stats for 'system wide' (5 runs):
>>>
>>>    841954709443      cycles        test                          ( +- 15.80% )  (98.69%)
>>>    527258677936      instructions  test  #  0.63 insn per cycle  ( +- 15.11% )  (98.68%)
>>>
>>>    10.01064 +- 0.00831 seconds time elapsed  ( +-  0.08% )
>>>
>>> We can see that IPC drops sharply when down_read_trylock() is called
>>> at high frequency. After switching to SRCU, IPC is back at a normal
>>> level.
>>
>> The results you present do show an improvement in IPC for an artificial
>> test script, but it would be more interesting to see how real-world
>> workloads benefit from your changes.
>
> One real workload from my experience is the startup of an overcommitted
> node with many containers starting after a node crash (or many containers
> resuming after a reboot for a kernel update). In these cases memory
> pressure is huge, and the node spends a long time going round in reclaim.

Thanks a lot for providing this real workload! :)

>
> This patchset makes prealloc_memcg_shrinker() independent of do_shrink_slab(),
> so prealloc_memcg_shrinker() no longer has to wait until shrink_slab_memcg()
> completes its current bit iteration, sees rwsem_is_contended(), and breaks
> out of the iteration.
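The "insn per cycle" figures in the perf stat output above are simply instructions divided by cycles. As a quick cross-check, here is a minimal C sketch that recomputes them from the quoted system-wide totals; the helper names ipc() and print_ipc_comparison() are made up for illustration and are not part of perf or of the patchset.

```c
#include <stdint.h>
#include <stdio.h>

/* Instructions retired per CPU cycle; returns 0 if no cycles were counted. */
double ipc(uint64_t instructions, uint64_t cycles)
{
	if (cycles == 0)
		return 0.0;
	return (double)instructions / (double)cycles;
}

/* Recompute perf's "insn per cycle" from the totals quoted in the thread. */
void print_ipc_comparison(void)
{
	/* Before the patchset: 78896433101 instructions, 454187219766 cycles. */
	printf("before: %.2f insn per cycle\n",
	       ipc(78896433101ULL, 454187219766ULL));
	/* After the patchset: 527258677936 instructions, 841954709443 cycles. */
	printf("after:  %.2f insn per cycle\n",
	       ipc(527258677936ULL, 841954709443ULL));
}
```

Calling print_ipc_comparison() prints 0.17 and 0.63 (the unrounded ratios are about 0.1737 and 0.6262), matching perf's own figures. Both raw counts rose after the patchset, so it is the ratio, not either count alone, that the thread compares.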
>
> Also, it's worth mentioning that we currently have this strange behavior:
>
> prealloc_memcg_shrinker()
>   down_write(&shrinker_rwsem)
>   idr_alloc()
>     reclaim
>       for each child memcg
>         shrink_slab_memcg()
>           down_read_trylock(&shrinker_rwsem) -> fail
>
> All the slab reclaim in this path is just parasitic work: it only wastes
> CPU time, which does not look like a good design.
>
> Kirill

--
Thanks,
Qi
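Two points from this thread can be modeled together: every successful down_read_trylock() is an atomic read-modify-write (the cmpxchg blamed for the IPC drop), and a reader trylock fails outright while the same semaphore is held for write, as in the call chain above. The sketch below is a hypothetical single-threaded C11 model: struct toy_rwsem and the toy_* functions are invented for illustration and do not reproduce the kernel's rw_semaphore layout or shrink_slab_memcg(); they only mimic the "give up on this memcg if the trylock fails" shape described in the call chain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical toy reader/writer semaphore, NOT the kernel's rw_semaphore:
 * bit 0 marks a writer, each reader adds TOY_READER_BIAS.
 */
#define TOY_WRITER_BIT  1L
#define TOY_READER_BIAS 2L

struct toy_rwsem {
	atomic_long count;
};

/*
 * Modeled on down_read_trylock(): even an uncontended success performs an
 * atomic compare-and-swap on the shared word (a locked cmpxchg on x86).
 */
bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	long old = atomic_load_explicit(&sem->count, memory_order_relaxed);

	while (!(old & TOY_WRITER_BIT)) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&sem->count, &old,
				old + TOY_READER_BIAS,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;	/* a writer holds the semaphore */
}

void toy_up_read(struct toy_rwsem *sem)
{
	atomic_fetch_sub_explicit(&sem->count, TOY_READER_BIAS,
				  memory_order_release);
}

/* Succeeds only when there are no readers and no writer. */
bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	long expected = 0;

	return atomic_compare_exchange_strong_explicit(&sem->count, &expected,
			TOY_WRITER_BIT, memory_order_acquire, memory_order_relaxed);
}

void toy_up_write(struct toy_rwsem *sem)
{
	atomic_fetch_sub_explicit(&sem->count, TOY_WRITER_BIT,
				  memory_order_release);
}

/*
 * Hypothetical stand-in for the reclaim loop in the call chain: for each
 * child memcg, try to take the semaphore for read and skip that memcg if
 * the trylock fails. Returns how many memcgs were actually scanned.
 */
int toy_reclaim_children(struct toy_rwsem *sem, int nr_children)
{
	int scanned = 0;

	for (int i = 0; i < nr_children; i++) {
		if (!toy_down_read_trylock(sem))
			continue;	/* writer holds it: no slab reclaimed here */
		scanned++;		/* the shrinkers would run here */
		toy_up_read(sem);
	}
	return scanned;
}
```

With the toy semaphore held for write, the way prealloc_memcg_shrinker() holds shrinker_rwsem across idr_alloc() in the call chain, toy_reclaim_children() visits every child but scans none of them, which is the parasitic work described above. Without a writer, each visit still pays one atomic read-modify-write to acquire and another to release.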