From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 24 Mar 2023 19:17:54 -0700
Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock
To: Tejun Heo
Cc: Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Vasily Averin,
 cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org
References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com>
Content-Type: text/plain; charset="UTF-8"
On Fri, Mar 24, 2023 at 6:54 PM Tejun Heo wrote:
>
> Hello,
>
> On Fri, Mar 24, 2023 at 12:22:09AM -0700, Yosry Ahmed wrote:
> > I think a problem with this approach is that we risk having to contend
> > for the global lock at every CPU boundary in atomic contexts. Right
> > now we contend for the global lock once, and once we have it we go
> > through all CPUs to flush, only having to contend with updates taking
> > the percpu locks at this point.
> > If we unconditionally release and reacquire the global lock at every
> > CPU boundary, then we may contend for it much more frequently with
> > concurrent flushers.
> >
> > On the memory controller side, concurrent flushers are already held
> > back to avoid a thundering herd problem on the global rstat lock, but
> > flushers from outside the memory controller can still compete with
> > each other or with a flusher from the memory controller. In this
> > case, we risk contending for the global lock more, and concurrent
> > flushers taking a longer period of time, which may end up causing
> > multi-CPU stalls anyway, right? Also, if we keep _irq when spinning
> > for the lock, then concurrent flushers still need to spin with irqs
> > disabled -- another problem that this series tries to fix.
> >
> > This is particularly a problem for flushers in atomic contexts. There
> > is a flusher in mem_cgroup_wb_stats() that flushes while holding
> > another spinlock, and a flusher in mem_cgroup_usage() that flushes
> > with irqs disabled. If flushing takes a longer period of time due to
> > repeated lock contention, it affects such atomic contexts negatively.
> >
> > I am not sure how much all of this matters in practice; it depends
> > heavily on the workloads and the configuration, like you mentioned.
> > I am just pointing out the potential disadvantages of reacquiring the
> > lock at every CPU boundary in atomic contexts.
>
> So, I'm not too convinced by the arguments, for a couple of reasons:
>
> * It's not very difficult to create conditions where a contended
>   non-irq-protected spinlock is held unnecessarily long due to heavy
>   IRQ load on the holding CPU. This can easily extend the amount of
>   time the lock is held by multiple times, if not orders of magnitude.
>   That is likely a significantly worse problem than the contention on
>   the lock cacheline, which will lead to a much more gradual
>   degradation.

I agree that can be a problem; it depends on the specific workload and
configuration.
The continuous lock contention at each CPU boundary causes a regression
(see my reply to Waiman), but I am not sure whether it is worse than the
scenario you are describing.

> * If concurrent flushing is an actual problem, we need and can
>   implement a better solution. There's quite a bit of maneuvering room
>   here given that the flushing operations are mostly idempotent in
>   close time proximity and there's no real atomicity requirement
>   across different segments of flushing operations.

Concurrent flushing can be a problem for some workloads; in particular,
in the MM code we flush in the reclaim and refault paths. This is
currently mitigated by allowing only one flusher at a time from the
memcg side, but contention can still happen with flushing when a cgroup
is being freed, or with flushers in other subsystems.

I tried allowing concurrent flushing by completely removing the global
rstat lock and depending only on the percpu locks for synchronization.
For this to be correct, the global stat counters need to be atomic,
which introduced a slowdown for flushing in general. I also noticed
heavier lock contention on the percpu locks, since all flushers try to
acquire all the locks in the same order.

I even tried implementing a simple retry scheme where we try to acquire
the percpu lock, and if we fail, we queue the current CPU and move on to
the next one. This ended up causing a little bit of slowness as well.
Together with the slowdown introduced by the atomic operations, it
seemed like a significant regression in the simple flushing path.

Don't get me wrong, I am all for modifying the current approach; I just
want to make sure we are making the correct decision for *most*
workloads.

Keep in mind that this series also aims to minimize the number of
flushers from atomic contexts, and for non-atomic flushers we allow
giving up the lock at CPU boundaries anyway. The current approach only
keeps the lock held throughout for atomic flushers.

Any ideas here are welcome!

> Thanks.
>
> --
> tejun