From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD354C6FD1C for ; Thu, 23 Mar 2023 05:15:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2118E6B0072; Thu, 23 Mar 2023 01:15:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1C1906B0074; Thu, 23 Mar 2023 01:15:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 063136B0075; Thu, 23 Mar 2023 01:15:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id EA93C6B0072 for ; Thu, 23 Mar 2023 01:15:45 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id BC38D80450 for ; Thu, 23 Mar 2023 05:15:45 +0000 (UTC) X-FDA: 80599000650.25.A10C012 Received: from mail-ed1-f50.google.com (mail-ed1-f50.google.com [209.85.208.50]) by imf20.hostedemail.com (Postfix) with ESMTP id E2B141C0004 for ; Thu, 23 Mar 2023 05:15:43 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=KjytUVYD; spf=pass (imf20.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679548544; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=YiBnEdYy6Aid4dyiIRb9ptP+TZS4JSWAxv3bvFBhfG4=; b=XHECdK7WeQ7708W34sncT9Z+vi8fMmB0oA/WXFSguMWF+0WRCIcRv1ZUmSDVK/LVYivWBF dBeVeZ0YG9hKXpH6xMQ8nd8ODNOK0ARXzOAAQ5dUhKWfZdMXBBW8eKQEYiiHem2I6y4Vh3 cMP67hlc40dgV/VkLoaxR5gc6DviZLk= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=KjytUVYD; spf=pass (imf20.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679548544; a=rsa-sha256; cv=none; b=lNV4MBlIlVDU5U7RcOeCbfLexAAgrmRDBh9G/8Gc06GuefAI/birtg6ZHRSI63j7fzJ4yg OH2ZaDnBWa2KC2RdJLt67DQuGrjYcgec91iKB/gXBh7BhLazLXjs6T2bnMHHp/SXmwzf0X 6DDmqBf44n2xtiLU/M9E/P8N9FglQlc= Received: by mail-ed1-f50.google.com with SMTP id x3so81660414edb.10 for ; Wed, 22 Mar 2023 22:15:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679548542; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=YiBnEdYy6Aid4dyiIRb9ptP+TZS4JSWAxv3bvFBhfG4=; b=KjytUVYDmXDoW1kHgbDCbvCLYfxK+63b9aN2H7qBDtgtsxUI5OrZu53Lwl+GZnw0aJ 01ArWTb9RVmWC495n9DxCFBjmF0qY/lqyeeIAVSiAn/aUnYfvmYR3QRULUpzpOBma5HJ 1+ytefxZiq0W1QKk5BPSDY7oYrkC4lDmQhAqbKsHhetsX6cK5fJcoZErFKGhWuRXhxyI p2BcFq4WozSFWYwoA28WxVcjRBVpqMyGmKeVHUGeck+Via4SQbj3gRnZsH2+wmdGGYqJ m3bky+4aHeeGT/A27qicrKyfuYVue8fxY0dpOhHe+LAcLJLzrkBD+xp1/xz5qYKXjWiw GJsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679548542; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YiBnEdYy6Aid4dyiIRb9ptP+TZS4JSWAxv3bvFBhfG4=; b=ty09w7K87WcPTIizgj47geklbe+UrNs5w7elVJ8ZdZBaAzuS/VPGu8Su6VLLXpfI5e evoN7D7C3Een3XPCqkobRDwR1fTBFEQjZD8RPycMw8LTKN1OkiLDxS/cxLzL/ZOcsXcr 8Wx4X1Veq5uGSkSSRYoP0NVZlFS2gmJ5nPMPEySjTkfgMG4Tdu9hBdqD6kBqWHLoNcQA xwxkUGPCrI6hnKfenEiyju2aRmItimDiGRvzrD8ed/6eSZKEdv0DFaPtYq/jtMlPcdSS 6iurZomVuuLqUHTQYGY+yFqSHwYP9uur9OgAoRcaAE5oylUy5QQqJobb5wQ26yWlLgGh UHHQ== X-Gm-Message-State: AO0yUKVLJlUTTcSWhYi/5nNslxZWfplXLxMUf2RJ4+iCZJUzDyEma7oJ RjWmG6oqtVSrfQuasqiaRki5jieVWjOfhk/8g52img== X-Google-Smtp-Source: AK7set9TkrVPoLwQATRu8u0oZvp5iDQj4vlxVo1OOqJOjETFOQd+b3oLzlntqCqCIUZCsCp2Y1/BhRjHZ4ki+1aUb4w= X-Received: by 2002:a17:906:aac9:b0:927:912:6baf with SMTP id kt9-20020a170906aac900b0092709126bafmr4297724ejb.15.1679548542233; Wed, 22 Mar 2023 22:15:42 -0700 (PDT) MIME-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Wed, 22 Mar 2023 22:15:05 -0700 Message-ID: Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock To: Shakeel Butt Cc: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: awmug4u87k714h588dq7b6eb9jktuxhp X-Rspam-User: X-Rspamd-Queue-Id: E2B141C0004 X-Rspamd-Server: rspam06 X-HE-Tag: 1679548543-879551 X-HE-Meta: U2FsdGVkX18Z6Dx5xSt6iMOYdCjObgLp09t6G2kFka2AUKwP5Bw+paI2h7d3ig8xPEr5qk/AJDba3/FptV5hCYNNroXwqdDTYRBA9pflDiux01jq6QdN1eLGuTQXS2rfIJgmKntRdfJMNEts6RMPsx6tXRHJq1xslOj+21UU17SNmwSmj4Qq+ySh7piAAUuu8D9l0OjLctB/FHQOm6L+LwUwl5Yb5BOy7efgLlNPOxfyIrT1YPYKCbYWewne9lsZBgbxUcrl64ImuxcAo1nx/oQQ7D1FrPI1q5cgGh/FEd5oQc1loq9fC94K8+Dk6pHSQ/kj/s7gNQfUYZ51v0kk9rwcqFf9P6ZjK6DfVouiAdS/sLXJshSQvFACkqtgAFsaEKKTKXc9hwwCPbK4nZ23C5Px3IS2xM8Co0+v5+KrpCJGcOir4tOyvHjcv89fXpH/a2NdlJd8KJsUrh6r7jxBy02DiM6FJhTp+GVJRlnvBl9SZkWlMBn4Amg2Ro9gPzQvb6su8vuFEaHQtYytZ/6K1omNY4f3Ys8OmyB9fugar6GeeqbUheY5MF2ZDglmb0oIHJS+AZTLkBXLx2lKNLDY7uic3CuJteR/892vcgCnmt5G55Vgdwzz/Ot036x82AkO1qfP2yBmlq0YXN+RsJQYirYTbvbf+9h6k8EaNLLZa8JONUq3pXMooc4vYAtIW4MYQldwbHrl4UvJ//Y1cJL6jdnokomj8YdvOh4pxAnds985+tTXnqQa2F4h+M4cUSB8MdZ/HsNAY8FQkZBsC2FC3SJXiPtSq6Ol2C5QnpHsw2TNhr1XXjBwJE4h8yhe7DzDcHoD3NAuS2izxGAUTmP5UJt4OuWQhfM8yzG7ASBlG3T/lp/Xhm5nvlzTq1FNfStoSp5xYHA8Slr68ejpK+M7+xYRRmfBDM3Z75haFPTM/O0rpE/7mQazduSvtErKwVL9mokonECC6Jw6plAJ/96 Hwh55b5l E7QS2UsUAv3mZOW+fpP3nZRCoc3jBIMiVudronuK2ygo77dm4DAndluzkP9B/nDUOZDbgp+SGvPo36bjvgPP7X7hzL9huo2yxpbdXVVNXeT5HvmQcleSs28ILoPrwnn6CQGjKY8t4s7ZcRPaA7kOSg/SGaX+0ygyrK+SRPFpZIbbng7oMxL+59lENy5J529/DXgYHwlEA8SNuVNLmnmoA9Jdh1SEt8/FuzQJqN2Y1v5Oz8DMzb1CifqD2IgcoJQzhtDd1ABk11miSeaI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 22, 2023 at 9:29=E2=80=AFPM Shakeel Butt = wrote: > > On Wed, Mar 22, 2023 at 9:00=E2=80=AFPM Yosry Ahmed wrote: > > > > Currently, when sleeping is not allowed during rstat flushing, we hold > > the global rstat lock with interrupts disabled throughout the entire > > flush operation. Flushing in an O(# cgroups * # cpus) operation, and > > having interrupts disabled throughout is dangerous. > > > > For some contexts, we may not want to sleep, but can be interrupted > > (e.g. while holding a spinlock or RCU read lock). As such, do not > > disable interrupts throughout rstat flushing, only when holding the > > percpu lock. This breaks down the O(# cgroups * # cpus) duration with > > interrupts disabled to a series of O(# cgroups) durations. > > > > Furthermore, if a cpu spinning waiting for the global rstat lock, it > > doesn't need to spin with interrupts disabled anymore. > > > > If the caller of rstat flushing needs interrupts to be disabled, it's u= p > > to them to decide that, and it should be fine to hold the global rstat > > lock with interrupts disabled. There is currently a single context that > > may invoke rstat flushing with interrupts disabled, the > > mem_cgroup_flush_stats() call in mem_cgroup_usage(), if called from > > mem_cgroup_threshold(). > > > > To make it safe to hold the global rstat lock with interrupts enabled, > > make sure we only flush from in_task() contexts. The side effect of tha= t > > we read stale stats in interrupt context, but this should be okay, as > > flushing in interrupt context is dangerous anyway as it is an expensive > > operation, so reading stale stats is safer. > > > > Signed-off-by: Yosry Ahmed > > Couple of questions: > > 1. What exactly is cgroup_rstat_lock protecting? Can we just remove it > altogether? I believe it protects the global state variables that we flush into. For example, for memcg, it protects mem_cgroup->vmstats. I tried removing the lock and allowing concurrent flushing on different cpus, by changing mem_cgroup->vmstats to use atomics instead, but that turned out to be a little expensive. Also, cgroup_rstat_lock is already contended by different flushers (mitigated by stats_flush_lock on the memcg side). If we remove it, concurrent flushers contend on every single percpu lock instead, which also seems to be expensive. > 2. Are we really calling rstat flush in irq context? I think it is possible through the charge/uncharge path: memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage(). I added the protection against flushing in an interrupt context for future callers as well, as it may cause a deadlock if we don't disable interrupts when acquiring cgroup_rstat_lock. > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usage() is only > done for root memcg. Why is mem_cgroup_threshold() interested in root > memcg usage? Why not ignore root memcg in mem_cgroup_threshold() ? I am not sure, but the code looks like event notifications may be set up on root memcg, which is why we need to check thresholds. Even if mem_cgroup_threshold() does not flush memcg stats, the purpose of this patch is to make sure the rstat flushing code itself is not disabling interrupts; which it currently does for any unsleepable context, even if it is interruptible.