From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 843A3C4345F for ; Thu, 18 Apr 2024 20:39:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0859A6B00C8; Thu, 18 Apr 2024 16:39:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0355C6B00C9; Thu, 18 Apr 2024 16:39:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8D4D6B00CA; Thu, 18 Apr 2024 16:39:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id CC21A6B00C8 for ; Thu, 18 Apr 2024 16:39:02 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 7E427140C47 for ; Thu, 18 Apr 2024 20:39:02 +0000 (UTC) X-FDA: 82023816924.05.6D9A1FE Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by imf22.hostedemail.com (Postfix) with ESMTP id 90D24C0007 for ; Thu, 18 Apr 2024 20:39:00 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=R5gfRe9T; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf22.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.48 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1713472740; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=kB9PcMjE0hVVpql+JbzqFJwr2b8GY15J6ZW7aMEdH3Q=; b=rc+3fH9NPSHFiwEEU8CTj6wut1xSRnDT3zHWrx1AILcr8/bmV0+vP7P87S0jN5HZEjt7vo nzycYOz7dcIw8lMU6pk/+svMr3UAGQbQW0VMaoK0XB4+hgaF7CjjwehW8SshWo3i67RNJh B1PItIah3akuZy1HeIaMT2TmRSLkTAg= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=R5gfRe9T; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf22.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.48 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1713472740; a=rsa-sha256; cv=none; b=L/gSU2yHM3MTGDLh0WQ9DT+HswalF3kQ4B7FsumhRoEEGBzu9F71i/2I79cDd9Rqr5wlah CxzSu/npf8GFS0E0DsHhuxbOhaTe2bECGW7y2kealAVgsWQTUq5J78ss9vUVRVPdf4JTDv 6nlIzb+LuB20zgqhNkIq89MUmlm1pB4= Received: by mail-ed1-f48.google.com with SMTP id 4fb4d7f45d1cf-570423c4f80so1312945a12.0 for ; Thu, 18 Apr 2024 13:39:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1713472739; x=1714077539; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=kB9PcMjE0hVVpql+JbzqFJwr2b8GY15J6ZW7aMEdH3Q=; b=R5gfRe9TtMNfX9UFTtCp9HGD9wlw+UShBmHoT4qk0h0RdqWeR0fY4ztQFXhf6IqiOX RJSoDhXfEGSQ5yVJTu7JrmgbnCHF9ST5BVFjJCXn88A/ew6BtokKe+CGNf2gB2zPABze HWS4JQPVgznSJ3bWVyX2wPMFhZ/lTfQsUGEUTlMTKHj0ucDv+ePxzpGJCrH3CMdchGXl gJSs2YOTQMiLo1ZAMBFYX6h2eACCizxdO0N7CleN2rau9GVqpXJTSBymzxn3u/Men2L8 mf39v06xovuOM3UHAu5acIc2beMyxalJjiGD+gQK2i9++0dO7V1bwZOV1QsQ62uQiXD0 m5XQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713472739; x=1714077539; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kB9PcMjE0hVVpql+JbzqFJwr2b8GY15J6ZW7aMEdH3Q=; b=lj3pGpn0wgdubXRMs+Gz8DC08HWssKvKRG8ba1vGcFdHIqD9Y7aGPapgZLjWXLuztP 05EHUqhClY1+cYU7bmrLavsr+CrUSmLfPE1WZCPJWd2lhuZ6dfBiLLvZt8dtIhUxfAH2 TNR4/ygPdwPSZLWEDOAaoudyOWF1Wlf3kiS5QHHafJogEmiyd453AthDkyohxBmbznl8 yMgGL8jn6J/Yp98LPoCnrz9rtsVGw9ESde7qJwtIlpiqkoQ1tW51xOjsC+f5YFHOzrXR 3eJXln63Jwa/JROsH1GC9zdbcwChA1a+N4yPfxUVt49faNhmaluH7Z7UwlA8r+hE2gCy 2NuA== X-Forwarded-Encrypted: i=1; AJvYcCWk0eLKlv9WqHfjGxt24A3ekjiIUgGXsxsTKp/VCC0NjIC4ZJCymdiVJrT2nqQHbwGiiIgvtVxuLCqtLIeQ6U5k3I4= X-Gm-Message-State: AOJu0YxdLjbSJsqIdxf5VzpxYWeN8mqjEQ5LJBFYzWZFgQ4IrJUx5oVd RivPiZdI5DmFU0xtBTMJ8eVDmwe0n8JhjJJC/M+7i2SJWC5Hz6MoBrdsRmBh9gLceKW0FxzUrHC lbKH/eDQ98vfGQDGQNeQgzTEPQYfe2oHfqfyi X-Google-Smtp-Source: AGHT+IE5WBFPkQE1UYAhvIFf0d5mnAUdlsEGWrhC2zVPDjA0RfuTTaSKcmUw73ewun1AK1uASHQuPihQ9bZCIaT/BJw= X-Received: by 2002:a17:906:a24c:b0:a55:59e6:13f5 with SMTP id bi12-20020a170906a24c00b00a5559e613f5mr170031ejb.26.1713472738744; Thu, 18 Apr 2024 13:38:58 -0700 (PDT) MIME-Version: 1.0 References: <171328983017.3930751.9484082608778623495.stgit@firesoul> <171328989335.3930751.3091577850420501533.stgit@firesoul> <651a52ac-b545-4b25-b82f-ad3a2a57bf69@kernel.org> In-Reply-To: <651a52ac-b545-4b25-b82f-ad3a2a57bf69@kernel.org> From: Yosry Ahmed Date: Thu, 18 Apr 2024 13:38:20 -0700 Message-ID: Subject: Re: [PATCH v1 2/3] cgroup/rstat: convert cgroup_rstat_lock back to mutex To: Jesper Dangaard Brouer Cc: tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, cgroups@vger.kernel.org, longman@redhat.com, netdev@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, shakeel.butt@linux.dev, kernel-team@cloudflare.com, Arnaldo Carvalho de Melo , Sebastian Andrzej Siewior , mhocko@kernel.org, Wei Xu Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 90D24C0007 X-Stat-Signature: oh3hcd653kjw9m8d88f6ae3pnkfch51j X-HE-Tag: 1713472740-422238 X-HE-Meta: U2FsdGVkX1/lWagHAM+ftpTIPT+vUKFR8RrXzpEZqFLNigPXjjAzXLgsyMLr7T30DEVjWwXPir2J9a/UVGKMfwcYwEAj2blb6IeMeO3rFh1yjoNb2ohzC63G1NwwaCRsU5Yrt3kNG82Tr3YA7W+CB/wCXjJfJ5JZ3aEfpoKtamManIneJvV4Qv28WobcqS7MIzW/hwlTf44jZeD/1uYFebbfDggnpICZsn95JxNQGn93q2o8VmCuHqahpiyJjHuI7VdbOYnd5X7k4FhOs1KwpkmftQldXonSLx1Ycn/oHPBX3Np575BWRY9M5MpqQWr3Aj0PTy2lBWy5/9sc3V0+09vAZJ5oFmgkSZfLXc8oiUsOHLxqWX42+vvkuBp0b76X2gD2V/AuomFiqXDEd+mN9pdoH9GQH0ECvstpwRimJ3NXya0MhcjhV9v7AWkupzzwO0NhQJtCg1t+g6EeQbhdrQMVNYu1I1LXgvWMx9YQ3OBo2u6tdPDJKbf9VC7bLvff/N6b+DpxTQ8fOsv/sKNPWfdOTSwBZErNGyHBU8sRN+7Ek7zIH8DGzzVUhCOazZLx2QDsPWW0fKSXpYvVIbIZ7fDIJ88BtnHlk49Q44rZ/X6VqEhRFgtOVV+KtauqOWnykYUBUJdEB38dYAlxSwJkd7uTMBdXoMcT91E19aNzDSyEiBH7wHZFi5cNl8G9caE48hsfRMDuq/6E7NZ8VGq6s29BSVxfsEw/4GuUosHIS2Y24VjfGtNCqdGEFpKQOCmAuT4fxsDPlSsI9V0Td4nRktM028/CyLQf0C29t3mVtU8zd/UqYTPNECjMsU4lP+t8F7r6bFWEWY03mi4RvouHWQJ8YuRGU4yg5GBI5Q8ZfQrFKVpPKoJlJCC2sYErNmvmT7EpcPoBH5NQgy5Eq2Pd+hoqZ+RONd2uXZXLqbXTOeDX12ErhpeuRIYqHnW4DNgkQbk7A2DmGEvV1PbT6dB Upip2sjk rQTNdMDyP5Cgkz+1fJLr3l17a/EIWZ2Za4HzAz4BwrwbuGt9jkoAN8aIIn+g9aAvCSmvQ4v9PkL9Wecio9U2tvSkgjX9rp81k6RCIqOR/Ls6j0sUR8LZaI2M9pZ9ZFAAIALncEPM6XNbSJ72ZHpAqiK56DseKRo64LFpqUFEzOv5RJmRatczRDz8JgV6m3mjPx3pNoShCCL2E+hHIt6kPFzjh9/iVBVS/eiVUZNuGY9wFOWkn80lUuGWcJHBLeG3hd7joa5D03cL/Nr5snBTLnr9gkpl3newxOR2Px/EcJPdcuxcOBkhaEixjPJJ2Jj0CfQVuESkLtVO9IcCxOhG+f4zSmw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Apr 18, 2024 at 2:02=E2=80=AFAM Jesper Dangaard Brouer wrote: > > > > On 18/04/2024 04.19, Yosry Ahmed wrote: > > On Tue, Apr 16, 2024 at 10:51=E2=80=AFAM Jesper Dangaard Brouer wrote: > >> > >> Since kernel v4.18, cgroup_rstat_lock has been an IRQ-disabling spinlo= ck, > >> as introduced by commit 0fa294fb1985 ("cgroup: Replace cgroup_rstat_mu= tex > >> with a spinlock"). > >> > >> Despite efforts in cgroup_rstat_flush_locked() to yield the lock when > >> necessary during the collection of per-CPU stats, this approach has le= d > >> to several scaling issues observed in production environments. Holding > >> this IRQ lock has caused starvation of other critical kernel functions= , > >> such as softirq (e.g., timers and netstack). Although kernel v6.8 > >> introduced optimizations in this area, we continue to observe instance= s > >> where the spin_lock is held for 64-128 ms in production. > >> > >> This patch converts cgroup_rstat_lock back to being a mutex lock. This > >> change is made possible thanks to the significant effort by Yosry Ahme= d > >> to eliminate all atomic context use-cases through multiple commits, > >> ending in 0a2dc6ac3329 ("cgroup: removecgroup_rstat_flush_atomic()"), > >> included in kernel v6.5. > >> > >> After this patch lock contention will be less obvious, as converting t= his > >> to a mutex avoids multiple CPUs spinning while waiting for the lock, b= ut > >> it doesn't remove the lock contention. It is recommended to use the > >> tracepoints to diagnose this. > > > > I will keep the high-level conversation about using the mutex here in > > the cover letter thread, but I am wondering why we are keeping the > > lock dropping logic here with the mutex? > > > > I agree that yielding the mutex in the loop makes less sense. > Especially since the raw_spin_unlock_irqrestore(cpu_lock, flags) call > will be a preemption point for my softirq. But I kept it because, we > are running a CONFIG_PREEMPT_VOLUNTARY kernel, so I still worried that > there was no sched point for other userspace processes while holding the > mutex, but I don't fully know the sched implication when holding a mutex. I guess dropping the lock before rescheduling could be more preferable in this case since we do not need to keep holding the lock for correctness. > > > If this is to reduce lock contention, why does it depend on > > need_resched()? spin_needbreak() is a good indicator for lock > > contention, but need_resched() isn't, right? > > > > As I said, I'm unsure of the semantics of holding a mutex. > > > > Also, how was this tested? > > > > I tested this in a testlab, prior to posting upstream, with parallel > reader of the stat files. I believe high concurrency is a key point here. CC'ing Wei who reported regressions on previous attempts of mine before to address the lock contention from userspace. > As I said in other mail, I plan to experiment > with these patches(2+3) in production, as micro-benchmarking will not > reveal the corner cases we care about. Right, but micro-benchmarking should give us a signal about regressions. It was very useful for me when working with this code before to use synthetic benchmarks with high concurrency of userspace reads and/or kernel flushers.