Date: Mon, 24 Jun 2024 14:44:53 -0700 (PDT)
From: David Rientjes
To: Tejun Heo
cc: Namhyung Kim, Andrew Morton, "Liam R. Howlett", Suren Baghdasaryan,
    Matthew Wilcox, Christoph Lameter, "Paul E. McKenney", Johannes Weiner,
    Davidlohr Bueso, linux-mm@kvack.org
Subject: Re: MM global locks as core counts quadruple
References: <07e7d078-0c9d-6a1f-1ab5-295f86974b72@google.com>

On Sun, 23 Jun 2024, Tejun Heo wrote:

> Hello,
>
> On Fri, Jun 21, 2024 at 02:37:43PM -0700, Namhyung Kim wrote:
> > > > - cgroup_threadgroup_rwsem
> > >
> > > This one shouldn't matter at all in setups where new cgroups are populated
> > > with CLONE_INTO_CGROUP and not migrated further. The lock isn't grabbed in
> > > such usage pattern, which should be the vast majority already, I think. Are
> > > you guys migrating tasks a lot or not using CLONE_INTO_CGROUP?
> >
> > I'm afraid there are still some use cases in Google that migrate processes
> > and/or threads between cgroups. :(
>
> I see. I wonder whether we can turn this into a cgroup lock. It's not
> straightforward tho. It's protecting migration against forking and exiting
> and the only way to turn it into per-cgroup lock would be tying it to the
> source cgroup as that's the only thing identifiable from the fork and exit
> paths. The problem is that a single atomic migration operation can pull
> tasks from multiple cgroups into one destination cgroup, even on cgroup2 due
> to the threaded cgroups. This would be pretty rare on cgroup2 but still need
> to be handled which means grabbing multiple locks from the migration path.
> Not the end of the world but a bit nasty.
>
> But, as long as it's well encapsulated and out of line, I don't see problems
> with such approach.
>
> As for cgroup_mutex, it's more complicated as the usage is more spread, but
> yeah, the only solution there too would be going for finer grained locking
> whether that's hierarchical or hashed.
>

Thanks all for the great discussion in the thread so far!

Beyond the discussion of cgroup mutexes above, we also discussed increasing
the number of zones within a NUMA node. I'm thinking that this would actually
be an implementation detail, i.e. we wouldn't need to change any user visible
interfaces like /proc/zoneinfo.
IOW, we could have 64 16GB ZONE_NORMALs spanning 1TB of memory, and we could
sum up the memory resident across all of those when describing the memory to
userspace.

Anybody else working on any of the following or have thoughts/ideas for how
they could be improved as core counts increase?

 - list_lrus_mutex
 - pcpu_drain_mutex
 - shrinker_mutex (formerly shrinker_rwsem)
 - vmap_purge_lock

Also, any favorite benchmarks that people use with high core counts to
measure the improvement when generic MM locks become more sharded? I can
imagine running will-it-scale on platforms with >= 256 cores per socket but
if there are specific stress tests that can help quantify the impact, that
would be great to know about.
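
To make the zone idea above concrete, here is a rough userspace model of the
aggregation, not kernel code. It assumes 4 KiB base pages and treats 16GB and
1TB as binary units; SubZone, make_node() and report_zone_normal() are
hypothetical names invented for this sketch and do not correspond to any
existing kernel structure or function. The only point is that the 64 internal
zones can keep separate counters (and, presumably, separate locks) while
userspace still sees one summed ZONE_NORMAL record.

```python
from dataclasses import dataclass
from typing import Dict, List

GIB = 1 << 30
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


@dataclass
class SubZone:
    """Hypothetical internal zone; not a real kernel structure."""
    index: int
    present_pages: int
    free_pages: int


def make_node(nr_zones: int = 64, zone_bytes: int = 16 * GIB) -> List[SubZone]:
    """Build one NUMA node split into nr_zones equally sized internal zones."""
    pages = zone_bytes // PAGE_SIZE
    return [SubZone(i, pages, pages) for i in range(nr_zones)]


def report_zone_normal(zones: List[SubZone]) -> Dict[str, int]:
    """Collapse the internal zones into the single record userspace would see."""
    return {
        "present_pages": sum(z.present_pages for z in zones),
        "free_pages": sum(z.free_pages for z in zones),
    }


node = make_node()
node[3].free_pages -= 1000  # an allocation served by one internal zone only
summary = report_zone_normal(node)
print(summary)
```

In this model an allocation only touches the counters of the internal zone
that served it, and the summation happens on the (much colder) reporting path.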