Date: Wed, 29 Mar 2023 10:00:50 -1000
From: Tejun Heo
To: Hugh Dickins
Cc: Yosry Ahmed, Shakeel Butt, Josef Bacik, Jens Axboe, Zefan Li,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
    Andrew Morton, Vasily Averin, cgroups@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, bpf@vger.kernel.org
Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock

Hello, Hugh. How have you been?

On Wed, Mar 29, 2023 at 12:22:24PM -0700, Hugh Dickins wrote:
> Hi Tejun,
> Butting in here, I'm fascinated. This is certainly not my area, I know
> nothing about rstat, but this is the first time I ever heard someone
> arguing for more disabling of interrupts rather than less.
>
> An interrupt coming in while holding a contended resource can certainly
> add to latencies, that I accept of course. But until now, I thought it
> was agreed best practice to disable irqs only regretfully, when strictly
> necessary.
>
> If that has changed, I for one want to know about it. How should we
> now judge which spinlocks should disable interrupts and which should not?
> Page table locks are currently my main interest - should those be changed?

For rstat, it's a simple case because the global lock here wraps around
per-cpu locks which have to be irq-safe, so the only difference we get
between making the global lock irq-unsafe and keeping it irq-safe but
releasing it in between cpus is:

 Global lock held: G
 IRQ disabled: I
 Percpu lock held: P

 1. IRQ unsafe

 GGGGGGGGGGGGGGG~~GGGGG
 IIII IIII IIII ~~ IIII
 PPPP PPPP PPPP ~~ PPPP

 2. IRQ safe released in between cpus

 GGGG GGGG GGGG ~~ GGGG
 IIII IIII IIII ~~ IIII
 PPPP PPPP PPPP ~~ PPPP

#2 seems like the obvious thing to do here given how the lock is used and
each P section may take a bit of time.

So, in the rstat case, the choice is, at least to me, obvious, but even
for more generic cases where the bulk of actual work isn't done w/ irq
disabled, I don't think the picture is as simple as "use the least
protected variant possible" anymore because the underlying hardware
changed.

For an SMP kernel running on a UP system, "the least protected variant"
is the obvious choice to make because you don't lose anything by holding
a spinlock longer than necessary. However, as you increase the number of
CPUs, there arises a tradeoff between local irq servicing latency and
global lock contention.

Imagine a, say, 128 cpu system with a few cores servicing relatively high
frequency interrupts. Let's say there's a mildly hot lock. Usually, it
shows up in the system profile but only just.
Let's say something happens and the irq rate on those cores goes up to
the point where, when the lock is held on one of those cpus, it becomes a
rather common occurrence for irqs to intervene, lengthening how long the
lock is held, sometimes significantly. Now because the lock is on average
held for much longer, it becomes a lot hotter as more CPUs stall on it
and, depending on luck or lack thereof, these stalls can span many CPUs
on the system for quite a while. This is actually something we saw in
production.

So, in general, there's a tradeoff between local irq service latency and
inducing global lock contention when using unprotected locks. With more
and more CPUs, the balance keeps shifting. The balance still very much
depends on the specifics of a given lock but yeah I think it's something
we need to be a lot more careful about now.

Thanks.

--
tejun
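
A toy model of the two timelines in the diagram above, offered only as a
sketch. It is not the rstat code. The function name, its parameters and
the time units are all hypothetical, and it assumes every gap between
per-cpu sections sees the same amount of irq work. It computes how long
the global lock is held in each variant.

```python
def global_lock_holds(n_cpus, percpu_cost, gap_irq_cost):
    """Return (unsafe_holds, released_holds): global-lock hold lengths.

    Hypothetical parameters, arbitrary time units:
      n_cpus        number of per-cpu sections walked under the global lock
      percpu_cost   length of one P section (per-cpu lock held, irqs off)
      gap_irq_cost  irq work that lands in each gap between P sections
    """
    if n_cpus <= 0:
        return [], []
    gaps = n_cpus - 1
    # 1. irq-unsafe global lock held across the whole walk: irqs arriving
    #    in the gaps run while G is held, so they lengthen that one hold.
    unsafe = [n_cpus * percpu_cost + gaps * gap_irq_cost]
    # 2. irq-safe global lock released between cpus: each hold covers one
    #    P section only; irqs in the gaps run with no lock held.
    released = [percpu_cost] * n_cpus
    return unsafe, released


unsafe, released = global_lock_holds(n_cpus=128, percpu_cost=4, gap_irq_cost=10)
print(max(unsafe), max(released))  # prints "1782 4"
```

In both variants the longest irq-off window is one P section, which is
why only the G row differs between the two timelines. What the model
leaves out is contention itself: it says nothing about how many other
CPUs end up spinning on G during the longer hold.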