From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9B92E92FE4 for ; Fri, 6 Oct 2023 05:35:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C571F940011; Fri, 6 Oct 2023 01:35:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C076694000B; Fri, 6 Oct 2023 01:35:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AA8A0940011; Fri, 6 Oct 2023 01:35:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 991FE94000B for ; Fri, 6 Oct 2023 01:35:45 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 5CAB01A0235 for ; Fri, 6 Oct 2023 05:35:45 +0000 (UTC) X-FDA: 81313924650.09.6222E92 Received: from mail-yw1-f173.google.com (mail-yw1-f173.google.com [209.85.128.173]) by imf11.hostedemail.com (Postfix) with ESMTP id 8975040006 for ; Fri, 6 Oct 2023 05:35:43 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=qOssfZcs; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf11.hostedemail.com: domain of hughd@google.com designates 209.85.128.173 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696570543; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=PaezefeDp7zAJYrklW0Fkd4Zlqg9EojUKIwPXxGjmPs=; b=iS7YhZjs+748XG/moQs0g148s1xSOXzk6t+W1Y9PckBC6KKs4EaKykxbrMts7yVEKWgT51 nNXrJ4M8Hi9ATfrioc/6XtjaSoyM0AfdofKEd19WTKe7rlYFPNAB99SGhQAy5ax4hRXibx FWXP5r+0UOLve9XwAXada81yETaeYmc= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=qOssfZcs; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf11.hostedemail.com: domain of hughd@google.com designates 209.85.128.173 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696570543; a=rsa-sha256; cv=none; b=C2xfa7gqZwITC2UCaavgnoJcB5OzoHsON6Tc5XcaekINDAbAgzXf1kwWdUvWUfAcFGlZDM ce5ahTRd9V7mVAKwMqRjFc7IWt3dpya0nbFvqfQst6+d6zlZNAuXBATZzdPJ7AAVjgezeD MKAmhmGE8Cu+GcoGvtn9E2WdSMLxemQ= Received: by mail-yw1-f173.google.com with SMTP id 00721157ae682-5a4e29928c3so21012317b3.1 for ; Thu, 05 Oct 2023 22:35:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1696570542; x=1697175342; darn=kvack.org; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=PaezefeDp7zAJYrklW0Fkd4Zlqg9EojUKIwPXxGjmPs=; b=qOssfZcsOTEsJr6UW5csJKuqwtcJFSF8F6BmwMY+EhTKdEd6GtVOmFwe2X4OBhK989 EMrRxFGIKGvRTaZoYiMuoYCx/TlHCLHYdFDXz2+tuYYMbc7nq8aYqP8IGfQmCDXPjo2y zhfTiAbXT9KWYjqq59Oe9HgNgKFrYU+QQ5sentZEUrAGKPcV0JV5GTQSlyiZP7HB/R6p X6rqsW+JQJ09vsSopnIFEvVnVP6MBAhpMkphX+6NE/T90qmKH0zf3o45XjEsOkH88uQ1 9GdrlNTK8mT1Htk6JFIHF6GfEEZ0tY+Jji4q+DKJflgqmy7lQ1SeqRQCig+48oT5Y/Sj +q7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696570542; x=1697175342; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=PaezefeDp7zAJYrklW0Fkd4Zlqg9EojUKIwPXxGjmPs=; b=mZJp78wtS+6+fgNgy652OqwrCLBRyD2hpXgN2BWHDlFwoGfBchAzyKlxiOqoCq5I5s IODp1Ugq6XQH1FBm4bxqorl81n3rAQ/htJhcBoE/KNbdimFXuV/yGvWwE1oTZZ94mw7w n63MECqKvgdMVCWE1PpVQEbmilz/VruCjF1on7Qjsx1BX2twhTl4PAiCeBH2TaXafoP9 iELpKBQMZZesHfySMtmcGLkw19zZeD0Hbz2JxUrLF+CM2CJiE9CzQT52yHUNALfqYvoK 1WCmcVNDbceNgj4FjkPW11GM21bHfc1uzNwAAPEKwuFs/4rau8SNwUyQhuxrwi2xZHws Cxlw== X-Gm-Message-State: AOJu0YznY3oJUHS0y7wjZxc5xfRUkw/6f6oT/BamGn70nfigIqXVCWw1 0EZE6WD1dot7kLcEK9bIAJgIhw== X-Google-Smtp-Source: AGHT+IEpTrnr9XdsKNbpo6pfTBAH5V4ZhM8V1KQHtZFnBvpqLiwm1YQa3LOzuYzsLeu+aZuIuTGmrg== X-Received: by 2002:a81:a18a:0:b0:595:320d:c9e2 with SMTP id y132-20020a81a18a000000b00595320dc9e2mr8370979ywg.24.1696570542515; Thu, 05 Oct 2023 22:35:42 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id l197-20020a0de2ce000000b005a4dde66853sm1059846ywe.0.2023.10.05.22.35.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Oct 2023 22:35:41 -0700 (PDT) Date: Thu, 5 Oct 2023 22:35:33 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Dave Chinner cc: Hugh Dickins , Andrew Morton , Tim Chen , Dave Chinner , "Darrick J. Wong" , Christian Brauner , Carlos Maiolino , Chuck Lever , Jan Kara , Matthew Wilcox , Johannes Weiner , Axel Rasmussen , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit, amount) In-Reply-To: Message-ID: <2451f678-38b3-46c7-82fe-8eaf4d50a3a6@google.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 8975040006 X-Stat-Signature: 7g1n4ymhd1km75yrsfsceot9di8m6iuu X-Rspam-User: X-HE-Tag: 1696570543-743576 X-HE-Meta: U2FsdGVkX18NuTVDI+W/z6EBYUkiOccD+SuFInFTllJOtYEOOiGYbnbynBpwkpyJzM65O4Es+2IRzHYl03Toi7f7omTsDV1veBWr78hZqGzX5h18092+5VWPrfTKpJQdvrWNlDTHOiKIOB7NqyBAqIr2+qBindYRvclwLmQs2ViKjp2tQG81gmoOFNihuqI8MTTsUTvxabeOXHZcDkeJ6+0EcYPpEKXGgSt7Audl1gntnpUJYUwFjliCS2f+y6cTBQbTZ4u76uRhdT1hr1XgAyRIQjMf0VYVvE+TTRwvqTNNtIKv1exm0rIN7a1W9e+hlAbuE5T6r+v2ojznS+gRXzX6GU0IF5yqxOcL70AcoYWjvnZXjH329hsG9JomOy0puVQE5XclwGOsN9ntjE2xy2mUY9kPTWkWVp/zQX6zYgaBFT5xgONmGX8hbY8iuqOstDL1lvIFQBySr6hXkmR3vtN0uHxIP4h1moPU4Co7nGbsossqNbIu02NROq8H8rmZVIVqxJD68JD1pjTIZXFQsXtcvs0k+ZUhdt0GYcv7o96AyQ9Sw6/oA11xMCmZ6Z5EmooRcVnnWrd4CfzSnSzZb0rYyxklOkHp3lgkubPW+aKMSKsR56AgDuvryDP3uhhsROycysKRPxV1n/X78B5RTBCzmTuHPdOCFnLQXv3eFfO8OHUzXLhQnWKtOyCqLu8UFm15PF5V2419hG9z3CLm5pRYM0Qas1UNebZCB12MbqyQ+yb5/UnuDQPJljBPlBjCmtmfYCkkMsoK8sl9AA975w14lOyDrsRzj5yaa18HfRPx8y38eCay7c+FRqgFEMbT09El4w2g7nsP0ytw9wqd/yYz+qrJqcfWyz96CK7u2nEh0FGev+79+EQ+cgAz3X+7UT/0xh4K5Iq35jTS94ih4oZlEatOz1gs957bIxDy+yt5MFrZ1HFp2BwNneTrJgQjuSnwB6jcocqmSl68mWm 6qwSzWoo RAy9e7p/esqk61X2R3XQQPGCJ6X/ElfI00+jSFidqGiCzh8zz6dtErBjsE5emeItDdposmnAs/UQ+xTl+QE/yLTbdLyEg3QKvq0/Zs3e9H9ebZ/xYwuQJfaWwxnJpVcSvuh0PwDuVQRk3WhdJ9vCwjQ2PLqnI7yOgoTckR07wWjSCxYItENUtWKPbd2zrlTXhmrYHpLoTLRtL7JEhZO0W6vMDx8PBsgrvbGIYHwitGE5g6ZZraNbAXpJ874EzAOaGmfUdkIwYVdzF0Fi/20uDGO4gOkMrToRCUQTmSI5nOWhca9k0kYPsq0hxGc5FjVC4ilQxO+U/TE0yDoteq39Ose9XlCWgHyDD4UOyGpAiPNbKfBalafnVUPXhi9DaDwLh+VCmOmmsdxtFDUGkRXfHqnuakzRnFkjmdz0JwogwM75jrhxJNwcjp0ZHXoobCA6J6tXeFBbovpe2u5qnBKL+aw3uNwPt2xFeVT3YPWqqv+8VW5T41+/dBQpBXjL5mVAbGQPXIn+rZCq88+3Z4Y9icbU5qhT0RqMNtiFFNByRC6/D7yNDl5eeGzHTQVM3s58LiQkJzE20wVTTmgxWOs9IyXqPtbfQjzKLfIxs X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, 5 Oct 2023, Dave Chinner wrote: > On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote: > > Percpu counter's compare and add are separate functions: without locking > > around them (which would defeat their purpose), it has been possible to > > overflow the intended limit. Imagine all the other CPUs fallocating > > tmpfs huge pages to the limit, in between this CPU's compare and its add. > > > > I have not seen reports of that happening; but tmpfs's recent addition > > of dquot_alloc_block_nodirty() in between the compare and the add makes > > it even more likely, and I'd be uncomfortable to leave it unfixed. > > > > Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. > > > > I believe this implementation is correct, and slightly more efficient > > than the combination of compare and add (taking the lock once rather > > than twice when nearing full - the last 128MiB of a tmpfs volume on a > > machine with 128 CPUs and 4KiB pages); but it does beg for a better > > design - when nearing full, there is no new batching, but the costly > > percpu counter sum across CPUs still has to be done, while locked. > > > > Follow __percpu_counter_sum()'s example, including cpu_dying_mask as > > well as cpu_online_mask: but shouldn't __percpu_counter_compare() and > > __percpu_counter_limited_add() then be adding a num_dying_cpus() to > > num_online_cpus(), when they calculate the maximum which could be held > > across CPUs? But the times when it matters would be vanishingly rare. > > > > Signed-off-by: Hugh Dickins > > Cc: Tim Chen > > Cc: Dave Chinner > > Cc: Darrick J. Wong > > --- > > Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7, > > which are just internal to shmem, and do not affect this patch (which > > applies to v6.6-rc and linux-next as is): but want to run this by you. > > Hmmmm. IIUC, this only works for addition that approaches the limit > from below? That's certainly how I was thinking about it, and what I need for tmpfs. Precisely what its limitations (haha) are, I'll have to take care to spell out. (IIRC - it's a while since I wrote it - it can be used for subtraction, but goes the very slow way when it could go the fast way - uncompared percpu_counter_sub() much better for that. You might be proposing that a tweak could adjust it to going the fast way when coming down from the "limit", but going the slow way as it approaches 0 - that would be neat, but I've not yet looked into whether it's feasily done.) > > So if we are approaching the limit from above (i.e. add of a > negative amount, limit is zero) then this code doesn't work the same > as the open-coded compare+add operation would? To it and to me, a limit of 0 means nothing positive can be added (and it immediately returns false for that case); and adding anything negative would be an error since the positive would not have been allowed. Would a negative limit have any use? It's definitely not allowing all the possibilities that you could arrange with a separate compare and add; whether it's ruling out some useful possibilities to which it can easily be generalized, I'm not sure. Well worth a look - but it'll be easier for me to break it than get it right, so I might just stick to adding some comments. I might find that actually I prefer your way round: getting slower as approaching 0, without any need for specifying a limit?? That the tmpfs case pushed it in this direction, when it's better reversed? Or that might be an embarrassing delusion which I'll regret having mentioned. > > Hence I think this looks like a "add if result is less than" > operation, which is distinct from then "add if result is greater > than" operation that we use this same pattern for in XFS and ext4. > Perhaps a better name is in order? The name still seems good to me, but a comment above it on its assumptions/limitations well worth adding. I didn't find a percpu_counter_compare() in ext4, and haven't got far yet with understanding the XFS ones: tomorrow... > > I'm also not a great fan of having two > similar-but-not-quite-the-same implementations for the two > comparisons, but unless we decide to convert the XFs slow path to > this it doesn't matter that much at the moment.... > > Implementation seems OK at a quick glance, though. Thanks! > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com