From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 412B0E95A8E for ; Mon, 9 Oct 2023 00:15:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 658736B0192; Sun, 8 Oct 2023 20:15:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 608CC6B0194; Sun, 8 Oct 2023 20:15:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4D0A16B0196; Sun, 8 Oct 2023 20:15:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 3750B6B0192 for ; Sun, 8 Oct 2023 20:15:55 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 0AD091A0206 for ; Mon, 9 Oct 2023 00:15:55 +0000 (UTC) X-FDA: 81324005070.26.8D2A7EB Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) by imf18.hostedemail.com (Postfix) with ESMTP id 1D4C51C0024 for ; Mon, 9 Oct 2023 00:15:51 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=dnWYpLT5; spf=pass (imf18.hostedemail.com: domain of david@fromorbit.com designates 209.85.214.172 as permitted sender) smtp.mailfrom=david@fromorbit.com; dmarc=pass (policy=quarantine) header.from=fromorbit.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696810552; a=rsa-sha256; cv=none; b=Jaf4EmiVSDhGbG7OEnuTapKKa9LVdlHza1y/lKxndpxdM1wA/XsRdxqYqPce4wMRVP5dSY aCgLPQPu56q7T33uvUBzhhMugpg4VuHVTuIXqE0HFQEhUDiXCEeQdl0Buu164aJUMhM+gM HeDdfCp1WTjFrTqH+UYiORt0ENNgsWI= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=dnWYpLT5; spf=pass (imf18.hostedemail.com: domain of david@fromorbit.com designates 209.85.214.172 as permitted sender) smtp.mailfrom=david@fromorbit.com; dmarc=pass (policy=quarantine) header.from=fromorbit.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696810552; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5pMbdaei/dF0n16cuejvm6nYTMm6wJ9kYRo603Y9tXY=; b=MDQr38PeuSZZJns2gCMQ+XXMluHwtUeXxkjhfrrIWFyEmosV1RGXuo4rGNr3QtLeudRurE ZlCivO7ScEpV2foo0Nb0BXotqW8SwQq/9+JNcVPhmaAsK8K0I50ga/XfalkvbPK12W81Zg pO7erngKtDOkKxIZOu/lffo3zdMWhns= Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-1c5bf7871dcso28304475ad.1 for ; Sun, 08 Oct 2023 17:15:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1696810551; x=1697415351; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=5pMbdaei/dF0n16cuejvm6nYTMm6wJ9kYRo603Y9tXY=; b=dnWYpLT54LQHEuKDeeu/elm0wh5mMrYPpiPVZJS7rK7wQWWBOpW1k1wNPiK8YcVYYL M0FLZffBHUh3jrqLcw19pyPumqnmUUwRCOenYn94/fBeXigj4yJxjB0YBdXlipra+GiP qz6MUSJdwS/6AKtoTK9gQg1Y2jK9GcJpYYr4F6PKhZokgQWcpo28MUcteVh/QuqUJwuX Yj94b2en6EmtoTWWn8mpO3DnFJ8FEXqXd1VRl2559EgG+a86hLf5g6bBgm1yVRHmQHcS cXLm3O6ISmr31vhH0n4tCzw7vFnllenIvzDaiFLnwQ+FTkKnwc+0aJ61e6SVmrLGCloB WZBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696810551; x=1697415351; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=5pMbdaei/dF0n16cuejvm6nYTMm6wJ9kYRo603Y9tXY=; b=TPRLffu6zSVDx8efA2WyT4O0W89J6baSWCjia+RVJFTbwsMiZ0WKQtQJky1Ic2uH32 zQkVBhuaSOBF3XOzl6hF8ydbROaVFuUuLtSd0Ict2LzEOhVuVNCjhJDakoJDegjs/W5T 2AYsv56gaWFaMgc3+MDKqg8jPrAfJKoqcM8exGhd1Lrw/AMeXtceAZtk9J6SHeXR3XWh 6l0jRClAAE8feXtuROmXQ83GX5DLtZ8KC8L6mPNk6tarkhLnLCSN6v1AnEfA3eum5cRL tLfS23AOjYL0eJ99xW1Cf2N4i110JRKPSzSlFrI/qnRc6Bl6Ky69zfrk3sfmjOWeZfDL OURQ== X-Gm-Message-State: AOJu0Yxh543DmxerUv8Jc4q6mCgqXu97bSoWcescZY/MKwyIden+eVzX OFqAxUE4QCZD87KpacWXlPw5Cg== X-Google-Smtp-Source: AGHT+IGJkPv/XYegx4s9UEyZ39lWpjWST9+uJFQQhOB9GWsqna9kS2F0bIydpa24vxmPrUY9+UdkaQ== X-Received: by 2002:a05:6a21:6d9f:b0:13f:b028:7892 with SMTP id wl31-20020a056a216d9f00b0013fb0287892mr14836860pzb.2.1696810550803; Sun, 08 Oct 2023 17:15:50 -0700 (PDT) Received: from dread.disaster.area (pa49-180-20-59.pa.nsw.optusnet.com.au. [49.180.20.59]) by smtp.gmail.com with ESMTPSA id ix1-20020a170902f80100b001b53953f306sm8122275plb.178.2023.10.08.17.15.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 08 Oct 2023 17:15:50 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1qpdw2-00BIkj-31; Mon, 09 Oct 2023 11:15:46 +1100 Date: Mon, 9 Oct 2023 11:15:46 +1100 From: Dave Chinner To: Hugh Dickins Cc: Andrew Morton , Tim Chen , Dave Chinner , "Darrick J. Wong" , Christian Brauner , Carlos Maiolino , Chuck Lever , Jan Kara , Matthew Wilcox , Johannes Weiner , Axel Rasmussen , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit, amount) Message-ID: References: <2451f678-38b3-46c7-82fe-8eaf4d50a3a6@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2451f678-38b3-46c7-82fe-8eaf4d50a3a6@google.com> X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 1D4C51C0024 X-Stat-Signature: fc8p7hxkqsrys7pys3uph4cx9ukzsbqk X-Rspam-User: X-HE-Tag: 1696810551-981295 X-HE-Meta: U2FsdGVkX1/oFDaKVIqDXJ01ivFfDWghHiUFdtUZowJLs4jYcn0n5QOQIVqGAd/hKr/PmqxE+nljbbZslQT8l6WDIM5tVbffDa5O17nEDoQhROIxEwcLZ6uodc0RDjDJfltBIl9MNBXa+fgyvzDt0X0RV3Az5IajlVu69tBmkK8i4ryxnLksZ3sEBmnQK9GT6pJkYQreWheIN4lK2PEPVJyRpxsJmCLn9bVzNBqxO9AZ4dy6ujKAC04BP7VfurE+eQ4r6JHqUGy6CTShcNR1/kPbHKpxqlCSXQ4VAmADIAgGm+DQ7G0FHYCem2o+n8xMRk70AWkX5Ryy9TSJLWX23WwxX4ecVNt1sa3QnYkkooY2aJo1cG43yc22TViv18XnKUSKE9ISGAgi/BONFYoREUtO+dj2keLJF+kvqk0NxRQ/dPOqvRde9Ws/m2Tynvuip87NV85hJZjRe2KEEXJUEovVoL3k8Y1XJ52+bB4Aoo5wHbfSS/CSHnQs1MKDic5sCemBJKJ7yBoDKgRVaUwa0N8E70OG+uqiWlyQp6wtelmoG5DkKueigcbuTQdAE2o57DT9QktpJ7MF1Mc9MbUn34eUJv+T606VuA02G5RDlx3Ny0Uaq2+CYMr80DGzH31P/xY53uGdmhrQN6imJpTPT37obNX6JGJ0yqWrbM6r95w4RMzJM8s45rt4bTtkyIhXQEFDJ6iUIXM6JAOzdZ34R7g23Y2xb39azsBGK5z68Fdd8ywVXcBEfU3OZOgskW86jca66QtP0G6B8EkrzT+oTGQ1zicdm6O5F7rAbHEC8qNhkfVM5Vvrw9Msn0PUSwoMmGUW9AqsbAR2h+Kfd3dH0nt6LVbht4jLBa0HMtaUjiCvV1EDf7WfLspgSs6awpNNBGvgf5qs+TBP/vYUrKzAnDXCHhJ1FP8jrqYbyd/ijFl+Q6NblFKC+6BQlmirgBE3pdp3vo5LaGZYyR0VjHB kdaOKKdg QXMIhBkwWFrrBQIRFCy9bkfnHG4s5bMj/YVcqv6anVrsI4owjkMpQcODUjyQ+Ayz3bAD6hGcjZIKPtEPej+RXkHJ7khTb1/ajbICvorNITalhDBzTr6tkxhrq5PWhBW/30T2cVejbqhNfaR+A2a4q68RSKHdJsPRhwQjzzPBxDtZjVfBA/in4Y3QqDndUyh5tzRRTJr8SR1MkphQswz9KkXOkDqy+CLVLiebHy9t+bo+0eJS6cV/d8Fjfq3IXbl2X6C8H+bcopYrrz4bhYFtIwoPnbW2fG1FhBFetxO2KIOxsOkHV54caAQCqDT38UqI7TiAUEka7xjzgyiSzGDdRKZsjDKcb6VfQqGincQmLBki2O5Ud7DNLy6lyfiVdYgv+JZLAuO7NFFJT7vwzJcYTpcs+3l8BOhEPj/wwDWqETSiT+eF4mTMvpSFr8OeiQiKEKRJ2qgZO/bOS86tcuWlJZaE6cTQhP2zqFM3pOW/QcqR9Ehz/55Rcs2OMRAD4qHqXghQhWmiCi6R2KyABp5ZTckuw5OkRsS/xfS7fufkvFH/ZpCLCo+KG/I7t5HfAYgUioAzQLBgnCPgRw0vb8GoZqrU0mR0sU50/LOr5+ODkvKbAg+SfLEYijPNXlQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Oct 05, 2023 at 10:35:33PM -0700, Hugh Dickins wrote: > On Thu, 5 Oct 2023, Dave Chinner wrote: > > On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote: > > > Percpu counter's compare and add are separate functions: without locking > > > around them (which would defeat their purpose), it has been possible to > > > overflow the intended limit. Imagine all the other CPUs fallocating > > > tmpfs huge pages to the limit, in between this CPU's compare and its add. > > > > > > I have not seen reports of that happening; but tmpfs's recent addition > > > of dquot_alloc_block_nodirty() in between the compare and the add makes > > > it even more likely, and I'd be uncomfortable to leave it unfixed. > > > > > > Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. > > > > > > I believe this implementation is correct, and slightly more efficient > > > than the combination of compare and add (taking the lock once rather > > > than twice when nearing full - the last 128MiB of a tmpfs volume on a > > > machine with 128 CPUs and 4KiB pages); but it does beg for a better > > > design - when nearing full, there is no new batching, but the costly > > > percpu counter sum across CPUs still has to be done, while locked. > > > > > > Follow __percpu_counter_sum()'s example, including cpu_dying_mask as > > > well as cpu_online_mask: but shouldn't __percpu_counter_compare() and > > > __percpu_counter_limited_add() then be adding a num_dying_cpus() to > > > num_online_cpus(), when they calculate the maximum which could be held > > > across CPUs? But the times when it matters would be vanishingly rare. > > > > > > Signed-off-by: Hugh Dickins > > > Cc: Tim Chen > > > Cc: Dave Chinner > > > Cc: Darrick J. Wong > > > --- > > > Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7, > > > which are just internal to shmem, and do not affect this patch (which > > > applies to v6.6-rc and linux-next as is): but want to run this by you. > > > > Hmmmm. IIUC, this only works for addition that approaches the limit > > from below? > > That's certainly how I was thinking about it, and what I need for tmpfs. > Precisely what its limitations (haha) are, I'll have to take care to > spell out. > > (IIRC - it's a while since I wrote it - it can be used for subtraction, > but goes the very slow way when it could go the fast way - uncompared > percpu_counter_sub() much better for that. You might be proposing that > a tweak could adjust it to going the fast way when coming down from the > "limit", but going the slow way as it approaches 0 - that would be neat, > but I've not yet looked into whether it's feasily done.) > > > > > So if we are approaching the limit from above (i.e. add of a > > negative amount, limit is zero) then this code doesn't work the same > > as the open-coded compare+add operation would? > > To it and to me, a limit of 0 means nothing positive can be added > (and it immediately returns false for that case); and adding anything > negative would be an error since the positive would not have been allowed. > > Would a negative limit have any use? I don't have any use for it, but the XFS case is decrementing free space to determine if ENOSPC has been hit. It's the opposite implemention to shmem, which increments used space to determine if ENOSPC is hit. > It's definitely not allowing all the possibilities that you could arrange > with a separate compare and add; whether it's ruling out some useful > possibilities to which it can easily be generalized, I'm not sure. > > Well worth a look - but it'll be easier for me to break it than get > it right, so I might just stick to adding some comments. > > I might find that actually I prefer your way round: getting slower > as approaching 0, without any need for specifying a limit?? That the > tmpfs case pushed it in this direction, when it's better reversed? Or > that might be an embarrassing delusion which I'll regret having mentioned. I think there's cases for both approaching and upper limit from before and a lower limit from above. Both are the same "compare and add" algorithm, just with minor logic differences... > > Hence I think this looks like a "add if result is less than" > > operation, which is distinct from then "add if result is greater > > than" operation that we use this same pattern for in XFS and ext4. > > Perhaps a better name is in order? > > The name still seems good to me, but a comment above it on its > assumptions/limitations well worth adding. > > I didn't find a percpu_counter_compare() in ext4, and haven't got Go search for EXT4_FREECLUSTERS_WATERMARK.... > far yet with understanding the XFS ones: tomorrow... XFS detects being near ENOSPC to change the batch update size so taht when near ENOSPC the percpu counter always aggregates to the global sum on every modification. i.e. it becomes more accurate (but slower) near the ENOSPC threshold. Then if the result of the subtraction ends up being less than zero, it takes a lock (i.e. goes even slower!), undoes the subtraction that took it below zero, and determines if it can dip into the reserve pool or ENOSPC should be reported. Some of that could be optimised, but we need that external "lock and undo" mechanism to manage the reserve pool space atomically at ENOSPC... Cheers, Dave. -- Dave Chinner david@fromorbit.com