Date: Wed, 17 Jul 2024 13:04:08 -0400
From: Johannes Weiner
To: Tejun Heo
Cc: Michal Hocko, David Finkel, Muchun Song, Andrew Morton, core-services@vimeo.com, Jonathan Corbet, Roman Gushchin, Shakeel Butt, Shuah Khan, Zefan Li, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
Message-ID: <20240717170408.GC1321673@cmpxchg.org>
References: <20240715203625.1462309-1-davidf@vimeo.com> <20240715203625.1462309-2-davidf@vimeo.com>
On Tue, Jul 16, 2024 at 06:44:11AM -1000, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 16, 2024 at 03:48:17PM +0200, Michal Hocko wrote:
> ...
> > > This behavior is particularly useful for work scheduling systems that
> > > need to track memory usage of worker processes/cgroups per-work-item.
> > > Since memory can't be squeezed like CPU can (the OOM-killer has
> > > opinions), these systems need to track the peak memory usage to compute
> > > system/container fullness when binpacking workitems.
>
> Swap still has bad reps but there's nothing drastically worse about it than
> page cache. ie. If you're under memory pressure, you get thrashing one way
> or another. If there's no swap, the system is just memlocking anon memory
> even when they are a lot colder than page cache, so I'm skeptical that no
> swap + mostly anon + kernel OOM kills is a good strategy in general
> especially given that the system behavior is not very predictable under OOM
> conditions.
>
> > As mentioned down the email thread, I consider usefulness of peak value
> > rather limited. It is misleading when memory is reclaimed. But
> > fundamentally I do not oppose to unifying the write behavior to reset
> > values.
>
> The removal of resets was intentional. The problem was that it wasn't clear
> who owned those counters and there's no way of telling who reset what when.
> It was easy to accidentally end up with multiple entities that think they
> can get timed measurement by resetting.
>
> So, in general, I don't think this is a great idea. There are shortcomings
> to how memory.peak behaves in that its meaningfulness quickly declines over
> time. This is expected and the rationale behind adding memory.peak, IIRC,
> was that it was difficult to tell the memory usage of a short-lived cgroup.
>
> If we want to allow peak measurement of time periods, I wonder whether we
> could do something similar to pressure triggers - ie. let users register
> watchers so that each user can define their own watch periods. This is more
> involved but more useful and less error-inducing than adding reset to a
> single counter.
>
> Johannes, what do you think?

I'm also not a fan of the ability to reset globally.
I seem to remember a scheme we discussed some time ago to do local state
tracking without adding overhead to the page counter fastpath.

The new data that needs to be tracked is a pc->local_peak (in the
page_counter) and an fd->peak (in the watcher's file state).

1. Usage peak is tracked in pc->watermark, and now also in pc->local_peak.

2. Somebody opens memory.peak. Initialize fd->peak = -1.

3. If they write, set fd->peak = pc->local_peak = usage.

4. Usage grows.

5. They read(). A conventional reader has fd->peak == -1, so we return
   pc->watermark. If the fd has been written to, return
   max(fd->peak, pc->local_peak).

6. Usage drops.

7. A new watcher opens and writes. Raise every existing watcher's fd->peak
   (those that aren't -1) to pc->local_peak *iff* the latter is bigger.
   Then set the new fd->peak = pc->local_peak = current usage, as in 3.

8. See 5 again for read() from each watcher.

This way every fd can start tracking a new local peak at any time with
write(). The only extra work in the charging fast path is updating
pc->local_peak alongside pc->watermark, which is cheap. The write() is
O(existing_watchers), which seems reasonable. It's fully backward
compatible with conventional open() + read() users.
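To make the bookkeeping in steps 1-8 concrete, here is a minimal single-threaded user-space model in C. It is a sketch under stated assumptions, not the kernel implementation: struct page_counter below is a simplified stand-in, and struct cgroup_model, struct peak_fd, MAX_WATCHERS and the charge()/uncharge()/peak_open()/peak_write()/peak_read() helpers are hypothetical names chosen for illustration. Open files are modeled as indices into a fixed array that is never closed, and all locking is omitted. Only pc->watermark, pc->local_peak and fd->peak mirror names from the proposal.

```c
#include <assert.h>

/* Hypothetical capacity of the watcher table in this model. */
#define MAX_WATCHERS 8

/* Simplified stand-in for the kernel's page_counter (model only). */
struct page_counter {
	long usage;
	long watermark;  /* all-time peak; what a conventional reader sees */
	long local_peak; /* peak since the most recent write() by any watcher */
};

/* Hypothetical per-open-file state of a memory.peak watcher. */
struct peak_fd {
	long peak; /* -1: never written; otherwise this fd's running peak */
};

/* Hypothetical container tying one counter to its open watchers. */
struct cgroup_model {
	struct page_counter pc;
	struct peak_fd fds[MAX_WATCHERS];
	int nr_fds;
};

/* Charging fast path: one extra max-update for local_peak (step 1). */
static void charge(struct page_counter *pc, long nr)
{
	pc->usage += nr;
	if (pc->usage > pc->watermark)
		pc->watermark = pc->usage;
	if (pc->usage > pc->local_peak)
		pc->local_peak = pc->usage;
}

/* Uncharging lowers usage but never lowers either peak (step 6). */
static void uncharge(struct page_counter *pc, long nr)
{
	pc->usage -= nr;
}

/* open(): a new watcher starts out as a conventional reader (step 2). */
static int peak_open(struct cgroup_model *cg)
{
	assert(cg->nr_fds < MAX_WATCHERS);
	cg->fds[cg->nr_fds].peak = -1;
	return cg->nr_fds++;
}

/* write(): O(number of open watchers) (steps 3 and 7). */
static void peak_write(struct cgroup_model *cg, int fd)
{
	struct page_counter *pc = &cg->pc;

	/*
	 * Fold the current local peak into every watcher that has written
	 * before, so resetting local_peak below loses nothing for them.
	 */
	for (int i = 0; i < cg->nr_fds; i++) {
		if (cg->fds[i].peak != -1 && pc->local_peak > cg->fds[i].peak)
			cg->fds[i].peak = pc->local_peak;
	}
	/* Start a new measurement window for this fd at current usage. */
	cg->fds[fd].peak = pc->local_peak = pc->usage;
}

/* read(): step 5. */
static long peak_read(const struct cgroup_model *cg, int fd)
{
	long p = cg->fds[fd].peak;

	if (p == -1)
		return cg->pc.watermark;
	return p > cg->pc.local_peak ? p : cg->pc.local_peak;
}
```

As a trace: watcher A writes at usage 100, usage climbs to 300 and falls to 50, then watcher B opens and writes. B's write raises A's fd->peak to 300 before resetting pc->local_peak to 50, so after usage rises to 80, A still reads 300, B reads 80, and a conventional reader that never wrote sees pc->watermark, 300.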