From: David Finkel
Date: Wed, 17 Jul 2024 16:14:07 -0400
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
To: Johannes Weiner
Cc: Tejun Heo, Michal Hocko, Muchun Song, Andrew Morton, core-services@vimeo.com, Jonathan Corbet, Roman Gushchin, Shuah Khan, Zefan Li, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Shakeel Butt
References: <20240715203625.1462309-1-davidf@vimeo.com> <20240715203625.1462309-2-davidf@vimeo.com> <20240717170408.GC1321673@cmpxchg.org>
In-Reply-To: <20240717170408.GC1321673@cmpxchg.org>

On Wed, Jul 17, 2024 at 1:04 PM Johannes Weiner wrote:
>
> On Tue, Jul 16, 2024 at 06:44:11AM -1000, Tejun Heo wrote:
> > Hello,
> >
> > On Tue, Jul 16, 2024 at 03:48:17PM +0200, Michal Hocko wrote:
> > ...
> > > > This behavior is particularly useful for work scheduling systems that
> > > > need to track memory usage of worker processes/cgroups per-work-item.
> > > > Since memory can't be squeezed like CPU can (the OOM-killer has
> > > > opinions), these systems need to track the peak memory usage to compute
> > > > system/container fullness when binpacking workitems.
> >
> > Swap still has bad reps but there's nothing drastically worse about it than
> > page cache. ie. If you're under memory pressure, you get thrashing one way
> > or another. If there's no swap, the system is just memlocking anon memory
> > even when they are a lot colder than page cache, so I'm skeptical that no
> > swap + mostly anon + kernel OOM kills is a good strategy in general
> > especially given that the system behavior is not very predictable under OOM
> > conditions.
> >
> > > As mentioned down the email thread, I consider usefulness of peak value
> > > rather limited. It is misleading when memory is reclaimed. But
> > > fundamentally I do not oppose to unifying the write behavior to reset
> > > values.
> >
> > The removal of resets was intentional. The problem was that it wasn't clear
> > who owned those counters and there's no way of telling who reset what when.
> > It was easy to accidentally end up with multiple entities that think they
> > can get timed measurement by resetting.
> >
> > So, in general, I don't think this is a great idea. There are shortcomings
> > to how memory.peak behaves in that its meaningfulness quickly declines over
> > time. This is expected and the rationale behind adding memory.peak, IIRC,
> > was that it was difficult to tell the memory usage of a short-lived cgroup.
> >
> > If we want to allow peak measurement of time periods, I wonder whether we
> > could do something similar to pressure triggers - ie. let users register
> > watchers so that each user can define their own watch periods. This is more
> > involved but more useful and less error-inducing than adding reset to a
> > single counter.
> >
> > Johannes, what do you think?
>
> I'm also not a fan of the ability to reset globally.
>
> I seem to remember a scheme we discussed some time ago to do local
> state tracking without having the overhead in the page counter
> fastpath. The new data that needs to be tracked is a pc->local_peak
> (in the page_counter) and an fd->peak (in the watcher's file state).
>
> 1. Usage peak is tracked in pc->watermark, and now also in pc->local_peak.
>
> 2. Somebody opens the memory.peak. Initialize fd->peak = -1.
>
> 3. If they write, set fd->peak = pc->local_peak = usage.
>
> 4. Usage grows.
>
> 5. They read(). A conventional reader has fd->peak == -1, so we return
>    pc->watermark. If the fd has been written to, return max(fd->peak, pc->local_peak).
>
> 6. Usage drops.
>
> 7. New watcher opens and writes. Bring up all existing watchers'
>    fd->peak (that aren't -1) to pc->local_peak *iff* latter is bigger.
>    Then set the new fd->peak = pc->local_peak = current usage as in 3.
>
> 8. See 5. again for read() from each watcher.
>
> This way all fd's can arbitrarily start tracking new local peaks with
> write(). The operation in the charging fast path is cheap. The write()
> is O(existing_watchers), which seems reasonable. It's fully backward
> compatible with conventional open() + read() users.

That scheme seems viable, but it's a lot more work to implement and maintain
than a simple global reset.

Since that scheme maintains a separate pc->local_peak, it's not mutually
exclusive with implementing a global reset now (as long as we reserve a
way to distinguish the different kinds of writes).

As discussed on other sub-threads, this might be too niche to be worth
the significant complexity of avoiding a global reset (especially when
users would likely be moving from cgroups v1, which does have a global
reset).

Thanks,
-- 
David Finkel
Senior Principal Software Engineer, Core Services
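
For concreteness, the watcher scheme quoted above (steps 1-8) can be modeled
in user space roughly as follows. This is only a toy Python sketch, not
kernel code: the names PageCounter, PeakFd, charge() and uncharge() are made
up for illustration, and keeping a per-counter list of open watchers is an
assumption about how the write() path would find the other fds.

```python
class PageCounter:
    """Toy stand-in for the kernel's page_counter (hypothetical model)."""

    def __init__(self):
        self.usage = 0
        self.watermark = 0    # all-time peak, what conventional readers see
        self.local_peak = 0   # peak since the most recent write() by any fd
        self.watchers = []    # assumed: list of open memory.peak fds

    def charge(self, pages):
        # Fast path: two max() updates, no walk over the watchers (step 1).
        self.usage += pages
        self.watermark = max(self.watermark, self.usage)
        self.local_peak = max(self.local_peak, self.usage)

    def uncharge(self, pages):
        self.usage -= pages


class PeakFd:
    """Toy stand-in for one open file on memory.peak."""

    def __init__(self, pc):
        self.pc = pc
        self.peak = -1        # step 2: -1 marks a conventional reader
        pc.watchers.append(self)

    def write(self):
        pc = self.pc
        # Step 7: before restarting local_peak, save it into every fd that
        # is already tracking, but only if it is bigger than what they hold.
        for w in pc.watchers:
            if w.peak != -1 and pc.local_peak > w.peak:
                w.peak = pc.local_peak
        # Steps 3 and 7: start a new tracking window at the current usage.
        self.peak = pc.local_peak = pc.usage

    def read(self):
        # Step 5: conventional readers get the global watermark.
        if self.peak == -1:
            return self.pc.watermark
        return max(self.peak, self.pc.local_peak)


pc = PageCounter()
pc.charge(100)
legacy = PeakFd(pc)       # never writes, so it stays a conventional reader
a = PeakFd(pc)
pc.uncharge(60)           # usage drops to 40
a.write()                 # a starts tracking from 40
pc.charge(30)             # usage grows to 70
a_first = a.read()
pc.uncharge(50)           # usage drops to 20
b = PeakFd(pc)
b.write()                 # a's peak is brought up to 70, b starts from 20
pc.charge(10)             # usage grows to 30
print(a_first, a.read(), b.read(), legacy.read())  # prints: 70 70 30 100
```

In this model the charge path only does the two max() updates, and the
O(existing_watchers) walk happens only in write(), which matches the cost
split described in the quoted scheme.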