From: David Finkel
Date: Tue, 16 Jul 2024 13:10:14 -0400
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
To: Tejun Heo
Cc: Michal Hocko, Muchun Song, Andrew Morton, core-services@vimeo.com,
    Jonathan Corbet, Roman Gushchin, Shakeel Butt, Shuah Khan,
    Johannes Weiner, Zefan Li, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org
References: <20240715203625.1462309-1-davidf@vimeo.com> <20240715203625.1462309-2-davidf@vimeo.com>

On Tue, Jul 16, 2024 at 12:44 PM Tejun Heo wrote:
>
> Hello,
>
> On Tue, Jul 16, 2024 at 03:48:17PM +0200, Michal Hocko wrote:
> ...
> > > This behavior is particularly useful for work scheduling systems that
> > > need to track memory usage of worker processes/cgroups per-work-item.
> > > Since memory can't be squeezed like CPU can (the OOM-killer has
> > > opinions), these systems need to track the peak memory usage to compute
> > > system/container fullness when binpacking workitems.
>
> Swap still has bad reps but there's nothing drastically worse about it than
> page cache. ie. If you're under memory pressure, you get thrashing one way
> or another. If there's no swap, the system is just memlocking anon memory
> even when they are a lot colder than page cache, so I'm skeptical that no
> swap + mostly anon + kernel OOM kills is a good strategy in general
> especially given that the system behavior is not very predictable under OOM
> conditions.

The reason we need peak memory information is to let us schedule work in a
way that we generally avoid OOM conditions.
For the workloads I work on, we generally have very little in the page-cache,
since the data isn't stored locally most of the time, but streamed from other
storage/database systems. For those cases, demand-paging will cause large
variations in servicing time, and we'd rather restart the process than
have unpredictable latency.

The same is true for the batch/queue-work system I wrote this patch to support.
We keep very little data on the local disk, so the page cache is relatively
small.

>
> > As mentioned down the email thread, I consider usefulness of peak value
> > rather limited. It is misleading when memory is reclaimed. But
> > fundamentally I do not oppose to unifying the write behavior to reset
> > values.
>
> The removal of resets was intentional. The problem was that it wasn't clear
> who owned those counters and there's no way of telling who reset what when.
> It was easy to accidentally end up with multiple entities that think they
> can get timed measurement by resetting.
>
> So, in general, I don't think this is a great idea. There are shortcomings
> to how memory.peak behaves in that its meaningfulness quickly declines over
> time. This is expected and the rationale behind adding memory.peak, IIRC,
> was that it was difficult to tell the memory usage of a short-lived cgroup.
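
To make the ownership concern concrete, the following is a toy Python model,
not kernel code. PeakCounter and its methods are hypothetical names used only
for illustration, and the model assumes a single shared watermark that any
writer can reset. Two observers that each reset it to start their own
measurement window end up reading the same value, and the first one silently
loses the spike it was trying to measure.

```python
class PeakCounter:
    """Toy model of a usage counter with one shared high watermark."""

    def __init__(self) -> None:
        self.usage = 0
        self.peak = 0

    def charge(self, nbytes: int) -> None:
        # Usage grows; the watermark follows whenever usage exceeds it.
        self.usage += nbytes
        self.peak = max(self.peak, self.usage)

    def uncharge(self, nbytes: int) -> None:
        # Usage shrinks; the watermark is left untouched.
        self.usage -= nbytes

    def reset(self) -> None:
        # Assumed reset semantics: the watermark drops to the current usage,
        # and it does so for every reader, not just the one that reset it.
        self.peak = self.usage


def demo() -> tuple:
    counter = PeakCounter()

    counter.reset()          # observer A opens its measurement window
    counter.charge(800)      # spike inside A's window
    counter.uncharge(700)    # usage falls back to 100

    counter.reset()          # observer B, unaware of A, opens its own window
    counter.charge(50)       # usage is now 150

    a_sees = counter.peak    # A reads the shared watermark
    b_sees = counter.peak    # B reads the same watermark
    return a_sees, b_sees
```

Here demo() returns (150, 150), although the spike inside A's window reached
800. Nothing in the counter records who reset it or when.
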
>
> If we want to allow peak measurement of time periods, I wonder whether we
> could do something similar to pressure triggers - ie. let users register
> watchers so that each user can define their own watch periods. This is more
> involved but more useful and less error-inducing than adding reset to a
> single counter.

I appreciate the ownership issues with the current resetting interface in the
other locations. However, this peak RSS data is not used by all that many
applications (as evidenced by the fact that the memory.peak file was only added
a bit over a year ago).
I think there are enough cases where ownership is enforced externally that
mirroring the existing interface to cgroup2 is sufficient.

I do think a more stateful interface would be nice, but I don't know whether
I have enough knowledge of memcg to implement that in a reasonable amount of
time.

Ownership aside, I think being able to reset the high watermark of a process
makes it significantly more useful. Creating new cgroups and moving processes
around is significantly heavier-weight.

Thanks,

>
> Johannes, what do you think?
>
> Thanks.
>
> --
> tejun

--
David Finkel
Senior Principal Software Engineer, Core Services
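
A minimal Python sketch of the per-work-item pattern discussed above, under
stated assumptions: the cgroup's memory.peak file is reachable at a path the
caller supplies, and writing any non-empty string to it resets the watermark,
which is the behavior this patch proposes rather than something every kernel
provides. The helper names (read_peak, reset_peak, peak_for_item) and the
string written are hypothetical.

```python
from typing import Callable


def read_peak(peak_path: str) -> int:
    """Return the high watermark, in bytes, stored in a memory.peak file."""
    with open(peak_path, "r", encoding="ascii") as f:
        return int(f.read().strip())


def reset_peak(peak_path: str) -> None:
    """Restart the watermark by writing to the file (assumed semantics)."""
    # Assumption: any non-empty write resets the watermark, as proposed in
    # this thread. The exact string is illustrative only.
    with open(peak_path, "w", encoding="ascii") as f:
        f.write("reset\n")


def peak_for_item(peak_path: str, work: Callable[[], None]) -> int:
    """Run one work item and return the peak recorded while it ran."""
    reset_peak(peak_path)
    work()
    return read_peak(peak_path)
```

A scheduler that owns the worker's cgroup could call peak_for_item() once per
work item and feed the result into its binpacking estimate, without creating
a new cgroup or moving processes between cgroups for each item. This only
gives meaningful numbers when a single entity performs the resets.
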