From: David Finkel
Date: Tue, 16 Jul 2024 09:39:22 -0400
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
To: Michal Hocko
Cc: Muchun Song, Andrew Morton, core-services@vimeo.com, Jonathan Corbet, Roman Gushchin, Shuah Khan, Johannes Weiner, Tejun Heo, Zefan Li, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Shakeel Butt
References: <20240715203625.1462309-1-davidf@vimeo.com> <20240715203625.1462309-2-davidf@vimeo.com>

On Tue, Jul 16, 2024 at 9:19 AM Michal Hocko wrote:
>
> On Tue 16-07-24 08:47:59, David Finkel wrote:
> > On Tue, Jul 16, 2024 at 3:20 AM Michal Hocko wrote:
> > >
> > > On Mon 15-07-24 16:46:36, David Finkel wrote:
> > > > > On Mon, Jul 15, 2024 at 4:38 PM David Finkel wrote:
> > > > > >
> > > > > > Other mechanisms for querying the peak memory usage of either a process
> > > > > > or v1 memory cgroup allow for resetting the high watermark. Restore
> > > > > > parity with those mechanisms.
> > > > > >
> > > > > > For example:
> > > > > >  - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
> > > > > >    the high watermark.
> > > > > >  - Writing "5" to the clear_refs pseudo-file in a process's proc
> > > > > >    directory resets the peak RSS.
> > > > > >
> > > > > > This change copies the cgroup v1 behavior so any write to the
> > > > > > memory.peak and memory.swap.peak pseudo-files resets the high watermark
> > > > > > to the current usage.
> > > > > >
> > > > > > This behavior is particularly useful for work scheduling systems that
> > > > > > need to track memory usage of worker processes/cgroups per-work-item.
> > > > > > Since memory can't be squeezed like CPU can (the OOM-killer has
> > > > > > opinions),
> > >
> > > I do not understand the OOM-killer reference here. Why does it matter?
> > > Could you explain please?
> >
> > Sure, we're attempting to bin-pack work based on past items of the same type.
> > With CPU, we can provision for the mean CPU-time per-wall-time to get
> > a loose "cores" concept that we use for bin-packing. With CPU, if we end
> > up with a bit of contention, everything just gets a bit slower while the
> > scheduler arbitrates among cgroups.
> >
> > However, with memory, you only have so much physical memory for the outer memcg.
> > If we pack things too tightly on memory, the OOM-killer is going to kill
> > something to free up memory. In some cases that's fine, but provisioning for the
> > peak memory for that "type" of work-item mostly avoids this issue.
>
> It is still not clear to me how the memory reclaim falls into that. Are
> your workloads mostly unreclaimable (e.g. mostly anon consumers without
> any swap)? Why am I asking? Well, if the workload's memory is
> reclaimable then the peak memory consumption is largely misleading
> because an unknown portion of that memory consumption is hidden by the
> reclaimed portion of it.
> This is not really specific to the write
> handlers to reset the value though, so I do not want to digress from this
> patch too much. I do not have objections to the patch itself. Clarifying
> the use case with your follow-up here would be nice.

Thanks, I'm happy to clarify things!

That's a good point about peak-RSS being unreliable if the memory's reclaimable.

The memory is mostly unreclaimable. It's almost all anonymous mmap, with a few
local files that would be resident in the buffer cache (but generally aren't
mmapped).

We don't run with swap enabled on these systems for a few reasons.
In particular, Kubernetes disallows swap, which ties our hands, but even
if it didn't, demand paging from disk tends to stall any useful work, so
we'd rather see the OOM-killer invoked anyway.

(We actually have some plans for disabling OOM-kills in these cgroups and
letting the userspace process managing these memcgs handle work-throttling
and worker-killing when there are OOM conditions, but that's another story :) )

>
> Thanks for the clarification!
> --
> Michal Hocko
> SUSE Labs

-- 
David Finkel
Senior Principal Software Engineer, Core Services
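
For illustration only, here is a minimal sketch of the per-work-item measurement discussed in this thread. It assumes the behavior proposed in the patch description (any write to memory.peak resets the high watermark to the current usage); on a kernel without that write handler the write is expected to fail with an OSError. The function names, the "reset" payload, and the example cgroup path are hypothetical and not taken from the patch.

```python
from pathlib import Path
from typing import Callable


def read_peak_bytes(cgroup_dir: str, name: str = "memory.peak") -> int:
    """Return the high watermark, in bytes, reported by a cgroup v2 peak file."""
    return int((Path(cgroup_dir) / name).read_text().strip())


def reset_peak(cgroup_dir: str, name: str = "memory.peak") -> None:
    """Write to the peak file.

    Assumption: with the proposed patch, the content of the write does not
    matter and the watermark is reset to the cgroup's current usage.
    """
    (Path(cgroup_dir) / name).write_text("reset\n")


def measure_work_item(cgroup_dir: str, run_item: Callable[[], None]) -> int:
    """Reset the watermark, run one work item, and return its peak usage."""
    reset_peak(cgroup_dir)
    run_item()
    return read_peak_bytes(cgroup_dir)
```

A scheduler could call something like measure_work_item("/sys/fs/cgroup/workers/w0", run) once per item (the path is a made-up example) and keep the largest value seen per work-item type as its provisioning figure for bin-packing. The same helpers work for memory.swap.peak by passing name="memory.swap.peak".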