Date: Tue, 16 Jul 2024 09:48:11 -1000
From: Tejun Heo
To: David Finkel
Cc: Michal Hocko, Muchun Song, Andrew Morton, core-services@vimeo.com,
 Jonathan Corbet, Roman Gushchin, Shakeel Butt, Shuah Khan,
 Johannes Weiner, Zefan Li, cgroups@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
References: <20240715203625.1462309-1-davidf@vimeo.com>
 <20240715203625.1462309-2-davidf@vimeo.com>

Hello,

On Tue, Jul 16, 2024 at 01:10:14PM -0400, David Finkel wrote:
> > Swap still has bad reps but there's nothing drastically worse about it than
> > page cache. ie. If you're under memory pressure, you get thrashing one way
> > or another. If there's no swap, the system is just memlocking anon memory
> > even when they are a lot colder than page cache, so I'm skeptical that no
> > swap + mostly anon + kernel OOM kills is a good strategy in general
> > especially given that the system behavior is not very predictable under OOM
> > conditions.
>
> The reason we need peak memory information is to let us schedule work in a
> way that we generally avoid OOM conditions. For the workloads I work on,
> we generally have very little in the page-cache, since the data isn't
> stored locally most of the time, but streamed from other storage/database
> systems. For those cases, demand-paging will cause large variations in
> servicing time, and we'd rather restart the process than have
> unpredictable latency. The same is true for the batch/queue-work system I
> wrote this patch to support. We keep very little data on the local disk,
> so the page cache is relatively small.

You can detect these conditions more reliably and *earlier* using PSI
triggers with swap enabled than hard allocations and OOM kills.
Then, you can take whatever decision you want to take including killing the
job without worrying about the whole system severely suffering. You can even
do things like freezing the cgroup and taking backtraces and collecting
other debug info to better understand why the memory usage is blowing up.

There are of course multiple ways to go about things but I think it's useful
to note that hard alloc based on peak usage + OOM kills likely isn't the
best way here.

...

> I appreciate the ownership issues with the current resetting interface in
> the other locations. However, this peak RSS data is not used by all that
> many applications (as evidenced by the fact that the memory.peak file was
> only added a bit over a year ago). I think there are enough cases where
> ownership is enforced externally that mirroring the existing interface to
> cgroup2 is sufficient.

It's a fairly new addition and its utility is limited, so it's not that
widely used. Adding reset makes it more useful but in a way which can be
detrimental in the long term.

> I do think a more stateful interface would be nice, but I don't know
> whether I have enough knowledge of memcg to implement that in a reasonable
> amount of time.

Right, this probably isn't trivial.

> Ownership aside, I think being able to reset the high watermark of a
> process makes it significantly more useful. Creating new cgroups and
> moving processes around is significantly heavier-weight.

Yeah, the setup / teardown cost can be non-trivial for short lived cgroups.
I agree that having some way of measuring peak in different time intervals
can be useful.

Thanks.

-- 
tejun
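
For reference, a minimal sketch of the PSI-trigger approach mentioned above, assuming cgroup v2 with PSI enabled. The trigger format (`<some|full> <stall us> <window us>` written to the cgroup's `memory.pressure` file, which is then polled for `POLLPRI`) and the `cgroup.freeze` file are the documented kernel interfaces. The helper names, the thresholds, and the cgroup path in the usage comment are illustrative assumptions, not part of the patch under discussion.

```python
import os
import select


def format_psi_trigger(kind: str, stall_us: int, window_us: int) -> bytes:
    """Build a PSI trigger string: '<some|full> <stall us> <window us>'."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall_us must be positive and at most window_us")
    return f"{kind} {stall_us} {window_us}".encode("ascii")


def open_psi_trigger(pressure_path: str, trigger: bytes) -> int:
    """Register a trigger by writing it to a pressure file; return the fd.

    The trigger stays armed only while the returned fd is open.
    """
    fd = os.open(pressure_path, os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, trigger)
    except OSError:
        os.close(fd)
        raise
    return fd


def wait_for_pressure(fd: int, timeout_ms: int) -> bool:
    """Return True if the trigger fired within timeout_ms, else False."""
    poller = select.poll()
    poller.register(fd, select.POLLPRI)
    for _fd, events in poller.poll(timeout_ms):
        if events & select.POLLERR:
            raise OSError("pressure file is no longer available")
        if events & select.POLLPRI:
            return True
    return False


def set_frozen(cgroup_dir: str, frozen: bool) -> None:
    """Freeze or thaw every task in a cgroup via its cgroup.freeze file."""
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1" if frozen else "0")


# Hypothetical usage (the path and thresholds are made up for illustration):
#   cg = "/sys/fs/cgroup/batch/job-1"
#   fd = open_psi_trigger(os.path.join(cg, "memory.pressure"),
#                         format_psi_trigger("some", 150_000, 1_000_000))
#   if wait_for_pressure(fd, timeout_ms=60_000):
#       set_frozen(cg, True)  # collect debug info, then kill or thaw the job
```

Freezing before collecting debug info keeps the job's state stable while it is inspected; whether to kill or thaw afterwards is a policy decision left to the supervisor.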