Date: Wed, 9 Jun 2021 16:31:37 -0400
From: Johannes Weiner
To: "Eric W. Biederman"
Cc: "Enrico Weigelt, metux IT consult", Chris Down, legion@kernel.org,
    LKML, Linux Containers, Linux Containers, Linux FS Devel,
    linux-mm@kvack.org, Andrew Morton, Christian Brauner, Michal Hocko
Subject: Re: [PATCH v1] proc: Implement /proc/self/meminfo
References: <87k0n2am0n.fsf@disp2133>
In-Reply-To: <87k0n2am0n.fsf@disp2133>

On Wed, Jun 09, 2021 at 02:14:16PM -0500, Eric W. Biederman wrote:
> "Enrico Weigelt, metux IT consult" writes:
>
> > On 03.06.21 13:33, Chris Down wrote:
> >
> > Hi folks,
> >
> >
> >> Putting stuff in /proc to get around the problem of "some other metric I need
> >> might not be exported to a container" is not a very compelling argument. If
> >> they want it, then export it to the container...
> >>
> >> Ultimately, if they're going to have to add support for a new
> >> /proc/self/meminfo file anyway, these use cases should just do it properly
> >> through the already supported APIs.
> >
> > It's even a bit more complex ...
> >
> > /proc/meminfo always tells what the *machine* has available, not what a
> > process can eat up. That has been this way even long before cgroups.
> > (eg. ulimits).
> >
> > Even if you want a container to look more like a VM - /proc/meminfo showing
> > what the container (instead of the machine) has available - just looking
> > at the calling task's cgroup is also wrong. Because there're cgroups
> > outside containers (that really shouldn't be affected) and there're even
> > other cgroups inside the container (that further restrict below the
> > container's limits).
> >
> > BTW: applications trying to autotune themselves by looking at
> > /proc/meminfo are broken-by-design anyways. This never has been a valid
> > metric on how much memory individual processes can or should eat.
>
> Which brings us to the problem.
>
> Using /proc/meminfo is not valid unless your application can know it has
> the machine to itself. Something that is becoming increasingly less
> common.
>
> Unless something has changed in the last couple of years, reading values
> out of the cgroup filesystem is both difficult (v1 and v2 have some
> gratuitous differences) and is actively discouraged.
>
> So what should applications do?
>
> Alex has found applications that are trying to do something with
> meminfo, and the fields that those applications care about. I don't see
> anyone making the case that specifically what the applications are
> trying to do is buggy.
>
> Alex's suggestion is to have a /proc/self/meminfo that has the information
> that applications want, which would be something that would be easy
> to switch applications to. The patch to userspace at that point is
> as simple as 3 lines of code. I can imagine people take that patch into
> their userspace programs.

But is it actually what applications want?

Not all the information at the system level translates well to the
container level. Things like available memory require a hierarchical
assessment rather than just a look at the local level, since there
could be limits higher up the tree.

Not all items in meminfo have a container equivalent, either.
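
To make the hierarchical point concrete, here is a minimal sketch in
Python, assuming cgroup v2 and a hierarchy that is fully visible to the
caller. The helper names (read_value, hierarchical_headroom) are made up
for illustration and are not an interface anyone in this thread has
proposed. It treats headroom as memory.max minus memory.current, which
is a simplification: it ignores reclaimable memory, memory.high, and
swap.

```python
from pathlib import Path
from typing import Optional


def read_value(path: Path) -> Optional[int]:
    """Return the integer stored in a cgroup v2 control file.

    Returns None if the file is missing (e.g. at the root cgroup) or
    holds the literal "max", which means no limit is configured.
    """
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return None if text == "max" else int(text)


def hierarchical_headroom(root: Path, cgroup: str) -> Optional[int]:
    """Smallest (memory.max - memory.current) over `cgroup` and all of
    its ancestors up to `root`. Returns None if no level sets a limit.

    Hypothetical helper: looking only at the leaf would miss a tighter
    limit configured higher up the tree.
    """
    parts = [p for p in cgroup.split("/") if p]
    tightest = None
    # Walk from the leaf cgroup up to the root, one level at a time.
    for depth in range(len(parts), -1, -1):
        node = root.joinpath(*parts[:depth])
        limit = read_value(node / "memory.max")
        usage = read_value(node / "memory.current")
        if limit is None or usage is None:
            continue  # no limit (or no accounting files) at this level
        room = max(limit - usage, 0)
        tightest = room if tightest is None else min(tightest, room)
    return tightest
```

On a cgroup v2 system the arguments would typically be
Path("/sys/fs/cgroup") and the path taken from the "0::" line of
/proc/self/cgroup. Inside a cgroup namespace the ancestors above the
namespace root may not be visible at all, so even this walk can miss
limits set higher up.
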
The familiar format is likely a liability rather than an asset.

> The simple fact that people are using /proc/meminfo when it doesn't make
> sense for anything except system monitoring tools is a pretty solid bug
> report on the existing linux apis.

I agree that we likely need a better interface for applications to
query the memory state of their container. But I don't think we should
try to emulate a format that is a poor fit for this.

We should also not speculate what users intended to do with the
meminfo data right now. There is a surprising amount of misconception
around what these values actually mean. I'd rather have users show up
on the mailing list directly and outline the broader usecase.