Reply-To: dwalsh@redhat.com
Subject: Re: [PATCH v1] proc: Implement /proc/self/meminfo
To: "Eric W. Biederman", Johannes Weiner
Cc: "Enrico Weigelt, metux IT consult", Chris Down, legion@kernel.org, LKML, Linux Containers, Linux Containers, Linux FS Devel, linux-mm@kvack.org, Andrew Morton, Christian Brauner, Michal Hocko
References: <87k0n2am0n.fsf@disp2133> <87lf7i7o67.fsf@disp2133>
From: Daniel Walsh
Organization: Red Hat
Date: Wed, 9 Jun 2021 20:36:08 -0400
In-Reply-To: <87lf7i7o67.fsf@disp2133>

On 6/9/21 16:56, Eric W.
Biederman wrote:
> Johannes Weiner writes:
>
>> On Wed, Jun 09, 2021 at 02:14:16PM -0500, Eric W. Biederman wrote:
>>> "Enrico Weigelt, metux IT consult" writes:
>>>
>>>> On 03.06.21 13:33, Chris Down wrote:
>>>>
>>>> Hi folks,
>>>>
>>>>
>>>>> Putting stuff in /proc to get around the problem of "some other metric I need
>>>>> might not be exported to a container" is not a very compelling argument. If
>>>>> they want it, then export it to the container...
>>>>>
>>>>> Ultimately, if they're going to have to add support for a new
>>>>> /proc/self/meminfo file anyway, these use cases should just do it properly
>>>>> through the already supported APIs.
>>>> It's even a bit more complex ...
>>>>
>>>> /proc/meminfo always tells what the *machine* has available, not what a
>>>> process can eat up. That has been this way even long before cgroups
>>>> (e.g. ulimits).
>>>>
>>>> Even if you want a container to look more like a VM - /proc/meminfo showing
>>>> what the container (instead of the machine) has available - just looking
>>>> at the calling task's cgroup is also wrong. Because there're cgroups
>>>> outside containers (that really shouldn't be affected) and there're even
>>>> other cgroups inside the container (that further restrict below the
>>>> container's limits).
>>>>
>>>> BTW: applications trying to autotune themselves by looking at
>>>> /proc/meminfo are broken-by-design anyways. This never has been a valid
>>>> metric on how much memory individual processes can or should eat.
>>> Which brings us to the problem.
>>>
>>> Using /proc/meminfo is not valid unless your application can know it has
>>> the machine to itself. Something that is becoming increasingly less
>>> common.
>>>
>>> Unless something has changed in the last couple of years, reading values
>>> out of the cgroup filesystem is both difficult (v1 and v2 have some
>>> gratuitous differences) and is actively discouraged.
>>>
>>> So what should applications do?
>>>
>>> Alex has found applications that are trying to do something with
>>> meminfo, and the fields that those applications care about. I don't see
>>> anyone making the case that specifically what the applications are
>>> trying to do is buggy.
>>>
>>> Alex's suggestion is to have a /proc/self/meminfo that has the information
>>> that applications want, which would be something that would be easy
>>> to switch applications to. The patch to userspace at that point is
>>> as simple as 3 lines of code. I can imagine people taking that patch into
>>> their userspace programs.
>> But is it actually what applications want?
>>
>> Not all the information at the system level translates well to the
>> container level. Things like available memory require a hierarchical
>> assessment rather than just a look at the local level, since there
>> could be limits higher up the tree.
> That sounds like a bug in the implementation of /proc/self/meminfo.
>
> It certainly is a legitimate question to ask what are the limits
> from my perspective.
>
>> Not all items in meminfo have a container equivalent, either.
> Not all items in meminfo were implemented.
>
>> The familiar format is likely a liability rather than an asset.
> It could be. At the same time that is the only format anyone has
> proposed so a good counter proposal would be appreciated if you don't
> like the code that has been written.
>
>>> The simple fact that people are using /proc/meminfo when it doesn't make
>>> sense for anything except system monitoring tools is a pretty solid bug
>>> report on the existing Linux APIs.
>> I agree that we likely need a better interface for applications to
>> query the memory state of their container. But I don't think we should
>> try to emulate a format that is a poor fit for this.
> I don't think it is the container that we care about (except for maybe
> system management tools). I think the truly interesting case is
> applications asking what do I have available to me.
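
For reference, the kind of userspace change described in the quoted text
might look roughly like the sketch below. It assumes the proposed
/proc/self/meminfo keeps the "Key: value kB" layout of /proc/meminfo, which
this thread has not settled; parse_meminfo and read_meminfo are hypothetical
helper names, not part of any existing library.

```python
def parse_meminfo(text):
    """Parse 'Key:   12345 kB' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # skip anything that is not 'Key: number [unit]'
        amount = int(fields[0])
        # Most /proc/meminfo fields carry a kB suffix (1024 bytes);
        # unit-less fields such as HugePages_Total are plain counts.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def read_meminfo(paths=("/proc/self/meminfo", "/proc/meminfo")):
    """Return parsed values from the first readable path.

    /proc/self/meminfo is the file proposed in this thread and is an
    assumption here; where it is absent we fall back to /proc/meminfo.
    """
    for path in paths:
        try:
            with open(path) as f:
                return parse_meminfo(f.read())
        except OSError:
            continue
    return {}
```

An application that today reads MemAvailable from /proc/meminfo would only
need to try the per-process file first, e.g.
`read_meminfo().get("MemAvailable")`.
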

I have heard that the JRE makes assumptions about the number of threads to
use based on memory.

Lots of humans use top and vmstat to try to figure out what is available
in their environment.  Debugging tools also try to figure out why an
application is running poorly.

We would like to not need to mount the cgroup file system into a
container at all and, as Eric stated, to not have processes trying to
differentiate between cgroupv1 and cgroupv2.

>> We should also not speculate what users intended to do with the
>> meminfo data right now. There is a surprising amount of misconception
>> around what these values actually mean. I'd rather have users show up
>> on the mailing list directly and outline the broader usecase.
> We are kernel developers, we can read code. We don't need to speculate.
> We can read the userspace code. If things are not clear we can ask
> their developers.
>
> Eric
>
>
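
To make the hierarchical point and the v1/v2 split concrete, here is a
minimal sketch of what a process has to do today if it reads the cgroup
filesystem itself. It assumes the cgroup tree is visible under a
caller-supplied root (typically /sys/fs/cgroup, the very mount we would
prefer containers not to need) and that the caller already knows its
cgroup's path relative to that root. effective_memory_limit is a
hypothetical helper, not an existing API.

```python
import os

# Limit file per cgroup version: v2 uses memory.max,
# v1 uses memory.limit_in_bytes.
LIMIT_FILES = ("memory.max", "memory.limit_in_bytes")


def read_limit(cgroup_dir):
    """Return one cgroup's memory limit in bytes, or None if not limited."""
    for name in LIMIT_FILES:
        try:
            with open(os.path.join(cgroup_dir, name)) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file absent: other cgroup version, or the v2 root
        if raw == "max":  # v2 spelling of "no limit"
            return None
        # v1 spells "no limit" as a very large number; min() below
        # ignores it whenever a tighter limit exists.
        return int(raw)
    return None


def effective_memory_limit(root, relative_path):
    """Walk from the task's cgroup up to root and keep the tightest limit."""
    root = os.path.normpath(root)
    current = os.path.normpath(os.path.join(root, relative_path.lstrip("/")))
    limits = []
    while True:
        limit = read_limit(current)
        if limit is not None:
            limits.append(limit)
        parent = os.path.dirname(current)
        if current == root or parent == current:
            break  # reached the cgroup root (or the filesystem root)
        current = parent
    return min(limits) if limits else None
```

Even this small sketch has to know two file names and two spellings of
"unlimited", which is the sort of burden a single per-process file could
take away from applications.
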