Subject: Re: [PATCH v1] proc: Implement /proc/self/meminfo
From: "Enrico Weigelt, metux IT consult"
To: Shakeel Butt, "Eric W. Biederman"
Cc: Alexey Gladkov, Christian Brauner, LKML, Linux Containers, Linux FS Devel, Linux MM, Andrew Morton, Johannes Weiner, Michal Hocko, Chris Down, Cgroups
Date: Mon, 21 Jun 2021 20:20:28 +0200

On 19.06.21 01:38, Shakeel Butt wrote:

> Nowadays, I don't think MemAvailable giving "amount of memory that can
> be allocated without triggering swapping" is even roughly accurate.
> Actually IMO "without triggering swap" is not something an application
> should concern itself with where refaults from some swap types
> (zswap/swap-on-zram) are much faster than refaults from disk.

If we're talking about things like database workloads, IMHO there isn't
anything really better than doing measurements with the actual loads and
tuning incrementally.

But what is the actual optimization goal? Why might an application want
to know where swapping begins? Computing performance? Caching + IO
latency or throughput? Network traffic (e.g. w/ iSCSI)? Power
consumption?

>> I do know that hiding the implementation details and providing userspace
>> with information it can directly use seems like the programming model
>> that needs to be explored. Most programs should not care if they are in
>> a memory cgroup, etc. Programs, load management systems, and even
>> balloon drivers have a legitimately interest in how much additional load
>> can be placed on a systems memory.

What kind of load, exactly? CPU? Disk IO? Network?

> How much additional load can be placed on a system *until what*. I
> think we should focus more on the "until" part to make the problem
> more tractable.

ACK. The interesting question is what to do in that case.

An obvious move by a database system could be, e.g., filling its caches
only as far as there is spare physical RAM, in order to avoid useless
swapping (we'd potentially produce more IO load when a cache is written
out to swap instead of just being discarded).

But this also depends on several factors:

#1: the application doesn't know the actual performance of the swap
    device, e.g. the already mentioned zswap and friends, or some fast
    nvmem for swap vs. disk for storage.

#2: caches might also be implemented indirectly by mmap()ing the storage
    file/device and thus using the kernel's page cache. In that case, the
    kernel would automatically discard the pages without going to swap.
    Of course, that only works if the cache is nothing but a copy of
    pages from storage in RAM.

A completely different scenario would be load management on a cluster
like k8s. Here we usually care about overall cluster performance (not so
much about individual nodes), but want to prevent individual nodes from
being overloaded. Since we usually don't know much about the individual
workloads, we probably have little choice other than continuous
monitoring and acting when a node is getting too busy, or balancing
based on current system load (and other metrics) when new workloads are
started. In that case, I don't see how this new proc file would be of
much help.

> Second, is the reactive approach acceptable? Instead of an upfront
> number representing the room for growth, how about just grow and
> backoff when some event (oom or stall) which we want to avoid is about
> to happen? This is achievable today for oom and stall with PSI and
> memory.high and it avoids the hard problem of reliably estimating the
> reclaimable memory.

I tend to believe that for certain use cases it would be helpful if an
application got notified when some of its pages are about to be swapped
out due to memory pressure. Then it could decide on its own whether it
should drop certain caches in order to prevent swapping.

--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287
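
The reactive approach Shakeel describes in the message above (grow, then back off when PSI reports memory stalls) could be sketched in userspace roughly as follows. This is a minimal polling sketch under stated assumptions, not anything proposed in this thread. The cache-budget policy, the 10% threshold, the 16 MiB step, the 1 GiB cap and the names `parse_psi` and `next_cache_budget` are all hypothetical. Only the `/proc/pressure/memory` format ("some"/"full" lines with `avg10`, `avg60`, `avg300` and `total` fields) comes from the kernel's PSI interface, which needs Linux 4.20+ with PSI enabled. A real implementation would more likely use PSI triggers (writing a stall threshold to the file and poll()ing for events) than sample the averages.

```python
from pathlib import Path

# Kernel PSI interface for memory (Linux >= 4.20, CONFIG_PSI, not disabled at boot).
PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI output such as
    'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'
    into {'some': {'avg10': 0.0, ...}, 'full': {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def next_cache_budget(current: int, some_avg10: float, *,
                      high_pct: float = 10.0,
                      step: int = 16 << 20,
                      limit: int = 1 << 30) -> int:
    """Hypothetical grow-and-back-off policy for an application cache.

    some_avg10 is the kernel's ~10 s running average, in percent, of the
    time at least one task was stalled on memory. Above high_pct the
    budget is halved; otherwise it grows by `step` bytes, capped at `limit`.
    """
    if some_avg10 > high_pct:
        return current // 2
    return min(current + step, limit)


if __name__ == "__main__":
    budget = 64 << 20  # arbitrary starting size: 64 MiB
    try:
        avg10 = parse_psi(PSI_MEMORY.read_text())["some"]["avg10"]
    except (OSError, KeyError, ValueError):
        print("PSI not available on this system")
    else:
        budget = next_cache_budget(budget, avg10)
        print(f"some avg10={avg10:.2f}% -> cache budget {budget} bytes")
```

Halving under pressure and growing linearly otherwise is a simple additive-increase/multiplicative-decrease choice. It reacts to stalls once they begin instead of trying to predict reclaimable memory up front, which is the trade-off the quoted text describes.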