Date: Sun, 28 Jun 2020 15:15:25 -0700 (PDT)
From: David Rientjes
To: Johannes Weiner, Michal Hocko, Vladimir Davydov
cc: Andrew Morton, Shakeel Butt, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Memcg stat for available memory

Hi everybody,

I'd like to discuss the feasibility of a stat similar to si_mem_available() but at memcg scope, which would specify how much memory can be charged without I/O.

The si_mem_available() stat is based on heuristics, so it does not provide an exact quantity that is actually available at any given time, but it can still give userspace some guidance on the amount of reclaimable memory. See the MemAvailable description in Documentation/filesystems/proc.rst and the implementation of si_mem_available().

[ Naturally, on an overcommitted system, userspace would need to understand separately both the amount of memory available for allocation and the amount available for charging. I assume this is trivial. (Why don't we provide MemAvailable in per-node meminfo?) ]

For such a stat at memcg scope, we can ignore totalreserves and watermarks. We already have ~precise (modulo MEMCG_CHARGE_BATCH) data for both file pages and slab_reclaimable.
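As a rough illustration of the inputs userspace already has, here is a minimal Python sketch that parses the cgroup v2 memory.stat file, which uses a flat-keyed "key value" format with one counter per line and values in bytes. The function names and the cgroup directory argument are hypothetical; this is not an existing library API.

```python
from __future__ import annotations

from pathlib import Path


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse flat-keyed memory.stat content ("key value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        key, value = fields
        stats[key] = int(value)
    return stats


def read_memory_stat(cgroup_dir: str) -> dict[str, int]:
    """Read and parse memory.stat from a cgroup v2 directory."""
    return parse_memory_stat((Path(cgroup_dir) / "memory.stat").read_text())
```

For example, read_memory_stat("/sys/fs/cgroup/<group>")["slab_reclaimable"] would give the reclaimable slab counter for a given group, where <group> stands for whatever cgroup is of interest.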
We can infer lazily freed memory with

  lazyfree = (active_file + inactive_file) - file

(This is necessary because lazily freed memory is anon but sits on the inactive file LRU, so it is counted in inactive_file but not in file. We can't infer it from pglazyfree - pglazyfreed because those are event counters.)

We can also infer how much memory is in compound pages that are on deferred split queues but have yet to be split, with active_anon - anon (or is this a bug? :)

So it *seems* like userspace can make a si_mem_available()-like calculation ("avail") by doing:

  free     = memory.high - memory.current
  lazyfree = (active_file + inactive_file) - file
  deferred = active_anon - anon
  avail    = free + lazyfree + deferred +
             (active_file + inactive_file + slab_reclaimable) / 2

For userspace interested in knowing how much memory it can charge without incurring I/O (and assuming it knows how much memory is available on an overcommitted system), it seems like:

 (a) it can derive the above avail amount, which is at least similar to MemAvailable,

 (b) it can assume that all reclaim is equal, so that charging anything beyond memory.high - memory.current is disruptive enough that memory.high - memory.current itself is a better heuristic than the above, or

 (c) the kernel could provide an "avail" stat in memory.stat, based on the above, that evolves as the kernel implementation changes (how lazy free memory impacts anon vs file LRU stats, how deferred split memory is handled, any future extensions for "easily reclaimable memory") and that userspace can count on to the same degree it can count on MemAvailable.

Any thoughts?
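A minimal Python sketch of option (a), computing the avail heuristic from already-parsed memory.stat counters (bytes) plus memory.high and memory.current. Assumptions not in the proposal: the function names are hypothetical; lazy free pages are taken as the file LRU size in excess of the file counter, (active_file + inactive_file) - file, since they are anon pages sitting on the inactive file LRU; and each difference is clamped at zero, because usage can exceed memory.high and the counters are maintained independently. It inherits all the imprecision of the heuristic itself.

```python
from __future__ import annotations


def parse_limit(text: str) -> int | None:
    """memory.high and memory.max hold a byte count or the literal "max"."""
    text = text.strip()
    return None if text == "max" else int(text)


def memcg_avail(stats: dict[str, int], high: int, current: int) -> int:
    """Heuristic estimate of chargeable memory (bytes) for one memcg."""
    file_lru = stats["active_file"] + stats["inactive_file"]
    # memory.high is a throttling threshold, so usage can exceed it.
    free = max(high - current, 0)
    # Lazy free pages are anon but on the inactive file LRU, not in "file".
    lazyfree = max(file_lru - stats["file"], 0)
    # Memory in compound pages on deferred split queues, not yet split.
    deferred = max(stats["active_anon"] - stats["anon"], 0)
    # Assume roughly half of the file LRU and reclaimable slab is cheap to reclaim.
    return free + lazyfree + deferred + (file_lru + stats["slab_reclaimable"]) // 2
```

memory.high may read "max", in which case parse_limit() returns None and free has no natural bound; a caller would then need some other limit (memory.max, or system-wide available memory on an overcommitted system) before calling memcg_avail().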