From: Shakeel Butt
Date: Thu, 2 Jul 2020 08:22:10 -0700
Subject: Re: Memcg stat for available memory
To: David Rientjes, Yang Shi, Roman Gushchin, Greg Thelen
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux MM

(Adding more people who might be interested in this)

On Sun, Jun 28, 2020 at 3:15 PM David Rientjes wrote:
>
> Hi everybody,
>
> I'd like to discuss the feasibility of a stat similar to
> si_mem_available() but at memcg scope which would specify how much memory
> can be charged without I/O.
>
> The si_mem_available() stat is based on heuristics so this does not
> provide an exact quantity that is actually available at any given time,
> but can otherwise provide userspace with some guidance on the amount of
> reclaimable memory.  See the description in
> Documentation/filesystems/proc.rst and its implementation.
>
> [ Naturally, userspace would need to understand both the amount of memory
>   that is available for allocation and for charging, separately, on an
>   overcommitted system.  I assume this is trivial.  (Why don't we provide
>   MemAvailable in per-node meminfo?) ]
>
> For such a stat at memcg scope, we can ignore totalreserves and
> watermarks.  We already have ~precise (modulo MEMCG_CHARGE_BATCH) data for
> both file pages and slab_reclaimable.
>
> We can infer lazily free memory by doing
>
>       file - (active_file + inactive_file)
>
> (This is necessary because lazy free memory is anon but on the inactive
>  file lru and we can't infer lazy freeable memory through pglazyfree -
>  pglazyfreed, they are event counters.)
>
> We can also infer the number of underlying compound pages that are on
> deferred split queues but have yet to be split with active_anon - anon (or
> is this a bug? :)
>
> So it *seems* like userspace can make a si_mem_available()-like
> calculation ("avail") by doing
>
>       free = memory.high - memory.current
>       lazyfree = file - (active_file + inactive_file)
>       deferred = active_anon - anon
>
>       avail = free + lazyfree + deferred +
>               (active_file + inactive_file + slab_reclaimable) / 2
>
> For userspace interested in knowing how much memory it can charge without
> incurring I/O (and assuming it has knowledge of available memory on an
> overcommitted system), it seems like:
>
>  (a) it can derive the above avail amount that is at least similar to
>      MemAvailable,
>
>  (b) it can assume that all reclaim is considered equal so anything more
>      than memory.high - memory.current is disruptive enough that it's a
>      better heuristic than the above, or
>
>  (c) the kernel provide an "avail" stat in memory.stat based on the above
>      and can evolve as the kernel implementation changes (how lazy free
>      memory impacts anon vs file lru stats, how deferred split memory is
>      handled, any future extensions for "easily reclaimable memory") that
>      userspace can count on to the same degree it can count on
>      MemAvailable.
>
> Any thoughts?

I think we need to answer two questions:

1) What's the use-case?
2) Why is it not good enough for user space to calculate its MemAvailable
   itself?

The use case I have in mind is a latency-sensitive distributed caching
service that would rather shrink its cache than incur the stalls caused by
hitting the limit. Such applications can monitor their MemAvailable and
adjust their caching footprint accordingly.

For the second, I think the reason is to hide the kernel's internal
implementation details from user space. The deferred split queue is an
internal detail, and we don't want to expose it to users. Similarly, how
lazyfree is implemented (i.e. anon pages on the file LRU) should not be
exposed to users.

Shakeel
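For concreteness, option (a) from the quoted mail (userspace computing "avail" itself) might look like the sketch below, paired with the caching use case above. It transcribes the quoted formula term by term, including its sign conventions, which the proposal itself flags as uncertain, and uses integer division for the "/ 2" since the values are byte counts. It assumes `memory.stat` has already been parsed into a dict and that `memory.high` is a number (the file reads `max` when no limit is set, which this sketch does not handle). The function names, the `reserve` policy knob, and all sample numbers are hypothetical:

```python
from typing import Dict


def memcg_avail(stats: Dict[str, int], memory_high: int, memory_current: int) -> int:
    """si_mem_available()-like estimate for one memcg, per the quoted heuristic."""
    free = memory_high - memory_current
    lazyfree = stats["file"] - (stats["active_file"] + stats["inactive_file"])
    deferred = stats["active_anon"] - stats["anon"]
    reclaimable = stats["active_file"] + stats["inactive_file"] + stats["slab_reclaimable"]
    return free + lazyfree + deferred + reclaimable // 2


def bytes_to_shed(avail: int, reserve: int) -> int:
    """Hypothetical cache policy: bytes to drop so `reserve` bytes stay available."""
    return max(0, reserve - avail)


# Made-up numbers (bytes), for illustration only.
stats = {
    "anon": 4000,
    "active_anon": 4200,
    "file": 5000,
    "active_file": 2000,
    "inactive_file": 2500,
    "slab_reclaimable": 500,
}
avail = memcg_avail(stats, memory_high=12000, memory_current=11000)
print(avail)                              # 4200
print(bytes_to_shed(avail, reserve=8000))  # 3800
```

Every term here depends on how the kernel currently accounts lazyfree and deferred-split pages, which is the coupling to internal details that the reply argues userspace should not have to carry.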