Date: Thu, 25 Aug 2022 07:59:42 -1000
From: Tejun Heo
To: Mina Almasry
Cc: Roman Gushchin, Johannes Weiner, Yafang Shao, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song, john fastabend, KP Singh, Stanislav Fomichev, Hao Luo, jolsa@kernel.org, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, Zefan Li, Cgroups, netdev, bpf, Linux MM, Yosry Ahmed, Dan Schatzberg, Lennart Poettering
Subject: Re: [RFD RESEND] cgroup: Persistent memory usage tracking

Hello,

On Wed, Aug 24, 2022 at 12:02:04PM -0700, Mina Almasry wrote:
> > If we can express all the resource constraints and structures on the cgroup
> > side and have them configured by the management agent, the application can
> > simply, e.g., madvise whatever memory region or flag bpf maps as "these are
> > persistent" and the rest can be handled by the system. If the agent set up
> > the environment for that, it gets accounted accordingly; otherwise, it'd
> > behave as if that tagging didn't exist. Asking the application to set up
> > all its resources in separate steps might, in many cases, require
> > significant restructuring and knowledge of how the hierarchy is set up.
>
> I don't know if this level of granularity is needed with a madvise()
> or such. The kernel knows whether resources are persistent due to the
> nature of the resource. For example, a shared tmpfs file is a resource
> that is persistent and not cleaned up after the process using it dies,
> but private memory is. madvise(PERSISTENT) on private memory would not
> make sense, and I don't think madvise(NOT_PERSISTENT) on a tmpfs-backed
> memory region would make sense. Also, this requires adding madvise()
> hints in userspace code to leverage this.

I haven't thought hard about what the hinting interface should look like.
The default assumption would be that page cache belongs to the persistent
domain and anon memory belongs to the instance (mm folks, please correct me
if I'm off the rails here), but I can imagine situations where that doesn't
necessarily hold, like temp files which get unlinked on instance shutdown.
In terms of hint granularity, coarser-grained hints (e.g. per file, per
mount, or whatever) seem to make sense, but again I haven't thought too
hard about it.

That said, as long as the default behavior is reasonable, I think adding
some hinting calls in the application is acceptable. It doesn't require any
structural changes, and the additions would be for the application's own
benefit: more accurate accounting and control.

That makes sense to me. One unfortunate effect this will have is that we'll
be actively putting resources into intermediate cgroups. This already
happens today, but if we support persistent domains, it's gonna be a lot
more prevalent and we'll need to update e.g. iocost to support IOs coming
out of intermediate cgroups. This kinda sucks because we don't even have
knobs to control self vs. children distributions. Oh well...

Thanks.

--
tejun