Date: Wed, 11 Jan 2023 23:56:45 +0100
From: Daniel Vetter
To: Shakeel Butt
Cc: "T.J. Mercier", Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
    Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
    Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan,
    Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin,
    Muchun Song, Andrew Morton, Paul Moore, James Morris,
    "Serge E. Hallyn", Stephen Smalley, Eric Paris,
    daniel.vetter@ffwll.ch, android-mm@google.com, jstultz@google.com,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
    dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
    linux-mm@kvack.org, linux-security-module@vger.kernel.org,
    selinux@vger.kernel.org
Subject: Re: [PATCH 0/4] Track exported dma-buffers with memcg
References: <20230109213809.418135-1-tjmercier@google.com>

On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote:
> Hi T.J.,
>
> On Mon, Jan 9, 2023 at 1:38 PM T.J.
> Mercier wrote:
> >
> > Based on discussions at LPC, this series adds a memory.stat counter for
> > exported dmabufs. This counter allows us to continue tracking
> > system-wide total exported buffer sizes which there is no longer any
> > way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> > track per-cgroup exported buffer sizes. The total (root counter) is
> > helpful for accounting in-kernel dmabuf use (by comparing with the sum
> > of child nodes or with the sum of sizes of mapped buffers or FD
> > references in procfs) in addition to helping identify driver memory
> > leaks when in-kernel use continually increases over time. With
> > per-application cgroups, the per-cgroup counter allows us to quickly
> > see how much dma-buf memory an application has caused to be allocated.
> > This avoids the need to read through all of procfs which can be a
> > lengthy process, and causes the charge to "stick" to the allocating
> > process/cgroup as long as the buffer is alive, regardless of how the
> > buffer is shared (unless the charge is transferred).
> >
> > The first patch adds the counter to memcg. The next two patches allow
> > the charge for a buffer to be transferred across cgroups which is
> > necessary because of the way most dmabufs are allocated from a central
> > process on Android. The fourth patch adds a SELinux hook to binder in
> > order to control who is allowed to transfer buffer charges.
> >
> > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/
> >
>
> I am a bit confused by the term "charge" used in this patch series.
> From the patches, it seems like only a memcg stat is added and nothing
> is charged to the memcg.
>
> This leads me to the question: Why add this stat in memcg if the
> underlying memory is not charged to the memcg and if we don't really
> want to limit the usage?
>
> I see two ways forward:
>
> 1. Instead of memcg, use bpf-rstat [1] infra to implement the
> per-cgroup stat for dmabuf. (You may need an additional hook for the
> stat transfer).
>
> 2. Charge the actual memory to the memcg. Since the size of dmabuf is
> immutable across its lifetime, you will not need to do accounting at
> page level and instead use something similar to the network memory
> accounting interface/mechanism (or even more simple). However you
> would need to handle the reclaim, OOM and charge context and failure
> cases. However if you are not looking to limit the usage of dmabuf
> then this option is an overkill.

I think eventually, at least for the other "account gpu stuff in
cgroups" use cases, we do want to actually charge the memory.

The problem is that with gpu allocations reclaim is essentially "we
pass the error to userspace and they get to sort the mess out". There
are some exceptions (some gpu drivers do have shrinkers). Would we need
to make sure these shrinkers are tied into the cgroup stuff before we
could enable charging for them?

Also note that at least from the gpu driver side this is all a huge
endeavour, so if we can split up the steps as much as possible (and get
something interim usable that doesn't break stuff ofc), that is
practically needed to make headway here. TJ has been trying out various
approaches for quite some time now already :-/
-Daniel

> Please let me know if I misunderstood something.
>
> [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/
>
> thanks,
> Shakeel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch