References: <20230109213809.418135-1-tjmercier@google.com>
From: "T.J. Mercier"
Date: Wed, 11 Jan 2023 16:49:36 -0800
Subject: Re: [PATCH 0/4] Track exported dma-buffers with memcg
To: Shakeel Butt, "T.J. Mercier", Tejun Heo, Zefan Li, Johannes Weiner,
 Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
 Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
 Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko,
 Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris,
 "Serge E. Hallyn", Stephen Smalley, Eric Paris, android-mm@google.com,
 jstultz@google.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org,
 selinux@vger.kernel.org
Cc: daniel.vetter@ffwll.ch

On Wed, Jan 11, 2023 at 2:56 PM Daniel Vetter wrote:
>
> On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote:
> > Hi T.J.,
> >
> > On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier wrote:
> > >
> > > Based on discussions at LPC, this series adds a memory.stat counter for
> > > exported dmabufs. This counter allows us to continue tracking
> > > system-wide total exported buffer sizes which there is no longer any
> > > way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> > > track per-cgroup exported buffer sizes. The total (root counter) is
> > > helpful for accounting in-kernel dmabuf use (by comparing with the sum
> > > of child nodes or with the sum of sizes of mapped buffers or FD
> > > references in procfs) in addition to helping identify driver memory
> > > leaks when in-kernel use continually increases over time. With
> > > per-application cgroups, the per-cgroup counter allows us to quickly
> > > see how much dma-buf memory an application has caused to be allocated.
> > > This avoids the need to read through all of procfs which can be a
> > > lengthy process, and causes the charge to "stick" to the allocating
> > > process/cgroup as long as the buffer is alive, regardless of how the
> > > buffer is shared (unless the charge is transferred).
> > >
> > > The first patch adds the counter to memcg. The next two patches allow
> > > the charge for a buffer to be transferred across cgroups which is
> > > necessary because of the way most dmabufs are allocated from a central
> > > process on Android. The fourth patch adds a SELinux hook to binder in
> > > order to control who is allowed to transfer buffer charges.
> > >
> > > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/
> >
> > I am a bit confused by the term "charge" used in this patch series.
> > From the patches, it seems like only a memcg stat is added and nothing
> > is charged to the memcg.
> >
> > This leads me to the question: Why add this stat in memcg if the
> > underlying memory is not charged to the memcg and if we don't really
> > want to limit the usage?
> >
> > I see two ways forward:
> >
> > 1. Instead of memcg, use bpf-rstat [1] infra to implement the
> > per-cgroup stat for dmabuf. (You may need an additional hook for the
> > stat transfer).
> >
> > 2. Charge the actual memory to the memcg. Since the size of dmabuf is
> > immutable across its lifetime, you will not need to do accounting at
> > page level and instead use something similar to the network memory
> > accounting interface/mechanism (or even more simple). However you
> > would need to handle the reclaim, OOM and charge context and failure
> > cases. However if you are not looking to limit the usage of dmabuf
> > then this option is an overkill.
>
> I think eventually, at least for other "account gpu stuff in cgroups" use
> case we do want to actually charge the memory.
>
Yes, I've been looking at this today.

> The problem is a bit that with gpu allocations reclaim is essentially "we
> pass the error to userspace and they get to sort the mess out". There are
> some exceptions (some gpu drivers do have shrinkers) would we need to make
> sure these shrinkers are tied into the cgroup stuff before we could enable
> charging for them?
>
I'm also not sure we can depend on a dmabuf being backed at export
time 100% of the time. (They are for dmabuf heaps.) If not, that would
make calling the existing folio-based memcg functions a bit difficult.

> Also note that at least from the gpu driver side this is all a huge
> endeavour, so if we can split up the steps as much as possible (and get
> something interim useable that doesn't break stuff ofc), that is
> practically needed to make headway here. TJ has been trying out various
> approaches for quite some time now already :-/
> -Daniel
>
> > Please let me know if I misunderstood something.
> >
> > [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/
> >
> > thanks,
> > Shakeel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
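The stat-only accounting described in the cover letter (a hierarchical per-cgroup counter whose root holds the system-wide total, with each buffer's bytes sticking to one cgroup until the buffer is released or the charge is transferred) can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions, not the series' memcg code: struct cg, struct buf, buf_export(), buf_transfer(), buf_release() and stat_local() are hypothetical names, and nothing is limited, which matches Shakeel's reading that only a stat is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cgroup node; not the kernel's struct mem_cgroup. */
struct cg {
	struct cg *parent;     /* NULL for the root cgroup */
	uint64_t dmabuf_bytes; /* hierarchical: this node plus all descendants */
};

/* Hypothetical exported buffer; its bytes "stick" to one owner cgroup. */
struct buf {
	size_t size;      /* fixed for the buffer's whole lifetime */
	struct cg *owner; /* cgroup the bytes are currently attributed to */
};

/* Add bytes to a cgroup and every ancestor up to the root. */
static void stat_add(struct cg *cg, uint64_t bytes)
{
	for (; cg; cg = cg->parent)
		cg->dmabuf_bytes += bytes;
}

/* Remove bytes from a cgroup and every ancestor up to the root. */
static void stat_sub(struct cg *cg, uint64_t bytes)
{
	for (; cg; cg = cg->parent) {
		assert(cg->dmabuf_bytes >= bytes);
		cg->dmabuf_bytes -= bytes;
	}
}

/* Export: attribute the buffer to the exporting process's cgroup. */
static void buf_export(struct buf *b, struct cg *owner, size_t size)
{
	b->size = size;
	b->owner = owner;
	stat_add(owner, size);
}

/* Transfer: move the whole amount, e.g. from a central allocator's
 * cgroup to the client application's cgroup. */
static void buf_transfer(struct buf *b, struct cg *to)
{
	stat_sub(b->owner, b->size);
	stat_add(to, b->size);
	b->owner = to;
}

/* Release: drop the bytes from whichever cgroup holds them now. */
static void buf_release(struct buf *b)
{
	stat_sub(b->owner, b->size);
	b->owner = NULL;
}

/* Bytes attributed to cg itself rather than to the listed children. */
static uint64_t stat_local(const struct cg *cg,
			   const struct cg *const children[], size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += children[i]->dmabuf_bytes;
	assert(sum <= cg->dmabuf_bytes);
	return cg->dmabuf_bytes - sum;
}
```

For example, a buffer exported from an allocator cgroup and then transferred to an application cgroup leaves the root total unchanged while the bytes move between the two children; stat_local() on the root then reports what is attributed to the root itself rather than to any listed child, which is the comparison the cover letter suggests for spotting in-kernel use.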
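Shakeel's option 2 relies on the dmabuf size being immutable: the charge can be taken once at export and dropped once at release, with no per-page tracking. The following is a hypothetical sketch of that shape, loosely modelled on the hierarchical try-charge-then-roll-back behaviour of the kernel's page_counter but not using it, with an assumed 4 KiB page size. counter_try_charge(), dmabuf_charge() and dmabuf_uncharge() are illustrative names only, and reclaim, OOM handling and charge context, which Shakeel lists as the hard parts, are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical hierarchical counter, loosely modelled on page_counter. */
struct counter {
	struct counter *parent; /* NULL at the root */
	uint64_t usage;         /* pages charged here and below */
	uint64_t limit;         /* maximum pages allowed here and below */
};

/* The dmabuf size never changes, so the page count is computed once. */
static uint64_t size_to_pages(size_t size)
{
	return ((uint64_t)size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Charge nr pages at every level up to the root. If any level would
 * exceed its limit, undo the levels already charged and fail. */
static bool counter_try_charge(struct counter *c, uint64_t nr)
{
	struct counter *it, *failed = NULL;

	for (it = c; it; it = it->parent) {
		if (it->usage + nr > it->limit) {
			failed = it;
			break;
		}
		it->usage += nr;
	}
	if (!failed)
		return true;
	for (it = c; it != failed; it = it->parent)
		it->usage -= nr;
	return false;
}

/* Drop nr pages from a counter and all of its ancestors. */
static void counter_uncharge(struct counter *c, uint64_t nr)
{
	for (; c; c = c->parent) {
		assert(c->usage >= nr);
		c->usage -= nr;
	}
}

/* One charge at export time; returns 0 on success, -1 on failure. */
static int dmabuf_charge(struct counter *c, size_t size)
{
	return counter_try_charge(c, size_to_pages(size)) ? 0 : -1;
}

/* One uncharge at release, using the same immutable size. */
static void dmabuf_uncharge(struct counter *c, size_t size)
{
	counter_uncharge(c, size_to_pages(size));
}
```

A failed charge in this sketch returns an error with nothing left charged at any level, which corresponds to the "pass the error to userspace" behaviour Daniel describes for most GPU allocations; there is no hook for driver shrinkers. Because it charges by size rather than by folio, the sketch also does not care whether the buffer is backed at export time.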