Date: Tue, 6 Aug 2024 17:26:21 +0200
From: Daniel Vetter
To: Maxime Ripard
Cc: Tvrtko Ursulin, Maarten Lankhorst, intel-xe@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	Tejun Heo, Zefan Li, Johannes Weiner, Andrew Morton,
	Jonathan Corbet, David Airlie, Thomas Zimmermann, Friedrich Vock,
	cgroups@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org
Subject: Re: [RFC PATCH 2/6] drm/cgroup: Add memory accounting DRM cgroup
References: <20240627154754.74828-1-maarten.lankhorst@linux.intel.com>
	<20240627154754.74828-3-maarten.lankhorst@linux.intel.com>
	<20240627-paper-vicugna-of-fantasy-c549ed@houat>
	<6cb7c074-55cb-4825-9f80-5cf07bbd6745@linux.intel.com>
	<20240628-romantic-emerald-snake-7b26ca@houat>
	<70289c58-7947-4347-8600-658821a730b0@linux.intel.com>
	<40ef0eed-c514-4ec1-9486-2967f23824be@ursulin.net>
	<20240806-gharial-of-abstract-reverence-aad6ea@houat>
In-Reply-To: <20240806-gharial-of-abstract-reverence-aad6ea@houat>

On Tue, Aug 06, 2024 at 04:09:43PM +0200, Maxime Ripard wrote:
> On Tue, Aug 06, 2024 at 03:01:44PM GMT, Daniel Vetter wrote:
> > On Mon, Jul 01, 2024 at 06:01:41PM +0100, Tvrtko Ursulin wrote:
> > >
> > > On 01/07/2024 10:25, Maarten Lankhorst wrote:
> > > > On 2024-06-28 at 16:04, Maxime Ripard wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, Jun 27, 2024 at 09:22:56PM GMT, Maarten Lankhorst wrote:
> > > > > > On 2024-06-27 at 19:16, Maxime Ripard wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thanks for working on this!
> > > > > > >
> > > > > > > On Thu, Jun 27, 2024 at 05:47:21PM GMT, Maarten Lankhorst wrote:
> > > > > > > > The initial version was based roughly on the rdma and misc
> > > > > > > > cgroup controllers, with a lot of the accounting code
> > > > > > > > borrowed from rdma.
> > > > > > > >
> > > > > > > > The current version is a complete rewrite with page counter;
> > > > > > > > as a result it uses the same min/low/max semantics as the
> > > > > > > > memory cgroup.
> > > > > > > >
> > > > > > > > There's a small mismatch as TTM uses u64 and page_counter
> > > > > > > > uses long pages. In practice it's not a problem: 32-bit
> > > > > > > > systems don't really come with >=4GB cards, and as long as
> > > > > > > > we're consistently wrong with units, it's fine. The device
> > > > > > > > page size may not be in the same units as the kernel page
> > > > > > > > size, and each region might also have a different page size
> > > > > > > > (VRAM vs GART, for example).
> > > > > > > >
> > > > > > > > The interface is simple:
> > > > > > > > - Populate drmcgroup_device->regions[..] name and size for
> > > > > > > >   each active region, and set num_regions accordingly.
> > > > > > > > - Call drm(m)cg_register_device().
> > > > > > > > - Use drmcg_try_charge to check if you can allocate a chunk
> > > > > > > >   of memory, and drmcg_uncharge when freeing it. This may
> > > > > > > >   return an error code, or -EAGAIN when the cgroup limit is
> > > > > > > >   reached. In that case a reference to the limiting pool is
> > > > > > > >   returned.
> > > > > > > > - The limiting cs can be used as compare function for
> > > > > > > >   drmcs_evict_valuable.
> > > > > > > > - After having evicted enough, drop the reference to the
> > > > > > > >   limiting cs with drmcs_pool_put.
> > > > > > > >
> > > > > > > > This API allows you to limit device resources with cgroups.
> > > > > > > > You can see the supported cards in /sys/fs/cgroup/drm.capacity.
> > > > > > > > You need to echo +drm to cgroup.subtree_control, and then you
> > > > > > > > can partition memory.
> > > > > > > >
> > > > > > > > Signed-off-by: Maarten Lankhorst
> > > > > > > > Co-developed-by: Friedrich Vock
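To make that charge/uncharge flow a bit more concrete, here is roughly how
I'd picture it looking from a driver. The type names, the region-index
parameter and the exact drmcg_*()/drmcs_*() signatures are guesses from the
description above rather than taken from the patch, so read it as
pseudocode:

struct example_device {
        struct drmcgroup_device cg_dev;         /* regions[] + num_regions */
        /* ... rest of the driver ... */
};

#define EXAMPLE_REGION_VRAM     0               /* index into cg_dev.regions[] */

/*
 * Driver's eviction loop, not shown here; it would use
 * drmcs_evict_valuable(limit, ...) to pick victims that actually help the
 * cgroup that ran into its limit.
 */
static void example_evict_for(struct example_device *edev,
                              struct drmcs_pool *limit, u64 size);

static int example_vram_alloc(struct example_device *edev, u64 size)
{
        struct drmcs_pool *limit = NULL;        /* the "limiting cs/pool" */
        int ret;

        ret = drmcg_try_charge(&edev->cg_dev, EXAMPLE_REGION_VRAM,
                               size, &limit);
        if (ret == -EAGAIN) {
                /*
                 * Over the cgroup limit: evict until the charge fits, drop
                 * the reference, then retry (a real driver would loop and
                 * handle repeated failure).
                 */
                example_evict_for(edev, limit, size);
                drmcs_pool_put(limit);
                ret = drmcg_try_charge(&edev->cg_dev, EXAMPLE_REGION_VRAM,
                                       size, &limit);
        }
        if (ret)
                return ret;

        /* ... do the actual allocation from the VRAM region ... */
        return 0;
}

static void example_vram_free(struct example_device *edev, u64 size)
{
        /* every successful charge is paired with an uncharge on free */
        drmcg_uncharge(&edev->cg_dev, EXAMPLE_REGION_VRAM, size);
}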
> > > > > > > I'm sorry, I should have written minutes on the discussion we
> > > > > > > had with TJ and Tvrtko the other day.
> > > > > > >
> > > > > > > We're all very interested in making this happen, but doing a
> > > > > > > "DRM" cgroup doesn't look like the right path to us.
> > > > > > >
> > > > > > > Indeed, we have a significant number of drivers that won't have
> > > > > > > dedicated memory but will depend on DMA allocations one way or
> > > > > > > the other, and those pools are shared between multiple
> > > > > > > frameworks (DRM, V4L2, DMA-Buf Heaps, at least).
> > > > > > >
> > > > > > > This was also pointed out by Sima some time ago here:
> > > > > > > https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
> > > > > > >
> > > > > > > So we'll want that cgroup subsystem to be cross-framework. We
> > > > > > > settled on a "device" cgroup during the discussion, but I'm sure
> > > > > > > we'll have plenty of bikeshedding.
> > > > > > >
> > > > > > > The other thing we agreed on, based on the feedback TJ got on
> > > > > > > the last iterations of his series, was to go for memcg for
> > > > > > > drivers not using DMA allocations.
> > > > > > >
> > > > > > > That's the part where I expect some discussion too :)
> > > > > > >
> > > > > > > So we went back to a previous version of TJ's work, and I've
> > > > > > > started to work on:
> > > > > > >
> > > > > > >   - Integration of the cgroup in the GEM DMA and GEM VRAM
> > > > > > >     helpers (this works on tidss right now)
> > > > > > >
> > > > > > >   - Integration of all heaps into that cgroup but the system
> > > > > > >     one (working on this at the moment)
> > > > > >
> > > > > > Should be similar to what I have then. I think you could use my
> > > > > > work to continue it.
> > > > > >
> > > > > > I made nothing DRM specific except the name; if you renamed it
> > > > > > the device resource management cgroup and changed the init
> > > > > > function signature to take a name instead of a drm pointer,
> > > > > > nothing would change. This is exactly what I'm hoping to
> > > > > > accomplish, including reserving memory.
> > > > >
> > > > > I've started to work on rebasing my current work onto your series
> > > > > today, and I'm not entirely sure how what I described would best
> > > > > fit. Let's assume we have two KMS devices, one using shmem, one
> > > > > using DMA allocations, two heaps, one using the page allocator, the
> > > > > other using CMA, and one v4l2 device using DMA allocations.
> > > > >
> > > > > So we would have one KMS device and one heap using the page
> > > > > allocator, and one KMS device, one heap, and one v4l2 driver using
> > > > > the DMA allocator.
> > > > >
> > > > > Would these make different cgroup devices, or different cgroup
> > > > > regions?
> > > >
> > > > Each driver would register a device, whatever feels most logical for
> > > > that device, I suppose.
> > > >
> > > > My guess is that a prefix would also be nice here, so register a
> > > > device with a name of drm/$name, v4l2/$name, or heap/$name. I didn't
> > > > give it much thought and we're still experimenting, so just try
> > > > something. :)
> > > >
> > > > There's no limit to the number of devices; I only fixed the number of
> > > > pools to match TTM, but even that could be increased arbitrarily. I
> > > > just don't think there is a point in doing so.
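Just to spell out that naming idea: a sketch only, assuming the register
call takes a name as suggested above (the current patch takes a drm
pointer) and guessing the field layout, so none of this is from the series
itself.

static struct drmcgroup_device tidss_cg;        /* KMS driver, DMA-backed */
static struct drmcgroup_device cma_heap_cg;     /* dma-buf heap on CMA */

static int example_register_cgroup_devices(void)
{
        int ret;

        /* one cgroup device per driver, name prefixed by the framework */
        tidss_cg.regions[0].name = "cma";
        tidss_cg.regions[0].size = SZ_64M;      /* made-up pool size */
        tidss_cg.num_regions = 1;
        ret = drmcg_register_device(&tidss_cg, "drm/tidss");
        if (ret)
                return ret;

        cma_heap_cg.regions[0].name = "cma";
        cma_heap_cg.regions[0].size = SZ_64M;
        cma_heap_cg.num_regions = 1;
        return drmcg_register_device(&cma_heap_cg, "heap/cma");
}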
> > >
> > > Do we need a plan for top-level controls which do not include region
> > > names? If the latter will be driver-specific, then I am thinking of
> > > the ease of configuring it all from the outside, especially
> > > considering that one cgroup can have multiple devices in it.
> > >
> > > The second question is about double accounting for shmem-backed
> > > objects. I think they will be seen, for drivers which allocate the
> > > backing store at buffer object creation time, under the cgroup of the
> > > process doing the creation, in the existing memory controller. Right?
> >
> > We currently don't set __GFP_ACCOUNT or, respectively, use
> > GFP_KERNEL_ACCOUNT, so no. Unless someone allocates them with
> > GFP_USER ...
> >
> > > Is there a chance to exclude those from there and only have them in
> > > this new controller? Or would the opposite be a better choice? That
> > > is, not see those in the device memory controller but only in the
> > > existing one.
> >
> > I missed this, so jumping in super late. I think the guidance from
> > Tejun was to go the other way around: exclude allocations from normal
> > system memory from device cgroups, and instead make sure they are
> > tracked in the existing memcg.
> >
> > Which might mean we need memcg shrinkers and the assorted pain ...
> >
> > Also I don't think we ever reached any agreement on where things like
> > CMA allocations should be accounted for in this case.
>
> Yeah, but that's the thing, memcg probably won't cut it for CMA. Because
> if you pull on that thread, it means that dma-heaps also have to
> register their buffers with memcg, even if they are backed by something
> other than RAM.

For CMA I'm kinda leaning towards "both". If you don't have a special CMA
cgroup and just memcg, you can exhaust the CMA pool easily. But if the CMA
allocations also aren't tracked in memcg, you have a blind spot there,
which isn't great.

> This is what this cgroup controller is meant to do: memcg for memory
> (GFP'd) buffers, this cgroup for everything else.

Yeah, if there's no way you can get it through alloc_pages() it definitely
shouldn't be in memcg.
-Sima

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch