Date: Thu, 3 Apr 2025 10:27:19 +0200
From: Simona Vetter
To: Maxime Ripard
Cc: Andrew Morton, Marek Szyprowski, Robin Murphy, Sumit Semwal,
 Christian König, Benjamin Gaignard, Brian Starkey, John Stultz,
 "T.J. Mercier", Maarten Lankhorst, Thomas Zimmermann, David Airlie,
 Simona Vetter, Tomasz Figa, Mauro Carvalho Chehab, Hans Verkuil,
 Laurent Pinchart, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 iommu@lists.linux.dev, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH RFC 00/12] dma: Enable dmem cgroup tracking
References: <20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org>
In-Reply-To: <20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org>

On Mon, Mar 10, 2025 at 01:06:06PM +0100, Maxime Ripard wrote:
> Hi,
>
> Here's preliminary work to enable dmem tracking for heavy users of DMA
> allocations on behalf of userspace: v4l2, DRM, and dma-buf heaps.
>
> It's not really meant for inclusion at the moment, because I really
> don't like it that much, and would like to discuss solutions on how to
> make it nicer.
>
> In particular, the dma dmem region accessors don't feel that great to
> me. It duplicates the logic to select the proper accessor in
> dma_alloc_attrs(), and it looks fragile and potentially buggy to me.
>
> One solution I tried is to do the accounting in dma_alloc_attrs()
> directly, depending on a flag being set, similar to what __GFP_ACCOUNT
> is doing.
>
> It didn't work because dmem initialises a state pointer when charging an
> allocation to a region, and expects that state pointer to be passed back
> when uncharging. Since dma_alloc_attrs() returns a void pointer to the
> allocated buffer, we need to put that state into a higher-level
> structure, such as drm_gem_object, or dma_buf.
>
> Since we can't share the region selection logic, we need to get the
> region through some other means. Another thing I considered was to return
> the region as part of the allocated buffer (through struct page or
> folio), but those are lost across the calls and dma_alloc_attrs() will
> only get a void pointer. So that's not doable without some heavy
> rework, if it's a good idea at all.
>
> So yeah, I went for the dumbest possible solution with the accessors,
> hoping you could suggest a much smarter idea :)

I've had a private chat with Maxime to get him up to speed on hopefully a
lot of the past discussions, but probably best I put my notes here too.
Somewhat unstructured list of challenges with trying to account all the
memory for gpu/isp/camera/whatever:

- At LPC in Dublin I think we've pretty much reached the conclusion that
  normal struct page memory should be just accounted in memcg. Otherwise
  you just get really nasty double-accounting chaos or issues where you can
  exhaust reserves.

- We did not figure out what to do with mixed stuff like CMA, where we
  probably want to account it both into memcg (because it's struct page)
  but also separately into dmem (because the CMA region is a limited
  resource and only using memcg will not help us manage it).

- There's the entire chaos of carve-out vs CMA and how userspace can
  figure out how to set reasonable limits automatically. Maxime brought up
  the issue that limits need to be adjusted if carve-out/CMA/shmem aren't
  accounted the same, which I think is a valid concern. But due to the
  above conclusion around memcg accounting I think that's unavoidable, so
  we need some means for userspace to autoconfigure reasonable limits.
  Then that autoconfig can be done on each boot, and kernel (or dt or
  whatever) changes between these three allocators don't matter anymore.

- Autoconfiguration challenges also exist for split display/render SoC. It
  gets even more fun if you also throw in camera and media codecs, and
  even more fun if you have multiple CMA regions.

- Discrete gpu also has a very fun autoconfiguration issue because you
  have dmem limits for vram, and memcg limits for system memory. Vram
  might be swapped out to system memory, so naively you might want to
  assume that you need higher memcg limits than dmem limits. But there's
  systems with more dmem than smem (because the cpu with its memory is
  essentially just the co-processor that orchestrates the real compute
  machine, which is all gpus).

- We need a charge transfer, at least for Android since there all memory
  is allocated through binder. TJ Mercier did some patches:

  https://lore.kernel.org/dri-devel/20230123191728.2928839-3-tjmercier@google.com/

  Ofc with dmem this would need to work for both dmem and memcg charges,
  since with CMA and discrete gpu we'll have bo that are tracked in both.

- Hard limits for shmem/ttm drivers need a memcg-aware shrinker.
  TTM doesn't even have a shrinker yet, but with xe we now have a
  helper-library approach to enabling shrinking for TTM drivers.
  memcg-aware shrinking will be a large step up in complexity on top (and
  probably a good reason to switch over to the common shrinker lru instead
  of hand-rolling).

  See the various attempts at ttm shrinkers by Christian König and Thomas
  Hellstrom over the past years on dri-devel.

  This also means that most likely cgroup limit enforcement for ttm based
  drivers will be per-driver or at least very uneven.

- Hard limits for dmem vram means ttm eviction needs to be able to account
  the evicted bo against the right memcg. Because this can happen in
  random other threads (cs ioctl of another process, kernel threads)
  accounting this correctly is going to be "fun". Plus I haven't thought
  through interactions with memcg-aware shrinkers, which might cause some
  really fundamental issues.

- We also ideally need pin accounting, but I don't think we have any
  consensus on how to do that for memcg memory. Thus far it's all
  functionality-specific limits (e.g. mlock, rdma has its own for
  long-term pinned memory), not sure it makes sense to push for a unified
  tracking in memcg here?

  For dmem I think it's pretty easy, but there the question is how to
  differentiate between dmem that's always pinned (cma, I don't think
  anyone bothered with a shrinker for cma memory, vc4 maybe?) and dmem
  that generally has a shrinker and really wants a separate pin limit
  (vram/ttm drivers).

- Unfortunately on top of the sometimes very high individual complexity
  these issues also all interact. Which means that we won't be able to
  roll this out in one go, and we need to cope with very uneven
  enforcement. I think trying to allow userspace to cope with changing
  cgroup support through autoconfiguration is the most feasible way out of
  this challenge.

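As a rough illustration of two points above (the charge state has to live in
a higher-level buffer object because the allocator only hands back a void
pointer, and a charge transfer has to move that state between cgroups), here
is a toy userspace model in C. Everything named toy_* is made up for this
sketch and is not the kernel's dmem or memcg API; real code would also have
to deal with locking, hierarchy, and buffers charged to both controllers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-cgroup, per-region usage counter (not a kernel type). */
struct toy_pool {
	uint64_t usage;
	uint64_t limit;
};

/* Plays the role of a higher-level object such as drm_gem_object/dma_buf. */
struct toy_buffer {
	void *vaddr;            /* all the allocator gives back */
	uint64_t size;
	struct toy_pool *pool;  /* charge state, needed again at uncharge */
};

/* Charge size bytes against pool; fails if the limit would be exceeded. */
static int toy_try_charge(struct toy_pool *pool, uint64_t size)
{
	if (size > pool->limit - pool->usage)
		return -1;
	pool->usage += size;
	return 0;
}

/* Drop a previous charge; the caller must pass the pool it charged. */
static void toy_uncharge(struct toy_pool *pool, uint64_t size)
{
	assert(pool->usage >= size);
	pool->usage -= size;
}

/* Charge first, then allocate, and remember the pool in the buffer. */
static int toy_buffer_alloc(struct toy_buffer *buf, struct toy_pool *pool,
			    uint64_t size)
{
	if (toy_try_charge(pool, size))
		return -1;
	buf->vaddr = malloc(size);
	if (!buf->vaddr) {
		toy_uncharge(pool, size);
		return -1;
	}
	buf->size = size;
	buf->pool = pool;
	return 0;
}

/* Free the memory and uncharge the pool recorded at allocation time. */
static void toy_buffer_free(struct toy_buffer *buf)
{
	free(buf->vaddr);
	toy_uncharge(buf->pool, buf->size);
	buf->vaddr = NULL;
	buf->pool = NULL;
}

/* Charge transfer: charge the new owner before releasing the old one. */
static int toy_buffer_transfer(struct toy_buffer *buf, struct toy_pool *to)
{
	if (toy_try_charge(to, buf->size))
		return -1;
	toy_uncharge(buf->pool, buf->size);
	buf->pool = to;
	return 0;
}
```

Charging the destination before uncharging the source means a failed transfer
leaves the original accounting untouched, which is one possible design choice
and not necessarily what the patches linked above do.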
tl;dr: cgroup for device memory is a really complex mess

Cheers, Sima

> Thanks,
> Maxime
>
> Signed-off-by: Maxime Ripard
> ---
> Maxime Ripard (12):
>       cma: Register dmem region for each cma region
>       cma: Provide accessor to cma dmem region
>       dma: coherent: Register dmem region for each coherent region
>       dma: coherent: Provide accessor to dmem region
>       dma: contiguous: Provide accessor to dmem region
>       dma: direct: Provide accessor to dmem region
>       dma: Create default dmem region for DMA allocations
>       dma: Provide accessor to dmem region
>       dma-buf: Clear cgroup accounting on release
>       dma-buf: cma: Account for allocations in dmem cgroup
>       drm/gem: Add cgroup memory accounting
>       media: videobuf2: Track buffer allocations through the dmem cgroup
>
>  drivers/dma-buf/dma-buf.c                          |  7 ++++
>  drivers/dma-buf/heaps/cma_heap.c                   | 18 ++++++++--
>  drivers/gpu/drm/drm_gem.c                          |  5 +++
>  drivers/gpu/drm/drm_gem_dma_helper.c               |  6 ++++
>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 19 +++++++++++
>  include/drm/drm_device.h                           |  1 +
>  include/drm/drm_gem.h                              |  2 ++
>  include/linux/cma.h                                |  9 +++++
>  include/linux/dma-buf.h                            |  5 +++
>  include/linux/dma-direct.h                         |  2 ++
>  include/linux/dma-map-ops.h                        | 32 ++++++++++++++++++
>  include/linux/dma-mapping.h                        | 11 ++++++
>  kernel/dma/coherent.c                              | 26 +++++++++++++++
>  kernel/dma/direct.c                                |  8 +++++
>  kernel/dma/mapping.c                               | 39 ++++++++++++++++++++++
>  mm/cma.c                                           | 21 +++++++++++-
>  mm/cma.h                                           |  3 ++
>  17 files changed, 211 insertions(+), 3 deletions(-)
> ---
> base-commit: 55a2aa61ba59c138bd956afe0376ec412a7004cf
> change-id: 20250307-dmem-cgroups-73febced0989
>
> Best regards,
> --
> Maxime Ripard
>

--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch