From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org>
 <20250310-eccentric-wonderful-puffin-ddbb26@houat>
 <5ed87c80-6fe3-4f8c-bb98-ca07f1db8c34@amd.com>
 <20250403-quick-salamander-of-charisma-cab289@houat>
 <202c3a58-97a3-489c-b3f2-b1fd2735bd19@amd.com>
 <86a12909-4d40-46ec-95cc-539c346914e4@amd.com>
In-Reply-To: <86a12909-4d40-46ec-95cc-539c346914e4@amd.com>
From: "T.J. Mercier"
Date: Mon, 7 Apr 2025 18:03:31 -0700
Subject: Re: [PATCH RFC 00/12] dma: Enable dmem cgroup tracking
To: Christian König
Cc: Maxime Ripard, Dave Airlie, Andrew Morton, Marek Szyprowski,
 Robin Murphy, Sumit Semwal, Benjamin Gaignard, Brian Starkey,
 John Stultz, Maarten Lankhorst, Thomas Zimmermann, Simona Vetter,
 Tomasz Figa, Mauro Carvalho Chehab, Ben Woodard, Hans Verkuil,
 Laurent Pinchart, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 iommu@lists.linux.dev, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org

On Mon, Apr 7, 2025 at 4:46 AM Christian König wrote:
>
> On 05.04.25 at 03:57, T.J. Mercier wrote:
> > On Fri, Apr 4, 2025 at 1:47 AM Christian König wrote:
> >> Hi Maxime,
> >>
> >> On 03.04.25 at 17:47, Maxime Ripard wrote:
> >>> On Thu, Apr 03, 2025 at 09:39:52AM +0200, Christian König wrote:
> >>>>> For the UMA GPU case where there is no device memory or eviction
> >>>>> problem, perhaps a configurable option to just say account memory in
> >>>>> memcg for all allocations done by this process, and state yes you can
> >>>>> work around it with allocation servers or whatever but the behaviour
> >>>>> for well behaved things is at least somewhat defined.
> >>>> We can have that as a workaround, but I think we should approach that
> >>>> differently.
> >>>>
> >>>> With upcoming CXL even coherent device memory is exposed to the core
> >>>> OS as NUMA memory with just a high latency.
> >>>>
> >>>> So both in the CXL and UMA case it actually doesn't make sense to
> >>>> allocate the memory through the driver interfaces any more. With
> >>>> AMDGPU for example we are just replicating mbind()/madvise() within
> >>>> the driver.
> >>>>
> >>>> Instead what the DRM subsystem should aim for is to allocate memory
> >>>> using the normal core OS functionality and then import it into the
> >>>> driver.
> >>>>
> >>>> AMD, NVidia and Intel have HMM working for quite a while now but it
> >>>> has some limitations, especially on the performance side.
> >>>>
> >>>> So for AMDGPU we are currently evaluating udmabuf as alternative. That
> >>>> seems to be working fine with different NUMA nodes, is perfectly memcg
> >>>> accounted and gives you a DMA-buf which can be imported everywhere.
> >>>>
> >>>> The only show stopper might be the allocation performance, but even if
> >>>> that's the case I think the ongoing folio work will properly resolve
> >>>> that.
> >>> I mean, no, the showstopper to that is that using udmabuf has the
> >>> assumption that you have an IOMMU for every device doing DMA, which is
> >>> absolutely not true on !x86 platforms.
> >>>
> >>> It might be true for all GPUs, but it certainly isn't for display
> >>> controllers, and it's not either for codecs, ISPs, and cameras.
> >>>
> >>> And then there's the other assumption that all memory is under the
> >>> memory allocator control, which isn't the case on most recent platforms
> >>> either.
> >>>
> >>> We *need* to take CMA into account there, all the carved-out, device
> >>> specific memory regions, and the memory regions that aren't even under
> >>> Linux supervision like protected memory that is typically handled by the
> >>> firmware and all you get is a dma-buf.
> >>>
> >>> Saying that it's how you want to workaround it on AMD is absolutely
> >>> fine, but DRM as a whole should certainly not aim for that, because it
> >>> can't.
> >> A bunch of good points you bring up here but it sounds like you misunderstood me a bit.
> >>
> >> I'm certainly *not* saying that we should push for udmabuf for everything, that is clearly use case specific.
> >>
> >> For use cases like CMA or protected carve-out the question what to do doesn't even arise in the first place.
> >>
> >> When you have CMA which dynamically steals memory from the core OS then of course it should be accounted to memcg.
> >>
> >> When you have carve-out which the core OS memory management doesn't even know about then it should certainly be handled by dmem.
> >>
> >> The problematic use cases are the ones where a buffer can sometimes be backed by system memory and sometimes by something special. For this we don't have a good approach what to do since every approach seems to have a drawback for some use case.
> > This reminds me of memory.memsw in cgroup v1, where both resident and
> > swapped memory show up under the same memcg counter. In this dmem
> > scenario it's similar but across two different cgroup controllers
> > instead of two different types of system memory under the same
> > controller.
>
> Yeah, nailed it. Exactly that was one of the potential solutions I had in mind as well.
>
> It's just that I abandoned that idea when I realized that it actually wouldn't help.
>
> For example imagine you have 8GiB system and 8GiB local memory. So you set your cgroup limit to 12GiB. But when an application tries to use full 12GiB as system instead of a combination of the two you still run into the OOM killer.

Yup, to solve this with kernel enforcement, we would need a counter
that includes both types. Then what if that system memory can be
swapped and exceeds a swap-only counter? Yet another counter? (dmem,
dmem+system, dmem+system+swap) :\

> > memsw doesn't exist in v2, and users are asking for it back. [1] I
> > tend to agree that a combined counter is useful as I don't see a great
> > way to apply meaningful limits to individual counters (or individual
> > controller limits in the dmem+memcg case) when multiple cgroups are
> > involved and eviction can cause memory to be transferred from one
> > place to another.
> > Sorry I'm not really offering a solution to this,
> > but I feel like only transferring the charge between cgroups is a
> > partial solution since the enforcement by the kernel is independent
> > for each controller. So yeah as Dave and Sima said for accounting I
> > guess it works, and maybe that's good enough if you have userspace
> > enforcement that's smart enough to look in all the different places.
> > But then there are the folks asking for kernel enforcement. Maybe just
> > accounting as best we can is a good place to start?
>
> So we would account to memcg, but don't apply its limits?

I was thinking just do the accounting independently for each resource
and not rely on the kernel for enforcement of combinations of
resources across controllers. (The status quo.) That shouldn't affect
how memcg enforces limits. If we could compose a "synthetic" counter
file in cgroupfs at runtime that combines multiple arbitrary existing
counters, I think that'd help address the enforcement side. It'd also
conveniently solve the memsw problem in v2 since you could combine
memory.current and memory.swap.current into something like
memsw.current and set a memsw.max, and only users who care about that
combination would pay the overhead for it.

> Mhm, that's a kind of interesting idea. It at least partially solves the problem.
>
> Regards,
> Christian.
>
> >
> > [1] https://lore.kernel.org/all/20250319064148.774406-5-jingxiangzeng.cas@gmail.com/
>
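
For illustration only, here is a rough userspace approximation of the
"synthetic" counter idea above. It is not the proposed kernel interface
(no memsw.current or memsw.max file exists in cgroup v2); it just sums
existing single-value counter files from a cgroup directory. The helper
names (read_counter, synthetic_current, over_limit) and the MEMSW tuple
are made up for this sketch, and the demo runs against a temporary fake
directory rather than real cgroupfs.

```python
import tempfile
from pathlib import Path


def read_counter(cgroup_dir, name):
    """Read a single-value cgroup v2 counter file (a byte count) as an int."""
    return int(Path(cgroup_dir, name).read_text().strip())


def synthetic_current(cgroup_dir, counters):
    """Sum several existing counters into one combined usage figure."""
    return sum(read_counter(cgroup_dir, name) for name in counters)


def over_limit(cgroup_dir, counters, limit_bytes):
    """True when combined usage exceeds the (hypothetical) combined limit."""
    return synthetic_current(cgroup_dir, counters) > limit_bytes


# Hypothetical "memsw" combination built from two real cgroup v2 files.
MEMSW = ("memory.current", "memory.swap.current")

if __name__ == "__main__":
    # Demo against a fake cgroup directory so it runs without cgroupfs access.
    GIB = 1 << 30
    with tempfile.TemporaryDirectory() as fake_cgroup:
        Path(fake_cgroup, "memory.current").write_text(f"{8 * GIB}\n")
        Path(fake_cgroup, "memory.swap.current").write_text(f"{5 * GIB}\n")
        print(synthetic_current(fake_cgroup, MEMSW) // GIB)  # 13
        print(over_limit(fake_cgroup, MEMSW, 12 * GIB))      # True
```

A poller like this can only observe that a combined limit was exceeded
after the fact; it cannot refuse a charge the way a kernel counter
would, which is the gap between accounting and enforcement discussed
above. Folding dmem usage into the sum would also need extra parsing,
since (as far as I understand) dmem.current is keyed per region rather
than a single value; that is omitted here.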