From: Pasha Tatashin
Date: Wed, 29 Nov 2023 14:45:03 -0500
Subject: Re: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h
To: Robin Murphy
Cc: Jason Gunthorpe, akpm@linux-foundation.org, alex.williamson@redhat.com,
 alim.akhtar@samsung.com, alyssa@rosenzweig.io, asahi@lists.linux.dev,
 baolu.lu@linux.intel.com, bhelgaas@google.com, cgroups@vger.kernel.org,
 corbet@lwn.net, david@redhat.com, dwmw2@infradead.org, hannes@cmpxchg.org,
 heiko@sntech.de, iommu@lists.linux.dev, jasowang@redhat.com,
 jernej.skrabec@gmail.com, jonathanh@nvidia.com, joro@8bytes.org,
 kevin.tian@intel.com, krzysztof.kozlowski@linaro.org, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-rockchip@lists.infradead.org,
 linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev,
 linux-tegra@vger.kernel.org, lizefan.x@bytedance.com, marcan@marcan.st,
 mhiramat@kernel.org, mst@redhat.com, m.szyprowski@samsung.com,
 netdev@vger.kernel.org, paulmck@kernel.org, rdunlap@infradead.org,
 samuel@sholland.org, suravee.suthikulpanit@amd.com, sven@svenpeter.dev,
 thierry.reding@gmail.com, tj@kernel.org, tomas.mudrunka@gmail.com,
 vdumpa@nvidia.com, virtualization@lists.linux.dev, wens@csie.org,
 will@kernel.org, yu-cheng.yu@intel.com
In-Reply-To: <52de3aca-41b1-471e-8f87-1a77de547510@arm.com>
References: <20231128204938.1453583-1-pasha.tatashin@soleen.com>
 <20231128204938.1453583-9-pasha.tatashin@soleen.com>
 <1c6156de-c6c7-43a7-8c34-8239abee3978@arm.com>
 <20231128235037.GC1312390@ziepe.ca>
 <52de3aca-41b1-471e-8f87-1a77de547510@arm.com>

> >> We can separate the metric into two:
> >> iommu pagetable only
> >> iommu everything
> >>
> >> or into three:
> >> iommu pagetable only
> >> iommu dma
> >> iommu everything
> >>
> >> What do you think?
> >
> > I think I said this at LPC - if you want to have fine grained
> > accounting of memory by owner you need to go talk to the cgroup people
> > and come up with something generic. Adding ever open coded finer
> > category breakdowns just for iommu doesn't make a lot of sense.
> >
> > You can make some argument that the pagetable memory should be counted
> > because kvm counts its shadow memory, but I wouldn't go into further
> > detail than that with hand coded counters..
>
> Right, pagetable memory is interesting since it's something that any
> random kernel user can indirectly allocate via iommu_domain_alloc() and
> iommu_map(), and some of those users may even be doing so on behalf of
> userspace. I have no objection to accounting and potentially applying
> limits to *that*.

Yes, in the next version I will separate pagetable-only memory from the
rest, for the limits.

> Beyond that, though, there is nothing special about "the IOMMU
> subsystem". The amount of memory an IOMMU driver needs to allocate for
> itself in order to function is not of interest beyond curiosity, it just
> is what it is; limiting it would only break the IOMMU, and if a user

Agreed about the amount of memory the IOMMU allocates for itself, but
that should be small; if it is not, we have to at least show where the
memory is used.

> thinks it's "too much", the only actionable thing that might help is to
> physically remove devices from the system. Similar for DMA buffers; it
> might be intriguing to account those, but it's not really an actionable
> metric - in the overwhelming majority of cases you can't simply tell a
> driver to allocate less than what it needs. And that is of course
> assuming if we were to account *all* DMA buffers, since whether they
> happen to have an IOMMU translation or not is irrelevant (we'd have
> already accounted the pagetables as pagetables if so).

DMA mappings should be observable (they do not have to be limited). At
the very least, that can help explain kernel memory overhead anomalies
on production systems.

> I bet "the networking subsystem" also consumes significant memory on the

It does, and GPU drivers may also consume a significant amount of memory.

> same kind of big systems where IOMMU pagetables would be of any concern.
> I believe some of the "serious" NICs can easily run up
> hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc.
> - would you propose accounting those too?

Yes. Any kind of kernel memory that is proportional to the workload
should be accountable. Someone is using those resources compared to
the idling system, and that someone should be charged.

Pasha