Date: Wed, 11 Dec 2024 20:20:36 +0000
From: Matthew Wilcox
To: Shakeel Butt
Cc: Johannes Weiner, Andrew Morton, Christoph Hellwig, linux-mm@kvack.org, Michal Hocko, Roman Gushchin, Muchun Song, cgroups@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 2/2] vmalloc: Account memcg per vmalloc

On Wed, Dec 11, 2024 at 11:32:13AM -0800, Shakeel Butt wrote:
> On Wed, Dec 11, 2024 at 04:50:39PM +0000, Matthew Wilcox wrote:
> > Perhaps you'd be more persuaded by:
> >
> > (a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and
> > that's a pretty significant performance win over calling alloc_pages()
> > in a loop.
> >
> > (b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT
> > set is going to require allocating a memdesc to store the obj_cgroup
> > in, so in the future we'll save an allocation.
> >
> > Your proposed alternative will work and is way less churn. But it's
> > not preparing us for memdescs ;-)
>
> We can make alloc_pages_bulk() work with __GFP_ACCOUNT but your second
> argument is more compelling.
>
> I am trying to think of what will we miss if we remove this per-page
> memcg metadata. One thing I can think of is debugging a live system
> or kdump where I need to track where a given page came from. I think

Umm, I don't think you know which vmalloc allocation a page came from
today? I've sent patches to add that information before, but they were
rejected. In fact, I don't think we know even _that_ a page belongs to
vmalloc today, do we? Yes, we know that the page is accounted, and
which memcg it belongs to ... but nothing more.

I actually want to improve this, without adding additional overhead.
What I'm working on right now (before I got waylaid by this bug) is:

+struct choir {
+	struct kref refcount;
+	unsigned int nr;
+	struct page *pages[] __counted_by(nr);
+};

and rewriting vmalloc to be based on choirs instead of its own pages.
One thing I've come to realise today is that the obj_cgroup pointer
needs to be in the choir and not in the vm_struct so that we uncharge
the allocation when the choir refcount drops to 0, not when the
allocation is unmapped.

A regular choir allocation will (today) mark the pages in it as being
allocated to a choir (and thus not having their own refcount /
mapcount), but I'll give vmalloc a way to mark the pages as specifically
being from vmalloc.

There's a lot of moving parts to this ... it's proving quite tricky!

> I think we can go with Johannes' solution for stable and discuss the
> future direction more separately.

OK, I'll send a patch to do that.
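
For illustration, here is a rough userspace sketch of the lifetime rule
described above, where the charge lives with the choir and is dropped
when its refcount reaches zero rather than at unmap time. Everything
prefixed toy_ is hypothetical: a plain counter stands in for struct kref
and for the obj_cgroup charge, void pointers stand in for struct page,
and none of this is kernel API or the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memcg charge counter (not obj_cgroup). */
struct toy_objcg {
	long charged_pages;
};

/* Userspace model of the choir, with the objcg pointer stored in it. */
struct toy_choir {
	unsigned int refcount;		/* stands in for struct kref */
	struct toy_objcg *objcg;	/* charge owner lives in the choir */
	unsigned int nr;
	void *pages[];			/* stands in for struct page *pages[] */
};

/* Allocate a choir of nr pages and charge them all at once. */
static struct toy_choir *toy_choir_alloc(struct toy_objcg *cg, unsigned int nr)
{
	struct toy_choir *c = calloc(1, sizeof(*c) + nr * sizeof(c->pages[0]));

	if (!c)
		return NULL;
	c->refcount = 1;
	c->objcg = cg;
	c->nr = nr;
	cg->charged_pages += nr;
	return c;
}

static void toy_choir_get(struct toy_choir *c)
{
	c->refcount++;
}

/* Drop one reference; uncharge and free only on the last one.
 * Returns 1 if the choir was released, 0 otherwise. */
static int toy_choir_put(struct toy_choir *c)
{
	assert(c->refcount > 0);
	if (--c->refcount)
		return 0;
	c->objcg->charged_pages -= c->nr;
	free(c);
	return 1;
}

/* Unmapping only drops the mapping's reference; it does not uncharge
 * by itself if someone else still holds the choir. */
static int toy_vunmap(struct toy_choir *c)
{
	return toy_choir_put(c);
}
```

With a second reference held, toy_vunmap() leaves the charge in place,
and only the final toy_choir_put() returns the pages to the counter. The
sketch is single-threaded; a real refcount would need to be atomic.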