Message-ID: <8763a109-a687-4e1e-a6d8-9b163031b77d@arm.com>
Date: Sun, 23 Feb 2025 13:38:08 +0530
Subject: Re: [LSF/MM/BPF TOPIC] The future of anon_vma
To: Lorenzo Stoakes , lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
From: Dev Jain

On 09/01/25 3:53 am, Lorenzo Stoakes wrote:
> Hi all,
>
> Since time immemorial the kernel has maintained two separate realms within
> mm - that of file-backed mappings and that of anonymous mappings.
>
> Each of these require a reverse mapping from folio to VMA, utilising
> interval trees from an intermediate object referenced by folio->mapping
> back to the VMAs which map it.
>
> In the case of a file-backed mapping, this 'intermediate object' is the
> shared page cache entry, of type struct address_space.
> It is non-CoW which
> keep things simple(-ish) and the concept is straight-forward - both the
> folio and the VMAs which map the page cache object reference it.
>
> In the case of anonymous memory, things are not quite as simple, as a
> result of CoW. This is further complicated by forking and the very many
> different combinations of CoW'd and non-CoW'd folios that can exist within
> a mapping.
>
> This kind of mapping utilises struct anon_vma objects which as a result of
> this complexity are pretty well entirely concerned with maintaining the
> notion of an anon_vma object rather than describing the underlying memory
> in any way.
>
> Of course we can enter further realms of insan^W^W^W^W^Wcomplexity by
> maintaining a MAP_PRIVATE file-backed mapping where we can experience both
> at once!
>
> The fact that we can have both CoW'd and non-CoW'd folios referencing a VMA
> means that we require -yet another- type, a struct anon_vma_chain,
> maintained on a linked list, to abstract the link between anon_vma objects
> and VMAs, and to provide a means by which one can manage and traverse
> anon_vma objects from the VMA as well as looking them up from the reverse
> mapping.
>
> Maintaining all of this correctly is very fragile, error-prone and
> confusing, not to mention the concerns around maintaining correct locking
> semantics, correctly propagating anonymous VMA state on fork, and trying to
> reuse state to avoid allocating unnecessary memory to maintain all of this
> infrastructure.
>
> An additional consequence of maintaining these two realms is that that
> which straddles them - shmem - becomes something of an enigma -
> file-backed, but existing on the anonymous LRU list and requiring a lot of
> very specific handling.
>
> It is obvious that there is some isomorphism between the representation of
> file systems and anonymous memory, less the CoW handling.
> However there is
> a concept which exists within file systems which can somewhat bridge the gap
> - reflinks.
>
> A future where we unify anonymous and file-backed memory mappings would be
> one in which a reflinks were implemented at a general level rather than, as
> they are now, implemented individually within file systems.
>
> I'd like to discuss how feasible doing so might be, whether this is a sane
> line of thought at all, and how a roadmap for working towards the
> elimination of anon_vma as it stands might look.
>
> As with my other proposal, I will gather more concrete information before
> LSF to ensure the discussion is specific, and of course I would be
> interested to discuss the topic in this thread also!
>
> Thanks!
>

Thanks for this. As a beginner, I have tried to understand the rmap code
a million times, after forgetting it a million times, thanks to the sheer
complexity of anon_vma and anon_vma_chain. Whenever I read it again, my
first thought is "surely there has to be some better way, someone must
figure it out" :)
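
For anyone else trying to keep the picture in their head, here is a rough
userspace sketch of the relationships described in the quoted text: a
folio's mapping field points at either a file's address_space or an
anon_vma, and a VMA reaches its anon_vma objects through a list of
anon_vma_chain links. This is a toy model under my own assumptions, not
kernel code: every toy_* name is hypothetical, the interval trees and all
locking are left out, and the low-bit tag only mirrors the idea of how the
kernel tells the two kinds of mapping apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace toy model, NOT the kernel's definitions. All toy_* names are
 * hypothetical stand-ins chosen for illustration.
 */
#define TOY_MAPPING_ANON ((uintptr_t)1) /* low bit of mapping marks "anon" */

struct toy_address_space { int id; }; /* page cache object (file-backed) */
struct toy_anon_vma { int id; };      /* anonymous rmap object */
struct toy_vma;

/* One link between a VMA and an anon_vma; a VMA may own several. */
struct toy_anon_vma_chain {
	struct toy_vma *vma;
	struct toy_anon_vma *anon_vma;
	struct toy_anon_vma_chain *next; /* next link on the same VMA */
};

struct toy_vma {
	struct toy_anon_vma_chain *chain; /* head of this VMA's links */
};

struct toy_folio {
	uintptr_t mapping; /* tagged pointer: address_space or anon_vma */
};

static inline void toy_folio_set_file(struct toy_folio *f,
				      struct toy_address_space *as)
{
	f->mapping = (uintptr_t)as;
}

static inline void toy_folio_set_anon(struct toy_folio *f,
				      struct toy_anon_vma *av)
{
	/* The tag only works if the pointer's low bit is free. */
	assert(((uintptr_t)av & TOY_MAPPING_ANON) == 0);
	f->mapping = (uintptr_t)av | TOY_MAPPING_ANON;
}

static inline bool toy_folio_is_anon(const struct toy_folio *f)
{
	return (f->mapping & TOY_MAPPING_ANON) != 0;
}

/* Returns the anon_vma behind an anonymous folio, or NULL otherwise. */
static inline struct toy_anon_vma *toy_folio_anon_vma(const struct toy_folio *f)
{
	if (!toy_folio_is_anon(f))
		return NULL;
	return (struct toy_anon_vma *)(f->mapping & ~TOY_MAPPING_ANON);
}

/* Returns the address_space behind a file-backed folio, or NULL otherwise. */
static inline struct toy_address_space *
toy_folio_file_mapping(const struct toy_folio *f)
{
	if (toy_folio_is_anon(f))
		return NULL;
	return (struct toy_address_space *)f->mapping;
}

/* Push a new VMA <-> anon_vma link onto the front of the VMA's list. */
static inline void toy_chain_link(struct toy_vma *vma,
				  struct toy_anon_vma_chain *avc,
				  struct toy_anon_vma *av)
{
	avc->vma = vma;
	avc->anon_vma = av;
	avc->next = vma->chain;
	vma->chain = avc;
}

/* Count the anon_vmas reachable from a VMA by walking its chain. */
static inline size_t toy_vma_count_anon_vmas(const struct toy_vma *vma)
{
	size_t n = 0;

	for (const struct toy_anon_vma_chain *avc = vma->chain; avc; avc = avc->next)
		n++;
	return n;
}
```

As I understand it, a VMA in a forked child ends up linked both to the
anon_vma it inherited from the parent and to one of its own, which in this
model would be two toy_chain_link() calls on the same toy_vma. Even the toy
version needs three types to say "this page is mapped by these VMAs", which
I suppose is the point of the proposal.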