Date: Wed, 7 Jun 2023 21:17:47 +0100
From: Lorenzo Stoakes
To: David Hildenbrand
Cc: Peter Xu, Lokesh Gidra, Axel Rasmussen, Andrew Morton, "open list:MEMORY MANAGEMENT", linux-kernel, Andrea Arcangeli, "Kirill A . Shutemov", "Kirill A. Shutemov", Brian Geffon, Suren Baghdasaryan, Kalesh Singh, Nicolas Geoffray, Jared Duke, android-mm, Blake Caldwell, Mike Rapoport
Subject: Re: RFC for new feature to move pages from one vma to another without split
Message-ID: <67045cc5-b35b-4274-b22e-21b3920e33e1@lucifer.local>
References: <27ac2f51-e2bf-7645-7a76-0684248a5902@redhat.com> <3059388f-1604-c326-c66f-c2e0f9bb6cbf@redhat.com>
In-Reply-To: <3059388f-1604-c326-c66f-c2e0f9bb6cbf@redhat.com>

On Thu, Apr 13, 2023 at 10:10:44AM +0200, David Hildenbrand wrote:
> For RMAP and friends (relying on linear_page_index), folio->index has to
> match the index within the VMA. If we would set pgoff to something else, we'd
> have less VMA merging opportunities. So your system might work, but you'd
> end up with many anon VMAs.

I think the reverse situation, i.e. splitting the VMA, is the more serious
one, and without a correct index it would simply break rmap.

Consider:-

       [ VMA ]
          ^
          |
       [ avc ]
          ^
          |
     [ anon_vma ]
      ^   ^   ^
     /    |    \
page 1  page 2  page 3

If we unmap page 2, we cannot (or would rather not) update page 1 and page
3 to point to a new anon_vma and instead end up with:-

[ VMA 1 ]     [ VMA 3 ]
    ^             ^
    |             |
 [ avc ]       [ avc ]
     ^           ^
      \         /
      [ anon_vma ]
        ^      ^
       /        \
   page 1      page 3

Now you need some means of knowing which VMA each page belongs to - we have
to use the folio->index to look up which anon_vma_chain (avc) in the
anon_vma's interval tree (which is keyed on the pgoff range each VMA covers,
with folio->index as the lookup key) contains its VMA (actually this could
be multiple VMAs due to forking).

mremap() seems to me to be a lot of the reason we don't just put
vma->vm_start >> PAGE_SHIFT in folio->index on the fly, as when a block of
memory is moved, we don't want to have to go and update all of the
underlying pages, so we just keep the page offset the same as at the old
position even after it's moved.

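To make the lookup above concrete, here is a rough userspace model of it in Python. It is only a sketch, not kernel code: the `Vma` class, the linear scan in `rmap_lookup()` (standing in for the interval-tree walk) and the addresses are all made up for illustration, and 4 KiB pages are assumed. The two helpers mirror the arithmetic of the kernel's linear_page_index() and vma_address() as I understand it.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass(frozen=True)
class Vma:
    """Hypothetical stand-in for the three vm_area_struct fields we need."""
    vm_start: int   # first byte of the mapping
    vm_end: int     # one past the last byte
    vm_pgoff: int   # page offset that vm_start corresponds to

    def covers(self, index: int) -> bool:
        # True if a folio with this index falls inside the VMA's pgoff range.
        pages = (self.vm_end - self.vm_start) >> PAGE_SHIFT
        return self.vm_pgoff <= index < self.vm_pgoff + pages


def linear_page_index(vma: Vma, addr: int) -> int:
    # Index a newly faulted page at addr would be given.
    return ((addr - vma.vm_start) >> PAGE_SHIFT) + vma.vm_pgoff


def vma_address(vma: Vma, index: int) -> int:
    # Inverse: where a folio with this index is mapped inside the VMA.
    return vma.vm_start + ((index - vma.vm_pgoff) << PAGE_SHIFT)


def rmap_lookup(vmas, index):
    # Linear scan standing in for the anon_vma interval-tree query.
    return [(v, vma_address(v, index)) for v in vmas if v.covers(index)]


# A three-page VMA had page 2 unmapped, leaving two VMAs on one anon_vma.
vma1 = Vma(0x10000, 0x11000, 0x10)
vma3 = Vma(0x12000, 0x13000, 0x12)
page3_index = 0x12

found = rmap_lookup([vma1, vma3], page3_index)
print([(hex(v.vm_start), hex(a)) for v, a in found])

# mremap() moves VMA 3 but keeps vm_pgoff, so the unchanged folio index
# still resolves, and new faults in the moved VMA get consistent indices.
moved = Vma(0x50000, 0x51000, vma3.vm_pgoff)
print(hex(vma_address(moved, page3_index)))
print(hex(linear_page_index(moved, 0x50000)))
```

The point of the model is that nothing stored in the page has to change when the VMA is split or moved; only the VMA's own fields do.
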
We keep this offset in vm_pgoff so we know what pgoffs to give to new pages
to put in their index fields. As a result, we obviously wouldn't want to
merge an mremap'd VMA with that special handling with one that didn't have
it, to avoid the pages not being able to be rmap'd back to the correct VMAs,
so requiring vm_pgoff to be linearly monotonically increasing across the
merged range achieves this.

Doing it this way keeps the code for the VMA manipulation logic the same
for file-backed and anon mappings so is (kind of) neat in that respect.

Oh as a point of interest there is _yet another_ thing that can go in
vm_pgoff, which is remapped kernel mappings via remap_pfn_range_notrack()
which puts PFN in there :)) (as you can imagine I've torn out my rapidly
diminishing hair writing about this stuff in the book)

>
> Imagine the following:
>
> [ anon0 ][ fd ][ anon1 ]
>
> Unmap the fd:
>
> [ anon0 ][ hole ][ anon1 ]
>
> Mmap anon:
>
> [ anon0 ][ anon2 ][ anon1 ]
>
> We can now merge all 3 VMAs into one, even if the first and latter already
> map pages.
>
> A simpler and more common example is probably:
>
> [ anon0 ]
>
> Mmap anon1 before the existing one
>
> [ anon1 ][ anon0 ]
>
> Which we can merge into a single one.
>
> Mapping after an existing one could work, but one would have to carefully
> set pgoff based on the size of the previous anon VMA ... which is more
> complicated
>
> So instead, we consider the whole address space as a virtual, anon file,
> starting at offset 0. The pgoff of a VMA is then simply the offset in that
> virtual file (easily computed from the start of the VMA), and VMA merging is
> just the same as for an ordinary file.

This is a very good way of explaining it (though mremap complicates things
somewhat).

>
> --
> Thanks,
>
> David / dhildenb
>
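
As a rough illustration of the "virtual anon file" idea and of why an mremap'd VMA does not merge, here is a toy model of the pgoff part of the merge check. It is a sketch under assumptions: `Vma`, `fresh_anon()` and `can_merge()` are hypothetical names, 4 KiB pages are assumed, and the real merge logic checks much more than this (flags, anon_vma compatibility and so on).

```python
from typing import NamedTuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class Vma(NamedTuple):
    """Hypothetical stand-in for the fields relevant to merging."""
    vm_start: int
    vm_end: int
    vm_pgoff: int


def fresh_anon(start: int, end: int) -> Vma:
    # A new anon VMA takes its pgoff from its own start address, i.e. its
    # offset in the virtual anon file that begins at address 0.
    return Vma(start, end, start >> PAGE_SHIFT)


def can_merge(prev: Vma, nxt: Vma) -> bool:
    # Only the adjacency and pgoff-continuity parts of the check.
    if prev.vm_end != nxt.vm_start:
        return False
    prev_pages = (prev.vm_end - prev.vm_start) >> PAGE_SHIFT
    return prev.vm_pgoff + prev_pages == nxt.vm_pgoff


# [ anon0 ][ anon2 ][ anon1 ], with anon2 mapped into the hole left by fd.
anon0 = fresh_anon(0x10000, 0x12000)
anon2 = fresh_anon(0x12000, 0x14000)
anon1 = fresh_anon(0x14000, 0x16000)
print(can_merge(anon0, anon2), can_merge(anon2, anon1))

# A VMA moved by mremap() from 0x80000 keeps its old pgoff (0x80), so it is
# adjacent to anon1 in address space but not in pgoff space.
moved = Vma(0x16000, 0x17000, 0x80000 >> PAGE_SHIFT)
print(can_merge(anon1, moved))
```

Freshly mapped neighbours always line up because their pgoffs come from the same address-to-offset rule; the moved VMA breaks that rule on purpose so its existing pages stay findable.
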