From: David Hildenbrand
Organization: Red Hat
To: Peter Xu
Cc: Lokesh Gidra, Axel Rasmussen, Andrew Morton, "open list:MEMORY MANAGEMENT", linux-kernel, Andrea Arcangeli, "Kirill A . Shutemov", "Kirill A. Shutemov", Brian Geffon, Suren Baghdasaryan, Kalesh Singh, Nicolas Geoffray, Jared Duke, android-mm, Blake Caldwell, Mike Rapoport
Subject: Re: RFC for new feature to move pages from one vma to another without split
Date: Thu, 13 Apr 2023 10:10:44 +0200
Message-ID: <3059388f-1604-c326-c66f-c2e0f9bb6cbf@redhat.com>
References: <27ac2f51-e2bf-7645-7a76-0684248a5902@redhat.com>

On 12.04.23 17:58, Peter Xu wrote:
> On Wed, Apr 12, 2023 at 10:47:52AM +0200, David Hildenbrand wrote:
>>> Personally it was always a mystery to me on how vm_pgoff works with
>>> anonymous vmas and why it needs to be setup with vm_start >> PAGE_SHIFT.
>>>
>>> Just now I tried to apply below oneliner change:
>>>
>>> @@ -1369,7 +1369,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>                         /*
>>>                          * Set pgoff according to addr for anon_vma.
>>>                          */
>>> -                       pgoff = addr >> PAGE_SHIFT;
>>> +                       pgoff = 0;
>>>                         break;
>>>                 default:
>>>                         return -EINVAL;
>>>
>>> The kernel even boots without a major problem so far..
>>
>> I think it's for RMAP purposes.
>>
>> Take a look at linear_page_index() and how it's, for example, used in
>> ksm_might_need_to_copy() alongside page->index.
>
> From what I read, the vma's vm_pgoff is set before setup any page->index
> within the vma, while the latter will be calculated out of the vma pgoff
> with linear_page_index() (in __page_set_anon_rmap()).
>
>         folio->index = linear_page_index(vma, address);
>
> I think I missed something, but it seems to me any comparisons between
> page->index and linear_page_index() will just keep working for anonymous
> even if we change vma pgoff to 0 when vma is mapped.
>
> Do you perhaps mean this is needed for ksm only?
> I really am not familiar enough with ksm, especially when it's swapped
> out.  I do see that ksm_might_need_to_copy() wants to avoid reusing a
> page if anon_vma is setup not for current vma, but I don't know when
> it'll happen.
>
>         if (PageKsm(page)) {
>                 if (page_stable_node(page) &&
>                     !(ksm_run & KSM_RUN_UNMERGE))
>                         return page;    /* no need to copy it */
>         } else if (!anon_vma) {
>                 return page;            /* no need to copy it */
>         } else if (page->index == linear_page_index(vma, address) &&
>                         anon_vma->root == vma->anon_vma->root) {
>                 return page;            /* still no need to copy it */
>         }
>
> I think when all these paths don't trigger (aka, we need to copy) it means
> there's anon_vma assigned to the page but not the right one (even though I
> don't know how that could happen..).  Meanwhile I don't see either on how
> vma pg_off affects this (and I assume a real KSM page ignores page->index
> completely).

I think you are right with folio->index = linear_page_index(vma, address).

I did not check the code yet, but thinking about it I figured out why we
want to set pgoff to the start of the VMA in the address space for
anonymous memory:

For RMAP and friends (relying on linear_page_index()), folio->index has to
match the index within the VMA. If we set pgoff to something else,
we'd have fewer VMA merging opportunities. So your system might work, but
you'd end up with many anon VMAs.

Imagine the following:

[ anon0 ][  fd   ][ anon1 ]

Unmap the fd:

[ anon0 ][ hole  ][ anon1 ]

Mmap anon:

[ anon0 ][ anon2 ][ anon1 ]

We can now merge all 3 VMAs into one, even if the first and the last
already map pages.

A simpler and more common example is probably:

[ anon0 ]

Mmap anon1 before the existing one:

[ anon1 ][ anon0 ]

Which we can merge into a single one.

Mapping after an existing one could work, but one would have to
carefully set pgoff based on the size of the previous anon VMA ...
which is more complicated.

So instead, we consider the whole address space as a virtual, anon file,
starting at offset 0. The pgoff of a VMA is then simply the offset in
that virtual file (easily computed from the start of the VMA), and VMA
merging is just the same as for an ordinary file.

--
Thanks,

David / dhildenb
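
The "virtual anon file" argument above can be sketched as a small userspace C model. This is not kernel code: struct toy_vma, TOY_PAGE_SHIFT and the toy_* helpers are hypothetical names made up for illustration, and real VMA merging checks much more than this (flags, anon_vma compatibility and so on). toy_linear_page_index() only mirrors the arithmetic that linear_page_index() performs for a regular (non-hugetlb) VMA, as far as I understand it.

```c
/*
 * Userspace toy model, NOT kernel code. All toy_* names are hypothetical
 * and exist only to illustrate the pgoff convention discussed above.
 */
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12UL	/* assumes 4 KiB pages */

struct toy_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* offset, in pages, into the backing "file" */
};

/* Convention under discussion: pgoff of an anon VMA follows its address. */
static struct toy_vma toy_anon_vma(unsigned long start, unsigned long end)
{
	struct toy_vma vma = { start, end, start >> TOY_PAGE_SHIFT };
	return vma;
}

/* Counter-example convention from the one-liner: always use pgoff 0. */
static struct toy_vma toy_anon_vma_pgoff0(unsigned long start, unsigned long end)
{
	struct toy_vma vma = { start, end, 0 };
	return vma;
}

/* Page index of an address: offset within the VMA plus the VMA's pgoff. */
static unsigned long toy_linear_page_index(const struct toy_vma *vma,
					   unsigned long address)
{
	return ((address - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff;
}

/*
 * Simplified merge test: the VMAs must be adjacent in the address space
 * and their page offsets must be contiguous, so that the index of every
 * already-mapped page stays valid after the merge.
 */
static bool toy_can_merge(const struct toy_vma *prev, const struct toy_vma *next)
{
	unsigned long prev_pages =
		(prev->vm_end - prev->vm_start) >> TOY_PAGE_SHIFT;

	return prev->vm_end == next->vm_start &&
	       prev->vm_pgoff + prev_pages == next->vm_pgoff;
}

/* Merge two compatible VMAs; the result keeps the pgoff of the first one. */
static struct toy_vma toy_merge(const struct toy_vma *prev,
				const struct toy_vma *next)
{
	struct toy_vma merged = { prev->vm_start, next->vm_end, prev->vm_pgoff };

	assert(toy_can_merge(prev, next));
	return merged;
}
```

With 4 KiB pages, anon VMAs covering [0x10000, 0x12000), [0x12000, 0x13000) and [0x13000, 0x15000) get pgoff 0x10, 0x12 and 0x13 from toy_anon_vma(), so toy_can_merge() holds for each neighbouring pair no matter in which order they were mapped. Built with toy_anon_vma_pgoff0() instead, the first two already fail the test (0 + 2 pages != 0), which is the lost merging opportunity described above.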