From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 20 Jun 2023 23:00:02 +0100
From: Lorenzo Stoakes <lstoakes@gmail.com>
To: Joel Fernandes
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, Shuah Khan, Vlastimil Babka, Michal Hocko, Kirill A Shutemov, "Liam R. Howlett", "Paul E. McKenney", Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra, Vineeth Pillai
Subject: Re: [PATCH v4 1/7] mm/mremap: Optimize the start addresses in move_page_tables()
Message-ID: <28e641d6-37ba-44a7-b7a8-76e797fab1f7@lucifer.local>
References: <20230531220807.2048037-1-joel@joelfernandes.org> <20230531220807.2048037-2-joel@joelfernandes.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Tue, Jun 20, 2023 at 05:16:57PM -0400, Joel Fernandes wrote:
> Hi Lorenzo,
>
> > As far as I can tell, find_vma_prev() already does a walk? I mean, this is
> > equivalent to find_vma() only retrieving the previous VMA, right? I defer to
> > Liam, but I'm not sure this would be that much more involved? Perhaps he
> > can comment.
> >
> > An alternative is to create an iterator and use vma_prev(). I find it
> > extremely clunky that we search for a VMA we already possess (and its
> > previous one) while not needing the former.
> >
> > I'm not hugely familiar with the maple tree (perhaps Liam can comment) but
> > I suspect that'd be more performant if that's the concern. Either way I
> > would be surprised if this is the correct approach.
>
> I see your point. I am not sure myself; the maple tree functions for both
> APIs are indeed similar. We have already looked up the VMA being aligned
> down. If there is a way to get the previous VMA quickly, given an existing
> VMA, I can incorporate that change.
>
> Ideally, if I had access to the ma_state used for lookup of the VMA being
> aligned down, I could perhaps reuse that somehow. But when I checked, it
> seemed a lot more invasive to pass that state down to these align functions.
>
> But there is merit to your suggestion itself in the sense that it cuts down
> a few more lines of code.
Yeah, it's thorny. The maple tree seems to add a separation between the vmi
and vma that didn't exist previously, and you'd have to thread through the
mas or vmi that was used in the first instance, or end up having to rewalk
the tree anyway. I have spent too much time staring at v6 code where it was
trivial :)

I think given we have to walk the tree either way, we may as well do the
find_vma_intersection() [happy to stand corrected by Liam if this turns out
to be less efficient, but I don't think it is].

> > > Considering this, I would keep the code as-is and perhaps you/we could
> > > consider the replacement with another API in a subsequent patch as it
> > > does the job for this patch.
> >
> > See above. I don't think this kind of comment is helpful in code
> > review. Your disagreement above suffices; I've responded to it, and of
> > course if there is no other way this is fine.
> >
> > But I'd be surprised, and re-looking up a VMA we already have is just
> > horrid. It's not really a nitpick; it's a code quality issue in my view.
> >
> > In any case, let's please try to avoid 'if you are bothered, write a
> > follow-up patch' style responses. If you disagree with something just say
> > so, it's fine! :)
>
> I wasn't disagreeing :) Just saying that the find_vma_prev() suggested in a
> previous conversation with Liam fixes the issue (and has been tested a lot
> in this series, on my side), so I was hoping to stick with that, and we
> could iterate more on it in the future.
>
> However, after taking a deeper look at the maple tree, I'd like to give the
> find_vma_intersection() option at least a try (with appropriate attribution
> to you).

Cheers, appreciate it. Sorry to be a pain and nitpicky, but I think if that
works as well it could be quite a nice solution.

> Apologies if the response style in my previous email came across badly.
> That wasn't my intent and I will try to improve.
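[For illustration only, a toy userspace model of the find_vma_intersection()-style check being discussed; the linear array scan stands in for the maple tree, and can_realign_down() is a hypothetical helper, not the kernel API. The idea: realigning old_addr down to a PMD boundary is only safe if no *other* mapping occupies the span between the masked address and the VMA's own start.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PMD_SIZE (1UL << 21)
#define PMD_MASK (~(PMD_SIZE - 1))

/* Toy VMA: a [vm_start, vm_end) range, kept in a sorted array. */
struct vma { unsigned long vm_start, vm_end; };

/* Model of find_vma_intersection(): first VMA overlapping [start, end). */
static struct vma *find_vma_intersection(struct vma *vmas, size_t n,
                                         unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].vm_start < end && start < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * Hypothetical helper: realigning addr down to a PMD boundary is safe
 * only when the gap [addr & PMD_MASK, vma->vm_start) is unmapped.
 * Note the range ends at vma->vm_start, so the VMA itself never matches.
 */
static bool can_realign_down(struct vma *vmas, size_t n,
                             const struct vma *vma, unsigned long addr)
{
	unsigned long masked = addr & PMD_MASK;

	return masked == addr ||
	       find_vma_intersection(vmas, n, masked, vma->vm_start) == NULL;
}
```

This mirrors why a single intersection query over the already-walked range can replace a fresh find_vma_prev() lookup of a VMA we already hold.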
No worries, text is a sucky medium and likely I misinterpreted you in any
case. We're having a productive discussion, which is what matters! :)

> [..]
> > > > > +		realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
> > > > > +	}
> > > > > +
> > > > > 	if (is_vm_hugetlb_page(vma))
> > > > > 		return move_hugetlb_page_tables(vma, new_vma, old_addr,
> > > > > 				new_addr, len);
> > > > > @@ -565,6 +619,13 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> > > > >
> > > > > 	mmu_notifier_invalidate_range_end(&range);
> > > > >
> > > > > +	/*
> > > > > +	 * Prevent negative return values when {old,new}_addr was realigned
> > > > > +	 * but we broke out of the above loop for the first PMD itself.
> > > > > +	 */
> > > > > +	if (len + old_addr < old_end)
> > > > > +		return 0;
> > > > > +
> > > >
> > > > I find this a little iffy. I mean, I see that if you align [old,new]_addr
> > > > to PMD, then from then on you're relying on the fact that the loop is
> > > > just going from old_addr (now aligned) -> old_end and thus has the
> > > > correct length.
> > > >
> > > > Can't we just fix this issue by correcting len? If you take my review
> > > > above, which checks len in [maybe_]realign_addr(), you could take that
> > > > as a pointer and equally update it.
> > > >
> > > > Then you can drop this check.
> > >
> > > The drawback of adjusting len is that it changes what move_page_tables()
> > > users were previously expecting.
> > >
> > > I think we should look at the return value of move_page_tables() as well,
> > > not just len independently.
> > >
> > > len is what the user requested.
> > >
> > > "len + old_addr - old_end" is how much was actually copied and is the
> > > return value.
> > >
> > > If everything was copied, old_addr == old_end and len is unchanged.
> >
> > Ah yeah, I see. Sorry, I missed the fact that we're returning a value;
> > that does complicate things...
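[A self-contained userspace sketch of the arithmetic under discussion; bytes_done() and the hard-coded 2 MiB PMD are illustrative stand-ins, not the kernel's move_page_tables(). It shows why the guard is needed: old_end is fixed from the caller's original old_addr + len, so once old_addr has been realigned downward, bailing out on the very first PMD would make the naive `len + old_addr - old_end` go negative.]

```c
#include <assert.h>

#define PMD_SIZE (1UL << 21)
#define PMD_MASK (~(PMD_SIZE - 1))

/*
 * Model of move_page_tables()'s bookkeeping: 'moved' is how far the
 * copy loop advanced old_addr from its (realigned) start before stopping.
 */
static unsigned long bytes_done(unsigned long old_addr, unsigned long len,
                                unsigned long moved)
{
	unsigned long old_end = old_addr + len;	/* fixed before realignment */

	old_addr &= PMD_MASK;	/* realignment moves old_addr down... */
	old_addr += moved;	/* ...and the copy loop advances it again */

	/* the guard under discussion: realigned, but bailed on the first PMD */
	if (len + old_addr < old_end)
		return 0;

	return len + old_addr - old_end;	/* how much was done */
}
```

When the loop runs to completion, old_addr ends up equal to old_end and the function returns exactly len, so callers see no behaviour change.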
> >
> > If we retain the hugetlb logic, then we could work around the issue with
> > that instance of len by storing the 'actual length' of the range in
> > a new var actual_len and passing that.
> >
> > If we choose to instead just not do this for hugetlb (I wonder if the
> > hugetlb handling code actually does the equivalent of this, since surely
> > these pages have to be handled a PMD at a time?) then we can drop the
> > whole actual_len idea [see below on response to hugetlb thing].
>
> Thanks. Yes, you are right. We should already be good with hugetlb
> handling, as it does appear that hugetlb_move_page_tables() does copy by
> huge_page_size(h), so the old_addr should already be PMD-aligned for it to
> be able to do that.

Cool, in which case we can drop the actual_len idea, as this is really the
only place we'd need it.

> [..]
> > > > > 	return len + old_addr - old_end;	/* how much done */
> > > > > }
> > > >
> > > > Also I am concerned in the hugetlb case -> len is passed to
> > > > move_hugetlb_page_tables(), which is now strictly incorrect. I wonder
> > > > if this could cause an issue?
> > > >
> > > > Correcting len seems the neat way of addressing this.
> > >
> > > That's a good point. I am wondering if we can just change that from:
> > >
> > > if (is_vm_hugetlb_page(vma))
> > > 	return move_hugetlb_page_tables(vma, new_vma, old_addr,
> > > 			new_addr, len);
> > >
> > > to:
> > >
> > > if (is_vm_hugetlb_page(vma))
> > > 	return move_hugetlb_page_tables(vma, new_vma, old_addr,
> > > 			new_addr, old_addr - new_addr);
> > >
> > > Or, another option is to turn it off for hugetlb by just moving:
> > >
> > > if (len >= PMD_SIZE - (old_addr & ~PMD_MASK))
> > > 	realign_addr(...);
> > >
> > > to after:
> > >
> > > if (is_vm_hugetlb_page(vma))
> > > 	return move_hugetlb_page_tables(...);
> > >
> > > thanks,
> >
> > I think the actual_len solution should sort this, right?
> > If not, maybe better to be conservative and disable it for the hugetlb
> > case (I'm not sure if this would help, given you'd need to be PMD-aligned
> > anyway, right?), so as not to hold up the series.
> >
> > If we do decide not to include hugetlb (the endless 'special case' for so
> > much code...) in this, then we can drop the actual_len idea altogether.
> >
> > (Yes, I realise it's ironic that I'm suggesting deferring to a later
> > patch here, but there you go ;)
>
> ;-). Considering our discussion above that hugetlb mremap addresses should
> always start at a PMD boundary, maybe I can just add a warning to the if()
> like so, to detect any potential issue?
>
> if (is_vm_hugetlb_page(vma)) {
> 	WARN_ON_ONCE(old_addr - old_end != len);
> 	return move_hugetlb_page_tables(vma, new_vma, old_addr,
> 			new_addr, len);
> }

Yeah, looks good [ack your follow-up that it should be old_end - old_addr];
maybe add a comment explaining why this is a problem here too.

> Thank you so much. I have learnt a lot from you and others in the -mm
> community.

No worries, thanks very much for this patch series. This is a nice fixup
for a quite stupid failure we were experiencing before, with a neat
solution. Your hard work is appreciated!

> - Joel
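[To make the invariant behind the suggested WARN_ON_ONCE() concrete, a minimal userspace sketch; hugetlb_warn_would_fire() is a made-up helper assuming a 2 MiB PMD, not kernel code. Because hugetlb mappings start huge-page (>= PMD) aligned, realignment is a no-op for them, so old_end - old_addr == len always holds on entry to the hugetlb path.]

```c
#include <assert.h>

#define PMD_SIZE (1UL << 21)
#define PMD_MASK (~(PMD_SIZE - 1))

/*
 * Returns nonzero if the proposed WARN_ON_ONCE(old_end - old_addr != len)
 * would fire after realignment. For a PMD-aligned old_addr (the hugetlb
 * case), masking is a no-op and the check can never trigger.
 */
static int hugetlb_warn_would_fire(unsigned long old_addr, unsigned long len)
{
	unsigned long old_end = old_addr + len;

	old_addr &= PMD_MASK;	/* realignment: no-op when already aligned */
	return (old_end - old_addr) != len;
}
```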