From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83CE0C77B7F for ; Sat, 20 May 2023 03:17:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D9AA9900004; Fri, 19 May 2023 23:17:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D4AAF900003; Fri, 19 May 2023 23:17:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C124B900004; Fri, 19 May 2023 23:17:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id B22BD900003 for ; Fri, 19 May 2023 23:17:51 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 5003914089E for ; Sat, 20 May 2023 03:17:51 +0000 (UTC) X-FDA: 80809173942.20.2F68686 Received: from mail-yw1-f177.google.com (mail-yw1-f177.google.com [209.85.128.177]) by imf14.hostedemail.com (Postfix) with ESMTP id 7B158100006 for ; Sat, 20 May 2023 03:17:49 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=joelfernandes.org header.s=google header.b=LaVArUpz; dmarc=none; spf=pass (imf14.hostedemail.com: domain of joel@joelfernandes.org designates 209.85.128.177 as permitted sender) smtp.mailfrom=joel@joelfernandes.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684552669; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MpCD1jtGuwu5J2PBZwRsJvmd7HPbIcqI3mtODbtvJa0=; b=tthPCOowBth6OtgfMz+I5MnFPJCxNHKMLfTHq46weIPv6tRx2As/vpMktDSn4EPgsTGPyd eZtHTaF+fabfnKtPwiadjr7ax1iC6upJjvD/A7SVhQKXjxfxkEYbEPo1Cs3dm7xavEO0/Z /KOTsC/rTlUoaLmzToS2T58m62wmDP4= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=joelfernandes.org header.s=google header.b=LaVArUpz; dmarc=none; spf=pass (imf14.hostedemail.com: domain of joel@joelfernandes.org designates 209.85.128.177 as permitted sender) smtp.mailfrom=joel@joelfernandes.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684552669; a=rsa-sha256; cv=none; b=ADUONj6e/2Rbp4jFXD7pLvqp7/c/nqVDkv82S4dQ3A6nThhzVaR4ePgl52osZa9gm0DJ+R gW3MJe50RhjLjRADRfCkO8hyGRxNW8jcXz1wU8tr7wuHS7gAPkJeEPa1GuI86pD9Wg1XHg ltwtu4FXR44W44jGo089dLH9FfrRniQ= Received: by mail-yw1-f177.google.com with SMTP id 00721157ae682-561d5a16be0so56273327b3.2 for ; Fri, 19 May 2023 20:17:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; t=1684552668; x=1687144668; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=MpCD1jtGuwu5J2PBZwRsJvmd7HPbIcqI3mtODbtvJa0=; b=LaVArUpzA/aA+gVeXaGSxBECo4G2Rf7ssuzQXn5M3i/GNDUeu1d28yPSn79uqV2+1I ZtLg1ix+cxOKdOR6y8PgUXNwkh7syZACcYLC1OtxJKfpzhiepiW8ZvsNFA8hp9VHwywt mh+HdkRD4FGGt2auwIsvwUuul9lqfSewlhzs8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684552668; x=1687144668; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MpCD1jtGuwu5J2PBZwRsJvmd7HPbIcqI3mtODbtvJa0=; b=bmB7+85HR4BF6/YluQyF0VJg1PLG86IZaOhFkbhD7jLAAaVUjc0+ioJX5tm6saR+Pm u95uIygtjkQocCLtkO1QaoFt7Cz3iTfW80w2ORVrZ9VvIpaP6dUxIsKkyo+vvpO/I+gu 525Bgis0lMnhej7soVxsNmVShAXPK/9Z/I4iCNG1io4lx0OxKdYd3FVEGmnYruneUmXg HXfxhNsdvWaAK1AjDmMdrpaKb4bH9qIPSFCw1YAY7OE6vXXtfObgd0fldfelOI47Z/Da o6S5kdqG7UinTWtn/ns8kginjxQx0oiE9ic8Dg5mIoH0kk0/AFvjxexxdO+z1G3fjG46 sVJw== X-Gm-Message-State: AC+VfDzK/kqr/bgf0VWxqnST/1x2kYnQoQIVwKfgNjXHyGEN5LxVnCmP M6yQdGRkhJQyeH/96gMQb3nci6yQ4lMNw2hKaSiO7Q== X-Google-Smtp-Source: ACHHUZ5tsf7hnMvLWVbHF1KHd68Wkr8LFV0a3I7RI0M0+ZPAQAzMG2ULQkC55IlU0+khIkjwG5XVm7xTlZnvP6FWyVA= X-Received: by 2002:a81:840a:0:b0:561:eb35:a660 with SMTP id u10-20020a81840a000000b00561eb35a660mr3941669ywf.1.1684552668397; Fri, 19 May 2023 20:17:48 -0700 (PDT) MIME-Version: 1.0 References: <20230519190934.339332-1-joel@joelfernandes.org> <20230519190934.339332-2-joel@joelfernandes.org> In-Reply-To: From: Joel Fernandes Date: Fri, 19 May 2023 23:17:37 -0400 Message-ID: Subject: Re: [PATCH v2 1/4] mm/mremap: Optimize the start addresses in move_page_tables() To: Linus Torvalds Cc: linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, Shuah Khan , Vlastimil Babka , Michal Hocko , Lorenzo Stoakes , Kirill A Shutemov , "Liam R. Howlett" , "Paul E. McKenney" , Suren Baghdasaryan Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 7B158100006 X-Stat-Signature: g5z9byknt1cxsz56fhu8y8kzacrs8nqo X-Rspam-User: X-HE-Tag: 1684552669-896604 X-HE-Meta: U2FsdGVkX18piN8ssiVpNHpvzi4NbJK7XYnQTzxLwO8XVYCjG3gU7AUpFgBadfzzZUFhb7ngGCDl6DU+9mnv1U0SoMrQHrwYJFzAcC1F5NoI4lPlx5Ku2QRAyDr71dYO83v5XbtiB/XvuqiEV38lzrSVaHlEdJJUVTwaQZ+I3xCaOb+jpqvonyWtGjciBcVIC9sH3Q8fVS7U6p8bRRGoG3QJTku4IU+jIfcTKS4FZwJNGBYb8t0NOP85WER1/+JL8ZFiQeFHPagdj6aKqfda0TKMfRUNFOkB+O507LJIvs+OvcSCjAhXcYqAyiJRp1pChpb6T6bxfcFDZ1qj/DeJaHlY95h3lRQQGccSnqwrsCs9TwkmcumO8Ftnj+Tbj2/aQkLtFsIPcHw6LDrPZAfOZ8lMvIw4T+py0UgXht1w13i2UjGWzJ2R7SkyGbgOzFA5GOSA6Zu+8Bw1tPH3QkYi3WaADEBfb4goRxlm0C7jmBWJt6mg+TiA1d3QeWZSyZZUFUJFpePqLARs6BqDztb9U0B7K+Xkv2hqq8QU+p3wIgSAjRdmakfxjs0iGSa1RYs1Ru005TFE/t8Xw4pAzGZKeccDt1FW8bJ5c2qQ4TDqSVVgAxRvsyIsKojZI1/PJurWfOUZUawHuITIPybSUaWNrTW7rKtJvZ22wqmrP2GO/JfZil+APvxaCe1j9iUr190HWM4TrRyoKzBYUkSx3W/2NlyZeU+odAHW2c8OO/9kKln7FRFgGWK8Asu7gjsQ99XnMM5FNVrqGuciR6J/+QW8shcTPx0+1rkhd1PL2UrHgd64AKYLG78VfI1NyJhglW7JuwDcivS4JeAR6LPCcz8OSZZTmV1EgDB4oQGK3jTnVVH6J58YcSgk5D4lY4hs2L2ytJGSFqX5OjOX9XlB954nSxu217t/KULmtq6SAXrgVDK9tCQaaQkZ++AVeKI0crw4IKcKiADWK8egsFhSDPH eZhYU+cL phdVypJfSfrZPLJgs/7gQm6/65wnT+/8i68C2dzM/054ANx+D/ww5V072cK8UkIdjsKi4yEBUzpkZSOlEVwSY7EQzkt8DrKDiXZWnFr4JQNDL/seOvrwUcMX5UYP0qGJr/j9f5HhNnYk70jWFAGe+t1ezjr8bVPJBkL3ej4hbjncNrGsLc3GMNwFpQmFijrr7qBM15awgt5O4aNi+tj7hcTLq8ibEN7Gh4Z64pJztGyR+xFwKnVEVTE30SiL1B2gG4UJ0LpG8WYjxa8qScvv6zmtdR0f0Po3ulCBw8YooWJYVjBiiT29zwFDiS/Ri7YG6lBqKbHhO/IfM1BkiOPCtmmd1CWkflnmgo0qjOisaQc/HfW9Xdwn3gZCnpHddIzn5RiWSgJpgeyPNffM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Linus, On Fri, May 19, 2023 at 10:34=E2=80=AFPM Linus Torvalds wrote: > > On Fri, May 19, 2023 at 3:52=E2=80=AFPM Joel Fernandes wrote: > > > > > > I *suspect* that the test is literally just for the stack movement > > > case by execve, where it catches the case where we're doing the > > > movement entirely within the one vma we set up. > > > > Yes that's right, the test is only for the stack movement case. For > > the regular mremap case, I don't think there is a way for it to > > trigger. > > So I feel the test is simply redundant. > > For the regular mremap case, it never triggers. Unfortunately, I just found that mremap-ing a range purely within a VMA can actually cause the old and new VMA passed to move_page_tables() to be the same. I added a printk to the beginning of move_page_tables that prints all the a= rgs: printk("move_page_tables(vma=3D(%lx,%lx), old_addr=3D%lx, new_vma=3D(%lx,%lx), new_addr=3D%lx, len=3D%lx)\n", vma->vm_start, vma->vm_end, old_addr, new_vma->vm_start, new_vma->vm_end, new_addr, len); Then I wrote a simple test to move 1MB purely within a 10MB range and I found on running the test that the old and new vma passed to move_page_tables() are exactly the same. [ 19.697596] move_page_tables(vma=3D(7f1f985f7000,7f1f98ff7000), old_addr=3D7f1f987f7000, new_vma=3D(7f1f985f7000,7f1f98ff7000), new_addr=3D7f1f98af7000, len=3D100000) That is a bit counter intuitive as I really thought we'd be splitting the VMAs with such a move. Any idea what am I missing? Also, such a usecase will break with my patch as we may accidentally overwrite parts of a range that were not part of the mremap request. Maybe I should just turn off the optimization if vma =3D=3D new_vma, however that will also turn it off for the stack move so then maybe another way is to special case stack moves in move_page_tables(). So this means I have to go back to the drawing board a bit on this patch, and also add more tests in mremap_test.c to test such within-VMA moving. I believe there are no such existing tests... More work to do for me. :-) > And for the stack movement case by execve, I don't think it matters if > you just were to change the logic of the subsequent checks a bit. > > In particular, you do this: > > /* If the masked address is within vma, there is no prev > mapping of concern. */ > if (vma->vm_start <=3D addr_masked) > return false; > > /* > * Attempt to find vma before prev that contains the address. > * On any issue, assume the address is within a previous mapping. > * @mmap write lock is held here, so the lookup is safe. > */ > cur =3D find_vma_prev(vma->vm_mm, vma->vm_start, &prev); > if (!cur || cur !=3D vma || !prev) > return true; > /* The masked address fell within a previous mapping. */ > if (prev->vm_end > addr_masked) > return true; > > return false; > > And I think that > > if (!cur || cur !=3D vma || !prev) > return true; > > is actively wrong, because if there is no 'prev', then you should return = false. During my tests, I observed that there was always an existing, unrelated memory mapping present prior to the new memory region allocated by mmap. Based on this observation, I concluded that if there is no previous mapping (i.e., if prev is NULL), it indicates a potential issue with find_vma_prev(). Therefore, I designed this function to return here indicating that the masked address is not suitable for optimization, whenever prev is NULL. That's obviously confusing so I'll try to rewrite this part of the patch a bit better with appropriate comments. > So I *think* all of the above could just be replaced with this instead: > > find_vma_prev(vma->vm_mm, vma->vm_start, &prev); > return prev && prev->vm_end > addr_masked; > > because only if we have a 'prev', and the prev is into that masked > address, do we need to avoid doing the masking. > > With that simplified test, do you even care about that whole "the > masked address was already in the vma"? Not that I can see. > > And we don't even care about the return value of 'find_vma_prev()', > because it had better be 'vma'. We're giving it 'vma->vm_start' as an > address, for chrissake! > > So if you *really* wanted to, you could do something like > > cur =3D find_vma_prev(..); > if (WARN_ON_ONCE(cut !=3D vma)) > return true; > > but even that WARN_ON_ONCE() seems pretty bogus. If it triggers, we > have some serious corruption going on. > > So I stil find that whole "vma->vm_start <=3D addr_masked" test a bit > confusing, since it seems entirely redundant. > > Is it just because you wanted to avoid calling "find_vma_prev()" at > all? Maybe just say that in the comment. Yes exactly, I did not want to run find_vma_prev() unnecessarily. I will add such clarifications in the comments. Thanks for all the comments so far, I will continue to work on this. - Joel