From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7EF0C6FD1F for ; Sat, 25 Mar 2023 16:33:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C9B526B0072; Sat, 25 Mar 2023 12:33:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C4B876B0074; Sat, 25 Mar 2023 12:33:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AED6D6B0075; Sat, 25 Mar 2023 12:33:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 996B76B0072 for ; Sat, 25 Mar 2023 12:33:27 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 5688F1A020D for ; Sat, 25 Mar 2023 16:33:27 +0000 (UTC) X-FDA: 80607966054.02.B8816F7 Received: from mail-qv1-f42.google.com (mail-qv1-f42.google.com [209.85.219.42]) by imf09.hostedemail.com (Postfix) with ESMTP id 7A86B14000C for ; Sat, 25 Mar 2023 16:33:25 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=joelfernandes.org header.s=google header.b=LmxnjU8y; spf=pass (imf09.hostedemail.com: domain of joel@joelfernandes.org designates 209.85.219.42 as permitted sender) smtp.mailfrom=joel@joelfernandes.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679762005; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=L+IGcq5whZ5cIGpFrj3EMqmDdJU1miHHxTnqhKHWiYM=; b=SaKUilLV/LDQHFScAd6llZLS5yzEiUcNY/DsWt7TJEb1cPt0HlTUH441efmFHFg7u/QIwQ hKp2OfyO0fOE1mdLo9JtM+q/acGRM97gOKtFCaU28HOITdPvHwlLtapxJYEHCHOXvJ+z+R eqrHCihEHpyCX/wqlnNGLAH3NVDOsV0= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=joelfernandes.org header.s=google header.b=LmxnjU8y; spf=pass (imf09.hostedemail.com: domain of joel@joelfernandes.org designates 209.85.219.42 as permitted sender) smtp.mailfrom=joel@joelfernandes.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679762005; a=rsa-sha256; cv=none; b=hygpwyRQEvfjab7J/ZR157uA+ZKqnsYCmd4fh2XOEI+/3ISOUI4dbNXBLEbWTabKzkEj7X jILSwIpI0X1kb0BlwDrfAXIbz/a5Qwr8NK2bUUdKw+GmLKEO5Hi0l6xgyerB8hliJTl8qm XHBXzmXGoQ57kHkbe2lF3Zn8FyTZ//I= Received: by mail-qv1-f42.google.com with SMTP id 31so3772356qvc.1 for ; Sat, 25 Mar 2023 09:33:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; t=1679762004; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=L+IGcq5whZ5cIGpFrj3EMqmDdJU1miHHxTnqhKHWiYM=; b=LmxnjU8yJBNlf2YBqtna34dmEJA0ILPFrF6R6g4031+0eswNuQ2DJ3Xy2kXJTfJ339 JIODLbtiSEyzOf+Zn+RcdIpdPahLRSzGK5IEwDIXKS4ooUZC2zPhUcIUaQNORZJYhHfK SpaUoJ7VV6nrkXRyu8VMcGmqPfiy9YSjG8upg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679762004; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=L+IGcq5whZ5cIGpFrj3EMqmDdJU1miHHxTnqhKHWiYM=; b=GgnOG/5pWOG0hHY66sJcU4YQ9CDhXgLwwCjTvsmDpNfuinCRDmHZEUMS1nUv1n3yXP DIHsknSrxSlrePIWD90AVAc6tNbKeQLIxMgLu6wylpYPuSK6kNFv35TMGuIZIprt2ZNK YU9eXY+FhrQ6Nv/tFN9ry1oL7DTd3kT0oK0k9KqkIRtCWeQYjlHsiIkmUj9IkZ0xSUxR BCrhCZcT2bpxg5hwxZ3pZOr8cb54iHy84ShQziUv5UcC5WhQRPX+YYEdSodquY2yPvLQ IciQC+wPwOJvbwQcJp3I4T4hUyhfCDIkCf7fF38W8/PGmfN8XYfE2tQ4sN+Ywpnt0uLu OrRA== X-Gm-Message-State: AAQBX9fX1dP7f7gwWBsRG0MxKjRNirsRcN2qnmZ7njQ48jI+hunX/kYr fstKrRypQxX79ZT1EIwH7U2Umg== X-Google-Smtp-Source: AKy350bgIokcJSqtEpxPRvLjwT5rK/qhjYoPwIA2RTUAyMR6SXuHmSeZThiEHWx3c5UnGOVNVYB5nQ== X-Received: by 2002:a05:6214:d84:b0:5a6:9028:7b89 with SMTP id e4-20020a0562140d8400b005a690287b89mr10319487qve.35.1679762004470; Sat, 25 Mar 2023 09:33:24 -0700 (PDT) Received: from localhost (129.239.188.35.bc.googleusercontent.com. [35.188.239.129]) by smtp.gmail.com with ESMTPSA id eh13-20020a056214186d00b005dd8b9345fasm1551290qvb.146.2023.03.25.09.33.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 25 Mar 2023 09:33:23 -0700 (PDT) Date: Sat, 25 Mar 2023 16:33:23 +0000 From: Joel Fernandes To: Linus Torvalds Cc: "Kirill A. Shutemov" , Michal Hocko , Naresh Kamboju , Andrew Morton , linux-mm@kvack.org, LKML Subject: Re: WARN_ON in move_normal_pmd Message-ID: <20230325163323.GA3088525@google.com> References: <20230324130530.xsmqcxapy4j2aaik@box.shutemov.name> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Queue-Id: 7A86B14000C X-Stat-Signature: yp6yfdh7ttdmb8gdtujoaduggdyugtdj X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1679762005-974816 X-HE-Meta: U2FsdGVkX1/g8VOqLynepyTAN6bzM1PD7cwWxWyuvEkVHzhPpagdutIKTAEvjy78ycyH9PU7zYZybl8imL+RYLQeVDlBZFLlMyoQ3ZaSwDSArOu/BkicWgMjZm+A6NMPwfaQlvzLK6r+d0z0LUIMvrpWovOpEdz6ix6IvT9zv37jRsrnogB2fCtQsQg7f92v6svQVBkAyM6d6g0luugAbi0pakLpUCJ7qCfsGM0ecIdZRz9Hxa59ikzoikaUvpk2zuh7YxV1oPsmmCglIPH+X9ix/sQkCHdBjSoCG+qMmvQ51JUDH13waT/HlmxlUoV8oWe2I75o3CETYHNjxxVQ56CXlWQ1UDi3oqVVhoh4NYoydadJOlT13qUJpGeI0fGkdAvbqtua9n/1lEqKsLljVTLh4l46ENvMk9m3VuYZjBguFuF/9e0RBw+lHSUU6qdmCmdPoY3l17T6WO6XmPNiIMgm0qc/Vcw0PMKgMjM3Ax7gQFEd/ZKJFi6qINaqOMjcG4sa1pMBZ81/iwMzE8HtyfXy7Q67cBKNcuw/Ou9gmuHkaBHPMZ1NrOD6vHE7EmRWmj79S5BZh7I5G2FnkfmI3hQWsNvu9NDvSmS0ZwxQsitpQF7x83MNFuMGSUJWA0sTY1++aJ+LZCR53TWoniKb2vEw02y6vW3ONwVjmawnk4eDXoHiWA0HopaKavrdC4nBO9XD8ER/JAvPcFvTh9RyZROaqxkrN7uG5OrDWOcLlGU0JGsIeqiRyyaofO3QOEhs3W9pkP32GEy7fpThudnUnNUU8jKjAAgC3HpA5Jn/GZ4avEiwAp4wFS0c+ka67T2O88sEsa8gLrYbTDhnsM7ozvxnCOfSPIWuFSfNIvdUfDo8fPlUiWGo99yyW5u1A8FLibFHL/0l+fyHS4h9/cxEbQ4WTLrxLTLmyBcaJHPp22hdVencb58jWgBtpxadEjz5ukfo3q1G10dS5nmAWAH WtGfSkdD KduCKPUmdJUr+777OtYCKb5CUqwEbQq4yOSKP2cxb6K5iX/Ck0CcAI2wTJKg5ggc2DL5A6+veCTF/dEpupl1/an55cBKC/WbB/xohvJJ5c7L3nEA6I1CHMXIamNjZFTWabIZaZfa5NKnwB5PZjtqlQiDuiuYG5dDbxe0KOMR5dHhFdUvTBtoB4uI2ltCtv5Am+G5V4scZCCQfkPqX3VNGKETYB3Veevr9RAWTP0ei+WD//YGj0A4rLLRCefA2wQVLfGpikH/TZIECSO7mF+vqZ/cFZAsZ6UOw3noT26XGampPOKlzibafjYewHQGXSbMeGT/BE4H8v+w9+AnrQzVbUzl/CF/zPyOrPD7auApHsP66HUgAqqnVs0Lxk9yGUYiDKh2jIQVUZE/hug0QvelcHdv78VtoX7GqFfyrApaMO/NTRxmohsp15ufHqA9bsfhLcUkFu+Nzb1g56WpTn0SDEchD3HGAn0a++By5 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Linus, On Fri, Mar 24, 2023 at 04:38:03PM -0700, Linus Torvalds wrote: > On Fri, Mar 24, 2023 at 6:43 AM Joel Fernandes wrote: > > > > Wouldn't it be better to instead fix it from the caller side? Like > > making it non-overlapping. > > I wonder if we could just do something like this in mremap() instead > > - if old/new are mutually PMD_ALIGNED > > - *and* there is no vma below new within the same PMD > > - then just expand the mremap to be PMD-aligned downwards > > IOW, the problem with the exec stack moving case isn't really that > it's overlapping: that part is fine. We're moving downwards, and we > start from the bottom, so the moving part works fine. > > No, the problem is that we *start* by moving individual pages, and > then by the time we've a few pages down by a whole PMD, we finish the > source PMD (and we've cleared all the contents of it), but it still > exists. > > And at *that* point, when we go and start copying the next page, we're > suddenly fully PMD-aligned, and now we try to copy a whole PMD, and > then that code is unhappy about the fact that the old (empty) PMD is > there in the target. > You are very right. I am able to also trigger the warning with a synthetic program that just mremaps a range and moves it in the same way that triggers this issue, however I had to hack the kernel to prevent mremap from erroring out if ranges overlap (unlike exec, mremap does some initial checks for that). Also had to do other hacks but I did reproduce it consistently. The issue is that even though the PMD is empty, it is allocated. So pmd_none() is kind of a lie in some sense, it is pointing to empty PTEs when the range is really empty. How about we replace the warning with something like the following? + if (unlikely(!pmd_none(*new_pmd))) { + // Check if any ptes in the pmd are non-empty. Doing this here + // is ok since this is not a fast path. + bool pmd_empty = true; + unsigned long tmp_addr = new_addr; + pte_t* check_pte = pte_offset_map(new_pmd, new_addr); + + new_ptl = pte_lockptr(mm, new_pmd); + spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); + for (; tmp_addr < old_addr + PMD_SIZE; check_pte++, tmp_addr += PAGE_SIZE) { + if (!pte_none(*check_pte)) { + pmd_empty = false; + break; + } + } + WARN_ON_ONCE(!pmd_empty); + spin_unlock(new_ptl); + } + > And for all of this to happen, we need to move things by an exact > multiple of PMD size, because otherwise we'd never get to that aligned > situation at all, and we'd always do all the movement in individual > pages, and everything would be just fine. > > And more importantly, if we had just *started* with moving a whole > PMD, this also wouldn't have happened. But we didn't. We started > moving individual pages. > > So you could see the warning not as a "this range overlaps" warning > (it's fine, and happens all the time, and we do individual pages that > way quite happily), but really as a "hey, this was very inefficient - > you shouldn't have done those individual pages as several small > independent invidual pages in the first place" warning. > Exactly. > So some kind of > > /* Is the movement mutually PMD-aligned? */ > if ((old_addr ^ new_addr) & ~PMD_MASK == 0) { > .. try to extend the move_vma() down to the *aligned* > PMD case .. > } > I actually didn't follow what you meant by "mutually PMD-aligned". Could you provide some example address numbers to explain? AFAIK, only 2MB aligned memory addresses can be directly addressed by a PMD. Otherwise how will you index the bytes in the 2MB page? You need bits in the address for that. > logic in move_page_tables() would get rid of the warning, and would > make the move more efficient since you'd skip the "move individual > pages and allocate a new PMD" case entirely. > > This is all fairly com,plicated, and the "try to extend the move > range" would also have to depend on CONFIG_HAVE_MOVE_PMD etc, so I'm > not saying it's trivial. > > But it would seem to be a really nice optimization, in addition to > getting rid of the warning. > > It could even help real world cases outside of this odd stack > remapping case if users ever end up moving vma's by multiples of > PMD_SIZE, and there aren't other vma's around the source/target that > disable the optimization. > > Hmm? Anybody want to look into that? It looks hairy enough that I > think that "you could test this with mutually aligned mremap() > source/targets in some test program" would be a good thing. Because > the pure execve() case is rare enough that using *that* as a test-case > seems like a fool's errand. > Just to mention, mremap errors out if you try to move overlapping ranges because this in mremap_to(): /* Ensure the old/new locations do not overlap if (addr + old_len > new_addr && new_addr + new_len > addr) { pr_err("%s: (%s) (%d)", __func__, __FILE__, __LINE__); goto out; } Or is there an mremap trick I might be missing which actually allows overlapping range moves? thanks, - Joel