Subject: Re: [RFC PATCH 1/8] mm/madvise: propagate vma->vm_end changes
From: Nadav Amit
Date: Mon, 27 Sep 2021 03:11:20 -0700
To: "Kirill A. Shutemov"
Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List, Peter Xu,
 Andrea Arcangeli, Minchan Kim, Colin Cross, Suren Baghdasarya,
 Mike Rapoport
In-Reply-To: <20210927090852.sc5u65ufwvfx57rl@box.shutemov.name>
References: <20210926161259.238054-1-namit@vmware.com>
 <20210926161259.238054-2-namit@vmware.com>
 <20210927090852.sc5u65ufwvfx57rl@box.shutemov.name>

> On Sep 27, 2021, at 2:08 AM, Kirill A. Shutemov wrote:
>
> On Sun, Sep 26, 2021 at 09:12:52AM -0700, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> The comment in madvise_dontneed_free() says that vma splits that occur
>> while the mmap-lock is dropped, during userfaultfd_remove(), should be
>> handled correctly, but nothing in the code indicates that it is so: prev
>> is invalidated, and do_madvise() will therefore continue to update VMAs
>> from the "obsolete" end (i.e., the one before the split).
>>
>> Propagate the changes to end from madvise_dontneed_free() back to
>> do_madvise() and continue the updates from the new end accordingly.
>
> Could you describe in details a race that would lead to wrong behaviour?

Thanks for the quick response.

For instance, madvise(MADV_DONTNEED) can race with mprotect() and cause
the VMA to split.

Something like:

  CPU0                                    CPU1
  ----                                    ----
  madvise(0x10000, 0x2000, MADV_DONTNEED)
  -> userfaultfd_remove()
   [ mmap-lock dropped ]
                                          mprotect(0x11000, 0x1000, PROT_READ)
                                          [splitting the VMA]

                                          read(uffd)
                                          [unblocking userfaultfd_remove()]

   [ resuming ]
   end = vma->vm_end
   [end == 0x11000]

   madvise_dontneed_single_vma(vma, 0x10000, 0x11000)

  Following this operation, 0x11000-0x12000 would not be zapped.

> If mmap lock was dropped any change to VMA layout can appear. We can have
> totally unrelated VMA there.

Yes, but we are not talking about completely unrelated VMAs. If
userspace registered a region to be monitored using userfaultfd, it
expects this region to be handled as any other region. This is a change
of behavior that only affects regions with uffd. The comment in the
code explicitly says that this scenario should be handled:

        /*
         * Don't fail if end > vma->vm_end. If the old
         * vma was split while the mmap_lock was
         * released the effect of the concurrent
         * operation may not cause madvise() to
         * have an undefined result. There may be an
         * adjacent next vma that we'll walk
         * next. userfaultfd_remove() will generate an
         * UFFD_EVENT_REMOVE repetition on the
         * end-vma->vm_end range, but the manager can
         * handle a repetition fine.
         */

Unless I am missing something, this does not happen in the current code.

>
> Either way, if userspace change VMA layout for a region that is under
> madvise(MADV_DONTNEED) it is totally broken. I don't see a valid reason to
> do this.
>
> The current behaviour looks reasonable to me. Yes, we can miss VMAs, but
> these VMAs can also be created just after madvise() is finished.

Again, we are not talking about newly created VMAs.

Alternatively, this comment should be removed and perhaps the
documentation should be updated.
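
To make the interleaving in the CPU0/CPU1 trace concrete, here is a
minimal user-space sketch of it. It is a toy model and not kernel code:
every toy_* name, the two-page layout and the propagate_end flag are
assumptions made purely for illustration, and the concurrent mprotect()
is simply injected at the point where the mmap-lock would be dropped.
The loop in toy_madvise() only approximates the shape of the walk in
do_madvise(); it is not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; none of these names are kernel APIs. */
#define TOY_PAGE   0x1000UL
#define TOY_BASE   0x10000UL
#define TOY_NPAGES 2
#define TOY_SPLIT  0x11000UL    /* where the concurrent mprotect() splits */

struct toy_vma {
    unsigned long vm_start, vm_end;
};

static struct toy_vma toy_vmas[TOY_NPAGES];     /* kept sorted by address */
static size_t toy_nr_vmas;
static bool toy_zapped[TOY_NPAGES];

/* Start over: one VMA covering 0x10000-0x12000, nothing zapped. */
static void toy_reset(void)
{
    toy_vmas[0] = (struct toy_vma){ TOY_BASE, TOY_BASE + TOY_NPAGES * TOY_PAGE };
    toy_nr_vmas = 1;
    for (size_t i = 0; i < TOY_NPAGES; i++)
        toy_zapped[i] = false;
}

/* Like find_vma(): first VMA whose vm_end is above addr, or NULL. */
static struct toy_vma *toy_find_vma(unsigned long addr)
{
    for (size_t i = 0; i < toy_nr_vmas; i++)
        if (toy_vmas[i].vm_end > addr)
            return &toy_vmas[i];
    return NULL;
}

/* Split the VMA strictly containing addr; no-op if addr is a boundary. */
static void toy_split_vma(unsigned long addr)
{
    struct toy_vma *vma = toy_find_vma(addr);

    if (!vma || vma->vm_start >= addr || toy_nr_vmas >= TOY_NPAGES)
        return;

    size_t i = (size_t)(vma - toy_vmas);
    for (size_t j = toy_nr_vmas; j > i + 1; j--)
        toy_vmas[j] = toy_vmas[j - 1];
    toy_vmas[i + 1] = (struct toy_vma){ addr, vma->vm_end };
    vma->vm_end = addr;
    toy_nr_vmas++;
}

/* Mark every page in [start, end) as zapped. */
static void toy_zap(unsigned long start, unsigned long end)
{
    for (unsigned long addr = start; addr < end; addr += TOY_PAGE)
        toy_zapped[(addr - TOY_BASE) / TOY_PAGE] = true;
}

/*
 * Stand-in for madvise_dontneed_free(): the "lock" is dropped, the
 * concurrent mprotect() splits the VMA, and on resume the range is
 * clamped to the re-looked-up VMA. Returns the end actually handled.
 */
static unsigned long toy_dontneed(unsigned long start, unsigned long end)
{
    toy_split_vma(TOY_SPLIT);           /* CPU1 runs while the lock is dropped */

    struct toy_vma *vma = toy_find_vma(start);  /* [ resuming ] */
    assert(vma != NULL);
    if (end > vma->vm_end)
        end = vma->vm_end;              /* 0x11000 after the split */
    toy_zap(start, end);
    return end;
}

/* Simplified walk over [start, end), one VMA per iteration. */
static void toy_madvise(unsigned long start, unsigned long end,
                        bool propagate_end)
{
    while (start < end) {
        struct toy_vma *vma = toy_find_vma(start);
        if (!vma)
            break;
        /* End of this step, computed before the lock is dropped. */
        unsigned long tmp = vma->vm_end < end ? vma->vm_end : end;
        unsigned long handled = toy_dontneed(start, tmp);

        /* Continue from the stale end, or from the propagated one. */
        start = propagate_end ? handled : tmp;
    }
}

/* Number of pages zapped so far. */
static size_t toy_count_zapped(void)
{
    size_t n = 0;

    for (size_t i = 0; i < TOY_NPAGES; i++)
        n += toy_zapped[i];
    return n;
}
```

In this model, calling toy_reset() and then
toy_madvise(0x10000, 0x12000, false) continues from the stale end and
leaves the page at 0x11000 untouched, whereas passing true for
propagate_end makes the walk resume at 0x11000 and zap both pages.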