From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 411BFC433EF for ; Mon, 27 Sep 2021 12:33:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D2B216103B for ; Mon, 27 Sep 2021 12:33:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D2B216103B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 3AE0B6B0071; Mon, 27 Sep 2021 08:33:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 336A9900002; Mon, 27 Sep 2021 08:33:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 18A186B0073; Mon, 27 Sep 2021 08:33:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 025376B0071 for ; Mon, 27 Sep 2021 08:33:46 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 9EA1B32330 for ; Mon, 27 Sep 2021 12:33:46 +0000 (UTC) X-FDA: 78633294852.13.888488A Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54]) by imf04.hostedemail.com (Postfix) with ESMTP id 5CF3550000B7 for ; Mon, 27 Sep 2021 12:33:46 +0000 (UTC) Received: by mail-pj1-f54.google.com with SMTP id on12-20020a17090b1d0c00b001997c60aa29so11391529pjb.1 for ; Mon, 27 Sep 2021 05:33:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=vIUxvPG7r6WRv2s+586G3uUFcKQV9EEVz94m50u5c8o=; b=Dm20pXbuODJi9E4d26gWxQdWF4WKYPY57n49iilt38oynabqdwl19DLcjw2vVzs9YC qvz/annyS7qAVq3ybRpSMaqwVrdRXK0glutOr0AY53lsvz+poExSyOE/+lVbVyr9piY9 Jlcw4gKurIA6xL8lxT3HejdEb4SAZZtr0W+w2ZZPmxVfowsgpppqHl2N7ImJm+Biotiu jDqIO/6O6paUZjk9b9bIHTM8j45mNdAiMBBbQAAsU9fiZbs94AaEya8Q0qFDiBEPrB57 WkwzhlXfKAn/rI2gaxlgM9lLKXf9a1a4mTQGl6yaTZMLcIz1ijyxwkeYoV/BJqRVBZLO sntQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=vIUxvPG7r6WRv2s+586G3uUFcKQV9EEVz94m50u5c8o=; b=oZFLcBXM2ozdmfitoTT6bvmU0HzTuv5KjvloVtesxv1g/tatBFCBkrE6zn/bft9gzm Wgug9Ltzjbhky9f8Yq5sYi9asrhfvOIiWY7B4zO6H5qpIeEASre8c0Y/1im/MMEHbF3Z 3Vb0Gpq3cek1fJcbmzwyNxbuuo16PFT2xh4A41TkhcBqk8Vi3QcVat/osyQhAOPsQgJ6 4J8jdx3tgNmAwgeHAW2kFpoft9gcFoxajqC146Tpj0Gwe1/ScH9dgCDFj5FShBLZR4XD ZuC+/w4HRafjerAHQ0voWoWGICNesP/hPREtR0HIuY9qPuTfMq1Ab9MC3ErwnMK9QapA 3P7g== X-Gm-Message-State: AOAM53042FY323lMj2s0paUPGYec6n5PK16BmEPusmViENfrypQM+l9u NOZr/zQlXiSdyxWapq7stw0= X-Google-Smtp-Source: ABdhPJy8+zMmYNGCKw92KCp0c5+82Rp5iWOQwe4wIgfH6nhB+Flpu840zpmqz6cBeQFb0DHvBx7oeQ== X-Received: by 2002:a17:90a:351:: with SMTP id 17mr19887649pjf.145.1632746025137; Mon, 27 Sep 2021 05:33:45 -0700 (PDT) Received: from smtpclient.apple (c-24-6-216-183.hsd1.ca.comcast.net. [24.6.216.183]) by smtp.gmail.com with ESMTPSA id s17sm6643460pge.50.2021.09.27.05.33.43 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 Sep 2021 05:33:44 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Subject: Re: [RFC PATCH 1/8] mm/madvise: propagate vma->vm_end changes From: Nadav Amit In-Reply-To: <20210927115507.6xfpugeg3swookbh@box> Date: Mon, 27 Sep 2021 05:33:39 -0700 Cc: Andrew Morton , Linux-MM , Linux Kernel Mailing List , Peter Xu , Andrea Arcangeli , Minchan Kim , Colin Cross , Suren Baghdasarya , Mike Rapoport Content-Transfer-Encoding: quoted-printable Message-Id: <4211F6D4-A282-4AB4-8D96-E273C5ABE0DF@gmail.com> References: <20210926161259.238054-1-namit@vmware.com> <20210926161259.238054-2-namit@vmware.com> <20210927090852.sc5u65ufwvfx57rl@box.shutemov.name> <20210927115507.6xfpugeg3swookbh@box> To: "Kirill A. Shutemov" X-Mailer: Apple Mail (2.3654.120.0.1.13) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Dm20pXbu; spf=pass (imf04.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.216.54 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 5CF3550000B7 X-Stat-Signature: rsf86esa3beozw3cgwr1eaia7ig9wkow X-HE-Tag: 1632746026-794872 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Sep 27, 2021, at 4:55 AM, Kirill A. Shutemov = wrote: >=20 > On Mon, Sep 27, 2021 at 03:11:20AM -0700, Nadav Amit wrote: >>=20 >>> On Sep 27, 2021, at 2:08 AM, Kirill A. Shutemov = wrote: >>>=20 >>> On Sun, Sep 26, 2021 at 09:12:52AM -0700, Nadav Amit wrote: >>>> From: Nadav Amit >>>>=20 >>>> The comment in madvise_dontneed_free() says that vma splits that = occur >>>> while the mmap-lock is dropped, during userfaultfd_remove(), should = be >>>> handled correctly, but nothing in the code indicates that it is so: = prev >>>> is invalidated, and do_madvise() will therefore continue to update = VMAs >>>> from the "obsolete" end (i.e., the one before the split). >>>>=20 >>>> Propagate the changes to end from madvise_dontneed_free() back to >>>> do_madvise() and continue the updates from the new end accordingly. >>>=20 >>> Could you describe in details a race that would lead to wrong = behaviour? >>=20 >> Thanks for the quick response. >>=20 >> For instance, madvise(MADV_DONTNEED) can race with mprotect() and = cause >> the VMA to split. >>=20 >> Something like: >>=20 >> CPU0 CPU1 >> ---- ---- >> madvise(0x10000, 0x2000, MADV_DONTNEED) >> -> userfaultfd_remove() >> [ mmap-lock dropped ] >> mprotect(0x11000, 0x1000, PROT_READ) >> [splitting the VMA] >>=20 >> read(uffd) >> [unblocking userfaultfd_remove()] >>=20 >> [ resuming ] >> end =3D vma->vm_end >> [end =3D=3D 0x11000] >>=20 >> madvise_dontneed_single_vma(vma, 0x10000, 0x11000) >>=20 >> Following this operation, 0x11000-0x12000 would not be zapped. >=20 > Okay, fair enough. >=20 > Wouldn't something like this work too: >=20 > diff --git a/mm/madvise.c b/mm/madvise.c > index 0734db8d53a7..0898120c5c04 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -796,6 +796,7 @@ static long madvise_dontneed_free(struct = vm_area_struct *vma, > */ > return -ENOMEM; > } > + *prev =3D vma; > if (!can_madv_lru_vma(vma)) > return -EINVAL; > if (end > vma->vm_end) { Admittedly (embarrassingly?) I didn=E2=80=99t even consider it since all = the comments say that once the lock is dropped prev should be invalidated. Let=E2=80=99s see, considering the aforementioned scenario and that = there is initially one VMA between 0x10000-0x12000. Looking at the code from do_madvise()): [ end =3D=3D 0x12000 ] tmp =3D vma->vm_end; [ tmp =3D=3D 0x12000 ] if (end < tmp) tmp =3D end; /* Here vma->vm_start <=3D start < tmp <=3D = (end|vma->vm_end). */ error =3D madvise_vma(vma, &prev, start, tmp, behavior); [ prev->vm_end =3D=3D 0x11000 after the split] if (error) goto out; start =3D tmp; [ start =3D=3D 0x12000 ] if (prev && start < prev->vm_end) start =3D prev->vm_end; [ The condition (start < prev->vm_end) is false, start not updated ] error =3D unmapped_error; if (start >=3D end) goto out; [ start >=3D end; so we end without updating the second part of the = split ] So it does not work. Perhaps adding this one on top of yours? I can test it when I wake up. It is cleaner, but I am not sure if I am missing something. diff --git a/mm/madvise.c b/mm/madvise.c index 0734db8d53a7..548fc106e70b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1203,7 +1203,7 @@ int do_madvise(struct mm_struct *mm, unsigned long = start, size_t len_in, int beh if (error) goto out; start =3D tmp; - if (prev && start < prev->vm_end) + if (prev) start =3D prev->vm_end; error =3D unmapped_error; if (start >=3D end)=