From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49159C433EF for ; Mon, 27 Sep 2021 12:45:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BC4BD60184 for ; Mon, 27 Sep 2021 12:45:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org BC4BD60184 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 48BD86B0071; Mon, 27 Sep 2021 08:45:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 43A2D6B0072; Mon, 27 Sep 2021 08:45:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3028B900002; Mon, 27 Sep 2021 08:45:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0180.hostedemail.com [216.40.44.180]) by kanga.kvack.org (Postfix) with ESMTP id 216DD6B0071 for ; Mon, 27 Sep 2021 08:45:22 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id D085A31EA0 for ; Mon, 27 Sep 2021 12:45:21 +0000 (UTC) X-FDA: 78633324042.15.4160E0E Received: from mail-lf1-f53.google.com (mail-lf1-f53.google.com [209.85.167.53]) by imf23.hostedemail.com (Postfix) with ESMTP id 83FE190000AA for ; Mon, 27 Sep 2021 12:45:21 +0000 (UTC) Received: by mail-lf1-f53.google.com with SMTP id y26so37514467lfa.11 for ; Mon, 27 Sep 2021 05:45:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=6AC5eSoepb2bNyO+Jia6IE49hxeFqqw1JuenYNorqo4=; b=vXniwCEdXDi2T3uDIXy9oa0b+6aeI50sj6hLwIQuxpGCZ7uGCzvfHiqO5cH+Puuqsi VYQd+KbHgmF2V9P4r+5L1DQQi3craYId+eZW+Bd8hRyKRC8jFI6p1VM48YSZB4y2r6UE DnEpS8+eGR3QUBmFIS5wMQnttd34NOGooB2UsUXpin7LlGWXKe914n0N9cFpesbG22Cf ufNV3MBCD2WWb03RFb/935AS0OXHtgt8+ftwL1cQ4J80QoGKlnl2pAhMt6KDMH/cKK4e +lvcvyd05sLmAZQv4besER/2f8A74izsJpPCdTj6AagxbuKLi1aB9XtI/jAr5g3S6/v4 C+Ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=6AC5eSoepb2bNyO+Jia6IE49hxeFqqw1JuenYNorqo4=; b=RkpDXKR5AyLv1AIGoQfeY3ZLd24Gc3CDfwam/V/jNh/E4DfA0b88Tdq8K+kAdJxV5A 26wlOwrHbVP8okqvcGvGc3krF8zC8F7cI1USuHXpStxXCWr4aN31U3pXjl7aWXk4YPPa gFiIoV8NlgD374hDxpY051tLCpOA8nLeLRWQ8mDlzTVCeJm2MyZ5nhBcKowyF/1/4IN4 RbMtV4DHHA6WRWjXe7UvTh9x8+C7NpfAX1qUMQDEKbVVr5sxOOrGoLHlLs2v4dSnauXm FR57EHbZJRAnLJ+4g9GsevOzPQWXpAesNvjeHt5osSVsn+bTCDbTqgop5Zw5ld+eFF15 QVfw== X-Gm-Message-State: AOAM530sV//UO5tqBcQOH2sX3mAAz6zRBiUxU6539L5ZJE2OUx0iGDfW WUeaPkEOklx2sEAAJU4dMHZUYg== X-Google-Smtp-Source: ABdhPJw9SyzSSubOBfwLsvSDXlJtTzZUfkcyEOR9rF8BfRzzh4S+j3fEJINDTCHQEBUy8+e5X1jtKA== X-Received: by 2002:a05:6512:22c1:: with SMTP id g1mr25454128lfu.169.1632746720060; Mon, 27 Sep 2021 05:45:20 -0700 (PDT) Received: from box.localdomain ([86.57.175.117]) by smtp.gmail.com with ESMTPSA id j24sm1587249lfh.302.2021.09.27.05.45.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 Sep 2021 05:45:19 -0700 (PDT) Received: by box.localdomain (Postfix, from userid 1000) id EE2B4102FD9; Mon, 27 Sep 2021 15:45:18 +0300 (+03) Date: Mon, 27 Sep 2021 15:45:18 +0300 From: "Kirill A. Shutemov" To: Nadav Amit Cc: Andrew Morton , Linux-MM , Linux Kernel Mailing List , Peter Xu , Andrea Arcangeli , Minchan Kim , Colin Cross , Suren Baghdasarya , Mike Rapoport Subject: Re: [RFC PATCH 1/8] mm/madvise: propagate vma->vm_end changes Message-ID: <20210927124518.gjas4itro5c3park@box.shutemov.name> References: <20210926161259.238054-1-namit@vmware.com> <20210926161259.238054-2-namit@vmware.com> <20210927090852.sc5u65ufwvfx57rl@box.shutemov.name> <20210927115507.6xfpugeg3swookbh@box> <4211F6D4-A282-4AB4-8D96-E273C5ABE0DF@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <4211F6D4-A282-4AB4-8D96-E273C5ABE0DF@gmail.com> X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 83FE190000AA X-Stat-Signature: g1i58azdif4xeb4cxnombhqr6sp3itqo Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=shutemov-name.20210112.gappssmtp.com header.s=20210112 header.b=vXniwCEd; dmarc=none; spf=none (imf23.hostedemail.com: domain of kirill@shutemov.name has no SPF policy when checking 209.85.167.53) smtp.mailfrom=kirill@shutemov.name X-HE-Tag: 1632746721-196174 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Sep 27, 2021 at 05:33:39AM -0700, Nadav Amit wrote: >=20 >=20 > > On Sep 27, 2021, at 4:55 AM, Kirill A. Shutemov wrote: > >=20 > > On Mon, Sep 27, 2021 at 03:11:20AM -0700, Nadav Amit wrote: > >>=20 > >>> On Sep 27, 2021, at 2:08 AM, Kirill A. Shutemov wrote: > >>>=20 > >>> On Sun, Sep 26, 2021 at 09:12:52AM -0700, Nadav Amit wrote: > >>>> From: Nadav Amit > >>>>=20 > >>>> The comment in madvise_dontneed_free() says that vma splits that o= ccur > >>>> while the mmap-lock is dropped, during userfaultfd_remove(), shoul= d be > >>>> handled correctly, but nothing in the code indicates that it is so= : prev > >>>> is invalidated, and do_madvise() will therefore continue to update= VMAs > >>>> from the "obsolete" end (i.e., the one before the split). > >>>>=20 > >>>> Propagate the changes to end from madvise_dontneed_free() back to > >>>> do_madvise() and continue the updates from the new end accordingly= . > >>>=20 > >>> Could you describe in details a race that would lead to wrong behav= iour? > >>=20 > >> Thanks for the quick response. > >>=20 > >> For instance, madvise(MADV_DONTNEED) can race with mprotect() and ca= use > >> the VMA to split. > >>=20 > >> Something like: > >>=20 > >> CPU0 CPU1 > >> ---- ---- > >> madvise(0x10000, 0x2000, MADV_DONTNEED) > >> -> userfaultfd_remove() > >> [ mmap-lock dropped ] > >> mprotect(0x11000, 0x1000, PROT_READ) > >> [splitting the VMA] > >>=20 > >> read(uffd) > >> [unblocking userfaultfd_remove()] > >>=20 > >> [ resuming ] > >> end =3D vma->vm_end > >> [end =3D=3D 0x11000] > >>=20 > >> madvise_dontneed_single_vma(vma, 0x10000, 0x11000) > >>=20 > >> Following this operation, 0x11000-0x12000 would not be zapped. > >=20 > > Okay, fair enough. > >=20 > > Wouldn't something like this work too: > >=20 > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 0734db8d53a7..0898120c5c04 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -796,6 +796,7 @@ static long madvise_dontneed_free(struct vm_area_= struct *vma, > > */ > > return -ENOMEM; > > } > > + *prev =3D vma; > > if (!can_madv_lru_vma(vma)) > > return -EINVAL; > > if (end > vma->vm_end) { >=20 > Admittedly (embarrassingly?) I didn=E2=80=99t even consider it since al= l the > comments say that once the lock is dropped prev should be invalidated. >=20 > Let=E2=80=99s see, considering the aforementioned scenario and that the= re is > initially one VMA between 0x10000-0x12000. >=20 > Looking at the code from do_madvise()): >=20 > [ end =3D=3D 0x12000 ] >=20 > tmp =3D vma->vm_end; >=20 > [ tmp =3D=3D 0x12000 ] >=20 > if (end < tmp) > tmp =3D end; >=20 > /* Here vma->vm_start <=3D start < tmp <=3D (end|vma->v= m_end). */ >=20 > error =3D madvise_vma(vma, &prev, start, tmp, behavior)= ; >=20 > [ prev->vm_end =3D=3D 0x11000 after the split] >=20 > if (error) > goto out; > start =3D tmp; >=20 > [ start =3D=3D 0x12000 ] >=20 > if (prev && start < prev->vm_end) > start =3D prev->vm_end; >=20 > [ The condition (start < prev->vm_end) is false, start not updated ] >=20 > error =3D unmapped_error; > if (start >=3D end) > goto out; >=20 > [ start >=3D end; so we end without updating the second part of the spl= it ] >=20 > So it does not work. >=20 > Perhaps adding this one on top of yours? I can test it when I wake up. > It is cleaner, but I am not sure if I am missing something. It should work. BTW, shouldn't we bring madvise_willneed() and madvise_remove() to the same scheme? --=20 Kirill A. Shutemov