From: Linus Torvalds
Date: Wed, 23 Dec 2020 01:44:42 -0800
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
To: Yu Zhao
Cc: Andrea Arcangeli, Andy Lutomirski, Peter Xu, Nadav Amit, linux-mm,
    lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable,
    Minchan Kim, Will Deacon, Peter Zijlstra

On Tue, Dec 22, 2020 at 4:01 PM Linus Torvalds wrote:
>
> The more I look at the mprotect code, the less I like it. We seem to
> be much better about the TLB flushes in other places (looking at
> mremap, for example). The mprotect code seems to be very laissez-faire
> about the TLB flushing.

No, this doesn't help.

> Does adding a TLB flush to before that
>
>         pte_unmap_unlock(pte - 1, ptl);
>
> fix things for you?

It really doesn't fix it. Exactly because - as pointed out earlier -
the actual page *copy* happens outside the pte lock.

So what can happen is:

 - CPU 1 holds the page table lock, while doing the write protect.
   It has cleared the writable bit, but hasn't flushed the TLBs yet

 - CPU 2 did *not* have the TLB entry, sees the new read-only state,
   takes a COW page fault, and reads the PTE from memory (into
   vmf->orig_pte)

 - CPU 2 correctly decides it needs to be a COW, and copies the page
   contents

 - CPU 3 *does* have a stale TLB (because TLB invalidation hasn't
   happened yet), and writes to that page in user space

 - CPU 1 now does the TLB invalidate, and releases the page table lock

 - CPU 2 gets the page table lock, sees that its PTE matches
   vmf->orig_pte, and switches it to be that writable copy of the page.

where the copy happened before CPU 3 had stopped writing to the page.

So the pte lock doesn't actually matter, unless we actually do the
page copy inside of it (on CPU2), in addition to doing the TLB flush
inside of it (on CPU1).

mprotect() is actually safe for two independent reasons: (a) it does
the mmap_sem for writing (so mprotect can't race with the COW logic at
all), and (b) it changes the vma permissions so turning something
read-only actually disables COW anyway, since it won't be a COW, it
will be a SIGSEGV.

So mprotect() is irrelevant, other than the fact that it shares some
code with that "turn it read-only in the page tables".

fork() is a much closer operation, in that it actually triggers that
COW behavior, but fork() takes the mmap_sem for writing, so it avoids
this too.

So it's really just userfaultfd and that kind of ilk that is relevant
here, I think. But that "you need to flush the TLB before releasing
the page table lock" was not true (well, it's true in other
circumstances - just not *here*), and is not part of the solution.

Or rather, if it's part of the solution here, it would have to be
matched with that "page copy needs to be done under the page table
lock too".

              Linus
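
The six-step interleaving above can be replayed as a single-threaded toy
model, which may make the lost write easier to see. This is only a sketch
under stated assumptions, not kernel code: `toy_pte`, `toy_tlb`,
`replay_cow_race` and the 8-byte `PAGE_SIZE` are all hypothetical names
made up for illustration, and locks are represented only by the order in
which the steps run.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 8 /* toy page size, an assumption of this model */

/* Hypothetical stand-ins for a PTE and one CPU's cached translation. */
struct toy_pte { char *page; bool writable; };
struct toy_tlb { bool valid; struct toy_pte entry; };

/*
 * Replays the steps from the mail in order and returns the first byte
 * the process sees through its PTE afterwards. CPU 3 stores 'W' into
 * the original page through its stale TLB entry.
 *
 * copy_under_lock == false: CPU 2 copies before taking the lock (the race).
 * copy_under_lock == true:  CPU 2 copies only after taking the lock, which
 *                           in this replay is after CPU 1's flush.
 */
char replay_cow_race(bool copy_under_lock)
{
    char orig_page[PAGE_SIZE], new_page[PAGE_SIZE];
    memset(orig_page, 'A', sizeof orig_page);
    memset(new_page, 0, sizeof new_page);

    struct toy_pte pte = { orig_page, true };

    /* CPU 3 has the writable translation cached. */
    struct toy_tlb cpu3_tlb;
    cpu3_tlb.valid = true;
    cpu3_tlb.entry = pte;

    /* CPU 1: holds the lock, clears the writable bit, flush still pending. */
    pte.writable = false;

    /* CPU 2: no TLB entry, reads the PTE from memory and takes a COW fault. */
    struct toy_pte orig_pte = pte;
    assert(!orig_pte.writable);

    /* CPU 2: copies the page contents outside the lock (racy variant). */
    if (!copy_under_lock)
        memcpy(new_page, orig_pte.page, PAGE_SIZE);

    /* CPU 3: the stale entry still says writable, so the store goes through. */
    if (cpu3_tlb.valid && cpu3_tlb.entry.writable)
        cpu3_tlb.entry.page[0] = 'W';

    /* CPU 1: invalidates the TLB and releases the lock. */
    cpu3_tlb.valid = false;

    /* CPU 2: takes the lock; in the fixed variant the copy happens here. */
    if (copy_under_lock)
        memcpy(new_page, orig_pte.page, PAGE_SIZE);

    /* CPU 2: the PTE still matches orig_pte, so install the writable copy. */
    if (pte.page == orig_pte.page && pte.writable == orig_pte.writable) {
        pte.page = new_page;
        pte.writable = true;
    }

    return pte.page[0];
}
```

In this model `replay_cow_race(false)` returns 'A': CPU 3's store landed in
the old page after the copy was taken, so it is lost. `replay_cow_race(true)`
returns 'W', because the copy is taken only after the flush has stopped
CPU 3 from writing. Moving the flush alone changes nothing in the first
variant, which is the point made above.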