From: Linus Torvalds
Date: Mon, 21 Dec 2020 11:55:02 -0800
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
To: Yu Zhao
Cc: Peter Xu, Andrea Arcangeli, linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable, Minchan Kim, Andy Lutomirski, Will Deacon, Peter Zijlstra, Nadav Amit
References: <20201219043006.2206347-1-namit@vmware.com> <20201221172711.GE6640@xz-x1> <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com>

On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao wrote:
>
> Nadav Amit found memory corruptions when running userfaultfd test above.
> It seems to me the problem is related to commit 09854ba94c6a ("mm:
> do_wp_page() simplification"). Can you please take a look? Thanks.
>
> TL;DR: it may not safe to make copies of singly mapped (non-COW) pages
> when it's locked or has additional ref count because concurrent
> clear_soft_dirty or change_pte_range may have removed pte_write but yet
> to flush tlb.

Hmm. The TLB flush shouldn't actually matter, because anything that
changes the writable bit had better be serialized by the page table
lock.

Yes, we often load the page table value without holding the page table
lock (in order to know what we are going to do), but before we
finalize the operation, we re-check - under the page table lock -
that the value we loaded still matches.

But I think I see what *MAY* be going on. The userfaultfd
mwriteprotect_range() code takes the mm lock for _reading_. Which
means that you can have

  Thread A                              Thread B

  - fault starts. Sees write-protected
    pte, allocates memory, copies data

                                        - userfaultfd makes the region writable

                                        - userfaultfd case writes to the region

                                        - userfaultfd makes the region non-writable

  - fault continues, gets the page table
    lock, sees that the pte is the same,
    uses old copied data

But if this is what's happening, I think it's a userfaultfd bug. I
think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be
a mmap_write_lock().

mprotect() does this right, it looks like userfaultfd does not. You
cannot just change the writability of a page willy-nilly without the
correct locking.

Maybe there are other causes, but this one stands out to me as one
possible cause.

Comments?

              Linus
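
The interleaving described above can be replayed in userspace as a minimal
sketch. This is not kernel code: `struct fake_pte`, `pte_same_sketch()`,
`fault_begin()`, `fault_finish()` and `replay_race()` are hypothetical
stand-ins, and the two threads are replayed sequentially in one thread so the
ordering is deterministic. It only illustrates why a re-check that compares
the pte value can pass even though the copied data is stale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pte: a writable bit and the page it maps. */
struct fake_pte {
    bool writable;
    char *page;
};

/* Value comparison, analogous in spirit to the re-check in the fault path. */
static bool pte_same_sketch(struct fake_pte a, struct fake_pte b)
{
    return a.writable == b.writable && a.page == b.page;
}

/* Thread A, first half: sample the pte without the lock and copy the page. */
static struct fake_pte fault_begin(const struct fake_pte *pte,
                                   char *copy, size_t len)
{
    struct fake_pte snapshot = *pte;
    memcpy(copy, pte->page, len);
    return snapshot;
}

/*
 * Thread A, second half: imagine the page table lock is held here.
 * Re-check the pte against the snapshot, then install the copy.
 * Returns false if the pte changed (a real fault path would retry).
 */
static bool fault_finish(struct fake_pte *pte, struct fake_pte snapshot,
                         char *copy)
{
    if (!pte_same_sketch(*pte, snapshot))
        return false;
    pte->page = copy;
    pte->writable = true;
    return true;
}

/* Replays the interleaving; returns true if stale data ended up mapped. */
static bool replay_race(void)
{
    char page[8] = "old";
    char copy[8];
    struct fake_pte pte = { .writable = false, .page = page };

    /* Thread A: fault starts, sees write-protected pte, copies data. */
    struct fake_pte snapshot = fault_begin(&pte, copy, sizeof(page));

    /* Thread B: make writable, write to the region, make non-writable. */
    pte.writable = true;
    strcpy(pte.page, "new");
    pte.writable = false;

    /* Thread A: fault continues; the pte looks unchanged, so it proceeds. */
    bool installed = fault_finish(&pte, snapshot, copy);

    /* The mapped page now holds "old" although "new" was written. */
    return installed && strcmp(pte.page, "new") != 0;
}
```

Calling `replay_race()` returns true: the pte went from non-writable to
writable and back, so the value comparison cannot see that anything happened
in between. If the fault path holds the mmap lock for reading throughout, as
assumed here, then taking the lock for writing in the write-protect path would
keep Thread B's three steps from running between the two halves of the fault.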