Date: Mon, 21 Dec 2020 13:21:06 -0700
From: Yu Zhao
To: Linus Torvalds
Cc: Peter Xu, Andrea Arcangeli, linux-mm, lkml, Pavel Emelyanov,
    Mike Kravetz, Mike Rapoport, stable, Minchan Kim, Andy Lutomirski,
    Will Deacon, Peter Zijlstra, Nadav Amit
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <20201219043006.2206347-1-namit@vmware.com>
    <20201221172711.GE6640@xz-x1>
    <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com>

On Mon, Dec 21, 2020 at 11:55:02AM -0800, Linus Torvalds wrote:
> On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao wrote:
> >
> > Nadav Amit found memory corruptions when running userfaultfd test above.
> > It seems to me the problem is related to commit 09854ba94c6a ("mm:
> > do_wp_page() simplification"). Can you please take a look? Thanks.
> >
> > TL;DR: it may not be safe to make copies of singly mapped (non-COW) pages
> > when it's locked or has additional ref count because concurrent
> > clear_soft_dirty or change_pte_range may have removed pte_write but yet
> > to flush tlb.
>
> Hmm. The TLB flush shouldn't actually matter, because anything that
> changes the writable bit had better be serialized by the page table
> lock.

Well, unfortunately we have places that use optimizations like

  inc_tlb_flush_pending()
    lock page table
      pte_wrprotect
  flush_tlb_range()
  dec_tlb_flush_pending()

which complicate things. And usually checking mm_tlb_flush_pending() in
addition to pte_write() (while holding the page table lock) would fix
similar problems.
But for this one, doing so apparently isn't as straightforward, nor the
best solution.

> Yes, we often load the page table value without holding the page table
> lock (in order to know what we are going to do), but then before we
> finalize the operation, we then re-check - under the page table lock -
> that the value we loaded still matches.
>
> But I think I see what *MAY* be going on. The userfaultfd
> mwriteprotect_range() code takes the mm lock for _reading_. Which
> means that you can have
>
>     Thread A                Thread B
>
>  - fault starts. Sees write-protected pte, allocates memory, copies data
>
>                    - userfaultfd makes the regions writable
>
>                    - userfaultfd case writes to the region
>
>                    - userfaultfd makes region non-writable
>
>  - fault continues, gets the page table lock, sees that the pte is the
>    same, uses old copied data
>
> But if this is what's happening, I think it's a userfaultfd bug. I
> think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be
> a mmap_write_lock().
>
> mprotect() does this right, it looks like userfaultfd does not. You
> cannot just change the writability of a page willy-nilly without the
> correct locking.
>
> Maybe there are other causes, but this one stands out to me as one
> possible cause.
>
> Comments?
>
>               Linus
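
For anyone who wants to poke at the Thread A / Thread B ordering quoted
above, here is a rough user-space sketch of it. It is a single-threaded
replay of that interleaving, not kernel code: every toy_* name is made up
for illustration, the "pte" is just a writable flag plus a pointer, and
"excluded" is modeled by simply running thread B's steps before the fault
starts. It only shows why a re-check that compares the pte value cannot
notice a writable -> written -> write-protected round trip. It says
nothing about whether this is the actual cause of the corruption.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy model only: none of these types or helpers exist in the kernel. */
struct toy_pte {
    bool writable;
    char *page;               /* backing "page" the pte points at */
};

struct toy_fault {
    struct toy_pte seen;      /* pte value loaded without the page table lock */
    char copy[TOY_PAGE_SIZE]; /* data copied before the lock is taken */
};

/* Thread A, first half: see a write-protected pte and copy the page. */
static void toy_fault_begin(struct toy_fault *f, const struct toy_pte *pte)
{
    f->seen = *pte;
    memcpy(f->copy, pte->page, TOY_PAGE_SIZE);
}

/* Thread A, second half: re-check the pte value, then install the copy. */
static bool toy_fault_finish(const struct toy_fault *f, struct toy_pte *pte,
                             char *new_page)
{
    if (pte->writable != f->seen.writable || pte->page != f->seen.page)
        return false;         /* pte changed: the fault would be retried */
    memcpy(new_page, f->copy, TOY_PAGE_SIZE);
    pte->page = new_page;
    pte->writable = true;
    return true;
}

/* Thread B: make the region writable, write to it, write-protect it again. */
static void toy_unprotect_write_protect(struct toy_pte *pte, const char *data)
{
    pte->writable = true;
    strncpy(pte->page, data, TOY_PAGE_SIZE - 1);
    pte->page[TOY_PAGE_SIZE - 1] = '\0';
    pte->writable = false;
}

/*
 * Returns true if the page mapped at the end holds thread B's data.
 * interleave == true replays the quoted ordering (B runs in the middle of
 * A's fault); interleave == false models B being excluded for the whole
 * fault, so all of B's steps run before the fault begins.
 */
static bool toy_write_survives(bool interleave)
{
    char old_page[TOY_PAGE_SIZE] = "old";
    char new_page[TOY_PAGE_SIZE] = "";
    struct toy_pte pte = { .writable = false, .page = old_page };
    struct toy_fault f;

    if (!interleave)
        toy_unprotect_write_protect(&pte, "new");
    toy_fault_begin(&f, &pte);
    if (interleave)
        toy_unprotect_write_protect(&pte, "new");
    if (!toy_fault_finish(&f, &pte, new_page))
        return false;
    return strcmp(pte.page, "new") == 0;
}
```

In this model toy_write_survives(true) reports that thread B's write is
lost, because the re-check sees the same pte value and installs the stale
copy, while toy_write_survives(false) keeps it. The model leaves out the
deferred TLB flush entirely, so it does not cover the
inc_tlb_flush_pending() case.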