From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4055C001DF for ; Mon, 24 Jul 2023 17:11:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6175D6B007B; Mon, 24 Jul 2023 13:11:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5C7B58E0002; Mon, 24 Jul 2023 13:11:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 48F7B8E0001; Mon, 24 Jul 2023 13:11:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 3B5496B007B for ; Mon, 24 Jul 2023 13:11:41 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 0599912084E for ; Mon, 24 Jul 2023 17:11:41 +0000 (UTC) X-FDA: 81047147202.17.9528B8D Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf03.hostedemail.com (Postfix) with ESMTP id 0EA9320012 for ; Mon, 24 Jul 2023 17:11:38 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=1IWTAX7Y; dmarc=none; spf=pass (imf03.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690218699; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2zHCRdb/jEtZGceoa4SsEeGGVqtDZVwM2yrcXPGJY4g=; b=XMWO2glX+kCjbWFMQgqpUkeJQTFdkr+/lkC4XGe1omxGjwqzLtRrLko9f15WjTlZ5BNRiA qtaySJEdJitnHt/ShUSfi5C+qXRLPqFFmD6RbF0XnfFPTUytOK8NFLXlR47+1AmhSa884h HRCq09uUAoLVHGyHaT3WWBf0hEVvmdk= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=1IWTAX7Y; dmarc=none; spf=pass (imf03.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690218699; a=rsa-sha256; cv=none; b=pQHWEjxOarKBJLNdUk3bJvogteBl2a6c9CR3ZnVE7/ZlGbBEz2YsORIjGmzC/6n4Yd+q0X bTcjlSh0BXx+7YrZptyM+qeBuhMTDo4oMznTVxPuzWp1KLWtqJdx3CSk6mQ84V2lwSmjku Ppu0Y7LcMj/UvUxMKP9Ufc+ou8PRvso= Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 2082D612C9; Mon, 24 Jul 2023 17:11:38 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 37A78C433C7; Mon, 24 Jul 2023 17:11:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1690218697; bh=/iEHzZZI/uM1YwURbFTNXhK6+11qyGduzfaCBn2j9xk=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=1IWTAX7YAuRQKQpYUUvovEu6BdmlHkXDehUYuxFG7eK5ZU4w13BfmUlcM5iJV+/0q 6lnQ6u3+QeDdEQh2IOsUDDUl0hd4Zrmo1F595c19pK+ooVkhNJWOQwybSewEOPYf1s v7/SLJ35QcU2hKDUUGgyxeA08yV1YO/Qzu6M4vgk= Date: Mon, 24 Jul 2023 10:11:36 -0700 From: Andrew Morton To: Jann Horn Cc: Suren Baghdasaryan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Will Deacon , Peter Zijlstra , Hugh Dickins Subject: Re: [PATCH v2] mm: Fix memory ordering for mm_lock_seq and vm_lock_seq Message-Id: <20230724101136.4c58e8291961e87f6c5c1c79@linux-foundation.org> In-Reply-To: <20230721225107.942336-1-jannh@google.com> References: <20230721225107.942336-1-jannh@google.com> X-Mailer: Sylpheed 3.8.0beta1 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspam-User: X-Stat-Signature: wq89bxuugkew9ucn38dj7cpqshjbda3w X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 0EA9320012 X-HE-Tag: 1690218698-855448 X-HE-Meta: U2FsdGVkX19qA9/aQsijIMbcziUVQVTrUOuNwuW57mZ87AwiAK+BWZXhjVDFRCqHbkYd5lsxrNV9G39xgg0IeNACV+dBL/JnyigLzKAzPy4VuKBZO1Wh+JpXIAbd4eItalSK1MprgNI7mfN4S6T2Ijx2pJLz0xj5RmunLu7HNzvwhIuNJmcMTstsLO/2fQGR2WHC/Il/CV/EPhwHSiMBlhj9S0XzlVDDjCqxUGbTd8vjnPEUsJFdk4zYbQWQKkCdUpQHeHXeiDMe9M7NORucoF7knMUUM83vuW3FHHqijqJ6KbG4XI2VvPWvVVzEA91yNM+bXnvtbUMmWSkyfRSxWuraNXhcniGQ5QQHxK0hAU+g6C/vOIQGa8Akrq68ivZRcfBIaE2VuWLuSzftJCTue5/FQln0yeA+88mXyDJsX6NOHIpZbklZUNjI245UVfMr51Le07VNNrrPF0VdbrkH66hiJ6GEPnGu8wS0Lqg8RCrfeidXxDKwHw33QpNaKfV5PIy0ZHguLNvQxgGR+28/0sHp9mPY6nrHGmYC5viWswDnxLQSrROkwZBzxHbwsxkTlBqT4MQsZpPqb0dt2pVEBKSWQE52AHFj1TcT63neqcprrjB1yCbtEt1667MaWbv2lrncwWrk2A2hr9vMHDnQxFO+TnrzcJOMqHqQHYtswT1NVQO447XKg/W3Dz1Tk4KzfrMA5HRyr9Vwx4tZO6utDj6x3dFFRLUEHiPsPgp9rohc56vmwHuSiumpfIy0xu6zze5Zswm1/vOMITW0BxjsUtItpSvLiynj574JqClKVHsquVF78pBB+UjvCUSm45m9LLAQbvLJwkV3etDkfm9jPzVfz+MHeUmkRp5nV2mp13wsfRd6nbRm1AbpTHHaZtuC8f4p3zQ264hy+JsQWUuN8kEb7tsCmC6xt2i3cB8hSDx7dMHzfipNBxsDbOamz97mqHC9tnILY/j93QYWWqE amu6t8tz 0h7M4Af2xjbHbpqor3TmdrC+l7AEnDuqBaBOVVuD4jXOZZx6fs4uXtt7ZbK/fzPWIelzWQcbD1lhbIzJyym/2f4UkpCPEOdC0CvYAb0T4l7Cma1iLQbngSlZGNOmcDVGN8MkVkiwHddqH4nn3F+ZiNaU61eO/JLgXsmeT7OjG9AXjUR1Fk6IWdQcZhCw8i8nVY2F3WLUpLOc70QZl96c01H4y87DwDyhnQS2Zwmdcv5C8sE2pSMfSFCG1xnVwsSANlo+YRlZH3aP2x47shWO2oI8hMjN+rsAetlSg8pzEatSzOa9Xc9qr/drcWBAiDByfxgI48pIMfj+4KiZyNIwrCADre2FJlTDHnJgrkpQ5+Gk75ejePhBxhR0uRGAoaY0njPn7s/XLD35KvmahuL7xr8frMi7kuhyhUJHcSeq4fjT5kHCZPV/rKsnaJb3zLdhGFEuYLq/mEdk0YbA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, 22 Jul 2023 00:51:07 +0200 Jann Horn wrote: > mm->mm_lock_seq effectively functions as a read/write lock; therefore it > must be used with acquire/release semantics. > > A specific example is the interaction between userfaultfd_register() and > lock_vma_under_rcu(). > userfaultfd_register() does the following from the point where it changes > a VMA's flags to the point where concurrent readers are permitted again > (in a simple scenario where only a single private VMA is accessed and no > merging/splitting is involved): > > userfaultfd_register > userfaultfd_set_vm_flags > vm_flags_reset > vma_start_write > down_write(&vma->vm_lock->lock) > vma->vm_lock_seq = mm_lock_seq [marks VMA as busy] > up_write(&vma->vm_lock->lock) > vm_flags_init > [sets VM_UFFD_* in __vm_flags] > vma->vm_userfaultfd_ctx.ctx = ctx > mmap_write_unlock > vma_end_write_all > WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA] > > There are no memory barriers in between the __vm_flags update and the > mm->mm_lock_seq update that unlocks the VMA, so the unlock can be reordered > to above the `vm_flags_init()` call, which means from the perspective of a > concurrent reader, a VMA can be marked as a userfaultfd VMA while it is not > VMA-locked. That's bad, we definitely need a store-release for the unlock > operation. > > The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly > fine because all accesses to vma->vm_lock_seq that matter are always > protected by the VMA lock. There is a racy read in vma_start_read() though > that can tolerate false-positives, so we should be using WRITE_ONCE() to > keep things tidy and data-race-free (including for KCSAN). > > On the other side, lock_vma_under_rcu() works as follows in the relevant > region for locking and userfaultfd check: > > lock_vma_under_rcu > vma_start_read > vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout] > down_read_trylock(&vma->vm_lock->lock) > vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check] > userfaultfd_armed > checks vma->vm_flags & __VM_UFFD_FLAGS > > Here, the interesting aspect is how far down the mm->mm_lock_seq read > can be reordered - if this read is reordered down below the vma->vm_flags > access, this could cause lock_vma_under_rcu() to partly operate on > information that was read while the VMA was supposed to be locked. > To prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we > need to read it with a load-acquire. > > Some of the comment wording is based on suggestions by Suren. > > BACKPORT WARNING: One of the functions changed by this patch (which I've > written against Linus' tree) is vma_try_start_write(), but this function > no longer exists in mm/mm-everything. I don't know whether the merged > version of this patch will be ordered before or after the patch that > removes vma_try_start_write(). If you're backporting this patch to a > tree with vma_try_start_write(), make sure this patch changes that > function. I staged this patch as a hotfix, ahead of mm-unstable material. The conflict is with Hugh's "mm: delete mmap_write_trylock() and vma_try_start_write()" (https://lkml.kernel.org/r/4e6db3d-e8e-73fb-1f2a-8de2dab2a87c@google.com) I fixed the reject in the obvious way (deleted the function anyway), but there's a possibility that the ordering issue you have addressed will now be reintroduced by Hugh's series, so please let's review that.