From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 22 Sep 2022 11:14:51 -0400
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Liu Shixin <liushixin2@huawei.com>, Liu Zixian <liuzixian4@huawei.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	John Hubbard <jhubbard@nvidia.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH] mm: hugetlb: fix UAF in hugetlb_handle_userfault
Message-ID: <Yyx76wt2LYWSKLUs@xz-m1.local>
References: <20220921083440.1267903-1-liushixin2@huawei.com>
 <YytOYH1MSo5cNoB6@monkey>
 <Yyuk83B4VHh+pbFp@monkey>
MIME-Version: 1.0
In-Reply-To: <Yyuk83B4VHh+pbFp@monkey>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed, Sep 21, 2022 at 04:57:39PM -0700, Mike Kravetz wrote:
> On 09/21/22 10:48, Mike Kravetz wrote:
> > On 09/21/22 16:34, Liu Shixin wrote:
> > > The vma_lock and hugetlb_fault_mutex are dropped before handling
> > > userfault and reacquire them again after handle_userfault(), but
> > > reacquire the vma_lock could lead to UAF[1] due to the following
> > > race,
> > > 
> > > hugetlb_fault
> > >   hugetlb_no_page
> > >     /*unlock vma_lock */
> > >     hugetlb_handle_userfault
> > >       handle_userfault
> > >         /* unlock mm->mmap_lock*/
> > >                                            vm_mmap_pgoff
> > >                                              do_mmap
> > >                                                mmap_region
> > >                                                  munmap_vma_range
> > >                                                    /* clean old vma */
> > >         /* lock vma_lock again  <--- UAF */
> > >     /* unlock vma_lock */
> > > 
> > > Since the vma_lock will unlock immediately after hugetlb_handle_userfault(),
> > > let's drop the unneeded lock and unlock in hugetlb_handle_userfault() to fix
> > > the issue.
> > 
> > Thank you very much!
> > 
> > When I saw this report, the obvious fix was to do something like what you have
> > done below.  That looks fine with a few minor comments.
> > 
> > One question I have not yet answered is, "Does this same issue apply to
> > follow_hugetlb_page()?".  I believe it does.  follow_hugetlb_page calls
> > hugetlb_fault which could result in the fault being processed by userfaultfd.
> > If we experience the race above, then the associated vma could no longer be
> > valid when returning from hugetlb_fault.  follow_hugetlb_page and callers
> > have a flag (locked) to deal with dropping mmap lock.  However, I am not sure
> > if it is handled correctly WRT userfaultfd.  I think this needs to be answered
> > before fixing.  And, if the follow_hugetlb_page code needs to be fixed it
> > should be done at the same time.
> > 
> 
> To at least verify this code path, I added userfaultfd handling to the gup_test
> program in kernel selftests.

IIRC vm/userfaultfd should already exercise GUP through pthread mutexes
(which IIUC use futexes, and futex uses GUP).

But indeed I didn't trigger any GUP paths after a quick run.  I agree we
should have some unit test that at least covers GUP with userfaultfd.
I'll look into this further from the vm/userfaultfd side later.

> When doing basic gup test on a hugetlb page in
> a userfaultfd registered range, I hit this warning:
> 
> [ 6939.867796] FAULT_FLAG_ALLOW_RETRY missing 1
> [ 6939.871503] CPU: 2 PID: 5720 Comm: gup_test Not tainted 6.0.0-rc6-next-20220921+ #72
> [ 6939.874562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
> [ 6939.877707] Call Trace:
> [ 6939.878745]  <TASK>
> [ 6939.879779]  dump_stack_lvl+0x6c/0x9f
> [ 6939.881199]  handle_userfault.cold+0x14/0x1e
> [ 6939.882830]  ? find_held_lock+0x2b/0x80
> [ 6939.884370]  ? __mutex_unlock_slowpath+0x45/0x280
> [ 6939.886145]  hugetlb_handle_userfault+0x90/0xf0
> [ 6939.887936]  hugetlb_fault+0xb7e/0xda0
> [ 6939.889409]  ? vprintk_emit+0x118/0x3a0
> [ 6939.890903]  ? _printk+0x58/0x73
> [ 6939.892279]  follow_hugetlb_page.cold+0x59/0x145
> [ 6939.894116]  __get_user_pages+0x146/0x750
> [ 6939.895580]  __gup_longterm_locked+0x3e9/0x680
> [ 6939.897023]  ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
> [ 6939.898939]  ? lockdep_hardirqs_on+0x7d/0x100
> [ 6939.901243]  gup_test_ioctl+0x320/0x6e0
> [ 6939.902202]  __x64_sys_ioctl+0x87/0xc0
> [ 6939.903220]  do_syscall_64+0x38/0x90
> [ 6939.904233]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 6939.905423] RIP: 0033:0x7fbb53830f7b
> 
> This is because userfaultfd is expecting FAULT_FLAG_ALLOW_RETRY which is not
> set in this path.
> 
> Adding John, Peter and David on Cc: as they are much more fluent in all the
> fault and FOLL combinations and might have immediate suggestions.  It is going
> to take me a little while to figure out:
> 1) How to make sure we get the right flags passed to handle_userfault

As David mentioned, one way is to pass a non-NULL "locked".

The other way is to set FOLL_NOWAIT even if locked==NULL.

Here IIUC the tricky case is a GUP caller that neither wants to release the
mmap lock nor wants to return quickly (i.e. it wants to wait for the page
fault with the mmap lock held); then we'll have both locked==NULL and
!FOLL_NOWAIT.  Userfaultfd currently doesn't consider that wise, so it
generates that warning with CONFIG_DEBUG_VM.
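
To make the combinations above concrete, here is a small userspace model (not
kernel code) of how the two GUP inputs map onto fault flags.  The helper names
gup_fault_flags() and userfault_would_warn() are made up for illustration, and
the FAULT_FLAG_* values are only assumed for this sketch; the real values
depend on the kernel tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values assumed for this model only; check the kernel tree for real ones. */
#define FOLL_NOWAIT              0x20u
#define FAULT_FLAG_ALLOW_RETRY   (1u << 2)
#define FAULT_FLAG_RETRY_NOWAIT  (1u << 3)
#define FAULT_FLAG_KILLABLE      (1u << 4)

/*
 * Hypothetical helper: derive fault flags from the GUP inputs discussed
 * above.  A non-NULL "locked" means the caller tolerates the mmap lock
 * being dropped; FOLL_NOWAIT means the caller does not want to wait.
 */
static unsigned int gup_fault_flags(unsigned int foll_flags, const int *locked)
{
	unsigned int fault_flags = 0;

	if (locked)
		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
	if (foll_flags & FOLL_NOWAIT)
		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;

	return fault_flags;
}

/* Hypothetical helper: the case in which userfaultfd would complain. */
static bool userfault_would_warn(unsigned int fault_flags)
{
	return !(fault_flags & FAULT_FLAG_ALLOW_RETRY);
}
```

In this model only locked==NULL together with !FOLL_NOWAIT leaves
FAULT_FLAG_ALLOW_RETRY clear, which is the combination that triggered the
warning in the gup_test run.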

> 2) How to modify follow_hugetlb_page as userfaultfd can certainly drop
>    mmap_lock.  So we can not assume vma still exists upon return.

I think the FOLL_NOWAIT flag might work if the only thing we want to do is
trigger the handle_userfault() path.  But I'll also look into vm/userfaultfd
as mentioned above to make sure we have GUP covered there too.  I'll follow
up if I find anything useful there.

A bit off-topic: this whole discussion reminded me to question whether
userfaultfd is doing the right thing here.  E.g., userfaultfd should really
look like the case where a swap-in is needed for a file.  FOLL_NOWAIT on
swap-in means:

#define FOLL_NOWAIT	0x20	/* if a disk transfer is needed, start the IO
				 * and return without waiting upon it */

Now userfaultfd returns VM_FAULT_RETRY immediately with FOLL_NOWAIT.  I'm
wondering whether it should really generate the userfault message before
doing that, to match the semantics of the original use of FOLL_NOWAIT for
swapping.
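
The difference between the two behaviours can be sketched as a toy userspace
model (again not kernel code).  Everything here is hypothetical: the
model_handle_userfault() function, the MODEL_* return codes and the
notify_before_nowait switch exist only in this sketch, and the FAULT_FLAG_*
values are assumed.  With the switch off it models today's "retry without
telling the monitor"; with it on it models the alternative of queuing the
message first, similar to starting the I/O for a swap-in.

```c
#include <stdbool.h>

/* Values assumed for this model only. */
#define FAULT_FLAG_ALLOW_RETRY   (1u << 2)
#define FAULT_FLAG_RETRY_NOWAIT  (1u << 3)

enum model_ret { MODEL_SIGBUS, MODEL_RETRY };

struct model_result {
	enum model_ret ret;
	bool message_queued;	/* was the uffd monitor notified? */
};

/*
 * Toy model of the decision discussed above.  notify_before_nowait is a
 * hypothetical switch selecting the alternative NOWAIT behaviour.
 */
static struct model_result
model_handle_userfault(unsigned int fault_flags, bool notify_before_nowait)
{
	struct model_result res = { MODEL_SIGBUS, false };

	/* Without ALLOW_RETRY the fault cannot be handed to userspace. */
	if (!(fault_flags & FAULT_FLAG_ALLOW_RETRY))
		return res;

	res.ret = MODEL_RETRY;

	/* NOWAIT: return at once; only the alternative notifies first. */
	if (fault_flags & FAULT_FLAG_RETRY_NOWAIT) {
		res.message_queued = notify_before_nowait;
		return res;
	}

	/* Normal path: queue the message, then wait for the monitor. */
	res.message_queued = true;
	return res;
}
```

In the model, a NOWAIT caller gets MODEL_RETRY either way; the only
difference is whether the monitor ever hears about the fault.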

Thanks,

-- 
Peter Xu