From: David Stevens
Date: Thu, 2 Feb 2023 18:30:38 +0900
Subject: Re: [PATCH] mm/khugepaged: skip shmem with armed userfaultfd
To: "Kirill A. Shutemov"
Cc: linux-mm@kvack.org, Andrew Morton, linux-kernel@vger.kernel.org
References: <20230201034137.2463113-1-stevensd@google.com> <20230201230943.fg2q6fmvu7gggxar@box.shutemov.name>
In-Reply-To: <20230201230943.fg2q6fmvu7gggxar@box.shutemov.name>

On Thu, Feb 2, 2023 at 8:09 AM Kirill A. Shutemov wrote:
>
> On Wed, Feb 01, 2023 at 12:41:37PM +0900, David Stevens wrote:
> > From: David Stevens
> >
> > Collapsing memory in a vma that has an armed userfaultfd results in
> > zero-filling any missing pages, which breaks user-space paging for those
> > filled pages. Avoid khugepage bypassing userfaultfd by not collapsing
> > pages in shmem reached via scanning a vma with an armed userfaultfd if
> > doing so would zero-fill any pages.
>
> Could you elaborate on the failure? Will zero-filling the page prevent
> userfaultfd from catching future access?

Yes, zero-filling the page causes future major faults to be lost, since
it populates the pages in the backing shmem. The path for anonymous
memory in khugepaged does properly handle userfaultfd_armed, but the
path for shmem does not.

> A test-case would help a lot.

Here's a sample program that demonstrates the issue. On a v6.1 kernel,
no major fault is observed by the monitor thread. I used MADV_COLLAPSE
to exercise khugepaged_scan_file, but you would get the same effect by
replacing the madvise with a sleep and waiting for khugepaged to scan
the test process.

#define _GNU_SOURCE
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <threads.h>
#include <unistd.h>

int had_fault;

/* Waits for one missing-page fault and resolves it with a zero page. */
int monitor_thread(void *arg)
{
    int ret, uffd = (int) (uintptr_t) arg;
    struct uffd_msg msg;

    ret = read(uffd, &msg, sizeof(msg));
    assert(ret > 0);
    assert(msg.event == UFFD_EVENT_PAGEFAULT);

    had_fault = 1;

    struct uffdio_zeropage zeropage;
    zeropage.range.start = msg.arg.pagefault.address & ~0xfff;
    zeropage.range.len = 4096;
    zeropage.mode = 0;

    ret = ioctl(uffd, UFFDIO_ZEROPAGE, &zeropage);
    assert(ret >= 0);

    return 0;
}

int main(void)
{
    int ret;

    int uffd = syscall(__NR_userfaultfd, 0);
    assert(uffd >= 0);

    struct uffdio_api uffdio_api;
    uffdio_api.api = UFFD_API;
    uffdio_api.features = UFFD_FEATURE_MISSING_SHMEM;
    ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
    assert(ret >= 0);

    /* 2 MiB of shmem, so the whole mapping is one collapse candidate. */
    int memfd = memfd_create("memfd", MFD_CLOEXEC);
    assert(memfd >= 0);

    ret = ftruncate(memfd, 2 * 1024 * 1024);
    assert(ret >= 0);

    uint8_t *addr = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
                         MAP_SHARED, memfd, 0);
    assert(addr != MAP_FAILED);

    /* Populate only the first page; the rest stay missing. */
    addr[0] = 0xff;

    struct uffdio_register uffdio_register;
    uffdio_register.range.start = (unsigned long) addr;
    uffdio_register.range.len = 2 * 1024 * 1024;
    uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
    ret = ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);
    assert(ret >= 0);

    thrd_t t;
    ret = thrd_create(&t, monitor_thread, (void *) (uintptr_t) uffd);
    assert(ret >= 0);

    ret = madvise(addr, 2 * 1024 * 1024, 25 /* MADV_COLLAPSE */);
    printf("madvise ret %d\n", ret);

    /* Touch a page that was missing before the collapse. */
    addr[4096] = 0xff;

    printf("%s major fault\n", had_fault ? "had" : "no");

    return 0;
}

> And what prevents the same pages be filled (with zeros or otherwise) via
> write(2) bypassing VMA checks? I cannot immediately see it.

There isn't any such check. You can bypass userfaultfd on a shmem with
write syscalls, or simply by writing to the shmem through a different
vma. However, it is definitely possible for userspace to use shmem plus
userfaultfd in a safe way.
And the kernel shouldn't come along and break a setup that
should be safe from the perspective of userspace.

> BTW, there's already a check that prevent establishing PMD in the place if
> VM_UFFD_WP is set.
>
> Maybe just an update of the check in retract_page_tables() from
> userfaultfd_wp() to userfaultfd_armed() would be enough?

It seems like it will be a little more complicated than that, because
the target vma having an armed userfaultfd is a hard error condition.
However, it might not be that difficult to modify collapse_file to roll
back if necessary. I'll consider this approach for v2 of this patch.

-David

> I have very limited understanding of userfaultfd(). Sorry in advance for
> stupid questions.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
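
For reference, the userspace-side bypass mentioned in the reply (filling
shmem pages with write syscalls or through a different vma) can be
sketched as below. This is only an illustrative sketch and not part of
the patch: the helper name fill_behind_first_mapping and the REGION_SIZE
macro are made up for the example, a 4096-byte page size is assumed, and
no userfaultfd is registered, so it only shows that both routes populate
the backing shmem without going through the first mapping.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION_SIZE (2 * 1024 * 1024)

/*
 * Hypothetical helper: fill page 0 of a memfd with pwrite(2) and page 1
 * through a second mapping, then read both back through the first mapping.
 * Returns 0 if both bytes are visible, 1 if not, -1 on setup failure.
 */
int fill_behind_first_mapping(void)
{
    int result = -1;
    uint8_t *first, *second;
    uint8_t byte = 0xaa;

    int memfd = memfd_create("sketch", MFD_CLOEXEC);
    if (memfd < 0)
        return -1;
    if (ftruncate(memfd, REGION_SIZE) < 0)
        goto out_close;

    /* The mapping a userfaultfd monitor would have registered. */
    first = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                 MAP_SHARED, memfd, 0);
    if (first == MAP_FAILED)
        goto out_close;

    /* A second, independent vma over the same shmem. */
    second = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                  MAP_SHARED, memfd, 0);
    if (second == MAP_FAILED)
        goto out_unmap_first;

    /* Neither store below touches the first mapping. */
    if (pwrite(memfd, &byte, 1, 0) != 1)
        goto out_unmap_second;
    second[4096] = 0xbb;

    /* Both pages now exist in the backing shmem. */
    result = (first[0] == 0xaa && first[4096] == 0xbb) ? 0 : 1;

out_unmap_second:
    munmap(second, REGION_SIZE);
out_unmap_first:
    munmap(first, REGION_SIZE);
out_close:
    close(memfd);
    return result;
}
```

Calling fill_behind_first_mapping() from a small main() and checking for
a return value of 0 is enough to try it. A setup that is safe in the
sense discussed above is one where userspace itself never does either of
these things to a registered range.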