Date: Mon, 11 Jan 2021 18:08:48 -0500
From: Peter Xu
To: Mike Kravetz
Cc: Axel Rasmussen, Alexander Viro, Alexey Dobriyan, Andrea Arcangeli,
    Andrew Morton, Anshuman Khandual, Catalin Marinas, Chinwen Chang,
    Huang Ying, Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
    "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
    Michel Lespinasse, Mike Rapoport, Nicholas Piggin, Shaohua Li,
    Shawn Anastasio, Steven Rostedt, Steven Price, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Adam Ruprecht, Cannon Matthews,
    "Dr . David Alan Gilbert", David Rientjes, Oliver Upton
Subject: Re: [RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE
Message-ID: <20210111230848.GA588752@xz-x1>
References: <20210107190453.3051110-1-axelrasmussen@google.com>
    <48f4f43f-eadd-f37d-bd8f-bddba03a7d39@oracle.com>
In-Reply-To: <48f4f43f-eadd-f37d-bd8f-bddba03a7d39@oracle.com>

On Mon, Jan 11, 2021 at 02:42:48PM -0800, Mike Kravetz wrote:
> On 1/7/21 11:04 AM, Axel Rasmussen wrote:
> > Overview
> > ========
> >
> > This series adds a new userfaultfd registration mode,
> > UFFDIO_REGISTER_MODE_MINOR. This allows userspace to intercept "minor" faults.
> > By "minor" fault, I mean the following situation:
> >
> > Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory).
> > One of the mappings is registered with userfaultfd (in minor mode), and the
> > other is not. Via the non-UFFD mapping, the underlying pages have already been
> > allocated & filled with some contents. The UFFD mapping has not yet been
> > faulted in; when it is touched for the first time, this results in what I'm
> > calling a "minor" fault. As a concrete example, when working with hugetlbfs, we
> > have huge_pte_none(), but find_lock_page() finds an existing page.
> >
> > We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.
> > The idea is,
> > userspace resolves the fault by either a) doing nothing if the contents are
> > already correct, or b) updating the underlying contents using the second,
> > non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA,
> > or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel
> > "I have ensured the page contents are correct, carry on setting up the mapping".
> >
>
> One quick thought.
>
> This is not going to work as expected with hugetlbfs pmd sharing. If you
> are not familiar with hugetlbfs pmd sharing, you are not alone. :)
>
> pmd sharing is enabled for x86 and arm64 architectures. If there are multiple
> shared mappings of the same underlying hugetlbfs file or shared memory segment
> that are 'suitably aligned', then the PMD pages associated with those regions
> are shared by all the mappings. Suitably aligned means 'on a 1GB boundary'
> and 1GB in size.
>
> When pmds are shared, your mappings will never see a 'minor fault'. This
> is because the PMD (page table entries) is shared.

Thanks for raising this, Mike.

I've got a few patches intended to disable huge pmd sharing for uffd in
general, e.g.:

https://github.com/xzpeter/linux/commit/f9123e803d9bdd91bf6ef23b028087676bed1540
https://github.com/xzpeter/linux/commit/aa9aeb5c4222a2fdb48793cdbc22902288454a31

I believe we don't want pmd sharing for missing mode either; it's just not
extremely important there yet, because in missing mode we normally monitor all
the processes that will be using the registered mm range. For example, in QEMU
postcopy migration with vhost-user hugetlbfs files as backends, we monitor both
the QEMU process and the DPDK program, so either program will trigger a missing
fault even if the pmd is shared between them. Still, I think it's not ideal:
uffd (even in missing mode) is pgtable-based, so sharing will always be tricky.
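
To make the 'suitably aligned' condition quoted above concrete, here is a
minimal sketch of the arithmetic only. It is not the kernel's code: the helper
name window_may_share_pmd() and the *_SKETCH constants are made up for
illustration, and it assumes a PUD covers 1GB (as on x86-64 with 4K base
pages). Real pmd sharing also depends on the mapping being shared and on other
conditions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one PUD entry covers 1GB, as on x86-64 with 4K base pages. */
#define PUD_SIZE_SKETCH (UINT64_C(1) << 30)
#define PUD_MASK_SKETCH (~(PUD_SIZE_SKETCH - 1))

/*
 * Hypothetical helper: return true if the 1GB-aligned, 1GB-sized window
 * containing addr lies entirely inside the mapping [vm_start, vm_end).
 * Only such windows are candidates for hugetlbfs pmd sharing.
 */
bool window_may_share_pmd(uint64_t vm_start, uint64_t vm_end, uint64_t addr)
{
    assert(vm_start < vm_end);

    /* An address outside the mapping cannot be in a shareable window. */
    if (addr < vm_start || addr >= vm_end)
        return false;

    uint64_t base = addr & PUD_MASK_SKETCH;  /* round down to a 1GB boundary */
    uint64_t end = base + PUD_SIZE_SKETCH;   /* one full 1GB window */

    /* Guard against wrap-around at the top of the address space. */
    if (end < base)
        return false;

    return base >= vm_start && end <= vm_end;
}
```

With these assumptions, a 1GB mapping at 0x40000000 has one candidate window,
while a 1GB mapping that starts 2MB later (0x40200000) has none, so only the
former could end up with shared PMD pages and therefore miss minor faults.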

These patches are not posted publicly yet, since they're part of the uffd-wp
support for hugetlbfs (along with shmem). I'm just raising this now to avoid
potential duplicated work before I post the patchset.

(Will read into the details soon; probably too many things piled up...)

Thanks,

-- 
Peter Xu