Date: Fri, 27 Nov 2020 08:51:17 -0500
From: Peter Xu
To: Matthew Wilcox
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport
Subject: Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
Message-ID: <20201127135117.GB6573@xz-x1>
In-Reply-To: <20201127122224.GX4327@casper.infradead.org>
References: <20201126222359.8120-1-peterx@redhat.com> <20201127122224.GX4327@casper.infradead.org>

Matthew,

Thanks for the review comments.

On Fri, Nov 27, 2020 at 12:22:24PM +0000, Matthew Wilcox wrote:
> On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote:
> > For missing mode uffds, fault around does not help because if the page cache
> > existed, then the page should be there already. If the page cache is not
> > there, nothing else we can do, either. If the fault-around code is destined to
> > be helpless for userfault-missing vmas, then ideally we can skip it.
> 
> But it might have been faulted into the cache by another task, so skipping
> it is bad.

Is there a real use case for that? I thought about it, and at least QEMU
postcopy doesn't work like that. I don't think CRIU does either, but Mike
could correct me.

Even if there is such a case, e.g. task A copies pages into a tmpfs file
while task B accesses the same pages through a range registered with uffd
missing mode, IMHO the ideal solution is still that task A could install the
pte for the current task when a page is copied, rather than relying on
fault-around on reads. I don't know whether there's a way to do that, though.

> 
> > For wr-protected mode uffds, errornously fault in those pages around could lead
> > to threads accessing the pages without uffd server's awareness. For example,
> > when punching holes on uffd-wp registered shmem regions, we'll first try to
> > unmap all the pages before evicting the page cache but without locking the
> > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called
> > before shmem_truncate_range()). When fault-around happens near a hole being
> > punched, we might errornously fault in the "holes" right before it will be
> > punched. Then there's a small window before the page cache was finally
> > dropped, and after the page will be writable again (NOTE: the uffd-wp protect
> > information is totally lost due to the pre-unmap in shmem_fallocate(), so the
> > page can be writable within the small window). That's severe data loss.
> 
> Sounds like you have a missing page_mkwrite implementation.

I think that's a slightly different issue, because shmem may not know whether
the page should be allowed to be written or not. AFAIU, uffd-wp is designed
and implemented so that the final protection information is kept in the ptes,
so e.g.
the vma does not reliably know whether a page should be write-protected
(VM_UFFD_WP in the vma flags is only a hint, which is also why we avoid
creating tons of vmas for it). We tried hard to keep the pte information,
mainly _PAGE_UFFD_WP, alive even across swap in/out and migration. If the pte
is lost, we can't get that information back from the page cache, at least
for now.

> 
> > This patch comes from debugging a data loss issue when working on the uffd-wp
> > support on shmem/hugetlbfs. I posted this out for early review and comments,
> > but also because it should already start to benefit missing mode userfaultfd to
> > avoid trying to fault around on reads.
> 
> A measurable difference?

No, I didn't measure the missing case. It really depends on whether there's a
real use case like that, IMHO. If there is, we may still want fault-around for
missing mode (though uffd-wp may be a different story, as discussed above).
Otherwise maybe we should just skip it for all uffd modes.

The other thing that led me to this patch (rather than checking only for
uffd-wp, in which case I'll just keep that small patch in my own tree until I
post the uffd-wp series) is that the community may later want to introduce
things like "minor faults" for userfaultfd. In that case we'd need to be
careful here too, otherwise we may lose some minor faults.

From that point of view, I'm wondering whether it's simpler to just forbid
kernel-side tricks like this for uffd in general. As I stated previously,
IMHO if a memory region is registered with userfaultfd, it's probably best to
always rely on userspace to resolve its page faults, so that we don't need to
debug things like the uffd-wp data loss as userfaultfd evolves; that would be
a non-trivial task, at least for me...

Thanks,

-- 
Peter Xu