Date: Mon, 27 Jun 2022 09:12:25 -0400
From: Peter Xu
To: Nadav Amit
Cc: Linux MM, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen,
 David Hildenbrand, Mike Rapoport
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
References: <20220622185038.71740-3-namit@vmware.com>
 <18BCC23E-B344-41A8-926D-A49D768485AF@vmware.com>
 <6EF7D3B4-CF17-407B-A50F-B14D595E99A5@vmware.com>
 <07B65135-CA6D-4839-BAC0-6D63A94F50C2@vmware.com>
In-Reply-To: <07B65135-CA6D-4839-BAC0-6D63A94F50C2@vmware.com>

On Sat, Jun 25, 2022 at 07:49:54AM +0000, Nadav Amit wrote:
>
>
> > On Jun 24, 2022, at 3:17 PM, Peter Xu wrote:
> >
> > On Fri, Jun 24, 2022 at 05:58:17PM -0400, Peter Xu wrote:
> >> [Sorry for replying late]
> >>
> >> Said that, I think it doesn't really necessary need to be that complex,
> >> since make_huge_pte() already sets dirty bit when "writable=1", so IIUC
> >> what you need to do is simply make sure dirty bit set when write_hint=1.
> >>
> >> Does it sounds correct to you?
> >
> > Hmm, hold on... I failed to figure out how that write-likely hint could
> > help us for either huge or non-huge pages, since:
> >
> > (1) Old code always set dirty, so no perf degrade anyway with/without the
> >     hint
> >
> > (2) If we want to rework dirty bit (which I'm totally fine with..), then
> >     we don't apply it when we shouldn't, and afaict we should set D bit
> >     whenever we should... if the user assumes this page is likely to be
> >     written but made it read-only, say, with UFFDIO_COPY(wp_mode=1),
> >     setting D bit will not help, instead, the user should simply use an
> >     UFFDIO_COPY(wp_mode=0) then the dirty will be set with write=1..
> >
> > It'll be helpful but only helpful for UFFDIO_ZEROCOPY because it avoids one
> > COW. But that seems to be it.
> >
> > In short: I'm wondering whether we only really need the ACCESS_LIKELY hint
> > as you proposed earlier. We may want UFFDIO_ZEROPAGE_MODE_ALLOCATE
> > separately, but keep that only for zeropage op (and it shouldn't really be
> > called WRITE_LIKELY)? Or did I miss something?
>
> Let’s see if I get you correctly. I am not sure whether we had this
> discussion before.
>
> We are talking about a scenario in which WP=0. You argue that if the page
> is already set as dirty, what is the benefit of not setting the dirty-bit,
> right?
>
> So first, IIUC, there are cases in which the page would not be set as
> dirty, e.g., UFFDIO_CONTINUE. [ I am admittedly not too familiar with this
> use-case, so I say it based on the comments. ]
>
> Second, even if the page is dirty (e.g., following UFFDIO_COPY), but it
> is not written by the user after UFFDIO_COPY, marking the PTE as dirty
> when it is mapped would induce overhead, as we discussed before, since
> if/when the PTE is unmapped, TLB flush batching might not be possible.

I'd hope we don't design an interface just to serve the write=0, dirty=1
case, which so far is internal to the kernel; I still think it's the tlb
flush code that should change. Or do we have another use case for this
WRITE_LIKELY hint?

For UFFDIO_CONTINUE, if we want to be clear about the dirty bit, then IMHO
the right place to handle dirtying is where the user writes to the page
through the other mapping: PageDirty() will already be true there, even if
the pte to be CONTINUEd has dirty=0. From that pov I still don't see why
we need to give the user control of the dirty bit, whether through a hint
or explicitly.

>
> So I don’t think there is a problem in having WRITE_LIKELY hint.
> Moreover,
> I would reiterate my position (which you guys convinced me in!)

David convinced you I think :)

> that having hints that indicate what the user does (WRITE_LIKELY) is a
> better API than something that indicates directly what the kernel should
> do (e.g., UFFDIO_ZEROPAGE_MODE_ALLOCATE).

The hint idea sounds good to me; it's just that we actually have two steps
here:

  (1) We think giving the user control of the dirty bit makes sense, then,

  (2) We think the flag should be a hint, not an explicit "set dirty bit"

I agree with (2) in this case if (1) is applicable. And now I think I'm
questioning myself on (1).

Fundamentally, the access bit carries more meaningful context (0 means
cold, 1 means hot); for dirty it's really more of a perf thing to me (when
clear, it'll take extra cycles to set it when a memory write happens to
it; being clear _may_ help only for the tlb flush example you mentioned,
but I'm not fully convinced that's correct).

Maybe with the to-be-proposed RFC patch for tlb flush we can know whether
that is something we can rely on. It'll add more dependency to this work,
which I'm sorry to say. It's just that IMHO we should think carefully
about the write-hint because this is a solid new uABI we're talking about.

The other option is that we introduce the access hint first and think more
about the dirty one (we can always add it when proper). What do you think?

Also, David, please chime in anytime if I missed the whole point when you
proposed the idea.

>
> But this discussion made me think that there are two somewhat related
> matters that we may want to address:
>
> 1. mwriteprotect_range() should use MM_CP_TRY_CHANGE_WRITABLE when !wp
>    to proactively make entries writable and save .

I'm not sure I'm right here, but I think David's patch should have covered
that case? From memory, the new helper only checks pte_uffd_wp(), and when
resolving page faults the uffd-wp bit should already be gone, so the pte
should be treated the same as a normal pte.

>
> 2. The WRITE_LIKELY hint should be propagated from mwriteprotect_range()
>    to change_pte_range() through cp_flags, and the entry should be set
>    dirty accordingly.

Sounds correct. Though again, I hope we can first think more thoroughly
about whether we need the write-hint.

Thanks,

-- 
Peter Xu
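
The argument quoted earlier about UFFDIO_COPY(wp_mode=1) versus UFFDIO_COPY(wp_mode=0) can be sketched as a toy model. This is only an illustration under stated assumptions, not kernel code: struct toy_pte, toy_install_copy(), toy_store_faults() and toy_check() are hypothetical names, wp_mode stands in for UFFDIO_COPY_MODE_WP, and write_hint stands in for the write-likely hint discussed in this thread. The model assumes the reworked rule "dirty follows write=1" and shows that the hint changes nothing about whether the first store traps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical type, not a kernel structure. */
struct toy_pte {
    bool write;    /* write permission in the entry */
    bool dirty;    /* D bit */
    bool uffd_wp;  /* userfaultfd write-protect marker */
};

/*
 * Hypothetical install step for a UFFDIO_COPY-like operation.
 * wp_mode stands in for UFFDIO_COPY_MODE_WP; write_hint stands in for
 * the write-likely hint discussed in the thread.
 */
static struct toy_pte toy_install_copy(bool vma_writable, bool wp_mode,
                                       bool write_hint)
{
    struct toy_pte pte;

    pte.uffd_wp = wp_mode;
    pte.write = vma_writable && !wp_mode;
    /* Dirty follows write=1; the hint can only add D to a read-only entry. */
    pte.dirty = pte.write || write_hint;
    return pte;
}

/* In this model a store traps whenever the entry lacks write permission. */
static bool toy_store_faults(struct toy_pte pte)
{
    return !pte.write;
}

/* Walks the two cases from the discussion. */
static void toy_check(void)
{
    /* wp_mode=1: the first store faults with or without the hint. */
    assert(toy_store_faults(toy_install_copy(true, true, false)));
    assert(toy_store_faults(toy_install_copy(true, true, true)));

    /* wp_mode=0: the entry is writable and dirty regardless of the hint. */
    assert(toy_install_copy(true, false, false).dirty);
    assert(toy_install_copy(true, false, true).dirty);
    assert(!toy_store_faults(toy_install_copy(true, false, false)));
}
```

Calling toy_check() exercises both cases. In this model the hint only matters for an entry that is read-only anyway, which is the situation where the thread argues that setting the D bit does not help.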