From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Axel Rasmussen, Mike Rapoport, Andrew Morton, Andrea Arcangeli, Nadav Amit, Muhammad Usama Anjum
Subject: Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE
Date: Tue, 28 Feb 2023 14:42:45 -0500

On Thu, Feb 23, 2023 at 03:35:13PM +0100, David Hildenbrand wrote:
> I had some idea of using two markers: PTE_UFFD_WP and PTE_UFFD_NO_WP, with
> pte_none() being something fuzzy in between that the application knows
> how to deal with ("not touched since we registered uffd-wp").
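As a reading aid, here is a minimal sketch of the entry states this idea implies and how a tracking walk could interpret each one. It follows the walk rules listed later in the quoted mail. The sketch assumes a simplified user-space model: entry_kind, struct entry, track_result and classify_entry() are hypothetical names, not kernel APIs. pte_none() is reported as its own result because the proposal leaves that case for the application to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page-table entry; not kernel types. */
enum entry_kind {
	ENTRY_NONE,         /* pte_none() */
	ENTRY_PRESENT,      /* a mapped page */
	ENTRY_MARKER_WP,    /* proposed PTE_UFFD_WP marker */
	ENTRY_MARKER_NO_WP, /* proposed PTE_UFFD_NO_WP marker */
};

struct entry {
	enum entry_kind kind;
	bool uffd_wp; /* uffd-wp bit; only meaningful for ENTRY_PRESENT */
};

enum track_result {
	TRACK_CLEAN,     /* not modified since the last protection round */
	TRACK_DIRTY,     /* modified since the last protection round */
	TRACK_UNTOUCHED, /* pte_none(): never mapped since protection began */
};

/* Interpret one entry during a (hypothetical) tracking walk. */
enum track_result classify_entry(const struct entry *e)
{
	switch (e->kind) {
	case ENTRY_MARKER_WP:
		return TRACK_CLEAN;
	case ENTRY_MARKER_NO_WP:
		return TRACK_DIRTY;
	case ENTRY_PRESENT:
		/* For a present PTE, the uffd-wp bit decides. */
		return e->uffd_wp ? TRACK_CLEAN : TRACK_DIRTY;
	case ENTRY_NONE:
		return TRACK_UNTOUCHED;
	}
	assert(!"unknown entry kind");
	return TRACK_UNTOUCHED;
}
```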
>
> The goal would be to not populate page tables just to insert PTE
> markers/zeropages, but to only special case on the "there is a page table
> with a present PTE and we're unmapping something with uffd-wp set or uffd-wp
> not set". Because when we're unmapping we already have a page table entry we
> can just set instead of allocating a page table.
>
> Sorry for throwing semi-finished ideas at you, but the basic idea would be
> to only special case when we're unmapping something: there, we already do
> have a page mapped and can remember uffd-wp-set (clean) vs. !uffd-wp-set
> (dirty).
>
> uffd-wp protecting a range:
> * !pte_none() -> set uffd-wp bit and wrprotect
> * pte_none() -> nothing to do
> * PTE_UFFD_WP -> nothing to do
> * PTE_UFFD_NO_WP -> set PTE_UFFD_WP
>
> unmapping a page (old way: !pte_none() -> pte_none()):
> * uffd-wp bit set: set PTE_UFFD_WP
> * uffd-wp bit not set: set PTE_UFFD_NO_WP
>
> (re)mapping a page (old: pte_none() -> !pte_none()):
> * PTE_UFFD_WP -> set pte bit for new PTE
> * PTE_UFFD_NO_WP -> don't set pte bit for new PTE
> * pte_none() -> set pte bit for new PTE
>
> Zapping an anon page using MADV_DONTNEED is a bit confusing. It's actually
> similar to a memory write (-> write zeroes), but we don't notify uffd-wp for
> that (I think that's something you comment on below). Theoretically, we'd
> want to set PTE_UFFD_NO_WP ("dirty") in the async mode. But that might need
> more thought about what the expected semantics actually are.
>
> When we walk over the page tables we would get the following information
> after protecting the range:
>
> * PTE_UFFD_WP -> clean, not modified since last protection round
> * PTE_UFFD_NO_WP -> dirty, modified since last protection round
> * pte_none() -> not mapped and therefore not modified since beginning of
>   protection.
> * !pte_none() -> uffd-wp bit decides

I can't say I've thought it through much, but I feel like it may work.

I'd probably avoid calling it PTE_UFFD_NO_WP, since that would be confusing;
maybe WP_WRITTEN or WP_RESOLVED instead.

But that interface looks odd to me: protection takes effect as soon as
VM_UFFD_WP is applied to the VMA and stays in effect until unregister, yet
after the first round of tracking one still needs to re-protect with
ioctl(UFFDIO_WRITEPROTECT). It just looks a bit over-complicated, not to
mention that we would need two markers just for userfault-wp. I have a
feeling this complexity could cause us trouble elsewhere.

IIUC, even if it works, this could be done on top later (I think the
userspace API doesn't need to change at all), so I'd suggest giving it some
more thought while we start with something simple that works.

In general, I'll be happy with anything simpler as long as Muhammad is happy
with its current performance. As for me, WP_UNPOPULATED is definitely much
better than the old workaround in QEMU live snapshots, so I'm not worried
about that.

> Yes, I focused on anon. Let's see if any of the above I said makes sense. :)
>
> Anyhow, what we're discussing here is yet another uffd-wp addition, if ever,
> so don't feel blocked by my comments.

Thanks. I've just posted a new version.

-- 
Peter Xu
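For reference, the protect, unmap and (re)map transitions David lists in the quoted text above can be modeled per entry roughly as follows. This is a sketch under simplifying assumptions: one entry at a time, no real page tables, no locking, and no write-fault handling. model_protect(), model_unmap(), model_map() and the entry types are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-entry model; not kernel types. */
enum entry_kind {
	ENTRY_NONE,         /* pte_none() */
	ENTRY_PRESENT,      /* a mapped page */
	ENTRY_MARKER_WP,    /* proposed PTE_UFFD_WP marker ("clean") */
	ENTRY_MARKER_NO_WP, /* proposed PTE_UFFD_NO_WP marker ("dirty") */
};

struct entry {
	enum entry_kind kind;
	bool uffd_wp; /* uffd-wp bit; only meaningful for ENTRY_PRESENT */
};

/* uffd-wp protecting a range, applied to one entry. */
void model_protect(struct entry *e)
{
	switch (e->kind) {
	case ENTRY_PRESENT:
		e->uffd_wp = true; /* set uffd-wp bit and wrprotect */
		break;
	case ENTRY_MARKER_NO_WP:
		e->kind = ENTRY_MARKER_WP;
		break;
	case ENTRY_NONE:
	case ENTRY_MARKER_WP:
		break; /* nothing to do */
	}
}

/* Unmapping a present page: leave a marker instead of pte_none(). */
void model_unmap(struct entry *e)
{
	assert(e->kind == ENTRY_PRESENT);
	e->kind = e->uffd_wp ? ENTRY_MARKER_WP : ENTRY_MARKER_NO_WP;
	e->uffd_wp = false;
}

/* (Re)mapping a page into a non-present entry. */
void model_map(struct entry *e)
{
	assert(e->kind != ENTRY_PRESENT);
	/* Only PTE_UFFD_NO_WP leaves the new PTE without the uffd-wp bit. */
	e->uffd_wp = (e->kind != ENTRY_MARKER_NO_WP);
	e->kind = ENTRY_PRESENT;
}
```

The model also shows that pte_none() and PTE_UFFD_WP behave the same on remap: both yield a protected PTE. Only PTE_UFFD_NO_WP carries "dirty" information across an unmap.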