Date: Thu, 30 Jan 2025 10:40:09 +0100
From: Simona Vetter
To: Alistair Popple
Cc: David Hildenbrand, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, Andrew Morton, Jérôme Glisse,
    Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul,
    Danilo Krummrich, David Airlie, Simona Vetter, "Liam R. Howlett",
    Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
    Jason Gunthorpe
Subject: Re: [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk
References: <20250129115411.2077152-1-david@redhat.com>
 <20250129115411.2077152-5-david@redhat.com>
 <7tzcpx23vufmp5cxutnzhjgdj7kwqrw5drwochpv5ern7zknhj@h2s6y2qjbr3f>
In-Reply-To: <7tzcpx23vufmp5cxutnzhjgdj7kwqrw5drwochpv5ern7zknhj@h2s6y2qjbr3f>

On Thu, Jan 30, 2025 at 05:11:49PM +1100, Alistair Popple wrote:
> On Wed, Jan 29, 2025 at 12:54:02PM +0100, David Hildenbrand wrote:
> > We require a writable PTE and only support anonymous folio: we can only
> > have exactly one PTE pointing at that page, which we can just lookup
> > using a folio walk, avoiding the rmap walk and the anon VMA lock.
> >
> > So let's stop doing an rmap walk and perform a folio walk instead, so we
> > can easily just modify a single PTE and avoid relying on rmap/mapcounts.
> >
> > We now effectively work on a single PTE instead of multiple PTEs of
> > a large folio, allowing for conversion of individual PTEs from
> > non-exclusive to device-exclusive -- note that the other way always
> > worked on single PTEs.
> >
> > We can drop the MMU_NOTIFY_EXCLUSIVE MMU notifier call and document why
> > that is not required: GUP will already take care of the
> > MMU_NOTIFY_EXCLUSIVE call if required (there is already a device-exclusive
> > entry) when not finding a present PTE and having to trigger a fault and
> > ending up in remove_device_exclusive_entry().
>
> I will have to look at this a bit more closely tomorrow but this doesn't seem
> right to me. We may be transitioning from a present PTE (ie. a writable
> anonymous mapping) to a non-present PTE (ie. a device-exclusive entry) and
> therefore any secondary processors (eg. other GPUs, iommus, etc.) will need to
> update their copies of the PTE. So I think the notifier call is needed.

I guess this is a question of which semantics we want: with multiple gpus, do
we require that device-exclusive also excludes the other gpus or not? I'm
leaning towards agreeing with you here.

> > Note that the PTE is
> > always writable, and we can always create a writable-device-exclusive
> > entry.
> >
> > With this change, device-exclusive is fully compatible with THPs /
> > large folios. We still require PMD-sized THPs to get PTE-mapped, and
> > supporting PMD-mapped THP (without the PTE-remapping) is a different
> > endeavour that might not be worth it at this point.

I'm not sure we actually want hugepages for device exclusive, since that has
an impact on what's allowed and what's not. If we only ever do 4k entries,
then userspace can assume that as long as its atomics sit in separate 4k
pages, there's no issue when both the gpu and cpu hammer on them.
If we try to keep thp entries, then suddenly a workload that worked before
will end up in endless ping-pong between gpu and cpu, because the separate
atomic counters (or whatever) now all sit in the same 2M page. So going with
thp might force userspace to spread its atomics out even more, which just
wastes memory and doesn't save any tlb entries either, since often you don't
need that many.

tl;dr: I think not supporting thp entries for device exclusive is a feature,
not a bug.

Cheers, Sima

> > This gets rid of the "folio_mapcount()" usage and let's us fix ordinary
> > rmap walks (migration/swapout) next. Spell out that messing with the
> > mapcount is wrong and must be fixed.
> >
> > Signed-off-by: David Hildenbrand
> > ---
> >  mm/rmap.c | 188 ++++++++++++++++--------------------------------------
> >  1 file changed, 55 insertions(+), 133 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 676df4fba5b0..49ffac6d27f8 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -2375,131 +2375,6 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
> >  }
> >  
> >  #ifdef CONFIG_DEVICE_PRIVATE
> > -struct make_exclusive_args {
> > -	struct mm_struct *mm;
> > -	unsigned long address;
> > -	void *owner;
> > -	bool valid;
> > -};
> > -
> > -static bool page_make_device_exclusive_one(struct folio *folio,
> > -		struct vm_area_struct *vma, unsigned long address, void *priv)
> > -{
> > -	struct mm_struct *mm = vma->vm_mm;
> > -	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
> > -	struct make_exclusive_args *args = priv;
> > -	pte_t pteval;
> > -	struct page *subpage;
> > -	bool ret = true;
> > -	struct mmu_notifier_range range;
> > -	swp_entry_t entry;
> > -	pte_t swp_pte;
> > -	pte_t ptent;
> > -
> > -	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
> > -				      vma->vm_mm, address, min(vma->vm_end,
> > -				      address + folio_size(folio)),
> > -				      args->owner);
> > -	mmu_notifier_invalidate_range_start(&range);
> > -
> > -	while (page_vma_mapped_walk(&pvmw)) {
> > -		/* Unexpected PMD-mapped THP? */
> > -		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
> > -
> > -		ptent = ptep_get(pvmw.pte);
> > -		if (!pte_present(ptent)) {
> > -			ret = false;
> > -			page_vma_mapped_walk_done(&pvmw);
> > -			break;
> > -		}
> > -
> > -		subpage = folio_page(folio,
> > -				pte_pfn(ptent) - folio_pfn(folio));
> > -		address = pvmw.address;
> > -
> > -		/* Nuke the page table entry. */
> > -		flush_cache_page(vma, address, pte_pfn(ptent));
> > -		pteval = ptep_clear_flush(vma, address, pvmw.pte);
> > -
> > -		/* Set the dirty flag on the folio now the pte is gone. */
> > -		if (pte_dirty(pteval))
> > -			folio_mark_dirty(folio);
> > -
> > -		/*
> > -		 * Check that our target page is still mapped at the expected
> > -		 * address.
> > -		 */
> > -		if (args->mm == mm && args->address == address &&
> > -		    pte_write(pteval))
> > -			args->valid = true;
> > -
> > -		/*
> > -		 * Store the pfn of the page in a special migration
> > -		 * pte. do_swap_page() will wait until the migration
> > -		 * pte is removed and then restart fault handling.
> > -		 */
> > -		if (pte_write(pteval))
> > -			entry = make_writable_device_exclusive_entry(
> > -							page_to_pfn(subpage));
> > -		else
> > -			entry = make_readable_device_exclusive_entry(
> > -							page_to_pfn(subpage));
> > -		swp_pte = swp_entry_to_pte(entry);
> > -		if (pte_soft_dirty(pteval))
> > -			swp_pte = pte_swp_mksoft_dirty(swp_pte);
> > -		if (pte_uffd_wp(pteval))
> > -			swp_pte = pte_swp_mkuffd_wp(swp_pte);
> > -
> > -		set_pte_at(mm, address, pvmw.pte, swp_pte);
> > -
> > -		/*
> > -		 * There is a reference on the page for the swap entry which has
> > -		 * been removed, so shouldn't take another.
> > -		 */
> > -		folio_remove_rmap_pte(folio, subpage, vma);
> > -	}
> > -
> > -	mmu_notifier_invalidate_range_end(&range);
> > -
> > -	return ret;
> > -}
> > -
> > -/**
> > - * folio_make_device_exclusive - Mark the folio exclusively owned by a device.
> > - * @folio: The folio to replace page table entries for.
> > - * @mm: The mm_struct where the folio is expected to be mapped.
> > - * @address: Address where the folio is expected to be mapped.
> > - * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier callbacks
> > - *
> > - * Tries to remove all the page table entries which are mapping this
> > - * folio and replace them with special device exclusive swap entries to
> > - * grant a device exclusive access to the folio.
> > - *
> > - * Context: Caller must hold the folio lock.
> > - * Return: false if the page is still mapped, or if it could not be unmapped
> > - * from the expected address. Otherwise returns true (success).
> > - */
> > -static bool folio_make_device_exclusive(struct folio *folio,
> > -		struct mm_struct *mm, unsigned long address, void *owner)
> > -{
> > -	struct make_exclusive_args args = {
> > -		.mm = mm,
> > -		.address = address,
> > -		.owner = owner,
> > -		.valid = false,
> > -	};
> > -	struct rmap_walk_control rwc = {
> > -		.rmap_one = page_make_device_exclusive_one,
> > -		.done = folio_not_mapped,
> > -		.anon_lock = folio_lock_anon_vma_read,
> > -		.arg = &args,
> > -	};
> > -
> > -	rmap_walk(folio, &rwc);
> > -
> > -	return args.valid && !folio_mapcount(folio);
> > -}
> > -
> >  /**
> >   * make_device_exclusive() - Mark an address for exclusive use by a device
> >   * @mm: mm_struct of associated target process
> > @@ -2530,9 +2405,12 @@ static bool folio_make_device_exclusive(struct folio *folio,
> >  struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
> >  		void *owner, struct folio **foliop)
> >  {
> > -	struct folio *folio;
> > +	struct folio *folio, *fw_folio;
> > +	struct vm_area_struct *vma;
> > +	struct folio_walk fw;
> >  	struct page *page;
> > -	long npages;
> > +	swp_entry_t entry;
> > +	pte_t swp_pte;
> >  
> >  	mmap_assert_locked(mm);
> >  
> > @@ -2540,12 +2418,16 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
> >  	 * Fault in the page writable and try to lock it; note that if the
> >  	 * address would already be marked for exclusive use by the device,
> >  	 * the GUP call would undo that first by triggering a fault.
> > +	 *
> > +	 * If any other device would already map this page exclusively, the
> > +	 * fault will trigger a conversion to an ordinary
> > +	 * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE.
> >  	 */
> > -	npages = get_user_pages_remote(mm, addr, 1,
> > -				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> > -				       &page, NULL);
> > -	if (npages != 1)
> > -		return ERR_PTR(npages);
> > +	page = get_user_page_vma_remote(mm, addr,
> > +					FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> > +					&vma);
> > +	if (IS_ERR(page))
> > +		return page;
> >  	folio = page_folio(page);
> >  
> >  	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
> > @@ -2558,11 +2440,51 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
> >  		return ERR_PTR(-EBUSY);
> >  	}
> >  
> > -	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
> > +	/*
> > +	 * Let's do a second walk and make sure we still find the same page
> > +	 * mapped writable. If we don't find what we expect, we will trigger
> > +	 * GUP again to fix it up. Note that a page of an anonymous folio can
> > +	 * only be mapped writable using exactly one page table mapping
> > +	 * ("exclusive"), so there cannot be other mappings.
> > +	 */
> > +	fw_folio = folio_walk_start(&fw, vma, addr, 0);
> > +	if (fw_folio != folio || fw.page != page ||
> > +	    fw.level != FW_LEVEL_PTE || !pte_write(fw.pte)) {
> > +		if (fw_folio)
> > +			folio_walk_end(&fw, vma);
> >  		folio_unlock(folio);
> >  		folio_put(folio);
> >  		return ERR_PTR(-EBUSY);
> >  	}
> > +
> > +	/* Nuke the page table entry so we get the uptodate dirty bit. */
> > +	flush_cache_page(vma, addr, page_to_pfn(page));
> > +	fw.pte = ptep_clear_flush(vma, addr, fw.ptep);
> > +
> > +	/* Set the dirty flag on the folio now the pte is gone. */
> > +	if (pte_dirty(fw.pte))
> > +		folio_mark_dirty(folio);
> > +
> > +	/*
> > +	 * Store the pfn of the page in a special device-exclusive non-swap pte.
> > +	 * do_swap_page() will trigger the conversion back while holding the
> > +	 * folio lock.
> > +	 */
> > +	entry = make_writable_device_exclusive_entry(page_to_pfn(page));
> > +	swp_pte = swp_entry_to_pte(entry);
> > +	if (pte_soft_dirty(fw.pte))
> > +		swp_pte = pte_swp_mksoft_dirty(swp_pte);
> > +	/* The pte is writable, uffd-wp does not apply. */
> > +	set_pte_at(mm, addr, fw.ptep, swp_pte);
> > +
> > +	/*
> > +	 * TODO: The device-exclusive non-swap PTE holds a folio reference but
> > +	 * does not count as a mapping (mapcount), which is wrong and must be
> > +	 * fixed, otherwise RMAP walks don't behave as expected.
> > +	 */
> > +	folio_remove_rmap_pte(folio, page, vma);
> > +
> > +	folio_walk_end(&fw, vma);
> >  	*foliop = folio;
> >  	return page;
> >  }
> > -- 
> > 2.48.1
> >
>

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch