Date: Mon, 3 Mar 2025 16:29:29 -0500
From: Peter Xu
To: Nikita Kalyazin
Cc: akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org,
    kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    lorenzo.stoakes@oracle.com, david@redhat.com, ryan.roberts@arm.com,
    quic_eberman@quicinc.com, jthoughton@google.com, graf@amazon.de,
    jgowans@amazon.com, roypat@amazon.co.uk, derekmn@amazon.com,
    nsaenz@amazon.es, xmarcalx@amazon.com
Subject: Re: [RFC PATCH 0/5] KVM: guest_memfd: support for uffd missing
References: <20250303133011.44095-1-kalyazin@amazon.com>
In-Reply-To: <20250303133011.44095-1-kalyazin@amazon.com>

On Mon, Mar 03, 2025 at 01:30:06PM +0000, Nikita Kalyazin wrote:
> This series is built on top of the v3 write syscall support [1].
>
> With James's KVM userfault [2], it is possible to handle stage-2 faults
> in guest_memfd in userspace.  However, KVM itself also triggers faults
> in guest_memfd in some cases, for example: PV interfaces like kvmclock,
> PV EOI and page table walking code when fetching the MMIO instruction on
> x86.  It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3]
> that KVM would be accessing those pages via userspace page tables.  In
> order for such faults to be handled in userspace, guest_memfd needs to
> support userfaultfd.
>
> This series proposes a limited support for userfaultfd in guest_memfd:
>  - userfaultfd support is conditional to `CONFIG_KVM_GMEM_SHARED_MEM`
>    (as is fault support in general)
>  - Only `page missing` event is currently supported
>  - Userspace is supposed to respond to the event with the `write`
>    syscall followed by `UFFDIO_CONTINUE` ioctl to unblock the faulting
>    process.  Note that we can't use `UFFDIO_COPY` here because
>    userfaultfd code does not know how to prepare guest_memfd pages, eg
>    remove them from direct map [4].
>
> Not included in this series:
>  - Proper interface for userfaultfd to recognise guest_memfd mappings
>  - Proper handling of truncation cases after locking the page
>
> Request for comments:
>  - Is it a sensible workflow for guest_memfd to resolve a userfault
>    `page missing` event with `write` syscall + `UFFDIO_CONTINUE`?  One
>    of the alternatives is teaching `UFFDIO_COPY` how to deal with
>    guest_memfd pages.

Probably not..  I don't see what prevents a thread that faults concurrently
with an in-progress write() from seeing partial data.  Since you check the
page cache, the fault will be let through, but what gets faulted in is the
partially written page.

I think we may need to go with either full MISSING or full MINOR traps.

One thing to mention is we probably need MINOR sooner or later to support
gmem huge pages.  The thing is, for huge folios in gmem we can't rely on the
page being missing from the page cache, as we always need to allocate in
hugetlb sizes.

>  - What is a way forward to make userfaultfd code aware of guest_memfd?
>    I saw that Patrick hit a somewhat similar problem in [5] when trying
>    to use direct map manipulation functions in KVM and was pointed by
>    David at Elliot's guestmem library [6] that might include a shim for that.
>    Would the library be the right place to expose required interfaces like
>    `vma_is_gmem`?

Not sure what's best to do, but IIUC the approach this series currently
takes may not work as long as it references a KVM symbol from core mm..

One trick I've used so far is leveraging vm_ops and providing a hook
function to report the special properties when it's gmem.  In general, I
haven't yet dared to overload vm_area_struct, but I'm thinking a vm_ops
hook is more likely to be accepted.  E.g.
something like this:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5e742738240c..b068bb79fdbc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -653,8 +653,26 @@ struct vm_operations_struct {
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
 					  unsigned long addr);
+	/*
+	 * When set, return the allowed orders bitmask in faults of mmap()
+	 * ranges (e.g. for follow up huge_fault() processing).  Drivers
+	 * can use this to bypass THP setups for specific types of VMAs.
+	 */
+	unsigned long (*get_supported_orders)(struct vm_area_struct *vma);
 };
 
+static inline bool vma_has_supported_orders(struct vm_area_struct *vma)
+{
+	return vma->vm_ops && vma->vm_ops->get_supported_orders;
+}
+
+static inline unsigned long vma_get_supported_orders(struct vm_area_struct *vma)
+{
+	if (!vma_has_supported_orders(vma))
+		return 0;
+	return vma->vm_ops->get_supported_orders(vma);
+}
+

In my case I used that to let gmem report huge page support on faults.

That said, the above has only existed in my own tree so far, so I also
don't know whether something like that could be accepted (even if it works
for you).

Thanks,

-- 
Peter Xu
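
For illustration only, the optional-hook pattern from the diff can be
modelled in plain userspace C roughly as below.  This is a sketch under
assumptions: the two structs are trimmed stand-ins rather than the kernel
definitions, and gmem_get_supported_orders(), gmem_vm_ops and the choice of
orders 0 and 9 are hypothetical names and values that do not come from any
tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Trimmed userspace stand-ins; NOT the real kernel definitions. */
struct vm_area_struct;

struct vm_operations_struct {
	/* Optional hook: bitmask of supported orders (bit N set = order N). */
	unsigned long (*get_supported_orders)(struct vm_area_struct *vma);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Same logic as the helpers in the diff: the hook is optional. */
static inline bool vma_has_supported_orders(struct vm_area_struct *vma)
{
	return vma->vm_ops && vma->vm_ops->get_supported_orders;
}

static inline unsigned long vma_get_supported_orders(struct vm_area_struct *vma)
{
	if (!vma_has_supported_orders(vma))
		return 0;
	return vma->vm_ops->get_supported_orders(vma);
}

/*
 * Hypothetical provider side: the owner of the mapping fills in the
 * hook, so generic code never has to reference the owner's symbols.
 * Orders 0 and 9 are arbitrary example values.
 */
static unsigned long gmem_get_supported_orders(struct vm_area_struct *vma)
{
	(void)vma;
	return (1UL << 0) | (1UL << 9);
}

static const struct vm_operations_struct gmem_vm_ops = {
	.get_supported_orders = gmem_get_supported_orders,
};
```

A VMA whose vm_ops points at gmem_vm_ops reports the bitmask, while a VMA
with no vm_ops (or with the hook left NULL) reports 0, so the generic side
only ever calls through the function pointer.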