Message-ID: <10ffac79-0dba-4c30-991e-f3ca2b5ff639@redhat.com>
Date: Fri, 8 Nov 2024 18:31:47 +0100
Subject: Re: [RFC PATCH 0/4] Add fbind() and NUMA mempolicy support for KVM guest_memfd
To: Matthew Wilcox, Shivank Garg
Cc: x86@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
 jack@suse.cz, akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 kvm@vger.kernel.org, chao.gao@intel.com, pgonda@google.com,
 thomas.lendacky@amd.com, seanjc@google.com, luto@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, arnd@arndb.de, kees@kernel.org,
 bharata@amd.com, nikunj@amd.com, michael.day@amd.com,
 Neeraj.Upadhyay@amd.com, linux-coco@lists.linux.dev, Linux API
References: <20241105164549.154700-1-shivankg@amd.com>
 <6004eaa4-934c-48f4-b502-cf7e436462fc@amd.com>
From: Paolo Bonzini

On 11/7/24 16:10, Matthew Wilcox wrote:
> On Thu, Nov 07, 2024 at 02:24:20PM +0530, Shivank Garg wrote:
>> The folio allocation path from guest_memfd typically looks like this...
>>
>> kvm_gmem_get_folio
>>   filemap_grab_folio
>>     __filemap_get_folio
>>       filemap_alloc_folio
>>         __folio_alloc_node_noprof
>>           -> goes to the buddy allocator
>>
>> Hence, I am trying to have a version of filemap_alloc_folio() that takes an mpol.
>
> It only takes that path if cpuset_do_page_mem_spread() is true.  Is the
> real problem that you're trying to solve that cpusets are being used
> incorrectly?

If it's false it's not very different: it goes to alloc_pages_noprof().
Then it respects the process's policy, but the policy is not
customizable without mucking with state that is global to the process.

Taking a step back: the problem is that a VM can be configured to have
multiple guest-side NUMA nodes, each of which should pick its memory
from the right NUMA node in the host.  Without a per-file operation
it's not possible to do this on guest_memfd.

The earlier discussion was about whether to use ioctl() or a new system
call.  It ended with the idea of posting a *proposal* asking for
*comments* as to whether the system call would be useful in general
beyond KVM.

Commenting on the system call itself, I am not sure I like the
file_operations entry, though I understand that it's the simplest way
to implement this in an RFC series.  It's a bit surprising that fbind()
is a total no-op for everything except KVM's guest_memfd.

Maybe whatever you pass to fbind() could be stored in the struct file *,
and used as the default when creating VMAs; as if every mmap() was
followed by an mbind(), except that it also does the right thing with
MAP_POPULATE for example.  Or maybe that's a horrible idea?

Adding linux-api to get input; original thread is at
https://lore.kernel.org/kvm/20241105164549.154700-1-shivankg@amd.com/.

Paolo

> Backing up, it seems like you want to make a change to the page cache,
> you've had a long discussion with people who aren't the page cache
> maintainer, and you all understand the pros and cons of everything,
> and here you are dumping a solution on me without talking to me, even
> though I was at Plumbers, you didn't find me to tell me I needed to go
> to your talk.
>
> So you haven't explained a damned thing to me, and I'm annoyed at you.
> Do better.  Starting with your cover letter.
>
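
The idea floated in the reply, keeping whatever was passed to fbind() in
the struct file and using it as the default for new VMAs, amounts to a
lookup order.  Below is a minimal userspace sketch of that order in C.
It assumes that a VMA-level policy (inherited from the file at mmap()
time, or set by a later mbind()) wins over the task policy, which in
turn wins over a system default.  Every name in it (toy_policy,
toy_file, toy_vma, toy_task, toy_mmap, toy_mbind, toy_effective_policy)
is hypothetical and only models the idea; none of it is kernel code or
part of the RFC series.

```c
#include <stddef.h>

/* Toy stand-ins for NUMA policy modes; these are not the kernel's MPOL_* values. */
enum toy_mode { TOY_DEFAULT, TOY_BIND, TOY_INTERLEAVE };

struct toy_policy {
    enum toy_mode mode;
    unsigned long nodes;    /* bitmask of allowed NUMA nodes */
};

/* Hypothetical: the policy that an fbind()-like call stored on the file. */
struct toy_file { const struct toy_policy *fbind_policy; };

/* Policy attached to one mapping; NULL means "nothing set here". */
struct toy_vma { const struct toy_policy *policy; };

/* Process-wide policy; NULL means "nothing set here". */
struct toy_task { const struct toy_policy *policy; };

static const struct toy_policy toy_system_default = { TOY_DEFAULT, 0 };

/* Model of mmap(): the new mapping starts out with the file's stored policy. */
static struct toy_vma toy_mmap(const struct toy_file *file)
{
    struct toy_vma vma = { file->fbind_policy };
    return vma;
}

/* Model of mbind(): an explicit per-mapping policy replaces the inherited one. */
static void toy_mbind(struct toy_vma *vma, const struct toy_policy *policy)
{
    vma->policy = policy;
}

/* Resolve the policy for an allocation: mapping first, then task, then default. */
static const struct toy_policy *
toy_effective_policy(const struct toy_vma *vma, const struct toy_task *task)
{
    if (vma->policy)
        return vma->policy;
    if (task->policy)
        return task->policy;
    return &toy_system_default;
}
```

In this model, a file whose stored policy binds to node 1 yields
mappings that resolve to that policy without any further call, a later
toy_mbind() on one mapping replaces it for that mapping only, and a file
with no stored policy falls back to the task policy or the default.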