Date: Tue, 14 Jun 2022 21:56:11 +0300
From: Mike Rapoport
To: Nadav Amit
Cc: David Hildenbrand, Peter Xu, Linux MM, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen
Subject: Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG
References: <20220613204043.98432-1-namit@vmware.com> <3eea2e6e-1646-546a-d9ef-d30052c00c7d@redhat.com>

On Tue, Jun 14, 2022 at 09:18:43AM -0700, Nadav Amit wrote:
> On Jun 14, 2022, at 8:22 AM, David Hildenbrand wrote:
>
> > On 13.06.22 22:40, Nadav Amit wrote:
> >> From: Nadav Amit
> >>
> >> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
> >> takes ~600 cycles more than when the access-bit is set. At the same
> >> time, setting the access-bit for memory that is not used (e.g.,
> >> prefetched) can introduce greater overheads, as the prefetched memory is
> >> reclaimed later than it should be.
> >>
> >> Userfaultfd currently does not set the access-bit (excluding the
> >> huge-pages case). Arguably, it is best to let the uffd monitor control
> >> whether the access-bit should be set or not. The expected use is for the
> >> monitor to request userfaultfd to set the access-bit when the copy
> >> operation is done to resolve a page-fault, and not to set the young-bit
> >> when the memory is prefetched.
> >
> > Thinking out loud about existing users: postcopy live migration in QEMU
> > has two usages for placement of pages
> >
> > a) Resolving a fault.
> >    E.g., a VCPU might be waiting for resolution to
> >    make progress.
> > b) Background migration to converge without faults on all relevant
> >    pages.
> >
> > I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG, in b) we don't want it.
> >
> >
> > I wonder, however, instead of calling this "young", which implies what
> > the OS should or shouldn't do, to define this as a hint that the placed
> > page is very likely to be accessed next.
> >
> > I'm bad at naming, UFFDIO_COPY_MODE_ACCESS_LIKELY would express what I
> > have in mind.
>
> How about UFFDIO_COPY_MODE_WILLNEED_READ ?
>
> >
> >> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
> >> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
> >> unconditionally since the former is only used to resolve page-faults and
> >> the latter would not benefit from not setting the access-bit.
> >>
> >> Cc: Mike Kravetz
> >> Cc: Hugh Dickins
> >> Cc: Andrew Morton
> >> Cc: Axel Rasmussen
> >> Cc: Peter Xu
> >> Cc: David Hildenbrand
> >> Cc: Mike Rapoport
> >> Signed-off-by: Nadav Amit
> >>
> >> ---
> >>
> >> There are 2 possible enhancements:
> >>
> >> 1. Use the flag to decide on whether to mark the PTE as dirty (for
> >>    writable PTEs). I guess that setting the dirty-bit is as expensive as
> >>    setting the access-bit, and setting it introduces similar tradeoffs,
> >>    as mentioned above.
> >>
> >> 2. Introduce a similar mode for write-protect and use this information
> >>    for setting both the young and dirty bits. Makes one wonder whether
> >>    mprotect() should also set the bit in certain cases...
> >
> > I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
> > UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it could.
> >
> > For example, QEMU knows if a page fault it's resolving was due to a read
> > or a write fault and could use that information accordingly.
> > Of course, we don't completely know if we currently have a read fault,
> > if we could get a write fault immediately after.
> >
> > Especially in the context of UFFDIO_ZEROPAGE,
> > UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
> > instead populate an actual page and mark it accessed+dirty. I even have
> > a use case for that ;)
> >
> >
> > The kernel could decide how to treat these hints -- for example, if it
> > doesn't want user space to mess with access/dirty bits, it could just
> > mostly ignore the hints.
>
> I can do that. I think users can do the zero page-copy themselves today, but
> whatever you prefer.
>
> But, I cannot take it anymore: the list of arguments for uffd stuff is
> crazy. I would like to collect all the possible arguments that are used for
> uffd operation into some “struct uffd_op”.

Squashing boolean parameters into int flags will also reduce the insane
amount of parameters.
No strong feelings though.

> Any objection?

--
Sincerely yours,
Mike.
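
To make the "struct uffd_op" and flags ideas from the thread a bit more concrete, here is a rough, userspace-only sketch in C. Everything in it is an assumption made for illustration: the UFFD_OP_* bits, struct uffd_op_sketch and the helper names are hypothetical and are not taken from the patch or from the kernel's userfaultfd code. It only shows how separate boolean parameters could collapse into one flags word, and how read-likely/write-likely hints might map onto the young and dirty bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the names used by the patch. */
#define UFFD_OP_WP            (1u << 0) /* install the PTE write-protected */
#define UFFD_OP_ACCESS_LIKELY (1u << 1) /* hint: the page will be read soon */
#define UFFD_OP_WRITE_LIKELY  (1u << 2) /* hint: the page will be written soon */

static_assert((UFFD_OP_WP & UFFD_OP_ACCESS_LIKELY) == 0 &&
	      (UFFD_OP_WP & UFFD_OP_WRITE_LIKELY) == 0 &&
	      (UFFD_OP_ACCESS_LIKELY & UFFD_OP_WRITE_LIKELY) == 0,
	      "flag bits must be distinct");

/* Hypothetical bundle of the per-operation arguments. */
struct uffd_op_sketch {
	unsigned long dst_start;
	unsigned long src_start;
	unsigned long len;
	unsigned int flags;
};

/* Squash what would otherwise be three boolean parameters into one word. */
static inline unsigned int uffd_flags_from_bools(bool wp, bool access_likely,
						 bool write_likely)
{
	unsigned int flags = 0;

	if (wp)
		flags |= UFFD_OP_WP;
	if (access_likely)
		flags |= UFFD_OP_ACCESS_LIKELY;
	if (write_likely)
		flags |= UFFD_OP_WRITE_LIKELY;
	return flags;
}

/* Assumption: either hint is enough to justify setting the young bit. */
static inline bool uffd_op_wants_young(const struct uffd_op_sketch *op)
{
	return (op->flags & (UFFD_OP_ACCESS_LIKELY | UFFD_OP_WRITE_LIKELY)) != 0;
}

/* Assumption: only an expected write justifies setting the dirty bit. */
static inline bool uffd_op_wants_dirty(const struct uffd_op_sketch *op)
{
	return (op->flags & UFFD_OP_WRITE_LIKELY) != 0;
}
```

With something along these lines, a caller resolving a read fault would pass only the access-likely bit, while background prefetch would pass neither hint; the kernel would remain free to ignore the hints altogether.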