Date: Wed, 17 May 2023 18:12:33 -0400
From: Peter Xu
To: James Houghton
Cc: Axel Rasmussen, Alexander Viro, Andrew Morton, Christian Brauner,
    David Hildenbrand, Hongchen Zhang, Huang Ying, "Liam R. Howlett",
    Miaohe Lin, "Mike Rapoport (IBM)", Nadav Amit, Naoya Horiguchi,
    Shuah Khan, ZhangPeng, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org, Anish Moorthy
Subject: Re: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl
References: <20230511182426.1898675-1-axelrasmussen@google.com>

On Thu, May 11, 2023 at 03:00:09PM -0700, James Houghton wrote:
> On Thu, May 11, 2023 at 11:24 AM Axel Rasmussen wrote:
> >
> > So the basic way to use this new feature is:
> >
> > - On the new host, the guest's memory is registered with userfaultfd, in
> >   either MISSING or MINOR mode (doesn't really matter for this purpose).
> > - On any first access, we get a userfaultfd event. At this point we can
> >   communicate with the old host to find out if the page was poisoned.
> > - If so, we can respond with a UFFDIO_SIGBUS - this places a swap marker
> >   so any future accesses will SIGBUS. Because the pte is now "present",
> >   future accesses won't generate more userfaultfd events, they'll just
> >   SIGBUS directly.
>
> I want to clarify the SIGBUS mechanism here when KVM is involved,
> keeping in mind that we need to be able to inject an MCE into the
> guest for this to be useful.
>
> 1. vCPU gets an EPT violation --> KVM attempts GUP.
> 2. GUP finds a PTE_MARKER_UFFD_SIGBUS and returns VM_FAULT_SIGBUS.
> 3. KVM finds that GUP failed and returns -EFAULT.
>
> This is different than if GUP found poison, in which case KVM will
> actually queue up a SIGBUS *containing the address of the fault*, and
> userspace can use it to inject an appropriate MCE into the guest. With
> UFFDIO_SIGBUS, we are missing the address!
>
> I see three options:
> 1. Make KVM_RUN queue up a signal for any VM_FAULT_SIGBUS. I think
>    this is pointless.
> 2. Don't have UFFDIO_SIGBUS install a PTE entry, but instead have a
>    UFFDIO_WAKE_MODE_SIGBUS, where upon waking, we return VM_FAULT_SIGBUS
>    instead of VM_FAULT_RETRY. We will keep getting userfaults on repeated
>    accesses, just like how we get repeated signals for real poison.
> 3. Use this in conjunction with the additional KVM EFAULT info that
>    Anish proposed (the first part of [1]).
>
> I think option 3 is fine. :)

Or... option 4) just use either MADV_HWPOISON or hwpoison-inject? :)

Besides what James mentioned about the "missing addr", I don't yet see
what the major difference is compared to the old hwpoison injection
methods, even without the addr requirement. If we want the addr for MCE
then it's more of a question to ask.

I also don't quite see why any new way to inject a pte error would need
the range to be registered with uffd. Could it be something like
MADV_PGERR (even if MADV_HWPOISON won't suffice), so you can inject even
without a userfault context (but still usable when uffd is registered)?

And it'd always be nice to have a cover letter too (if there'll be a new
version) explaining the bits.

Thanks,

-- 
Peter Xu
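
For reference, the existing MADV_HWPOISON route mentioned as "option 4" can be exercised from userspace with a plain madvise() call, with no userfaultfd registration involved. The snippet below is only a minimal sketch of that call, not part of the patch under discussion: the function name `try_madv_hwpoison` and its string return values are made up for illustration. It assumes Linux and Python 3.8+ (for `mmap.madvise`); the call normally needs CAP_SYS_ADMIN and a kernel built with CONFIG_MEMORY_FAILURE. On success the kernel really does mark the backing page as poisoned, so this is something for a test machine only.

```python
import errno
import mmap


def try_madv_hwpoison():
    """Try to poison one private anonymous page with MADV_HWPOISON.

    Returns "poisoned" on success, "unavailable" when this Python/OS does
    not expose the call, or the errno name (e.g. "EPERM") on refusal.
    These return values are illustrative, not any kernel interface.
    """
    advice = getattr(mmap, "MADV_HWPOISON", None)
    if advice is None or not hasattr(mmap.mmap, "madvise"):
        return "unavailable"  # not Linux, or Python older than 3.8

    region = mmap.mmap(-1, mmap.PAGESIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        region[0] = 1  # fault the page in so there is a page to poison
        try:
            region.madvise(advice, 0, mmap.PAGESIZE)
        except OSError as exc:
            # Typically EPERM without CAP_SYS_ADMIN, EINVAL without
            # CONFIG_MEMORY_FAILURE.
            return errno.errorcode.get(exc.errno, str(exc.errno))
        # Deliberately no access to region[0] after this point: touching
        # a poisoned page raises SIGBUS in the accessing thread.
        return "poisoned"
    finally:
        region.close()


if __name__ == "__main__":
    print(try_madv_hwpoison())
```

Run unprivileged, this would normally just report EPERM. A VMM that actually wants the fault address would additionally install a SIGBUS handler and read `si_addr` from the siginfo, which the sketch leaves out.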