From: James Houghton
Date: Thu, 11 May 2023 15:00:09 -0700
Subject: Re: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl
To: Axel Rasmussen
Cc: Alexander Viro, Andrew Morton, Christian Brauner, David Hildenbrand,
    Hongchen Zhang, Huang Ying, "Liam R. Howlett", Miaohe Lin,
    "Mike Rapoport (IBM)", Nadav Amit, Naoya Horiguchi, Peter Xu,
    Shuah Khan, ZhangPeng, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org, Anish Moorthy
In-Reply-To: <20230511182426.1898675-1-axelrasmussen@google.com>

On Thu, May 11, 2023 at 11:24 AM Axel Rasmussen wrote:
>
> So the basic way to use this new feature is:
>
> - On the new host, the guest's memory is registered with userfaultfd, in
>   either MISSING or MINOR mode (doesn't really matter for this purpose).
> - On any first access, we get a userfaultfd event. At this point we can
>   communicate with the old host to find out if the page was poisoned.
> - If so, we can respond with a UFFDIO_SIGBUS - this places a swap marker
>   so any future accesses will SIGBUS. Because the pte is now "present",
>   future accesses won't generate more userfaultfd events, they'll just
>   SIGBUS directly.

I want to clarify the SIGBUS mechanism here when KVM is involved,
keeping in mind that we need to be able to inject an MCE into the
guest for this to be useful.

1. vCPU gets an EPT violation --> KVM attempts GUP.
2. GUP finds a PTE_MARKER_UFFD_SIGBUS and returns VM_FAULT_SIGBUS.
3. KVM finds that GUP failed and returns -EFAULT.

This is different than if GUP found poison, in which case KVM will
actually queue up a SIGBUS *containing the address of the fault*, and
userspace can use it to inject an appropriate MCE into the guest. With
UFFDIO_SIGBUS, we are missing the address!

I see three options:

1. Make KVM_RUN queue up a signal for any VM_FAULT_SIGBUS. I think
   this is pointless.
2. Don't have UFFDIO_SIGBUS install a PTE entry, but instead have a
   UFFDIO_WAKE_MODE_SIGBUS, where upon waking, we return VM_FAULT_SIGBUS
   instead of VM_FAULT_RETRY. We will keep getting userfaults on repeated
   accesses, just like how we get repeated signals for real poison.
3. Use this in conjunction with the additional KVM EFAULT info that
   Anish proposed (the first part of [1]).

I think option 3 is fine. :)

[1]: https://lore.kernel.org/kvm/20230412213510.1220557-1-amoorthy@google.com/

- James
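
To make the missing-address problem concrete, here is a toy model of the two fault paths described above. It is only a sketch, not kernel or KVM code: `Pte`, `Exit`, `vcpu_fault`, `can_inject_mce` and the `efault_info` flag are hypothetical names invented for illustration. The flag stands in for the extra EFAULT information from [1] (option 3), assuming that information includes the faulting address.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Pte(Enum):
    """What GUP finds at the faulting address (toy model only)."""
    HW_POISON = auto()           # real hardware poison
    UFFD_SIGBUS_MARKER = auto()  # marker left by the proposed UFFDIO_SIGBUS


@dataclass(frozen=True)
class Exit:
    """What userspace observes after the vCPU fault (hypothetical type)."""
    kind: str            # "SIGBUS" or "EFAULT"
    addr: Optional[int]  # faulting address, if userspace is told about it


def vcpu_fault(pte: Pte, addr: int, efault_info: bool = False) -> Exit:
    """Model the outcome of an EPT violation that ends in a failed GUP."""
    if pte is Pte.HW_POISON:
        # Poison path: a SIGBUS carrying the fault address is queued.
        return Exit("SIGBUS", addr)
    # Marker path: GUP fails and KVM returns -EFAULT. The address is only
    # available if the extra EFAULT info is present (assumed for option 3).
    return Exit("EFAULT", addr if efault_info else None)


def can_inject_mce(exit_: Exit) -> bool:
    """Userspace needs the address to build an MCE for the guest."""
    return exit_.addr is not None


if __name__ == "__main__":
    for label, result in [
        ("real poison", vcpu_fault(Pte.HW_POISON, 0x1000)),
        ("marker, no extra info", vcpu_fault(Pte.UFFD_SIGBUS_MARKER, 0x1000)),
        ("marker, option 3", vcpu_fault(Pte.UFFD_SIGBUS_MARKER, 0x1000, True)),
    ]:
        print(label, result, "MCE possible:", can_inject_mce(result))
```

In this model only the middle case leaves userspace unable to inject an MCE, which is the gap option 3 is meant to close.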