From: Axel Rasmussen
Date: Thu, 11 May 2023 13:40:16 -0700
Subject: Re: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl
To: Mike Kravetz
Cc: Alexander Viro, Andrew Morton, Christian Brauner, David Hildenbrand,
 Hongchen Zhang, Huang Ying, James Houghton, "Liam R. Howlett", Miaohe Lin,
 "Mike Rapoport (IBM)", Nadav Amit, Naoya Horiguchi, Peter Xu, Shuah Khan,
 ZhangPeng, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org
References: <20230511182426.1898675-1-axelrasmussen@google.com> <20230511202243.GA5466@monkey>
In-Reply-To: <20230511202243.GA5466@monkey>

On Thu, May 11, 2023 at 1:29 PM Mike Kravetz wrote:
>
> On 05/11/23 11:24, Axel Rasmussen wrote:
> > The basic idea here is to "simulate" memory poisoning for VMs. A VM
> > running on some host might encounter a memory error, after which some
> > page(s) are poisoned (i.e., future accesses SIGBUS). They expect that
> > once poisoned, pages can never become "un-poisoned". So, when we live
> > migrate the VM, we need to preserve the poisoned status of these pages.
> >
> > When live migrating, we try to get the guest running on its new host as
> > quickly as possible. So, we start it running before all memory has been
> > copied, and before we're certain which pages should be poisoned or not.
> >
> > So the basic way to use this new feature is:
> >
> > - On the new host, the guest's memory is registered with userfaultfd, in
> >   either MISSING or MINOR mode (doesn't really matter for this purpose).
> > - On any first access, we get a userfaultfd event. At this point we can
> >   communicate with the old host to find out if the page was poisoned.
>
> Just curious, what is this communication channel with the old host?

James can probably describe it in more detail / more correctly than I
can. My (possibly wrong :) ) understanding is:

On the source machine we maintain a bitmap indicating which pages are
clean, dirty (meaning, modified after the initial "precopy" of memory
to the target machine), or poisoned. Eventually the entire bitmap is
sent to the target machine, but this takes some time (maybe seconds on
large machines). After this point, though, we have all the information
we need, so we no longer need to communicate with the source to find
out the status of pages (although there may still be some memory
contents to finish copying over).

In the meantime, I think the target machine can also ask the source
machine about the status of individual pages (for quick on-demand
paging).
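
To make the per-page status tracking described above concrete, here is a
minimal Python sketch of such a map. Everything in it is an assumption for
illustration: the thread does not describe the real layout, so the 2-bit
encoding and the names PageStatus and PageStatusMap are hypothetical.

```python
from enum import IntEnum


class PageStatus(IntEnum):
    CLEAN = 0     # unchanged since the initial precopy
    DIRTY = 1     # modified after the precopy, must be sent again
    POISONED = 2  # hit a memory error on the source host


class PageStatusMap:
    """Hypothetical map packing one 2-bit status per guest page."""

    BITS = 2                  # assumed width: enough for three states
    PER_BYTE = 8 // BITS      # four pages per byte
    MASK = (1 << BITS) - 1

    def __init__(self, num_pages):
        self.num_pages = num_pages
        nbytes = (num_pages + self.PER_BYTE - 1) // self.PER_BYTE
        self._data = bytearray(nbytes)

    def _locate(self, page):
        # Translate a page index into (byte index, bit shift).
        if not 0 <= page < self.num_pages:
            raise IndexError(page)
        return page // self.PER_BYTE, (page % self.PER_BYTE) * self.BITS

    def set(self, page, status):
        byte, shift = self._locate(page)
        cleared = self._data[byte] & ~(self.MASK << shift) & 0xFF
        self._data[byte] = cleared | (int(status) << shift)

    def get(self, page):
        byte, shift = self._locate(page)
        return PageStatus((self._data[byte] >> shift) & self.MASK)

    def to_bytes(self):
        # Serialized form the source could send to the target in one piece.
        return bytes(self._data)

    @classmethod
    def from_bytes(cls, num_pages, raw):
        # Rebuild the map on the target from the received bytes.
        m = cls(num_pages)
        if len(raw) != len(m._data):
            raise ValueError("unexpected map size")
        m._data[:] = raw
        return m
```

Once the target has rebuilt the map with from_bytes(), a lookup such as
get(page) is purely local, which matches the point above that no further
status queries to the source are needed after the full transfer.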

As for the underlying mechanism, it's an internal protocol but the
publicly-available thing it's most similar to is probably gRPC [1]. At
a really basic level, we send binary serialized protocol buffers [2]
over the network in a request / response fashion.

[1] https://grpc.io/
[2] https://protobuf.dev/

> --
> Mike Kravetz
>
> > - If so, we can respond with a UFFDIO_SIGBUS - this places a swap marker
> >   so any future accesses will SIGBUS. Because the pte is now "present",
> >   future accesses won't generate more userfaultfd events, they'll just
> >   SIGBUS directly.
> >
> > UFFDIO_SIGBUS does not handle unmapping previously-present PTEs. This
> > isn't needed, because during live migration we want to intercept
> > all accesses with userfaultfd (not just writes, so WP mode isn't useful
> > for this). So whether minor or missing mode is being used (or both), the
> > PTE won't be present in any case, so handling that case isn't needed.
> >
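
The fault-handling flow quoted above can be modelled with a small Python
simulation. This is only a sketch of the decision logic: it makes no
userfaultfd calls at all, and the names SimulatedGuestMemory, Resolution and
fault_events are hypothetical, chosen for illustration.

```python
from enum import Enum


class Resolution(Enum):
    INSTALL_PAGE = "install page contents"
    SIGBUS_MARKER = "install SIGBUS marker"


class SimulatedGuestMemory:
    """Toy stand-in for a userfaultfd-registered range; not a kernel API."""

    def __init__(self, poisoned_on_source):
        # Pages the source host reported as poisoned (assumed known here).
        self.poisoned_on_source = set(poisoned_on_source)
        self.resolved = {}     # page -> Resolution, i.e. the "PTE" state
        self.fault_events = 0  # how many times the handler was invoked

    def _handle_fault(self, page):
        # First access only: decide how to resolve the fault.
        if page in self.poisoned_on_source:
            return Resolution.SIGBUS_MARKER
        return Resolution.INSTALL_PAGE

    def access(self, page):
        # An unresolved page raises a (simulated) userfaultfd event.
        if page not in self.resolved:
            self.fault_events += 1
            self.resolved[page] = self._handle_fault(page)
        # A resolved page never reaches the handler again.
        if self.resolved[page] is Resolution.SIGBUS_MARKER:
            return "SIGBUS"
        return "ok"
```

Accessing a poisoned page twice yields "SIGBUS" both times but increments
fault_events only once, mirroring the statement that later accesses SIGBUS
directly without generating more userfaultfd events.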