From: David Hildenbrand
Organization: Red Hat
To: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Axel Rasmussen, Mike Rapoport, Andrew Morton, Andrea Arcangeli, Nadav Amit, Muhammad Usama Anjum
Subject: Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE
Date: Thu, 16 Feb 2023 11:47:23 +0100
Message-ID: <7eb2bce9-d0b1-a0e3-8be3-f28d858a61a0@redhat.com>
In-Reply-To: <20230215210257.224243-1-peterx@redhat.com>
References: <20230215210257.224243-1-peterx@redhat.com>

On 15.02.23 22:02, Peter Xu wrote:
> This is a new feature that controls how uffd-wp handles zero pages (aka,
> empty ptes), majorly for anonymous pages only.
>
> Note, here we used "zeropage" as a replacement of "empty pte" just to avoid
> introducing the pte idea into uapi, since "zero page" is more well known to
> an user app developer.
>
> File memories handles none ptes consistently by allowing wr-protecting of
> none ptes because of the unawareness of page cache being exist or not. For
> anonymous it was not as persistent because we used to assume that we don't
> need protections on none ptes or known zero pages.
>
> But it's actually not true.
>
> One use case was VM live snapshot, where if without wr-protecting empty
> ptes the snapshot can contain random rubbish in the holes of the anonymous
> memory, which can cause misbehave of the guest when the guest assumes the
> pages should (and were) all zeros.
>
> QEMU worked it around by pre-populate the section with reads to fill in
> zero page entries before starting the whole snapshot process [1].
>
> Recently there's another need that raised on using userfaultfd wr-protect
> for detecting dirty pages (to replace soft-dirty) [2]. In that case if
> without being able to wr-protect zero pages by default, the dirty info can
> get lost as long as a zero page is written, even after the tracking was
> started.
>
> In general, we want to be able to wr-protect empty ptes too even for
> anonymous.
>
> This patch implements UFFD_FEATURE_WP_ZEROPAGE so that it'll make uffd-wp
> handling on zeropage being consistent no matter what the memory type is
> underneath. It doesn't have any impact on file memories so far because we
> already have pte markers taking care of that. So it only affects
> anonymous.
>
> One way to implement this is to also install pte markers for anonymous
> memories. However here we can actually do better (than i.e. shmem) because
> we know there's no page that is backing the pte, so the better solution is
> to directly install a zeropage read-only pte, so that if there'll be a
> upcoming read it'll not trigger a fault at all. It will also reduce the
> changeset to implement this feature too.
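
For readers unfamiliar with the read-based pre-population mentioned in the quoted text, here is a rough user-space sketch of the idea. It is not QEMU's code: the helper name prefault_with_reads and the 64-page region size are made up for illustration, and it assumes a Linux host where a read fault on private anonymous memory maps the shared zeropage. It only touches one byte per page so that every page takes a read fault before any write-protection would be armed.

```python
import mmap

PAGE = mmap.PAGESIZE  # base page size reported by the host


def prefault_with_reads(mapping, length):
    """Read one byte from every page so that each page takes a read fault.

    Returns the number of pages whose first byte was not zero; for a fresh
    anonymous mapping this is expected to be 0.
    """
    nonzero = 0
    for offset in range(0, length, PAGE):
        if mapping[offset] != 0:
            nonzero += 1
    return nonzero


# Hypothetical 64-page private anonymous region standing in for guest RAM.
length = 64 * PAGE
region = mmap.mmap(
    -1,
    length,
    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
    prot=mmap.PROT_READ | mmap.PROT_WRITE,
)
nonzero_pages = prefault_with_reads(region, length)
```

The cost of this approach is one fault per base page across the whole range, even for memory that is never touched again.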
>

There are various reasons why I think a UFFD_FEATURE_WP_UNPOPULATED, using
PTE markers, would be more beneficial:

1) It would be applicable to anon hugetlb
2) It would be applicable even when the zeropage is disallowed
   (mm_forbids_zeropage())
3) It would be possible to optimize even without the huge zeropage, by
   using a PMD marker.
4) It would be possible to optimize even on the PUD level using a PUD
   marker.

Especially when uffd-wp'ing large ranges that are possibly all unpopulated
(thinking about the existing VM background snapshot use case either with
untouched memory or with things like free page reporting), we might neither
be reading nor writing that memory any time soon.

-- 
Thanks,

David / dhildenb
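
To put rough numbers on points 3) and 4), the following back-of-the-envelope sketch counts how many page-table entries would have to be touched to cover an unpopulated range at each level. It is plain arithmetic, not kernel code: the function entries_needed is hypothetical, and the geometry (4 KiB base pages, 512 entries per table, so 2 MiB per PMD entry and 1 GiB per PUD entry) is an assumed x86-64 layout that the thread does not state.

```python
# Assumed x86-64 geometry with 4 KiB base pages (an assumption, not from the thread).
PTE_SIZE = 4 * 1024          # bytes mapped by one PTE
PMD_SIZE = 512 * PTE_SIZE    # 2 MiB mapped by one PMD entry
PUD_SIZE = 512 * PMD_SIZE    # 1 GiB mapped by one PUD entry


def entries_needed(start, length, allow_pmd=True, allow_pud=True):
    """Count entries needed to cover [start, start + length).

    Greedily uses the largest permitted level whose entry is aligned at the
    current address and fits entirely inside the remaining range.
    """
    assert start % PTE_SIZE == 0 and length % PTE_SIZE == 0
    counts = {"pud": 0, "pmd": 0, "pte": 0}
    addr, end = start, start + length
    while addr < end:
        if allow_pud and addr % PUD_SIZE == 0 and end - addr >= PUD_SIZE:
            counts["pud"] += 1
            addr += PUD_SIZE
        elif allow_pmd and addr % PMD_SIZE == 0 and end - addr >= PMD_SIZE:
            counts["pmd"] += 1
            addr += PMD_SIZE
        else:
            counts["pte"] += 1
            addr += PTE_SIZE
    return counts


one_gib = 1 << 30
pte_only = entries_needed(0, one_gib, allow_pmd=False, allow_pud=False)
with_pmd = entries_needed(0, one_gib, allow_pud=False)
with_pud = entries_needed(0, one_gib)
```

Under these assumptions, an aligned 1 GiB hole needs 262144 entries at the PTE level, 512 at the PMD level, and a single entry at the PUD level.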