Message-ID: <92f2fd13-59f2-468d-d989-9b998a098795@redhat.com>
Date: Thu, 2 Mar 2023 18:38:20 +0100
Subject: Re: [PATCH v2] mm/uffd: UFFD_FEATURE_WP_UNPOPULATED
To: Muhammad Usama Anjum, Peter Xu
Cc: Andrea Arcangeli, Andrew Morton, Mike Rapoport, Axel Rasmussen,
 Nadav Amit, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 "kernel@collabora.com"
References: <20230227230044.1596744-1-peterx@redhat.com>
 <9aa69bfb-c726-ac2c-127a-b21fd35ab40b@collabora.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <9aa69bfb-c726-ac2c-127a-b21fd35ab40b@collabora.com>

On 02.03.23 18:19, Muhammad Usama Anjum wrote:
> On 2/28/23 5:36 AM, Peter Xu wrote:
>> On Mon, Feb 27, 2023 at 06:00:44PM -0500, Peter Xu wrote:
>>> This is a new feature that controls how uffd-wp handles none ptes.
>>> When it's set, the kernel will handle anonymous memory the same way as
>>> file memory, by allowing the user to wr-protect unpopulated ptes.
>>>
>>> File memory handles none ptes consistently by allowing them to be
>>> wr-protected, because we cannot know whether a page exists in the page
>>> cache. For anonymous memory it was not as consistent, because we used to
>>> assume that we don't need protection on none ptes or known zero pages.
>>>
>>> One use case for such a feature bit is VM live snapshot: without
>>> wr-protecting empty ptes, the snapshot can contain random rubbish in the
>>> holes of the anonymous memory, which can cause the guest to misbehave
>>> when the guest OS assumes the pages are all zeros.
>>>
>>> QEMU worked around it by pre-populating the section with reads to fill in
>>> zero page entries before starting the whole snapshot process [1].
>>>
>>> Recently another need was raised: using userfaultfd wr-protect for
>>> detecting dirty pages (to replace soft-dirty in some cases) [2]. In that
>>> case, without being able to wr-protect none ptes by default, the dirty
>>> info can get lost, since we cannot treat every none pte as dirty (the
>>> current design identifies a page as dirty based on the uffd-wp bit being
>>> cleared).
>>>
>>> In general, we want to be able to wr-protect empty ptes too, even for
>>> anonymous memory.
>>>
>>> This patch implements UFFD_FEATURE_WP_UNPOPULATED so that uffd-wp
>>> handling of none ptes is consistent no matter what the memory type is
>>> underneath. It doesn't have any impact on file memory so far because we
>>> already have pte markers taking care of that. So it only affects
>>> anonymous memory.
>>>
>>> The feature bit is off by default, so the old behavior will be maintained.
>>> Sometimes the old behavior may be wanted, because wr-protecting none ptes
>>> adds overhead not only during UFFDIO_WRITEPROTECT (by applying pte
>>> markers to anonymous memory), but also when creating the pgtables to
>>> store the pte markers. So there's potentially less chance of using thp on
>>> the first fault for a none pmd or larger than a pmd.
>>>
>>> The major implementation part is teaching the whole kernel to understand
>>> pte markers even for anonymously mapped ranges, while allowing the
>>> UFFDIO_WRITEPROTECT ioctl to apply pte markers to anonymous memory too
>>> when the new feature bit is set.
>>>
>>> Note that even if the patch subject starts with mm/uffd, there are a few
>>> small refactors to the major mm path that handles anonymous page faults.
>>> But they should be straightforward.
>>>
>>> So far, add a very light smoke test within the userfaultfd kselftest
>>> pagemap unit test to make sure anon pte markers work.
>>>
>>> [1] https://lore.kernel.org/all/20210401092226.102804-4-andrey.gruzdev@virtuozzo.com/
>>> [2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/
>>>
>>> Signed-off-by: Peter Xu
>>> ---
>>> v1->v2:
>>> - Use pte markers rather than populate zero pages when protect [David]
>>> - Rename WP_ZEROPAGE to WP_UNPOPULATED [David]
>>
>> Some very initial performance numbers (I only ran in a VM but it should be
>> similar, unit is "us") below as requested. The measurement is about time
>> spent when wr-protecting 10G range of empty but mapped memory. It's done
>> in a VM, assuming we'll get similar results on bare metal.
>>
>> Four test cases:
>>
>> - default UFFDIO_WP
>> - pre-read the memory, then UFFDIO_WP (what QEMU does right now)
>> - pre-fault using MADV_POPULATE_READ, then default UFFDIO_WP
>> - UFFDIO_WP with WP_UNPOPULATED
>>
>> Results:
>>
>> Test DEFAULT: 2
>> Test PRE-READ: 3277099 (pre-fault 3253826)
>> Test MADVISE: 2250361 (pre-fault 2226310)
>> Test WP-UNPOPULATE: 20850
> In your case:
> DEFAULT < WP-UNPOPULATE < MADVISE < PRE-READ
>
>
> In my testing on next-20230228 with this patch and my uffd async patch:
>
> Test DEFAULT: 6
> Test PRE-READ: 37157 (pre-fault 37006)
> Test MADVISE: 4884 (pre-fault 4465)
> Test WP-UNPOPULATE: 17794
>
> DEFAULT < MADVISE < WP-UNPOPULATE < PRE-READ
>
> On my setup, MADVISE consistently performs better than WP-UNPOPULATE.
> I'm not sure why I'm getting this discrepancy here. To be honest, I liked
> your results, where WP-UNPOPULATE performs better than MADVISE. What can
> be done to get consistent benchmarks on your side and mine?

Probably because the current approach from Peter uses uffd-wp markers,
and these markers can currently only reside on the PTE level, not on the
PMD level yet. With MADVISE you get a huge zeropage and avoid dealing
with PTEs.

-- 
Thanks,

David / dhildenb
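
To put rough numbers on the PTE-level vs. PMD-level point, here is a
back-of-the-envelope sketch. It assumes 4 KiB base pages and 2 MiB
PMD-sized huge pages (typical x86-64 values, not stated in this thread)
and reads the "10G range" from the benchmark as 10 GiB. The helper name
entries_needed is made up for illustration. It only counts page-table
entries; it does not model the actual cost of either path.

```python
# Assumed page-table geometry (typical x86-64; not taken from the thread).
PTE_PAGE_SIZE = 4 * 1024           # one PTE maps a 4 KiB base page
PMD_PAGE_SIZE = 2 * 1024 * 1024    # one PMD maps a 2 MiB huge page

# "10G range" from the benchmark, interpreted here as 10 GiB.
RANGE_SIZE = 10 * 1024**3


def entries_needed(range_bytes, page_bytes):
    """Number of page-table entries needed to cover range_bytes (rounded up)."""
    return -(-range_bytes // page_bytes)


# Markers that live only at the PTE level need one entry per base page.
pte_entries = entries_needed(RANGE_SIZE, PTE_PAGE_SIZE)

# A huge zeropage mapping needs one entry per PMD-sized page.
pmd_entries = entries_needed(RANGE_SIZE, PMD_PAGE_SIZE)

print(f"PTE-level entries: {pte_entries}")
print(f"PMD-level entries: {pmd_entries}")
print(f"ratio: {pte_entries // pmd_entries}x")
```

Under these assumptions the PTE-level path has to touch 512 times as many
entries as a PMD-level mapping of the same range, which fits the direction
of the explanation above but does not by itself account for the difference
between the two sets of measurements.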