Date: Wed, 24 May 2023 09:55:59 -0400
From: Peter Xu
To: Muhammad Usama Anjum
Cc: linux-mm@kvack.org, Paul Gofman, Alexander Viro, Shuah Khan, Christian Brauner, Yang Shi, Vlastimil Babka, "Liam R . Howlett", Yun Zhou, Cyrill Gorcunov, Michał Mirosław, Andrew Morton, Suren Baghdasaryan, Andrei Vagin, Alex Sierra, Matthew Wilcox, Pasha Tatashin, Danylo Mocherniuk, Axel Rasmussen, "Gustavo A . R . Silva", David Hildenbrand, Dan Williams, linux-kernel@vger.kernel.org, Mike Rapoport, linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org, Greg KH, kernel@collabora.com, Nadav Amit
Subject: Re: [PATCH RESEND v15 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
References: <20230420060156.895881-1-usama.anjum@collabora.com> <20230420060156.895881-3-usama.anjum@collabora.com> <0edfaf12-66f2-86d3-df1c-f5dff10fb743@collabora.com>

On Wed, May 24, 2023 at 04:26:33PM +0500, Muhammad Usama Anjum wrote:
> On 5/24/23 12:43 AM, Peter Xu wrote:
> > Hi, Muhammad,
> >
> > On Mon, May 22, 2023 at 04:26:07PM +0500, Muhammad Usama Anjum wrote:
> >> On 5/22/23 3:24 PM, Muhammad Usama Anjum wrote:
> >>> On 4/26/23 7:13 PM, Peter Xu wrote:
> >>>> Hi, Muhammad,
> >>>>
> >>>> On Wed, Apr 26, 2023 at 12:06:23PM +0500, Muhammad Usama Anjum wrote:
> >>>>> On 4/20/23 11:01 AM, Muhammad Usama Anjum wrote:
> >>>>>> +/* Supported flags */
> >>>>>> +#define PM_SCAN_OP_GET (1 << 0)
> >>>>>> +#define PM_SCAN_OP_WP (1 << 1)
> >>>>> We have only these flag options available in PAGEMAP_SCAN IOCTL.
> >>>>> PM_SCAN_OP_GET must always be specified for this IOCTL. PM_SCAN_OP_WP can
> >>>>> be specified as need. But PM_SCAN_OP_WP cannot be specified without
> >>>>> PM_SCAN_OP_GET. (This was removed after you had asked me to not duplicate
> >>>>> functionality which can be achieved by UFFDIO_WRITEPROTECT.)
> >>>>>
> >>>>> 1) PM_SCAN_OP_GET | PM_SCAN_OP_WP
> >>>>> vs
> >>>>> 2) UFFDIO_WRITEPROTECT
> >>>>>
> >>>>> After removing the usage of uffd_wp_range() from PAGEMAP_SCAN IOCTL, we are
> >>>>> getting really good performance which is comparable just like we are
> >>>>> depending on SOFT_DIRTY flags in the PTE.
> >>>>> But when we want to perform wp,
> >>>>> PM_SCAN_OP_GET | PM_SCAN_OP_WP is more desirable than UFFDIO_WRITEPROTECT
> >>>>> performance and behavior wise.
> >>>>>
> >>>>> I've got the results from someone else that UFFDIO_WRITEPROTECT block
> >>>>> pagefaults somehow which PAGEMAP_IOCTL doesn't. I still need to verify this
> >>>>> as I don't have tests comparing them one-to-one.
> >>>>>
> >>>>> What are your thoughts about it? Have you thought about making
> >>>>> UFFDIO_WRITEPROTECT perform better?
> >>>>>
> >>>>> I'm sorry to mention the word "performance" here. Actually we want better
> >>>>> performance to emulate Windows syscall. That is why we are adding this
> >>>>> functionality. So either we need to see what can be improved in
> >>>>> UFFDIO_WRITEPROTECT or can I please add only PM_SCAN_OP_WP back in
> >>>>> pagemap_ioctl?
> >>>>
> >>>> I'm fine if you want to add it back if it works for you. Though before
> >>>> that, could you remind me why there can be a difference on performance?
> >>> I've looked at the code again and I think I've found something. Lets look
> >>> at exact performance numbers:
> >>>
> >>> I've run 2 different tests. In first test UFFDIO_WRITEPROTECT is being used
> >>> for engaging WP. In second test PM_SCAN_OP_WP is being used. I've measured
> >>> the average write time to the same memory which is being WP-ed and total
> >>> time of execution of these APIs:
> >
> > What is the steps of the test? Is it as simple as "writeprotect",
> > "unprotect", then write all pages in a single thread?
> >
> > Is UFFDIO_WRITEPROTECT sent in one range covering all pages?
> >
> > Maybe you can attach the test program here too.
>
> I'd not attached the test earlier as I thought that you wouldn't be
> interested in running the test. I've attached it now. The test has multiple

Thanks. I don't plan to run it; I just want to make sure I understand why there is such a difference.
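
The earlier question of whether UFFDIO_WRITEPROTECT is sent as one range covering all pages can be made concrete with a rough userspace sketch. It only builds and issues `struct uffdio_writeprotect` requests from `<linux/userfaultfd.h>`; the helpers `make_wp_req()`, `split_wp_range()` and `issue_wp()` are hypothetical names, not taken from the attached test. It assumes the range was already registered with UFFDIO_REGISTER_MODE_WP and that `start` and `len` are page-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: one request for [start, start + len); protect == 0 clears WP. */
struct uffdio_writeprotect make_wp_req(uint64_t start, uint64_t len, int protect)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = start, .len = len },
        .mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
    };
    return wp;
}

/*
 * Hypothetical helper: cover [start, start + len) with requests of at most
 * `chunk` bytes each. chunk >= len gives a single whole-range request.
 * Returns how many requests were written to `out` (never more than `max`).
 */
size_t split_wp_range(uint64_t start, uint64_t len, uint64_t chunk, int protect,
                      struct uffdio_writeprotect *out, size_t max)
{
    size_t n = 0;

    assert(chunk > 0);
    for (uint64_t off = 0; off < len && n < max; off += chunk) {
        uint64_t piece = (len - off < chunk) ? len - off : chunk;
        out[n++] = make_wp_req(start + off, piece, protect);
    }
    return n;
}

/* Issue the requests on a userfaultfd whose range was registered with UFFDIO_REGISTER_MODE_WP. */
int issue_wp(int uffd, struct uffdio_writeprotect *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ioctl(uffd, UFFDIO_WRITEPROTECT, &reqs[i]) == -1)
            return -1; /* errno is set by ioctl() */
    return 0;
}
```

Passing a `chunk` at least as large as `len` yields the single whole-range request the question asks about; a smaller `chunk` issues one ioctl per piece instead.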

> threads where one thread tries to get status of flags and reset them, while
> other threads write to that memory. In main(), we call the pagemap_scan
> ioctl to get status of flags and reset the memory area as well. While in N
> threads, the memory is written.
>
> I usually run the test by following where memory area is of 100000 * pages:
> ./win2_linux 8 100000 1 1 0
>
> I'm running tests on real hardware. The results are pretty consistent. I'm
> also testing only on x86_64. PM_SCAN_OP_WP wins every time as compared to
> UFFDIO_WRITEPROTECT.

If it's a multi-threaded test, especially one where the ioctl runs concurrently with the writers, then I'd assume the difference comes from the writers frequently needing to flush the TLB (when they write during UFFDIO_WRITEPROTECT). The flush target could also include the core running the main thread, which is trying to re-protect at the same time, because they all run on the same mm.

This makes me think that your current test case is probably the worst case for Nadav's patch 6ce64428d6, because (1) the UFFDIO_WRITEPROTECT covers a very large range, and (2) there are a _lot_ of concurrent writers during the ioctl, so all of them need to trigger a TLB flush, and that TLB flush further slows down the ioctl sender.

That said, I think keeping TLB flushes to a minimum in ioctl(UFFDIO_WRITEPROTECT) is sometimes optimal, so it probably still makes sense elsewhere, where there are fewer concurrent writers. I'll need to rethink all of this to find out whether there is a good approach for both cases.

For now, if your workload closely matches your test case, maybe you can keep your pagemap version of the WP-only op, making sure the TLB flush happens within the pgtable lock critical section (so it should be safe even without Nadav's patch). If so, I'd appreciate it if you could add a comment somewhere about this difference between using the pagemap WP-only op and ioctl(UFFDIO_WRITEPROTECT).
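
The shape of the test described in the quote (the main thread repeatedly re-protects the area while N threads write to it) can be sketched with pthreads. This is an illustrative skeleton under stated assumptions, not the attached win2_linux program: `reprotect_region()` is a hypothetical no-op placeholder marking where either the PAGEMAP_SCAN ioctl with PM_SCAN_OP_WP or ioctl(UFFDIO_WRITEPROTECT) would go, the region comes from calloc() rather than whatever the real test uses, and each writer owns a distinct byte of every page so the writers never store to the same location.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_WRITERS 64

/* Per-writer state: the shared region, which byte of each page it owns, and a write counter. */
struct writer_arg {
    char *region;
    size_t pages;
    size_t page_size;
    size_t id;          /* byte offset inside each page owned by this writer */
    size_t passes;      /* how many times to sweep the whole region */
    size_t writes_done; /* filled in by the writer thread */
};

static void *writer(void *p)
{
    struct writer_arg *a = p;

    for (size_t pass = 0; pass < a->passes; pass++) {
        for (size_t pg = 0; pg < a->pages; pg++) {
            /* Distinct byte per writer, so no two threads store to the same location. */
            a->region[pg * a->page_size + a->id] = (char)pass;
            a->writes_done++;
        }
    }
    return NULL;
}

/* Hypothetical placeholder: the real test would issue its write-protect ioctl here. */
static void reprotect_region(char *region, size_t len)
{
    (void)region;
    (void)len;
}

/* Runs the writers plus the re-protect loop; returns total page writes (0 on setup failure). */
size_t run_test_shape(size_t nthreads, size_t pages, size_t passes, size_t rounds)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page_size;
    pthread_t tids[MAX_WRITERS];
    struct writer_arg args[MAX_WRITERS];
    size_t started = 0, total = 0;

    assert(nthreads > 0 && nthreads <= MAX_WRITERS && nthreads < page_size);

    char *region = calloc(len, 1);
    if (region == NULL)
        return 0;

    for (size_t i = 0; i < nthreads; i++) {
        args[i] = (struct writer_arg){ region, pages, page_size, i, passes, 0 };
        if (pthread_create(&tids[i], NULL, writer, &args[i]) != 0)
            break;
        started++;
    }

    /* Main thread keeps "re-protecting" while the writers run. */
    for (size_t r = 0; r < rounds; r++)
        reprotect_region(region, len);

    for (size_t i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].writes_done;
    }
    free(region);
    return total;
}
```

For scale, with 4 KiB pages the 100000-page area mentioned in the quote is 100000 * 4096 = 409,600,000 bytes, or about 391 MiB.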

In short, they should be functionally the same; only some minor performance details differ, and those are TBD (maybe one day we can find an approach that works well for all cases and align them again, but that doesn't need to block your work).

-- 
Peter Xu