Date: Wed, 23 Dec 2020 18:55:11 -0500
From: Andrea Arcangeli
To: Nadav Amit
Cc: Yu Zhao, Peter Zijlstra, Minchan Kim, Linus Torvalds, Peter Xu, linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable, Andy Lutomirski, Will Deacon
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <20201221172711.GE6640@xz-x1> <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com> <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com>

On Wed, Dec 23, 2020 at 02:45:59PM -0800, Nadav Amit wrote:
> I think it may be reasonable.

Whatever solution is used, there will be 2 users of it: uffd-wp will use
whatever technique clear_refs_write uses to avoid the mmap_write_lock.

My favorite is Yu's patch and not the group lock anymore. The con is
that it changes the VM rules (which kind of reminds me of my initial
proposal of adding a spurious TLB flush if mm_tlb_flush_pending is set,
except I didn't correctly specify it'd need to go in the page fault),
but it still appears the simplest.

> Just a proposal: At some point we can also ask ourselves whether the
> “artificial" limitation of the number of software bits per PTE should really
> limit us, or do we want to hold some additional metadata per-PTE by either
> putting it in an adjacent page (holding 64-bits of additional software-bits
> per PTE) or by finding some place in the page-struct to link to this
> metadata (and have the liberty of number of bits per PTE). One of the PTE
> software-bits can be repurposed to say whether there is “extra-metadata”
> associated with the PTE.
>
> I am fully aware that there will be some overhead associated, but it
> can be limited to less-common use-cases.

That's a good point. So far we haven't run out of PTE software bits, so
it's not an immediate concern (as opposed to page->flags, where we did
run out and PG_tail went to some LSB).

In general, kicking the can down the road sounds like the best thing to
do for those bit shortage matters, until we can't anymore at least.
There's no gain to the kernel runtime in doing something generically
good here (again, see where PG_tail rightfully went). So before spending
RAM and CPU, we'd need to find a more compact encoding with the bits we
already have available.

This reminds me again that we could double check whether we could make
VM_UFFD_WP mutually exclusive with VM_SOFTDIRTY. I wasn't sure if it
could ever happen in a legit way to use both at the same time (CRIU on
an app using uffd-wp for its own internal mm management?).
Theoretically it's too late already for it, but VM_UFFD_WP is
relatively new; if we're sure it cannot ever happen in a legit way, it
would be possible to at least evaluate/discuss it. This is an immediate
matter.

What we'll do if we later run out is not an immediate matter, because
it won't make our life any simpler to resolve it now.

Thanks,
Andrea