From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50198C77B61 for ; Mon, 24 Apr 2023 19:18:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C69006B0072; Mon, 24 Apr 2023 15:18:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C19206B0074; Mon, 24 Apr 2023 15:18:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ABA236B0075; Mon, 24 Apr 2023 15:18:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 9ADD06B0072 for ; Mon, 24 Apr 2023 15:18:39 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 4B061C0154 for ; Mon, 24 Apr 2023 19:18:39 +0000 (UTC) X-FDA: 80717246358.02.0C64F94 Received: from mail-wr1-f46.google.com (mail-wr1-f46.google.com [209.85.221.46]) by imf20.hostedemail.com (Postfix) with ESMTP id 64FD41C0015 for ; Mon, 24 Apr 2023 19:18:37 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=Vxlkf6eX; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.221.46 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1682363917; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=P1q/4i2elk3BA3Rj+QoCYnRS4MnuAVuxCMWMhVXymMw=; b=cs1mY9iHwi/2XUFlY8xTOwIYRzXQDsXw9ditKvYrTP7Mgu5a6wlH5ITi2Wg/YDosJxHMz3 2A/10T7a87Ubq7/eSN3UPaXHZ3ZYgLoicgtE3OMdsNUkMbMkdfH7TFoKbJXDvfDQuGuFO/ Yd+Tvi+whfEWjJsr042LY0Yu29WrYoQ= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=Vxlkf6eX; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.221.46 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1682363917; a=rsa-sha256; cv=none; b=G9s8Avk3T+VIfyXnkDaRw+2E1nFTwwLzv86DkQEBkg1sCjXIe1OVC4qcplDheRjXir9dSI IpZDmKz2H3+kkMc7PNR8JHl17476D9YI+3DxWkJs6jzQzN+g0TG8OZTQVlDsvOEp3M/4tq 2D8QrMnnVXgU9N6frfVqa+DAECuEY5A= Received: by mail-wr1-f46.google.com with SMTP id ffacd0b85a97d-2f7c281a015so2928129f8f.1 for ; Mon, 24 Apr 2023 12:18:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682363916; x=1684955916; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=P1q/4i2elk3BA3Rj+QoCYnRS4MnuAVuxCMWMhVXymMw=; b=Vxlkf6eXc8hNlYZKosRpWF10A+8sPT/LRwFFpD5gCIgEgQHJBPpwl9XiMM+9uYDKuC YSn4qSHxRINZbV3uwOtC94D3OJm+l367BKSLmUgej0ndLIhkKZCtmFmrkUxlSmVoNLJe GRvJGjJKlxVnK4G0OrmF46tlTXyKPBtRd5H8joCvOh7P0ty5Hfzc4EEEMj+Q9zl/9W3a 2u7zMqMYAbF8qwiF8KdyV6Lrl3fV/JR5w2wyzmlnUmeeCGpse4JDmBWv8KwSdBiOfCzT R6s4Y8XFOF0n78/m5pa0JPN8s76eyEnWzqN3PPZdKTQlQArnoA98jmakIPBtAhbCfvpQ Bs/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682363916; x=1684955916; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=P1q/4i2elk3BA3Rj+QoCYnRS4MnuAVuxCMWMhVXymMw=; b=DkZZMv93Q4qDgT44QJbwS/2fODc1ABO61K4vs4WiWmUtSP+9ElUYsIJH2wk1bXFU4d AWRTGzqb8WSWV+yy28neQj16mws0hd++3iTinetpGMb4ErbaJql3ILphrxZE7HvUYy4B WROZ1wcSr2vT8xVpiam3VW563QWgtL0a+tnQdfJoUAb83wPz0RcwsEpvUPU/NfRo2ixf Us4CJkBIZ179MqnJv29UCrcnJMd3lUq3ultjhqWGK2tTCvQIj7Zp4tLDSuS2vfPJKiGo iPH2PqciBZROhvnIqtxKLdPhFeYRMwt64UFKdWi/V5chsuJOvEfhXsl3SXofE9v8o8j/ fG+A== X-Gm-Message-State: AAQBX9dsQiCBzHdolQ0LoWg7gW6oxI5UhPWumHVKmn449jBwOjomXXsi Cm6WXG13lOuC/RefAFg9fvo= X-Google-Smtp-Source: AKy350ZDhAPYuLEpF8HIpNccA9jbhB9WMePiz/zH6jKMYHFwbOFzbI0LXgErhPvQ+QtiuM8faomL1A== X-Received: by 2002:a5d:5962:0:b0:2cf:ee9d:ce2f with SMTP id e34-20020a5d5962000000b002cfee9dce2fmr10085377wri.19.1682363915655; Mon, 24 Apr 2023 12:18:35 -0700 (PDT) Received: from localhost (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164]) by smtp.gmail.com with ESMTPSA id j14-20020adfea4e000000b002fc3d8c134bsm11397032wrn.74.2023.04.24.12.18.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 24 Apr 2023 12:18:34 -0700 (PDT) Date: Mon, 24 Apr 2023 20:18:33 +0100 From: Lorenzo Stoakes To: Jason Gunthorpe Cc: Christoph Hellwig , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Jens Axboe , Matthew Wilcox , Dennis Dalessandro , Leon Romanovsky , Christian Benvenuti , Nelson Escobar , Bernard Metzler , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Bjorn Topel , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Richard Cochran , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov Subject: Re: [PATCH v2] mm/gup: disallow GUP writing to file-backed mappings by default Message-ID: References: <90a54439-5d30-4711-8a86-eba816782a66@lucifer.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 64FD41C0015 X-Stat-Signature: xcnsmapxh7zewd7j1ujd3fezrionp88c X-HE-Tag: 1682363917-581799 X-HE-Meta: U2FsdGVkX1+drRmjBCHpweCtZcvtHE6S8Vb5lbgiZpq/TFUbs8F2dTTM1k+zQ5WfsXl5WcPRN9P6cnhNL2Xvm5B/aiKxuG3cO6u93PZBCKal2fvlQCdTXQlu3kPBWcojUJnjtUkiqXfD8f+jTJl4JMF8GkO/NuB+I0A4nc+D8LraMneQRNh0DnsjaiyrIETF0+cjgY5GuM/k46TqwgXmeP8LXMtZ34NlYmAy64xgAy4NiNTQRehKZiIMiCZPprBkFH0ko0dSkeCeQ4yNaA/QhD31wc7BTqkbSAUSELBh9dQAAkDiTUdssMkWqQ3+UulLjIszVSeqMpRMtlD8PbHNfIXYpcI5eFjCtZRLEm3bH2viBL40HapqMKho5rztPmNHIIdPwgF5Mx3vcGedAgwrk88d+1gH8Ntns93Qm22nWQGh5JJIDfgnH51krUAkU48SGGfg3sYpz7brBUod43IyMcYh/Wf1p9xwT+Y2E6SLlwjGSRqJ61uMkmRzlZay3McAMVEIZBC6C4EG3Tifyx47JI9CXJfj6+PgejW5CK3phLgL+zYXFxU0sZW7nuQkkv8svjd7RANncgZCRWKSaqRA1zOn5kRzPEp/z74c1VCCKaiWnoHArPpL2sRdRhKpAEj45kvNDXwhJTAhKhHh2NqlYQTYKohrOFFULyvrPxVgwWqIE70YPiLHPxy/fYe7XTTgTsJ07mupGxju5qS4uTAUlf4Q23u5TYZybt85L/m/XvPokZ7x/pzUN75/w43kSiJZwFx4WPby0bbqfMBXqyq8HSVghYK4hRYHPZSaBaG7S6n/S68ffPv/1/DNc4ASOaDAC/8c2qVFWoLbfNH+85M1iKWJAmpUAyNorIademiRgSnLwb6wJl6cS8SdVhZyayOslmqfDawOW5Chf8YvnWFxsI+24pvPMm87vXWeYJyXMYAzKndQRmFY5Hw3yD9HFemkKBadHGGtNsJdjjdUFb7 RtHofRsE gp5NBbVvL919E5cVulJsoUjqZX7RxwGgoKk8GDJqf56D3WBNMgR5N+DFvhwr/ScDIsNeNmFCWjRA67EWebFuDmLFQCaHAobRrgAINoLqNBdRWSgw4MHqAnaTxhPRH8yoVQLON16992JcKy6ay/X/IsWNiW7CtmHs+D0m/bU3ajACMGi6+iv0jQOd8S/9XGfPWNsJnOMvBByE9O1iAHvK9P7TWFbtAF9T/fi623S+mKhn7cZ96Iqr8zLRKR5YnJIDjl5Hiwp0r7vVRgCCm981tNChhXw6N6HPv6eH1Sc0I0v/TaZ0m8PWH9X5so9N6Z2h9+AIslbWejMa9mquk33br3gtaNb9MVPrNKmLwI1lOUtZuBwo41jejFmubiY5wiLt63w8hu8O/7Asn6rWjcP097wwWQ7zrNI2vGFwgPQIekdXWvM8vtcWVaNNxQKnWIeNRMcqTBBL38Kg0+bMPgtnYGUDnZQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Apr 24, 2023 at 03:54:48PM -0300, Jason Gunthorpe wrote: > On Mon, Apr 24, 2023 at 07:22:03PM +0100, Lorenzo Stoakes wrote: > > > OK I guess you mean the folio lock :) Well there is > > unpin_user_pages_dirty_lock() and unpin_user_page_range_dirty_lock() and > > also set_page_dirty_lock() (used by __access_remote_vm()) which should > > avoid this. > > It has been a while, but IIRC, these are all basically racy, the > comment in front of set_page_dirty_lock() even says it is racy.. > > The race is that a FS cleans a page and thinks it cannot become dirty, > and then it becomes dirty - and all variations of that.. > > Looking around a bit, I suppose what I'd expect to see is a sequence > sort of like what do_page_mkwrite() does: > > /* Synchronize with the FS and get the page locked */ > ret = vmf->vma->vm_ops->page_mkwrite(vmf); > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) > return ret; > if (unlikely(!(ret & VM_FAULT_LOCKED))) { > lock_page(page); > if (!page->mapping) { > unlock_page(page); > return 0; /* retry */ > } > ret |= VM_FAULT_LOCKED; > } else > VM_BUG_ON_PAGE(!PageLocked(page), page); > > /* Write to the page with the CPU */ > va = kmap_local_atomic(page); > memcpy(va, ....); > kunmap_local_atomic(page); > > /* Tell the FS and unlock it. */ > set_page_dirty(page); > unlock_page(page); > > I don't know if this is is exactly right, but it seems closerish > > So maybe some kind of GUP interfaces that returns single locked pages > is the right direction? IDK > > Or maybe we just need to make a memcpy primitive that works while > holding the PTLs? > I think this patch suggestion has scope crept from 'incremental improvement' to 'major rework of GUP' at this point. Also surely you'd want to obtain the PTL of all mappings to a file? This seems really unworkable and I don't think holding a folio lock over a long period is sensible either. > > We definitely need to keep ptrace and /proc/$pid/mem functioning correctly, > > and I given the privilege levels required I don't think there's a security > > issue there? > > Even root is not allowed to trigger data corruption or oops inside the > kernel. > > Jason Of course, but isn't this supposed to be an incremental fix? It feels a little contradictory to want to introduce a flag intentionally to try to highlight brokenness then to not accept any solution that doesn't solve that brokenness. In any case, I feel that this patch isn't going to go anywhere as-is, it's insufficiently large to solve the problem as a whole (I think that's a bigger problem we can return to later), and there appears to be no taste for an incremental improvement, even from the suggester :) As a result, I suggest we take the cautious route in order to unstick the vmas patch series - introduce an OPT-IN flag which allows the check to be made, and update io_uring to use this. That way it defers the larger discussion around this improvement, avoids breaking anything, provides some basis in code for this check and is a net, incremental small and digestible improvement.