Date: Fri, 21 Oct 2022 10:28:47 -0400
From: Peter Xu
To: David Hildenbrand
Cc: Matthew Wilcox, linux-mm@kvack.org, Hugh Dickins
Subject: Re: Avoiding allocation of unused shmem page

On Fri, Oct 21, 2022 at 04:10:41PM +0200, David Hildenbrand wrote:
> On 21.10.22 16:01, Peter Xu wrote:
> > On Fri, Oct 21, 2022 at 09:23:00AM +0200, David Hildenbrand wrote:
> > > On 20.10.22 23:10, Peter Xu wrote:
> > > > On Thu, Oct 20, 2022 at 09:14:09PM +0100, Matthew Wilcox wrote:
> > > > > In yesterday's call, David brought up the case where we fallocate a file
> > > > > in shmem, call mmap(MAP_PRIVATE) and then store to a page which is over
> > > > > a hole.  That currently causes shmem to allocate a page, zero-fill it,
> > > > > then COW it, resulting in two pages being allocated when only the
> > > > > COW page really needs to be allocated.
> > > > >
> > > > > The path we currently take through the MM when we take the page fault
> > > > > looks like this (correct me if I'm wrong ...):
> > > > >
> > > > > handle_mm_fault()
> > > > >  __handle_mm_fault()
> > > > >   handle_pte_fault()
> > > > >    do_fault()
> > > > >     do_cow_fault()
> > > > >      __do_fault()
> > > > >       vm_ops->fault()
> > > > >
> > > > > ... which is where we come into shmem_fault().  Apart from the
> > > > > horrendous hole-punch handling case, shmem_fault() is quite simple:
> > > > >
> > > > >         err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
> > > > >                                   gfp, vma, vmf, &ret);
> > > > >         if (err)
> > > > >                 return vmf_error(err);
> > > > >         vmf->page = folio_file_page(folio, vmf->pgoff);
> > > > >         return ret;
> > > > >
> > > > > What we could do here is detect this case.
> > > > > Something like:
> > > > >
> > > > >         enum sgp_type sgp = SGP_CACHE;
> > > > >
> > > > >         if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
> > > > >                 sgp = SGP_READ;
> > > >
> > > > Yes this will start to save the space, but just to mention this may start
> > > > to break anything that will still depend on the pagecache to work.  E.g.,
> > > > it'll change behavior if the vma is registered with uffd missing mode;
> > > > we'll start to lose MISSING events for these private mappings.  Not sure
> > > > whether there're other side effects.
> > >
> > > I don't follow, can you elaborate?
> > >
> > > hugetlb doesn't perform this kind of unnecessary allocation and should be
> > > fine in regards to uffd. Why should it matter here and how exactly would a
> > > problematic sequence look like?
> >
> > Hugetlb is special because hugetlb detects pte first and relies on pte at
> > least for uffd.  shmem is not.
> >
> > Feel free to also reference the recent fix which relies on the stable
> > hugetlb pte with commit 2ea7ff1e39cbe375.
>
> Sorry to be dense here, but I don't follow how that relates.
>
> Assume we have a MAP_PRIVATE shmem mapping and someone registers uffd
> missing events on that mapping.
>
> Assume we get a page fault on a hole. We detect no page is mapped and check
> if the page cache has a page mapped -- which is also not the case, because
> there is a hole.
>
> So we notify uffd.
>
> Uffd will place a page. It should *not* touch the page cache and only insert
> that page into the page table -- otherwise we'd be violating MAP_PRIVATE
> semantics.

That's actually exactly what we do right now: we insert into the page cache
for shmem.  See shmem_mfill_atomic_pte().

Why does it violate MAP_PRIVATE?  Private pages only guarantee exclusive
ownership of the pages; I don't see why that should restrict uffd behavior.
Uffd missing mode (afaiu) is defined to resolve page cache misses in this case.
Hugetlb is special, but IMO shmem is not, compared to most of the other
file systems.

>
> What am I missing?
>
> [...]
>
> > >
> > > There is an easy way to trigger this from QEMU, and we've had
> > > customers running into this:
> >
> > Can the customer simply set shared=on?
> >
>
> Of course they can. It rather comes with a surprise for them, because -- for
> now -- we're not even warning that this most probably doesn't make too much
> sense.

Right, a warning message from QEMU could be helpful, but IMO it's still not
really required.  Also, we shouldn't assume that'll always happen, because
it's really an implementation detail of the OS.

It's the same as when someone writes a program that maps a private memfd
using shmem: we don't throw errors at them either.  It's just the behavior
of the OS underneath, and maybe one day it'll stop consuming twice the
required size and it'll be transparent to apps.  When that happens (if it
ever does...), that message could be misleading.

-- 
Peter Xu
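
For reference, the "private memfd" case mentioned above can be observed from
userspace.  The following is only a minimal sketch under stated assumptions:
it assumes Linux with memfd_create() available, and the names
probe_private_store() and struct probe_result are hypothetical helpers made
up for this illustration, not kernel or libc API.  It creates a one-page
sparse memfd, maps it MAP_PRIVATE, stores to the hole, and records st_blocks
(on Linux, in 512-byte units) before and after.  On kernels that behave as
described in this thread, blocks_after would be expected to grow by one page
even though the file contents are unchanged; on a kernel that avoids the
extra allocation it would stay at blocks_before.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Result of one probe run; all names here are illustrative only. */
struct probe_result {
	long long blocks_before;  /* st_blocks before the private store */
	long long blocks_after;   /* st_blocks after the private store */
	unsigned char file_byte;  /* byte read back from the file itself */
	unsigned char map_byte;   /* byte seen through the private mapping */
};

/*
 * Create a sparse one-page memfd, map it MAP_PRIVATE, store to the hole,
 * and record how many blocks the file occupies before and after.
 * Returns 0 on success, -1 on any failure.
 */
static int probe_private_store(struct probe_result *res)
{
	long page = sysconf(_SC_PAGESIZE);
	struct stat st;
	unsigned char *p;
	int fd;

	assert(res != NULL);

	fd = memfd_create("private-hole-probe", 0);
	if (fd < 0)
		return -1;

	/* ftruncate() only sets the size; the whole file is a hole. */
	if (ftruncate(fd, page) < 0 || fstat(fd, &st) < 0)
		goto fail;
	res->blocks_before = (long long)st.st_blocks;

	p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto fail;

	p[0] = 0xaa;		/* write fault over a hole: the COW fault path */
	res->map_byte = p[0];

	/* Sample st_blocks first, then read the file through the fd. */
	if (fstat(fd, &st) < 0 || pread(fd, &res->file_byte, 1, 0) != 1) {
		munmap(p, page);
		goto fail;
	}
	res->blocks_after = (long long)st.st_blocks;

	munmap(p, page);
	close(fd);
	return 0;
fail:
	close(fd);
	return -1;
}
```

To use it, call probe_private_store() from a main() and print the two block
counts.  The store must never be visible through the fd (file_byte stays 0
while map_byte is 0xaa); only the block accounting is expected to differ
between kernels.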