Date: Thu, 20 Oct 2022 17:10:09 -0400
From: Peter Xu <peterx@redhat.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Hugh Dickins, David Hildenbrand
Subject: Re: Avoiding allocation of unused shmem page
On Thu, Oct 20, 2022 at 09:14:09PM +0100, Matthew Wilcox wrote:
> In yesterday's call, David brought up the case where we fallocate a file
> in shmem, call mmap(MAP_PRIVATE) and then store to a page which is over
> a hole.  That currently causes shmem to allocate a page, zero-fill it,
> then COW it, resulting in two pages being allocated when only the
> COW page really needs to be allocated.
>
> The path we currently take through the MM when we take the page fault
> looks like this (correct me if I'm wrong ...):
>
> handle_mm_fault()
>   __handle_mm_fault()
>     handle_pte_fault()
>       do_fault()
>         do_cow_fault()
>           __do_fault()
>             vm_ops->fault()
>
> ... which is where we come into shmem_fault().  Apart from the
> horrendous hole-punch handling case, shmem_fault() is quite simple:
>
>         err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
>                                   gfp, vma, vmf, &ret);
>         if (err)
>                 return vmf_error(err);
>         vmf->page = folio_file_page(folio, vmf->pgoff);
>         return ret;
>
> What we could do here is detect this case.  Something like:
>
>         enum sgp_type sgp = SGP_CACHE;
>
>         if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
>                 sgp = SGP_READ;

Yes, this will start to save the space, but just to mention this may start
to break anything that still depends on the pagecache to work.  E.g., it'll
change behavior if the vma is registered with uffd missing mode: we'll
start to lose MISSING events for these private mappings.  Not sure whether
there are other side effects.

The zero-page approach will not have such an issue as long as the pagecache
is still filled with something.
>         err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, sgp, gfp,
>                                   vma, vmf, &ret);
>         if (err)
>                 return vmf_error(err);
>         if (folio)
>                 vmf->page = folio_file_page(folio, vmf->pgoff);
>         else
>                 vmf->page = NULL;
>         return ret;
>
> and change do_cow_fault() like this:
>
> +++ b/mm/memory.c
> @@ -4575,12 +4575,17 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
>  	if (ret & VM_FAULT_DONE_COW)
>  		return ret;
>  
> -	copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
> +	if (vmf->page)
> +		copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
> +	else
> +		clear_user_highpage(vmf->cow_page, vmf->address);
>  	__SetPageUptodate(vmf->cow_page);
>  
>  	ret |= finish_fault(vmf);
> -	unlock_page(vmf->page);
> -	put_page(vmf->page);
> +	if (vmf->page) {
> +		unlock_page(vmf->page);
> +		put_page(vmf->page);
> +	}
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
>  		goto uncharge_out;
>  	return ret;
>
> ... I wrote the code directly in my email client; definitely not
> compile-tested.  But if this situation is causing a real problem for
> someone, this would be a quick fix for them.
>
> Is this a real problem or just intellectual curiosity?

For me it was pure curiosity when I asked this question; I don't have a
production environment that can directly benefit from this.  For real users
I'd expect private shmem will always be mapped on meaningful (aka, non-zero)
shared pages just to have their own copy, but no better knowledge than that.

> Also, does this need support for THPs being created directly, or is
> khugepaged fixing it up afterwards good enough?

Thanks,

-- 
Peter Xu