Date: Sun, 22 May 2022 02:50:04 +0800
From: Chih-En Lin
To: David Hildenbrand
Cc: Andrew Morton, linux-mm@kvack.org, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
    Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
    Christian Brauner, "Matthew Wilcox (Oracle)", Vlastimil Babka,
    William Kucharski, John Hubbard, Yunsheng Lin, Arnd Bergmann,
    Suren Baghdasaryan, Colin Cross, Feng Tang, "Eric W. Biederman",
    Mike Rapoport, Geert Uytterhoeven, Anshuman Khandual,
    "Aneesh Kumar K.V", Daniel Axtens, Jonathan Marek, Christophe Leroy,
    Pasha Tatashin, Peter Xu, Andrea Arcangeli, Thomas Gleixner,
    Andy Lutomirski, Sebastian Andrzej Siewior, Fenghua Yu,
    linux-kernel@vger.kernel.org, Kaiyang Zhao, Huichun Feng, Jim Huang
Subject: Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Message-ID: <20220521185004.GA1543057@strix-laptop>
References: <20220519183127.3909598-1-shiyn.lin@gmail.com>

On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
> On 19.05.22 20:31, Chih-En Lin wrote:
> > When creating the user process, it usually uses the Copy-On-Write (COW)
> > mechanism to save the memory usage and the cost of time for copying.
> > COW defers the work of copying private memory and shares it across the
> > processes as read-only. If either process wants to write in these
> > memories, it will page fault and copy the shared memory, so the process
> > will now get its private memory right here, which is called break COW.
>
> Yes.
> Lately we've been dealing with advanced COW+GUP pinnings (which
> resulted in PageAnonExclusive, which should hit upstream soon), and
> hearing about COW of page tables (and wondering how it will interact
> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
> me feel a bit uneasy :)

I have seen that patch series and know how complicated handling COW of
physical pages is [1][2][3][4]. That is why the COW page table restricts
the sharing to the page table only: any modification to a physical page
triggers a break COW of the page table.

The current implementation only accounts the physical pages in the RSS
of the owner process of the COW PTE, which is generally the parent. The
state of the pages themselves, such as refcount and mapcount, does not
change while the page table is COW-shared.

But if some situation forces the COW page table to take the state of the
physical pages into account, it could get troublesome. ;-)

> >
> > Presently this kind of technology is only used as the mapping memory.
> > It still needs to copy the entire page table from the parent.
> > It might cost a lot of time and memory to copy each page table when the
> > parent already has a lot of page tables allocated. For example, here is
> > the state table for mapping the 1 GB memory of forking.
> >
> >               mmap before fork     mmap after fork
> > MemTotal:       32746776 kB          32746776 kB
> > MemFree:        31468152 kB          31463244 kB
> > AnonPages:       1073836 kB           1073628 kB
> > Mapped:            39520 kB             39992 kB
> > PageTables:         3356 kB              5432 kB
> >
>
> I'm missing the most important point: why do we care and why should we
> care to make our COW/fork implementation even more complicated?
>
> Yes, we might save some page tables and we might reduce the fork() time,
> however, which specific workload really benefits from this and why do we
> really care about that workload?
> Without even hearing about an example
> user in this cover letter (unless I missed it), I naturally wonder about
> relevance in practice.
>
> I assume it really only matters if we fork() relatively large processes,
> like databases for snapshotting. However, fork() is already a pretty
> severe performance hit due to COW, and there are alternatives getting
> developed as a replacement for such use cases (e.g., uffd-wp).
>
> I'm also missing a performance evaluation: I'd expect some simple
> workloads that use fork() might be even slower after fork() with this
> change.
>

The paper lists benchmarks of the time cost of On-demand Fork. For
example, on Redis, the mean fork time when taking a snapshot was 7.40 ms
with the default fork() and 0.12 ms with On-demand Fork (COW PTE table).
But other cases, such as the response latency distribution of the Apache
HTTP Server, did not benefit significantly from On-demand Fork.

For the COW page table in this patch, I also used perf to measure the
time cost, but the result does not look different from the default fork.
Here is the report; mmap-sfork is the COW page table version:

 Performance counter stats for './mmap-fork' (100 runs):

            373.92 msec task-clock         #    0.992 CPUs utilized     ( +-  0.09% )
                 1      context-switches   #    2.656 /sec              ( +-  6.03% )
                 0      cpu-migrations     #    0.000 /sec
               881      page-faults        #    2.340 K/sec             ( +-  0.02% )
     1,860,460,792      cycles             #    4.941 GHz               ( +-  0.08% )
     1,451,024,912      instructions       #    0.78  insn per cycle    ( +-  0.00% )
       310,129,843      branches           #  823.559 M/sec             ( +-  0.01% )
         1,552,469      branch-misses      #    0.50% of all branches   ( +-  0.38% )

          0.377007 +- 0.000480 seconds time elapsed  ( +-  0.13% )

 Performance counter stats for './mmap-sfork' (100 runs):

            373.04 msec task-clock         #    0.992 CPUs utilized     ( +-  0.10% )
                 1      context-switches   #    2.660 /sec              ( +-  6.58% )
                 0      cpu-migrations     #    0.000 /sec
               877      page-faults        #    2.333 K/sec             ( +-  0.08% )
     1,851,843,683      cycles             #    4.926 GHz               ( +-  0.08% )
     1,451,763,414      instructions       #    0.78  insn per cycle    ( +-  0.00% )
       310,270,268      branches           #  825.352 M/sec             ( +-  0.01% )
         1,649,486      branch-misses      #    0.53% of all branches   ( +-  0.49% )

          0.376095 +- 0.000478 seconds time elapsed  ( +-  0.13% )

So COW of the page table may reduce the time of the fork itself, but it
does so by deferring the copy work to the later operations that modify
the physical pages.

> (I don't have time to read the paper, I'd expect an independent summary
> in the cover letter)

Sure, I will add more performance evaluations and descriptions in the
next version.

> I have tons of questions regarding rmap, accounting, GUP, page table
> walkers, OOM situations in page walkers, but at this point I am not
> (yet) convinced that the added complexity is really worth it. So I'd
> appreciate some additional information.

It seems like I have a lot of work to do. ;-)

>
> [...]
>
> > TODO list:
> > - Handle the swap
>
> Scary if that's not easy to handle :/

;-)

> --
> Thanks,
>
> David / dhildenb
>

Thanks!

[1] https://lore.kernel.org/all/20220131162940.210846-1-david@redhat.com/T/
[2] https://lore.kernel.org/linux-mm/20220315104741.63071-2-david@redhat.com/T/
[3] https://lore.kernel.org/linux-mm/51afa7a7-15c5-8769-78db-ed2d134792f4@redhat.com/T/
[4] https://lore.kernel.org/all/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com/
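
As a rough cross-check of the PageTables numbers in the quoted table, and
of what a test like ./mmap-fork might be doing, here is a minimal sketch.
It is not taken from the patch series: the source of the test programs is
not shown in this mail, so fork_after_mmap() and leaf_table_bytes() are
hypothetical names, and the 4 KiB page size and 8-byte page table entry
are assumptions (the usual x86-64 values). Under those assumptions,
mapping 1 GiB needs 262144 last-level entries, i.e. 2048 kB of PTE
tables, which is close to the 2076 kB growth of PageTables across fork
(5432 kB - 3356 kB) once upper-level tables and other overhead are added.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS on glibc with strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Bytes of last-level page tables needed to map map_bytes.
 * page_size and entry_size are parameters because the values used
 * in the text (4096 and 8) are assumptions, not taken from the mail.
 */
size_t leaf_table_bytes(size_t map_bytes, size_t page_size, size_t entry_size)
{
    assert(page_size > 0);
    size_t pages = (map_bytes + page_size - 1) / page_size; /* round up */
    return pages * entry_size;
}

/*
 * Hypothetical stand-in for the mmap-then-fork test: map anonymous
 * private memory, touch every page so that page tables are populated,
 * then fork. The child exits at once; the parent waits for it.
 * Returns 0 on success and -1 on any failure.
 */
int fork_after_mmap(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* One write per page is enough to fault it in. */
    for (size_t off = 0; off < bytes; off += (size_t)page)
        buf[off] = 1;

    pid_t pid = fork(); /* the default fork copies the page tables here */
    if (pid < 0) {
        munmap(buf, bytes);
        return -1;
    }
    if (pid == 0)
        _exit(0); /* child: no writes, so no data pages are copied */

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    munmap(buf, bytes);

    if (reaped != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

Calling fork_after_mmap((size_t)1 << 30) from a small main() and reading
/proc/meminfo before and after the fork would be one way to reproduce a
table like the quoted one; the exact values depend on the machine.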