From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pasha Tatashin
Date: Fri, 10 Feb 2023 11:21:16 -0500
Subject: Re: [PATCH v4 00/14] Introduce Copy-On-Write to Page Table
To: Chih-En Lin
Cc: Andrew Morton, Qi Zheng, David Hildenbrand,
 "Matthew Wilcox (Oracle)", Christophe Leroy, John Hubbard, Nadav Amit,
 Barry Song, Steven Rostedt, Masami Hiramatsu, Peter Zijlstra,
 Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Yang Shi, Peter Xu,
 Vlastimil Babka, "Zach O'Keefe", Yun Zhou, Hugh Dickins,
 Suren Baghdasaryan, Yu Zhao, Juergen Gross, Tong Tiangen, Liu Shixin,
 Anshuman Khandual, Li kunyu, Minchan Kim, Miaohe Lin, Gautam Menghani,
 Catalin Marinas, Mark Brown, Will Deacon, Vincenzo Frascino,
 Thomas Gleixner, "Eric W. Biederman", Andy Lutomirski,
 Sebastian Andrzej Siewior, "Liam R. Howlett", Fenghua Yu, Andrei Vagin,
 Barret Rhoden, Michal Hocko, "Jason A. Donenfeld", Alexey Gladkov,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, Dinglan Peng, Pedro Fonseca,
 Jim Huang, Huichun Feng
References: <20230207035139.272707-1-shiyn.lin@gmail.com>

> > > Currently,
> > > copy-on-write is only used for the mapped memory; the child
> > > process still needs to copy the entire page table from the parent
> > > process during forking. The parent process might take a lot of time and
> > > memory to copy the page table when the parent has a big page table
> > > allocated. For example, the memory usage of a process after forking with
> > > 1 GB mapped memory is as follows:
> >
> > For some reason, I was not able to reproduce performance improvements
> > with a simple fork() performance measurement program. The results that
> > I saw are the following:
> >
> > Base:
> > Fork latency per gigabyte: 0.004416 seconds
> > Fork latency per gigabyte: 0.004382 seconds
> > Fork latency per gigabyte: 0.004442 seconds
> > COW kernel:
> > Fork latency per gigabyte: 0.004524 seconds
> > Fork latency per gigabyte: 0.004764 seconds
> > Fork latency per gigabyte: 0.004547 seconds
> >
> > AMD EPYC 7B12 64-Core Processor
> > Base:
> > Fork latency per gigabyte: 0.003923 seconds
> > Fork latency per gigabyte: 0.003909 seconds
> > Fork latency per gigabyte: 0.003955 seconds
> > COW kernel:
> > Fork latency per gigabyte: 0.004221 seconds
> > Fork latency per gigabyte: 0.003882 seconds
> > Fork latency per gigabyte: 0.003854 seconds
> >
> > Given, that page table for child is not copied, I was expecting the
> > performance to be better with COW kernel, and also not to depend on
> > the size of the parent.
>
> Yes, the child won't duplicate the page table, but fork will still
> traverse all the page table entries to do the accounting.
> And, since this patch expends the COW to the PTE table level, it's not
> the mapped page (page table entry) grained anymore, so we have to
> guarantee that all the mapped page is available to do COW mapping in
> the such page table.
> This kind of checking also costs some time.
> As a result, since the accounting and the checking, the COW PTE fork
> still depends on the size of the parent so the improvement might not
> be significant.

The current version of the series does not provide any performance
improvements for fork(). I would recommend removing claims from the
cover letter about better fork() performance, as this may be
misleading for those looking for a way to speed up forking. In my
case, I was looking to speed up Redis OSS, which relies on fork() to
create consistent snapshots for driving replicas/backups. The O(N)
per-page operation causes fork() to be slow, so I was hoping that this
series, which does not duplicate the VA during fork(), would make the
operation much quicker.

> Actually, at the RFC v1 and v2, we proposed the version of skipping
> those works, and we got a significant improvement. You can see the
> number from RFC v2 cover letter [1]:
> "In short, with 512 MB mapped memory, COW PTE decreases latency by 93%
> for normal fork"

I suspect the 93% improvement (when the mapcount was not updated) was
only for VAs with 4K pages. With 2M mappings, this series did not
provide any benefit; is this correct?

>
> However, it might break the existing logic of the refcount/mapcount of
> the page and destabilize the system.

This makes sense.

> [1] https://lore.kernel.org/linux-mm/20220927162957.270460-1-shiyn.lin@gmail.com/T/#me2340d963c2758a2561c39cb3baf42c478dfe548
> [2] https://lore.kernel.org/linux-mm/20220927162957.270460-1-shiyn.lin@gmail.com/T/#mbc33221f00c7cf3d71839b45fc23862a5dac3014
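
For anyone who wants to repeat a "fork latency per gigabyte" style
measurement, here is a rough sketch. It is not the program that produced
the numbers quoted above; it is a minimal Python illustration under
stated assumptions: the parent maps private anonymous memory, touches
every page so its page tables are populated, and times fork() on the
parent side only. The helper names fork_latency_per_gib and
entries_per_gib are hypothetical. Interpreter overhead and transparent
huge pages can skew the result, so treat the output as indicative only.

```python
import mmap
import os
import time

GIB = 1 << 30


def entries_per_gib(page_size):
    """Leaf page-table entries needed to map 1 GiB at the given page size."""
    return GIB // page_size


def fork_latency_per_gib(size_bytes, rounds=3):
    """Time fork() in a parent holding size_bytes of touched anonymous memory.

    Returns one value per round, normalised to seconds per GiB.
    Assumes size_bytes is a multiple of the system page size.
    """
    # MAP_PRIVATE matters: Python's default is MAP_SHARED, and this sketch
    # assumes the private anonymous case that the thread is discussing.
    buf = mmap.mmap(-1, size_bytes,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    # Touch every page so the parent really has populated page tables.
    for off in range(0, size_bytes, mmap.PAGESIZE):
        buf[off] = 1

    results = []
    for _ in range(rounds):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits at once; only fork() itself is timed
        elapsed = time.perf_counter() - start  # parent-side cost of fork()
        os.waitpid(pid, 0)  # reap the child before the next round
        results.append(elapsed * GIB / size_bytes)
    buf.close()
    return results


if __name__ == "__main__":
    for latency in fork_latency_per_gib(GIB):
        print(f"Fork latency per gigabyte: {latency:.6f} seconds")
```

The entries_per_gib helper only puts a number on the O(N) point: 1 GiB
needs 262144 entries with 4K pages but 512 with 2M mappings, which is
why per-entry work at fork() weighs so differently in the two cases.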