From: Mateusz Guzik <mjguzik@gmail.com>
Date: Mon, 31 Mar 2025 22:27:55 +0200
Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
To: "Liam R. Howlett", Mateusz Guzik, Suren Baghdasaryan, linux-mm, Lorenzo Stoakes, Matthew Wilcox

On Mon, Mar 31, 2025 at 9:24 PM Liam R.
Howlett wrote:
> Reading the commit did trigger my memory about the 5% regression, but I
> disregarded it as it seems so unlikely in real life that the benefits
> outweighed the upper bounds of 5% negative.
>
> The 5,000 forks of 10,000 vmas benchmark would also be at least a bit
> faster with Suren's more recent rcu safe vma change [1].
>

Well, it follows it would be more than 5% now? ;)

> Although, that doesn't rule out changing this to get higher number, I
> still don't think the complexity of the change is worth it for a
> contrived benchmark.
>

I assumed this would be easy enough to reason through for someone
familiar with mm (which I'm not). If this is deemed too hairy, then I
completely agree it should not be pursued (at least for now, see below).

> Reading Documentation/mm/active_mm.rst makes me uncertain of the
> mm_count == 1 and mm_users == 1 test. Since mm_count is the number of
> 'lazy' users, and it may no longer include the lazy users..
>

The doc states interested callers need to use mmgrab_lazy_tlb() and
mmdrop_lazy_tlb() and that the *exact* count won't be known.

This is perfectly fine for my proposal, as we don't care specifically
how many other users are out there (lazy tlb or otherwise); what we do
care to know is whether there is at least one, and that much we are
being told in the expected manner: handing the mm off to lazy tlb also
grabs a ->mm_count reference on it.

To be more exact, mmgrab_lazy_tlb() starts by pinning the struct while
the mmap semaphore is held. This synchronizes ->mm_count visibility
against dup_mmap(), which write-locks it.

Suppose a process with two threads races, with one issuing dup_mmap()
and the other one exiting. Further suppose the forking thread got the
lock after the thread executing exit_mmap() finished and the mm
remains in active use for lazy tlb. Per my previous e-mail it is an
invariant that in this case ->mm_count is at least two:
1.
the forking thread uses the mm, which implies mm_users > 0, which
implies a ->mm_count of at least one.
2. employing the mm for lazy tlb use bumped the count, which means it
has to be at least two, regardless of how many lazy users managed to
elide refcount manipulation afterwards.

But the count being two disables the optimization.

Also note that since the forking thread got the lock first, ->mm_count
cannot transition 1 -> 2 on account of lazy tlb flush, as in the worst
case the lazy user is waiting for the lock (besides, this implies a
->mm_users count of at least two due to the waiting thread, which also
disables the optimization).

> Since there are no real workloads that suffer, is this worth it?
>

It is unclear to me if you are criticizing this idea specifically
(given your doubts about whether it even works) or the notion of this
kind of optimization to begin with.

And this brings me back to the kernel build, which I'm looking at for
kicks. There is a lot of evil going on there from the userspace pov,
which partially obfuscates kernel-side slowdowns, of which there are
plenty (and I whacked some of them on the VFS side, with more in the
pipeline).

I don't expect dodging vma locking to be worth much in isolation in a
real workload. I do expect consistent effort to provide smooth
execution in commonly used code paths (fork being one) to add up.
Apart from being algorithmically sound in its own right, smooth
execution means no stalls which can be easily avoided (notably cache
misses and lock-prefixed ops).

To give you a specific example, compilation of C files run by GNU make
goes through the shell -- as in an extra fork + exec trip on top of
spawning the compiler. In order to model this, with focus on the
kernel side, I slapped a system("gcc -c -o /tmp/crap.o src.c")
testcase into will-it-scale (src.c is hello world with one #include
clause, important for later). The use of system() suffers the shell
trip in the same manner.
I'm seeing about 20% system time, about a third of which looks
suspicious at best. For example, over 1% of the total (or over 4% of
kernel time) is spent in sync_regs() copying 168 bytes using rep
movsq. Convincing the compiler to use regular stores instead dropped
the routine to about 0.1% and gave me a measurable boost in throughput
of these compiles.

Will this speed up compilation of a non-toy file? In isolation, no. I
do however claim that:
1. consistent effort to whack avoidable slowdowns adds up; you don't
need to do anything revolutionary (albeit that's welcome)
2. the kernel is chock full of slowdowns which don't need to be there,
but I'm not going to rant about specific examples
3. the famed real workloads are shooting themselves in the foot on the
regular, making it really hard to show improvement on the kernel side,
apart from massive changes or fixing utter breakage

So, for the vma lock avoidance idea, it may be that it requires too
much analysis to justify it. If so, screw it. I do stand by what I
tried to accomplish, though (whacking atomics in the fast path in the
common case).

-- 
Mateusz Guzik