From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrii Nakryiko
Date: Thu, 17 Oct 2024 13:12:53 -0700
Subject: Re: [PATCH v3 tip/perf/core 2/4] mm: switch to 64-bit mm_lock_seq/vm_lock_seq on 64-bit architectures
To: Suren Baghdasaryan
Cc: Shakeel Butt, Andrii Nakryiko, linux-trace-kernel@vger.kernel.org,
	linux-mm@kvack.org, peterz@infradead.org, oleg@redhat.com,
	rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
	willy@infradead.org, akpm@linux-foundation.org, mjguzik@gmail.com,
	brauner@kernel.org, jannh@google.com, mhocko@kernel.org,
	vbabka@suse.cz, hannes@cmpxchg.org, Liam.Howlett@oracle.com,
	lorenzo.stoakes@oracle.com
References: <20241010205644.3831427-1-andrii@kernel.org>
	<20241010205644.3831427-3-andrii@kernel.org>
	<55hskn2iz5ixsl6wvupnhx7hkzcvx2u4muswvzi4wuqplmu2uo@rj72ypyeksjy>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Thu, Oct 17, 2024 at 12:42 PM Suren Baghdasaryan wrote:
>
> On Thu, Oct 17, 2024 at 11:55 AM Andrii Nakryiko wrote:
> >
> > On Wed, Oct 16, 2024 at 7:02 PM Suren Baghdasaryan wrote:
> > >
> > > On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt wrote:
> > > >
> > > > On Thu, Oct 10, 2024 at 01:56:42PM GMT, Andrii Nakryiko wrote:
> > > > > To increase mm->mm_lock_seq robustness, switch it from int to long, so
> > > > > that it's a 64-bit counter on 64-bit systems and we can stop worrying
> > > > > about it wrapping around in just ~4 billion iterations. Same goes for
> > > > > VMA's matching vm_lock_seq, which is derived from mm_lock_seq.
> > >
> > > vm_lock_seq does not need to be long but for consistency I guess that
> >
> > How come, we literally assign vm_lock_seq from mm_lock_seq and do
> > direct comparisons. They have to be exactly the same type, no?
>
> Not necessarily. vm_lock_seq is a snapshot of the mm_lock_seq but it
> does not have to be a "complete" snapshot. Just something that has a
> very high probability of identifying a match and a rare false positive
> is not a problem (see comment in
> https://elixir.bootlin.com/linux/v6.11.3/source/include/linux/mm.h#L678).
> So, something like this for taking and comparing a snapshot would do:
>
> vma->vm_lock_seq = (unsigned int)mm->mm_lock_seq;
> if (vma->vm_lock_seq == (unsigned int)mm->mm_lock_seq)

Ah, ok, I see what's the idea behind it, makes sense.

> >
> > > makes sense. While at it, can you please change these seq counters to
> > > be unsigned?
> >
> > There is `vma->vm_lock_seq = -1;` in kernel/fork.c, should it be
> > switched to ULONG_MAX then? In general, unless this is critical for
> > correctness, I'd very much like stuff like this to be done in the mm
> > tree afterwards, but it seems trivial enough, so if you insist I'll do
> > it.
>
> Yeah, ULONG_MAX should work fine here. vma->vm_lock_seq is initialized
> to -1 to avoid a false initial match with mm->mm_lock_seq, which is
> initialized to 0. As I said, a false match is not a problem but if we
> can avoid it, that's better.

ok, ULONG_MAX and unsigned long it is, will update
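
For illustration only, a minimal stand-alone sketch of the truncated-snapshot
scheme discussed above; the struct and field names are hypothetical stand-ins
for mm_struct/vm_area_struct, not the actual kernel implementation:

/*
 * Made-up demo types; not kernel code. A 64-bit counter on the "mm" side,
 * a truncated 32-bit snapshot on the "vma" side.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct demo_mm {
        uint64_t lock_seq;              /* full-width counter, bumped per write-lock cycle */
};

struct demo_vma {
        uint32_t lock_seq_snapshot;     /* truncated snapshot of demo_mm.lock_seq */
};

/* Take a lossy snapshot: only the low 32 bits are kept. */
static void demo_vma_mark_locked(struct demo_vma *vma, const struct demo_mm *mm)
{
        vma->lock_seq_snapshot = (uint32_t)mm->lock_seq;
}

/*
 * Re-check against the current counter. A stale snapshot can still match
 * once every 2^32 counter wraps; per the discussion above, such a rare
 * false positive is acceptable for this use.
 */
static bool demo_vma_seq_matches(const struct demo_vma *vma, const struct demo_mm *mm)
{
        return vma->lock_seq_snapshot == (uint32_t)mm->lock_seq;
}

int main(void)
{
        struct demo_mm mm = { .lock_seq = 0 };
        /* Sentinel init (the ULONG_MAX idea): avoid a false match with a zeroed counter. */
        struct demo_vma vma = { .lock_seq_snapshot = UINT32_MAX };

        printf("before snapshot: %d\n", demo_vma_seq_matches(&vma, &mm));   /* 0 */
        demo_vma_mark_locked(&vma, &mm);
        printf("after snapshot:  %d\n", demo_vma_seq_matches(&vma, &mm));   /* 1 */
        mm.lock_seq++;
        printf("after bump:      %d\n", demo_vma_seq_matches(&vma, &mm));   /* 0 */
        return 0;
}

The real per-VMA lock code also has memory-ordering concerns that this
sketch deliberately leaves out.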

>
> > > Also, did you check with pahole if the vm_area_struct layout change
> > > pushes some members into a different cacheline or creates new gaps?
> > >
> > Just did. We had 3 byte hole after `bool detached;`, it now grew to 7
> > bytes (so +4) and then vm_lock_seq itself is now 8 bytes (so +4),
> > which now does push rb and rb_subtree_last into *THE SAME* cache line
> > (which sounds like an improvement to me). vm_lock_seq and vm_lock stay
> > in the same cache line. vm_pgoff and vm_file are now in the same cache
> > line, and given they are probably always accessed together, seems like
> > a good accidental change as well. See below pahole outputs before and
> > after.
>
> Ok, sounds good to me. Looks like keeping both sequence numbers 64bit
> is not an issue. Changing them to unsigned would be nice and trivial
> but I don't insist. You can add:
>
> Reviewed-by: Suren Baghdasaryan

thanks, will switch to unsigned in the next revision (next week,
probably, to let some of the pending patches land)

> >
> > That singular detached bool looks like a complete waste, tbh. Maybe it
> > would be better to roll it into vm_flags and save 8 bytes? (not that I
> > want to do those mm changes in this patch set, of course...).
> > vm_area_struct is otherwise nicely tightly packed.
> >
> > tl;dr, seems fine, and detached would be best to get rid of, if
> > possible (but that's a completely separate thing)
>
> Yeah, I'll take a look at that. Thanks!
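
As an aside on the detached-bool point above, here is a hypothetical,
stand-alone illustration of the space argument for folding a lone bool into
an existing flags word. The names are made up and this is not an actual
vm_area_struct change; it only shows why a single bool next to
pointer-aligned members typically costs a whole padded word on LP64:

#include <stdbool.h>
#include <stdio.h>

#define DEMO_FLAG_DETACHED (1UL << 0)   /* made-up flag bit, illustration only */

struct with_bool {
        unsigned long flags;
        bool detached;                  /* 1 byte + 7 bytes of padding on LP64 */
        void *ptr;
};

struct with_flag_bit {
        unsigned long flags;            /* "detached" lives in bit 0 of flags */
        void *ptr;
};

static bool demo_is_detached(const struct with_flag_bit *s)
{
        return s->flags & DEMO_FLAG_DETACHED;
}

int main(void)
{
        struct with_flag_bit s = { .flags = DEMO_FLAG_DETACHED };

        /* Typically prints 24 vs 16 bytes on a 64-bit ABI. */
        printf("with bool: %zu, with flag bit: %zu\n",
               sizeof(struct with_bool), sizeof(struct with_flag_bit));
        printf("detached: %d\n", demo_is_detached(&s));
        return 0;
}

Whether that is worth doing for vm_area_struct is exactly the open question
left to the mm folks above.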

> >
> > BEFORE
> > ======
> > struct vm_area_struct {
> >         union {
> >                 struct {
> >                         long unsigned int vm_start;      /*   0   8 */
> >                         long unsigned int vm_end;        /*   8   8 */
> >                 };                                       /*   0  16 */
> >                 struct callback_head vm_rcu;             /*   0  16 */
> >         } __attribute__((__aligned__(8)));               /*   0  16 */
> >         struct mm_struct *        vm_mm;                 /*  16   8 */
> >         pgprot_t                  vm_page_prot;          /*  24   8 */
> >         union {
> >                 const vm_flags_t  vm_flags;              /*  32   8 */
> >                 vm_flags_t        __vm_flags;            /*  32   8 */
> >         };                                               /*  32   8 */
> >         bool                      detached;              /*  40   1 */
> >
> >         /* XXX 3 bytes hole, try to pack */
> >
> >         int                       vm_lock_seq;           /*  44   4 */
> >         struct vma_lock *         vm_lock;               /*  48   8 */
> >         struct {
> >                 struct rb_node    rb;                    /*  56  24 */
> >                 /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
> >                 long unsigned int rb_subtree_last;       /*  80   8 */
> >         }                                                /*  56  32 */
> >         struct list_head          anon_vma_chain;        /*  88  16 */
> >         struct anon_vma *         anon_vma;              /* 104   8 */
> >         const struct vm_operations_struct * vm_ops;      /* 112   8 */
> >         long unsigned int         vm_pgoff;              /* 120   8 */
> >         /* --- cacheline 2 boundary (128 bytes) --- */
> >         struct file *             vm_file;               /* 128   8 */
> >         void *                    vm_private_data;       /* 136   8 */
> >         atomic_long_t             swap_readahead_info;   /* 144   8 */
> >         struct mempolicy *        vm_policy;             /* 152   8 */
> >         struct vma_numab_state *  numab_state;           /* 160   8 */
> >         struct vm_userfaultfd_ctx vm_userfaultfd_ctx;    /* 168   8 */
> >
> >         /* size: 176, cachelines: 3, members: 18 */
> >         /* sum members: 173, holes: 1, sum holes: 3 */
> >         /* forced alignments: 2 */
> >         /* last cacheline: 48 bytes */
> > } __attribute__((__aligned__(8)));
> >
> > AFTER
> > =====
> > struct vm_area_struct {
> >         union {
> >                 struct {
> >                         long unsigned int vm_start;      /*   0   8 */
> >                         long unsigned int vm_end;        /*   8   8 */
> >                 };                                       /*   0  16 */
> >                 struct callback_head vm_rcu;             /*   0  16 */
> >         } __attribute__((__aligned__(8)));               /*   0  16 */
> >         struct mm_struct *        vm_mm;                 /*  16   8 */
> >         pgprot_t                  vm_page_prot;          /*  24   8 */
> >         union {
> >                 const vm_flags_t  vm_flags;              /*  32   8 */
> >                 vm_flags_t        __vm_flags;            /*  32   8 */
> >         };                                               /*  32   8 */
> >         bool                      detached;              /*  40   1 */
> >
> >         /* XXX 7 bytes hole, try to pack */
> >
> >         long int                  vm_lock_seq;           /*  48   8 */
> >         struct vma_lock *         vm_lock;               /*  56   8 */
> >         /* --- cacheline 1 boundary (64 bytes) --- */
> >         struct {
> >                 struct rb_node    rb;                    /*  64  24 */
> >                 long unsigned int rb_subtree_last;       /*  88   8 */
> >         }                                                /*  64  32 */
> >         struct list_head          anon_vma_chain;        /*  96  16 */
> >         struct anon_vma *         anon_vma;              /* 112   8 */
> >         const struct vm_operations_struct * vm_ops;      /* 120   8 */
> >         /* --- cacheline 2 boundary (128 bytes) --- */
> >         long unsigned int         vm_pgoff;              /* 128   8 */
> >         struct file *             vm_file;               /* 136   8 */
> >         void *                    vm_private_data;       /* 144   8 */
> >         atomic_long_t             swap_readahead_info;   /* 152   8 */
> >         struct mempolicy *        vm_policy;             /* 160   8 */
> >         struct vma_numab_state *  numab_state;           /* 168   8 */
> >         struct vm_userfaultfd_ctx vm_userfaultfd_ctx;    /* 176   8 */
> >
> >         /* size: 184, cachelines: 3, members: 18 */
> >         /* sum members: 177, holes: 1, sum holes: 7 */
> >         /* forced alignments: 2 */
> >         /* last cacheline: 56 bytes */
> > } __attribute__((__aligned__(8)));
> >
> > > >
> > > > > I didn't use __u64 outright to keep 32-bit architectures unaffected, but
> > > > > if it seems important enough, I have nothing against using __u64.
> > > > >
> > > > > Suggested-by: Jann Horn
> > > > > Signed-off-by: Andrii Nakryiko
> > > >
> > > > Reviewed-by: Shakeel Butt
> > >