From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 8 Aug 2024 19:38:55 +0000
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into the struct
To: Mateusz Guzik
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Liam.Howlett@oracle.com, vbabka@suse.cz, lstoakes@gmail.com, pedro.falcato@gmail.com
In-Reply-To: <20240808185949.1094891-1-mjguzik@gmail.com>
References: <20240808185949.1094891-1-mjguzik@gmail.com>
On Thu, Aug 8, 2024 at 7:00 PM Mateusz Guzik wrote:
>
> ACHTUNG: this is more of a request for benchmarking than a patch
> proposal at this stage
>
> I was pointed at your patch which moved the vma lock to a separate
> allocation [1]. The commit message does not say anything about making
> sure the object itself is allocated with proper alignment, and I found
> that the cache creation lacks the HWCACHE_ALIGN flag, which may or may
> not be the problem.
>
> I verified with a simple one-liner that on a stock kernel the vmas keep
> roaming around with a 16-byte alignment:
> # bpftrace -e 'kretprobe:vm_area_alloc { @[retval & 0x3f] = count(); }'
> @[16]: 39
> @[0]: 46
> @[32]: 53
> @[48]: 56
>
> Note the stock vma lock cache also lacks the alignment flag. While I
> have not verified experimentally, if they are also roaming it would mean
> that 2 unrelated vmas can false-share locks. If the patch below is a
> bust, the flag should probably be added to that one.
>
> The patch has slapped-around vma lock cache removal + HWCACHE_ALIGN for
> the vma cache. I left a pointer to not change relative offsets between
> current fields. It does compile without CONFIG_PER_VMA_LOCK.
>
> Vlastimil says you tested a case where the struct got bloated to 256
> bytes, but the lock remained separate. It is unclear to me if this
> happened with allocations made with the HWCACHE_ALIGN flag though.
>
> There is 0 urgency on my end, but it would be nice if you could try
> this out with your test rig.

Hi Mateusz,

Sure, I'll give it a spin, but I'm not optimistic. Your code looks
almost identical to my latest attempt, where I tried placing vm_lock
into different cachelines, including a separate one, and using
HWCACHE_ALIGN. And yet all my attempts showed a regression. Just FYI,
the test I'm using is the pft-threads test from the mmtests suite.
I'll post results this evening.
Thanks,
Suren.
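For reference, before the patch itself (quoted below), here is a tiny
standalone sketch of the layout geometry described above; the object size
and lock offset are made-up stand-ins, not the real vm_area_struct layout:

/*
 * Userspace-only illustration: with hypothetical sizes standing in for
 * vm_area_struct, show which 64-byte line the embedded lock lands on when
 * objects are packed at 16-byte vs cacheline granularity.
 */
#include <stdio.h>

#define CACHELINE 64UL
#define OBJ_SIZE  176UL   /* made-up stand-in for sizeof(vm_area_struct) */
#define LOCK_OFF  160UL   /* made-up offset of the embedded lock */

static void layout(const char *name, unsigned long stride)
{
        printf("%s (stride %lu):\n", name, stride);
        for (unsigned long i = 0; i < 4; i++) {
                unsigned long base = i * stride;
                unsigned long lock_line = (base + LOCK_OFF) / CACHELINE;
                unsigned long next_line = (i + 1) * stride / CACHELINE;

                printf("  obj %lu: bytes %4lu-%4lu, lock on line %2lu%s\n",
                       i, base, base + OBJ_SIZE - 1, lock_line,
                       lock_line == next_line ? "  <- same line as next object" : "");
        }
}

int main(void)
{
        /* 16-byte packing, as the bpftrace histogram above suggests */
        layout("16-byte aligned", (OBJ_SIZE + 15) & ~15UL);
        /* what SLAB_HWCACHE_ALIGN would give with 64-byte cache lines */
        layout("cacheline aligned", (OBJ_SIZE + CACHELINE - 1) & ~(CACHELINE - 1));
        return 0;
}

With the 16-byte packing the embedded lock regularly shares a line with the
start of the next object; with cacheline-sized strides an object never
overlaps a neighbour's line.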
>
> 1: https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/T/#u
>
> ---
>  include/linux/mm.h       | 18 +++++++--------
>  include/linux/mm_types.h | 10 ++++-----
>  kernel/fork.c            | 47 ++++------------------------------------
>  mm/userfaultfd.c         |  6 ++---
>  4 files changed, 19 insertions(+), 62 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 43b40334e9b2..6d8b668d3deb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -687,7 +687,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
>         if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
>                 return false;
>
> -       if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
> +       if (unlikely(down_read_trylock(&vma->vm_lock) == 0))
>                 return false;
>
>         /*
> @@ -702,7 +702,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
>          * This pairs with RELEASE semantics in vma_end_write_all().
>          */
>         if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
> -               up_read(&vma->vm_lock->lock);
> +               up_read(&vma->vm_lock);
>                 return false;
>         }
>         return true;
> @@ -711,7 +711,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
> static inline void vma_end_read(struct vm_area_struct *vma)
> {
>         rcu_read_lock(); /* keeps vma alive till the end of up_read */
> -       up_read(&vma->vm_lock->lock);
> +       up_read(&vma->vm_lock);
>         rcu_read_unlock();
> }
>
> @@ -740,7 +740,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
>         if (__is_vma_write_locked(vma, &mm_lock_seq))
>                 return;
>
> -       down_write(&vma->vm_lock->lock);
> +       down_write(&vma->vm_lock);
>         /*
>          * We should use WRITE_ONCE() here because we can have concurrent reads
>          * from the early lockless pessimistic check in vma_start_read().
> @@ -748,7 +748,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
>          * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
>          */
>         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> -       up_write(&vma->vm_lock->lock);
> +       up_write(&vma->vm_lock);
> }
>
> static inline void vma_assert_write_locked(struct vm_area_struct *vma)
> @@ -760,7 +760,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
>
> static inline void vma_assert_locked(struct vm_area_struct *vma)
> {
> -       if (!rwsem_is_locked(&vma->vm_lock->lock))
> +       if (!rwsem_is_locked(&vma->vm_lock))
>                 vma_assert_write_locked(vma);
> }
>
> @@ -827,10 +827,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
>
> extern const struct vm_operations_struct vma_dummy_vm_ops;
>
> -/*
> - * WARNING: vma_init does not initialize vma->vm_lock.
> - * Use vm_area_alloc()/vm_area_free() if vma needs locking.
> - */
> static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
> {
>         memset(vma, 0, sizeof(*vma));
> @@ -839,6 +835,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
>         INIT_LIST_HEAD(&vma->anon_vma_chain);
>         vma_mark_detached(vma, false);
>         vma_numab_state_init(vma);
> +       init_rwsem(&vma->vm_lock);
> +       vma->vm_lock_seq = -1;
> }
>
> /* Use when VMA is not part of the VMA tree and needs no locking */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 003619fab20e..caffdb4eeb94 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -615,10 +615,6 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> }
> #endif
>
> -struct vma_lock {
> -       struct rw_semaphore lock;
> -};
> -
> struct vma_numab_state {
>         /*
>          * Initialised as time in 'jiffies' after which VMA
> @@ -716,8 +712,7 @@ struct vm_area_struct {
>          * slowpath.
>          */
>         int vm_lock_seq;
> -       /* Unstable RCU readers are allowed to read this. */
> -       struct vma_lock *vm_lock;
> +       void *vm_dummy;
> #endif
>
>         /*
> @@ -770,6 +765,9 @@ struct vm_area_struct {
>         struct vma_numab_state *numab_state;    /* NUMA Balancing state */
> #endif
>         struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
> +#ifdef CONFIG_PER_VMA_LOCK
> +       struct rw_semaphore vm_lock ____cacheline_aligned_in_smp;
> +#endif
> } __randomize_layout;
>
> #ifdef CONFIG_NUMA
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 92bfe56c9fed..eab04a24d5f1 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
> /* SLAB cache for mm_struct structures (tsk->mm) */
> static struct kmem_cache *mm_cachep;
>
> -#ifdef CONFIG_PER_VMA_LOCK
> -
> -/* SLAB cache for vm_area_struct.lock */
> -static struct kmem_cache *vma_lock_cachep;
> -
> -static bool vma_lock_alloc(struct vm_area_struct *vma)
> -{
> -       vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
> -       if (!vma->vm_lock)
> -               return false;
> -
> -       init_rwsem(&vma->vm_lock->lock);
> -       vma->vm_lock_seq = -1;
> -
> -       return true;
> -}
> -
> -static inline void vma_lock_free(struct vm_area_struct *vma)
> -{
> -       kmem_cache_free(vma_lock_cachep, vma->vm_lock);
> -}
> -
> -#else /* CONFIG_PER_VMA_LOCK */
> -
> -static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
> -static inline void vma_lock_free(struct vm_area_struct *vma) {}
> -
> -#endif /* CONFIG_PER_VMA_LOCK */
> -
> struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
> {
>         struct vm_area_struct *vma;
> @@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
>                 return NULL;
>
>         vma_init(vma, mm);
> -       if (!vma_lock_alloc(vma)) {
> -               kmem_cache_free(vm_area_cachep, vma);
> -               return NULL;
> -       }
>
>         return vma;
> }
> @@ -496,10 +463,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
>          * will be reinitialized.
>          */
>         data_race(memcpy(new, orig, sizeof(*new)));
> -       if (!vma_lock_alloc(new)) {
> -               kmem_cache_free(vm_area_cachep, new);
> -               return NULL;
> -       }
> +       init_rwsem(&new->vm_lock);
> +       new->vm_lock_seq = -1;
>         INIT_LIST_HEAD(&new->anon_vma_chain);
>         vma_numab_state_init(new);
>         dup_anon_vma_name(orig, new);
> @@ -511,7 +476,6 @@ void __vm_area_free(struct vm_area_struct *vma)
> {
>         vma_numab_state_free(vma);
>         free_anon_vma_name(vma);
> -       vma_lock_free(vma);
>         kmem_cache_free(vm_area_cachep, vma);
> }
>
> @@ -522,7 +486,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
>                                                   vm_rcu);
>
>         /* The vma should not be locked while being destroyed. */
> -       VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
> +       VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock), vma);
>         __vm_area_free(vma);
> }
> #endif
> @@ -3192,10 +3156,7 @@ void __init proc_caches_init(void)
>                         SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
>                         NULL);
>
> -       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
> -#ifdef CONFIG_PER_VMA_LOCK
> -       vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
> -#endif
> +       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|SLAB_HWCACHE_ALIGN);
>         mmap_init();
>         nsproxy_cache_init();
> }
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 3b7715ecf292..e95ecb2063d2 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -92,7 +92,7 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
>                  * mmap_lock, which guarantees that nobody can lock the
>                  * vma for write (vma_start_write()) under us.
>                  */
> -               down_read(&vma->vm_lock->lock);
> +               down_read(&vma->vm_lock);
>         }
>
>         mmap_read_unlock(mm);
> @@ -1468,9 +1468,9 @@ static int uffd_move_lock(struct mm_struct *mm,
>                  * See comment in uffd_lock_vma() as to why not using
>                  * vma_start_read() here.
>                  */
> -               down_read(&(*dst_vmap)->vm_lock->lock);
> +               down_read(&(*dst_vmap)->vm_lock);
>                 if (*dst_vmap != *src_vmap)
> -                       down_read_nested(&(*src_vmap)->vm_lock->lock,
> +                       down_read_nested(&(*src_vmap)->vm_lock,
>                                          SINGLE_DEPTH_NESTING);
>         }
>         mmap_read_unlock(mm);
> --
> 2.43.0
>
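As an aside for anyone reproducing this, below is a self-contained userspace
toy (not part of the patch) that times two threads hammering counters which
either share one 64-byte line or each get their own line; the counters merely
stand in for two unrelated per-vma locks. Build with something like
cc -O2 -pthread falseshare.c.

/*
 * Userspace-only illustration of the false-sharing cost discussed above:
 * two threads increment their own counters, once with both counters on one
 * 64-byte line and once with each counter padded to its own line.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS (1UL << 26)

struct padded {
        _Atomic unsigned long v;
        char pad[64 - sizeof(_Atomic unsigned long)];
};

/* both counters land on the same cache line */
static _Atomic unsigned long packed[2] __attribute__((aligned(64)));
/* one cache line per counter, akin to SLAB_HWCACHE_ALIGN spacing */
static struct padded separate[2] __attribute__((aligned(64)));

static void *hammer(void *arg)
{
        _Atomic unsigned long *ctr = arg;

        for (unsigned long i = 0; i < ITERS; i++)
                atomic_fetch_add_explicit(ctr, 1, memory_order_relaxed);
        return NULL;
}

static double run(_Atomic unsigned long *a, _Atomic unsigned long *b)
{
        struct timespec s, e;
        pthread_t t1, t2;

        clock_gettime(CLOCK_MONOTONIC, &s);
        pthread_create(&t1, NULL, hammer, a);
        pthread_create(&t2, NULL, hammer, b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        clock_gettime(CLOCK_MONOTONIC, &e);
        return (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) / 1e9;
}

int main(void)
{
        printf("shared line:    %.2fs\n", run(&packed[0], &packed[1]));
        printf("separate lines: %.2fs\n", run(&separate[0].v, &separate[1].v));
        return 0;
}

On most multi-core machines the shared-line run is noticeably slower, which
is the effect SLAB_HWCACHE_ALIGN is meant to avoid for adjacent vmas.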