Date: Thu, 7 Nov 2024 15:18:51 -0500
From: Peter Xu
To: riel@surriel.com
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-mm@kvack.org,
	akpm@linux-foundation.org, muchun.song@linux.dev, mike.kravetz@oracle.com,
	leit@meta.com, willy@infradead.org, stable@kernel.org,
	Ackerley Tng, Oscar Salvador
Subject: Re: [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs
References: <20231006040020.3677377-1-riel@surriel.com>
	<20231006040020.3677377-3-riel@surriel.com>
In-Reply-To: <20231006040020.3677377-3-riel@surriel.com>

On Thu, Oct 05, 2023 at 11:59:07PM -0400, riel@surriel.com wrote:
> From: Rik van Riel
>
> Extend the locking scheme used to protect shared hugetlb mappings
> from truncate vs page fault races, in order to protect private
> hugetlb mappings (with resv_map) against
> MADV_DONTNEED.
>
> Add a read-write semaphore to the resv_map data structure, and
> use that from the hugetlb_vma_(un)lock_* functions, in preparation
> for closing the race between MADV_DONTNEED and page faults.
>
> Signed-off-by: Rik van Riel
> Reviewed-by: Mike Kravetz
> Cc: stable@kernel.org
> Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
> ---
>  include/linux/hugetlb.h |  6 ++++++
>  mm/hugetlb.c            | 41 +++++++++++++++++++++++++++++++++++++----
>  2 files changed, 43 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 5b2626063f4f..694928fa06a3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -60,6 +60,7 @@ struct resv_map {
>  	long adds_in_progress;
>  	struct list_head region_cache;
>  	long region_cache_count;
> +	struct rw_semaphore rw_sema;
>  #ifdef CONFIG_CGROUP_HUGETLB
>  	/*
>  	 * On private mappings, the counter to uncharge reservations is stored
> @@ -1231,6 +1232,11 @@ static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
>  	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
>  }
>
> +static inline bool __vma_private_lock(struct vm_area_struct *vma)
> +{
> +	return (!(vma->vm_flags & VM_MAYSHARE)) && vma->vm_private_data;
> +}
> +
>  /*
>   * Safe version of huge_pte_offset() to check the locks. See comments
>   * above huge_pte_offset().
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a86e070d735b..dd3de6ec8f1a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -97,6 +97,7 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
>  static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
>  static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
>  		unsigned long start, unsigned long end);
> +static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
>
>  static inline bool subpool_is_free(struct hugepage_subpool *spool)
>  {
> @@ -267,6 +268,10 @@ void hugetlb_vma_lock_read(struct vm_area_struct *vma)
>  		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
>  		down_read(&vma_lock->rw_sema);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		down_read(&resv_map->rw_sema);
>  	}
>  }

+Ackerley +Oscar

I have been reading the resv code recently and stumbled upon this, so I
want to raise a question.

IIUC __vma_private_lock() will return false for a MAP_PRIVATE hugetlb vma
if the vma was duplicated by fork(), with or without commit 187da0f8250a
("hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write"), which fixed a
slightly different issue.

The problem is that the current vma lock for a private mmap() is based on
the resv map, and the resv map belongs only to the process that mmap()ed
this private vma.  E.g. dup_mmap() has:

	if (is_vm_hugetlb_page(tmp))
		hugetlb_dup_vma_private(tmp);

Which does:

	if (vma->vm_flags & VM_MAYSHARE) {
		...
	} else
		vma->vm_private_data = NULL; <---------------------

I don't know how many of us are using hugetlb PRIVATE + fork() at all; I
assume it is the most controversial hugetlb use case that I'm aware of, the
one people complain about, with some tricky changes like 04f2cbe35699.
Still, I want to raise this question: after a fork() on a private vma, if
I read it right, the lock/unlock operations may become a noop.

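To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of the two snippets quoted above.  Everything prefixed with
model_ is a hypothetical stand-in invented for illustration, and the
VM_MAYSHARE value is only a placeholder bit.  The shared branch of
hugetlb_dup_vma_private() is elided in the quote, so it is deliberately
left unmodeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bit for this sketch only; not taken from kernel headers. */
#define VM_MAYSHARE 0x80UL

/* Hypothetical stand-in for the two vm_area_struct fields that matter here. */
struct model_vma {
	unsigned long vm_flags;
	void *vm_private_data;	/* would point at the resv_map for private vmas */
};

/* Same condition as __vma_private_lock() in the quoted patch. */
static bool model_vma_private_lock(const struct model_vma *vma)
{
	return !(vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
}

/* Models only the else-branch of hugetlb_dup_vma_private() quoted above. */
static void model_dup_vma_private(struct model_vma *vma)
{
	if (!(vma->vm_flags & VM_MAYSHARE))
		vma->vm_private_data = NULL;
}

/* Copies the parent's vma the way the quoted dup_mmap() path would. */
static struct model_vma model_fork(const struct model_vma *parent)
{
	struct model_vma child = *parent;

	model_dup_vma_private(&child);
	return child;
}

/*
 * Returns true and bumps *taken when a lock would be acquired;
 * returns false when the call falls through without locking anything.
 */
static bool model_vma_lock_read(const struct model_vma *vma, int *taken)
{
	if (model_vma_private_lock(vma)) {
		(*taken)++;
		return true;
	}
	return false;
}
```

With a private parent vma whose vm_private_data is non-NULL,
model_vma_lock_read() reports a lock taken, while the same call on the
vma returned by model_fork() falls through, which is the noop behaviour
in question.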
Thanks,

>
> @@ -276,6 +281,10 @@ void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
>  		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
>  		up_read(&vma_lock->rw_sema);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		up_read(&resv_map->rw_sema);
>  	}
>  }
>
> @@ -285,6 +294,10 @@ void hugetlb_vma_lock_write(struct vm_area_struct *vma)
>  		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
>  		down_write(&vma_lock->rw_sema);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		down_write(&resv_map->rw_sema);
>  	}
>  }
>
> @@ -294,17 +307,27 @@ void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
>  		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
>  		up_write(&vma_lock->rw_sema);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		up_write(&resv_map->rw_sema);
>  	}
>  }
>
>  int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
>  {
> -	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
> -	if (!__vma_shareable_lock(vma))
> -		return 1;
> +	if (__vma_shareable_lock(vma)) {
> +		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
> -	return down_write_trylock(&vma_lock->rw_sema);
> +		return down_write_trylock(&vma_lock->rw_sema);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		return down_write_trylock(&resv_map->rw_sema);
> +	}
> +
> +	return 1;
>  }
>
>  void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
> @@ -313,6 +336,10 @@ void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
>  		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
>  		lockdep_assert_held(&vma_lock->rw_sema);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		lockdep_assert_held(&resv_map->rw_sema);
>  	}
>  }
>
> @@ -345,6 +372,11 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
>  		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
>
>  		__hugetlb_vma_unlock_write_put(vma_lock);
> +	} else if (__vma_private_lock(vma)) {
> +		struct resv_map *resv_map = vma_resv_map(vma);
> +
> +		/* no free for anon vmas, but still need to unlock */
> +		up_write(&resv_map->rw_sema);
>  	}
>  }
>
> @@ -1068,6 +1100,7 @@ struct resv_map *resv_map_alloc(void)
>  	kref_init(&resv_map->refs);
>  	spin_lock_init(&resv_map->lock);
>  	INIT_LIST_HEAD(&resv_map->regions);
> +	init_rwsem(&resv_map->rw_sema);
>
>  	resv_map->adds_in_progress = 0;
>  	/*
> --
> 2.41.0
>
>

--
Peter Xu