From: Suren Baghdasaryan <surenb@google.com>
Date: Mon, 7 Jul 2025 16:12:48 -0700
Subject: Re: [PATCH v6 7/8] fs/proc/task_mmu: read proc/pid/maps under per-vma lock
To: "Liam R. Howlett",
Howlett" , Suren Baghdasaryan , akpm@linux-foundation.org, lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org, osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu, tjmercier@google.com, kaleshsingh@google.com, aha310510@gmail.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: E1A3F40002 X-Stat-Signature: tw44wk8ou17xy38kw48xwi63xigbcn5g X-HE-Tag: 1751929980-747347 X-HE-Meta: U2FsdGVkX19JTRvf8oAfql1h/6dSLhlg33t3414OWZUUhf+qMa4sKWdUv1HmgiNx0+XZ2BiY5zas8pRxv//OSstG8Apyu90seH4ZWDzErsg0pFLTw3VKvnMMNq9YfmVZNeEywIbW8Mt7H+9XpPY2l7EJXQTt2C3reP+2SW8TJyEgqQR6YYCE3yIOX6bML7IJvIbb36bNgFnF2P0ak6XuOPnuILAwSkVzryx52l0GCjvORQAjfByZUE3iLJdI3Faf9xYFgfa5NW5V3ayDdG3bT71dXxoYutXcg1KHYF0PKlIeGv4aKIJ38yN4iG6faPXqBTOmkfMlcYAgezydxcar0960H1RlY3BylmwFKP8TExkv6klANOeDF/IUf60+sAB7xscHpj49wX3lgROJqtzNrelBvvEU9UQsCkz1fBkoHCGXxTQWKvBuW4JGui+ZuVjMSCv14Jo8Vpw5QqWuGItztVmcKqqxHsxNqZhoUECJmt8WKFyw1vie4+1dVroi7c+XKFbg8oiK5332eC51hTJxvmcyStRSvOQw4o9VPFDjEWKccqpjk+4GMIhhmsmzYnKyyrPiVGYd1aDTrg/3eo/G3WdEMCZ+dfxqgbJlwDOaFDOHMhnxCbPloJ7HFMSDS0pseYqukcMsVSrVXC1CsyCJiGIfHcbo8WrhrMaMtaoNppv9Ej845iRWCOL9q33z3GkGkAmESbqnNn0QcovrLJEDVVujHX8AQtemN2pZWRJNdbuy0SSplnpFCi3KrodghU3kWQm9GOrgDUnW82J4adsqIxoAi/De2x3w7hEG63Ggxx8L+IDvnDljeNoF+7sqdspsGJNPOeENEagIIBPxjHeS1V5/20JJyaVz+nL/gRkiuWNFMmiHk8jIm3iv5FhXMqQdJxCSjxc3byaVhL6LCgNkgtxeipmPngVb1gjhNc48MQxZkFXWqnWTosp5KNJqGL0kG+mOEse363Djrnfvp7E SeEK/Qfh gWPGU8zYh5S9EEDI5mNwQkz/3gmooWTd353irV54wzinYKaXRCOh/8FDhLqVuwmnFOGdvuR/R5BJ/mDABOUIDjEjlrtx3ETNLGe/DgYCNdvtWC2KIztRIx4Y6IWhjN+N0//cycuX752DQcqTHEIyboCUftmVkLhvF7UKKmpaMEgv9bLsKIn89cxtCqLrJedV+ggYFEELd+9I28kxKCcxUnobAGxZJswYcYZPn0Z2WXE9/U/N51nQGy12lG5hn+54U8KktrvReygyMBCvTejOm9fDB5Sr//a07KB3D X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jul 7, 2025 at 11:21=E2=80=AFAM Liam R. Howlett wrote: > > * Suren Baghdasaryan [250704 02:07]: > > With maple_tree supporting vma tree traversal under RCU and per-vma > > locks, /proc/pid/maps can be read while holding individual vma locks > > instead of locking the entire address space. > > Completely lockless approach (walking vma tree under RCU) would be quit= e > > complex with the main issue being get_vma_name() using callbacks which > > might not work correctly with a stable vma copy, requiring original > > (unstable) vma - see special_mapping_name() for an example. > > When per-vma lock acquisition fails, we take the mmap_lock for reading, > > lock the vma, release the mmap_lock and continue. This fallback to mmap > > read lock guarantees the reader to make forward progress even during > > lock contention. This will interfere with the writer but for a very > > short time while we are acquiring the per-vma lock and only when there > > was contention on the vma reader is interested in. 
> > We shouldn't see repeated fallbacks to mmap read locks in practice,
> > as that would require a very unlikely series of lock contentions (for
> > instance due to repeated vma split operations). However, even if this
> > did somehow happen, we would still make progress.
> > One case requiring special handling is when a vma changes between the
> > time it was found and the time it got locked. A problematic case
> > would be if the vma got shrunk so that its start moved higher in the
> > address space and a new vma was installed at the beginning:
> >
> > reader found:           |--------VMA A--------|
> > VMA is modified:        |-VMA B-|----VMA A----|
> > reader locks modified VMA A
> > reader reports VMA A:   | gap   |----VMA A----|
> >
> > This would result in reporting a gap in the address space that does
> > not exist. To prevent this we retry the lookup after locking the vma,
> > but we do that only when we identify a gap and detect that the
> > address space was changed after we found the vma.
> > This change is designed to reduce mmap_lock contention and to prevent
> > a process reading /proc/pid/maps files (often a low priority task,
> > such as monitoring/data collection services) from blocking address
> > space updates. Note that this change has a userspace-visible
> > disadvantage: it allows for sub-page data tearing, as opposed to the
> > previous mechanism where data tearing could happen only between pages
> > of generated output data. Since current userspace considers data
> > tearing between pages to be acceptable, we assume it will be able to
> > handle sub-page data tearing as well.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Reviewed-by: Liam R. Howlett

Thanks! I'll update addressing Lorenzo's nits and will repost in a
couple of days. Hopefully by then I can get some reviews for the tests
in the series.
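To make the tearing point above concrete: a /proc/pid/maps consumer is
typically a line-oriented reader like the hypothetical sketch below
(illustration only, not part of this series). Each output line
describes one vma and, per the patch below, is generated while that vma
is locked, so a single line stays self-consistent; only adjacent lines
may now reflect slightly different instants of the address space:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[64], line[512];
	FILE *f;

	/* Default to our own address space when no pid is given */
	snprintf(path, sizeof(path), "/proc/%s/maps",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f) {
		perror("fopen");
		return EXIT_FAILURE;
	}
	/* One line per vma: "start-end perms offset dev inode path" */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return EXIT_SUCCESS;
}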
> > ---
> >  fs/proc/internal.h        |   5 ++
> >  fs/proc/task_mmu.c        | 118 ++++++++++++++++++++++++++++++++++----
> >  include/linux/mmap_lock.h |  11 ++++
> >  mm/madvise.c              |   3 +-
> >  mm/mmap_lock.c            |  88 ++++++++++++++++++++++++++++
> >  5 files changed, 214 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> > index 3d48ffe72583..7c235451c5ea 100644
> > --- a/fs/proc/internal.h
> > +++ b/fs/proc/internal.h
> > @@ -384,6 +384,11 @@ struct proc_maps_private {
> >  	struct task_struct *task;
> >  	struct mm_struct *mm;
> >  	struct vma_iterator iter;
> > +	loff_t last_pos;
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +	bool mmap_locked;
> > +	struct vm_area_struct *locked_vma;
> > +#endif
> >  #ifdef CONFIG_NUMA
> >  	struct mempolicy *task_mempolicy;
> >  #endif
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index b8bc06d05a72..ff3fe488ce51 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -127,15 +127,107 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
> >  }
> >  #endif
> >
> > -static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
> > -					   loff_t *ppos)
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +
> > +static void unlock_vma(struct proc_maps_private *priv)
> > +{
> > +	if (priv->locked_vma) {
> > +		vma_end_read(priv->locked_vma);
> > +		priv->locked_vma = NULL;
> > +	}
> > +}
> > +
> > +static const struct seq_operations proc_pid_maps_op;
> > +
> > +static inline bool lock_vma_range(struct seq_file *m,
> > +				  struct proc_maps_private *priv)
> > +{
> > +	/*
> > +	 * smaps and numa_maps perform page table walk, therefore require
> > +	 * mmap_lock but maps can be read with locking just the vma.
> > +	 */
> > +	if (m->op != &proc_pid_maps_op) {
> > +		if (mmap_read_lock_killable(priv->mm))
> > +			return false;
> > +
> > +		priv->mmap_locked = true;
> > +	} else {
> > +		rcu_read_lock();
> > +		priv->locked_vma = NULL;
> > +		priv->mmap_locked = false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +static inline void unlock_vma_range(struct proc_maps_private *priv)
> > +{
> > +	if (priv->mmap_locked) {
> > +		mmap_read_unlock(priv->mm);
> > +	} else {
> > +		unlock_vma(priv);
> > +		rcu_read_unlock();
> > +	}
> > +}
> > +
> > +static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
> > +					   loff_t last_pos)
> > +{
> > +	struct vm_area_struct *vma;
> > +
> > +	if (priv->mmap_locked)
> > +		return vma_next(&priv->iter);
> > +
> > +	unlock_vma(priv);
> > +	vma = lock_next_vma(priv->mm, &priv->iter, last_pos);
> > +	if (!IS_ERR_OR_NULL(vma))
> > +		priv->locked_vma = vma;
> > +
> > +	return vma;
> > +}
> > +
> > +#else /* CONFIG_PER_VMA_LOCK */
> > +
> > +static inline bool lock_vma_range(struct seq_file *m,
> > +				  struct proc_maps_private *priv)
> >  {
> > -	struct vm_area_struct *vma = vma_next(&priv->iter);
> > +	return mmap_read_lock_killable(priv->mm) == 0;
> > +}
> > +
> > +static inline void unlock_vma_range(struct proc_maps_private *priv)
> > +{
> > +	mmap_read_unlock(priv->mm);
> > +}
> > +
> > +static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
> > +					   loff_t last_pos)
> > +{
> > +	return vma_next(&priv->iter);
> > +}
> >
> > +#endif /* CONFIG_PER_VMA_LOCK */
> > +
> > +static struct vm_area_struct *proc_get_vma(struct seq_file *m, loff_t *ppos)
> > +{
> > +	struct proc_maps_private *priv = m->private;
> > +	struct vm_area_struct *vma;
> > +
> > +	vma = get_next_vma(priv, *ppos);
> > +	/* EINTR is possible */
> > +	if (IS_ERR(vma))
> > +		return vma;
> > +
> > +	/* Store previous position to be able to restart if needed */
> > +	priv->last_pos = *ppos;
> >  	if (vma) {
> > -		*ppos = vma->vm_start;
> > +		/*
> > +		 * Track the end of the reported vma to ensure position changes
> > +		 * even if previous vma was merged with the next vma and we
> > +		 * found the extended vma with the same vm_start.
> > +		 */
> > +		*ppos = vma->vm_end;
> >  	} else {
> > -		*ppos = -2;
> > +		*ppos = -2;	/* -2 indicates gate vma */
> >  		vma = get_gate_vma(priv->mm);
> >  	}
> >
> > @@ -163,28 +255,34 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
> >  		return NULL;
> >  	}
> >
> > -	if (mmap_read_lock_killable(mm)) {
> > +	if (!lock_vma_range(m, priv)) {
> >  		mmput(mm);
> >  		put_task_struct(priv->task);
> >  		priv->task = NULL;
> >  		return ERR_PTR(-EINTR);
> >  	}
> >
> > +	/*
> > +	 * Reset current position if last_addr was set before
> > +	 * and it's not a sentinel.
> > +	 */
> > +	if (last_addr > 0)
> > +		*ppos = last_addr = priv->last_pos;
> >  	vma_iter_init(&priv->iter, mm, (unsigned long)last_addr);
> >  	hold_task_mempolicy(priv);
> >  	if (last_addr == -2)
> >  		return get_gate_vma(mm);
> >
> > -	return proc_get_vma(priv, ppos);
> > +	return proc_get_vma(m, ppos);
> >  }
> >
> >  static void *m_next(struct seq_file *m, void *v, loff_t *ppos)
> >  {
> >  	if (*ppos == -2) {
> > -		*ppos = -1;
> > +		*ppos = -1;	/* -1 indicates no more vmas */
> >  		return NULL;
> >  	}
> > -	return proc_get_vma(m->private, ppos);
> > +	return proc_get_vma(m, ppos);
> >  }
> >
> >  static void m_stop(struct seq_file *m, void *v)
> > @@ -196,7 +294,7 @@ static void m_stop(struct seq_file *m, void *v)
> >  		return;
> >
> >  	release_task_mempolicy(priv);
> > -	mmap_read_unlock(mm);
> > +	unlock_vma_range(priv);
> >  	mmput(mm);
> >  	put_task_struct(priv->task);
> >  	priv->task = NULL;
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index 5da384bd0a26..1f4f44951abe 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -309,6 +309,17 @@ void vma_mark_detached(struct vm_area_struct *vma);
> >  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >  					  unsigned long address);
> >
> > +/*
> > + * Locks next vma pointed by the iterator. Confirms the locked vma has not
> > + * been modified and will retry under mmap_lock protection if modification
> > + * was detected. Should be called from read RCU section.
> > + * Returns either a valid locked VMA, NULL if no more VMAs or -EINTR if the
> > + * process was interrupted.
> > + */
> > +struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
> > +				     struct vma_iterator *iter,
> > +				     unsigned long address);
> > +
> >  #else /* CONFIG_PER_VMA_LOCK */
> >
> >  static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index a34c2c89a53b..e61e32b2cd91 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -108,7 +108,8 @@ void anon_vma_name_free(struct kref *kref)
> >
> >  struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
> >  {
> > -	mmap_assert_locked(vma->vm_mm);
> > +	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
> > +		vma_assert_locked(vma);
> >
> >  	return vma->anon_name;
> >  }
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 5f725cc67334..ed0e5e2171cd 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -178,6 +178,94 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >  	count_vm_vma_lock_event(VMA_LOCK_ABORT);
> >  	return NULL;
> >  }
> > +
> > +static struct vm_area_struct *lock_vma_under_mmap_lock(struct mm_struct *mm,
> > +							struct vma_iterator *iter,
> > +							unsigned long address)
> > +{
> > +	struct vm_area_struct *vma;
> > +	int ret;
> > +
> > +	ret = mmap_read_lock_killable(mm);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> > +
> > +	/* Lookup the vma at the last position again under mmap_read_lock */
> > +	vma_iter_init(iter, mm, address);
> > +	vma = vma_next(iter);
> > +	if (vma)
> > +		vma_start_read_locked(vma);
> > +
> > +	mmap_read_unlock(mm);
> > +
> > +	return vma;
> > +}
> > +
> > +struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
> > +				     struct vma_iterator *iter,
> > +				     unsigned long address)
> > +{
> > +	struct vm_area_struct *vma;
> > +	unsigned int mm_wr_seq;
> > +	bool mmap_unlocked;
> > +
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu read lock held");
> > +retry:
> > +	/* Start mmap_lock speculation in case we need to verify the vma later */
> > +	mmap_unlocked = mmap_lock_speculate_try_begin(mm, &mm_wr_seq);
> > +	vma = vma_next(iter);
> > +	if (!vma)
> > +		return NULL;
> > +
> > +	vma = vma_start_read(mm, vma);
> > +
> > +	if (IS_ERR_OR_NULL(vma)) {
> > +		/*
> > +		 * Retry immediately if the vma gets detached from under us.
> > +		 * Infinite loop should not happen because the vma we find will
> > +		 * have to be constantly knocked out from under us.
> > +		 */
> > +		if (PTR_ERR(vma) == -EAGAIN) {
> > +			vma_iter_init(iter, mm, address);
> > +			goto retry;
> > +		}
> > +
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Verify the vma we locked belongs to the same address space and it's
> > +	 * not behind the last search position.
> > +	 */
> > +	if (unlikely(vma->vm_mm != mm || address >= vma->vm_end))
> > +		goto out_unlock;
> > +
> > +	/*
> > +	 * vma can be ahead of the last search position but we need to verify
> > +	 * it was not shrunk after we found it and another vma has not been
> > +	 * installed ahead of it. Otherwise we might observe a gap that should
> > +	 * not be there.
> > +	 */
> > +	if (address < vma->vm_start) {
> > +		/* Verify only if the address space might have changed since vma lookup. */
> > +		if (!mmap_unlocked || mmap_lock_speculate_retry(mm, mm_wr_seq)) {
> > +			vma_iter_init(iter, mm, address);
> > +			if (vma != vma_next(iter))
> > +				goto out_unlock;
> > +		}
> > +	}
> > +
> > +	return vma;
> > +
> > +out_unlock:
> > +	vma_end_read(vma);
> > +out:
> > +	rcu_read_unlock();
> > +	vma = lock_vma_under_mmap_lock(mm, iter, address);
> > +	rcu_read_lock();
> > +
> > +	return vma;
> > +}
> >  #endif /* CONFIG_PER_VMA_LOCK */
> >
> >  #ifdef CONFIG_LOCK_MM_AND_FIND_VMA
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
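P.S. For anyone who wants to poke at this locally before reviewing the
selftests: a rough, hypothetical stress along these lines (not the
actual test from this series) keeps one thread churning mappings while
the main thread re-reads /proc/self/maps and checks that every line
still parses:

#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

static void *churn(void *arg)
{
	(void)arg;
	/* Repeatedly map and unmap one page to contend with the reader */
	for (int i = 0; i < 100000; i++) {
		void *p = mmap(NULL, 4096, PROT_READ,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p != MAP_FAILED)
			munmap(p, 4096);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;
	char line[512];

	pthread_create(&t, NULL, churn, NULL);
	for (int i = 0; i < 100; i++) {
		FILE *f = fopen("/proc/self/maps", "r");

		if (!f)
			break;
		while (fgets(line, sizeof(line), f)) {
			unsigned long start, end;

			/* A torn or garbled entry would fail to parse here */
			if (sscanf(line, "%lx-%lx", &start, &end) != 2 ||
			    start >= end)
				fprintf(stderr, "malformed line: %s", line);
		}
		fclose(f);
	}
	pthread_join(&t, NULL);
	return 0;
}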