From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Jan 2024 10:19:47 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] kernel: Introduce a write lock/unlock wrapper for
 tasklist_lock
Content-Language: en-US
To: Matthew Wilcox, "Eric W. Biederman", Hillf Danton
References: <20231213101745.4526-1-quic_aiquny@quicinc.com>
 <87o7eu7ybq.fsf@email.froward.int.ebiederm.org>
From: "Aiqun Yu (Maria)"
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
On 12/29/2023 6:20 AM, Matthew Wilcox wrote:
> On Wed, Dec 13, 2023 at 12:27:05PM -0600, Eric W. Biederman wrote:
>> Matthew Wilcox writes:
>>> I think the right way to fix this is to pass a boolean flag to
>>> queued_write_lock_slowpath() to let it know whether it can re-enable
>>> interrupts while checking whether _QW_WAITING is set.
>>
>> Yes. It seems to make sense to distinguish between write_lock_irq and
>> write_lock_irqsave and fix this for all of write_lock_irq.
>
> I wasn't planning on doing anything here, but Hillf kind of pushed me into
> it.  I think it needs to be something like this.  Compile tested only.
> If it ends up getting used,

Happy new year!
Thx Matthew for chiming in on this. I think more thoughts will lead to a
better design.

>
> Signed-off-by: Matthew Wilcox (Oracle)
>
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index 75b8f4601b28..1152e080c719 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -33,8 +33,8 @@
>  /*
>   * External function declarations
>   */
> -extern void queued_read_lock_slowpath(struct qrwlock *lock);
> -extern void queued_write_lock_slowpath(struct qrwlock *lock);
> +void queued_read_lock_slowpath(struct qrwlock *lock);
> +void queued_write_lock_slowpath(struct qrwlock *lock, bool irq);
>
>  /**
>   * queued_read_trylock - try to acquire read lock of a queued rwlock
> @@ -98,7 +98,21 @@ static inline void queued_write_lock(struct qrwlock *lock)
>  	if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
>  		return;
>
> -	queued_write_lock_slowpath(lock);
> +	queued_write_lock_slowpath(lock, false);
> +}
> +
> +/**
> + * queued_write_lock_irq - acquire write lock of a queued rwlock
> + * @lock : Pointer to queued rwlock structure
> + */
> +static inline void queued_write_lock_irq(struct qrwlock *lock)
> +{
> +	int cnts = 0;
> +	/* Optimize for the unfair lock case where the fair flag is 0. */
> +	if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
> +		return;
> +
> +	queued_write_lock_slowpath(lock, true);
> }
>
>  /**
> @@ -138,6 +152,7 @@ static inline int queued_rwlock_is_contended(struct qrwlock *lock)
>   */
>  #define arch_read_lock(l)		queued_read_lock(l)
>  #define arch_write_lock(l)		queued_write_lock(l)
> +#define arch_write_lock_irq(l)		queued_write_lock_irq(l)
>  #define arch_read_trylock(l)		queued_read_trylock(l)
>  #define arch_write_trylock(l)		queued_write_trylock(l)
>  #define arch_read_unlock(l)		queued_read_unlock(l)
> diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
> index c0ef596f340b..897010b6ba0a 100644
> --- a/include/linux/rwlock.h
> +++ b/include/linux/rwlock.h
> @@ -33,6 +33,7 @@ do { \
>  extern int do_raw_read_trylock(rwlock_t *lock);
>  extern void do_raw_read_unlock(rwlock_t *lock) __releases(lock);
>  extern void do_raw_write_lock(rwlock_t *lock) __acquires(lock);
> + extern void do_raw_write_lock_irq(rwlock_t *lock) __acquires(lock);
>  extern int do_raw_write_trylock(rwlock_t *lock);
>  extern void do_raw_write_unlock(rwlock_t *lock) __releases(lock);
> #else
> @@ -40,6 +41,7 @@ do { \
> # define do_raw_read_trylock(rwlock)	arch_read_trylock(&(rwlock)->raw_lock)
> # define do_raw_read_unlock(rwlock)	do {arch_read_unlock(&(rwlock)->raw_lock); __release(lock); } while (0)
> # define do_raw_write_lock(rwlock)	do {__acquire(lock); arch_write_lock(&(rwlock)->raw_lock); } while (0)
> +# define do_raw_write_lock_irq(rwlock)	do {__acquire(lock); arch_write_lock_irq(&(rwlock)->raw_lock); } while (0)
> # define do_raw_write_trylock(rwlock)	arch_write_trylock(&(rwlock)->raw_lock)
> # define do_raw_write_unlock(rwlock)	do {arch_write_unlock(&(rwlock)->raw_lock); __release(lock); } while (0)
> #endif
> diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
> index dceb0a59b692..6257976dfb72 100644
> --- a/include/linux/rwlock_api_smp.h
> +++ b/include/linux/rwlock_api_smp.h
> @@ -193,7 +193,7 @@ static inline void __raw_write_lock_irq(rwlock_t *lock)
>  	local_irq_disable();
>  	preempt_disable();
>  	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> -	LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
> +	LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock_irq);
>  }
>
>  static inline void __raw_write_lock_bh(rwlock_t *lock)
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index d2ef312a8611..6c644a71b01d 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -61,9 +61,10 @@ EXPORT_SYMBOL(queued_read_lock_slowpath);
>
>  /**
>   * queued_write_lock_slowpath - acquire write lock of a queued rwlock
> - * @lock : Pointer to queued rwlock structure
> + * @lock: Pointer to queued rwlock structure
> + * @irq: True if we can enable interrupts while spinning
>   */
> -void __lockfunc queued_write_lock_slowpath(struct qrwlock *lock)
> +void __lockfunc queued_write_lock_slowpath(struct qrwlock *lock, bool irq)
>  {
>  	int cnts;
>
> @@ -82,7 +83,11 @@ void __lockfunc queued_write_lock_slowpath(struct qrwlock *lock)
>

Also a new state shows up with the current design:
1. The lock word has _QW_WAITING set while irqs are enabled.
2. This state will only be hit from interrupt context.
3. lock->wait_lock is held by the write waiter.

So per my understanding, a different behavior is also needed in
queued_write_lock_slowpath(): when (unlikely(in_interrupt())), get the
lock directly, and the release path needs a matching change. This is to
address Hillf's concern on the possibility of deadlock.
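A rough sketch of the in_interrupt() special case I have in mind is below
(untested, only to illustrate the idea on top of queued_write_lock_slowpath();
the placement and exact bit handling are assumptions for discussion, not the
real patch, and whether the release path needs more work is still to be
confirmed in V2):

	/*
	 * Inside queued_write_lock_slowpath(), before touching wait_lock;
	 * 'lock' and 'cnts' are the function's existing parameter/local.
	 */
	if (unlikely(in_interrupt())) {
		/*
		 * The interrupted task on this CPU may already hold
		 * lock->wait_lock with _QW_WAITING published and irqs
		 * re-enabled, so do not queue on wait_lock here.  Instead
		 * wait for readers and any writer to go away, then set
		 * _QW_LOCKED on top of a possible _QW_WAITING.
		 */
		do {
			cpu_relax();
			cnts = atomic_read(&lock->cnts) & _QW_WAITING;
		} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts,
						     cnts | _QW_LOCKED));
		return;
	}

Since queued_write_unlock() clears only the locked byte, the interrupted
waiter would still see its _QW_WAITING bit after the interrupt releases the
lock; whether that is already enough for the release path needs to be double
checked.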
Adding Hillf here to merge the threads.
I am going to post a tested patch V2 accordingly. Feel free to let me
know your thoughts before that.

>  	/* When no more readers or writers, set the locked flag */
>  	do {
> +		if (irq)
> +			local_irq_enable();

I think write_lock_irqsave() also needs to be taken into account, so
local_irq_save(flags) should be handled here as well.

>  		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
> +		if (irq)
> +			local_irq_disable();

ditto.

>  	} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));
>  unlock:
>  	arch_spin_unlock(&lock->wait_lock);
> diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
> index 87b03d2e41db..bf94551d7435 100644
> --- a/kernel/locking/spinlock_debug.c
> +++ b/kernel/locking/spinlock_debug.c
> @@ -212,6 +212,13 @@ void do_raw_write_lock(rwlock_t *lock)
>  	debug_write_lock_after(lock);
>  }
>
> +void do_raw_write_lock_irq(rwlock_t *lock)
> +{
> +	debug_write_lock_before(lock);
> +	arch_write_lock_irq(&lock->raw_lock);
> +	debug_write_lock_after(lock);
> +}
> +
>  int do_raw_write_trylock(rwlock_t *lock)
>  {
>  	int ret = arch_write_trylock(&lock->raw_lock);

-- 
Thx and BRs,
Aiqun(Maria) Yu