In-Reply-To:
References: <20200406200254.a69ebd9e08c4074e41ddebaf@linux-foundation.org>
 <20200407031042.8o-fYMox-%akpm@linux-foundation.org>
 <158627540139.8918.10102358634447361335@build.alporthouse.com>
From: Chris Wilson
Subject: Re: [patch 125/166] lib/list: prevent compiler reloads inside 'safe' list iteration
To: Linus Torvalds
Cc: Andrew Morton, David Laight, Marco Elver, Linux-MM, Mark Rutland,
 mm-commits@vger.kernel.org, "Paul E. McKenney", Randy Dunlap, stable
Message-ID: <158628265081.8918.1825514020221532657@build.alporthouse.com>
Date: Tue, 07 Apr 2020 19:04:10 +0100

Quoting Linus Torvalds (2020-04-07 18:28:34)
> On Tue, Apr 7, 2020 at 9:04 AM Chris Wilson wrote:
> >
> > It'll take some time to reconstruct the original report, but the case in
> > question was in removing the last element of the last list; we then
> > switch to a global lock over all such lists to park the HW, which in
> > doing so added one more element to the original list.
> > [If we happen to
> > be retiring along the kernel timeline in the first place.]
>
> Please point to the real code and the list.
>
> Honestly, what you describe sounds complex enough that I think your
> locking is simply just buggy.
>
> IOW, this patch seems to really just paper over a locking bug, and the
> KASAN report tied to it.
>
> Because the fundamental issue is that READ_ONCE() can not fix a bug
> here. Reading the next pointer once fundamentally cannot matter if it
> can change concurrently: the code is buggy, and the READ_ONCE() just
> means that it gets one or the other value randomly, and that the list
> walking is fundamentally racy.
>
> On the other hand, if the next pointer _cannot_ change concurrently,
> then READ_ONCE() cannot make a difference.
>
> So as far as I can tell, we have two possibilities, and in both cases
> changing the code to use READ_ONCE() is not the right thing to do. In
> one case it hides a bug, and in another case it's just pointless.
>
> > list->next changed from pointing to list_head, to point to the new
> > element instead. However, we don't have to check the next element yet
> > and want to terminate the list iteration.
>
> I'd really like to see the actual code that has that list walking. You say:
>
> > For reference,
> > drivers/gpu/drm/i915/gt/intel_engine_pm.c::__engine_park()
>
> .. but that function doesn't have any locking or list-walking. Is it
> the "call_idle_barriers()" loop? What is it?

list_for_each_safe @ intel_gt_requests.c::retire_requests
list_del @ i915_requests.c::i915_request_retire
list_add @ i915_requests.c::__i915_request_create

> I'd really like to see the KASAN report and the discussion about this change.
>
> And if there was no discussion, then the patch just seems like "I
> changed code randomly and the KASAN report went away, so it's all
> good".
>
> > Global activity is serialised by engine->wakeref.mutex; every active
> > timeline is required to hold an engine wakeref, but retiring is local to
> > timelines and serialised by their own timeline->mutex.
> >
> > lock(&timeline->lock)
> >   list_for_each_safe(&timeline->requests)
> >    \-> i915_request_retire [list_del(&timeline->requests)]
> >     \-> intel_timeline_exit
> >      \-> lock(&engine->wakeref.mutex)
> >           engine_park [list_add_tail(&engine->kernel_context->timeline->requests)]
>
> in that particular list_for_each_safe() thing, there's no possibility
> that the 'next' field would be reloaded, since the list_del() in the
> above will be something the compiler is aware of.
>
> So yes, the beginning list_for_each_safe() might load it twice (or a
> hundred times), but by the time that list_del() in
> i915_request_retire() has been called, if the compiler then reloads it
> afterwards, that would be a major compiler bug, since it's after that
> value could have been written in the local thread.

My understanding of how KCSAN works is that it installs a watchpoint
after a read and complains if the value changes within X us.

> So this doesn't explain it to me.
>
> What it *sounds* like is that the "engine" lock that you do *not* hold
> initially, is not protecting some accessor to that list, so you have a
> race on the list at the time of that list_del().

We take an engine wakeref prior to creating a request, and release it
upon retiring the request [through the respective context/timelines,
across all the engines that might be used for that context]. When the
engine wakeref hits zero, that mutex is taken to prevent any other new
request being created; under that mutex, while we know that the
engine->kernel_context->timeline is empty, we submit a request to
switch the GPU to point into our scratch context. That submission can
run concurrently to the list iteration, but only _after_ the final
list_del.
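
For illustration only, a boiled-down sketch of that interaction: the
list_for_each_safe() definition is simplified from include/linux/list.h,
and retire() is a hypothetical stand-in for i915_request_retire(), not
the real driver code.

struct list_head { struct list_head *next, *prev; };

/* simplified from include/linux/list.h */
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

/* hypothetical stand-in for i915_request_retire() */
static void retire(struct list_head *pos)
{
	/*
	 * list_del(pos); on the final element this descends into
	 * engine_park(), which list_add_tail()s a new element onto
	 * the same list.
	 */
	pos->prev->next = pos->next;
	pos->next->prev = pos->prev;
}

static void retire_requests(struct list_head *requests)
{
	struct list_head *pos, *n;

	/* held: timeline->mutex */
	list_for_each_safe(pos, n, requests) {
		/*
		 * 'n' is loaded before retire() runs; for the final
		 * element the cached 'n' is the list_head itself, so
		 * the walk terminates without ever looking at the
		 * newly added element.  The question in this thread is
		 * whether the compiler may instead reload pos->next
		 * here, which the READ_ONCE() patch was meant to rule
		 * out.
		 */
		retire(pos);
	}
}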

> And that race may be what KASAN is reporting, and what that patch is
> _hiding_ from KASAN - but not fixing.

Yes, it was sent as a means to silence KCSAN with a query as to whether
or not it could happen in practice with an aggressive compiler.

> See what I am saying and why I find this patch questionable?
>
> There may be something really subtle going on, but it really smells
> like "two threads are modifying the same list at the same time".

In strict succession.

> And there's no way that the READ_ONCE() will fix that bug, it will
> only make KASAN shut up about it.

There's some more shutting up required for KCSAN to bring the noise down
to usable levels, which I hope has been done so I don't have to argue
for it, such as

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650ed066d..c7c8dd89f279 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return READ_ONCE(timer->entry.pprev) != NULL;
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5352ce50a97e..7461b3f33629 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -565,8 +565,9 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 		/*
 		 * Use vcpu_is_preempted to detect lock holder preemption issue.
 		 */
-		if (!owner->on_cpu || need_resched() ||
-		    vcpu_is_preempted(task_cpu(owner))) {
+		if (!READ_ONCE(owner->on_cpu) ||
+		    need_resched() ||
+		    vcpu_is_preempted(task_cpu(owner))) {
 			ret = false;
 			break;
 		}
@@ -602,7 +603,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	 * on cpu or its cpu is preempted
 	 */
 	if (owner)
-		retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
+		retval = READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
 	rcu_read_unlock();
 
 	/*
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f7734949ac8..4a81fba4cf70 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 		 * wait for either @lock to point to us, through its Step-B, or
 		 * wait for a new @node->next from its Step-C.
 		 */
-		if (node->next) {
+		if (READ_ONCE(node->next)) {
 			next = xchg(&node->next, NULL);
 			if (next)
 				break;
@@ -154,7 +154,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 */
 
 	for (;;) {
-		if (prev->next == node &&
+		if (READ_ONCE(prev->next) == node &&
 		    cmpxchg(&prev->next, node, NULL) == node)
 			break;
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 0d9b6be9ecc8..eef4835cecf2 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -650,7 +650,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
 	 * As lock holder preemption issue, we both skip spinning if
 	 * task is not on cpu or its cpu is preempted
 	 */
-	return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
+	return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
 }
 
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
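
[Aside, not part of the patch above: a user-space sketch, with a
simplified local READ_ONCE(), of what these annotations buy for a
lockless check such as timer_pending(): a single, untorn load that the
compiler may not re-issue or fuse, and a marked access for KCSAN.]

#include <stdbool.h>
#include <stddef.h>

/* simplified stand-in for the kernel's READ_ONCE() */
#define READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

struct hlist_node { struct hlist_node *next, **pprev; };
struct timer_list { struct hlist_node entry; };

/* mirrors the timer_pending() hunk above: a lockless check that another
 * CPU may race against by detaching or re-arming the timer */
static inline bool timer_pending(const struct timer_list *timer)
{
	return READ_ONCE(timer->entry.pprev) != NULL;
}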