Message-ID: <139a402b4f9a09a4e89b0c0b0e556014ae7a8b83.camel@redhat.com>
Subject: Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release
From: Lyude Paul
To: Alistair Popple, John Hubbard
Cc: linux-mm@kvack.org, Andrew Morton, Michael Ellerman, Nicholas Piggin,
 Felix Kuehling, Alex Deucher, Christian König, "Pan, Xinhui", David Airlie,
 Daniel Vetter, Ben Skeggs, Karol Herbst, Ralph Campbell,
 "Matthew Wilcox (Oracle)", Alex Sierra, linuxppc-dev@lists.ozlabs.org,
 linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
 nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 Jason Gunthorpe, Dan Williams
Date: Wed, 28 Sep 2022 17:39:17 -0400
In-Reply-To: <87k05plm9j.fsf@nvdebian.thelocal>
References: <072e1ce590fe101a4cdbd5e91b1702efebb6d0fd.1664171943.git-series.apopple@nvidia.com>
 <881735bda9b1ba0ecf3648af201840233508f206.camel@redhat.com>
 <6ff9dcc5-c34b-963f-f5e7-7038eecae98b@nvidia.com>
 <87k05plm9j.fsf@nvdebian.thelocal>
Organization: Red Hat Inc.

Re
comments about infinite retry: gotcha, makes sense to me.

On Tue, 2022-09-27 at 09:45 +1000, Alistair Popple wrote:
> John Hubbard writes:
>
> > On 9/26/22 14:35, Lyude Paul wrote:
> > > > +	for (i = 0; i < npages; i++) {
> > > > +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> > > > +			struct page *dpage;
> > > > +
> > > > +			/*
> > > > +			 * _GFP_NOFAIL because the GPU is going away and there
> > > > +			 * is nothing sensible we can do if we can't copy the
> > > > +			 * data back.
> > > > +			 */
> > >
> > > You'll have to excuse me for a moment since this area of nouveau isn't one of
> > > my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> > > retry, in the case of a GPU hotplug event I would assume we would rather just
> > > stop trying to migrate things to the GPU and just drop the data instead of
> > > hanging on infinite retries.
> > >
>
> No problem, thanks for taking a look!
>
> > Hi Lyude!
> >
> > Actually, I really think it's better in this case to keep trying
> > (presumably not necessarily infinitely, but only until memory becomes
> > available), rather than failing out and corrupting data.
> >
> > That's because I'm not sure it's completely clear that this memory is
> > discardable. And at some point, we're going to make this all work with
> > file-backed memory, which will *definitely* not be discardable--I
> > realize that we're not there yet, of course.
> >
> > But here, it's reasonable to commit to just retrying indefinitely,
> > really. Memory should eventually show up. And if it doesn't, then
> > restarting the machine is better than corrupting data, generally.
>
> The memory is definitely not discardable here if the migration failed
> because that implies it is still mapped into some userspace process.
>
> We could avoid restarting the machine by doing something similar to what
> happens during memory failure and killing every process that maps the
> page(s). But overall I think it's better to retry until memory is
> available, because that allows things like reclaim to work and in the
> worst case allows the OOM killer to select an appropriate task to kill.
> It also won't cause data corruption if/when we have file-backed memory.
>
> > thanks,
>

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
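
For reference, the trade-off discussed in this thread (fail the allocation and drop the data, versus retry until memory shows up) can be illustrated with a small userspace sketch. This is not kernel code and not the nouveau patch: try_alloc(), reclaim(), alloc_may_fail(), alloc_retry() and the pressure counter are all hypothetical names invented for the illustration, and the retry loop is only loosely analogous to what __GFP_NOFAIL asks of the page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for memory pressure: while this is non-zero,
 * every allocation attempt fails. Not a real kernel concept.
 */
static int pressure = 3;

/* Hypothetical allocator: fails while there is pressure. */
static void *try_alloc(size_t size)
{
	if (pressure > 0)
		return NULL;
	return malloc(size);
}

/* Hypothetical stand-in for reclaim: each pass relieves some pressure. */
static void reclaim(void)
{
	if (pressure > 0)
		pressure--;
}

/*
 * Fail-fast policy: the caller gets NULL and has to cope, which in the
 * eviction case would mean dropping data that is still mapped.
 */
static void *alloc_may_fail(size_t size)
{
	return try_alloc(size);
}

/*
 * Retry policy: loop until the allocation succeeds, giving reclaim a
 * chance to run between attempts. Never returns NULL, but may loop
 * for as long as reclaim makes no progress.
 */
static void *alloc_retry(size_t size, int *attempts)
{
	void *p;

	assert(attempts != NULL);
	*attempts = 0;
	for (;;) {
		(*attempts)++;
		p = try_alloc(size);
		if (p)
			return p;
		reclaim();
	}
}
```

With the initial pressure of 3, alloc_may_fail() returns NULL right away, while alloc_retry() succeeds on its fourth attempt after three reclaim passes. The retry variant only terminates because reclaim() makes progress here; in the kernel that role falls to reclaim and, in the worst case, the OOM killer, as described above.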