Date: Wed, 29 Jan 2025 12:28:25 +0100
From: Simona Vetter <simona.vetter@ffwll.ch>
To: David Hildenbrand, Alistair Popple, linux-mm@kvack.org, John Hubbard,
	nouveau@lists.freedesktop.org, Jason Gunthorpe, DRI Development,
	Karol Herbst, Lyude Paul, Danilo Krummrich
Subject: Re: [Question] Are "device exclusive non-swap entries" / "SVM atomics in Nouveau" still getting used in practice?
Message-ID:
References: <346518a4-a090-4eaa-bc04-634388fd4ca3@redhat.com> <8c6f3838-f194-4a42-845d-10011192a234@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
On Wed, Jan 29, 2025 at 11:48:03AM +0100, Simona Vetter wrote:
> On Tue, Jan 28, 2025 at 09:24:33PM +0100, David Hildenbrand wrote:
> > On 28.01.25 21:14, Simona Vetter wrote:
> > > On Tue, Jan 28, 2025 at 11:09:24AM +1100, Alistair Popple wrote:
> > > > On Fri, Jan 24, 2025 at 06:54:02PM +0100, David Hildenbrand wrote:
> > > > > > > > On integrated the gpu is tied into the coherency
> > > > > > > > fabric, so there it's not needed.
> > > > > > > >
> > > > > > > > I think the more fundamental question with both this function here and
> > > > > > > > with forced migration to device memory is that there's no guarantee it
> > > > > > > > will work out.
> > > > > > >
> > > > > > > Yes, in particular with device-exclusive, it doesn't really work with THP
> > > > > > > and is only limited to anonymous memory. I have patches to at least make it
> > > > > > > work reliably with THP.
> > > > > >
> > > > > > I should have crawled through the implementation first before replying.
> > > > > > Since it only looks at folio_mapcount() make_device_exclusive() should at
> > > > > > least in theory work reliably on anon memory, and not be impacted by
> > > > > > elevated refcounts due to migration/ksm/thp/whatever.
> > > > >
> > > > > Yes, there is -- in theory -- nothing blocking the conversion except the
> > > > > folio lock. That's different than page migration.
> > > >
> > > > Indeed - this was the entire motivation for make_device_exclusive() - that we
> > > > needed a way to reliably exclude CPU access that couldn't be blocked in the same
> > > > way page migration can (otherwise we could have just migrated to a device page,
> > > > even if that may have added unwanted overhead).
> > >
> > > The folio_trylock worries me a bit. I guess this is to avoid deadlocks
> > > when locking multiple folios, but I think at least on the first one we
> > > need an unconditional folio_lock to guarantee forward progress.
> >
> > At least on the hmm path I was able to trigger the EBUSY a couple of times
> > due to concurrent swapout. But the hmm-tests selftest fails immediately
> > instead of retrying.
>
> My worries with just retrying are that it's very hard to assess whether
> there's a livelock or whether the retry has a good chance of success. As
> an example the ->migrate_to_ram path has some trylocks, and the window
> where all other threads got halfway and then fail the trylock is big
> enough that once you pile up enough threads that spin through there,
> you're stuck forever. Which isn't great.
>
> So if we could convert at least the first folio_trylock into a plain lock
> then forward progress is obviously assured and there's no need to crawl
> through large chunks of mm/ code to hunt for corner cases where we could
> be too unlucky to ever win the race.
>
> > > Since
> > > atomics can't cross 4k boundaries (or the hw is just really broken) this
> > > should be enough to avoid being stuck in a livelock. I'm also not seeing
> > > any other reason why a folio_lock shouldn't work here, but then my
> > > understanding of mm/ stuff is really just scratching the surface.
> > >
> > > I did crawl through all the other code and it looks like everything else
> > > is unconditional locks. So looks all good and I didn't spot anything else
> > > that seemed problematic.
> > >
> > > Somewhat aside, I do wonder whether we really want to require callers to
> > > hold the mmap lock, or whether with all the work towards lockless fastpath
> > > that shouldn't instead just be an implementation detail.
> >
> > We might be able to use the VMA lock in the future, but that will require
> > GUP support and a bunch more. Until then, the mm_lock in read mode is
> > required.
>
> Yup. I also don't think we should try to improve before benchmarks show an
> actual need. It's more about future proofing and making sure mmap_lock
> doesn't leak into driver data structures that I'm worried about. Because
> I've seen some hmm/gpu rfc patches that heavily relied on mmap_lock to
> keep everything correct on the driver side, which is not a clean design.
>
> > I was not able to convince myself that we'll really need the folio lock, but
> > that's also a separate discussion.

This is way above my understanding of mm/, unfortunately. I pondered this
some more, and I think it's to make sure we get a stable reading of
folio_mapcount() and are not racing with new rmaps being established. But
I also got lost a few times in the maze ...
-Sima

> > > At least for the
> > > gpu hmm code I've seen I've tried to push hard towards a world where the
> > > gpu side does not rely on mmap_read_lock being held at all, to future
> > > proof this all. And currently we only have one caller of
> > > make_device_exclusive_range() so would be simple to do.
> >
> > We could likely move the mmap_lock into that function, but avoiding it is
> > more effort.
>
> I didn't mean more than just that, which would make sure drivers at least
> do not rely on mmap_lock being held. That then allows us to switch over to
> vma lock or anything else entirely within mm/ code.
>
> If we leave it as-is then more drivers accidentally or intentionally will
> rely on this, like I think is the case for ->migrate_to_ram for hmm
> already. And then it's more pain to untangle.
> > In any case, I'll send something out probably tomorrow to fix page
> > migration/swapout of pages with device-exclusive entries and a bunch of
> > other things (THP, interaction with hugetlb, ...).
>
> Thanks a lot!
>
> Cheers, Sima
>
> > --
> > Cheers,
> >
> > David / dhildenb
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch