From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61F48C433C1 for ; Thu, 25 Mar 2021 08:32:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AA3D061A1D for ; Thu, 25 Mar 2021 08:32:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AA3D061A1D Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2B6B96B0036; Thu, 25 Mar 2021 04:32:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 28E156B006C; Thu, 25 Mar 2021 04:32:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0BB566B006E; Thu, 25 Mar 2021 04:32:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP id E160D6B0036 for ; Thu, 25 Mar 2021 04:32:04 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id A2643BA10 for ; Thu, 25 Mar 2021 08:32:04 +0000 (UTC) X-FDA: 77957728968.11.18DBB9B Received: from mail-oi1-f169.google.com (mail-oi1-f169.google.com [209.85.167.169]) by imf25.hostedemail.com (Postfix) with ESMTP id 77F4E6000107 for ; Thu, 25 Mar 2021 08:32:02 +0000 (UTC) Received: by mail-oi1-f169.google.com with SMTP id x207so1361969oif.1 for ; Thu, 25 Mar 2021 01:32:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=OSJJN6tx81YHfrI0UAeuzZVXunphtfPSIhfgCcMn1EY=; b=HZqsaxoxbmFaRTQ//LQp4SY3lvKCjGAIZnkgV3B3pQYjLrKu8gOGhSzMmUF4/IA4vG r6/dNMHEvFq3Goz0uBhL9JfEgeiQfhdUWIctEpBP5rv+YXA+Yws8OAYJNDzUpzdEiiKv H3yAfQ//XrdXTEN0GMthIGLT+I5aGeifbEmZQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=OSJJN6tx81YHfrI0UAeuzZVXunphtfPSIhfgCcMn1EY=; b=LExcIapY2a24omHHBUeZstJainkMPK/A0/PG6ujAcgziA7nYvzSX2tXQekUqb2Hesf 0AeuHKug8adJ/MEVRzXwTV1d791gvxZFa+zrkdXezhLAukGtUHJyj+IfmmUgKdkz+GgZ 7I+NxWmr6dbHOxZBDW+VVE8OYGrbAli2wUxJsvLIG37FHbEJpN6fCcd9C7MChDGdYMz+ SRnDP3ajhuod2Li4RBaVYhv/xdM2OQ0vlJN9qLjl5ihh9noJhsVHgMtiNsKFxyV5A6/i 7bYdYYdPvO7aeFkMvjB++dcKdZ5UC+rArRBefiijTZBDFJ5poW7bK0n5deVe3zXpmFqY f7cQ== X-Gm-Message-State: AOAM533kUt2fxoqsiJp8BYupnNRxXSPVjtgxkFrwGCtnoEhUApsNlNxz HIZ2DD8l3lbD8WdUWFUGKLQu8Lq7yuj9JwOjM16Htw== X-Google-Smtp-Source: ABdhPJw/hOOj/X/5U+YWyGdFDELGl46IA5TLHxr47YSMIL2MsHlGSNlM7sy38K6UWF63I63W/hIJ10gkkktEVyyzNhg= X-Received: by 2002:aca:4188:: with SMTP id o130mr5173626oia.101.1616661122459; Thu, 25 Mar 2021 01:32:02 -0700 (PDT) MIME-Version: 1.0 References: <20210324134845.2338-1-christian.koenig@amd.com> <0f2308b4-f36e-ab02-c26d-4065e9972b50@gmail.com> In-Reply-To: <0f2308b4-f36e-ab02-c26d-4065e9972b50@gmail.com> From: Daniel Vetter Date: Thu, 25 Mar 2021 09:31:51 +0100 Message-ID: Subject: Re: [PATCH] drm/ttm: switch back to static allocation limits for now To: =?UTF-8?Q?Christian_K=C3=B6nig?= Cc: dri-devel , Linux MM , Liang.Liang@amd.com, =?UTF-8?Q?Thomas_Hellstr=C3=B6m_=28VMware=29?= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 77F4E6000107 X-Stat-Signature: 64ecw1b5zkonkpwip93dqn1o4fmod9t7 Received-SPF: none (ffwll.ch>: No applicable sender policy available) receiver=imf25; identity=mailfrom; envelope-from=""; helo=mail-oi1-f169.google.com; client-ip=209.85.167.169 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616661122-237042 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Mar 25, 2021 at 9:07 AM Christian K=C3=B6nig wrote: > > Am 24.03.21 um 20:29 schrieb Daniel Vetter: > > On Wed, Mar 24, 2021 at 02:48:45PM +0100, Christian K=C3=B6nig wrote: > >> The shrinker based approach still has some flaws. Especially that we n= eed > >> temporary pages to free up the pages allocated to the driver is proble= matic > >> in a shrinker. > >> > >> Signed-off-by: Christian K=C3=B6nig > >> --- > >> drivers/gpu/drm/ttm/ttm_device.c | 14 ++-- > >> drivers/gpu/drm/ttm/ttm_tt.c | 112 ++++++++++++-----------------= -- > >> include/drm/ttm/ttm_tt.h | 3 +- > >> 3 files changed, 53 insertions(+), 76 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/tt= m_device.c > >> index 95e1b7b1f2e6..388da2a7f0bb 100644 > >> --- a/drivers/gpu/drm/ttm/ttm_device.c > >> +++ b/drivers/gpu/drm/ttm/ttm_device.c > >> @@ -53,7 +53,6 @@ static void ttm_global_release(void) > >> goto out; > >> > >> ttm_pool_mgr_fini(); > >> - ttm_tt_mgr_fini(); > >> > >> __free_page(glob->dummy_read_page); > >> memset(glob, 0, sizeof(*glob)); > >> @@ -64,7 +63,7 @@ static void ttm_global_release(void) > >> static int ttm_global_init(void) > >> { > >> struct ttm_global *glob =3D &ttm_glob; > >> - unsigned long num_pages; > >> + unsigned long num_pages, num_dma32; > >> struct sysinfo si; > >> int ret =3D 0; > >> unsigned i; > >> @@ -79,8 +78,15 @@ static int ttm_global_init(void) > >> * system memory. > >> */ > >> num_pages =3D ((u64)si.totalram * si.mem_unit) >> PAGE_SHIFT; > >> - ttm_pool_mgr_init(num_pages * 50 / 100); > >> - ttm_tt_mgr_init(); > >> + num_pages /=3D 2; > >> + > >> + /* But for DMA32 we limit ourself to only use 2GiB maximum. */ > >> + num_dma32 =3D (u64)(si.totalram - si.totalhigh) * si.mem_unit > >> + >> PAGE_SHIFT; > >> + num_dma32 =3D min(num_dma32, 2UL << (30 - PAGE_SHIFT)); > >> + > >> + ttm_pool_mgr_init(num_pages); > >> + ttm_tt_mgr_init(num_pages, num_dma32); > >> > >> spin_lock_init(&glob->lru_lock); > >> glob->dummy_read_page =3D alloc_page(__GFP_ZERO | GFP_DMA32); > >> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt= .c > >> index 2f0833c98d2c..5d8820725b75 100644 > >> --- a/drivers/gpu/drm/ttm/ttm_tt.c > >> +++ b/drivers/gpu/drm/ttm/ttm_tt.c > >> @@ -40,8 +40,18 @@ > >> > >> #include "ttm_module.h" > >> > >> -static struct shrinker mm_shrinker; > >> -static atomic_long_t swapable_pages; > >> +static unsigned long ttm_pages_limit; > >> + > >> +MODULE_PARM_DESC(pages_limit, "Limit for the allocated pages"); > >> +module_param_named(pages_limit, ttm_pages_limit, ulong, 0644); > >> + > >> +static unsigned long ttm_dma32_pages_limit; > >> + > >> +MODULE_PARM_DESC(dma32_pages_limit, "Limit for the allocated DMA32 pa= ges"); > >> +module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, ulong, 0= 644); > >> + > >> +static atomic_long_t ttm_pages_allocated; > >> +static atomic_long_t ttm_dma32_pages_allocated; > > Making this configurable looks an awful lot like "job done, move on". J= ust > > the revert to hardcoded 50% (or I guess just revert the shrinker patch = at > > that point) for -fixes is imo better. > > Well this is the revert to hardcoded 50%. We had that configurable > before and have it configurable now. Hm I looked, but I missed it I guess. Acked-by: Daniel Vetter I just really want to make sure we don't walk down the path of first reinventing kswapd, then reinventing direct reclaim (i915-gem has done that because of the single dev->struct_mutex, despite our shrinker, it's real ugly and fragile) in selective calls, then realizing that's still not enough because especially compute loves userptr, and then we're back to having to reinvent the GFP hierarchy. Been there with i915-gem for dev->struct_mutex reasons, but the effect is kinda similar: In some cases the shrinker is defacto not there, and everything then keels over. btw one thing that cross my mind a bunch times is to add a GFP_NOGPU. That fixes things for everything else except our own gpu memory usage, and since the goal is to let that approach 100% it's not really a gain. > Reverting everything back would mean that we also need to revert the > sysfs changes and move all the memory accounting code from VMWGFX back > into TTM. Hm yeah ... -Daniel > Christian. > > > > > Then I guess retry again for 5.14 or so. > > -Daniel > > > >> > >> /* > >> * Allocates a ttm structure for the given BO. > >> @@ -294,8 +304,6 @@ static void ttm_tt_add_mapping(struct ttm_device *= bdev, struct ttm_tt *ttm) > >> > >> for (i =3D 0; i < ttm->num_pages; ++i) > >> ttm->pages[i]->mapping =3D bdev->dev_mapping; > >> - > >> - atomic_long_add(ttm->num_pages, &swapable_pages); > >> } > >> > >> int ttm_tt_populate(struct ttm_device *bdev, > >> @@ -309,12 +317,25 @@ int ttm_tt_populate(struct ttm_device *bdev, > >> if (ttm_tt_is_populated(ttm)) > >> return 0; > >> > >> + atomic_long_add(ttm->num_pages, &ttm_pages_allocated); > >> + if (bdev->pool.use_dma32) > >> + atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocate= d); > >> + > >> + while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit |= | > >> + atomic_long_read(&ttm_dma32_pages_allocated) > > >> + ttm_dma32_pages_limit) { > >> + > >> + ret =3D ttm_bo_swapout(ctx, GFP_KERNEL); > >> + if (ret) > >> + goto error; > >> + } > >> + > >> if (bdev->funcs->ttm_tt_populate) > >> ret =3D bdev->funcs->ttm_tt_populate(bdev, ttm, ctx); > >> else > >> ret =3D ttm_pool_alloc(&bdev->pool, ttm, ctx); > >> if (ret) > >> - return ret; > >> + goto error; > >> > >> ttm_tt_add_mapping(bdev, ttm); > >> ttm->page_flags |=3D TTM_PAGE_FLAG_PRIV_POPULATED; > >> @@ -327,6 +348,12 @@ int ttm_tt_populate(struct ttm_device *bdev, > >> } > >> > >> return 0; > >> + > >> +error: > >> + atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); > >> + if (bdev->pool.use_dma32) > >> + atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocate= d); > >> + return ret; > >> } > >> EXPORT_SYMBOL(ttm_tt_populate); > >> > >> @@ -342,12 +369,9 @@ static void ttm_tt_clear_mapping(struct ttm_tt *t= tm) > >> (*page)->mapping =3D NULL; > >> (*page++)->index =3D 0; > >> } > >> - > >> - atomic_long_sub(ttm->num_pages, &swapable_pages); > >> } > >> > >> -void ttm_tt_unpopulate(struct ttm_device *bdev, > >> - struct ttm_tt *ttm) > >> +void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) > >> { > >> if (!ttm_tt_is_populated(ttm)) > >> return; > >> @@ -357,76 +381,24 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, > >> bdev->funcs->ttm_tt_unpopulate(bdev, ttm); > >> else > >> ttm_pool_free(&bdev->pool, ttm); > >> - ttm->page_flags &=3D ~TTM_PAGE_FLAG_PRIV_POPULATED; > >> -} > >> - > >> -/* As long as pages are available make sure to release at least one *= / > >> -static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink, > >> - struct shrink_control *sc) > >> -{ > >> - struct ttm_operation_ctx ctx =3D { > >> - .no_wait_gpu =3D false > >> - }; > >> - int ret; > >> - > >> - ret =3D ttm_bo_swapout(&ctx, GFP_NOFS); > >> - return ret < 0 ? SHRINK_EMPTY : ret; > >> -} > >> - > >> -/* Return the number of pages available or SHRINK_EMPTY if we have no= ne */ > >> -static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink, > >> - struct shrink_control *sc) > >> -{ > >> - unsigned long num_pages; > >> - > >> - num_pages =3D atomic_long_read(&swapable_pages); > >> - return num_pages ? num_pages : SHRINK_EMPTY; > >> -} > >> > >> -#ifdef CONFIG_DEBUG_FS > >> + atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); > >> + if (bdev->pool.use_dma32) > >> + atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocate= d); > >> > >> -/* Test the shrinker functions and dump the result */ > >> -static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data) > >> -{ > >> - struct shrink_control sc =3D { .gfp_mask =3D GFP_KERNEL }; > >> - > >> - fs_reclaim_acquire(GFP_KERNEL); > >> - seq_printf(m, "%lu/%lu\n", ttm_tt_shrinker_count(&mm_shrinker, &s= c), > >> - ttm_tt_shrinker_scan(&mm_shrinker, &sc)); > >> - fs_reclaim_release(GFP_KERNEL); > >> - > >> - return 0; > >> + ttm->page_flags &=3D ~TTM_PAGE_FLAG_PRIV_POPULATED; > >> } > >> -DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink); > >> - > >> -#endif > >> - > >> - > >> > >> /** > >> * ttm_tt_mgr_init - register with the MM shrinker > >> * > >> * Register with the MM shrinker for swapping out BOs. > >> */ > >> -int ttm_tt_mgr_init(void) > >> +void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32= _pages) > >> { > >> -#ifdef CONFIG_DEBUG_FS > >> - debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL, > >> - &ttm_tt_debugfs_shrink_fops); > >> -#endif > >> - > >> - mm_shrinker.count_objects =3D ttm_tt_shrinker_count; > >> - mm_shrinker.scan_objects =3D ttm_tt_shrinker_scan; > >> - mm_shrinker.seeks =3D 1; > >> - return register_shrinker(&mm_shrinker); > >> -} > >> + if (!ttm_pages_limit) > >> + ttm_pages_limit =3D num_pages; > >> > >> -/** > >> - * ttm_tt_mgr_fini - unregister our MM shrinker > >> - * > >> - * Unregisters the MM shrinker. > >> - */ > >> -void ttm_tt_mgr_fini(void) > >> -{ > >> - unregister_shrinker(&mm_shrinker); > >> + if (!ttm_dma32_pages_limit) > >> + ttm_dma32_pages_limit =3D num_dma32_pages; > >> } > >> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h > >> index 069f8130241a..134d09ef7766 100644 > >> --- a/include/drm/ttm/ttm_tt.h > >> +++ b/include/drm/ttm/ttm_tt.h > >> @@ -157,8 +157,7 @@ int ttm_tt_populate(struct ttm_device *bdev, struc= t ttm_tt *ttm, struct ttm_oper > >> */ > >> void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm); > >> > >> -int ttm_tt_mgr_init(void); > >> -void ttm_tt_mgr_fini(void); > >> +void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32= _pages); > >> > >> #if IS_ENABLED(CONFIG_AGP) > >> #include > >> -- > >> 2.25.1 > >> > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel --=20 Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch