From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages
To: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
 Jason Gunthorpe <jgg@nvidia.com>
Cc: David Airlie <airlied@linux.ie>, linux-kernel@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
 Andrew Morton <akpm@linux-foundation.org>
References: <YFsM23t2niJwhpM/@phenom.ffwll.local>
 <20210324122430.GW2356281@nvidia.com>
 <e12e2c49-afaf-dbac-b18c-272c93c83e06@shipmail.org>
 <20210324124127.GY2356281@nvidia.com>
 <6c9acb90-8e91-d8af-7abd-e762d9a901aa@shipmail.org>
 <20210324134833.GE2356281@nvidia.com>
 <0b984f96-00fb-5410-bb16-02e12b2cc024@shipmail.org>
 <20210324163812.GJ2356281@nvidia.com>
 <08f19e80-d6cb-8858-0c5d-67d2e2723f72@amd.com>
 <730eb2ff-ba98-2393-6d42-61735e3c6b83@shipmail.org>
 <20210324231419.GR2356281@nvidia.com>
 <607ecbeb-e8a5-66e9-6fe2-9a8d22f12bc2@shipmail.org>
From: Christian König <christian.koenig@amd.com>
Message-ID: <fb74efd9-55be-9a8d-95b0-6103e263aab8@amd.com>
Date: Thu, 25 Mar 2021 09:27:34 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.7.1
In-Reply-To: <607ecbeb-e8a5-66e9-6fe2-9a8d22f12bc2@shipmail.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
MIME-Version: 1.0
List-ID: <linux-mm.kvack.org>

On 25.03.21 at 08:48, Thomas Hellström (Intel) wrote:
>
> On 3/25/21 12:14 AM, Jason Gunthorpe wrote:
>> On Wed, Mar 24, 2021 at 09:07:53PM +0100, Thomas Hellström (Intel)
>> wrote:
>>> On 3/24/21 7:31 PM, Christian König wrote:
>>>>
>>>> On 24.03.21 at 17:38, Jason Gunthorpe wrote:
>>>>> On Wed, Mar 24, 2021 at 04:50:14PM +0100, Thomas Hellström (Intel)
>>>>> wrote:
>>>>>> On 3/24/21 2:48 PM, Jason Gunthorpe wrote:
>>>>>>> On Wed, Mar 24, 2021 at 02:35:38PM +0100, Thomas Hellström
>>>>>>> (Intel) wrote:
>>>>>>>
>>>>>>>>> In an ideal world the creation/destruction of page
>>>>>>>>> table levels would
>>>>>>>>> be dynamic at this point, like THP.
>>>>>>>> Hmm, but I'm not sure what problem we're trying to solve
>>>>>>>> by changing the
>>>>>>>> interface in this way?
>>>>>>> We are trying to make a sensible driver API to deal with huge
>>>>>>> pages.
>>>>>>>> Currently if the core vm requests a huge pud, we give it
>>>>>>>> one, and if we
>>>>>>>> can't or don't want to (because of dirty-tracking, for
>>>>>>>> example, which is
>>>>>>>> always done on 4K page-level) we just return
>>>>>>>> VM_FAULT_FALLBACK, and the
>>>>>>>> fault is retried at a lower level.
>>>>>>> Well, my thought would be to move the pte related stuff into
>>>>>>> vmf_insert_range instead of recursing back via VM_FAULT_FALLBACK.
>>>>>>>
>>>>>>> I don't know if the locking works out, but it feels cleaner that
>>>>>>> the driver tells the vmf how big a page it can stuff in, not the vm
>>>>>>> telling the driver to stuff in a certain size page which it might
>>>>>>> not want to do.
>>>>>>>
>>>>>>> Some devices want to work on an in-between page size like 64k so
>>>>>>> they can't form 2M pages but they can stuff 64k of 4K pages in a
>>>>>>> batch on every fault.
>>>>>> Hmm, yes, but we would in that case be limited anyway to insert
>>>>>> ranges smaller than and equal to the fault size to avoid extensive
>>>>>> and possibly unnecessary checks for contiguous memory.
>>>>> Why? The insert function is walking the page tables, it just updates
>>>>> things as they are. It learns the arrangement for free while doing the
>>>>> walk.
>>>>>
>>>>> The device has to always provide consistent data, if it overlaps into
>>>>> pages that are already populated that is fine so long as it isn't
>>>>> changing their addresses.
>>>>>
>>>>>> And then if we can't support the full fault size, we'd need to
>>>>>> either presume a size and alignment of the next level or search for
>>>>>> contiguous memory in both directions around the fault address,
>>>>>> perhaps unnecessarily as well.
>>>>> You don't really need to care about levels, the device should be
>>>>> faulting in the largest memory regions it can within its efficiency.
>>>>>
>>>>> If it works on 4M pages then it should be faulting 4M pages. The page
>>>>> size of the underlying CPU doesn't really matter much other than some
>>>>> tuning to impact how the device's allocator works.
>>> Yes, but then we'd be adding a lot of complexity into this function
>>> that is already provided by the current interface for DAX, for little
>>> or no gain, at least in the drm/ttm setting. Please think of the
>>> following situation: You get a fault, you do an extensive
>>> time-consuming scan of your VRAM buffer object into which the fault
>>> goes and determine you can fault 1GB. Now you hand it to
>>> vmf_insert_range() and because the user-space address is misaligned,
>>> or already partly populated because of a previous eviction, you can
>>> only fault single pages, and you end up faulting a full GB of single
>>> pages perhaps for a one-time small update.
>> Why would "you can only fault single pages" ever be true? If you have
>> 1GB of pages then the vmf_insert_range should allocate enough page
>> table entries to consume it, regardless of alignment.
>
> Ah yes, what I meant was you can only insert PTE size entries, either
> because of misalignment or because the page-table is already
> pre-populated with pmd size page directories, which you can't remove
> with only the read side of the mmap lock held.

Please explain that further. Why do we need the mmap lock to insert PMDs
but not when inserting PTEs?

>> And why shouldn't DAX switch to this kind of interface anyhow? It is
>> basically exactly the same problem. The underlying filesystem block
>> size is *not* necessarily aligned to the CPU page table sizes and DAX
>> would benefit from better handling of this mismatch.
>
> First, I think we must sort out what "better handling" means. This is
> my takeaway from the discussion so far:
>
> Claimed Pros of vmf_insert_range():
> * We get an interface that doesn't require knowledge of CPU page table
> entry level sizes.
> * We get the best efficiency when we look at what the GPU driver
> provides. (I disagree on this one).
>
> Claimed Cons:
> * A new implementation that may get complicated particularly if it
> involves modifying all of the DAX code
> * The driver would have to know about those sizes anyway to get
> alignment right (Applies to DRM, because we mmap buffer objects, not
> physical address ranges. But not to DAX AFAICT),

I don't think so. We could just align every buffer to its size rounded
up to the next power of two. Since we have plenty of offset space that
shouldn't matter much.
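
To illustrate what I mean by power-of-two alignment, here is a rough
user-space sketch. It is not TTM code; the helper names
bo_align_for_size() and bo_align_offset() are made up for this example,
and a real implementation would go through the existing offset manager.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest power of two that is
 * greater than or equal to size. This is the alignment a buffer of
 * that size would get in the mmap offset space.
 */
static uint64_t bo_align_for_size(uint64_t size)
{
	uint64_t align = 1;

	/* Reject sizes whose next power of two would overflow 64 bits. */
	assert(size > 0 && size <= (UINT64_C(1) << 63));

	while (align < size)
		align <<= 1;

	return align;
}

/*
 * Hypothetical helper: round offset up to the next multiple of align.
 * align must be a power of two.
 */
static uint64_t bo_align_offset(uint64_t offset, uint64_t align)
{
	return (offset + align - 1) & ~(align - 1);
}
```

With this, a 3 MiB buffer would be placed at a 4 MiB aligned offset, so
any 2 MiB huge page inside it lines up without the driver having to know
the CPU page table level sizes.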

Apart from that I still don't fully get why we need this in the first
place.

> * We lose efficiency, because we are prepared to spend an extra
> effort for alignment and continuity checks when we know we can insert
> a huge page table entry, but not if we know we can't

I don't think so either. See, we don't need any extra effort for the
alignment or the handling; it actually becomes much cheaper as far as I
can see.

In other words, when you have a fault you don't care about the faulting
address that much; you only use it to determine the memory segment to
map.

Then this whole memory segment is mapped into the address space of the
user application.
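
As a minimal sketch of that lookup, assuming a flat array of segments
(struct segment and segment_for_fault() are hypothetical names for this
example, not existing kernel or TTM interfaces):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one contiguous memory segment. */
struct segment {
	uint64_t start;	/* first address covered by the segment */
	uint64_t size;	/* length of the segment in bytes */
};

/*
 * The faulting address is only used to pick the segment. The caller
 * would then map [start, start + size) as a whole rather than just the
 * page around addr. Returns NULL if no segment covers addr.
 */
static const struct segment *
segment_for_fault(const struct segment *segs, size_t n, uint64_t addr)
{
	assert(segs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (addr >= segs[i].start &&
		    addr - segs[i].start < segs[i].size)
			return &segs[i];
	}

	return NULL;
}
```

The point is that nothing here depends on the fault size or on the page
table level at which the fault was raised.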

It can of course happen that we need to fiddle with addresses and sizes
because userspace only mmaps a fraction of the underlying buffer, but in
reality we never do this.

> * We lose efficiency because we might unnecessarily prefault a number
> of PTE size page-table entries (really a special case of the above one).

I really don't see that either. When a buffer is accessed by the CPU, it
is accessed completely in > 90% of all cases. Not faulting in full
ranges is just optimizing for a really unlikely case here.

>
> Now in the context of quickly fixing a critical bug, the choice IMHO
> becomes easy.

Well, as a quick fix I would rather disable huge pages for now.

Regards,
Christian.

>
>>
>>> On top of this, unless we want to do the walk trying increasingly
>>> smaller sizes of vmf_insert_xxx(), we'd have to use
>>> apply_to_page_range() and teach it about transhuge page table
>>> entries, because pagewalk.c can't be used (It can't populate page
>>> tables). That also means apply_to_page_range() needs to be
>>> complicated with page table locks since transhuge pages aren't stable
>>> and can be zapped and refaulted under us while we do the walk.
>> I didn't say it would be simple :) But we also need to stop hacking
>> around the sides of all this huge page stuff and come up with sensible
>> APIs that drivers can actually implement correctly. Exposing drivers
>> to specific kinds of page levels really feels like the wrong level of
>> abstraction.
>
> I generally agree. But for the last sentence I think the potential
> gain must be carefully weighed against the efficiency arguments.
>
>>
>> Once we start doing this we should do it everywhere, the io_remap_pfn
>> stuff should be able to create huge special IO pages as well, for
>> instance.
>
> I agree here as well. Here we can be more aggressive as the contiguous
> range is already known and we IIRC hold the mmap lock in write mode.
>
>>> On top of this, the user-space address allocator needs to know how
>>> large gpu pages are aligned in buffer objects to have a reasonable
>>> chance of aligning with CPU huge page boundaries which is a
>>> requirement to be able to insert a huge CPU page table entry, so the
>>> driver would basically need the drm helper that can do this alignment
>>> anyway.
>> Don't you have this problem anyhow?
>
> Yes, but it sort of defeats the simplicity argument of the proposed
> interface change.
>
> /Thomas
>
>