Subject: Re: [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme
To: Jerome Glisse
Cc: Michal Hocko, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Benjamin LaHaise
References: <20200122023100.75226-1-jglisse@redhat.com> <20200122045723.GC76712@redhat.com> <20200122115926.GW29276@dhcp22.suse.cz> <015647b0-360c-c9ac-ac20-405ae0ec4512@kernel.dk> <20200122165427.GA6009@redhat.com> <66027259-81c3-0bc4-a70b-74069e746058@kernel.dk> <20200122172842.GC6009@redhat.com> <00864312-13cc-daac-36e8-5f3f5b6dbeb8@kernel.dk> <20200122174059.GA7033@redhat.com>
From: Jens Axboe
Message-ID: <0976dc63-dcb8-815b-7b2a-a0a5313f71ef@kernel.dk>
Date: Wed, 22 Jan 2020 10:49:11 -0700
In-Reply-To: <20200122174059.GA7033@redhat.com>

On 1/22/20 10:40 AM, Jerome Glisse wrote:
> On Wed, Jan 22, 2020 at 10:38:56AM -0700, Jens Axboe wrote:
>> On 1/22/20 10:28 AM, Jerome Glisse wrote:
>>> On Wed, Jan 22, 2020 at 10:04:44AM -0700, Jens Axboe wrote:
>>>> On 1/22/20 9:54 AM, Jerome Glisse wrote:
>>>>> On Wed, Jan 22, 2020 at 08:12:51AM -0700, Jens Axboe wrote:
>>>>>> On 1/22/20 4:59 AM, Michal Hocko wrote:
>>>>>>> On Tue 21-01-20 20:57:23, Jerome Glisse wrote:
>>>>>>>> We can also discuss what kind of knobs we want to expose so that
>>>>>>>> people can decide to choose the tradeoff themselves (ie from i want low
>>>>>>>> latency io-uring and i don't care whether mm can not do its business; to
>>>>>>>> i want mm to never be impeded in its business and i accept the extra
>>>>>>>> latency burst i might face in io operations).
>>>>>>>
>>>>>>> I do not think it is a good idea to make this configurable. How can
>>>>>>> people sensibly choose between the two without deep understanding of
>>>>>>> internals?
>>>>>>
>>>>>> Fully agree, we can't just punt this to a knob and call it good, that's
>>>>>> a typical fallacy of core changes. And there is only one mode for
>>>>>> io_uring, and that's consistent low latency. If this change introduces
>>>>>> weird reclaim, compaction or migration latencies, then that's a
>>>>>> non-starter as far as I'm concerned.
>>>>>>
>>>>>> And what do those two settings even mean? I don't even know, and a user
>>>>>> sure as hell doesn't either.
>>>>>>
>>>>>> io_uring pins two types of pages - registered buffers, these are used
>>>>>> for actual IO, and the rings themselves. The rings are not used for IO,
>>>>>> just used to communicate between the application and the kernel.
>>>>>
>>>>> So, do we still want to solve file-backed page writeback if pages in
>>>>> the ubuffer are from a file?
>>>>
>>>> That's not currently a concern for io_uring, as it disallows file backed
>>>> pages for the IO buffers that are being registered.
>>>>
>>>>> Also we can introduce a flag when registering a buffer that allows
>>>>> registering the buffer without pinning and thus avoids the RLIMIT_MEMLOCK
>>>>> at the cost of possible latency spikes. Then the user registering the
>>>>> buffer knows what he gets.
>>>>
>>>> That may be fine for other users, but I don't think it'll apply
>>>> to io_uring. I can't see anyone selecting that flag, unless you're
>>>> doing something funky where you're registering a substantial amount
>>>> of the system memory for IO buffers. And I don't think that's going
>>>> to be a super valid use case...
>>>
>>> Given datasets are getting bigger and bigger i would assume that we
>>> will have people who want to use io-uring with large buffers.
>>>
>>>>
>>>>> Maybe it would be good to test, it might stay in the noise, then it
>>>>> might be a good thing to do. Also there are strategies to avoid latency
>>>>> spikes, for instance we can block/force skip mm invalidation if the buffer
>>>>> has pending/running io in the ring ie only have buffer invalidation
>>>>> happen when there is no pending/running submission entry.
>>>>
>>>> Would that really work? The buffer could very well be idle right when
>>>> you check, but wanting to do IO the instant you decide you can do
>>>> background work on it. Additionally, that would require accounting
>>>> on when the buffers are inflight, which is exactly the kind of
>>>> overhead we're trying to avoid to begin with.
>>>>
>>>>> We can also pick what kind of invalidation we allow (compaction,
>>>>> migration, ...) and thus limit the scope and likelihood of
>>>>> invalidation.
>>>>
>>>> I think it'd be useful to try and understand the use case first.
>>>> If we're pinning a small percentage of the system memory, do we
>>>> really care at all? Isn't it completely fine to just ignore?
>>>
>>> My main motivation is migration in NUMA systems, if the process that
>>> did register the buffer gets migrated to a different node then it might
>>> actually end up with bad performance because its io buffers are still
>>> on the old node. I am not sure we want to tell application developers to
>>> constantly monitor which node they are on and to re-register buffers
>>> after process migration to allow for memory migration.
>>
>> If the process truly cares, would it not have pinned itself to that
>> node?
>
> Not necessarily, programmers can not think of everything and also process

Node placement is generally the _first_ thing you think of, though. It's
not like it's some esoteric thing that application developers don't know
anything about. Particularly if you're doing intensive IO, which you
probably are if you register buffers for use with io_uring. That ties to
a hardware device of some sort, or multiple ones. You would have placed
your memory local to that device as well.

> pinning defeats load balancing. Moreover we now have to think about deep
> memory topology ie by the time you register the buffer the pages backing
> it might be from slower memory and then all your io and CPU access will
> be stuck on using that.

To me, this sounds like some sort of event the application will want to
know about. And take appropriate measures.

-- 
Jens Axboe
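
For readers following the thread, here is a minimal sketch of what "pinning itself to that node" could look like from userspace. It is not taken from the thread. It assumes a Linux system that exposes NUMA topology under /sys/devices/system/node, and the helper names (parse_cpulist, node_cpus, pin_to_node) are hypothetical. It only restricts CPU affinity; binding the memory itself to a node would additionally need a memory policy (for example set_mempolicy(2) or mbind(2)), which is not shown here.

```python
import os


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Return the CPUs of one NUMA node as listed in sysfs (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node):
    """Restrict the calling process to the CPUs of one NUMA node."""
    cpus = node_cpus(node)
    if not cpus:
        raise ValueError(f"node {node} has no CPUs")
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

An application would call something like pin_to_node(0) before allocating and registering its IO buffers, so that first-touch allocation tends to land on the same node as the CPUs it runs on. Which node is local to the target device is left to the caller.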