Subject: Re: [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme
To: Jerome Glisse
Cc: Michal Hocko, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Benjamin LaHaise
References: <20200122023100.75226-1-jglisse@redhat.com> <20200122045723.GC76712@redhat.com> <20200122115926.GW29276@dhcp22.suse.cz> <015647b0-360c-c9ac-ac20-405ae0ec4512@kernel.dk> <20200122165427.GA6009@redhat.com> <66027259-81c3-0bc4-a70b-74069e746058@kernel.dk> <20200122172842.GC6009@redhat.com>
From: Jens Axboe
Message-ID: <00864312-13cc-daac-36e8-5f3f5b6dbeb8@kernel.dk>
Date: Wed, 22 Jan 2020 10:38:56 -0700
In-Reply-To: <20200122172842.GC6009@redhat.com>

On 1/22/20 10:28 AM, Jerome Glisse wrote:
> On Wed, Jan 22, 2020 at 10:04:44AM -0700, Jens Axboe wrote:
>> On 1/22/20 9:54 AM, Jerome Glisse wrote:
>>> On Wed, Jan 22, 2020 at 08:12:51AM -0700, Jens Axboe wrote:
>>>> On 1/22/20 4:59 AM, Michal Hocko wrote:
>>>>> On Tue 21-01-20 20:57:23, Jerome Glisse wrote:
>>>>>> We can also discuss what kind of knobs we want to expose so that
>>>>>> people can decide to choose the tradeoff themselves (i.e. from "I want
>>>>>> low latency io-uring and I don't care whether mm cannot do its
>>>>>> business" to "I want mm to never be impeded in its business and I
>>>>>> accept the extra latency bursts I might face in io operations").
>>>>>
>>>>> I do not think it is a good idea to make this configurable. How can
>>>>> people sensibly choose between the two without deep understanding of
>>>>> internals?
>>>>
>>>> Fully agree, we can't just punt this to a knob and call it good, that's
>>>> a typical fallacy of core changes. And there is only one mode for
>>>> io_uring, and that's consistent low latency. If this change introduces
>>>> weird reclaim, compaction or migration latencies, then that's a
>>>> non-starter as far as I'm concerned.
>>>>
>>>> And what do those two settings even mean? I don't even know, and a user
>>>> sure as hell doesn't either.
>>>>
>>>> io_uring pins two types of pages: registered buffers, which are used
>>>> for actual IO, and the rings themselves. The rings are not used for IO,
>>>> just used to communicate between the application and the kernel.
>>>
>>> So, do we still want to solve writeback of file-backed pages if pages
>>> in the ubuffer are from a file?
>>
>> That's not currently a concern for io_uring, as it disallows file-backed
>> pages for the IO buffers that are being registered.
>>
>>> Also, we can introduce a flag at buffer registration that allows
>>> registering a buffer without pinning it, and thus avoid the
>>> RLIMIT_MEMLOCK limit at the cost of possible latency spikes. Then the
>>> user registering the buffer knows what he gets.
>>
>> That may be fine for other users, but I don't think it'll apply
>> to io_uring. I can't see anyone selecting that flag, unless you're
>> doing something funky where you're registering a substantial amount
>> of the system memory for IO buffers. And I don't think that's going
>> to be a super valid use case...
>
> Given datasets are getting bigger and bigger, I would assume that we
> will have people who want to use io-uring with large buffers.
>
>>
>>> Maybe it would be good to test; if it stays in the noise, then it
>>> might be a good thing to do. Also, there are strategies to avoid latency
>>> spikes: for instance, we can block/force-skip mm invalidation if the
>>> buffer has pending/running io in the ring, i.e. only let buffer
>>> invalidation happen when there is no pending/running submission entry.
>>
>> Would that really work? The buffer could very well be idle right when
>> you check, but wanting to do IO the instant you decide you can do
>> background work on it. Additionally, that would require accounting
>> on when the buffers are inflight, which is exactly the kind of
>> overhead we're trying to avoid to begin with.
>>
>>> We can also pick what kind of invalidation we allow (compaction,
>>> migration, ...) and thus limit the scope and likelihood of
>>> invalidation.
>>
>> I think it'd be useful to try and understand the use case first.
>> If we're pinning a small percentage of the system memory, do we
>> really care at all? Isn't it completely fine to just ignore?
>
> My main motivation is migration on NUMA systems: if the process that
> registered the buffer gets migrated to a different node, then it might
> actually end up with bad performance because its io buffers are still
> on the old node. I am not sure we want to tell application developers to
> constantly monitor which node they are on and to re-register buffers
> after process migration to allow for memory migration.

If the process truly cares, would it not have pinned itself to that node?

-- 
Jens Axboe
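Jerome's suggestion of only letting mm invalidate a registered buffer when it has no pending or running submissions, and the cost Jens points out, can be made concrete with a small sketch. This is not io_uring code: `struct reg_buf`, `BUF_INVALIDATING` and the `buf_*` helpers are hypothetical, and a real design would also need a way for refused submitters to wait instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-buffer state word: the top bit marks an invalidation in
 * progress, the remaining bits count I/Os currently using the buffer. */
#define BUF_INVALIDATING (UINT32_C(1) << 31)

struct reg_buf {
    _Atomic uint32_t state;
};

/* Submission path: take a reference unless an invalidation has started.
 * Returns false if the caller has to wait and retry (the latency spike). */
static bool buf_io_begin(struct reg_buf *b)
{
    uint32_t old = atomic_load(&b->state);

    do {
        if (old & BUF_INVALIDATING)
            return false;
        /* On failure, `old` is reloaded with the current state. */
    } while (!atomic_compare_exchange_weak(&b->state, &old, old + 1));
    return true;
}

/* Completion path: drop the reference taken by buf_io_begin(). */
static void buf_io_end(struct reg_buf *b)
{
    atomic_fetch_sub(&b->state, 1);
}

/* mm side: succeeds only if no I/O is in flight and nobody else is
 * invalidating. Checking and claiming in one compare-and-swap closes the
 * "idle when checked, busy right after" window. */
static bool buf_try_invalidate(struct reg_buf *b)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&b->state, &expected,
                                          BUF_INVALIDATING);
}

/* mm side: invalidation (e.g. migration) finished, let I/O proceed again. */
static void buf_finish_invalidate(struct reg_buf *b)
{
    atomic_store(&b->state, 0);
}
```

The single compare-and-swap avoids the check-then-act race, but only by making every submission and completion pay an atomic read-modify-write and by making I/O that arrives during an invalidation wait. That is the in-flight accounting overhead and latency Jens objects to.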
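Jens's "small percentage of the system memory" question, and the RLIMIT_MEMLOCK limit Jerome mentions for pinned buffers, can both be checked from user space before registering. A minimal sketch using only POSIX/glibc calls; the helper names are hypothetical, and the limit check is rough because it ignores memory the user already has locked or pinned and the kernel's own page rounding.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: share of physical RAM that `bytes` of pinned buffers
 * would occupy (1.0 means all of RAM), or -1.0 if it cannot be queried. */
static double pinned_share_of_ram(uint64_t bytes)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return -1.0;
    return (double)bytes / ((double)pages * (double)page_size);
}

/* Hypothetical helper: 1 if `bytes` fits under the soft RLIMIT_MEMLOCK,
 * 0 if it does not, -1 on error. Rough check only (see above). */
static int fits_memlock_limit(uint64_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return bytes <= (uint64_t)rl.rlim_cur ? 1 : 0;
}
```

For example, 64 MiB of registered buffers on a machine with 16 GiB of RAM is 64 / 16384, about 0.39% of memory.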
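Jens's closing question assumes a process that cares about NUMA locality can stop the scheduler from moving it. As a minimal sketch, the glibc affinity calls can restrict the calling thread to the CPU it is running on, which also keeps it on that CPU's node. The helper name `pin_to_current_cpu()` is hypothetical. This is stricter than node-level binding, which `numactl --cpunodebind`/`--membind` or libnuma's `numa_run_on_node()` provide, and it only affects the calling thread.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to the CPU it is running
 * on right now, so the scheduler can no longer migrate it to another CPU
 * (and therefore not to another NUMA node either).
 * Returns that CPU number, or -1 on error (errno is set). */
static int pin_to_current_cpu(void)
{
    cpu_set_t set;
    int cpu = sched_getcpu();

    if (cpu < 0)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means "the calling thread"; other threads keep their masks. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return cpu;
}
```

Under the default local allocation policy, buffers first touched after this call should be allocated on the node the thread now stays on.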