Date: Wed, 12 Feb 2020 19:07:54 -0500
From: Tejun Heo
To: Dan Schatzberg
Cc: Jens Axboe, Li Zefan, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Hugh Dickins, "open list:BLOCK LAYER", open list,
	"open list:CONTROL GROUP (CGROUP)",
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
Subject: Re: [PATCH 2/2] loop: charge i/o per cgroup
Message-ID: <20200213000754.GD88887@mtj.thefacebook.com>
References: <20200205223348.880610-1-dschatzberg@fb.com>
	<20200205223348.880610-3-dschatzberg@fb.com>
In-Reply-To: <20200205223348.880610-3-dschatzberg@fb.com>

Hello,

On Wed, Feb 05, 2020 at 02:33:48PM -0800, Dan Schatzberg wrote:
> -static int loop_kthread_worker_fn(void *worker_ptr)
> -{
> -	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
> -	return kthread_worker_fn(worker_ptr);
> +	flush_workqueue(lo->workqueue);
> +	destroy_workqueue(lo->workqueue);

destroy_workqueue() implies draining, so the explicit flush isn't
necessary.

>  static int loop_prepare_queue(struct loop_device *lo)
>  {
> +	lo->workqueue = alloc_workqueue("loop%d",
> +					WQ_FREEZABLE | WQ_MEM_RECLAIM |
> +					WQ_HIGHPRI,
> +					lo->lo_number);
> +	if (IS_ERR(lo->workqueue))
>  		return -ENOMEM;

Given that these can be pretty cpu intensive and a single work item can
be saturated by multiple cpus that keep queueing bios, it probably would
be better to use an unbound workqueue (WQ_UNBOUND) and let the scheduler
figure it out.
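
Something like the below is what I mean - completely untested and just
an illustration, not a drop-in replacement. Your other flags are kept
as-is; I only added WQ_UNBOUND and an explicit max_active argument:

	lo->workqueue = alloc_workqueue("loop%d",
					WQ_UNBOUND | WQ_FREEZABLE |
					WQ_MEM_RECLAIM | WQ_HIGHPRI,
					0, lo->lo_number);
	/*
	 * max_active == 0 picks the default limit.  With WQ_UNBOUND the
	 * work items aren't pinned to the CPU that queued them, so the
	 * scheduler gets to decide where they run.
	 */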

> +struct loop_worker {
> +	struct rb_node rb_node;
> +	struct work_struct work;
> +	struct list_head cmd_list;
> +	struct loop_device *lo;
> +	int css_id;
> +};
> +
> +static void loop_workfn(struct work_struct *work);
> +static void loop_rootcg_workfn(struct work_struct *work);
> +
> +static void loop_queue_on_rootcg_locked(struct loop_device *lo,
> +					struct loop_cmd *cmd)
> +{

lockdep_assert_held() here?

> +	list_add_tail(&cmd->list_entry, &lo->rootcg_cmd_list);
> +	if (list_is_first(&cmd->list_entry, &lo->rootcg_cmd_list))
> +		queue_work(lo->workqueue, &lo->rootcg_work);

I'd just call queue_work() without the preceding check. Trying to
requeue an active work item is pretty cheap.

> +}
> +
> +static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
> +{
> +	struct rb_node **node = &(lo->worker_tree.rb_node), *parent = NULL;
                                ^
This isn't necessary, right?

> +	struct loop_worker *cur_worker, *worker = NULL;
> +	int css_id = 0;
> +
> +	if (cmd->blk_css)

We usually use blkcg_css as the name.

> +		css_id = cmd->blk_css->id;
> +
> +	spin_lock_irq(&lo->lo_lock);
> +
> +	if (!css_id) {
> +		loop_queue_on_rootcg_locked(lo, cmd);
> +		goto out;
> +	}
> +	node = &(lo->worker_tree.rb_node);
> +
> +	while (*node) {
> +		parent = *node;
> +		cur_worker = container_of(*node, struct loop_worker, rb_node);
> +		if (cur_worker->css_id == css_id) {
> +			worker = cur_worker;
> +			break;
> +		} else if (cur_worker->css_id < 0) {
> +			node = &((*node)->rb_left);
                               ^
I'd keep only the inner parentheses.

> +		} else {
> +			node = &((*node)->rb_right);
> +		}
> +	}
> +	if (worker) {
> +		list_add_tail(&cmd->list_entry, &worker->cmd_list);

Maybe goto and unindent the else block?

> +	} else {
> +		worker = kmalloc(sizeof(struct loop_worker), GFP_NOIO);

I think the gfp flag combo you wanna use is GFP_NOWAIT | __GFP_NOWARN -
we don't wanna enter direct reclaim from here or generate warnings.
Also, I personally tend to use kzalloc() for small stuff by default as
it doesn't really cost anything while making things easier / safer
later when adding new fields, but up to you.

> +		/*
> +		 * In the event we cannot allocate a worker, just
> +		 * queue on the rootcg worker
> +		 */
> +		if (!worker) {
> +			loop_queue_on_rootcg_locked(lo, cmd);
> +			goto out;
> +		}
> +		worker->css_id = css_id;

Maybe blkcg_css_id would be clearer? Your call for sure tho.

> +		INIT_WORK(&worker->work, loop_workfn);
> +		INIT_LIST_HEAD(&worker->cmd_list);
> +		worker->lo = lo;
> +		rb_link_node(&worker->rb_node, parent, node);
> +		rb_insert_color(&worker->rb_node, &lo->worker_tree);
> +		list_add_tail(&cmd->list_entry, &worker->cmd_list);
> +		queue_work(lo->workqueue, &worker->work);

It might be better to share the above two lines between the existing
and new worker paths. I think you're missing queue_work() for the
former.

> @@ -1942,6 +2006,9 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
>  	struct request *rq = blk_mq_rq_from_pdu(cmd);
>  	const bool write = op_is_write(req_op(rq));
>  	struct loop_device *lo = rq->q->queuedata;
> +#ifdef CONFIG_MEMCG
> +	struct cgroup_subsys_state *mem_css;

memcg_css is what we've been using; however, I kinda like blk/mem_css.
Maybe we should rename the others. Please feel free to leave as-is.

> @@ -1958,26 +2041,50 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
>  	}
>  }
> 
> +static void loop_process_work(struct loop_worker *worker,
> +			struct list_head *cmd_list, struct loop_device *lo)
> {
> +	int orig_flags = current->flags;
> +	struct loop_cmd *cmd;
> +
> +	while (1) {
> +		spin_lock_irq(&lo->lo_lock);
> +		if (list_empty(cmd_list)) {

Maybe break here and clean up at the end of the function?

> +			if (worker)
> +				rb_erase(&worker->rb_node, &lo->worker_tree);
> +			spin_unlock_irq(&lo->lo_lock);
> +			kfree(worker);
> +			current->flags = orig_flags;

I wonder whether we wanna keep them around for a bit. A lot of IO
patterns involve brief think times between accesses and this would be
creating and destroying workers constantly.

> +			return;
> +		}
> 
> +		cmd = container_of(
> +			cmd_list->next, struct loop_cmd, list_entry);
> +		list_del(cmd_list->next);
> +		spin_unlock_irq(&lo->lo_lock);
> +		current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;

I think we can set this at the head of the function and

> +		loop_handle_cmd(cmd);
> +		current->flags = orig_flags;

restore them before returning.
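
Putting the last few comments together, the rough shape I have in mind
is below - completely untested, names and fields are simply taken from
your patch, and it deliberately doesn't try to address the
keep-workers-around-for-a-bit question:

static void loop_process_work(struct loop_worker *worker,
			      struct list_head *cmd_list,
			      struct loop_device *lo)
{
	int orig_flags = current->flags;
	struct loop_cmd *cmd;

	/* set once on entry ... */
	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;

	spin_lock_irq(&lo->lo_lock);
	while (!list_empty(cmd_list)) {
		cmd = list_first_entry(cmd_list, struct loop_cmd,
				       list_entry);
		list_del(&cmd->list_entry);
		spin_unlock_irq(&lo->lo_lock);

		loop_handle_cmd(cmd);

		spin_lock_irq(&lo->lo_lock);
	}

	/* cleanup happens in one place once the list is drained */
	if (worker)
		rb_erase(&worker->rb_node, &lo->worker_tree);
	spin_unlock_irq(&lo->lo_lock);
	kfree(worker);

	/* ... and restored once before returning */
	current->flags = orig_flags;
}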

> @@ -587,6 +587,7 @@ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgrp,
>  	rcu_read_unlock();
>  	return css;
>  }
> +EXPORT_SYMBOL_GPL(cgroup_get_e_css);

Can you please mention the above in the changelog? Also, it'd be great
to have the rationale there too.

Thanks.

-- 
tejun