[PATCH v6 00/20] kthread: Use kthread worker API more widely

classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[PATCH v6 00/20] kthread: Use kthread worker API more widely

Petr Mladek-2
My intention is to make it easier to manipulate and maintain kthreads.
Especially, I want to replace all the custom main cycles with a
generic one. Also I want to make the kthreads sleep in a consistent
state in a common place when there is no work.

My first attempt was with a brand new API (iterant kthread), see
http://thread.gmane.org/gmane.linux.kernel.api/11892 . But I was
directed to improve the existing kthread worker API. This is
the 4th iteration of the new direction.


1nd..10th patches: improve the existing kthread worker API

11th..16th, 18th, 20th patches: convert several kthreads into
      the kthread worker API, namely: khugepaged, ring buffer
      benchmark, hung_task, kmemleak, ipmi, IB/fmr_pool,
      memstick/r592, intel_powerclamp
     
17th, 19th patches: do some preparation steps; they usually do
      some clean up that makes sense even without the conversion.


Changes against v5:

  + removed spin_trylock() from delayed_kthread_work_timer_fn();
    instead temporary released worked->lock() when calling
    del_timer_sync(); made sure that any queueing was blocked
    by work->canceling in the meatime

  + used 0th byte for KTW_FREEZABLE to reduce confusion

  + fixed warnings in comments reported by make htmldocs

  + sigh, there was no easy way to create an empty va_list
    that would work on all architectures; decided to make
    @namefmt generic in create_kthread_worker_on_cpu()

  + converted khungtaskd a better way; it was inspired by
    the recent changes that appeared in 4.6-rc1


Changes against v4:

  + added worker->delayed_work_list; it simplified the check
    for pending work; we do not longer need the new timer_active()
    function; also we do not need the link work->timer. On the
    other hand we need to distinguish between the normal and
    the delayed work by a boolean parameter passed to
    the common functions, e.g. __cancel_kthread_work_sync()
   
  + replaced most try_lock repeat cycles with a WARN_ON();
    the API does not allow to use the work with more workers;
    so such a situation would be a bug; it removed the
    complex try_lock_kthread_work() function that supported
    more modes;

  + renamed kthread_work_pending() to queuing_blocked();
    added this function later when really needed

  + renamed try_to_cancel_kthread_work() to __cancel_kthread_work();
    in fact, this a common implementation for the async cancel()
    function

  + removed a dull check for invalid cpu number in
    create_kthread_worker_on_cpu(); removed some other unnecessary
    code structures as suggested by Tejun

  + consistently used bool return value in all new __cancel functions

  + fixed ordering of cpu and flags parameters in
    create_kthread_worker_on_cpu() vs. create_kthread_worker()

  + used memset in the init_kthread_worker()

  + updated many comments as suggested by Tejun and as
    required the above changes

  + removed obsolete patch adding timer_active()

  + removed obsolete patch for using try_lock in flush_kthread_worker()

  + double checked all existing users of kthread worker API
    that they reinitialized the work when the worker was started
    and would not print false warnings; all looked fine

  + added taken acks for the Intel Powerclamp conversion
   

Changes against v3:

  + allow to free struct kthread_work from its callback; do not touch
    the struct from the worker post-mortem; as a side effect, the structure
    must be reinitialized when the worker gets restarted; updated
    khugepaged, and kmemleak accordingly

  + call del_timer_sync() with worker->lock; instead, detect canceling
    in the timer callback and give up an attempt to get the lock there;
    do busy loop with spin_is_locked() to reduce cache bouncing

  + renamed ipmi+func() -> ipmi_kthread_worker_func() as suggested
    by Corey

  + added some collected Reviewed-by

 
Changes against v2:

  + used worker->lock to synchronize the operations with the work
    instead of the PENDING bit as suggested by Tejun Heo; it simplified
    the implementation in several ways

  + added timer_active(); used it together with del_timer_sync()
    to cancel the work a less tricky way

  + removed the controversial conversion of the RCU kthreads

  + added several other examples: hung_task, kmemleak, ipmi,
    IB/fmr_pool, memstick/r592, intel_powerclamp

  + the helper fixes for the ring buffer benchmark has been improved
    as suggested by Steven; they already are in the Linus tree now

  + fixed a possible race between the check for existing khugepaged
    worker and queuing the work
 

Changes against v1:

  + remove wrappers to manipulate the scheduling policy and priority

  + remove questionable wakeup_and_destroy_kthread_worker() variant

  + do not check for chained work when draining the queue

  + allocate struct kthread worker in create_kthread_work() and
    use more simple checks for running worker

  + add support for delayed kthread works and use them instead
    of waiting inside the works

  + rework the "unrelated" fixes for the ring buffer benchmark
    as discussed in the 1st RFC; also sent separately

  + convert also the consumer in the ring buffer benchmark


I have tested this patch set against the stable Linus tree
for 4.6-rc3.

Comments against v5 can be found at
http://thread.gmane.org/gmane.linux.kernel.mm/146726

Petr Mladek (20):
  kthread/smpboot: Do not park in kthread_create_on_cpu()
  kthread: Allow to call __kthread_create_on_node() with va_list args
  kthread: Add create_kthread_worker*()
  kthread: Add drain_kthread_worker()
  kthread: Add destroy_kthread_worker()
  kthread: Detect when a kthread work is used by more workers
  kthread: Initial support for delayed kthread work
  kthread: Allow to cancel kthread work
  kthread: Allow to modify delayed kthread work
  kthread: Better support freezable kthread workers
  mm/huge_page: Convert khugepaged() into kthread worker API
  ring_buffer: Convert benchmark kthreads into kthread worker API
  hung_task: Convert hungtaskd into kthread worker API
  kmemleak: Convert kmemleak kthread into kthread worker API
  ipmi: Convert kipmi kthread into kthread worker API
  IB/fmr_pool: Convert the cleanup thread into kthread worker API
  memstick/r592: Better synchronize debug messages in r592_io kthread
  memstick/r592: convert r592_io kthread into kthread worker API
  thermal/intel_powerclamp: Remove duplicated code that starts the
    kthread
  thermal/intel_powerclamp: Convert the kthread to kthread worker API

 drivers/char/ipmi/ipmi_si_intf.c     | 121 ++++----
 drivers/infiniband/core/fmr_pool.c   |  54 ++--
 drivers/memstick/host/r592.c         |  61 ++--
 drivers/memstick/host/r592.h         |   5 +-
 drivers/thermal/intel_powerclamp.c   | 302 ++++++++++--------
 include/linux/kthread.h              |  57 ++++
 kernel/hung_task.c                   |  83 +++--
 kernel/kthread.c                     | 571 +++++++++++++++++++++++++++++++----
 kernel/smpboot.c                     |   5 +
 kernel/trace/ring_buffer_benchmark.c | 133 ++++----
 mm/huge_memory.c                     | 138 +++++----
 mm/kmemleak.c                        |  87 +++---
 12 files changed, 1106 insertions(+), 511 deletions(-)

CC: Catalin Marinas <[hidden email]>
CC: [hidden email]
CC: Corey Minyard <[hidden email]>
CC: [hidden email]
CC: Doug Ledford <[hidden email]>
CC: Sean Hefty <[hidden email]>
CC: Hal Rosenstock <[hidden email]>
CC: [hidden email]
CC: Maxim Levitsky <[hidden email]>
CC: Zhang Rui <[hidden email]>
CC: Eduardo Valentin <[hidden email]>
CC: Jacob Pan <[hidden email]>
CC: [hidden email]
CC: Sebastian Andrzej Siewior <[hidden email]>

--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 01/20] kthread/smpboot: Do not park in kthread_create_on_cpu()

Petr Mladek-2
kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e
("kthread: Implement park/unpark facility"). It is currently used
only when enabling new CPU. For this purpose, the newly created
kthread has to be parked.

The CPU binding is a bit tricky. The kthread is parked when the CPU
has not been allowed yet. And the CPU is bound when the kthread
is unparked.

The function would be useful for more per-CPU kthreads, e.g.
bnx2fc_thread, fcoethread. For this purpose, the newly created
kthread should stay in the uninterruptible state.

This patch moves the parking into smpboot. It binds the thread
already when created. Then the function might be used universally.
Also the behavior is consistent with kthread_create() and
kthread_create_on_node().

Signed-off-by: Petr Mladek <[hidden email]>
Reviewed-by: Thomas Gleixner <[hidden email]>
---
 kernel/kthread.c | 8 ++++++--
 kernel/smpboot.c | 5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173dca1ae..1ffc11ec5546 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -390,10 +390,10 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
    cpu);
  if (IS_ERR(p))
  return p;
+ kthread_bind(p, cpu);
+ /* CPU hotplug need to bind once again when unparking the thread. */
  set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
  to_kthread(p)->cpu = cpu;
- /* Park the thread to get it out of TASK_UNINTERRUPTIBLE state */
- kthread_park(p);
  return p;
 }
 
@@ -407,6 +407,10 @@ static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
  * which might be about to be cleared.
  */
  if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
+ /*
+ * Newly created kthread was parked when the CPU was offline.
+ * The binding was lost and we need to set it again.
+ */
  if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
  __kthread_bind(k, kthread->cpu, TASK_PARKED);
  wake_up_state(k, TASK_PARKED);
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 13bc43d1fb22..4a5c6e73ecd4 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -186,6 +186,11 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
  kfree(td);
  return PTR_ERR(tsk);
  }
+ /*
+ * Park the thread so that it could start right on the CPU
+ * when it is available.
+ */
+ kthread_park(tsk);
  get_task_struct(tsk);
  *per_cpu_ptr(ht->store, cpu) = tsk;
  if (ht->create) {
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 02/20] kthread: Allow to call __kthread_create_on_node() with va_list args

Petr Mladek-2
In reply to this post by Petr Mladek-2
kthread_create_on_node() implements a bunch of logic to create
the kthread. It is already called by kthread_create_on_cpu().

We are going to extend the kthread worker API and will
need to call kthread_create_on_node() with va_list args there.

This patch does only a refactoring and does not modify the existing
behavior.

Signed-off-by: Petr Mladek <[hidden email]>
---
 kernel/kthread.c | 72 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1ffc11ec5546..bfe8742c4217 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -244,33 +244,10 @@ static void create_kthread(struct kthread_create_info *create)
  }
 }
 
-/**
- * kthread_create_on_node - create a kthread.
- * @threadfn: the function to run until signal_pending(current).
- * @data: data ptr for @threadfn.
- * @node: task and thread structures for the thread are allocated on this node
- * @namefmt: printf-style name for the thread.
- *
- * Description: This helper function creates and names a kernel
- * thread.  The thread will be stopped: use wake_up_process() to start
- * it.  See also kthread_run().  The new thread has SCHED_NORMAL policy and
- * is affine to all CPUs.
- *
- * If thread is going to be bound on a particular cpu, give its node
- * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
- * When woken, the thread will run @threadfn() with @data as its
- * argument. @threadfn() can either call do_exit() directly if it is a
- * standalone thread for which no one will call kthread_stop(), or
- * return when 'kthread_should_stop()' is true (which means
- * kthread_stop() has been called).  The return value should be zero
- * or a negative error number; it will be passed to kthread_stop().
- *
- * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
- */
-struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
-   void *data, int node,
-   const char namefmt[],
-   ...)
+static struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
+    void *data, int node,
+    const char namefmt[],
+    va_list args)
 {
  DECLARE_COMPLETION_ONSTACK(done);
  struct task_struct *task;
@@ -311,11 +288,8 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
  task = create->result;
  if (!IS_ERR(task)) {
  static const struct sched_param param = { .sched_priority = 0 };
- va_list args;
 
- va_start(args, namefmt);
  vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
- va_end(args);
  /*
  * root may have changed our (kthreadd's) priority or CPU mask.
  * The kernel thread should not inherit these properties.
@@ -326,6 +300,44 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
  kfree(create);
  return task;
 }
+
+/**
+ * kthread_create_on_node - create a kthread.
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @node: task and thread structures for the thread are allocated on this node
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread.  The thread will be stopped: use wake_up_process() to start
+ * it.  See also kthread_run().  The new thread has SCHED_NORMAL policy and
+ * is affine to all CPUs.
+ *
+ * If thread is going to be bound on a particular cpu, give its node
+ * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
+ * When woken, the thread will run @threadfn() with @data as its
+ * argument. @threadfn() can either call do_exit() directly if it is a
+ * standalone thread for which no one will call kthread_stop(), or
+ * return when 'kthread_should_stop()' is true (which means
+ * kthread_stop() has been called).  The return value should be zero
+ * or a negative error number; it will be passed to kthread_stop().
+ *
+ * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
+ */
+struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
+   void *data, int node,
+   const char namefmt[],
+   ...)
+{
+ struct task_struct *task;
+ va_list args;
+
+ va_start(args, namefmt);
+ task = __kthread_create_on_node(threadfn, data, node, namefmt, args);
+ va_end(args);
+
+ return task;
+}
 EXPORT_SYMBOL(kthread_create_on_node);
 
 static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, long state)
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 03/20] kthread: Add create_kthread_worker*()

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthread workers are currently created using the classic kthread API,
namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
parameter.

This patch defines create_kthread_worker() and
create_kthread_worker_on_cpu() functions that hide implementation details.

They enforce using kthread_worker_fn() for the main thread. But I doubt
that there are any plans to create any alternative. In fact, I think
that we do not want any alternative main thread because it would be
hard to support consistency with the rest of the kthread worker API.

The naming and function of create_kthread_worker() is inspired by
the workqueues API like the rest of the kthread worker API.

The create_kthread_worker_on_cpu() variant is motivated by the original
kthread_create_on_cpu(). Note that we need to bind per-CPU kthread
workers already when they are created. It makes the life easier.
kthread_bind() could not be used later for an already running worker.

This patch does _not_ convert existing kthread workers. The kthread worker
API need more improvements first, e.g. a function to destroy the worker.

IMPORTANT:

create_kthread_worker_on_cpu() allows to use any format of the
worker name, in compare with kthread_create_on_cpu(). The good thing
is that it is more generic. The bad thing is that most users will
need to pass the cpu number in two parameters, e.g.
create_kthread_worker_on_cpu(cpu, "helper/%d", cpu).

To be honest, the main motivation was to avoid the need for an
empty va_list. The only legal way was to create a helper function that
would be called with an empty list. Other attempts caused compilation
warnings or even errors on different architectures.

There were also other alternatives, for example, using #define or
splitting __create_kthread_worker(). The used solution looked
like the least ugly.

Signed-off-by: Petr Mladek <[hidden email]>
---
 include/linux/kthread.h |   7 +++
 kernel/kthread.c        | 113 +++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 110 insertions(+), 10 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index e691b6a23f72..468011efa68d 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -124,6 +124,13 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
+__printf(1, 2)
+struct kthread_worker *
+create_kthread_worker(const char namefmt[], ...);
+
+struct kthread_worker *
+create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...);
+
 bool queue_kthread_work(struct kthread_worker *worker,
  struct kthread_work *work);
 void flush_kthread_work(struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfe8742c4217..76364374ff98 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -567,23 +567,24 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
  * kthread_worker_fn - kthread function to process kthread_worker
  * @worker_ptr: pointer to initialized kthread_worker
  *
- * This function can be used as @threadfn to kthread_create() or
- * kthread_run() with @worker_ptr argument pointing to an initialized
- * kthread_worker.  The started kthread will process work_list until
- * the it is stopped with kthread_stop().  A kthread can also call
- * this function directly after extra initialization.
+ * This function implements the main cycle of kthread worker. It processes
+ * work_list until it is stopped with kthread_stop(). It sleeps when the queue
+ * is empty.
  *
- * Different kthreads can be used for the same kthread_worker as long
- * as there's only one kthread attached to it at any given time.  A
- * kthread_worker without an attached kthread simply collects queued
- * kthread_works.
+ * The works are not allowed to keep any locks, disable preemption or interrupts
+ * when they finish. There is defined a safe point for freezing when one work
+ * finishes and before a new one is started.
  */
 int kthread_worker_fn(void *worker_ptr)
 {
  struct kthread_worker *worker = worker_ptr;
  struct kthread_work *work;
 
- WARN_ON(worker->task);
+ /*
+ * FIXME: Update the check and remove the assignment when all kthread
+ * worker users are created using create_kthread_worker*() functions.
+ */
+ WARN_ON(worker->task && worker->task != current);
  worker->task = current;
 repeat:
  set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
@@ -617,6 +618,98 @@ repeat:
 }
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
+static struct kthread_worker *
+__create_kthread_worker(int cpu, const char namefmt[], va_list args)
+{
+ struct kthread_worker *worker;
+ struct task_struct *task;
+
+ worker = kzalloc(sizeof(*worker), GFP_KERNEL);
+ if (!worker)
+ return ERR_PTR(-ENOMEM);
+
+ init_kthread_worker(worker);
+
+ if (cpu >= 0) {
+ char name[TASK_COMM_LEN];
+
+ /*
+ * creare_kthread_worker_on_cpu() allows to pass a generic
+ * namefmt in compare with kthread_create_on_cpu. We need
+ * to format it here.
+ */
+ vsnprintf(name, sizeof(name), namefmt, args);
+ task = kthread_create_on_cpu(kthread_worker_fn, worker,
+     cpu, name);
+ } else {
+ task = __kthread_create_on_node(kthread_worker_fn, worker,
+ -1, namefmt, args);
+ }
+
+ if (IS_ERR(task))
+ goto fail_task;
+
+ worker->task = task;
+ wake_up_process(task);
+ return worker;
+
+fail_task:
+ kfree(worker);
+ return ERR_CAST(task);
+}
+
+/**
+ * create_kthread_worker - create a kthread worker
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM)
+ * when the needed structures could not get allocated, and ERR_PTR(-EINTR)
+ * when the worker was SIGKILLed.
+ */
+struct kthread_worker *
+create_kthread_worker(const char namefmt[], ...)
+{
+ struct kthread_worker *worker;
+ va_list args;
+
+ va_start(args, namefmt);
+ worker = __create_kthread_worker(-1, namefmt, args);
+ va_end(args);
+
+ return worker;
+}
+EXPORT_SYMBOL(create_kthread_worker);
+
+/**
+ * create_kthread_worker_on_cpu - create a kthread worker and bind it
+ * it to a given CPU and the associated NUMA node.
+ * @cpu: CPU number
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * Use a valid CPU number if you want to bind the kthread worker
+ * to the given CPU and the associated NUMA node.
+ *
+ * A good practice is to add the cpu number also into the worker name.
+ * For example, use create_kthread_worker_on_cpu(cpu, "helper/%d", cpu).
+ *
+ * Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM)
+ * when the needed structures could not get allocated, and ERR_PTR(-EINTR)
+ * when the worker was SIGKILLed.
+ */
+struct kthread_worker *
+create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...)
+{
+ struct kthread_worker *worker;
+ va_list args;
+
+ va_start(args, namefmt);
+ worker = __create_kthread_worker(cpu, namefmt, args);
+ va_end(args);
+
+ return worker;
+}
+EXPORT_SYMBOL(create_kthread_worker_on_cpu);
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
        struct kthread_work *work,
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 05/20] kthread: Add destroy_kthread_worker()

Petr Mladek-2
In reply to this post by Petr Mladek-2
The current kthread worker users call flush() and stop() explicitly.
This function drains the worker, stops it, and frees the kthread_worker
struct in one call.

It is supposed to be used together with create_kthread_worker*() that
allocates struct kthread_worker.

Also note that drain() correctly handles self-queuing works in compare
with flush().

Signed-off-by: Petr Mladek <[hidden email]>
---
 include/linux/kthread.h |  2 ++
 kernel/kthread.c        | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 468011efa68d..a36604fa8aa2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -136,4 +136,6 @@ bool queue_kthread_work(struct kthread_worker *worker,
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
+void destroy_kthread_worker(struct kthread_worker *worker);
+
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 6051aa9d93c6..441651765f08 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -852,3 +852,24 @@ void drain_kthread_worker(struct kthread_worker *worker)
  spin_unlock_irq(&worker->lock);
 }
 EXPORT_SYMBOL(drain_kthread_worker);
+
+/**
+ * destroy_kthread_worker - destroy a kthread worker
+ * @worker: worker to be destroyed
+ *
+ * Drain and destroy @worker.  It has the same conditions
+ * for use as drain_kthread_worker(), see above.
+ */
+void destroy_kthread_worker(struct kthread_worker *worker)
+{
+ struct task_struct *task;
+
+ task = worker->task;
+ if (WARN_ON(!task))
+ return;
+
+ drain_kthread_worker(worker);
+ kthread_stop(task);
+ kfree(worker);
+}
+EXPORT_SYMBOL(destroy_kthread_worker);
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 07/20] kthread: Initial support for delayed kthread work

Petr Mladek-2
In reply to this post by Petr Mladek-2
We are going to use kthread_worker more widely and delayed works
will be pretty useful.

The implementation is inspired by workqueues. It uses a timer to
queue the work after the requested delay. If the delay is zero,
the work is queued immediately.

In compare with workqueues, each work is associated with a single
worker (kthread). Therefore the implementation could be much easier.
In particular, we use the worker->lock to synchronize all the
operations with the work. We do not need any atomic operation
with a flags variable.

In fact, we do not need any state variable at all. Instead, we
add a list of delayed works into the worker. Then the pending
work is listed either in the list of queued or delayed works.
And the existing check of pending works is the same even for
the delayed ones.

A work must not be assigned to another worker unless reinitialized.
Therefore the timer handler might expect that dwork->work->worker
is valid and it could simply take the lock. We just add some
sanity checks to help with debugging a potential misuse.

Signed-off-by: Petr Mladek <[hidden email]>
---
 include/linux/kthread.h |  33 ++++++++++++++++
 kernel/kthread.c        | 102 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 135 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a36604fa8aa2..b27be55cffa3 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -63,10 +63,12 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
  */
 struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);
+void delayed_kthread_work_timer_fn(unsigned long __data);
 
 struct kthread_worker {
  spinlock_t lock;
  struct list_head work_list;
+ struct list_head delayed_work_list;
  struct task_struct *task;
  struct kthread_work *current_work;
 };
@@ -77,9 +79,15 @@ struct kthread_work {
  struct kthread_worker *worker;
 };
 
+struct delayed_kthread_work {
+ struct kthread_work work;
+ struct timer_list timer;
+};
+
 #define KTHREAD_WORKER_INIT(worker) { \
  .lock = __SPIN_LOCK_UNLOCKED((worker).lock), \
  .work_list = LIST_HEAD_INIT((worker).work_list), \
+ .delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
  }
 
 #define KTHREAD_WORK_INIT(work, fn) { \
@@ -87,12 +95,23 @@ struct kthread_work {
  .func = (fn), \
  }
 
+#define DELAYED_KTHREAD_WORK_INIT(dwork, fn) { \
+ .work = KTHREAD_WORK_INIT((dwork).work, (fn)), \
+ .timer = __TIMER_INITIALIZER(delayed_kthread_work_timer_fn, \
+     0, (unsigned long)&(dwork), \
+     TIMER_IRQSAFE), \
+ }
+
 #define DEFINE_KTHREAD_WORKER(worker) \
  struct kthread_worker worker = KTHREAD_WORKER_INIT(worker)
 
 #define DEFINE_KTHREAD_WORK(work, fn) \
  struct kthread_work work = KTHREAD_WORK_INIT(work, fn)
 
+#define DEFINE_DELAYED_KTHREAD_WORK(dwork, fn) \
+ struct delayed_kthread_work dwork = \
+ DELAYED_KTHREAD_WORK_INIT(dwork, fn)
+
 /*
  * kthread_worker.lock needs its own lockdep class key when defined on
  * stack with lockdep enabled.  Use the following macros in such cases.
@@ -122,6 +141,15 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
  (work)->func = (fn); \
  } while (0)
 
+#define init_delayed_kthread_work(dwork, fn) \
+ do { \
+ init_kthread_work(&(dwork)->work, (fn)); \
+ __setup_timer(&(dwork)->timer, \
+      delayed_kthread_work_timer_fn, \
+      (unsigned long)(dwork), \
+      TIMER_IRQSAFE); \
+ } while (0)
+
 int kthread_worker_fn(void *worker_ptr);
 
 __printf(1, 2)
@@ -133,6 +161,11 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...);
 
 bool queue_kthread_work(struct kthread_worker *worker,
  struct kthread_work *work);
+
+bool queue_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay);
+
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 385a7d6b4872..7655357065e1 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -559,6 +559,7 @@ void __init_kthread_worker(struct kthread_worker *worker,
  spin_lock_init(&worker->lock);
  lockdep_set_class_and_name(&worker->lock, key, name);
  INIT_LIST_HEAD(&worker->work_list);
+ INIT_LIST_HEAD(&worker->delayed_work_list);
  worker->task = NULL;
 }
 EXPORT_SYMBOL_GPL(__init_kthread_worker);
@@ -763,6 +764,107 @@ bool queue_kthread_work(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL_GPL(queue_kthread_work);
 
+/**
+ * delayed_kthread_work_timer_fn - callback that queues the associated delayed
+ * kthread work when the timer expires.
+ * @__data: pointer to the data associated with the timer
+ *
+ * The format of the function is defined by struct timer_list.
+ * It should have been called from irqsafe timer with irq already off.
+ */
+void delayed_kthread_work_timer_fn(unsigned long __data)
+{
+ struct delayed_kthread_work *dwork =
+ (struct delayed_kthread_work *)__data;
+ struct kthread_work *work = &dwork->work;
+ struct kthread_worker *worker = work->worker;
+
+ /*
+ * This might happen when a pending work is reinitialized.
+ * It means that it is used a wrong way.
+ */
+ if (WARN_ON_ONCE(!worker))
+ return;
+
+ spin_lock(&worker->lock);
+ /* Work must not be used with more workers, see queue_kthread_work(). */
+ WARN_ON_ONCE(work->worker != worker);
+
+ /* Move the work from worker->delayed_work_list. */
+ WARN_ON_ONCE(list_empty(&work->node));
+ list_del_init(&work->node);
+ insert_kthread_work(worker, work, &worker->work_list);
+
+ spin_unlock(&worker->lock);
+}
+EXPORT_SYMBOL(delayed_kthread_work_timer_fn);
+
+void __queue_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay)
+{
+ struct timer_list *timer = &dwork->timer;
+ struct kthread_work *work = &dwork->work;
+
+ WARN_ON_ONCE(timer->function != delayed_kthread_work_timer_fn ||
+     timer->data != (unsigned long)dwork);
+
+ /*
+ * If @delay is 0, queue @dwork->work immediately.  This is for
+ * both optimization and correctness.  The earliest @timer can
+ * expire is on the closest next tick and delayed_work users depend
+ * on that there's no such delay when @delay is 0.
+ */
+ if (!delay) {
+ insert_kthread_work(worker, work, &worker->work_list);
+ return;
+ }
+
+ /* Be paranoid and try to detect possible races already now. */
+ insert_kthread_work_sanity_check(worker, work);
+
+ list_add(&work->node, &worker->delayed_work_list);
+ work->worker = worker;
+ timer_stats_timer_set_start_info(&dwork->timer);
+ timer->expires = jiffies + delay;
+ add_timer(timer);
+}
+
+/**
+ * queue_delayed_kthread_work - queue the associated kthread work
+ * after a delay.
+ * @worker: target kthread_worker
+ * @dwork: delayed_kthread_work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If the work has not been pending it starts a timer that will queue
+ * the work after the given @delay. If @delay is zero, it queues the
+ * work immediately.
+ *
+ * Return: %false if the @work has already been pending. It means that
+ * either the timer was running or the work was queued. It returns %true
+ * otherwise.
+ */
+bool queue_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay)
+{
+ struct kthread_work *work = &dwork->work;
+ unsigned long flags;
+ bool ret = false;
+
+ spin_lock_irqsave(&worker->lock, flags);
+
+ if (list_empty(&work->node)) {
+ __queue_delayed_kthread_work(worker, dwork, delay);
+ ret = true;
+ }
+
+ spin_unlock_irqrestore(&worker->lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(queue_delayed_kthread_work);
+
 struct kthread_flush_work {
  struct kthread_work work;
  struct completion done;
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 10/20] kthread: Better support freezable kthread workers

Petr Mladek-2
In reply to this post by Petr Mladek-2
This patch allows to make kthread worker freezable via a new @flags
parameter. It will allow to avoid an init work in some kthreads.

It currently does not affect the function of kthread_worker_fn()
but it might help to do some optimization or fixes eventually.

I currently do not know about any other use for the @flags
parameter but I believe that we will want more flags
in the future.

Finally, I hope that it will not cause confusion with @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthreads implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.

Signed-off-by: Petr Mladek <[hidden email]>
---
 include/linux/kthread.h | 12 +++++++++---
 kernel/kthread.c        | 21 +++++++++++++++------
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 1d5ca191562f..edad163b26d0 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -65,7 +65,12 @@ struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);
 void delayed_kthread_work_timer_fn(unsigned long __data);
 
+enum {
+ KTW_FREEZABLE = 1 << 0, /* freeze during suspend */
+};
+
 struct kthread_worker {
+ unsigned int flags;
  spinlock_t lock;
  struct list_head work_list;
  struct list_head delayed_work_list;
@@ -154,12 +159,13 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
-__printf(1, 2)
+__printf(2, 3)
 struct kthread_worker *
-create_kthread_worker(const char namefmt[], ...);
+create_kthread_worker(unsigned int flags, const char namefmt[], ...);
 
 struct kthread_worker *
-create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...);
+create_kthread_worker_on_cpu(int cpu, unsigned int flags,
+     const char namefmt[], ...);
 
 bool queue_kthread_work(struct kthread_worker *worker,
  struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 2cc32cad66ef..4ee4c05f8bf7 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -556,11 +556,11 @@ void __init_kthread_worker(struct kthread_worker *worker,
  const char *name,
  struct lock_class_key *key)
 {
+ memset(worker, 0, sizeof(struct kthread_worker));
  spin_lock_init(&worker->lock);
  lockdep_set_class_and_name(&worker->lock, key, name);
  INIT_LIST_HEAD(&worker->work_list);
  INIT_LIST_HEAD(&worker->delayed_work_list);
- worker->task = NULL;
 }
 EXPORT_SYMBOL_GPL(__init_kthread_worker);
 
@@ -590,6 +590,10 @@ int kthread_worker_fn(void *worker_ptr)
  */
  WARN_ON(worker->task && worker->task != current);
  worker->task = current;
+
+ if (worker->flags & KTW_FREEZABLE)
+ set_freezable();
+
 repeat:
  set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
 
@@ -623,7 +627,8 @@ repeat:
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
 static struct kthread_worker *
-__create_kthread_worker(int cpu, const char namefmt[], va_list args)
+__create_kthread_worker(int cpu, unsigned int flags,
+ const char namefmt[], va_list args)
 {
  struct kthread_worker *worker;
  struct task_struct *task;
@@ -653,6 +658,7 @@ __create_kthread_worker(int cpu, const char namefmt[], va_list args)
  if (IS_ERR(task))
  goto fail_task;
 
+ worker->flags = flags;
  worker->task = task;
  wake_up_process(task);
  return worker;
@@ -664,6 +670,7 @@ fail_task:
 
 /**
  * create_kthread_worker - create a kthread worker
+ * @flags: flags modifying the default behavior of the worker
  * @namefmt: printf-style name for the kthread worker (task).
  *
  * Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM)
@@ -671,13 +678,13 @@ fail_task:
  * when the worker was SIGKILLed.
  */
 struct kthread_worker *
-create_kthread_worker(const char namefmt[], ...)
+create_kthread_worker(unsigned int flags, const char namefmt[], ...)
 {
  struct kthread_worker *worker;
  va_list args;
 
  va_start(args, namefmt);
- worker = __create_kthread_worker(-1, namefmt, args);
+ worker = __create_kthread_worker(-1, flags, namefmt, args);
  va_end(args);
 
  return worker;
@@ -688,6 +695,7 @@ EXPORT_SYMBOL(create_kthread_worker);
  * create_kthread_worker_on_cpu - create a kthread worker and bind it
  * it to a given CPU and the associated NUMA node.
  * @cpu: CPU number
+ * @flags: flags modifying the default behavior of the worker
  * @namefmt: printf-style name for the kthread worker (task).
  *
  * Use a valid CPU number if you want to bind the kthread worker
@@ -701,13 +709,14 @@ EXPORT_SYMBOL(create_kthread_worker);
  * when the worker was SIGKILLed.
  */
 struct kthread_worker *
-create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...)
+create_kthread_worker_on_cpu(int cpu, unsigned int flags,
+     const char namefmt[], ...)
 {
  struct kthread_worker *worker;
  va_list args;
 
  va_start(args, namefmt);
- worker = __create_kthread_worker(cpu, namefmt, args);
+ worker = __create_kthread_worker(cpu, flags, namefmt, args);
  va_end(args);
 
  return worker;
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 12/20] ring_buffer: Convert benchmark kthreads into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts the ring buffer benchmark producer into a kthread
worker because it modifies the scheduling priority and policy.
Also, it is a benchmark. It makes CPU very busy. It will most likely
run only limited time. IMHO, it does not make sense to mess the system
workqueues with it.

The thread is split into two independent works. It might look more
complicated but it helped me to find a race in the sleeping part
that was fixed separately.

kthread_should_stop() could not longer be used inside the works
because it defines the life of the worker and it needs to stay
usable until all works are done. Instead, we add @test_end
global variable. It is set during normal termination in compare
with @test_error.

Signed-off-by: Petr Mladek <[hidden email]>
---
 kernel/trace/ring_buffer_benchmark.c | 133 ++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 74 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 6df9a83e20d7..7ff443f1e406 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -26,10 +26,17 @@ static int wakeup_interval = 100;
 static int reader_finish;
 static DECLARE_COMPLETION(read_start);
 static DECLARE_COMPLETION(read_done);
-
 static struct ring_buffer *buffer;
-static struct task_struct *producer;
-static struct task_struct *consumer;
+
+static void rb_producer_hammer_func(struct kthread_work *dummy);
+static struct kthread_worker *rb_producer_worker;
+static DEFINE_DELAYED_KTHREAD_WORK(rb_producer_hammer_work,
+   rb_producer_hammer_func);
+
+static void rb_consumer_func(struct kthread_work *dummy);
+static struct kthread_worker *rb_consumer_worker;
+static DEFINE_KTHREAD_WORK(rb_consumer_work, rb_consumer_func);
+
 static unsigned long read;
 
 static unsigned int disable_reader;
@@ -61,6 +68,7 @@ MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
 static int read_events;
 
 static int test_error;
+static int test_end;
 
 #define TEST_ERROR() \
  do { \
@@ -77,7 +85,7 @@ enum event_status {
 
 static bool break_test(void)
 {
- return test_error || kthread_should_stop();
+ return test_error || test_end;
 }
 
 static enum event_status read_event(int cpu)
@@ -262,8 +270,8 @@ static void ring_buffer_producer(void)
  end_time = ktime_get();
 
  cnt++;
- if (consumer && !(cnt % wakeup_interval))
- wake_up_process(consumer);
+ if (rb_consumer_worker && !(cnt % wakeup_interval))
+ wake_up_process(rb_consumer_worker->task);
 
 #ifndef CONFIG_PREEMPT
  /*
@@ -281,14 +289,14 @@ static void ring_buffer_producer(void)
  } while (ktime_before(end_time, timeout) && !break_test());
  trace_printk("End ring buffer hammer\n");
 
- if (consumer) {
+ if (rb_consumer_worker) {
  /* Init both completions here to avoid races */
  init_completion(&read_start);
  init_completion(&read_done);
  /* the completions must be visible before the finish var */
  smp_wmb();
  reader_finish = 1;
- wake_up_process(consumer);
+ wake_up_process(rb_consumer_worker->task);
  wait_for_completion(&read_done);
  }
 
@@ -366,68 +374,39 @@ static void ring_buffer_producer(void)
  }
 }
 
-static void wait_to_die(void)
-{
- set_current_state(TASK_INTERRUPTIBLE);
- while (!kthread_should_stop()) {
- schedule();
- set_current_state(TASK_INTERRUPTIBLE);
- }
- __set_current_state(TASK_RUNNING);
-}
-
-static int ring_buffer_consumer_thread(void *arg)
+static void rb_consumer_func(struct kthread_work *dummy)
 {
- while (!break_test()) {
- complete(&read_start);
-
- ring_buffer_consumer();
+ complete(&read_start);
 
- set_current_state(TASK_INTERRUPTIBLE);
- if (break_test())
- break;
- schedule();
- }
- __set_current_state(TASK_RUNNING);
-
- if (!kthread_should_stop())
- wait_to_die();
-
- return 0;
+ ring_buffer_consumer();
 }
 
-static int ring_buffer_producer_thread(void *arg)
+static void rb_producer_hammer_func(struct kthread_work *dummy)
 {
- while (!break_test()) {
- ring_buffer_reset(buffer);
+ if (break_test())
+ return;
 
- if (consumer) {
- wake_up_process(consumer);
- wait_for_completion(&read_start);
- }
-
- ring_buffer_producer();
- if (break_test())
- goto out_kill;
+ ring_buffer_reset(buffer);
 
- trace_printk("Sleeping for 10 secs\n");
- set_current_state(TASK_INTERRUPTIBLE);
- if (break_test())
- goto out_kill;
- schedule_timeout(HZ * SLEEP_TIME);
+ if (rb_consumer_worker) {
+ queue_kthread_work(rb_consumer_worker, &rb_consumer_work);
+ wait_for_completion(&read_start);
  }
 
-out_kill:
- __set_current_state(TASK_RUNNING);
- if (!kthread_should_stop())
- wait_to_die();
+ ring_buffer_producer();
 
- return 0;
+ if (break_test())
+ return;
+
+ trace_printk("Sleeping for 10 secs\n");
+ queue_delayed_kthread_work(rb_producer_worker,
+   &rb_producer_hammer_work,
+   HZ * SLEEP_TIME);
 }
 
 static int __init ring_buffer_benchmark_init(void)
 {
- int ret;
+ int ret = 0;
 
  /* make a one meg buffer in overwite mode */
  buffer = ring_buffer_alloc(1000000, RB_FL_OVERWRITE);
@@ -435,19 +414,21 @@ static int __init ring_buffer_benchmark_init(void)
  return -ENOMEM;
 
  if (!disable_reader) {
- consumer = kthread_create(ring_buffer_consumer_thread,
-  NULL, "rb_consumer");
- ret = PTR_ERR(consumer);
- if (IS_ERR(consumer))
+ rb_consumer_worker = create_kthread_worker(0, "rb_consumer");
+ if (IS_ERR(rb_consumer_worker)) {
+ ret = PTR_ERR(rb_consumer_worker);
  goto out_fail;
+ }
  }
 
- producer = kthread_run(ring_buffer_producer_thread,
-       NULL, "rb_producer");
- ret = PTR_ERR(producer);
-
- if (IS_ERR(producer))
+ rb_producer_worker = create_kthread_worker(0, "rb_producer");
+ if (IS_ERR(rb_producer_worker)) {
+ ret = PTR_ERR(rb_producer_worker);
  goto out_kill;
+ }
+
+ queue_delayed_kthread_work(rb_producer_worker,
+   &rb_producer_hammer_work, 0);
 
  /*
  * Run them as low-prio background tasks by default:
@@ -457,24 +438,26 @@ static int __init ring_buffer_benchmark_init(void)
  struct sched_param param = {
  .sched_priority = consumer_fifo
  };
- sched_setscheduler(consumer, SCHED_FIFO, &param);
+ sched_setscheduler(rb_consumer_worker->task,
+   SCHED_FIFO, &param);
  } else
- set_user_nice(consumer, consumer_nice);
+ set_user_nice(rb_consumer_worker->task, consumer_nice);
  }
 
  if (producer_fifo >= 0) {
  struct sched_param param = {
  .sched_priority = producer_fifo
  };
- sched_setscheduler(producer, SCHED_FIFO, &param);
+ sched_setscheduler(rb_producer_worker->task,
+   SCHED_FIFO, &param);
  } else
- set_user_nice(producer, producer_nice);
+ set_user_nice(rb_producer_worker->task, producer_nice);
 
  return 0;
 
  out_kill:
- if (consumer)
- kthread_stop(consumer);
+ if (rb_consumer_worker)
+ destroy_kthread_worker(rb_consumer_worker);
 
  out_fail:
  ring_buffer_free(buffer);
@@ -483,9 +466,11 @@ static int __init ring_buffer_benchmark_init(void)
 
 static void __exit ring_buffer_benchmark_exit(void)
 {
- kthread_stop(producer);
- if (consumer)
- kthread_stop(consumer);
+ test_end = 1;
+ cancel_delayed_kthread_work_sync(&rb_producer_hammer_work);
+ destroy_kthread_worker(rb_producer_worker);
+ if (rb_consumer_worker)
+ destroy_kthread_worker(rb_consumer_worker);
  ring_buffer_free(buffer);
 }
 
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 14/20] kmemleak: Convert kmemleak kthread into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts the kmemleak kthread into the kthread worker
API because it modifies the scheduling priority.

The result is a simple self-queuing work that just calls kmemleak_scan().

The info messages and set_user_nice() are moved to the functions that
start and stop the worker. These are also renamed to mention worker
instead of thread.

We do not longer need to handle a spurious wakeup and count the remaining
timeout. It is handled by the worker. The delayed work is queued after
the full timeout passes.

Finally, the initial delay is done only when the kthread is started
during the boot. For this we added a parameter to the start function.

Signed-off-by: Petr Mladek <[hidden email]>
CC: Catalin Marinas <[hidden email]>
---
 mm/kmemleak.c | 87 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index e6429926e957..ad61c51af5d5 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -216,7 +216,8 @@ static int kmemleak_error;
 static unsigned long min_addr = ULONG_MAX;
 static unsigned long max_addr;
 
-static struct task_struct *scan_thread;
+static struct kthread_worker *kmemleak_scan_worker;
+static struct delayed_kthread_work kmemleak_scan_work;
 /* used to avoid reporting of recently allocated objects */
 static unsigned long jiffies_min_age;
 static unsigned long jiffies_last_scan;
@@ -1469,54 +1470,48 @@ static void kmemleak_scan(void)
 }
 
 /*
- * Thread function performing automatic memory scanning. Unreferenced objects
- * at the end of a memory scan are reported but only the first time.
+ * Kthread worker function performing automatic memory scanning.
+ * Unreferenced objects at the end of a memory scan are reported
+ * but only the first time.
  */
-static int kmemleak_scan_thread(void *arg)
+static void kmemleak_scan_func(struct kthread_work *dummy)
 {
- static int first_run = 1;
-
- pr_info("Automatic memory scanning thread started\n");
- set_user_nice(current, 10);
-
- /*
- * Wait before the first scan to allow the system to fully initialize.
- */
- if (first_run) {
- first_run = 0;
- ssleep(SECS_FIRST_SCAN);
- }
-
- while (!kthread_should_stop()) {
- signed long timeout = jiffies_scan_wait;
-
- mutex_lock(&scan_mutex);
- kmemleak_scan();
- mutex_unlock(&scan_mutex);
-
- /* wait before the next scan */
- while (timeout && !kthread_should_stop())
- timeout = schedule_timeout_interruptible(timeout);
- }
-
- pr_info("Automatic memory scanning thread ended\n");
+ mutex_lock(&scan_mutex);
+ kmemleak_scan();
+ mutex_unlock(&scan_mutex);
 
- return 0;
+ queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
+   jiffies_scan_wait);
 }
 
 /*
  * Start the automatic memory scanning thread. This function must be called
  * with the scan_mutex held.
  */
-static void start_scan_thread(void)
+static void start_scan_thread(bool boot)
 {
- if (scan_thread)
+ unsigned long timeout = 0;
+
+ if (kmemleak_scan_worker)
  return;
- scan_thread = kthread_run(kmemleak_scan_thread, NULL, "kmemleak");
- if (IS_ERR(scan_thread)) {
- pr_warn("Failed to create the scan thread\n");
- scan_thread = NULL;
+
+ init_delayed_kthread_work(&kmemleak_scan_work, kmemleak_scan_func);
+ kmemleak_scan_worker = create_kthread_worker(0, "kmemleak");
+ if (IS_ERR(kmemleak_scan_worker)) {
+ pr_warn("Failed to create the memory scan worker\n");
+ kmemleak_scan_worker = NULL;
  }
+ pr_info("Automatic memory scanning worker started\n");
+ set_user_nice(kmemleak_scan_worker->task, 10);
+
+ /*
+ * Wait before the first scan to allow the system to fully initialize.
+ */
+ if (boot)
+ timeout = msecs_to_jiffies(SECS_FIRST_SCAN * MSEC_PER_SEC);
+
+ queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
+   timeout);
 }
 
 /*
@@ -1525,10 +1520,14 @@ static void start_scan_thread(void)
  */
 static void stop_scan_thread(void)
 {
- if (scan_thread) {
- kthread_stop(scan_thread);
- scan_thread = NULL;
- }
+ if (!kmemleak_scan_worker)
+ return;
+
+ cancel_delayed_kthread_work_sync(&kmemleak_scan_work);
+ destroy_kthread_worker(kmemleak_scan_worker);
+ kmemleak_scan_worker = NULL;
+
+ pr_info("Automatic memory scanning thread ended\n");
 }
 
 /*
@@ -1725,7 +1724,7 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
  else if (strncmp(buf, "stack=off", 9) == 0)
  kmemleak_stack_scan = 0;
  else if (strncmp(buf, "scan=on", 7) == 0)
- start_scan_thread();
+ start_scan_thread(false);
  else if (strncmp(buf, "scan=off", 8) == 0)
  stop_scan_thread();
  else if (strncmp(buf, "scan=", 5) == 0) {
@@ -1737,7 +1736,7 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
  stop_scan_thread();
  if (secs) {
  jiffies_scan_wait = msecs_to_jiffies(secs * 1000);
- start_scan_thread();
+ start_scan_thread(false);
  }
  } else if (strncmp(buf, "scan", 4) == 0)
  kmemleak_scan();
@@ -1960,7 +1959,7 @@ static int __init kmemleak_late_init(void)
  if (!dentry)
  pr_warn("Failed to create the debugfs kmemleak file\n");
  mutex_lock(&scan_mutex);
- start_scan_thread();
+ start_scan_thread(true);
  mutex_unlock(&scan_mutex);
 
  pr_info("Kernel memory leak detector initialized\n");
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 13/20] hung_task: Convert hungtaskd into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts hungtaskd() in kthread worker API because
it modifies the priority.

This patch moves one iteration of the main cycle into a self-queuing
delayed kthread work. It does not longer check if it was called
earlier. Instead, the work is scheduled only when needed. This
requires storing the time of the last check into a global
variable.

Also the check is not longer schedule with MAX_SCHEDULE_TIMEOUT
when it is disabled. Instead the work is canceled and it is
not queued at all.

There is a small race window when sysctl_hung_task_timeout_secs
might be modified between queuing and processing the work.
Therefore the lapsed time has to be computed explicitly.

The user nice and initial hung_task_last_checked values are
set from hung_task_init(). Otherwise, we would need to add
an extra init_work.

The patch also handles the error when the kthread worker could not
be crated from some reasons. It was broken before. For example,
wake_up_process would have failed if watchdog_task included an error
code instead of a valid pointer.

Signed-off-by: Petr Mladek <[hidden email]>
CC: [hidden email]
---
 kernel/hung_task.c | 83 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 50 insertions(+), 33 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d234022805dc..9070c822abd8 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -36,12 +36,15 @@ int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
  * Zero means infinite timeout - no checking done:
  */
 unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
+unsigned long hung_task_last_checked;
 
 int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
 
-static struct task_struct *watchdog_task;
+static struct kthread_worker *watchdog_worker;
+static void watchdog_check_func(struct kthread_work *dummy);
+static DEFINE_DELAYED_KTHREAD_WORK(watchdog_check_work, watchdog_check_func);
 
 /*
  * Should we panic (and reboot, if panic_timeout= is set) when a
@@ -72,7 +75,7 @@ static struct notifier_block panic_block = {
  .notifier_call = hung_task_panic,
 };
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout)
+static void check_hung_task(struct task_struct *t, unsigned long lapsed)
 {
  unsigned long switch_count = t->nvcsw + t->nivcsw;
 
@@ -109,7 +112,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
  * complain:
  */
  pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
- t->comm, t->pid, timeout);
+ t->comm, t->pid, lapsed);
  pr_err("      %s %s %.*s\n",
  print_tainted(), init_utsname()->release,
  (int)strcspn(init_utsname()->version, " "),
@@ -155,7 +158,7 @@ static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
  * a really long time (120 seconds). If that happens, print out
  * a warning.
  */
-static void check_hung_uninterruptible_tasks(unsigned long timeout)
+static void check_hung_uninterruptible_tasks(unsigned long lapsed)
 {
  int max_count = sysctl_hung_task_check_count;
  int batch_count = HUNG_TASK_BATCHING;
@@ -179,20 +182,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
  }
  /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
  if (t->state == TASK_UNINTERRUPTIBLE)
- check_hung_task(t, timeout);
+ check_hung_task(t, lapsed);
  }
  unlock:
  rcu_read_unlock();
 }
 
-static long hung_timeout_jiffies(unsigned long last_checked,
- unsigned long timeout)
-{
- /* timeout of 0 will disable the watchdog */
- return timeout ? last_checked - jiffies + timeout * HZ :
- MAX_SCHEDULE_TIMEOUT;
-}
-
 /*
  * Process updating of timeout sysctl
  */
@@ -201,13 +196,26 @@ int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
   size_t *lenp, loff_t *ppos)
 {
  int ret;
+ long remaining;
 
  ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 
- if (ret || !write)
+ if (ret || !write || !watchdog_worker)
+ goto out;
+
+ /* Disable watchdog when there is a zero timeout */
+ if (!sysctl_hung_task_timeout_secs) {
+ cancel_delayed_kthread_work_sync(&watchdog_check_work);
  goto out;
+ }
 
- wake_up_process(watchdog_task);
+ /* Reschedule the check according to the updated timeout */
+ remaining = sysctl_hung_task_timeout_secs * HZ -
+    (jiffies - hung_task_last_checked);
+ if (remaining < 0)
+ remaining = 0;
+ mod_delayed_kthread_work(watchdog_worker, &watchdog_check_work,
+ remaining);
 
  out:
  return ret;
@@ -221,36 +229,45 @@ void reset_hung_task_detector(void)
 }
 EXPORT_SYMBOL_GPL(reset_hung_task_detector);
 
+static void schedule_next_watchdog_check(void)
+{
+ unsigned long timeout = READ_ONCE(sysctl_hung_task_timeout_secs);
+
+ hung_task_last_checked = jiffies;
+ if (timeout)
+ queue_delayed_kthread_work(watchdog_worker,
+   &watchdog_check_work,
+   timeout * HZ);
+}
+
 /*
  * kthread which checks for tasks stuck in D state
  */
-static int watchdog(void *dummy)
+static void watchdog_check_func(struct kthread_work *dummy)
 {
- unsigned long hung_last_checked = jiffies;
+ unsigned long lapsed = (jiffies - hung_task_last_checked) / HZ;
 
- set_user_nice(current, 0);
+ if (!atomic_xchg(&reset_hung_task, 0))
+ check_hung_uninterruptible_tasks(lapsed);
 
- for ( ; ; ) {
- unsigned long timeout = sysctl_hung_task_timeout_secs;
- long t = hung_timeout_jiffies(hung_last_checked, timeout);
-
- if (t <= 0) {
- if (!atomic_xchg(&reset_hung_task, 0))
- check_hung_uninterruptible_tasks(timeout);
- hung_last_checked = jiffies;
- continue;
- }
- schedule_timeout_interruptible(t);
- }
-
- return 0;
+ schedule_next_watchdog_check();
 }
 
 static int __init hung_task_init(void)
 {
+ struct kthread_worker *worker;
+
  atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
- watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
+ worker = create_kthread_worker(0, "khungtaskd");
+ if (IS_ERR(worker)) {
+ pr_warn("Failed to create khungtaskd\n");
+ goto out;
+ }
+ watchdog_worker = worker;
+ set_user_nice(worker->task, 0);
+ schedule_next_watchdog_check();
 
+ out:
  return 0;
 }
 subsys_initcall(hung_task_init);
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 15/20] ipmi: Convert kipmi kthread into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts kipmi kthread into the kthread worker API because
it modifies the scheduling priority. The change is quite straightforward.

First, we move the per-thread variable "busy_until" into the per-thread
structure struct smi_info. As a side effect, we could omit one parameter
in ipmi_thread_busy_wait(). On the other hand, the structure could not
longer be passed with the const qualifier.

The value of "busy_until" is initialized when the kthread is created.
Also the scheduling priority is set there. This helps to avoid an extra
init work.

One iteration of the kthread cycle is moved to a delayed work function.
The different delays between the cycles are solved the following way:

  + immediate cycle (nope) is converted into goto within the same work

  + immediate cycle with a possible reschedule is converted into
    re-queuing with a zero delay

  + schedule_timeout() is converted into re-queuing with the given
    delay

  + interruptible sleep is converted into nothing; The work
    will get queued again from the check_start_timer_thread().
    By other words the external wakeup_up_process() will get
    replaced by queuing with a zero delay.

Probably the most tricky change is when the worker is being stopped.
We need to explicitly cancel the work to prevent it from re-queuing.

Signed-off-by: Petr Mladek <[hidden email]>
Reviewed-by: Corey Minyard <[hidden email]>
---
 drivers/char/ipmi/ipmi_si_intf.c | 121 ++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 52 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 1e25b5205724..f06481ba4e23 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -304,7 +304,9 @@ struct smi_info {
  /* Counters and things for the proc filesystem. */
  atomic_t stats[SI_NUM_STATS];
 
- struct task_struct *thread;
+ struct kthread_worker *worker;
+ struct delayed_kthread_work work;
+ struct timespec64 busy_until;
 
  struct list_head link;
  union ipmi_smi_info_union addr_info;
@@ -429,8 +431,8 @@ static void start_new_msg(struct smi_info *smi_info, unsigned char *msg,
 {
  smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
 
- if (smi_info->thread)
- wake_up_process(smi_info->thread);
+ if (smi_info->worker)
+ mod_delayed_kthread_work(smi_info->worker, &smi_info->work, 0);
 
  smi_info->handlers->start_transaction(smi_info->si_sm, msg, size);
 }
@@ -953,8 +955,9 @@ static void check_start_timer_thread(struct smi_info *smi_info)
  if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL) {
  smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
 
- if (smi_info->thread)
- wake_up_process(smi_info->thread);
+ if (smi_info->worker)
+ mod_delayed_kthread_work(smi_info->worker,
+ &smi_info->work, 0);
 
  start_next_msg(smi_info);
  smi_event_handler(smi_info, 0);
@@ -1032,10 +1035,10 @@ static inline int ipmi_si_is_busy(struct timespec64 *ts)
 }
 
 static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
- const struct smi_info *smi_info,
- struct timespec64 *busy_until)
+ struct smi_info *smi_info)
 {
  unsigned int max_busy_us = 0;
+ struct timespec64 *busy_until = &smi_info->busy_until;
 
  if (smi_info->intf_num < num_max_busy_us)
  max_busy_us = kipmid_max_busy_us[smi_info->intf_num];
@@ -1066,53 +1069,49 @@ static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
  * (if that is enabled).  See the paragraph on kimid_max_busy_us in
  * Documentation/IPMI.txt for details.
  */
-static int ipmi_thread(void *data)
+static void ipmi_kthread_worker_func(struct kthread_work *work)
 {
- struct smi_info *smi_info = data;
+ struct smi_info *smi_info = container_of(work, struct smi_info,
+ work.work);
  unsigned long flags;
  enum si_sm_result smi_result;
- struct timespec64 busy_until;
+ int busy_wait;
 
- ipmi_si_set_not_busy(&busy_until);
- set_user_nice(current, MAX_NICE);
- while (!kthread_should_stop()) {
- int busy_wait;
+next:
+ spin_lock_irqsave(&(smi_info->si_lock), flags);
+ smi_result = smi_event_handler(smi_info, 0);
 
- spin_lock_irqsave(&(smi_info->si_lock), flags);
- smi_result = smi_event_handler(smi_info, 0);
+ /*
+ * If the driver is doing something, there is a possible
+ * race with the timer.  If the timer handler see idle,
+ * and the thread here sees something else, the timer
+ * handler won't restart the timer even though it is
+ * required.  So start it here if necessary.
+ */
+ if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
+ smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
 
- /*
- * If the driver is doing something, there is a possible
- * race with the timer.  If the timer handler see idle,
- * and the thread here sees something else, the timer
- * handler won't restart the timer even though it is
- * required.  So start it here if necessary.
- */
- if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
- smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
-
- spin_unlock_irqrestore(&(smi_info->si_lock), flags);
- busy_wait = ipmi_thread_busy_wait(smi_result, smi_info,
-  &busy_until);
- if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
- ; /* do nothing */
- else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
- schedule();
- else if (smi_result == SI_SM_IDLE) {
- if (atomic_read(&smi_info->need_watch)) {
- schedule_timeout_interruptible(100);
- } else {
- /* Wait to be woken up when we are needed. */
- __set_current_state(TASK_INTERRUPTIBLE);
- schedule();
- }
- } else
- schedule_timeout_interruptible(1);
+ spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+ busy_wait = ipmi_thread_busy_wait(smi_result, smi_info);
+
+ if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
+ goto next;
+ if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait) {
+ queue_delayed_kthread_work(smi_info->worker,
+   &smi_info->work, 0);
+ } else if (smi_result == SI_SM_IDLE) {
+ if (atomic_read(&smi_info->need_watch)) {
+ queue_delayed_kthread_work(smi_info->worker,
+   &smi_info->work, 100);
+ } else {
+ /* Nope. Wait to be queued when we are needed. */
+ }
+ } else {
+ queue_delayed_kthread_work(smi_info->worker,
+   &smi_info->work, 1);
  }
- return 0;
 }
 
-
 static void poll(void *send_info)
 {
  struct smi_info *smi_info = send_info;
@@ -1253,17 +1252,30 @@ static int smi_start_processing(void       *send_info,
  enable = 1;
 
  if (enable) {
- new_smi->thread = kthread_run(ipmi_thread, new_smi,
-      "kipmi%d", new_smi->intf_num);
- if (IS_ERR(new_smi->thread)) {
+ struct kthread_worker *worker;
+
+ worker = create_kthread_worker(0, "kipmi%d",
+       new_smi->intf_num);
+
+ if (IS_ERR(worker)) {
  dev_notice(new_smi->dev, "Could not start"
    " kernel thread due to error %ld, only using"
    " timers to drive the interface\n",
-   PTR_ERR(new_smi->thread));
- new_smi->thread = NULL;
+   PTR_ERR(worker));
+ goto out;
  }
+
+ ipmi_si_set_not_busy(&new_smi->busy_until);
+ set_user_nice(worker->task, MAX_NICE);
+
+ init_delayed_kthread_work(&new_smi->work,
+  ipmi_kthread_worker_func);
+ queue_delayed_kthread_work(worker, &new_smi->work, 0);
+
+ new_smi->worker = worker;
  }
 
+out:
  return 0;
 }
 
@@ -3442,8 +3454,13 @@ static void check_for_broken_irqs(struct smi_info *smi_info)
 
 static inline void wait_for_timer_and_thread(struct smi_info *smi_info)
 {
- if (smi_info->thread != NULL)
- kthread_stop(smi_info->thread);
+ if (smi_info->worker != NULL) {
+ struct kthread_worker *worker = smi_info->worker;
+
+ smi_info->worker = NULL;
+ cancel_delayed_kthread_work_sync(&smi_info->work);
+ destroy_kthread_worker(worker);
+ }
  if (smi_info->timer_running)
  del_timer_sync(&smi_info->si_timer);
 }
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 18/20] memstick/r592: convert r592_io kthread into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts the r592_io kthread into the kthread worker
API. I am not sure how busy the kthread is and if anyone would
like to control the resources. It is well possible that a workqueue
would be perfectly fine. Well, the conversion between kthread
worker API and workqueues is pretty trivial.

The patch moves one iteration from the kthread into the kthread
worker function. It helps to remove all the hackery with process
state and kthread_should_stop().

The work is queued instead of waking the thread.

The work is explicitly canceled before the worker is destroyed.
It is self-queuing and it might take a long time until the queue
is drained, otherwise.

Important: The change is only compile tested. I did not find an easy
way how to check it in use.

Signed-off-by: Petr Mladek <[hidden email]>
CC: Maxim Levitsky <[hidden email]>
---
 drivers/memstick/host/r592.c | 58 ++++++++++++++++++++------------------------
 drivers/memstick/host/r592.h |  3 ++-
 2 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/memstick/host/r592.c b/drivers/memstick/host/r592.c
index 7d29d6549110..1d8547b12eca 100644
--- a/drivers/memstick/host/r592.c
+++ b/drivers/memstick/host/r592.c
@@ -562,40 +562,32 @@ out:
  return;
 }
 
-/* Main request processing thread */
-static int r592_process_thread(void *data)
+/* Main request processing work */
+static void r592_process_func(struct kthread_work *work)
 {
  int error;
- struct r592_device *dev = (struct r592_device *)data;
-
- while (!kthread_should_stop()) {
- if (!dev->io_started) {
- dbg_verbose("IO: started");
- dev->io_started = true;
- }
-
- set_current_state(TASK_INTERRUPTIBLE);
- error = memstick_next_req(dev->host, &dev->req);
+ struct r592_device *dev =
+ container_of(work, struct r592_device, io_work);
 
- if (error) {
- if (error == -ENXIO || error == -EAGAIN) {
- dbg_verbose("IO: done");
- } else {
- dbg("IO: unknown error from "
- "memstick_next_req %d", error);
- }
- dev->io_started = false;
+ if (!dev->io_started) {
+ dbg_verbose("IO: started");
+ dev->io_started = true;
+ }
 
- if (kthread_should_stop())
- set_current_state(TASK_RUNNING);
+ error = memstick_next_req(dev->host, &dev->req);
 
- schedule();
+ if (error) {
+ if (error == -ENXIO || error == -EAGAIN) {
+ dbg_verbose("IO: done");
  } else {
- set_current_state(TASK_RUNNING);
- r592_execute_tpc(dev);
+ dbg("IO: unknown error from memstick_next_req %d",
+    error);
  }
+ dev->io_started = false;
+ } else {
+ r592_execute_tpc(dev);
+ queue_kthread_work(dev->io_worker, &dev->io_work);
  }
- return 0;
 }
 
 /* Reprogram chip to detect change in card state */
@@ -720,7 +712,7 @@ static void r592_submit_req(struct memstick_host *host)
  if (dev->req)
  return;
 
- wake_up_process(dev->io_thread);
+ queue_kthread_work(dev->io_worker, &dev->io_work);
 }
 
 static const struct pci_device_id r592_pci_id_tbl[] = {
@@ -778,9 +770,10 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  r592_check_dma(dev);
 
  dev->io_started = false;
- dev->io_thread = kthread_run(r592_process_thread, dev, "r592_io");
- if (IS_ERR(dev->io_thread)) {
- error = PTR_ERR(dev->io_thread);
+ init_kthread_work(&dev->io_work, r592_process_func);
+ dev->io_worker = create_kthread_worker(0, "r592_io");
+ if (IS_ERR(dev->io_worker)) {
+ error = PTR_ERR(dev->io_worker);
  goto error5;
  }
 
@@ -806,7 +799,7 @@ error6:
  dma_free_coherent(&pdev->dev, PAGE_SIZE, dev->dummy_dma_page,
  dev->dummy_dma_page_physical_address);
 
- kthread_stop(dev->io_thread);
+ destroy_kthread_worker(dev->io_worker);
 error5:
  iounmap(dev->mmio);
 error4:
@@ -826,7 +819,8 @@ static void r592_remove(struct pci_dev *pdev)
 
  /* Stop the processing thread.
  That ensures that we won't take any more requests */
- kthread_stop(dev->io_thread);
+ cancel_kthread_work_sync(&dev->io_work);
+ destroy_kthread_worker(dev->io_worker);
 
  r592_enable_device(dev, false);
 
diff --git a/drivers/memstick/host/r592.h b/drivers/memstick/host/r592.h
index aa8f0f22f4ce..1ac71380ac04 100644
--- a/drivers/memstick/host/r592.h
+++ b/drivers/memstick/host/r592.h
@@ -139,7 +139,8 @@ struct r592_device {
  spinlock_t irq_lock;
  struct timer_list detect_timer;
 
- struct task_struct *io_thread;
+ struct kthread_worker *io_worker;
+ struct kthread_work io_work;
  bool io_started;
  bool parallel_mode;
 
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 19/20] thermal/intel_powerclamp: Remove duplicated code that starts the kthread

Petr Mladek-2
In reply to this post by Petr Mladek-2
This patch removes a code duplication. It does not modify
the functionality.

Signed-off-by: Petr Mladek <[hidden email]>
CC: Zhang Rui <[hidden email]>
CC: Eduardo Valentin <[hidden email]>
CC: Jacob Pan <[hidden email]>
CC: Sebastian Andrzej Siewior <[hidden email]>
CC: [hidden email]
Acked-by: Jacob Pan <[hidden email]>
---
 drivers/thermal/intel_powerclamp.c | 45 +++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 6c79588251d5..cb32c38f9828 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -505,10 +505,27 @@ static void poll_pkg_cstate(struct work_struct *dummy)
  schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
+static void start_power_clamp_thread(unsigned long cpu)
+{
+ struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
+ struct task_struct *thread;
+
+ thread = kthread_create_on_node(clamp_thread,
+ (void *) cpu,
+ cpu_to_node(cpu),
+ "kidle_inject/%ld", cpu);
+ if (IS_ERR(thread))
+ return;
+
+ /* bind to cpu here */
+ kthread_bind(thread, cpu);
+ wake_up_process(thread);
+ *p = thread;
+}
+
 static int start_power_clamp(void)
 {
  unsigned long cpu;
- struct task_struct *thread;
 
  /* check if pkg cstate counter is completely 0, abort in this case */
  if (!has_pkg_state_counter()) {
@@ -530,20 +547,7 @@ static int start_power_clamp(void)
 
  /* start one thread per online cpu */
  for_each_online_cpu(cpu) {
- struct task_struct **p =
- per_cpu_ptr(powerclamp_thread, cpu);
-
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%ld", cpu);
- /* bind to cpu here */
- if (likely(!IS_ERR(thread))) {
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *p = thread;
- }
-
+ start_power_clamp_thread(cpu);
  }
  put_online_cpus();
 
@@ -575,7 +579,6 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
  unsigned long action, void *hcpu)
 {
  unsigned long cpu = (unsigned long)hcpu;
- struct task_struct *thread;
  struct task_struct **percpu_thread =
  per_cpu_ptr(powerclamp_thread, cpu);
 
@@ -584,15 +587,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 
  switch (action) {
  case CPU_ONLINE:
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%lu", cpu);
- if (likely(!IS_ERR(thread))) {
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *percpu_thread = thread;
- }
+ start_power_clamp_thread(cpu);
  /* prefer BSP as controlling CPU */
  if (cpu == 0) {
  control_cpu = 0;
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts the intel powerclamp kthreads into the kthread
worker because they need to have a good control over the assigned
CPUs.

IMHO, the most natural way is to split one cycle into two works.
First one does some balancing and let the CPU work normal
way for some time. The second work checks what the CPU has done
in the meantime and put it into C-state to reach the required
idle time ratio. The delay between the two works is achieved
by the delayed kthread work.

The two works have to share some data that used to be local
variables of the single kthread function. This is achieved
by the new per-CPU struct kthread_worker_data. It might look
as a complication. On the other hand, the long original kthread
function was not nice either.

The patch tries to avoid extra init and cleanup works. All the
actions might be done outside the thread. They are moved
to the functions that create or destroy the worker. Especially,
I checked that the timers are assigned to the right CPU.

The two works are queuing each other. It makes it a bit tricky to
break it when we want to stop the worker. We use the global and
per-worker "clamping" variables to make sure that the re-queuing
eventually stops. We also cancel the works to make it faster.
Note that the canceling is not reliable because the handling
of the two variables and queuing is not synchronized via a lock.
But it is not a big deal because it is just an optimization.
The job is stopped faster than before in most cases.

Signed-off-by: Petr Mladek <[hidden email]>
CC: Zhang Rui <[hidden email]>
CC: Eduardo Valentin <[hidden email]>
CC: Jacob Pan <[hidden email]>
CC: Sebastian Andrzej Siewior <[hidden email]>
CC: [hidden email]
Acked-by: Jacob Pan <[hidden email]>
---
 drivers/thermal/intel_powerclamp.c | 287 ++++++++++++++++++++++---------------
 1 file changed, 168 insertions(+), 119 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index cb32c38f9828..c6f4058a572b 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -86,11 +86,27 @@ static unsigned int control_cpu; /* The cpu assigned to collect stat and update
   */
 static bool clamping;
 
+static const struct sched_param sparam = {
+ .sched_priority = MAX_USER_RT_PRIO / 2,
+};
+struct powerclamp_worker_data {
+ struct kthread_worker *worker;
+ struct kthread_work balancing_work;
+ struct delayed_kthread_work idle_injection_work;
+ struct timer_list wakeup_timer;
+ unsigned int cpu;
+ unsigned int count;
+ unsigned int guard;
+ unsigned int window_size_now;
+ unsigned int target_ratio;
+ unsigned int duration_jiffies;
+ bool clamping;
+};
 
-static struct task_struct * __percpu *powerclamp_thread;
+static struct powerclamp_worker_data * __percpu worker_data;
 static struct thermal_cooling_device *cooling_dev;
 static unsigned long *cpu_clamping_mask;  /* bit map for tracking per cpu
-   * clamping thread
+   * clamping kthread worker
    */
 
 static unsigned int duration;
@@ -368,100 +384,102 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio,
  return set_target_ratio + guard <= current_ratio;
 }
 
-static int clamp_thread(void *arg)
+static void clamp_balancing_func(struct kthread_work *work)
 {
- int cpunr = (unsigned long)arg;
- DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
- static const struct sched_param param = {
- .sched_priority = MAX_USER_RT_PRIO/2,
- };
- unsigned int count = 0;
- unsigned int target_ratio;
+ struct powerclamp_worker_data *w_data;
+ int sleeptime;
+ unsigned long target_jiffies;
+ unsigned int compensation;
+ int interval; /* jiffies to sleep for each attempt */
 
- set_bit(cpunr, cpu_clamping_mask);
- set_freezable();
- init_timer_on_stack(&wakeup_timer);
- sched_setscheduler(current, SCHED_FIFO, &param);
-
- while (true == clamping && !kthread_should_stop() &&
- cpu_online(cpunr)) {
- int sleeptime;
- unsigned long target_jiffies;
- unsigned int guard;
- unsigned int compensation = 0;
- int interval; /* jiffies to sleep for each attempt */
- unsigned int duration_jiffies = msecs_to_jiffies(duration);
- unsigned int window_size_now;
-
- try_to_freeze();
- /*
- * make sure user selected ratio does not take effect until
- * the next round. adjust target_ratio if user has changed
- * target such that we can converge quickly.
- */
- target_ratio = set_target_ratio;
- guard = 1 + target_ratio/20;
- window_size_now = window_size;
- count++;
+ w_data = container_of(work, struct powerclamp_worker_data,
+      balancing_work);
 
- /*
- * systems may have different ability to enter package level
- * c-states, thus we need to compensate the injected idle ratio
- * to achieve the actual target reported by the HW.
- */
- compensation = get_compensation(target_ratio);
- interval = duration_jiffies*100/(target_ratio+compensation);
-
- /* align idle time */
- target_jiffies = roundup(jiffies, interval);
- sleeptime = target_jiffies - jiffies;
- if (sleeptime <= 0)
- sleeptime = 1;
- schedule_timeout_interruptible(sleeptime);
- /*
- * only elected controlling cpu can collect stats and update
- * control parameters.
- */
- if (cpunr == control_cpu && !(count%window_size_now)) {
- should_skip =
- powerclamp_adjust_controls(target_ratio,
- guard, window_size_now);
- smp_mb();
- }
+ /*
+ * make sure user selected ratio does not take effect until
+ * the next round. adjust target_ratio if user has changed
+ * target such that we can converge quickly.
+ */
+ w_data->target_ratio = READ_ONCE(set_target_ratio);
+ w_data->guard = 1 + w_data->target_ratio / 20;
+ w_data->window_size_now = window_size;
+ w_data->duration_jiffies = msecs_to_jiffies(duration);
+ w_data->count++;
+
+ /*
+ * systems may have different ability to enter package level
+ * c-states, thus we need to compensate the injected idle ratio
+ * to achieve the actual target reported by the HW.
+ */
+ compensation = get_compensation(w_data->target_ratio);
+ interval = w_data->duration_jiffies * 100 /
+ (w_data->target_ratio + compensation);
+
+ /* align idle time */
+ target_jiffies = roundup(jiffies, interval);
+ sleeptime = target_jiffies - jiffies;
+ if (sleeptime <= 0)
+ sleeptime = 1;
+
+ if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+ queue_delayed_kthread_work(w_data->worker,
+   &w_data->idle_injection_work,
+   sleeptime);
+}
+
+static void clamp_idle_injection_func(struct kthread_work *work)
+{
+ struct powerclamp_worker_data *w_data;
+ unsigned long target_jiffies;
+
+ w_data = container_of(work, struct powerclamp_worker_data,
+      idle_injection_work.work);
+
+ /*
+ * only elected controlling cpu can collect stats and update
+ * control parameters.
+ */
+ if (w_data->cpu == control_cpu &&
+    !(w_data->count % w_data->window_size_now)) {
+ should_skip =
+ powerclamp_adjust_controls(w_data->target_ratio,
+   w_data->guard,
+   w_data->window_size_now);
+ smp_mb();
+ }
 
- if (should_skip)
- continue;
+ if (should_skip)
+ goto balance;
+
+ target_jiffies = jiffies + w_data->duration_jiffies;
+ mod_timer(&w_data->wakeup_timer, target_jiffies);
+ if (unlikely(local_softirq_pending()))
+ goto balance;
+ /*
+ * stop tick sched during idle time, interrupts are still
+ * allowed. thus jiffies are updated properly.
+ */
+ preempt_disable();
+ /* mwait until target jiffies is reached */
+ while (time_before(jiffies, target_jiffies)) {
+ unsigned long ecx = 1;
+ unsigned long eax = target_mwait;
 
- target_jiffies = jiffies + duration_jiffies;
- mod_timer(&wakeup_timer, target_jiffies);
- if (unlikely(local_softirq_pending()))
- continue;
  /*
- * stop tick sched during idle time, interrupts are still
- * allowed. thus jiffies are updated properly.
+ * REVISIT: may call enter_idle() to notify drivers who
+ * can save power during cpu idle. same for exit_idle()
  */
- preempt_disable();
- /* mwait until target jiffies is reached */
- while (time_before(jiffies, target_jiffies)) {
- unsigned long ecx = 1;
- unsigned long eax = target_mwait;
-
- /*
- * REVISIT: may call enter_idle() to notify drivers who
- * can save power during cpu idle. same for exit_idle()
- */
- local_touch_nmi();
- stop_critical_timings();
- mwait_idle_with_hints(eax, ecx);
- start_critical_timings();
- atomic_inc(&idle_wakeup_counter);
- }
- preempt_enable();
+ local_touch_nmi();
+ stop_critical_timings();
+ mwait_idle_with_hints(eax, ecx);
+ start_critical_timings();
+ atomic_inc(&idle_wakeup_counter);
  }
- del_timer_sync(&wakeup_timer);
- clear_bit(cpunr, cpu_clamping_mask);
+ preempt_enable();
 
- return 0;
+balance:
+ if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+ queue_kthread_work(w_data->worker, &w_data->balancing_work);
 }
 
 /*
@@ -505,22 +523,58 @@ static void poll_pkg_cstate(struct work_struct *dummy)
  schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
-static void start_power_clamp_thread(unsigned long cpu)
+static void start_power_clamp_worker(unsigned long cpu)
 {
- struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
- struct task_struct *thread;
-
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%ld", cpu);
- if (IS_ERR(thread))
+ struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+ struct kthread_worker *worker;
+
+ worker = create_kthread_worker_on_cpu(cpu, KTW_FREEZABLE,
+      "kidle_inject/%ld", cpu);
+ if (IS_ERR(worker))
  return;
 
- /* bind to cpu here */
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *p = thread;
+ w_data->worker = worker;
+ w_data->count = 0;
+ w_data->cpu = cpu;
+ w_data->clamping = true;
+ set_bit(cpu, cpu_clamping_mask);
+ setup_timer(&w_data->wakeup_timer, noop_timer, 0);
+ sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
+ init_kthread_work(&w_data->balancing_work, clamp_balancing_func);
+ init_delayed_kthread_work(&w_data->idle_injection_work,
+  clamp_idle_injection_func);
+ queue_kthread_work(w_data->worker, &w_data->balancing_work);
+}
+
+static void stop_power_clamp_worker(unsigned long cpu)
+{
+ struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+
+ if (!w_data->worker)
+ return;
+
+ w_data->clamping = false;
+ /*
+ * Make sure that all works that get queued after this point see
+ * the clamping disabled. The counter part is not needed because
+ * there is an implicit memory barrier when the queued work
+ * is proceed.
+ */
+ smp_wmb();
+ cancel_kthread_work_sync(&w_data->balancing_work);
+ cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);
+ /*
+ * The balancing work still might be queued here because
+ * the handling of the "clapming" variable, cancel, and queue
+ * operations are not synchronized via a lock. But it is not
+ * a big deal. The balancing work is fast and destroy kthread
+ * will wait for it.
+ */
+ del_timer_sync(&w_data->wakeup_timer);
+ clear_bit(w_data->cpu, cpu_clamping_mask);
+ destroy_kthread_worker(w_data->worker);
+
+ w_data->worker = NULL;
 }
 
 static int start_power_clamp(void)
@@ -545,9 +599,9 @@ static int start_power_clamp(void)
  clamping = true;
  schedule_delayed_work(&poll_pkg_cstate_work, 0);
 
- /* start one thread per online cpu */
+ /* start one kthread worker per online cpu */
  for_each_online_cpu(cpu) {
- start_power_clamp_thread(cpu);
+ start_power_clamp_worker(cpu);
  }
  put_online_cpus();
 
@@ -557,20 +611,17 @@ static int start_power_clamp(void)
 static void end_power_clamp(void)
 {
  int i;
- struct task_struct *thread;
 
- clamping = false;
  /*
- * make clamping visible to other cpus and give per cpu clamping threads
- * sometime to exit, or gets killed later.
+ * Block requeuing in all the kthread workers. They will drain and
+ * stop faster.
  */
- smp_mb();
- msleep(20);
+ clamping = false;
  if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
  for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
- pr_debug("clamping thread for cpu %d alive, kill\n", i);
- thread = *per_cpu_ptr(powerclamp_thread, i);
- kthread_stop(thread);
+ pr_debug("clamping worker for cpu %d alive, destroy\n",
+ i);
+ stop_power_clamp_worker(i);
  }
  }
 }
@@ -579,15 +630,13 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
  unsigned long action, void *hcpu)
 {
  unsigned long cpu = (unsigned long)hcpu;
- struct task_struct **percpu_thread =
- per_cpu_ptr(powerclamp_thread, cpu);
 
  if (false == clamping)
  goto exit_ok;
 
  switch (action) {
  case CPU_ONLINE:
- start_power_clamp_thread(cpu);
+ start_power_clamp_worker(cpu);
  /* prefer BSP as controlling CPU */
  if (cpu == 0) {
  control_cpu = 0;
@@ -598,7 +647,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
  if (test_bit(cpu, cpu_clamping_mask)) {
  pr_err("cpu %lu dead but powerclamping thread is not\n",
  cpu);
- kthread_stop(*percpu_thread);
+ stop_power_clamp_worker(cpu);
  }
  if (cpu == control_cpu) {
  control_cpu = smp_processor_id();
@@ -785,8 +834,8 @@ static int __init powerclamp_init(void)
  window_size = 2;
  register_hotcpu_notifier(&powerclamp_cpu_notifier);
 
- powerclamp_thread = alloc_percpu(struct task_struct *);
- if (!powerclamp_thread) {
+ worker_data = alloc_percpu(struct powerclamp_worker_data);
+ if (!worker_data) {
  retval = -ENOMEM;
  goto exit_unregister;
  }
@@ -806,7 +855,7 @@ static int __init powerclamp_init(void)
  return 0;
 
 exit_free_thread:
- free_percpu(powerclamp_thread);
+ free_percpu(worker_data);
 exit_unregister:
  unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 exit_free:
@@ -819,7 +868,7 @@ static void __exit powerclamp_exit(void)
 {
  unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
  end_power_clamp();
- free_percpu(powerclamp_thread);
+ free_percpu(worker_data);
  thermal_cooling_device_unregister(cooling_dev);
  kfree(cpu_clamping_mask);
 
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 17/20] memstick/r592: Better synchronize debug messages in r592_io kthread

Petr Mladek-2
In reply to this post by Petr Mladek-2
There is an attempt to print debug messages when the kthread is waken
and when it goes into sleep. It does not work well because the spin lock
does not guard all manipulations with the thread state.

I did not find a way how to print a message when the kthread really
goes into sleep. Instead, I added a state variable. It clearly marks
when a series of IO requests is started and finished. It makes sure
that we always have a pair of started/done messages.

The only problem is that it will print these messages also when
the kthread is created and there is no real work. We might want
to use create_kthread() instead of run_kthread(). Then the kthread
will stay stopped until the first request.

Important: This change is only compile tested. I did not find an easy
way how to test it. This is why I was conservative and did not modify
the kthread creation.

Signed-off-by: Petr Mladek <[hidden email]>
CC: Maxim Levitsky <[hidden email]>
---
 drivers/memstick/host/r592.c | 19 +++++++++----------
 drivers/memstick/host/r592.h |  2 +-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/memstick/host/r592.c b/drivers/memstick/host/r592.c
index d5cfb503b9d6..7d29d6549110 100644
--- a/drivers/memstick/host/r592.c
+++ b/drivers/memstick/host/r592.c
@@ -567,21 +567,24 @@ static int r592_process_thread(void *data)
 {
  int error;
  struct r592_device *dev = (struct r592_device *)data;
- unsigned long flags;
 
  while (!kthread_should_stop()) {
- spin_lock_irqsave(&dev->io_thread_lock, flags);
+ if (!dev->io_started) {
+ dbg_verbose("IO: started");
+ dev->io_started = true;
+ }
+
  set_current_state(TASK_INTERRUPTIBLE);
  error = memstick_next_req(dev->host, &dev->req);
- spin_unlock_irqrestore(&dev->io_thread_lock, flags);
 
  if (error) {
  if (error == -ENXIO || error == -EAGAIN) {
- dbg_verbose("IO: done IO, sleeping");
+ dbg_verbose("IO: done");
  } else {
  dbg("IO: unknown error from "
  "memstick_next_req %d", error);
  }
+ dev->io_started = false;
 
  if (kthread_should_stop())
  set_current_state(TASK_RUNNING);
@@ -713,15 +716,11 @@ static int r592_set_param(struct memstick_host *host,
 static void r592_submit_req(struct memstick_host *host)
 {
  struct r592_device *dev = memstick_priv(host);
- unsigned long flags;
 
  if (dev->req)
  return;
 
- spin_lock_irqsave(&dev->io_thread_lock, flags);
- if (wake_up_process(dev->io_thread))
- dbg_verbose("IO thread woken to process requests");
- spin_unlock_irqrestore(&dev->io_thread_lock, flags);
+ wake_up_process(dev->io_thread);
 }
 
 static const struct pci_device_id r592_pci_id_tbl[] = {
@@ -767,7 +766,6 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
  dev->irq = pdev->irq;
  spin_lock_init(&dev->irq_lock);
- spin_lock_init(&dev->io_thread_lock);
  init_completion(&dev->dma_done);
  INIT_KFIFO(dev->pio_fifo);
  setup_timer(&dev->detect_timer,
@@ -779,6 +777,7 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  host->set_param = r592_set_param;
  r592_check_dma(dev);
 
+ dev->io_started = false;
  dev->io_thread = kthread_run(r592_process_thread, dev, "r592_io");
  if (IS_ERR(dev->io_thread)) {
  error = PTR_ERR(dev->io_thread);
diff --git a/drivers/memstick/host/r592.h b/drivers/memstick/host/r592.h
index c5726c1e8832..aa8f0f22f4ce 100644
--- a/drivers/memstick/host/r592.h
+++ b/drivers/memstick/host/r592.h
@@ -137,10 +137,10 @@ struct r592_device {
  void __iomem *mmio;
  int irq;
  spinlock_t irq_lock;
- spinlock_t io_thread_lock;
  struct timer_list detect_timer;
 
  struct task_struct *io_thread;
+ bool io_started;
  bool parallel_mode;
 
  DECLARE_KFIFO(pio_fifo, u8, sizeof(u32));
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 16/20] IB/fmr_pool: Convert the cleanup thread into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts the frm_pool kthread into the kthread worker
API because I am not sure how busy the thread is. It is well
possible that it does not need a dedicated kthread and workqueues
would be perfectly fine. Well, the conversion between kthread
worker API and workqueues is pretty trivial.

The patch moves one iteration from the kthread into the work function.
It preserves the check for a spurious queuing (wake up). Then it
processes one request. Finally, it re-queues itself if more requests
are pending.

Otherwise, wake_up_process() is replaced by queuing the work.

Important: The change is only compile tested. I did not find an easy
way how to check it in a real life.

Signed-off-by: Petr Mladek <[hidden email]>
CC: Doug Ledford <[hidden email]>
CC: Sean Hefty <[hidden email]>
CC: Hal Rosenstock <[hidden email]>
CC: [hidden email]
---
 drivers/infiniband/core/fmr_pool.c | 54 ++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
index cdbb1f1a6d97..571089ba3a72 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -96,7 +96,8 @@ struct ib_fmr_pool {
    void *              arg);
  void                     *flush_arg;
 
- struct task_struct       *thread;
+ struct kthread_worker  *worker;
+ struct kthread_work  work;
 
  atomic_t                  req_ser;
  atomic_t                  flush_ser;
@@ -174,29 +175,26 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
  spin_unlock_irq(&pool->pool_lock);
 }
 
-static int ib_fmr_cleanup_thread(void *pool_ptr)
+static void ib_fmr_cleanup_func(struct kthread_work *work)
 {
- struct ib_fmr_pool *pool = pool_ptr;
+ struct ib_fmr_pool *pool = container_of(work, struct ib_fmr_pool, work);
 
- do {
- if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
- ib_fmr_batch_release(pool);
-
- atomic_inc(&pool->flush_ser);
- wake_up_interruptible(&pool->force_wait);
+ /*
+ * The same request might be queued twice when it appears and
+ * by re-queuing from this work.
+ */
+ if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0)
+ return;
 
- if (pool->flush_function)
- pool->flush_function(pool, pool->flush_arg);
- }
+ ib_fmr_batch_release(pool);
+ atomic_inc(&pool->flush_ser);
+ wake_up_interruptible(&pool->force_wait);
 
- set_current_state(TASK_INTERRUPTIBLE);
- if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
-    !kthread_should_stop())
- schedule();
- __set_current_state(TASK_RUNNING);
- } while (!kthread_should_stop());
+ if (pool->flush_function)
+ pool->flush_function(pool, pool->flush_arg);
 
- return 0;
+ if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0)
+ queue_kthread_work(pool->worker, &pool->work);
 }
 
 /**
@@ -266,15 +264,13 @@ struct ib_fmr_pool *ib_create_fmr_pool(struct ib_pd             *pd,
  atomic_set(&pool->flush_ser, 0);
  init_waitqueue_head(&pool->force_wait);
 
- pool->thread = kthread_run(ib_fmr_cleanup_thread,
-   pool,
-   "ib_fmr(%s)",
-   device->name);
- if (IS_ERR(pool->thread)) {
- pr_warn(PFX "couldn't start cleanup thread\n");
- ret = PTR_ERR(pool->thread);
+ pool->worker = create_kthread_worker(0, "ib_fmr(%s)", device->name);
+ if (IS_ERR(pool->worker)) {
+ pr_warn(PFX "couldn't start cleanup kthread worker\n");
+ ret = PTR_ERR(pool->worker);
  goto out_free_pool;
  }
+ init_kthread_work(&pool->work, ib_fmr_cleanup_func);
 
  {
  struct ib_pool_fmr *fmr;
@@ -339,7 +335,7 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
  LIST_HEAD(fmr_list);
  int                 i;
 
- kthread_stop(pool->thread);
+ destroy_kthread_worker(pool->worker);
  ib_fmr_batch_release(pool);
 
  i = 0;
@@ -389,7 +385,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
  spin_unlock_irq(&pool->pool_lock);
 
  serial = atomic_inc_return(&pool->req_ser);
- wake_up_process(pool->thread);
+ queue_kthread_work(pool->worker, &pool->work);
 
  if (wait_event_interruptible(pool->force_wait,
      atomic_read(&pool->flush_ser) - serial >= 0))
@@ -503,7 +499,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr)
  list_add_tail(&fmr->list, &pool->dirty_list);
  if (++pool->dirty_len >= pool->dirty_watermark) {
  atomic_inc(&pool->req_ser);
- wake_up_process(pool->thread);
+ queue_kthread_work(pool->worker, &pool->work);
  }
  }
  }
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 11/20] mm/huge_page: Convert khugepaged() into kthread worker API

Petr Mladek-2
In reply to this post by Petr Mladek-2
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts khugepaged() in kthread worker API
because it modifies the scheduling.

It keeps the functionality except that we do not wakeup the worker
when it is already created and someone calls start() once again.

Note that kthread works get associated with a single kthread worker.
They must be initialized if we want to use them with another worker.
This is needed also when the worker is restarted.

set_freezable() is not needed because the kthread worker is
created as freezable.

set_user_nice() is called from start_stop_khugepaged(). It need
not be done from within the kthread.

The scan work must be queued only when the worker is available.
We have to use "khugepaged_mm_lock" to avoid a race between the check
and queuing. I admit that this was a bit easier before because wake_up()
was a nope when the kthread did not exist.

Also the scan work is queued only when the list of scanned pages is
not empty. It adds one check but it is cleaner.

They delay between scans is done using a delayed work.

Note that @khugepaged_wait waitqueue had two purposes. It was used
to wait between scans and when an allocation failed. It is still used
for the second purpose. Therefore it was renamed to better describe
the current use.

Also note that we could not longer check for kthread_should_stop()
in the works. The kthread used by the worker has to stay alive
until all queued works are finished. Instead, we use the existing
check khugepaged_enabled() that returns false when we are going down.

Signed-off-by: Petr Mladek <[hidden email]>
---
 mm/huge_memory.c | 138 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86f9f8b82f8e..1c507115aec2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,10 +89,16 @@ static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static struct kthread_worker *khugepaged_worker;
+static struct delayed_kthread_work khugepaged_do_scan_work;
+static struct kthread_work khugepaged_cleanup_work;
+
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
-static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
+static DECLARE_WAIT_QUEUE_HEAD(khugepaged_alloc_wait);
 /*
  * default collapse hugepages if there is at least one pte mapped like
  * it would have happened if the vma was large enough during page
@@ -100,7 +106,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly;
 
-static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
 static void khugepaged_slab_exit(void);
 
@@ -176,29 +181,55 @@ static void set_recommended_min_free_kbytes(void)
  setup_per_zone_wmarks();
 }
 
+static int khugepaged_has_work(void)
+{
+ return !list_empty(&khugepaged_scan.mm_head);
+}
+
 static int start_stop_khugepaged(void)
 {
+ struct kthread_worker *worker;
  int err = 0;
+
  if (khugepaged_enabled()) {
- if (!khugepaged_thread)
- khugepaged_thread = kthread_run(khugepaged, NULL,
- "khugepaged");
- if (IS_ERR(khugepaged_thread)) {
- pr_err("khugepaged: kthread_run(khugepaged) failed\n");
- err = PTR_ERR(khugepaged_thread);
- khugepaged_thread = NULL;
- goto fail;
+ if (khugepaged_worker)
+ goto out;
+
+ worker = create_kthread_worker(KTW_FREEZABLE, "khugepaged");
+ if (IS_ERR(worker)) {
+ pr_err("khugepaged: failed to create kthread worker\n");
+ goto out;
  }
+ set_user_nice(worker->task, MAX_NICE);
 
- if (!list_empty(&khugepaged_scan.mm_head))
- wake_up_interruptible(&khugepaged_wait);
+ /* Always initialize the works when the worker is started. */
+ init_delayed_kthread_work(&khugepaged_do_scan_work,
+  khugepaged_do_scan_func);
+ init_kthread_work(&khugepaged_cleanup_work,
+  khugepaged_cleanup_func);
+
+ /* Make the worker public and check for work synchronously. */
+ spin_lock(&khugepaged_mm_lock);
+ khugepaged_worker = worker;
+ if (khugepaged_has_work())
+ queue_delayed_kthread_work(worker,
+   &khugepaged_do_scan_work,
+   0);
+ spin_unlock(&khugepaged_mm_lock);
 
  set_recommended_min_free_kbytes();
- } else if (khugepaged_thread) {
- kthread_stop(khugepaged_thread);
- khugepaged_thread = NULL;
+ } else if (khugepaged_worker) {
+ /* First, stop others from using the worker. */
+ spin_lock(&khugepaged_mm_lock);
+ worker = khugepaged_worker;
+ khugepaged_worker = NULL;
+ spin_unlock(&khugepaged_mm_lock);
+
+ cancel_delayed_kthread_work_sync(&khugepaged_do_scan_work);
+ queue_kthread_work(worker, &khugepaged_cleanup_work);
+ destroy_kthread_worker(worker);
  }
-fail:
+out:
  return err;
 }
 
@@ -467,7 +498,13 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
  return -EINVAL;
 
  khugepaged_scan_sleep_millisecs = msecs;
- wake_up_interruptible(&khugepaged_wait);
+
+ spin_lock(&khugepaged_mm_lock);
+ if (khugepaged_worker && khugepaged_has_work())
+ mod_delayed_kthread_work(khugepaged_worker,
+ &khugepaged_do_scan_work,
+ 0);
+ spin_unlock(&khugepaged_mm_lock);
 
  return count;
 }
@@ -494,7 +531,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
  return -EINVAL;
 
  khugepaged_alloc_sleep_millisecs = msecs;
- wake_up_interruptible(&khugepaged_wait);
+ wake_up_interruptible(&khugepaged_alloc_wait);
 
  return count;
 }
@@ -1920,7 +1957,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 int __khugepaged_enter(struct mm_struct *mm)
 {
  struct mm_slot *mm_slot;
- int wakeup;
+ int has_work;
 
  mm_slot = alloc_mm_slot();
  if (!mm_slot)
@@ -1939,13 +1976,15 @@ int __khugepaged_enter(struct mm_struct *mm)
  * Insert just behind the scanning cursor, to let the area settle
  * down a little.
  */
- wakeup = list_empty(&khugepaged_scan.mm_head);
+ has_work = khugepaged_has_work();
  list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
- spin_unlock(&khugepaged_mm_lock);
 
  atomic_inc(&mm->mm_count);
- if (wakeup)
- wake_up_interruptible(&khugepaged_wait);
+ if (khugepaged_worker && has_work)
+ mod_delayed_kthread_work(khugepaged_worker,
+ &khugepaged_do_scan_work,
+ 0);
+ spin_unlock(&khugepaged_mm_lock);
 
  return 0;
 }
@@ -2184,10 +2223,10 @@ static void khugepaged_alloc_sleep(void)
 {
  DEFINE_WAIT(wait);
 
- add_wait_queue(&khugepaged_wait, &wait);
+ add_wait_queue(&khugepaged_alloc_wait, &wait);
  freezable_schedule_timeout_interruptible(
  msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
- remove_wait_queue(&khugepaged_wait, &wait);
+ remove_wait_queue(&khugepaged_alloc_wait, &wait);
 }
 
 static int khugepaged_node_load[MAX_NUMNODES];
@@ -2758,19 +2797,7 @@ breakouterloop_mmap_sem:
  return progress;
 }
 
-static int khugepaged_has_work(void)
-{
- return !list_empty(&khugepaged_scan.mm_head) &&
- khugepaged_enabled();
-}
-
-static int khugepaged_wait_event(void)
-{
- return !list_empty(&khugepaged_scan.mm_head) ||
- kthread_should_stop();
-}
-
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
  struct page *hpage = NULL;
  unsigned int progress = 0, pass_through_head = 0;
@@ -2785,7 +2812,7 @@ static void khugepaged_do_scan(void)
 
  cond_resched();
 
- if (unlikely(kthread_should_stop() || try_to_freeze()))
+ if (unlikely(!khugepaged_enabled() || try_to_freeze()))
  break;
 
  spin_lock(&khugepaged_mm_lock);
@@ -2802,43 +2829,30 @@ static void khugepaged_do_scan(void)
 
  if (!IS_ERR_OR_NULL(hpage))
  put_page(hpage);
-}
 
-static void khugepaged_wait_work(void)
-{
  if (khugepaged_has_work()) {
- if (!khugepaged_scan_sleep_millisecs)
- return;
 
- wait_event_freezable_timeout(khugepaged_wait,
-     kthread_should_stop(),
- msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
- return;
- }
+ unsigned long delay = 0;
+
+ if (khugepaged_scan_sleep_millisecs)
+ delay = msecs_to_jiffies(khugepaged_scan_sleep_millisecs);
 
- if (khugepaged_enabled())
- wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+ queue_delayed_kthread_work(khugepaged_worker,
+   &khugepaged_do_scan_work,
+   delay);
+ }
 }
 
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
 {
  struct mm_slot *mm_slot;
 
- set_freezable();
- set_user_nice(current, MAX_NICE);
-
- while (!kthread_should_stop()) {
- khugepaged_do_scan();
- khugepaged_wait_work();
- }
-
  spin_lock(&khugepaged_mm_lock);
  mm_slot = khugepaged_scan.mm_slot;
  khugepaged_scan.mm_slot = NULL;
  if (mm_slot)
  collect_mm_slot(mm_slot);
  spin_unlock(&khugepaged_mm_lock);
- return 0;
 }
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 09/20] kthread: Allow to modify delayed kthread work

Petr Mladek-2
In reply to this post by Petr Mladek-2
There are situations when we need to modify the delay of a delayed kthread
work. For example, when the work depends on an event and the initial delay
means a timeout. Then we want to queue the work immediately when the event
happens.

This patch implements mod_delayed_kthread_work() as inspired workqueues.
It cancels the timer, removes the work from any worker list and queues it
again with the given timeout.

A very special case is when the work is being canceled at the same time.
It might happen because of the regular cancel_delayed_kthread_work_sync()
or by another mod_delayed_kthread_work(). In this case, we do nothing and
let the other operation win. This should not normally happen as the caller
is supposed to synchronize these operations a reasonable way.

Signed-off-by: Petr Mladek <[hidden email]>
---
 include/linux/kthread.h |  4 ++++
 kernel/kthread.c        | 53 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 49f59b087b6b..1d5ca191562f 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -168,6 +168,10 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
  struct delayed_kthread_work *dwork,
  unsigned long delay);
 
+bool mod_delayed_kthread_work(struct kthread_worker *worker,
+      struct delayed_kthread_work *dwork,
+      unsigned long delay);
+
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 10129fdd4f3b..2cc32cad66ef 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -970,6 +970,59 @@ static bool __cancel_kthread_work(struct kthread_work *work, bool is_dwork,
  return false;
 }
 
+/**
+ * mod_delayed_kthread_work - modify delay of or queue a delayed kthread work
+ * @worker: kthread worker to use
+ * @dwork: delayed kthread work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If @dwork is idle, equivalent to queue_delayed_kthread work(). Otherwise,
+ * modify @dwork's timer so that it expires after @delay. If @delay is zero,
+ * @work is guaranteed to be queued immediately.
+ *
+ * Return: %true if @dwork was pending and its timer was modified,
+ * %false otherwise.
+ *
+ * A special case is when the work is being canceled in parallel.
+ * It might be caused either by the real cancel_delayed_kthread_work_sync()
+ * or yet another mod_delayed_kthread_work() call. We let the other command
+ * win and return %false here. The caller is supposed to synchronize these
+ * operations a reasonable way.
+ *
+ * This function is safe to call from any context including IRQ handler.
+ * See __cancel_kthread_work() and delayed_kthread_work_timer_fn()
+ * for details.
+ */
+bool mod_delayed_kthread_work(struct kthread_worker *worker,
+      struct delayed_kthread_work *dwork,
+      unsigned long delay)
+{
+ struct kthread_work *work = &dwork->work;
+ unsigned long flags;
+ int ret = false;
+
+ spin_lock_irqsave(&worker->lock, flags);
+
+ /* Do not bother with canceling when never queued. */
+ if (!work->worker)
+ goto fast_queue;
+
+ /* Work must not be used with more workers, see queue_kthread_work() */
+ WARN_ON_ONCE(work->worker != worker);
+
+ /* Do not fight with another command that is canceling this work. */
+ if (work->canceling)
+ goto out;
+
+ ret = __cancel_kthread_work(work, true, &flags);
+fast_queue:
+ __queue_delayed_kthread_work(worker, dwork, delay);
+out:
+ spin_unlock_irqrestore(&worker->lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(mod_delayed_kthread_work);
+
 static bool __cancel_kthread_work_sync(struct kthread_work *work, bool is_dwork)
 {
  struct kthread_worker *worker = work->worker;
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 08/20] kthread: Allow to cancel kthread work

Petr Mladek-2
In reply to this post by Petr Mladek-2
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.

This patch implements cancel_*_sync() operations as inspired by
workqueues. Well, we are synchronized against the other operations
via the worker lock, we use del_timer_sync() and a counter to count
parallel cancel operations. Therefore the implementation might be easier.

First, we check if a worker is assigned. If not, the work has newer
been queued after it was initialized.

Second, we take the worker lock. It must be the right one. The work must
not be assigned to another worker unless it is initialized in between.

Third, we try to cancel the timer when it exists. The timer is deleted
synchronously to make sure that the timer call back is not running.
We need to temporary release the worker->lock to avoid a possible
deadlock with the callback. In the meantime, we set work->canceling
counter to avoid any queuing.

Fourth, we try to remove the work from a worker list. It might be
the list of either normal or delayed works.

Fifth, if the work is running, we call flush_kthread_work(). It might
take an arbitrary time. We need to release the worker-lock again.
In the meantime, we again block any queuing by the canceling counter.

As already mentioned, the check for a pending kthread work is done under
a lock. In compare with workqueues, we do not need to fight for a single
PENDING bit to block other operations. Therefore do not suffer from
the thundering storm problem and all parallel canceling jobs might use
kthread_work_flush(). Any queuing is blocked until the counter is zero.

Signed-off-by: Petr Mladek <[hidden email]>
---
 include/linux/kthread.h |   5 ++
 kernel/kthread.c        | 131 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b27be55cffa3..49f59b087b6b 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -77,6 +77,8 @@ struct kthread_work {
  struct list_head node;
  kthread_work_func_t func;
  struct kthread_worker *worker;
+ /* Number of canceling calls that are running at the moment. */
+ int canceling;
 };
 
 struct delayed_kthread_work {
@@ -169,6 +171,9 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
+bool cancel_kthread_work_sync(struct kthread_work *work);
+bool cancel_delayed_kthread_work_sync(struct delayed_kthread_work *work);
+
 void destroy_kthread_worker(struct kthread_worker *worker);
 
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 7655357065e1..10129fdd4f3b 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -714,6 +714,18 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...)
 }
 EXPORT_SYMBOL(create_kthread_worker_on_cpu);
 
+/*
+ * Returns true when the work could not be queued at the moment.
+ * It happens when it is already pending in a worker list
+ * or when it is being cancelled.
+ *
+ * This function must be called under work->worker->lock.
+ */
+static inline bool queuing_blocked(const struct kthread_work *work)
+{
+ return !list_empty(&work->node) || work->canceling;
+}
+
 static void insert_kthread_work_sanity_check(struct kthread_worker *worker,
        struct kthread_work *work)
 {
@@ -755,7 +767,7 @@ bool queue_kthread_work(struct kthread_worker *worker,
  unsigned long flags;
 
  spin_lock_irqsave(&worker->lock, flags);
- if (list_empty(&work->node)) {
+ if (!queuing_blocked(work)) {
  insert_kthread_work(worker, work, &worker->work_list);
  ret = true;
  }
@@ -855,7 +867,7 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
 
  spin_lock_irqsave(&worker->lock, flags);
 
- if (list_empty(&work->node)) {
+ if (!queuing_blocked(work)) {
  __queue_delayed_kthread_work(worker, dwork, delay);
  ret = true;
  }
@@ -914,6 +926,121 @@ void flush_kthread_work(struct kthread_work *work)
 }
 EXPORT_SYMBOL_GPL(flush_kthread_work);
 
+/*
+ * This function removes the work from the worker queue. Also it makes sure
+ * that it won't get queued later via the delayed work's timer.
+ *
+ * The work might still be in use when this function finishes. See the
+ * current_work proceed by the worker.
+ *
+ * Return: %true if @work was pending and successfully canceled,
+ * %false if @work was not pending
+ */
+static bool __cancel_kthread_work(struct kthread_work *work, bool is_dwork,
+  unsigned long *flags)
+{
+ /* Try to cancel the timer if exists. */
+ if (is_dwork) {
+ struct delayed_kthread_work *dwork =
+ container_of(work, struct delayed_kthread_work, work);
+ struct kthread_worker *worker = work->worker;
+
+ /*
+ * del_timer_sync() must be called to make sure that the timer
+ * callback is not running. The lock must be temporary released
+ * to avoid a deadlock with the callback. In the meantime,
+ * any queuing is blocked by setting the canceling counter.
+ */
+ work->canceling++;
+ spin_unlock_irqrestore(&worker->lock, *flags);
+ del_timer_sync(&dwork->timer);
+ spin_lock_irqsave(&worker->lock, *flags);
+ work->canceling--;
+ }
+
+ /*
+ * Try to remove the work from a worker list. It might either
+ * be from worker->work_list or from worker->delayed_work_list.
+ */
+ if (!list_empty(&work->node)) {
+ list_del_init(&work->node);
+ return true;
+ }
+
+ return false;
+}
+
+static bool __cancel_kthread_work_sync(struct kthread_work *work, bool is_dwork)
+{
+ struct kthread_worker *worker = work->worker;
+ unsigned long flags;
+ int ret = false;
+
+ if (!worker)
+ goto out;
+
+ spin_lock_irqsave(&worker->lock, flags);
+ /* Work must not be used with more workers, see queue_kthread_work(). */
+ WARN_ON_ONCE(work->worker != worker);
+
+ ret = __cancel_kthread_work(work, is_dwork, &flags);
+
+ if (worker->current_work != work)
+ goto out_fast;
+
+ /*
+ * The work is in progress and we need to wait with the lock released.
+ * In the meantime, block any queuing by setting the canceling counter.
+ */
+ work->canceling++;
+ spin_unlock_irqrestore(&worker->lock, flags);
+ flush_kthread_work(work);
+ spin_lock_irqsave(&worker->lock, flags);
+ work->canceling--;
+
+out_fast:
+ spin_unlock_irqrestore(&worker->lock, flags);
+out:
+ return ret;
+}
+
+/**
+ * cancel_kthread_work_sync - cancel a kthread work and wait for it to finish
+ * @work: the kthread work to cancel
+ *
+ * Cancel @work and wait for its execution to finish.  This function
+ * can be used even if the work re-queues itself. On return from this
+ * function, @work is guaranteed to be not pending or executing on any CPU.
+ *
+ * cancel_kthread_work_sync(&delayed_work->work) must not be used for
+ * delayed_work's. Use cancel_delayed_kthread_work_sync() instead.
+ *
+ * The caller must ensure that the worker on which @work was last
+ * queued can't be destroyed before this function returns.
+ *
+ * Return: %true if @work was pending, %false otherwise.
+ */
+bool cancel_kthread_work_sync(struct kthread_work *work)
+{
+ return __cancel_kthread_work_sync(work, false);
+}
+EXPORT_SYMBOL_GPL(cancel_kthread_work_sync);
+
+/**
+ * cancel_delayed_kthread_work_sync - cancel a delayed kthread work and
+ * wait for it to finish.
+ * @dwork: the delayed kthread work to cancel
+ *
+ * This is cancel_kthread_work_sync() for delayed works.
+ *
+ * Return: %true if @dwork was pending, %false otherwise.
+ */
+bool cancel_delayed_kthread_work_sync(struct delayed_kthread_work *dwork)
+{
+ return __cancel_kthread_work_sync(&dwork->work, true);
+}
+EXPORT_SYMBOL_GPL(cancel_delayed_kthread_work_sync);
+
 /**
  * flush_kthread_worker - flush all current works on a kthread_worker
  * @worker: worker to flush
--
1.8.5.6

Reply | Threaded
Open this post in threaded view
|

[PATCH v6 06/20] kthread: Detect when a kthread work is used by more workers

Petr Mladek-2
In reply to this post by Petr Mladek-2
Nothing currently prevents a work from queuing for a kthread worker
when it is already running on another one. This means that the work
might run in parallel on more workers. Also some operations, e.g.
flush or drain are not reliable.

This problem will be even more visible after we add cancel_kthread_work()
function. It will only have "work" as the parameter and will use
worker->lock to synchronize with others.

Well, normally this is not a problem because the API users are sane.
But bugs might happen and users also might be crazy.

This patch adds a warning when we try to insert the work for another
worker. It does not fully prevent the misuse because it would make the
code much more complicated without a big benefit.

It adds the same warning also into flush_kthread_work() instead of
the repeated attempts to get the right lock.

A side effect is that one needs to explicitly reinitialize the work
if it must be queued into another worker. This is needed, for example,
when the worker is stopped and started again. It is a bit inconvenient.
But it looks like a good compromise between the stability and complexity.

I have double checked all existing users of the kthread worker API
and they all seems to initialize the work after the worker gets
started.

Just for completeness, the patch adds a check that the work is not
already in a queue.

The patch also puts all the checks into a separate function. It will
be reused when implementing delayed works.

Signed-off-by: Petr Mladek <[hidden email]>
---
 kernel/kthread.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 441651765f08..385a7d6b4872 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -574,6 +574,9 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
  * The works are not allowed to keep any locks, disable preemption or interrupts
  * when they finish. There is defined a safe point for freezing when one work
  * finishes and before a new one is started.
+ *
+ * Also the works must not be handled by more workers at the same time, see also
+ * queue_kthread_work().
  */
 int kthread_worker_fn(void *worker_ptr)
 {
@@ -710,12 +713,21 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[], ...)
 }
 EXPORT_SYMBOL(create_kthread_worker_on_cpu);
 
+static void insert_kthread_work_sanity_check(struct kthread_worker *worker,
+       struct kthread_work *work)
+{
+ lockdep_assert_held(&worker->lock);
+ WARN_ON_ONCE(!list_empty(&work->node));
+ /* Do not use a work with more workers, see queue_kthread_work() */
+ WARN_ON_ONCE(work->worker && work->worker != worker);
+}
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
-       struct kthread_work *work,
-       struct list_head *pos)
+ struct kthread_work *work,
+ struct list_head *pos)
 {
- lockdep_assert_held(&worker->lock);
+ insert_kthread_work_sanity_check(worker, work);
 
  list_add_tail(&work->node, pos);
  work->worker = worker;
@@ -731,6 +743,9 @@ static void insert_kthread_work(struct kthread_worker *worker,
  * Queue @work to work processor @task for async execution.  @task
  * must have been created with kthread_worker_create().  Returns %true
  * if @work was successfully queued, %false if it was already pending.
+ *
+ * Reinitialize the work if it needs to be used by another worker.
+ * For example, when the worker was stopped and started again.
  */
 bool queue_kthread_work(struct kthread_worker *worker,
  struct kthread_work *work)
@@ -775,16 +790,13 @@ void flush_kthread_work(struct kthread_work *work)
  struct kthread_worker *worker;
  bool noop = false;
 
-retry:
  worker = work->worker;
  if (!worker)
  return;
 
  spin_lock_irq(&worker->lock);
- if (work->worker != worker) {
- spin_unlock_irq(&worker->lock);
- goto retry;
- }
+ /* Work must not be used with more workers, see queue_kthread_work(). */
+ WARN_ON_ONCE(work->worker != worker);
 
  if (!list_empty(&work->node))
  insert_kthread_work(worker, &fwork.work, work->node.next);
--
1.8.5.6

12