Android Developers Blog: Under the hood: Android 17’s lock-free MessageQueue



Posted by Shai Barack, Android Platform Performance Lead and Charles Munger, Principal Software Engineer


In Android 17, apps targeting SDK 37 or higher will receive a new, lock-free implementation of MessageQueue. The new implementation improves performance and reduces missed frames, but may break clients that reflect on MessageQueue private fields and methods. To learn more about the behavior change and how you can mitigate impact, check out the MessageQueue behavior change documentation. This technical blog post provides an overview of the MessageQueue rearchitecture and how you can analyze lock contention issues using Perfetto.

The Looper drives the UI thread of every Android application. It pulls work from a MessageQueue, dispatches it to a Handler, and repeats. For two decades, MessageQueue used a single monitor lock (i.e. a synchronized code block) to protect its state.

Android 17 introduces a significant update to this component: a lock-free implementation named DeliQueue.

This post explains how locks affect UI performance, how to analyze these issues with Perfetto, and the specific algorithms and optimizations used to improve the Android main thread.

The problem: Lock Contention and Priority Inversion

The legacy MessageQueue functioned as a priority queue protected by a single lock. If a background thread posts a message while the main thread performs queue maintenance, the background thread blocks the main thread.
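To make the shape of the problem concrete, here is a minimal sketch of a time-ordered queue guarded by a single monitor lock. It is not the actual MessageQueue source: the names LockedQueueSketch, Msg, enqueue, and poll are hypothetical, and the sketch only illustrates that every producer and the consumer must take the same lock.

```java
/** Hypothetical sketch of a monitor-locked, time-ordered queue (not the real MessageQueue). */
final class LockedQueueSketch {
    static final class Msg {
        final long when;      // delivery time used for ordering
        final String label;
        Msg next;             // singly-linked list, guarded by the queue's monitor
        Msg(long when, String label) { this.when = when; this.label = label; }
    }

    private Msg head;         // earliest message first

    /** Producers on any thread insert in sorted order while holding the lock: O(N). */
    synchronized void enqueue(Msg m) {
        if (head == null || m.when < head.when) {
            m.next = head;
            head = m;
            return;
        }
        Msg prev = head;
        // Walk past every message due at or before the new one (keeps FIFO order for ties).
        while (prev.next != null && prev.next.when <= m.when) {
            prev = prev.next;
        }
        m.next = prev.next;
        prev.next = m;
    }

    /** The consumer takes the same lock to unlink the head: O(1), but it can block. */
    synchronized Msg poll() {
        Msg m = head;
        if (m != null) {
            head = m.next;
            m.next = null;
        }
        return m;
    }

    public static void main(String[] args) {
        LockedQueueSketch q = new LockedQueueSketch();
        q.enqueue(new Msg(30, "c"));
        q.enqueue(new Msg(10, "a"));
        q.enqueue(new Msg(20, "b"));
        for (Msg m = q.poll(); m != null; m = q.poll()) {
            System.out.println(m.when + " " + m.label);
        }
    }
}
```

If a producer is preempted while inside enqueue, a consumer calling poll has no choice but to wait, which is the situation the rest of this section describes.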

When two or more threads compete for exclusive use of the same lock, this is called lock contention. This contention can cause priority inversion, leading to UI jank and other performance problems.

Priority inversion can happen when a high-priority thread (like the UI thread) is made to wait for a low-priority thread. Consider this sequence:

  1. A low-priority background thread acquires the MessageQueue lock to post the result of work that it did.

  2. A medium-priority thread becomes runnable and the kernel’s scheduler allocates it CPU time, preempting the low-priority thread.

  3. The high-priority UI thread finishes its current task and attempts to read from the queue, but is blocked because the low-priority thread holds the lock.

The low-priority thread blocks the UI thread, and the medium-priority work delays it further.

A visual representation of priority inversion. It shows 'Task L' (Low) holding a lock, blocking 'Task H' (High). 'Task M' (Medium) then preempts 'Task L', effectively delaying 'Task H' for the duration of 'Task M's' execution.

Analyzing contention with Perfetto

You can diagnose these issues using Perfetto. In a standard trace, a thread blocked on a monitor lock enters the sleeping state, and Perfetto shows a slice indicating the lock owner.

When you query trace data, look for slices named “monitor contention with …” followed by the name of the thread that owns the lock and the code site where the lock was acquired.

Case study: Launcher jank

As an example, let’s analyze a trace where a user experienced jank while navigating home on a Pixel phone immediately after taking a photo in the camera app. Below we see a screenshot of Perfetto showing the events leading up to the missed frame:


A Perfetto trace screenshot diagnosing the Launcher jank. The 'Actual Timeline' shows a red missed frame. Coinciding with this, the main thread track contains a large green slice labeled 'monitor contention with owner BackgroundExecutor,' indicating that the UI thread was blocked because a background thread held the MessageQueue lock.

  • Symptom: The Launcher main thread missed its frame deadline. It blocked for 18ms, which exceeds the 16ms deadline required for 60Hz rendering.

  • Diagnosis: Perfetto showed the main thread blocked on the MessageQueue lock. A “BackgroundExecutor” thread owned the lock.

  • Root Cause: The BackgroundExecutor runs at Process.THREAD_PRIORITY_BACKGROUND (very low priority). It performed a non-urgent task (checking app usage limits). Simultaneously, medium-priority threads were using CPU time to process data from the camera. The OS scheduler preempted the BackgroundExecutor thread to run the camera threads.

This sequence caused the Launcher’s UI thread (high priority) to become indirectly blocked by the camera worker thread (medium priority), which was keeping the Launcher’s background thread (low priority) from releasing the lock.

Querying traces with PerfettoSQL

You can use PerfettoSQL to query trace data for specific patterns. This is useful when you have a large bank of traces from user devices or tests, and you’re searching for specific traces that demonstrate a problem.

For example, this query finds MessageQueue contention coincident with dropped frames (jank):

INCLUDE PERFETTO MODULE android.monitor_contention;
INCLUDE PERFETTO MODULE android.frames.jank_type;

SELECT
  process_name,
  -- Convert duration from nanoseconds to milliseconds
  SUM(dur) / 1000000 AS sum_dur_ms,
  COUNT(*) AS count_contention
FROM android_monitor_contention
WHERE is_blocked_thread_main
AND short_blocked_method LIKE "%MessageQueue%" 

-- Only look at app processes that had jank
AND upid IN (
  SELECT DISTINCT(upid)
  FROM actual_frame_timeline_slice
  WHERE android_is_app_jank_type(jank_type) = TRUE
)
GROUP BY process_name
ORDER BY SUM(dur) DESC;

In this more complex example, join trace data that spans multiple tables to identify MessageQueue contention during app startup:

INCLUDE PERFETTO MODULE android.monitor_contention; 
INCLUDE PERFETTO MODULE android.startup.startups; 

-- Join package and process information for startups
DROP VIEW IF EXISTS startups; 
CREATE VIEW startups AS 
SELECT startup_id, ts, dur, upid 
FROM android_startups 
JOIN android_startup_processes USING(startup_id); 

-- Intersect monitor contention with startups in the same process.
DROP TABLE IF EXISTS monitor_contention_during_startup; 
CREATE VIRTUAL TABLE monitor_contention_during_startup 
USING SPAN_JOIN(android_monitor_contention PARTITIONED upid, startups PARTITIONED upid); 

SELECT 
  process_name, 
  SUM(dur) / 1000000 AS sum_dur_ms, 
  COUNT(*) AS count_contention 
FROM monitor_contention_during_startup 
WHERE is_blocked_thread_main 
AND short_blocked_method LIKE "%MessageQueue%" 
GROUP BY process_name 
ORDER BY SUM(dur) DESC;

You can use your favorite LLM to write PerfettoSQL queries to find other patterns.

At Google, we use BigTrace to run PerfettoSQL queries across millions of traces. In doing so, we confirmed that what we saw anecdotally was, in fact, a systemic issue. The data revealed that MessageQueue lock contention impacts users across the entire ecosystem, substantiating the need for a fundamental architectural change.

Solution: lock-free concurrency

We addressed the MessageQueue contention problem by implementing a lock-free data structure, using atomic memory operations rather than exclusive locks to synchronize access to shared state. A data structure or algorithm is lock-free if at least one thread can always make progress regardless of the scheduling behavior of the other threads. This property is generally hard to achieve, and is usually not worth pursuing for most code.

The atomic primitives

Lock-free software often relies on atomic Read-Modify-Write primitives that the hardware provides.

On older-generation ARM64 CPUs, atomics used a Load-Link/Store-Conditional (LL/SC) loop. The CPU loads a value and marks the address. If another thread writes to that address, the store fails, and the loop retries. Because the threads can keep trying and succeed without waiting for another thread, this operation is lock-free.

// ARM64 LL/SC loop example
retry:
    ldxr    x0, [x1]        // Load exclusive from address x1 to x0
    add     x0, x0, #1      // Increment value by 1
    stxr    w2, x0, [x1]    // Store exclusive.
                            // w2 gets 0 on success, 1 on failure
    cbnz    w2, retry       // If w2 is non-zero (failed), branch to retry

(view in Compiler Explorer)


Newer ARM architectures (ARMv8.1) support Large System Extensions (LSE), which include instructions in the form of Compare-And-Swap (CAS) or Load-And-Add (demonstrated below). In Android 17 we added support to the Android Runtime (ART) compiler to detect when LSE is supported and emit optimized instructions:

// ARMv8.1 LSE atomic example
ldadd   x0, x1, [x2]    // Atomic load-add.
                        // Faster, no loop required.

In our benchmarks, high-contention code that uses CAS achieves a ~3x speedup over the LL/SC variant.

The Java programming language offers atomic primitives via java.util.concurrent.atomic that rely on these and other specialized CPU instructions.
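As a small illustration of these primitives in Java, the sketch below increments a shared counter from several threads in two ways: with AtomicLong.getAndIncrement (an atomic add) and with an explicit compareAndSet retry loop. The class name, method names, and thread counts are arbitrary choices for this example and are not taken from the MessageQueue code.

```java
import java.util.concurrent.atomic.AtomicLong;

final class AtomicCounterSketch {
    /** Atomic add: the runtime may lower this to a single instruction where the CPU has one. */
    static long addWithGetAndIncrement(AtomicLong counter) {
        return counter.getAndIncrement();
    }

    /** Explicit CAS loop: read, compute, and retry if another thread won the race. */
    static long addWithCasLoop(AtomicLong counter) {
        long seen;
        do {
            seen = counter.get();
        } while (!counter.compareAndSet(seen, seen + 1));
        return seen;
    }

    /** Runs both variants from several threads and returns the final counter value. */
    static long runDemo(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final boolean useCasLoop = (t % 2 == 0);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    if (useCasLoop) {
                        addWithCasLoop(counter);
                    } else {
                        addWithGetAndIncrement(counter);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // No increments are lost, even though no thread ever takes a lock.
        System.out.println(runDemo(4, 100_000));
    }
}
```

Neither variant blocks: a thread that loses a compareAndSet race simply retries, so some thread always makes progress.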

The Data Structure: DeliQueue

To remove lock contention from MessageQueue, our engineers designed a novel data structure called DeliQueue. DeliQueue separates Message insertion from Message processing:

  1. The list of Messages (Treiber stack): A lock-free stack. Any thread can push new Messages here without contention.

  2. The priority queue (Min-heap): A heap of Messages to handle, exclusively owned by the Looper thread (hence no synchronization or locks are needed to access it).

Enqueue: pushing to a Treiber stack

The list of Messages is stored in a Treiber stack [1], a lock-free stack that uses a CAS loop to update the head pointer.

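import java.util.concurrent.atomic.AtomicReference;

public class TreiberStack <E> {
    AtomicReference<Node<E>> top =
            new AtomicReference<Node<E>>();

    // Push: link the new node to the current head, then publish it with CAS.
    public void push(E item) {
        Node<E> newHead = new Node<E>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
        } while (!top.compareAndSet(oldHead, newHead));
    }

    // Pop: swing the head to its successor with CAS; retry if another thread raced us.
    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) return null;
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }

    private static class Node <E> {
        public final E item;
        public Node<E> next;

        public Node(E item) {
            this.item = item;
        }
    }
}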

Source code based on Java Concurrency in Practice [2], available online and released to the public domain

Any producer can push new Messages to the stack at any time. This is like pulling a ticket at a deli counter: your number is determined by when you showed up, but the order you get your food in doesn’t have to match. Because it’s a linked stack, every Message is a sub-stack: you can see what the Message queue was like at any point in time by tracking the head and iterating forwards. You won’t see any new Messages pushed on top, even if they’re being added during your traversal.
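The sub-stack property can be sketched with a minimal Treiber-style stack. The class below is a hypothetical illustration (SnapshotStackSketch, head, and snapshotFrom are names invented here, not DeliQueue’s): it captures the head once, pushes another item, and shows that a traversal from the captured head does not observe the later push.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class SnapshotStackSketch<E> {
    static final class Node<E> {
        final E item;
        final Node<E> next;   // set once, before the node is published with CAS
        Node(E item, Node<E> next) { this.item = item; this.next = next; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E item) {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            newHead = new Node<>(item, oldHead);
        } while (!top.compareAndSet(oldHead, newHead));
    }

    /** One atomic read yields a stable view of everything pushed so far. */
    Node<E> head() {
        return top.get();
    }

    /** Walks from a captured head toward older items, newest first. */
    static <E> List<E> snapshotFrom(Node<E> head) {
        List<E> items = new ArrayList<>();
        for (Node<E> n = head; n != null; n = n.next) {
            items.add(n.item);
        }
        return items;
    }

    public static void main(String[] args) {
        SnapshotStackSketch<String> stack = new SnapshotStackSketch<>();
        stack.push("A");
        stack.push("B");
        Node<String> captured = stack.head();
        stack.push("C");                                  // pushed after the snapshot
        System.out.println(snapshotFrom(captured));       // [B, A]
        System.out.println(snapshotFrom(stack.head()));   // [C, B, A]
    }
}
```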

Dequeue: bulk transfer to a min-heap

To find the next Message to handle, the Looper processes new Messages from the Treiber stack by walking the stack starting from the top and iterating until it finds the last Message that it previously processed. As the Looper traverses down the stack, it inserts Messages into the deadline-ordered min-heap. Since the Looper exclusively owns the heap, it orders and processes Messages without locks or atomics.

A system diagram illustrating the DeliQueue architecture. Concurrent producer threads (left) push messages onto a shared 'Lock-Free Treiber Stack' using atomic CAS operations. The single consumer 'Looper Thread' (right) claims these messages via an atomic swap, merges them into a private 'Local Min-heap' sorted by timestamp, and then executes them.

In walking down the stack, the Looper also creates links from stacked Messages back to their predecessors, thus forming a doubly-linked list. Creating the linked list is safe because links pointing down the stack are added via the Treiber stack algorithm with CAS, and links up the stack are only ever read and modified by the Looper thread. These back-links are then used to remove Messages from arbitrary points in the stack in O(1) time.

This design provides O(1) insertion for producers (threads posting work to the queue) and amortized O(log N) processing for the consumer (the Looper).
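The two-stage flow described above can be sketched as follows. This is a simplified, hypothetical model rather than the DeliQueue source: the names TwoStageQueueSketch, Msg, and lastSeen are invented here, java.util.PriorityQueue stands in for the private min-heap, and the back-links and stack unlinking are omitted, so handled messages are never removed from the stack.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical two-stage queue: lock-free stack for producers, private heap for one consumer. */
final class TwoStageQueueSketch {
    static final class Msg {
        final long when;        // deadline
        final long insertSeq;   // tie-breaker: earlier insertion first
        final String label;
        Msg next;               // link toward older messages, written before publication
        Msg(long when, long insertSeq, String label) {
            this.when = when;
            this.insertSeq = insertSeq;
            this.label = label;
        }
    }

    private final AtomicReference<Msg> top = new AtomicReference<>();
    private final AtomicLong seq = new AtomicLong();

    // Touched only by the consumer thread, so it needs no synchronization.
    private final PriorityQueue<Msg> heap = new PriorityQueue<>(
            (a, b) -> a.when != b.when
                    ? Long.compare(a.when, b.when)
                    : Long.compare(a.insertSeq, b.insertSeq));
    private Msg lastSeen;       // newest stack node already copied into the heap

    /** Any thread: O(1) push with a CAS loop. */
    void enqueue(long when, String label) {
        Msg m = new Msg(when, seq.getAndIncrement(), label);
        Msg old;
        do {
            old = top.get();
            m.next = old;
        } while (!top.compareAndSet(old, m));
    }

    /** Consumer thread only: copy newly pushed messages into the heap, then take the earliest. */
    Msg poll() {
        Msg newest = top.get();
        for (Msg m = newest; m != null && m != lastSeen; m = m.next) {
            heap.add(m);        // O(log N) per message, paid by the consumer
        }
        lastSeen = newest;
        return heap.poll();
    }

    public static void main(String[] args) {
        TwoStageQueueSketch q = new TwoStageQueueSketch();
        q.enqueue(30, "c");
        q.enqueue(10, "a");
        q.enqueue(20, "b");
        for (Msg m = q.poll(); m != null; m = q.poll()) {
            System.out.println(m.when + " " + m.label);
        }
    }
}
```

Producers only ever touch the atomic top reference, while ordering work is deferred to the single thread that owns the heap.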

Using a min-heap to order Messages also addresses a fundamental flaw in the legacy MessageQueue, where Messages were stored in a singly-linked list (rooted at the top). In the legacy implementation, removal from the head was O(1), but insertion had a worst case of O(N), scaling poorly for overloaded queues! Conversely, insertion to and removal from the min-heap scale logarithmically, delivering competitive average performance but really excelling in tail latencies.


                    Legacy (locked) MessageQueue    DeliQueue
Insert              O(N)                            O(1) for calling thread, O(log N) for Looper thread
Remove from head    O(1)                            O(log N)

In the legacy queue implementation, producers and the consumer used a lock to coordinate exclusive access to the underlying singly-linked list. In DeliQueue, the Treiber stack handles concurrent access, and the single consumer handles ordering its work queue.

Removal: consistency via tombstones

DeliQueue is a hybrid data structure, joining a lock-free Treiber stack with a single-threaded min-heap. Keeping these two structures in sync without a global lock presents a unique challenge: a message might be physically present in the stack but logically removed from the queue.

To solve this, DeliQueue uses a technique called “tombstoning.” Each Message tracks its position in the stack via the backwards and forwards pointers, its index in the heap’s array, and a boolean flag indicating whether it has been removed. When a Message is ready to run, the Looper thread will CAS its removed flag, then remove it from the heap and stack.

When another thread needs to remove a Message, it doesn’t immediately extract it from the data structure. Instead, it performs the following steps:

  1. Logical removal: The thread uses a CAS to atomically set the Message’s removal flag from false to true. The Message remains in the data structure as evidence of its pending removal, a so-called “tombstone”. Once a Message is flagged for removal, DeliQueue treats it as if it no longer exists in the queue whenever it’s found.

  2. Deferred cleanup: The actual removal from the data structure is the responsibility of the Looper thread, and is deferred until later. Rather than modifying the stack or heap, the remover thread adds the Message to another lock-free freelist stack.

  3. Structural removal: Only the Looper can interact with the heap or remove elements from the stack. When it wakes up, it clears the freelist and processes the Messages it contained. Each Message is then unlinked from the stack and removed from the heap.

This approach keeps all management of the heap single-threaded. It minimizes the number of concurrent operations and memory barriers required, making the critical path faster and simpler.
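The three removal steps can be sketched in a few lines. This is a hypothetical model, not the DeliQueue implementation: TombstoneSketch, Msg, nextFree, and addOnOwnerThread are names invented for illustration, java.util.PriorityQueue stands in for the Looper’s heap, and the producer-side stack is left out so the sketch can focus on the flag and the freelist.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical tombstoning sketch: any thread flags, only the owner thread unlinks. */
final class TombstoneSketch {
    static final class Msg {
        final long when;
        final String label;
        final AtomicBoolean removed = new AtomicBoolean(false);
        Msg nextFree;   // link used only while the message sits on the freelist
        Msg(long when, String label) { this.when = when; this.label = label; }
    }

    // Owned by the consumer ("Looper") thread; never touched by other threads.
    private final PriorityQueue<Msg> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.when, b.when));
    // Lock-free stack of tombstoned messages awaiting structural removal.
    private final AtomicReference<Msg> freelist = new AtomicReference<>();

    /** Owner thread only (the lock-free producer path is omitted in this sketch). */
    void addOnOwnerThread(Msg m) {
        heap.add(m);
    }

    /** Any thread. Steps 1 and 2: flag the message, then hand it to the owner for cleanup. */
    boolean remove(Msg m) {
        if (!m.removed.compareAndSet(false, true)) {
            return false;   // already removed, or already claimed for delivery
        }
        Msg old;
        do {
            old = freelist.get();
            m.nextFree = old;
        } while (!freelist.compareAndSet(old, m));
        return true;
    }

    /** Owner thread only. Step 3: drain the freelist, then return the next live message. */
    Msg poll() {
        for (Msg m = freelist.getAndSet(null); m != null; m = m.nextFree) {
            // Linear search here; the article tracks each Message's heap index to avoid it.
            heap.remove(m);
        }
        Msg m;
        while ((m = heap.poll()) != null) {
            // Claim the message for delivery; losing this CAS means it was tombstoned.
            if (m.removed.compareAndSet(false, true)) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TombstoneSketch q = new TombstoneSketch();
        Msg a = new Msg(10, "a");
        Msg b = new Msg(20, "b");
        q.addOnOwnerThread(a);
        q.addOnOwnerThread(b);
        q.remove(a);                          // logical removal from "another thread"
        System.out.println(q.poll().label);   // b
    }
}
```

Because the same flag is used both to tombstone a message and to claim it for delivery, a message is either removed or delivered, never both.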

Traversal: benign Java memory model data races

Most concurrency APIs, such as Future in the Java standard library, or Kotlin’s Job and Deferred, include a mechanism to cancel work before it completes. An instance of one of these classes matches 1:1 with a unit of underlying work, and calling cancel on an object cancels the specific operations associated with it.

Today’s Android devices have multi-core CPUs and concurrent, generational garbage collection. But when Android was first developed, it was too expensive to allocate one object for each unit of work. Consequently, Android’s Handler supports cancellation via numerous overloads of removeMessages: rather than removing a specific Message, it removes all Messages that match the specified criteria. In practice, this requires iterating through all Messages inserted before removeMessages was called and removing the ones that match.

When iterating forward, a thread only requires one ordered atomic operation, to read the current head of the stack. After that, ordinary field reads are used to find the next Message. If the Looper thread modifies the next fields while removing Messages, the Looper’s write and another thread’s read are unsynchronized: this is a data race. Normally, a data race is a serious bug that can cause huge problems in your app, such as leaks, infinite loops, crashes, and freezes. However, under certain narrow conditions, data races can be benign within the Java Memory Model. Suppose we start with a stack of:

A diagram showing the initial state of the message stack as a linked list. The 'Head' points to 'Message A', which links sequentially to 'Message B', 'Message C', and 'Message D'.


We perform an atomic read of the head, and see A. A’s next pointer points to B. At the same time as we process B, the Looper might remove B and C, by updating A to point to C and then D.

A diagram illustrating a benign data race during list traversal. The Looper thread has updated 'Message A' to point directly to 'Message D', effectively removing 'Message B' and 'Message C'. Simultaneously, a concurrent thread reads a stale 'next' pointer from A, traverses through the logically removed messages B and C, and eventually rejoins the live list at 'Message D'.

Even though B and C are logically removed, B retains its next pointer to C, and C to D. The reading thread continues traversing through the detached removed nodes and eventually rejoins the live stack at D.

By designing DeliQueue to handle races between traversal and removal, we allow for safe, lock-free iteration.
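The interleaving can be replayed deterministically on a single thread to see why the stale traversal still ends up in the right place. The sketch below is only a replay, not a concurrent test, and its names (StaleTraversalSketch, Node, replay) and the volatile boolean tombstone are simplifications invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded replay of the interleaving described above; names are hypothetical. */
final class StaleTraversalSketch {
    static final class Node {
        final String label;
        Node next;                 // plain field: readers follow it without synchronization
        volatile boolean removed;  // tombstone flag (simplified to a volatile boolean)
        Node(String label, Node next) { this.label = label; this.next = next; }
    }

    /** Replays: the reader reaches B, the owner unlinks B and C, the reader keeps walking. */
    static List<String> replay() {
        Node d = new Node("D", null);
        Node c = new Node("C", d);
        Node b = new Node("B", c);
        Node a = new Node("A", b);

        List<String> visitedLive = new ArrayList<>();
        Node cursor = a;                   // reader: the one ordered read of the head
        visitedLive.add(cursor.label);
        cursor = cursor.next;              // reader: now holds a reference to B

        // Owner thread, "concurrently": tombstone B and C, then unlink them from A.
        b.removed = true;
        c.removed = true;
        a.next = c;                        // A -> C (B unlinked)
        a.next = d;                        // A -> D (C unlinked)

        // The reader continues from its stale position. B and C keep their next
        // pointers, so the walk passes through them and rejoins the live list at D.
        for (; cursor != null; cursor = cursor.next) {
            if (!cursor.removed) {
                visitedLive.add(cursor.label);
            }
        }
        return visitedLive;
    }

    public static void main(String[] args) {
        System.out.println(replay());      // [A, D]
    }
}
```

The key property is that unlinking a node never clears that node’s own next pointer, so a reader that is already standing on a removed node can always find its way back.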

Quitting: Native refcount

Looper is backed by a native allocation that must be manually freed once the Looper has quit. If some other thread is adding Messages while the Looper is quitting, it could use the native allocation after it’s freed, a memory safety violation. We prevent this using a tagged refcount, where one bit of the atomic is used to indicate whether the Looper is quitting.

Before using the native allocation, a thread reads the refcount atomic. If the quitting bit is set, it returns that the Looper is quitting and the native allocation must not be used. If not, it attempts a CAS to increment the number of active threads using the native allocation. After doing what it needs to, it decrements the count. If the quitting bit was set after its increment but before the decrement, and the count is now zero, then it wakes up the Looper thread.

When the Looper thread is ready to quit, it uses CAS to set the quitting bit in the atomic. If the refcount was 0, it can proceed to free its native allocation. Otherwise, it parks itself, knowing that it will be woken up when the last user of the native allocation decrements the refcount. This approach does mean that the Looper thread waits for the progress of other threads, but only when it’s quitting. That only happens once and is not performance sensitive, and it keeps the other code for using the native allocation fully lock-free.
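A minimal sketch of such a tagged refcount, assuming a 64-bit atomic with the top bit as the quitting flag, might look like the following. The class and method names (TaggedRefCountSketch, tryAcquire, release, quitAndAwaitUsers) are hypothetical, and LockSupport park/unpark stands in for whatever wake-up mechanism the real implementation uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical tagged refcount: bit 63 = quitting, bits 0-62 = active users. */
final class TaggedRefCountSketch {
    private static final long QUITTING = 1L << 63;
    private final AtomicLong state = new AtomicLong();
    private volatile Thread owner;   // thread to wake once the last user leaves

    /** Any thread: returns false if the owner is quitting and the resource must not be used. */
    boolean tryAcquire() {
        while (true) {
            long s = state.get();
            if ((s & QUITTING) != 0) {
                return false;
            }
            if (state.compareAndSet(s, s + 1)) {
                return true;
            }
        }
    }

    /** Any thread, after a successful tryAcquire. Wakes the owner if this was the last user. */
    void release() {
        long s = state.decrementAndGet();
        if (s == QUITTING) {             // quitting bit set and the count is now zero
            LockSupport.unpark(owner);
        }
    }

    /** Owner thread: set the quitting bit, then wait until no users remain. */
    void quitAndAwaitUsers() {
        owner = Thread.currentThread();
        long s;
        do {
            s = state.get();
        } while (!state.compareAndSet(s, s | QUITTING));
        while (state.get() != QUITTING) {
            LockSupport.park(this);      // may wake spuriously, so re-check in a loop
        }
        // From here on, no thread holds a reference and none can acquire one.
    }

    long activeUsers() {
        return state.get() & ~QUITTING;
    }

    boolean isQuitting() {
        return (state.get() & QUITTING) != 0;
    }

    public static void main(String[] args) {
        TaggedRefCountSketch rc = new TaggedRefCountSketch();
        rc.tryAcquire();                          // a worker starts using the resource
        new Thread(rc::release).start();          // ...and finishes on another thread
        rc.quitAndAwaitUsers();                   // returns once the count reaches zero
        System.out.println(rc.tryAcquire());      // false
    }
}
```

Only the quitting path ever waits; tryAcquire and release remain simple atomic operations.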

A state diagram illustrating the tagged refcount mechanism for safe termination. It defines three states based on the atomic value's layout (Bit 63 for teardown, Bits 0-62 for refcount):  Active (Green): The teardown bit is 0. Workers successfully increment and decrement the reference count.  Draining (Yellow): The Looper has set the teardown bit to 1. New worker increments fail, but existing workers continue to decrement.  Terminated (Red): Occurs when the reference count reaches 0 while draining. The Looper is signaled that it is safe to destroy the native allocation.

There are a lot of other tricks and complexity in the implementation. You can learn more about DeliQueue by reviewing the source code.

Optimization: branchless programming

While developing and testing DeliQueue, the team ran many benchmarks and carefully profiled the new code. One issue identified using the simpleperf tool was pipeline flushes caused by the Message comparator code.

A standard comparator uses conditional jumps, with the condition for deciding which Message comes first simplified below:

static int compareMessages(@NonNull Message m1, @NonNull Message m2) {
    if (m1 == m2) {
        return 0;
    }

    // Primary queue order is by when.
    // Messages with an earlier when should come first in the queue.
    final long whenDiff = m1.when - m2.when;
    if (whenDiff > 0) return 1;
    if (whenDiff < 0) return -1;

    // Secondary queue order is by insert sequence.
    // If two messages were inserted with the same `when`, the one inserted
    // first should come first in the queue.
    final long insertSeqDiff = m1.insertSeq - m2.insertSeq;
    if (insertSeqDiff > 0) return 1;
    if (insertSeqDiff < 0) return -1;

    return 0;
}

This code compiles to conditional jumps (b.le and cbnz instructions). When the CPU encounters a conditional branch, it can’t know whether the branch is taken until the condition is computed, so it doesn’t know which instruction to read next, and has to guess, using a technique called branch prediction. In a case like binary search, the branch direction will be unpredictably different at each step, so it’s likely that half the predictions will be wrong. Branch prediction is often ineffective in searching and sorting algorithms (such as the one used in a min-heap), because the cost of guessing wrong is larger than the improvement from guessing correctly. When the branch predictor guesses wrong, it must throw away the work it did after assuming the predicted value, and start again from the path that was actually taken. This is called a pipeline flush.

To find this issue, we profiled our benchmarks using the branch-misses performance counter, which records stack traces where the branch predictor guesses wrong. We then visualized the results with Google pprof, as shown below:

A screenshot from the pprof web UI showing branch misses in MessageQueue code to compare Message instances while performing heap operations.

Recall that the original MessageQueue code used a singly-linked list for the ordered queue. Insertion would traverse the list in sorted order as a linear search, stopping at the first element that’s past the point of insertion and linking the new Message ahead of it. Removal from the head simply required unlinking the head. DeliQueue, by contrast, uses a min-heap, where mutations require reordering some elements (sifting up or down) with logarithmic complexity in a balanced data structure, and where any comparison has an even chance of directing the traversal to a left child or to a right child. The new algorithm is asymptotically faster, but exposes a new bottleneck as the search code stalls on branch misses half the time.

Realizing that branch misses were slowing down our heap code, we optimized the code using branch-free programming:

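// Branchless Logic
static int compareMessages(@NonNull Message m1, @NonNull Message m2) {
    final long when1 = m1.when;
    final long when2 = m2.when;
    final long insertSeq1 = m1.insertSeq;
    final long insertSeq2 = m2.insertSeq;

    // Long.signum returns the sign (-1, 0, 1) of its argument and is
    // implemented as pure arithmetic: ((num >> 63) | (-num >>> 63))
    final int whenSign = Long.signum(when1 - when2);
    final int insertSeqSign = Long.signum(insertSeq1 - insertSeq2);

    // whenSign takes precedence over insertSeqSign,
    // so the formula below is such that insertSeqSign only matters
    // as a tie-breaker if whenSign is 0.
    return whenSign * 2 + insertSeqSign;
}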

To understand the optimization, disassemble the two examples in Compiler Explorer and use LLVM-MCA, a CPU simulator that can generate an estimated timeline of CPU cycles.

The original code:
Index     01234567890123
[0,0]     DeER .    .  .   sub  x0, x2, x3
[0,1]     D=eER.    .  .   cmp  x0, #0
[0,2]     D==eER    .  .   cset w0, ne
[0,3]     .D==eER   .  .   cneg w0, w0, lt
[0,4]     .D===eER  .  .   cmp  w0, #0
[0,5]     .D====eER .  .   b.le #12
[0,6]     . DeE---R .  .   mov  w1, #1
[0,7]     . DeE---R .  .   b    #48
[0,8]     . D==eE-R .  .   tbz  w0, #31, #12
[0,9]     .  DeE--R .  .   mov  w1, #-1
[0,10]    .  DeE--R .  .   b    #36
[0,11]    .  D=eE-R .  .   sub  x0, x4, x5
[0,12]    .   D=eER .  .   cmp  x0, #0
[0,13]    .   D==eER.  .   cset w0, ne
[0,14]    .   D===eER  .   cneg w0, w0, lt
[0,15]    .    D===eER .   cmp  w0, #0
[0,16]    .    D====eER.   csetm        w1, lt
[0,17]    .    D===eE-R.   cmp  w0, #0
[0,18]    .    .D===eER.   csinc        w1, w1, wzr, le
[0,19]    .    .D====eER   mov  x0, x1
[0,20]    .    .DeE----R   ret

Note the one conditional branch, b.le, which avoids comparing the insertSeq fields if the result is already known from comparing the when fields.

The branchless code:
Index     012345678
[0,0]     DeER .  .   sub       x0, x2, x3
[0,1]     DeER .  .   sub       x1, x4, x5
[0,2]     D=eER.  .   cmp       x0, #0
[0,3]     .D=eER  .   cset      w0, ne
[0,4]     .D==eER .   cneg      w0, w0, lt
[0,5]     .DeE--R .   cmp       x1, #0
[0,6]     . DeE-R .   cset      w1, ne
[0,7]     . D=eER .   cneg      w1, w1, lt
[0,8]     . D==eeER   add       w0, w1, w0, lsl #1
[0,9]     .  DeE--R   ret

Here, the branchless implementation takes fewer cycles and instructions than even the shortest path through the branchy code, so it’s better in all cases. The faster implementation plus the elimination of mispredicted branches resulted in a 5x improvement in some of our benchmarks!

However, this technique is not always applicable. Branchless approaches generally require doing work that will be thrown away, and if the branch is predictable most of the time, that wasted work can slow your code down. In addition, removing a branch often introduces a data dependency. Modern CPUs execute multiple operations per cycle, but they can’t execute an instruction until its inputs from a previous instruction are ready. In contrast, a CPU can speculate about data in branches, and work ahead if a branch is predicted correctly.
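As a sanity check on the formula used in the branchless comparator, the sketch below runs both comparison strategies on plain long values and confirms that they agree in sign, which is all a comparator’s contract requires. The class and method names (BranchlessCompareSketch, branching, branchless, agreeOnSamples) are invented for this illustration, and the sample values are arbitrary.

```java
/** Checks that the branchless formula orders pairs the same way as the branching comparator. */
final class BranchlessCompareSketch {
    static int branching(long when1, long seq1, long when2, long seq2) {
        long whenDiff = when1 - when2;
        if (whenDiff > 0) return 1;
        if (whenDiff < 0) return -1;
        long seqDiff = seq1 - seq2;
        if (seqDiff > 0) return 1;
        if (seqDiff < 0) return -1;
        return 0;
    }

    static int branchless(long when1, long seq1, long when2, long seq2) {
        int whenSign = Long.signum(when1 - when2);
        int seqSign = Long.signum(seq1 - seq2);
        // The result lies in [-3, 3]; its sign equals whenSign unless whenSign is 0,
        // in which case the sign of seqSign decides.
        return whenSign * 2 + seqSign;
    }

    /** Returns true if both comparators agree in sign for every pair drawn from the samples. */
    static boolean agreeOnSamples(long[] samples) {
        for (long w1 : samples) {
            for (long s1 : samples) {
                for (long w2 : samples) {
                    for (long s2 : samples) {
                        int expected = Integer.signum(branching(w1, s1, w2, s2));
                        int actual = Integer.signum(branchless(w1, s1, w2, s2));
                        if (expected != actual) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnSamples(new long[] {-5, 0, 1, 2, 1_000_000}));  // true
        System.out.println(branchless(10, 7, 10, 3));  // 1: same when, later insertSeq sorts after
    }
}
```

Note that the branchless version returns values other than -1, 0, and 1, so callers must look only at the sign of the result.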

Testing and Validation

Validating the correctness of lock-free algorithms is notoriously difficult!

In addition to standard unit tests for continuous validation during development, we also wrote rigorous stress tests to verify queue invariants and to attempt to induce data races if they existed. In our test labs we could run millions of test instances on emulated devices and on real hardware.

With Java ThreadSanitizer (JTSan) instrumentation, we could use the same tests to also detect some data races in our code. JTSan did not find any problematic data races in DeliQueue, but, surprisingly, it detected two concurrency bugs in the Robolectric framework, which we promptly fixed.

To improve our debugging capabilities, we built new analysis tools. Below is an example showing an issue in Android platform code where one thread is overloading another thread with Messages, causing a large backlog, visible in Perfetto thanks to the MessageQueue instrumentation feature that we added.

A screenshot of Perfetto UI, demonstrating flows and metadata for Messages being posted to a MessageQueue and delivered to a worker thread.

To enable MessageQueue tracing in the system_server process, include the following in your Perfetto configuration:

data_sources {
  config {
    name: "track_event"
    target_buffer: 0  # Change this per your buffers configuration
    track_event_config {
      enabled_categories: "mq"
    }
  }
}

Impact

DeliQueue improves system and app performance by eliminating locks from MessageQueue.

  • Synthetic benchmarks: multi-threaded insertions into busy queues are up to 5,000x faster than the legacy MessageQueue, thanks to improved concurrency (the Treiber stack) and faster insertions (the min-heap).

  • In Perfetto traces acquired from internal beta testers, we see a reduction of 15% in app main thread time spent in lock contention.

  • On the same test devices, the reduced lock contention leads to significant improvements to the user experience, such as:

    • -4% missed frames in apps.

    • -7.7% missed frames in System UI and Launcher interactions.

    • -9.1% in time from app startup to the first frame drawn, at the 95th percentile.

Next steps

DeliQueue is rolling out to apps in Android 17. App developers should review preparing your app for the new lock-free MessageQueue on the Android Developers blog to learn how to test their apps.

References

[1] Treiber, R.K., 1986. Systems programming: Coping with parallelism. International Business Machines Incorporated, Thomas J. Watson Research Center.

[2] Goetz, B., Peierls, T., Bloch, J., Bowbeer, J., Holmes, D., & Lea, D. (2006). Java Concurrency in Practice. Addison-Wesley Professional.
