Original article; please credit the source when reposting. Thanks.
After the drawing pass covered in the previous article, the app layer has the DisplayList ready; what follows is the rendering pass. Unlike software drawing, Android hardware acceleration runs rendering on a dedicated native thread, RenderThread, whereas with software drawing both the drawing and the rendering happen on the UI thread.
Let's continue the analysis, returning to ThreadedRenderer's draw:
frameworks/base/core/java/android/view/ThreadedRenderer.java
void draw(View view, AttachInfo attachInfo, DrawCallbacks callbacks,
        FrameDrawingCallback frameDrawingCallback) {
    ...
    updateRootDisplayList(view, callbacks); // build the DisplayList
    ...
    int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length); // render the view
    ...
}
The previous article covered the updateRootDisplayList part, so next up is nSyncAndDrawFrame. It is a native method, mapping over to the native side:
frameworks/base/core/jni/android_view_ThreadedRenderer.cpp
static int android_view_ThreadedRenderer_syncAndDrawFrame(JNIEnv* env, jobject clazz,
        jlong proxyPtr, jlongArray frameInfo, jint frameInfoSize) {
    LOG_ALWAYS_FATAL_IF(frameInfoSize != UI_THREAD_FRAME_INFO_SIZE,
            "Mismatched size expectations, given %d expected %d",
            frameInfoSize, UI_THREAD_FRAME_INFO_SIZE);
    RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
    env->GetLongArrayRegion(frameInfo, 0, frameInfoSize, proxy->frameInfo());
    return proxy->syncAndDrawFrame();
}
Here the RenderProxy runs its syncAndDrawFrame method:
frameworks/base/libs/hwui/renderthread/RenderProxy.cpp
int RenderProxy::syncAndDrawFrame() {
    return mDrawFrameTask.drawFrame();
}
which hands the drawFrame operation off to a task:
frameworks/base/libs/hwui/renderthread/DrawFrameTask.cpp
int DrawFrameTask::drawFrame() {
    LOG_ALWAYS_FATAL_IF(!mContext, "Cannot drawFrame with no CanvasContext!");
    mSyncResult = SyncResult::OK;
    mSyncQueued = systemTime(CLOCK_MONOTONIC);
    postAndWait();
    return mSyncResult;
}

void DrawFrameTask::postAndWait() {
    AutoMutex _lock(mLock);
    mRenderThread->queue().post([this]() { run(); });
    mSignal.wait(mLock);
}
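DrawFrameTask::postAndWait is a textbook post-and-wait handoff: the UI thread pushes a closure onto the render thread's queue, then blocks on a condition variable until run() signals back. Below is a minimal standalone sketch of that pattern; WorkQueue and its driver are illustrative stand-ins rather than the hwui types, and where the real run() can wake the UI thread as soon as syncing finishes, the sketch simplifies this to a single signal at the end.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class WorkQueue {
public:
    // Called on the "UI thread": enqueue work, then block until it finishes.
    void postAndWait(std::function<void()> work) {
        std::unique_lock<std::mutex> lock(mLock);
        mDone = false;
        mTasks.push(std::move(work));
        mTaskAvailable.notify_one();
        // Parks the caller, like mSignal.wait(mLock) in DrawFrameTask.
        mTaskDone.wait(lock, [this] { return mDone; });
    }

    // Body of the "render thread": pop tasks, run them, signal completion.
    void loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mLock);
                mTaskAvailable.wait(lock, [this] { return !mTasks.empty(); });
                task = std::move(mTasks.front());
                mTasks.pop();
            }
            task(); // e.g. DrawFrameTask::run(): sync state, then draw
            {
                std::lock_guard<std::mutex> lock(mLock);
                mDone = true;
            }
            mTaskDone.notify_one(); // wakes the blocked UI thread
        }
    }

private:
    std::mutex mLock;
    std::condition_variable mTaskAvailable, mTaskDone;
    std::queue<std::function<void()>> mTasks;
    bool mDone = false;
};

int main() {
    WorkQueue queue;
    std::thread renderThread([&] { queue.loop(); }); // the "RenderThread"
    queue.postAndWait([] { /* sync UI state, then issue GL commands */ });
    renderThread.detach(); // sketch only; a real loop needs a shutdown path
}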
RenderThread is one big loop, and drawing operations are posted into it as RenderTasks. The run() method therefore holds the core rendering logic:
void DrawFrameTask::run() {
    ATRACE_NAME("DrawFrame"); // the "DrawFrame" label in systrace
    bool canUnblockUiThread;
    bool canDrawThisFrame;
    {
        TreeInfo info(TreeInfo::MODE_FULL, *mContext);
        canUnblockUiThread = syncFrameState(info); // synchronize the view data
        canDrawThisFrame = info.out.canDrawThisFrame;
        ...
    }
    CanvasContext* context = mContext;
    ...
    // draw: issue the OpenGL commands to the GPU
    if (CC_LIKELY(canDrawThisFrame)) {
        context->draw(); // CanvasContext does the drawing
    } else {
        // wait on fences so tasks don't overlap next frame
        context->waitOnFences();
    }
    ...
}
The DrawFrameTask is posted onto RenderThread, and the UI thread blocks while RenderThread synchronizes the data packaged during the app-side drawing phase. If the synchronization succeeds, the UI thread is woken up; otherwise it stays blocked and waits. Only after the sync finishes does RenderThread move on to the GPU rendering work.
So a DrawFrameTask operation splits into two main parts:
1) syncFrameState: synchronize the UI data from the main thread to the render thread
2) CanvasContext::draw: do the drawing
Starting with syncFrameState:
bool DrawFrameTask::syncFrameState(TreeInfo& info) {
    ATRACE_CALL(); // the "syncFrameState" label in systrace
    ...
    // synchronize the DisplayListOp tree
    mContext->prepareTree(info, mFrameInfo, mSyncQueued, mTargetNode);
    ...
    return info.prepareTextures;
}
The main synchronization work happens in CanvasContext's prepareTree:
frameworks/base/libs/hwui/renderthread/CanvasContext.cpp
void CanvasContext::prepareTree(TreeInfo& info, int64_t* uiFrameInfo,
        int64_t syncQueued, RenderNode* target) {
    ...
    mCurrentFrameInfo->importUiThreadInfo(uiFrameInfo); // copies the frame info via memcpy
    mCurrentFrameInfo->set(FrameInfoIndex::SyncQueued) = syncQueued;
    mCurrentFrameInfo->markSyncStart();
    ...
    mRenderPipeline->onPrepareTree();
    for (const sp<RenderNode>& node : mRenderNodes) {
        info.mode = (node.get() == target ? TreeInfo::MODE_FULL : TreeInfo::MODE_RT_ONLY);
        // starting from mRootRenderNode, recursively run prepareTree on every node
        node->prepareTree(info);
        GL_CHECKPOINT(MODERATE);
    }
    ...
}
Put simply, this step synchronizes the complete DisplayListOp set that the app layer prepared earlier over to the native RenderThread.
frameworks/base/libs/hwui/RenderNode.cpp
void RenderNode::prepareTree(TreeInfo& info) {
    ATRACE_CALL();
    LOG_ALWAYS_FATAL_IF(!info.damageAccumulator, "DamageAccumulator missing");
    MarkAndSweepRemoved observer(&info);
    // The OpenGL renderer reserves the stencil buffer for overdraw debugging. Functors
    // will need to be drawn in a layer.
    bool functorsNeedLayer = Properties::debugOverdraw && !Properties::isSkiaEnabled();
    prepareTreeImpl(observer, info, functorsNeedLayer);
}
void RenderNode::prepareTreeImpl(TreeObserver& observer, TreeInfo& info, bool functorsNeedLayer) {
    info.damageAccumulator->pushTransform(this);
    if (info.mode == TreeInfo::MODE_FULL) {
        pushStagingPropertiesChanges(info); // synchronize the properties
    }
    ...
    prepareLayer(info, animatorDirtyMask);
    if (info.mode == TreeInfo::MODE_FULL) {
        pushStagingDisplayListChanges(observer, info); // synchronize the DisplayListOps
    }
    if (mDisplayList) {
        info.out.hasFunctors |= mDisplayList->hasFunctor();
        bool isDirty = mDisplayList->prepareListAndChildren(
                observer, info, childFunctorsNeedLayer,
                [](RenderNode* child, TreeObserver& observer, TreeInfo& info,
                   bool functorsNeedLayer) {
                    child->prepareTreeImpl(observer, info, functorsNeedLayer); // recurse into children
                });
        if (isDirty) {
            damageSelf(info);
        }
    }
    pushLayerUpdate(info);
    info.damageAccumulator->popTransform();
}
Here pushStagingDisplayListChanges assigns the DisplayList staged earlier by setStagingDisplayList into the RenderNode's mDisplayListData.
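The "staging" naming reflects a small double-buffering scheme: the UI thread only ever writes the staging copies of the properties and display list, and during a MODE_FULL sync, while the UI thread is safely parked in postAndWait, the render thread copies the staging state into the live state it actually draws from, with no locking needed. Here is a minimal sketch of that idea; Properties, DisplayList and Node are simplified stand-ins, not the real RenderNode types.

#include <memory>
#include <utility>

struct Properties { float translationX = 0, translationY = 0, alpha = 1; };
struct DisplayList { /* recorded drawing ops */ };

class Node {
public:
    // UI thread: mutate only the staging copies and flag them dirty.
    void setStagingTranslationX(float tx) {
        mStagingProperties.translationX = tx;
        mPropertiesDirty = true;
    }
    void setStagingDisplayList(std::unique_ptr<DisplayList> list) {
        mStagingDisplayList = std::move(list);
        mDisplayListDirty = true;
    }

    // Render thread, while the UI thread is blocked in postAndWait():
    // copy the staging state into the live state used for drawing.
    void syncForFrame() {
        if (mPropertiesDirty) {
            mProperties = mStagingProperties;
            mPropertiesDirty = false;
        }
        if (mDisplayListDirty) {
            mDisplayList = std::move(mStagingDisplayList);
            mDisplayListDirty = false;
        }
    }

private:
    Properties mStagingProperties, mProperties;
    std::unique_ptr<DisplayList> mStagingDisplayList, mDisplayList;
    bool mPropertiesDirty = false, mDisplayListDirty = false;
};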
RenderNode::prepareTree() walks the DisplayList's tree structure, calling prepareTreeImpl() recursively on each child node; for a render layer, RenderNode::pushLayerUpdate() records that layer's pending update into the LayerUpdateQueue.
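LayerUpdateQueue itself is simple bookkeeping: one entry per layered node, with damage rects unioned when the same node is pushed more than once, and the whole queue drained during draw. A hedged sketch of that structure, using simplified Rect and Node stand-ins rather than the real hwui types:

#include <algorithm>
#include <vector>

struct Rect {
    float left = 0, top = 0, right = 0, bottom = 0;
    void unionWith(const Rect& r) {
        left = std::min(left, r.left);
        top = std::min(top, r.top);
        right = std::max(right, r.right);
        bottom = std::max(bottom, r.bottom);
    }
};

struct Node; // stands in for RenderNode

class LayerUpdateQueueSketch {
public:
    struct Entry {
        Node* node;
        Rect damage;
    };

    // Called from pushLayerUpdate(): record that this layer needs
    // re-rendering, growing the damage if the node is already queued.
    void enqueue(Node* node, const Rect& damage) {
        for (Entry& e : mEntries) {
            if (e.node == node) {
                e.damage.unionWith(damage);
                return;
            }
        }
        mEntries.push_back(Entry{node, damage});
    }

    const std::vector<Entry>& entries() const { return mEntries; }
    void clear() { mEntries.clear(); } // drained by deferLayers() at draw time

private:
    std::vector<Entry> mEntries;
};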
Back in the task's run(), the next step is CanvasContext::draw:
void CanvasContext::draw() {
    ...
    Frame frame = mRenderPipeline->getFrame();
    SkRect windowDirty = computeDirtyRect(frame, &dirty);
    bool drew = mRenderPipeline->draw(frame, windowDirty, dirty, mLightGeometry, &mLayerUpdateQueue,
                                      mContentDrawBounds, mOpaque, mLightInfo, mRenderNodes, &(profiler()));
    waitOnFences();
    bool requireSwap = false;
    bool didSwap = mRenderPipeline->swapBuffers(frame, drew, windowDirty, mCurrentFrameInfo,
                                                &requireSwap);
    ...
}
Here mRenderPipeline carries out three core steps: getFrame, draw, and swapBuffers. Let's take them one at a time. (On this path mRenderPipeline corresponds to frameworks/base/libs/hwui/renderthread/OpenGLPipeline.cpp.)
The getFrame step: essentially a dequeueBuffer.
Frame OpenGLPipeline::getFrame() {
    LOG_ALWAYS_FATAL_IF(mEglSurface == EGL_NO_SURFACE,
            "drawRenderNode called on a context with no surface!");
    return mEglManager.beginFrame(mEglSurface);
}

Frame EglManager::beginFrame(EGLSurface surface) {
    LOG_ALWAYS_FATAL_IF(surface == EGL_NO_SURFACE, "Tried to beginFrame on EGL_NO_SURFACE!");
    makeCurrent(surface);
    Frame frame;
    frame.mSurface = surface;
    eglQuerySurface(mEglDisplay, surface, EGL_WIDTH, &frame.mWidth);
    eglQuerySurface(mEglDisplay, surface, EGL_HEIGHT, &frame.mHeight);
    frame.mBufferAge = queryBufferAge(surface);
    eglBeginFrame(mEglDisplay, surface);
    return frame;
}
Here is the corresponding makeCurrent method:
bool EglManager::makeCurrent(EGLSurface surface, EGLint* errOut) {
    ...
    if (!eglMakeCurrent(mEglDisplay, surface, surface, mEglContext)) {
        if (errOut) {
            *errOut = eglGetError();
            ALOGW("Failed to make current on surface %p, error=%s", (void*)surface,
                    egl_error_str(*errOut));
        } else {
            LOG_ALWAYS_FATAL("Failed to make current on surface %p, error=%s", (void*)surface,
                    eglErrorString());
        }
    }
    mCurrentSurface = surface;
    if (Properties::disableVsync) {
        eglSwapInterval(mEglDisplay, 0);
    }
    return true;
}
Drilling into eglMakeCurrent, note that the EGLSurface is passed straight in:
frameworks/native/opengl/libagl/egl.cpp
EGLBoolean eglMakeCurrent(EGLDisplay dpy, EGLSurface draw,
        EGLSurface read, EGLContext ctx)
{
    ogles_context_t* gl = (ogles_context_t*)ctx;
    if (makeCurrent(gl) == 0) {
        if (ctx) {
            egl_context_t* c = egl_context_t::context(ctx);
            egl_surface_t* d = (egl_surface_t*)draw;
            egl_surface_t* r = (egl_surface_t*)read;
            ...
            if (d) {
                // dequeueBuffer-related logic
                if (d->connect() == EGL_FALSE) {
                    return EGL_FALSE;
                }
                d->ctx = ctx;
                d->bindDrawSurface(gl); // bind the draw surface
            }
            ...
        }
        return EGL_TRUE;
    }
    return setError(EGL_BAD_ACCESS, EGL_FALSE);
}
Following d->connect(): d is of type egl_surface_t, and tracing down the implementation of connect we find it is provided by egl_window_surface_v2_t (struct egl_window_surface_v2_t : public egl_surface_t). So let's look at egl_window_surface_v2_t::connect:
EGLBoolean egl_window_surface_v2_t::connect()
{
    // dequeue a buffer
    int fenceFd = -1;
    if (nativeWindow->dequeueBuffer(nativeWindow, &buffer,
            &fenceFd) != NO_ERROR) {
        return setError(EGL_BAD_ALLOC, EGL_FALSE);
    }
    // wait for the buffer
    sp<Fence> fence(new Fence(fenceFd));
    ...
    return EGL_TRUE;
}
The nativeWindow here is the Surface itself:
frameworks/native/libs/gui/Surface.cpp
Surface::Surface(const sp<IGraphicBufferProducer>& bufferProducer, bool controlledByApp) {
    ...
    ANativeWindow::dequeueBuffer = hook_dequeueBuffer;
    ...
}

int Surface::hook_dequeueBuffer(ANativeWindow* window,
        ANativeWindowBuffer** buffer, int* fenceFd) {
    Surface* c = getSelf(window);
    return c->dequeueBuffer(buffer, fenceFd);
}
int Surface::dequeueBuffer(android_native_buffer_t** buffer, int* fenceFd) {
    ATRACE_CALL(); // this is the corresponding systrace label
    ALOGV("Surface::dequeueBuffer");
    ...
    FrameEventHistoryDelta frameTimestamps;
    status_t result = mGraphicBufferProducer->dequeueBuffer(&buf, &fence, reqWidth, reqHeight,
                                                            reqFormat, reqUsage, &mBufferAge,
                                                            enableFrameTimestamps ? &frameTimestamps
                                                                                  : nullptr);
    ...
    // if the buffer needs to be (re)allocated, ask for it via requestBuffer
    if ((result & IGraphicBufferProducer::BUFFER_NEEDS_REALLOCATION) || gbuf == nullptr) {
        // allocate the buffer through the GraphicBufferProducer
        result = mGraphicBufferProducer->requestBuffer(buf, &gbuf);
    }
    ...
}
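The hook_dequeueBuffer indirection above is the standard trick for bridging ANativeWindow's C function-pointer table to the C++ Surface object: Surface derives from ANativeWindow, installs static hooks in its constructor, and each hook downcasts the window pointer back to the owning object (which is what getSelf does). A simplified sketch of the pattern with toy types, not the real libgui code:

#include <cstdio>

struct Buffer;

// The C-style interface: plain function pointers, no virtuals.
struct NativeWindow {
    int (*dequeueBuffer)(NativeWindow* window, Buffer** buffer, int* fenceFd);
};

class Window : public NativeWindow {
public:
    Window() {
        // Install the static hook, exactly like Surface's constructor does
        // with ANativeWindow::dequeueBuffer = hook_dequeueBuffer.
        NativeWindow::dequeueBuffer = hook_dequeueBuffer;
    }

private:
    static Window* getSelf(NativeWindow* window) {
        // Safe because every NativeWindow handed out here is really a Window.
        return static_cast<Window*>(window);
    }

    static int hook_dequeueBuffer(NativeWindow* window, Buffer** buffer, int* fenceFd) {
        return getSelf(window)->dequeueBuffer(buffer, fenceFd);
    }

    int dequeueBuffer(Buffer** buffer, int* fenceFd) {
        std::printf("real C++ member function runs here\n");
        *buffer = nullptr;
        *fenceFd = -1;
        return 0; // NO_ERROR
    }
};

int main() {
    Window w;
    NativeWindow* native = &w; // callers only ever see the C interface
    Buffer* buf = nullptr;
    int fence = -1;
    native->dequeueBuffer(native, &buf, &fence);
}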
The draw step: recursively issue the OpenGL commands and submit them to the GPU for drawing.
bool OpenGLPipeline::draw(const Frame& frame, const SkRect& screenDirty, const SkRect& dirty,
        const FrameBuilder::LightGeometry& lightGeometry,
        LayerUpdateQueue* layerUpdateQueue, const Rect& contentDrawBounds,
        bool opaque, bool wideColorGamut,
        const BakedOpRenderer::LightInfo& lightInfo,
        const std::vector<sp<RenderNode>>& renderNodes,
        FrameInfoVisualizer* profiler) {
    mEglManager.damageFrame(frame, dirty);
    bool drew = false;
    auto& caches = Caches::getInstance();
    FrameBuilder frameBuilder(dirty, frame.width(), frame.height(), lightGeometry, caches);
    frameBuilder.deferLayers(*layerUpdateQueue);
    layerUpdateQueue->clear();
    frameBuilder.deferRenderNodeScene(renderNodes, contentDrawBounds);
    BakedOpRenderer renderer(caches, mRenderThread.renderState(), opaque, wideColorGamut,
            lightInfo);
    frameBuilder.replayBakedOps<BakedOpDispatcher>(renderer);
    ProfileRenderer profileRenderer(renderer);
    profiler->draw(profileRenderer);
    drew = renderer.didDraw();
    // post frame cleanup
    caches.clearGarbage();
    caches.pathCache.trim();
    caches.tessellationCache.trim();
#if DEBUG_MEMORY_USAGE
    caches.dumpMemoryUsage();
#else
    if (CC_UNLIKELY(Properties::debugLevel & kDebugMemory)) {
        caches.dumpMemoryUsage();
    }
#endif
    return drew;
}
This part of the logic is quite involved, so rather than tracing the code, here is a quick outline of the OpenGLPipeline::draw flow:
- defer: reorganizes the data structures. Each RenderNode maps to a LayerBuilder, and the RecordedOps assembled chunk by chunk inside the DisplayList are rewrapped as BakedOpStates and stored into Batches, indexed through two lookup tables according to whether they can be merged: mMergingBatchLookup and mBatchLookup (sketched below).
- render: converts each op into the corresponding OpenGL commands and caches them in the local GL command buffer.
- GL call: submits the OpenGL commands to the GPU for execution.
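To make the batch/merge idea concrete, here is a toy sketch of merge-based batching: consecutive ops that share a merge key (a texture id in this sketch) collapse into one batch, so one GL draw call covers many ops. The real FrameBuilder also checks draw order and overlap before merging, which this sketch deliberately ignores; RecordedOp and Batch are simplified stand-ins.

#include <cstdio>
#include <unordered_map>
#include <vector>

struct RecordedOp {
    int textureId; // the "state" that decides mergeability
    float x, y;    // where to draw
};

struct Batch {
    int textureId;
    std::vector<RecordedOp> ops;
};

std::vector<Batch> batchOps(const std::vector<RecordedOp>& ops) {
    std::vector<Batch> batches;
    // Maps a merge key to an open batch, in the spirit of mMergingBatchLookup.
    std::unordered_map<int, size_t> mergingLookup;
    for (const RecordedOp& op : ops) {
        auto it = mergingLookup.find(op.textureId);
        if (it != mergingLookup.end()) {
            batches[it->second].ops.push_back(op); // merge into existing batch
        } else {
            mergingLookup[op.textureId] = batches.size();
            batches.push_back(Batch{op.textureId, {op}});
        }
    }
    return batches;
}

int main() {
    // Six ops over two textures collapse into two batches: two draw
    // calls instead of six, and far fewer render-state switches.
    std::vector<RecordedOp> ops = {{1, 0, 0}, {2, 10, 0}, {1, 20, 0},
                                   {2, 30, 0}, {1, 40, 0}, {2, 50, 0}};
    for (const Batch& b : batchOps(ops)) {
        std::printf("texture %d: %zu ops in one draw call\n",
                    b.textureId, b.ops.size());
    }
}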
The swapBuffers step: hands the finished buffer to SurfaceFlinger for composition.
bool OpenGLPipeline::swapBuffers(const Frame& frame, bool drew, const SkRect& screenDirty,
        FrameInfo* currentFrameInfo, bool* requireSwap) {
    GL_CHECKPOINT(LOW);
    // Even if we decided to cancel the frame, from the perspective of jank
    // metrics the frame was swapped at this point
    currentFrameInfo->markSwapBuffers();
    *requireSwap = drew || mEglManager.damageRequiresSwap();
    if (*requireSwap && (CC_UNLIKELY(!mEglManager.swapBuffers(frame, screenDirty)))) {
        return false;
    }
    return *requireSwap;
}
Continuing into EglManager's swapBuffers:
frameworks/base/libs/hwui/renderthread/EglManager.cpp
bool EglManager::swapBuffers(const Frame& frame, const SkRect& screenDirty) {
    ...
    eglSwapBuffersWithDamageKHR(mEglDisplay, frame.mSurface, rects,
            screenDirty.isEmpty() ? 0 : 1);
    ...
    return false; // only reached on failure; the elided success path returns true
}
eglSwapBuffersWithDamageKHR corresponds to the eglSwapBuffersWithDamageKHR label in systrace. Its main job is queueBuffer. The analysis mirrors dequeueBuffer, so let's jump straight to the final call site (which, like eglMakeCurrent above, lives in libagl):
frameworks/native/opengl/libagl/egl.cpp
EGLBoolean egl_window_surface_v2_t::swapBuffers()
{
    ...
    nativeWindow->queueBuffer(nativeWindow, buffer, -1);
    buffer = 0;
    // dequeue a new buffer
    int fenceFd = -1;
    if (nativeWindow->dequeueBuffer(nativeWindow, &buffer, &fenceFd) == NO_ERROR) {
        sp<Fence> fence(new Fence(fenceFd));
        // wait for the buffer to be released before drawing into it
        if (fence->wait(Fence::TIMEOUT_NEVER)) {
            nativeWindow->cancelBuffer(nativeWindow, buffer, fenceFd);
            return setError(EGL_BAD_ALLOC, EGL_FALSE);
        }
        ...
    }
    ...
}
Here the buffer holding the finished drawing is queued into the BufferQueue via queueBuffer, which notifies SurfaceFlinger to acquire a buffer from the BufferQueue for composition; then dequeueBuffer requests another buffer to serve the next frame. This is the classic producer-consumer pattern covered many times in the earlier graphics-system articles; if you're interested, see the earlier summary Android圖形系統篇總結.
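As a refresher on that producer-consumer handoff, here is a heavily simplified BufferQueue-style sketch: the producer (the render thread) dequeues a free buffer, fills it, and queues it; the consumer (SurfaceFlinger) acquires it for composition and releases it back to the free list. Toy types, no fences, and no slot indices, unlike the real libgui classes.

#include <condition_variable>
#include <deque>
#include <mutex>

struct GraphicBuffer { int frameNumber = 0; };

class TinyBufferQueue {
public:
    explicit TinyBufferQueue(int bufferCount) : mFree(bufferCount) {}

    // Producer side (render thread): grab a free buffer to draw into.
    GraphicBuffer dequeueBuffer() {
        std::unique_lock<std::mutex> lock(mLock);
        mCond.wait(lock, [this] { return !mFree.empty(); });
        GraphicBuffer buffer = mFree.front();
        mFree.pop_front();
        return buffer;
    }

    // Producer side: publish a filled buffer (eglSwapBuffers ends up here).
    void queueBuffer(GraphicBuffer buffer) {
        std::lock_guard<std::mutex> lock(mLock);
        mQueued.push_back(buffer);
        mCond.notify_all(); // wakes the consumer
    }

    // Consumer side (SurfaceFlinger): take a queued buffer to composite.
    GraphicBuffer acquireBuffer() {
        std::unique_lock<std::mutex> lock(mLock);
        mCond.wait(lock, [this] { return !mQueued.empty(); });
        GraphicBuffer buffer = mQueued.front();
        mQueued.pop_front();
        return buffer;
    }

    // Consumer side: hand the buffer back for reuse.
    void releaseBuffer(GraphicBuffer buffer) {
        std::lock_guard<std::mutex> lock(mLock);
        mFree.push_back(buffer);
        mCond.notify_all(); // wakes a producer blocked in dequeueBuffer
    }

private:
    std::mutex mLock;
    std::condition_variable mCond;
    std::deque<GraphicBuffer> mFree;   // buffers ready to be drawn into
    std::deque<GraphicBuffer> mQueued; // buffers waiting for composition
};

Seed mFree with two or three buffers and you get double or triple buffering: the producer can draw into one buffer while the consumer composites another.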
What follows is SurfaceFlinger's composition work, which you can read about in the earlier graphics-system article:
Android圖形系統(十)-SurfaceFlinger啟動及圖層合成送顯過程
Note: the earlier graphics-system articles are not based on 9.0; they are early study notes whose focus is on sorting out the ideas and the flow.
To finish, here is a simple flow chart summarizing the whole hardware-accelerated drawing and rendering process; I know, a picture is worth a thousand words:
Drawing and rendering are the core of hardware acceleration. The flow chart covers only the main call path; to really digest this material you still need to spend plenty of time in the source chasing the details, so consider this article a starting point. Questions and corrections are welcome, and if you found the article useful, a like would be appreciated.
Finally, a few strengths of the hardware-accelerated design:
- Introducing the DisplayList as an intermediate command layer between drawing commands and GL commands acts as a buffer: ops are converted into GL commands only when a draw is actually needed. A View only has to adjust RenderNode and RenderProperties for its dirty region and then update the DisplayList, avoiding redundant drawing and data re-organization (see the sketch after this list).
- Batching/merging draw operations reduces GL draw calls, which means fewer render-state switches and better performance.
- Moving rendering onto RenderThread further unburdens the UI thread, while converting graphics work the CPU is ill-suited for into GPU-specific commands executed on the GPU raises rendering efficiency further still.
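The first point deserves a tiny illustration: because the recorded DisplayList is retained, a property-only change such as alpha or translation can be re-rendered by replaying the cached ops under the new properties, without re-running any onDraw-style recording. A toy sketch of that caching benefit, with made-up types:

#include <cstdio>
#include <vector>

struct Op { const char* what; };

struct NodeSketch {
    std::vector<Op> displayList; // recorded once, retained across frames
    float alpha = 1.0f;          // RenderProperties-style state
    bool displayListDirty = true;

    void record() { // expensive: the analogue of running View.onDraw()
        displayList = {{"drawRect"}, {"drawText"}};
        displayListDirty = false;
        std::printf("re-recorded display list\n");
    }

    void setAlpha(float a) { alpha = a; } // property change: no re-record

    void draw() {
        if (displayListDirty) record();  // only content changes pay this cost
        for (const Op& op : displayList) // replay the cached ops
            std::printf("replay %s at alpha %.1f\n", op.what, alpha);
    }
};

int main() {
    NodeSketch node;
    node.draw();         // first frame records, then replays
    node.setAlpha(0.5f); // animate: properties only
    node.draw();         // replays the cached list, no re-record
}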
To wrap up, two good articles consulted along the way:
http://www.lxweimin.com/p/dd800800145b
https://blog.csdn.net/jinzhuojun/article/details/54234354