The image below shows you what each thread is working on at various points in time using Chrome's profile viewer. For instance, here's what you should see when you first start it up:
Right now there is a 46-second render and the window is opened to threads 0 and 1. If you scroll down you will see a bunch more threads. If there's a scaling issue it can help to compare at a single point in time what each thread is doing, for instance, you might find when doing subdivision that one thread is working on it while the other threads are waiting. Since, in this example, there are shader issues, it is improbable that is the case and you can probably just focus on a single thread. You can use the GUI tool to zoom in around the 28-second mark. Click on the "forward/backward" looking arrow (selected in the screenshot above and immeidately below) and then use that to zoom in:
So now you can see exactly what is going on at each moment. Let's focus on thread 1. The very first entry indicates that a camera ray is being traced which hits a surface that is calling a node called "aiMixSparePart". If you click on that node, you can get more details which you can see at the bottom of the window. So this is a mix_shader. Because of how the Arnold profiler does the time sampling, the individual times should be ignored: this call did not actually take 31ms of wall time (in truth it was probably less than 1ms). However, as you will see later these timing stats are accurate when you combine them with thousands of other stats, similar to how path tracing with a single ray gives a noisy "wrong" image but many rays average out and give a smooth accurate looking image.
So aiMixSparePart calls another aiMixSparePart and if you select that you get more info stating that this is the "surface closure" and clicking on the next one says "AiShaderEvalParamFuncClosure", so this is showing the various bits of work being done by this one mix shader. The mix shader then calls ai_base_GPU which if you click on it you'd find is just a "standard_surface" node and that standard_surface then calls AiShaderEvalParamFuncRGB which is the aiCC_armour shader and selecting that says it's a color_correct shader. If you continue on through the shader network you will get to the bottom and see the robot_6 node which is an image shader. This is important: whatever is at the bottom of this callstack is what happened to be executed by the profiler at this moment in time and if you see a bunch of similar nodes at the bottom, then that is where time is being spent.
So what do you see at the bottom as you move to the right? You see robot_6 (image shader), aiMixSparePart, AiTextureHandleAccess (reading textures), polySurfaceShape6 (polymesh::intersect, which is the actual ray tracing part), AiMakeRay, two "root" (BVH::intersection of the top level BVH), robot_7 which is another image node, and finally a bump2d shader. So from this already you can tell that textures are likely to be a big time sink (specifically the robot_6 and robot_7 image nodes at this particular time slice) and there is of course time being spent tracing rays through geometry. If this is all there is to go on, it is recommended that you try to use fewer textures and reduce the geometric complexity.
But of course, this is just a short snippet of time. You really should keep scrolling to the right and looking at more entries. But this can be tiresome, so let's zoom back out and instead get a big picture sense of where time is going. I've now changed the cursor to use the selection mode (the regular looking arrow icon) and selected all the entries in thread 1 from about the 10s to 35s mark. Note that If you choose items near the start of the render you are more likely to get displacement, BVH builds, texture loading, etc. for where time is spent, and since you will be looking at regular shading times, you shouldn't care about that so more have been selected from the middle of the render.
If you click on "self-time" you can sort by what nodes are taking the most amount of time. Confirming the previous observations, textures are indeed an expensive component (note how AiTextureHandleAccess and all the robot_* image shaders are showing up at the top), and the mix shader is also expensive. Is it possible to remove some of these mixes? Likewise, you can see the root, polySurfaceShape8, and other BVH/polymesh intersection work showing up. What's not worth worrying about? Well, even though there are lots of color correct nodes showing up, the actual time spent by these nodes (the self-time), seems to be low, so it does not matter about them being here since texturing is so much more expensive.
If you look at the Arnold log stats, you would have found much of this information already:
top session self-times by category
AiTextureHandleAccess 0:08.09 (17.83%)
BVH::intersect 0:07.08 (15.61%)
root 0:04.12 ( 9.10%)
polySurfaceShape8 0:00.51 ( 1.13%)
polySurfaceShape6 0:00.26 ( 0.58%)
surface closure 0:05.47 (12.05%)
aiMixSparePart 0:03.87 ( 8.55%)
ai_concreteWall 0:01.09 ( 2.41%)
ai_groundConcrete 0:00.45 ( 1.00%)
AiLightsPrepare 0:05.21 (11.50%)
polySurfaceShape6 0:01.03 ( 2.27%)
polySurfaceShape8 0:00.88 ( 1.95%)
robot_soldier_0040:polySurfaceShape2026 0:00.33 ( 0.74%)
image 0:04.51 ( 9.96%)
robot_6 0:01.13 ( 2.51%)
robot_3 0:01.06 ( 2.35%)
robot_2 0:00.84 ( 1.85%)
robot_1 0:00.56 ( 1.24%)
robot_7 0:00.48 ( 1.06%)
robot_5 0:00.29 ( 0.65%)
polymesh::intersect 0:04.26 ( 9.40%)
polySurfaceShape8 0:00.46 ( 1.02%)
polySurfaceShape6 0:00.34 ( 0.76%)
robot_soldier_0040:polySurfaceShape2026 0:00.23 ( 0.52%)
standard_surface 0:01.56 ( 3.46%)
ai_base_GPU 0:01.25 ( 2.76%)
AiShaderEvalParamFuncFlt 0:01.42 ( 3.15%)
ai_base_GPU 0:00.25 ( 0.56%)
TraceCameraRay 0:01.27 ( 2.82%)
bump2d (bump2d3) 0:00.90 ( 2.00%)
So notice that about 17.83%+9.96%=27.8% is directly spent doing textures and 15.6%+9.4%=25% is ray tracing work. 11.5% is shadows. Most of the rest is shading. For more details on the shading, the detailed Chrome profiler results can help to see how the shading network itself is spending time.