While we don’t have any actual development progress to report on, I’ve decided to use this blog for some (hopefully) useful articles on game development. Unless you’re working with the Unity3D engine, you can probably skip this post entirely.
I’ve been thinking lately about some performance “tips” for Unity3D that everyone seems to be repeating. One of the common tips is “don’t use GetComponent, it’s slow” and its stronger version, “don’t use built-in properties like transform, they’re just a wrapper around GetComponent.” This seems right, but has anyone tested it? I know of one attempt, but it seems kinda lacking. I’ve repeated that one myself some time ago, but that was with an older version of Unity… and I think I can measure a lot more things. So, let’s get to benchmarking!
We’ll be using this little helper class for measuring performance:
Timer code
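The helper's source is not reproduced in this text, so here is a minimal sketch of what such a class might look like. It is an assumption, not the original code: it wraps System.Diagnostics.Stopwatch in an IDisposable so that a using block times its own body, and the Format method and the use of Console.WriteLine are illustrative choices (inside Unity you would more likely log with Debug.Log).

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Hypothetical stand-in for the Timer helper; not the author's original code.
public sealed class Timer : IDisposable
{
    private readonly string name;
    private readonly Stopwatch stopwatch;

    public Timer(string name)
    {
        this.name = name;
        stopwatch = Stopwatch.StartNew();
    }

    // Formats a result line in the same shape as the output shown in this post.
    public static string Format(string name, double milliseconds)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "Profiled {0}: {1:F2}ms", name, milliseconds);
    }

    // Runs at the end of a using block: stops the clock and reports the time.
    public void Dispose()
    {
        stopwatch.Stop();
        // Assumption: inside Unity, Debug.Log would replace Console.WriteLine.
        Console.WriteLine(Format(name, stopwatch.ElapsedMilliseconds));
    }
}
```

With a class like this, everything between the braces of using (new Timer("...")) { } is timed, and the result is printed when the block exits.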
Let’s start with simple things. Measuring absolutely nothing:
using (new Timer("Empty")) { }
Profiled Empty: 0.00ms
Just as expected, this takes no time at all. Sanity check passed! Now, since we’re measuring pretty fast things, we’re gonna repeat every test a bunch of times. 100 000 000 times, to be exact. So, let’s measure an empty cycle:
const int NUMBER = 100000000;

using (new Timer("Empty cycle"))
{
    for (var ii = 0; ii < NUMBER; ii++) { }
}
Profiled Empty cycle: 38.00ms
This is gonna give us a baseline: on my machine, just the overhead of counting to 100 000 000 takes 38ms. Or, rather, 28-48ms – I’ve tested a few times, and it’s not very stable. Let’s actually do something in the cycle now. The fastest way of accessing anything should be reading it from a class field where it’s cached; and let’s throw in a property access as well:
private Transform m_Transform;

private Transform TransformProperty
{
    get { return m_Transform; }
}

void OnEnable()
{
    m_Transform = transform;
}
Measurement:
Transform t;

using (new Timer("Field"))
{
    for (var ii = 0; ii < NUMBER; ii++) { t = m_Transform; }
}

using (new Timer("Property"))
{
    for (var ii = 0; ii < NUMBER; ii++) { t = TransformProperty; }
}
Results:
Profiled Field: 29.00ms
Profiled Property: 191.00ms
Field access is just as fast as an empty cycle! This can be explained in two ways: first, the compiler (or JIT) might have just optimised the access away, since we’re not doing anything with it. Second, field access itself is so fast that it’s lost in the noise. I think the first is more likely to be right, because accessing a property, as opposed to a field, added a whole 150ms to our running time. While a property should add some overhead, I don’t think it adds more than an order of magnitude – so probably, some of those 150ms are just field access that wasn’t optimised away. In any case, it’s still nothing to worry about: 150ms over 100 000 000 iterations gives about 1.5 nanoseconds per access.
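The per-access arithmetic above can be written down as a small helper. This is only a sketch: the class and method names are made up for illustration, and subtracting the empty-loop baseline is an assumption about how to isolate the cost of the loop body.

```csharp
public static class PerOperationCost
{
    // Converts a total loop time into nanoseconds per iteration,
    // after subtracting the time measured for an empty baseline loop.
    public static double NanosecondsPerIteration(double totalMs, double baselineMs, long iterations)
    {
        const double NanosecondsPerMillisecond = 1000000.0;
        return (totalMs - baselineMs) * NanosecondsPerMillisecond / iterations;
    }
}
```

For example, PerOperationCost.NanosecondsPerIteration(191, 38, 100000000) takes the 191ms property loop, subtracts the 38ms empty loop, and gives roughly 1.5 nanoseconds per access.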
OK, that was just a property that we declared ourselves – basically, we just confirmed that accessing a cached component is fast. What about the “evil” built-in property?
Transform t;

using (new Timer("Built-in property"))
{
    for (var ii = 0; ii < NUMBER; ii++) { t = transform; }
}
Results:
Profiled Built-in Property: 1937.00ms
That’s quite a lot slower! Around 12-15 times slower, in fact, compared to our cached property. What’s going on? Is Unity actually calling GetComponent behind the scenes? To find out, let’s measure the GetComponent call itself!
Transform t;

using (new Timer("GetComponent<T>"))
{
    for (var ii = 0; ii < NUMBER; ii++) { t = GetComponent<Transform>(); }
}

using (new Timer("GetComponent"))
{
    for (var ii = 0; ii < NUMBER; ii++) { t = (Transform)GetComponent(typeof(Transform)); }
}
Results:
Profiled GetComponent<T>: 9258.00ms
Profiled GetComponent: 4882.00ms
There are two surprises here. First, GetComponent is actually slower than the built-in property. Second, the generic version of this method is about twice as slow as its non-generic counterpart! The latter observation seems to apply to all generic methods: the overhead of calling a generic method is considerably larger than that of a non-generic one. I’ve experimented with different methods, including ones I’ve written myself, and this seems to hold.
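A comparison of hand-written generic and non-generic methods could be sketched roughly like this. Everything in it is hypothetical (the class, the two methods, and the NoInlining attributes added to keep the calls from being inlined away), and it runs on plain .NET rather than inside Unity, so the numbers it prints will depend heavily on the runtime and JIT in use.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class GenericCallBenchmark
{
    // Two hypothetical methods whose only difference is the type parameter.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static object Plain(object value) { return value; }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static object Generic<T>(object value) { return value; }

    // Times a single action in milliseconds.
    public static double TimeMs(Action body)
    {
        var sw = Stopwatch.StartNew();
        body();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Run(int iterations)
    {
        var input = new object();
        object sink = null;

        var plainMs = TimeMs(() =>
        {
            for (var ii = 0; ii < iterations; ii++) { sink = Plain(input); }
        });
        var genericMs = TimeMs(() =>
        {
            for (var ii = 0; ii < iterations; ii++) { sink = Generic<string>(input); }
        });

        Console.WriteLine("Plain: {0:F2}ms, Generic: {1:F2}ms", plainMs, genericMs);
        GC.KeepAlive(sink); // keep the result observable so the loops are not trivially dead
    }
}
```

Calling GenericCallBenchmark.Run(100000000) mirrors the iteration count used in this post; treat a single run as indicative at best.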
The first observation is more puzzling though. The built-in property clearly does not call GetComponent internally (or it wouldn’t have been faster), but it’s still slower than my own. So, what does it do? Before we answer that, let’s check out another surprising thing. While the transform property might be kinda complex, Transform.position should be really straightforward. I mean, just return the vector, that’s all! Measuring it now:
Transform t;

using (new Timer("Field.position"))
{
    for (var ii = 0; ii < NUMBER; ii++) { var p = m_Transform.position; }
}

using (new Timer("Property.position"))
{
    for (var ii = 0; ii < NUMBER; ii++) { var p = TransformProperty.position; }
}

using (new Timer("Built-in Property.position"))
{
    for (var ii = 0; ii < NUMBER; ii++) { var p = transform.position; }
}
Results:
Profiled Field.position: 2155.00ms
Profiled Property.position: 2160.00ms
Profiled Built-in Property.position: 3949.00ms
Whaat?! Even when we use lightning-fast field access, the moment we try to read position, execution time jumps to 2 seconds! In fact, that time is surprisingly close to what we got when accessing the built-in transform… and when we use both built-in properties, we get 4 seconds. It seems that any built-in Unity3D property has around 20ns of overhead! To understand how that happens, let’s look at Unity code. I believe decompiling UnityEngine.dll is technically illegal, but since it’s such an awesome source of information, I’m gonna do it anyway.
This is what the transform property looks like:
public Transform transform
{
    get { return this.InternalGetTransform(); }
}

[WrapperlessIcall]
[MethodImpl(MethodImplOptions.InternalCall)]
internal Transform InternalGetTransform();
And this is Transform.position:
public Vector3 position
{
    get
    {
        Vector3 vector3;
        this.INTERNAL_get_position(out vector3);
        return vector3;
    }
    set { this.INTERNAL_set_position(ref value); }
}

[WrapperlessIcall]
[MethodImpl(MethodImplOptions.InternalCall)]
private void INTERNAL_get_position(out Vector3 value);
They seem pretty similar, don’t they? In case you’re not very familiar with C#, what this code means is that both properties are actually implemented in native code. Communication between C# and native C++ carries its own overhead, and I believe this is what we’re seeing. Just calling any native code from C# takes an additional 20ns (on my machine).
This explains the cost of using built-in properties. This could also make you wary of using any built-in properties, including coordinates, object names, velocities etc. Unfortunately, these are not practical to cache, so you have to eat the cost. Fortunately, this cost is actually pretty small! A 15-fold relative increase might seem like a lot, but in absolute terms it’s still peanuts.
Actually, even the dreaded GetComponent call doesn’t look all that bad, especially the non-generic version. It’s only about 2-3 times slower than accessing the built-in property.
There’s still one interesting benchmark left, though. We don’t really know what the GetComponent method does in native code, but it clearly has to do some sort of search through all attached components. If that’s the case, it should become slower the more components we add to our gameobject. Let’s test that!
Code:
using (new Timer(name + " GetComponent"))
{
    for (var ii = 0; ii < NUMBER; ii++) { r = (Rigidbody)GetComponent(typeof(Rigidbody)); }
}
Our gameobjects:
Note that we’re using Rigidbody instead of Transform! This is because we need to search for the last component on the gameobject, to maximize search time (assuming the search is linear).
Results:
Profiled Complex GetComponent: 9745.00ms
Profiled Simple GetComponent: 5513.00ms
Just as expected, having a bunch of components on the gameobject can slow GetComponent down significantly. Note that if we tried to access the Transform component, there would be no slowdown (I’ve actually tried). This seems to indicate that GetComponent uses a simple linear search under the hood. It actually makes sense – given that most objects have fewer than 10 components, a simple linear search would probably be faster than any fancy-pants O(log n) algorithm, due to lack of overhead.
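To make the linear-search hypothesis concrete, here is a small sketch of a type lookup over a list. It is not Unity's implementation, which is native code we can't see; the class name is invented, plain .NET objects stand in for components, and the steps counter exists only to show why the last entry costs the most.

```csharp
using System;
using System.Collections.Generic;

public static class LinearComponentSearch
{
    // Returns the first entry that is an instance of `type`, or null if none is.
    // `steps` reports how many entries were inspected along the way.
    public static object Find(IList<object> components, Type type, out int steps)
    {
        steps = 0;
        for (var i = 0; i < components.Count; i++)
        {
            steps++;
            if (type.IsInstanceOfType(components[i]))
            {
                return components[i];
            }
        }
        return null;
    }
}
```

With a list such as new List<object> { "first", 1, 2.0, 'z' }, looking up typeof(string) inspects one entry while looking up typeof(char) inspects all four, which is the same shape as the Transform versus Rigidbody difference measured above.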
tl;dr
The takeaway from all this is:
- Unity’s built-in properties are slower than direct access, and that includes all properties, not just those for different components.
- However, they’re not slow enough to actually matter most of the time.
- GetComponent method is slower still, but even that is not the performance-killer. A single raycast is probably enough to drown a GetComponent call.
- You should probably cache the GetComponent calls that happen every frame. Don’t bother with occasional calls, they’re not that bad.
- There’s no need to cache built-in properties, unless the script in question runs literally thousands of times per frame.
- You should prefer non-generic methods to generic ones, especially in performance-critical sections
Note that I’ve done all these benchmarks on a Windows PC. The overheads might be different on other platforms, so it’s best to repeat all the tests yourself. Only then will you know for sure.
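When repeating the tests, keep in mind that a single run can mislead: the empty loop earlier ranged from 28 to 48ms between runs. One possible approach, sketched here with made-up names on plain .NET, is to run each measurement several times and look at the median rather than one sample.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class RepeatedMeasurement
{
    // Runs `body` `runs` times and returns each run's wall-clock time in milliseconds.
    public static double[] Measure(Action body, int runs)
    {
        var samples = new double[runs];
        for (var r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            samples[r] = sw.Elapsed.TotalMilliseconds;
        }
        return samples;
    }

    // Median of the samples (assumes at least one sample);
    // it is less sensitive to a single slow run than the mean.
    public static double Median(double[] samples)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        var mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For instance, RepeatedMeasurement.Median(RepeatedMeasurement.Measure(() => { for (var ii = 0; ii < 100000000; ii++) { } }, 5)) would summarise five runs of the empty loop in a single number.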
And one last note. While I called the GetComponent call “slow”, there is no faster alternative. I’ve tried to create one by caching an array of all components in OnEnable and searching through it. Without going into details, I can say that no code of mine outperformed GetComponent, despite having no managed-to-native transitions. So, in fact, GetComponent is actually quite fast for what it does!
The reason the generic versions of various methods are slower than the non-generic one is quite simple:
They were tacked on after the fact and essentially just call the non-generic ones, after getting the name of the type parameter as a String. That conversion to String is not a cheap operation (especially on mobile, although to be fair I last tested String-related mobile vs. desktop performance on a first-gen iPhone, so I’m guessing things have come a ways since then).
Behind the scenes, Unity is still indexing and working with everything by String rather than by type — the generic methods are there only to allow developers using Unity to get some compile-time type safety benefits.
This was my first thought too, but it doesn’t explain everything. I’ve tried my own generic and non-generic methods, and generic methods are slower. Even when literally the only difference between them is the addition of <T> in the generic one.
There may also be a performance overhead in terms of generated code, which may explain a lack of interest by UT in restructuring things around types instead of strings, but ISTR looking at the disassembled code and seeing it use the Type.Name property under the hood…
I never thought to measure which was the greater overhead, as at the time it hadn’t occurred to me generics might be slower.
Nice blog post, but I feel like your tests should also cover garbage before you start making performance tips. Because I have a feeling that a lot of the things you say aren’t so bad will in fact generate garbage, and that adds up on a handheld device and then the garbage collector comes more often, resulting in choppy performance.
I’m pretty sure that nothing that I benchmarked allocates any heap memory. You’re right though, I should’ve tested and mentioned that explicitly.
Alternatively, force a couple GC cycles before the test run in case there’s memory pressure elsewhere in the system…
Great post! OK, I’ll try not to use built-in properties in Update() calls. Thank you!
Which platform did you profile?
btw, transform.position is actually more complex than most people think. It computes the absolute position (which is not cached), not the local position. that’s why it does not just return a vector.
I ran the GetComponent Generic/Non-Generic test on a Unity 5.1.0f3 build and I got contrary results i.e. the non-generic GetComponent was ~twice as slow as the generic one:
Profiled GetComponent T: 5904.00ms
Profiled GetComponent: 10806.00ms
It would be nice to see these tests repeated periodically in newer versions of Unity. I’m especially curious about IL2CPP performance and Unity 5. My fear is that developers will just read this, assume it’s gospel, then write messier code to work around something that may have been optimized by the Unity team.
That’s actually a good idea. I might do that.
Also, it would be nice to clearly identify which version of Unity these tests were done on, were they using IL2CPP, etc. A disclaimer that newer versions might have optimized performance would also be handy.
It would be really great to see this test redone in the latest version of Unity, esp. with IL2CPP.
Tested in Unity 2017.1 on Windows 10, CPU i7 4770k, x86_64
Run in Editor:
Field: 673ms
Property: 1629ms
Built-in property: 2794ms
GetComponent<T>: 5440ms
GetComponent: 8674ms
Standalone build:
Field: 73ms
Property: 33ms
Built-in property: 616ms
GetComponent<T>: 1563ms
GetComponent: 2655ms
Alex, the same in my tests on Unity 2017.3 on Windows 10 64bit CPU i7-6700k.
In Editor:
Empty loop: 430ms
Field: 530ms
Property: 1350ms
Built-in property: 2480ms
GetComponent<T>: 4840ms
GetComponent(typeof()): 7980ms
Field.position: 3650ms
Property.position: 4450ms
transform.position: 5550ms
Standalone build:
Empty loop: 41ms
Field: 25ms
Property: 29ms
Built-in property: 479ms
GetComponent<T>: 1500ms
GetComponent(typeof()): 2450ms
Field.position: 1466ms
Property.position: 1456ms
transform.position: 1900ms
As a result: getting “transform” is lighter than GetComponent, but heavier than getting just a field. Also, getting “position” is heavier than getting “transform”.
Also standalone runs smoother than Editor mode ))
Thx! Great article
Nice article! I have been wondering about this for a while. I was also thinking of making some elaborate system to cache components and perform index lookups on the c# side, sounds like there is no need, phew!
Any chance to re-run these tests again in Unity 2019?
“Rigid Body Test”: I don’t think you can assume that if a rigid body is the last attached component, it is the last one. You should compare adding it first and last.
TLDR
#1 Item : Don’t treat this as gospel. Appreciate the need to time and profile on newer versions.
Even in the comment sections you see various people running test with different results.
Suggestion: You could add all these tests as a unit test inside any Unity project. Then, as you upgrade versions, you’ll get a sense of whether your profiling/optimization rules have changed. For instance, I don’t see your same timing pattern on GetComponent v. GetComponent().
Great article. Thanks.
Interesting article. I thought I would have a go at profiling the latest LTS release of Unity. Here’s my results.
Unity 2019.4.0f1 on Windows 10, CPU Intel Core i7-3770K CPU @ 3.50GHz
Windows standalone x86_64 build using Mono backend:
Profiled Empty cycle: 25.00ms
Profiled Field: 25.00ms
Profiled Property: 25.00ms
Profiled Built-in property: 1874.00ms
Profiled GetComponent<T>: 3323.00ms
Profiled GetComponent: 5867.00ms
Profiled Field.position: 2593.00ms
Profiled Property.position: 2581.00ms
Profiled Built-in Property.position: 4281.00ms
Windows standalone x86_64 build using IL2CPP backend:
Profiled Empty cycle: 0.00ms
Profiled Field: 0.00ms
Profiled Property: 0.00ms
Profiled Built-in property: 1849.00ms
Profiled GetComponent<T>: 6552.00ms
Profiled GetComponent: 8945.00ms
Profiled Field.position: 1643.00ms
Profiled Property.position: 1641.00ms
Profiled Built-in Property.position: 3462.00ms
My results cover both the Mono and IL2CPP scripting backends. The latter compiles C# to CIL, then to C++, and finally to native machine code, in which case I guess this eliminates the C#-to-native call overhead. Though I’m not sure what to make of the results – all of the tests are faster in the IL2CPP build than in the Mono build, except for the GetComponent tests.