Unity3D: Mythbusting performance

While we don’t have any actual development progress to report on, I’ve decided to use this blog for some (hopefully) useful articles on game development. Unless you’re working with the Unity3D engine, you can probably skip this post entirely.

I’ve been thinking lately about some performance “tips” for Unity3D that everyone seems to be repeating. One of the common tips is “don’t use GetComponent, it’s slow” and its stronger version “don’t use built-in properties like transform, they’re just a wrapper around GetComponent.” This seems right, but has anyone tested it? I know of one attempt, but it seems kinda lacking. I’ve repeated that one myself some time ago, but that was with an older version of Unity… and I think I can measure a lot more things. So, let’s get to benchmarking!

We’ll be using this little helper class for measuring performance:
Timer code
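
The original helper is not reproduced here, so below is only a minimal sketch of what such a class might look like, guessed from how it is used in this post (`using (new Timer("Empty")) { }` printing `Profiled Empty: 0.00ms`). The `log` parameter is an assumption added so the sketch runs outside Unity; inside Unity you would probably pass `Debug.Log`.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Scoped timer: starts counting when constructed, reports when disposed
// (that is, at the closing brace of the using block).
public sealed class Timer : IDisposable
{
    private readonly string m_Name;
    private readonly Action<string> m_Log;
    private readonly Stopwatch m_Stopwatch;

    // 'log' is a hypothetical hook, not part of the original helper.
    // It defaults to Console.WriteLine so the sketch works without Unity.
    public Timer(string name, Action<string> log = null)
    {
        if (log == null)
        {
            log = Console.WriteLine;
        }
        m_Name = name;
        m_Log = log;
        m_Stopwatch = Stopwatch.StartNew();
    }

    public void Dispose()
    {
        m_Stopwatch.Stop();
        var ms = m_Stopwatch.Elapsed.TotalMilliseconds;
        m_Log(string.Format(CultureInfo.InvariantCulture,
            "Profiled {0}: {1:0.00}ms", m_Name, ms));
    }
}
```

Wrapping the measured code in a `using` block keeps the start and stop calls paired automatically, which is why every benchmark in this post has the same shape.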


Obligatory benchmarking note: the times presented only make sense for one particular computer. Only use absolute values as ballpark estimates, if at all. I’ve also run the tests multiple times; the figures used are some typical values, not exact mean values or anything.

Let’s start with simple things. Measuring absolutely nothing:

using (new Timer("Empty")) { }
Profiled Empty: 0.00ms

Just as expected, this takes no time at all. Sanity check passed! Now, since we’re measuring pretty fast things, we’re gonna repeat every test a bunch of times. 100 000 000 times, to be exact. So, let’s measure an empty cycle:

   
const int NUMBER = 100000000;
using (new Timer("Empty cycle"))
{
    for (var ii = 0; ii < NUMBER; ii++) { }
}
Profiled Empty cycle: 38.00ms

This is gonna give us a baseline: on my machine, just the overhead of counting to 100 000 000 takes 38ms. Or, rather, 28-48 – I’ve tested a few times, and it’s not very stable. Let’s actually do something in the cycle now. The fastest way of accessing anything should be reading it from a class field where it’s cached; and let’s throw in a property access as well:

Setup:

    private Transform m_Transform;
    private Transform TransformProperty { get { return m_Transform; } }
    void OnEnable()
    {
        m_Transform = transform;
    }

Measurement:

    Transform t;
    using (new Timer("Field"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            t = m_Transform;
        }
    }
    using (new Timer("Property"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            t = TransformProperty;
        }
    }

Results:

Profiled Field: 29.00ms
Profiled Property: 191.00ms

Field access is just as fast as an empty cycle! This can be explained in two ways: first, the compiler (or JIT) might have just optimised the access away – since we’re not doing anything with it. Second, field access itself is so fast that it’s lost in the noise. I think the first is more likely to be right, because accessing a property, as opposed to a field, added a whole 150ms to our running time. While a property should add some overhead, I don’t think it’s adding more than an order of magnitude – so probably, some of these 150ms is just field access that wasn’t optimised away. In any case, it’s still nothing to worry about: 150ms over 100 000 000 accesses gives about 1.5 nanoseconds per access.
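
The per-access arithmetic is easy to get wrong by a factor of a thousand, so here is a tiny sketch of it. The helper name and signature are made up for illustration; it simply subtracts the empty-cycle baseline and converts milliseconds to nanoseconds.

```csharp
using System;

public static class BenchmarkMath
{
    // Cost of a single iteration in nanoseconds, after subtracting the
    // time the empty loop takes on its own. 1 ms = 1 000 000 ns.
    public static double NanosecondsPerIteration(double totalMs, double baselineMs, long iterations)
    {
        return (totalMs - baselineMs) * 1000000.0 / iterations;
    }
}
```

With the figures above, `NanosecondsPerIteration(191, 38, 100000000)` comes out at roughly 1.5. Given how noisy the baseline is, anything past the first digit should not be taken seriously.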

OK, that was just a property that we declared ourselves – basically, we just confirmed that accessing a cached component is fast. What about the “evil” built-in property?

Measurement:

    Transform t;
    using (new Timer("Built-in property"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            t = transform;
        }
    }

Results:

Profiled Built-in Property: 1937.00ms

That’s quite a lot slower! Around 12-15 times slower, in fact, compared to our cached property. What’s going on? Is Unity actually calling GetComponent behind the scenes? To find out, let’s measure the GetComponent call itself!

Measurement:

    Transform t;
    using (new Timer("GetComponent<T>"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            t = GetComponent<Transform>();
        }
    }
    using (new Timer("GetComponent"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            t = (Transform)GetComponent(typeof(Transform));
        }
    }

Results:

Profiled GetComponent<T>: 9258.00ms
Profiled GetComponent: 4882.00ms

There are two surprises here. First, GetComponent is actually slower than the built-in property. Second, the generic version of this method is about twice as slow as its non-generic counterpart! The latter observation seems to be relevant for all generic methods: the overhead of calling a generic method is considerably larger than that of a non-generic one. I’ve experimented with different methods, including ones I’ve written myself, and this seems to hold.

The first observation is more puzzling though. The built-in property clearly does not call GetComponent internally (or it wouldn’t have been faster), but it’s still slower than my own. So, what does it do? Before we answer that, let’s check out another surprising thing. While the transform property might be kinda complex, Transform.position should be really straightforward. I mean, just return the vector, that’s all! Measuring it now:

Measurement:

    Transform t;
    using (new Timer("Field.position"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            var p = m_Transform.position;
        }
    }
    using (new Timer("Property.position"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            var p = TransformProperty.position;
        }
    }
    using (new Timer("Built-in Property.position"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            var p = transform.position;
        }
    }

Results:

Profiled Field.position: 2155.00ms
Profiled Property.position: 2160.00ms
Profiled Built-in Property.position: 3949.00ms

Whaat?! Even when we use lightning-fast field access, the moment we try to read position, execution time jumps to 2 seconds! In fact that time is surprisingly close to what we got when accessing the built-in transform… and when we use both built-in properties, we get 4 seconds. It seems that any built-in Unity3D property has around 20ns overhead! To understand how that happens, let’s look at Unity’s code. I believe decompiling UnityEngine.dll is technically illegal, but since it’s such an awesome source of information, I’m gonna do it anyway:

viewing this is probably illegal, but illuminating

This is what transform property looks like:

public Transform transform
{
    get
    {
       return this.InternalGetTransform();
    }
}
[WrapperlessIcall]
[MethodImpl(MethodImplOptions.InternalCall)]
internal Transform InternalGetTransform();

And this is Transform.position:

public Vector3 position
{
    get
    {
        Vector3 vector3;
        this.INTERNAL_get_position(out vector3);
        return vector3;
    }
    set
    {
        this.INTERNAL_set_position(ref value);
    }
}
[WrapperlessIcall]
[MethodImpl(MethodImplOptions.InternalCall)]
private void INTERNAL_get_position(out Vector3 value);

They seem pretty similar, don’t they? In case you’re not very familiar with C#, what this code means is that both properties are actually implemented in native code. Communication between C# and native C++ carries its own overhead; and I believe this is what we’re seeing. Just calling any native code from C# takes an additional 20ns (on my machine).

This explains the cost of using built-in properties. This could also make you wary of using any built-in properties – including coordinates, object names, velocities etc. Unfortunately, these are not practical to cache, so you have to eat up the cost. Fortunately, this cost is actually pretty small! A 15-fold relative increase might seem like a lot, but in absolute terms it’s still peanuts.

Actually, even the dreaded GetComponent call doesn’t look all that bad, especially the non-generic version. It’s only about 2-3 times slower than accessing the built-in property.

There’s still one interesting benchmark left, though. We don’t really know what the GetComponent method does in native code, but it clearly has to do some sort of search through all attached components. If that’s the case, it should become slower the more components we add to our gameobject. Let’s test that!

Code:

    using (new Timer(name+" GetComponent"))
    {
        for (var ii = 0; ii < NUMBER; ii++)
        {
            r = (Rigidbody)GetComponent(typeof(Rigidbody));
        }
    }

Our gameobjects:

ccg_gameobjects

Note that we’re using Rigidbody instead of Transform! This is because we need to search for the last component on the gameobject, to maximize search time (assuming the search is linear).

Results:

Profiled Complex GetComponent: 9745.00ms
Profiled Simple GetComponent: 5513.00ms

Just as expected, having a bunch of components on the gameobject can slow GetComponent down significantly. Note that if we tried to access the Transform component, there would be no slowdown (I’ve actually tried). This seems to indicate that GetComponent uses a simple linear search under the hood. It actually makes sense – given that most objects have fewer than 10 components, a simple linear search would probably be faster than any fancy-pants O(log n) algorithms, due to lack of overhead.
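
To make the “linear search” guess concrete, here is a small sketch in plain C#. It is only an illustration of the idea, not Unity’s actual native implementation (which we can’t see); the class and method names are made up, and ordinary objects stand in for components.

```csharp
using System;
using System.Collections.Generic;

public static class ComponentSearch
{
    // Walks the list front to back and returns the first entry of the requested
    // type. 'examined' reports how many entries had to be checked, which is what
    // grows when the wanted component sits at the end of a long list.
    public static object FindFirst(IList<object> components, Type type, out int examined)
    {
        examined = 0;
        for (var i = 0; i < components.Count; i++)
        {
            examined++;
            if (type.IsInstanceOfType(components[i]))
            {
                return components[i];
            }
        }
        return null; // nothing of that type is attached
    }
}
```

If the search really works like this, a component that comes first in the list is found after one check no matter how many others follow it, which would match the observation that looking up Transform does not slow down.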

tl;dr

The takeaway from all this is:

  • Unity’s built-in properties are slower than direct access, and that includes all properties, not just those for different components.
  • However, they’re not slow enough to actually matter most of the time.
  • The GetComponent method is slower still, but even that is not a performance killer. A single raycast is probably enough to drown a GetComponent call.
  • You should probably cache the GetComponent calls that happen every frame. Don’t bother with occasional calls, they’re not that bad.
  • There’s no need to cache built-in properties, unless the script in question runs literally thousands of times per frame.
  • You should prefer non-generic methods to generic ones, especially in performance-critical sections.

Note that I’ve done all these benchmarks on a Windows PC. The overheads might be different on different platforms, so it’s best to repeat all tests yourself. Only then will you know for sure.

And the last note. While I called the GetComponent call “slow”, there is no faster alternative. I’ve tried to create one, by caching an array of all components in OnEnable, and searching through it. Without going into details, I can say that no code of mine outperformed GetComponent, despite having no managed-to-native transitions. So, in fact, GetComponent is actually quite fast for what it does!

How we screwed up

There have been no new posts here for almost half a year. This is mostly caused by the fact that writing about your failure is kinda hard. Still, I think I have to write this – I owe that to everyone who was interested in Xenos; and maybe this tale will prevent someone from making the same mistakes.

We stopped working on Xenos sometime in December. It was not a single decision we made: rather, it just happened organically. Working less and less, diverting into side projects, saying things like “next week I’m gonna work on Xenos at last” – gradually, we just stopped making any progress whatsoever. It was disheartening to realise that; but at the same time, somewhat liberating.

In January, I did some full-time contract work. When that ended, I “officially” pulled the plug on Xenos. We were spending a lot of effort, but not making any progress – so it made no sense to continue.

I’ve been thinking (obviously!) about what went wrong, and here’s what I’ve got. The turning point was my decision back in June to switch to a new, better and more detailed art style. I was overexcited then, seeing other people get interested in my game and offer help; and in this excitement did not think clearly. The problem was not the art itself (although even that did not live up to expectations). The problem was that more detailed art resulted in increased complexity across the board.

When I was first envisioning Xenos, I settled on a game that was just complex enough so that I could pull it off. A little more, and it would become impossible… new art added more than a little. Most of this added complexity happened because of tiles. The old version of Xenos was tile-based – all objects were placed on a strict grid. New art required objects with different sizes and rotations, so the grid had to go. With it, a lot of simplifying assumptions were gone: no more “only one object per tile”, no more easy pathfinding, etc. New art was also more detailed, requiring much more complicated physics. Suddenly, floors and walkways that were flush with the ground weren’t cutting it – I had to add elevation. This complicated the pathfinding even more, and animations… well, animations were a pain in the ass even without that. And so on, complication after complication arose.

All this stuff required a lot of coding. And a lot of arguments with artists – as they wanted the game to be prettier, while I was fighting to keep complexity low. The development was getting too slow – which is bad for motivation. And on the other hand – the new art was not good enough to inspire. I mean, it was technically better than my voxel creations. Way better, even. But, even compared to ugly voxels, it was plain and generic-looking. I had some talented artists working on different pieces, but no one to coordinate that work and ensure that all of these pieces form a stylish whole. This did not help motivation either.

To top it off, while struggling with all these problems, I’d lost my vision for the game. The mechanics that made sense in the old version had to be re-worked, and they, too, did not form any coherent whole. I finally decided to stop working on Xenos when I noticed this. You can possibly overcome all difficulties when you know what you’re doing; but without any vision, continuing just made no sense.

So, where does this leave us? I’m not yet ready to throw in the towel on this “indie development” thing, even if my first project failed. We’ve spent some non-insignificant money on Xenos, but still have enough to continue (thanks to some contract work I’ve done lately). Notice that I’m using plural here: Chaos Cult Games now consists of two people working together, helped by some friends and contract workers from time to time. Xenos is not completely dead either. We’re not working on it right now, but we did a lot of work, and might well use it later. Xenos would have to change, if we ever return to it, but it is still possible.

In the meantime, we took part in a game jam, making a quick little game that some people loved. We’re not going to develop it further, but you can play it here: http://chaoscultgames.com/play-darwins-paradox/ This taught us some lessons about rapid development and prototyping, which we’re going to apply to our next project. We already have a prototype for a dungeon-crawl game, some ideas for a space sim, and more. We’re gonna settle on one of those (or maybe some completely new and wild idea!) soon. And once we do, you can expect to see a playable demo as fast as possible – that’s actually one of the lessons from the game jam. So, see you then!

Animation woes

For the last two weeks, I started every day working on Xenos with the resolution: “Today, I’m going to finish working on robot animations and write a post about it!” As you can see, this didn’t work out exactly as planned.

Alien scout close-up

We got the first robot – we call it a “scout” – modeled and textured some time in September. However, animating it proved a really tough undertaking. At first, the animations just didn’t look right. It turns out that making robotic motions is hard, especially when the robot is not a giant multi-ton vaguely-humanoid mecha, as everyone is used to. Alien scouts in Xenos are relatively small and have thin legs; when animated they mostly looked like an insect or a little bird, not a mechanism. It took us a week of heated discussions and constant reworking to arrive at something nice (and even now I’m not entirely satisfied… maybe there’s something we’ve overlooked?) Add to this another problem that appeared as soon as I added the first robot to the game: speed.

The player in Xenos should be able to move at quite a brisk pace: otherwise it’s just too frustrating to spend several seconds on crossing the road. Animation-wise, this translates to running, which is (probably) fine for humans, but robots are usually associated with slow, deliberate movements. (Yes, I’ve heard about Cheetah. But the thing is, its movements are not very robotic.) While we have a running animation for the scout, it’s not super-convincing.

When you have walk and run animations, the next thing you want is to blend them together, so that the robot can move at any speed (between walk and run). Which brings me to the next problem: Mecanim. It’s a hyped addition to the Unity game engine that is supposed to greatly simplify setting up animations. Well, in short, it does not. I’ve had some problems with Mecanim before, but this time I’ve decided to really give it a chance… and after spending 3 days wrangling with the system, I am certain that it is horrible, broken and nearly unusable. Or, well, to be completely honest, it’s probably nice – provided you agree to do everything the Mecanim way. It is made with a very specific development process and architecture in mind, and does not really work otherwise… and, of course, Xenos does not adhere to Mecanim ideas.

After stripping the whole Mecanim nonsense and reimplementing blending with code, I finally had a working robot in-game… but that does not mean that work is finished! To be actually interesting, the robot has to do something besides walking around. I’m working on it right now: adding some AI to the robots and making them interact with the player.
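
For readers wondering what “blending with code” boils down to, here is one common way to do it, sketched under assumptions: it is not necessarily what Xenos does. The function turns the current movement speed into a weight for the run animation, and the walk animation gets the remainder. All names and the linear mapping are illustrative.

```csharp
public static class LocomotionBlend
{
    // Weight of the run animation in [0, 1]; the walk weight is 1 minus this.
    // walkSpeed and runSpeed are the speeds at which each clip looks right.
    public static float RunWeight(float speed, float walkSpeed, float runSpeed)
    {
        if (runSpeed <= walkSpeed)
        {
            // Degenerate setup: switch instead of blending.
            return speed >= runSpeed ? 1f : 0f;
        }
        var t = (speed - walkSpeed) / (runSpeed - walkSpeed);
        if (t < 0f) return 0f;
        if (t > 1f) return 1f;
        return t;
    }
}
```

For example, with a walk speed of 1 and a run speed of 3, a robot moving at speed 2 gets equal parts of both clips. In Unity’s legacy animation system such weights would typically be assigned to the two animation states every frame.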

This is how it should work, when it’s finished. First, I have an Alien Spawn System, that handles spawning and removing robots. The idea behind this is, I don’t want to simulate all the individual robots in the game world – there would be hundreds of them, and this has no immediate benefit. The player would probably never notice this, especially since all robots (of the same type) are exactly alike. But I still want to simulate alien activities on some level. So, the Spawn System is tasked with keeping track of general “danger level” of all places. Whenever a player is in a dangerous place (that is, almost all the time), the System spawns a number of robots depending on danger level, somewhere relatively close, but far enough to be off-screen.

The robots have an AI that causes them to approach the player. But not directly – I want the player to feel that robots don’t care about him at all, not that they’re all out to hunt him. So, the AI actually selects a target near the player – for example, the scout would select some environment object, approach it and play a “scanning” animation. This would show that aliens really do some stuff here, and ensure that the player is actually around to see it.

After spending some time near the player, the robot AI then selects some destination far away, and goes there. The Spawn System checks all robots, and removes the ones that went far enough away from the player: the player is not there to see them, so they’re not needed.
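
The spawn-and-remove loop described above could be sketched roughly like this. This is a simplified illustration, not the game’s code: the distances, the 0 to 1 danger scale and the linear mapping from danger to robot count are all made-up assumptions, and robots are reduced to bare positions.

```csharp
using System;
using System.Collections.Generic;

public struct Point
{
    public float X, Y;
    public Point(float x, float y) { X = x; Y = y; }
    public float DistanceTo(Point other)
    {
        var dx = X - other.X;
        var dy = Y - other.Y;
        return (float)Math.Sqrt(dx * dx + dy * dy);
    }
}

public sealed class AlienSpawnSystem
{
    public float SpawnDistance = 30f;    // assumed: far enough to be off-screen
    public float DespawnDistance = 60f;  // assumed: too far for the player to notice
    public readonly List<Point> Robots = new List<Point>();
    private readonly Random m_Random;

    public AlienSpawnSystem(int seed) { m_Random = new Random(seed); }

    // Maps a danger level in [0, 1] to the number of robots we want around.
    public static int DesiredRobotCount(float dangerLevel, int maxRobots)
    {
        var clamped = Math.Max(0f, Math.Min(1f, dangerLevel));
        return (int)Math.Round(clamped * maxRobots);
    }

    public void Update(Point player, float dangerLevel, int maxRobots)
    {
        // Robots that wandered off are simply forgotten.
        Robots.RemoveAll(r => r.DistanceTo(player) > DespawnDistance);

        // Top up to the desired count, spawning on a ring around the player.
        var desired = DesiredRobotCount(dangerLevel, maxRobots);
        while (Robots.Count < desired)
        {
            var angle = m_Random.NextDouble() * 2.0 * Math.PI;
            Robots.Add(new Point(
                player.X + SpawnDistance * (float)Math.Cos(angle),
                player.Y + SpawnDistance * (float)Math.Sin(angle)));
        }
    }
}
```

A real version would also have to check that the spawn point is walkable and actually out of view; the ring placement here ignores the map entirely.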

Or that’s how it goes if the player does not interfere. If the robot sees the player, it becomes alerted: it stops and looks directly at the player. A couple of seconds later, if the player does not run away, the robot attacks. The idea behind this is to make the robot’s intentions crystal clear: at first, it pays no heed to the player. Then, it becomes interested. Then, aggressive. The “alerted” state also gives a chance to run: the robot’s shots are quite deadly and hard to dodge (or they will be, once I actually implement them properly).
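
The three stages map naturally onto a tiny state machine. The sketch below is an assumption about how it could be structured, with hypothetical names; the two-second alert duration stands in for “a couple of seconds”, and what calms an aggressive robot down is not described in this post, so the sketch leaves it aggressive.

```csharp
public enum RobotState { Unaware, Alerted, Aggressive }

public sealed class RobotAwareness
{
    public float AlertDuration = 2f; // assumed value for "a couple of seconds"
    public RobotState State { get; private set; }
    private float m_AlertedTime;

    // Call once per frame with whether the robot currently sees the player.
    public void Tick(bool seesPlayer, float deltaTime)
    {
        switch (State)
        {
            case RobotState.Unaware:
                if (seesPlayer)
                {
                    State = RobotState.Alerted; // stop and stare
                    m_AlertedTime = 0f;
                }
                break;

            case RobotState.Alerted:
                if (!seesPlayer)
                {
                    State = RobotState.Unaware; // the player ran away in time
                    break;
                }
                m_AlertedTime += deltaTime;
                if (m_AlertedTime >= AlertDuration)
                {
                    State = RobotState.Aggressive;
                }
                break;

            case RobotState.Aggressive:
                // Leaving this state is not covered here; stay aggressive.
                break;
        }
    }
}
```

Keeping the alert timer inside the state machine means the grace period restarts every time the robot spots the player again, which fits the goal of always giving the player a chance to run.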

This system, however, requires adding more animations – scanning objects, turning around looking for the player, etc. Which is why I’m not showing videos just yet – there’s still a lot to do until it’s finished, even though every day it seems that we’re almost there.

To make up for this post’s lack of pictures, though, I can show a couple of screens of the environment. We’re slowly but surely adding more and more stuff to the city, and it’s starting to look nice. It’s way easier when stuff is not animated (-8

A robot takes a stroll in the park

A drugstore

Street corner with a billboard

Fuel station.

Building a city

Let’s talk a bit about map generation in Xenos. I started development of the game by making a procedural map generation system, and I talked about it in the very first post. However, then we decided to change everything, and the old system was summarily ripped out and replaced with a new, kinda-similar-but-not-quite one. So, how does Xenos create maps now?

I still have not solved it in general – the current algorithm is heavily geared toward creating town maps. On the other hand, it’s simple and it works. As you may remember, maps in new Xenos are made of chunks that are approximately 10×10 old tiles in size. A single chunk can contain a building or a length of road – i.e. open street. And a single “location” is 20×20 chunks – for now, this is the whole of Xenos, but in the future I’ll add more locations that connect seamlessly.

Creating road grid

The generation of a location starts with a road grid. I used a very simple algorithm: imagine an empty 20×20 grid. Then, mark some rows and columns of it as “road” – making horizontal and vertical lines 1-3 squares apart. You end up with a grid of streets and city blocks, but it looks much too regular. To break up the regularity a bit, let’s simply remove some roads. First, going horizontally along every row, we randomly remove some roads between neighbouring blocks, in effect merging them. Then repeat the same, only going vertically along every column. As a result, you get a nice irregular pattern of streets.
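
Here is a rough sketch of that road-grid step, written from the description above rather than taken from the game. Several details are assumptions: the first line always starts at index 0, a “road between neighbouring blocks” is taken to mean the run of road chunks between two intersections, and the merge probability is a single constant.

```csharp
using System;

public static class RoadGrid
{
    // Returns a size x size grid; true marks a road chunk, false a block chunk.
    public static bool[,] Generate(int size, Random random, double mergeProbability)
    {
        var roadRows = PickLines(size, random);
        var roadColumns = PickLines(size, random);

        // Step 1: the regular grid of full-length streets.
        var road = new bool[size, size];
        for (var y = 0; y < size; y++)
            for (var x = 0; x < size; x++)
                road[x, y] = roadRows[y] || roadColumns[x];

        // Step 2: walk along every road row, dropping whole segments...
        for (var y = 0; y < size; y++)
            if (roadRows[y])
                RemoveSegments(road, roadColumns, y, true, random, mergeProbability);

        // ...then do the same along every road column.
        for (var x = 0; x < size; x++)
            if (roadColumns[x])
                RemoveSegments(road, roadRows, x, false, random, mergeProbability);

        return road;
    }

    // Marks road lines with 1-3 non-road squares between consecutive lines.
    private static bool[] PickLines(int size, Random random)
    {
        var lines = new bool[size];
        for (var i = 0; i < size; i += 1 + random.Next(1, 4))
            lines[i] = true;
        return lines;
    }

    // With the given probability, removes each run of road chunks that lies
    // between two crossings (or between a crossing and the map edge).
    private static void RemoveSegments(bool[,] road, bool[] crossings, int fixedIndex,
                                       bool alongRow, Random random, double probability)
    {
        var i = 0;
        while (i < crossings.Length)
        {
            if (crossings[i]) { i++; continue; }
            var start = i;
            while (i < crossings.Length && !crossings[i]) i++;
            if (random.NextDouble() >= probability) continue;
            for (var k = start; k < i; k++)
            {
                if (alongRow) road[k, fixedIndex] = false;
                else road[fixedIndex, k] = false;
            }
        }
    }
}
```

Calling `RoadGrid.Generate(20, new Random(), 0.3)` gives one location-sized grid. The sketch makes no attempt to keep the road network connected or to limit how large merged blocks get.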

There are a couple of gotchas to watch out for: first, we shouldn’t merge too many blocks in a row – we want to have enough roads for driving. This is easily accomplished by tweaking the probability of a merge (a larger block means a smaller probability). Second, and more important, is to check for disjoint roads. Sometimes, when we remove a piece of road, the whole grid would be split into two unconnected parts, and we really don’t want that. Fortunately, that is pretty straightforward to check.
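
One straightforward way to do the disjoint-roads check is a flood fill: start from any road chunk, spread to neighbouring road chunks, and see whether everything was reached. The sketch below assumes the road map is a `bool[,]` with true for road; that representation is my choice for illustration.

```csharp
using System.Collections.Generic;

public static class RoadConnectivity
{
    // True when every road chunk can be reached from every other one
    // by stepping between horizontally or vertically adjacent road chunks.
    public static bool IsConnected(bool[,] road)
    {
        int width = road.GetLength(0), height = road.GetLength(1);
        int total = 0, start = -1;
        for (var y = 0; y < height; y++)
            for (var x = 0; x < width; x++)
                if (road[x, y])
                {
                    total++;
                    if (start < 0) start = y * width + x;
                }
        if (total == 0) return true; // no roads, nothing to split

        var visited = new bool[width, height];
        var queue = new Queue<int>();
        queue.Enqueue(start);
        visited[start % width, start / width] = true;
        var reached = 0;
        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };

        while (queue.Count > 0)
        {
            var cell = queue.Dequeue();
            reached++;
            int cx = cell % width, cy = cell / width;
            for (var d = 0; d < 4; d++)
            {
                int nx = cx + dx[d], ny = cy + dy[d];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                if (!road[nx, ny] || visited[nx, ny]) continue;
                visited[nx, ny] = true;
                queue.Enqueue(ny * width + nx);
            }
        }
        return reached == total;
    }
}
```

In a generator this would typically be used as: remove a road segment tentatively, call `IsConnected`, and put the segment back if the answer is false. On a 20×20 grid the full flood fill is cheap enough to run after every removal.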

Once we have the road pattern, the next step is to place actual streets. Again, a pretty straightforward algorithm goes over every chunk and selects a matching piece of street – a straight road, a corner or an intersection.
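
Picking the matching piece usually comes down to looking at which of the four neighbours are also road. The following is a sketch of that idea with hypothetical names; it ignores which way the piece has to be rotated, and the `DeadEnd` case is my addition for chunks with fewer than two road neighbours, which the text above does not mention.

```csharp
public enum StreetPiece { None, Straight, Corner, Intersection, DeadEnd }

public static class StreetPieces
{
    // Chooses a piece type for the chunk at (x, y) from its 4-neighbourhood.
    public static StreetPiece Select(bool[,] road, int x, int y)
    {
        if (!IsRoad(road, x, y)) return StreetPiece.None;

        bool north = IsRoad(road, x, y + 1);
        bool south = IsRoad(road, x, y - 1);
        bool east = IsRoad(road, x + 1, y);
        bool west = IsRoad(road, x - 1, y);
        int count = (north ? 1 : 0) + (south ? 1 : 0) + (east ? 1 : 0) + (west ? 1 : 0);

        if (count >= 3) return StreetPiece.Intersection; // T-junction or crossroads
        if (count == 2)
        {
            bool opposite = (north && south) || (east && west);
            return opposite ? StreetPiece.Straight : StreetPiece.Corner;
        }
        return StreetPiece.DeadEnd; // 0 or 1 road neighbours
    }

    // Chunks outside the map count as "not road".
    private static bool IsRoad(bool[,] road, int x, int y)
    {
        return x >= 0 && y >= 0 && x < road.GetLength(0) && y < road.GetLength(1) && road[x, y];
    }
}
```

The same four booleans also determine the rotation of the piece, so a fuller version would return both the type and an orientation.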

Next we have a “special chunks” step. The idea behind it is, what if we want our map to have, e.g. one or two fuel stations, no more, no less? We might have a list of these “special” objects with respective counts. After streets are placed, the next step is going through this list and finding random places to fit objects. Some of these may take up more than one chunk – for example, our fuel station is 2×2, with 2 of those chunks overlapping the street. Every special object has some code attached that basically answers one question: “given the map generated so far, can this object be placed at point X?” In the above example, the fuel station, that piece of code checks for a length of straight road to “attach” the station to.
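
The “can this object be placed here?” question fits nicely into a delegate. The sketch below shows one possible shape for it, with everything named hypothetically: the map is reduced to a road grid plus an “occupied” grid, the fuel station rule only handles one orientation, and “straight road” is approximated as two adjacent road chunks.

```csharp
using System;
using System.Collections.Generic;

// Answers: "given the map generated so far, can this object be placed at (x, y)?"
public delegate bool PlacementRule(bool[,] road, bool[,] occupied, int x, int y);

public static class SpecialChunks
{
    // Rule for a 2x2 fuel station anchored at (x, y): the two chunks in row y
    // must be road, the two chunks in row y + 1 must be free, non-road chunks.
    public static bool FuelStationFits(bool[,] road, bool[,] occupied, int x, int y)
    {
        if (x < 0 || y < 0 || x + 1 >= road.GetLength(0) || y + 1 >= road.GetLength(1))
            return false;
        for (var dx = 0; dx < 2; dx++)
        {
            if (occupied[x + dx, y] || occupied[x + dx, y + 1]) return false;
            if (!road[x + dx, y] || road[x + dx, y + 1]) return false;
        }
        return true;
    }

    // Collects every position the rule accepts and picks one at random.
    // Returns false when the object does not fit anywhere.
    public static bool TryFindPlace(PlacementRule rule, bool[,] road, bool[,] occupied,
                                    Random random, out int placeX, out int placeY)
    {
        int width = road.GetLength(0), height = road.GetLength(1);
        var candidates = new List<int>();
        for (var y = 0; y < height; y++)
            for (var x = 0; x < width; x++)
                if (rule(road, occupied, x, y))
                    candidates.Add(y * width + x);

        placeX = placeY = -1;
        if (candidates.Count == 0) return false;
        var pick = candidates[random.Next(candidates.Count)];
        placeX = pick % width;
        placeY = pick / width;
        return true;
    }
}
```

After a successful `TryFindPlace`, the caller would mark the object’s footprint in `occupied` so that later objects, and the final random fill, skip those chunks.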

Finally, when we have placed all special objects, the rest is filled with the most dumb algorithm imaginable – every chunk that is still empty is simply assigned a random object (well, not totally random, but random from the “applicable for town map” list).

And this is basically it – we have a town location that is random, more-or-less irregular (the grid of chunks is still apparent, of course, but it’s not bad in practice), and contains the stuff we want it to – but in unpredictable places. I think it’s pretty good for a game map.

Development video

Recently, while working on Xenos, we had an idea. What if we made a video of the development process? A lot of our work is not really exciting to watch: me typing code in Visual Studio is probably boring as hell. But some processes might be actually interesting.

So, today we present the Xenos dev video – first of many, I hope. This time it’s the process of making a single “chunk” of game map, containing a simple house. The house is not completely finished – it has no floor for example, and it could use more details – but the basic process is there.

The video also showcases some procedural capabilities: see how the house is built with a set of props, and then re-generated automatically with props changing and wallpapers having a different color. In the end, the location generation process is shown, where the new house is automatically integrated into our town map. (The map is generated very slowly – that’s a property of the Unity editor. In-game it works faster, although far from lightning-fast.)

Watch and enjoy