Quaternions – a primer

What ARE quaternions? I mean, you hear them talked about – what actually are they, and what are they used for?

Here’s a brief description and some usage cases. Hopefully this will help those who are struggling to understand what they are and how they are used.

Now, a quaternion, at root, is a direction (or, in math parlance, a vector; strictly, a vector also has a length, but for now we only care about its direction) and an amount of rotation around that direction. What does that mean?

 

What is a vector?

It means this. Point your finger at something. Anything. Just point your finger. The direction your finger is pointing in is a vector. It’s a direction. In three-dimensional terms, it’s a combination of three values: up/down, left/right, and forward/back.

Imagine it this way. Imagine your hand is at position 0,0,0. It’s the ‘origin’. Now point your finger. How far up or down your finger is pointing is expressed, in math, as a Y value (ie up/down) between -1 and 1 (for a direction of length 1, which we’ll get to below). A -1 means you are pointing exactly down. A 1 means you are pointing directly up.

It’s the same for X (left and right): it’s a value between -1 and 1, where -1 means you are pointing fully left, and 1 means you are pointing fully right. As an aside here, coordinate systems come in two flavors, left-handed and right-handed. With X pointing right and Y pointing up, the handedness decides which way positive Z points: in a right-handed system, +Z comes out of the screen towards you, and in a left-handed system, +Z goes into the screen, away from you. There’s no deep reason for either; it’s a convention, and both are in common use (OpenGL has traditionally used right-handed coordinates and Direct3D left-handed ones). It’s just one of those things you need to know up front: whether the system you are using is left-handed or right-handed.

And similarly for Z (depth, or forward and backward). In a left-handed system, a -1 here means it’s coming out of the screen, towards you, wherever the camera is, and a 1 means it’s heading exactly away from you. In a right-handed system, those two are swapped.

Note: it’s worth pointing out that it’s entirely possible for a vector to have a zero in one or two of its components. So a vector of (1, 0, 0) is totally legal, and is saying the direction is right. There is no up/down and no back/forward value at all; we are just moving right.

A note on Normalizing.

Now, you’ll note that most of the examples I’ve given here are between -1 and 1 for each axis. Why is that? A vector’s components don’t actually need to be between -1 and 1; in fact, they can be any numbers at all. x=100, y=200, z=50 is a perfectly legal vector. So why the insistence on keeping numbers between -1 and 1?

The answer is normalizing, although it’s a little more involved than just keeping each number between -1 and 1. A normalized vector is a vector whose length is exactly 1: if you square each of the X, Y and Z values and add them up, you get 1. Or, to put it another way, if you drew a line from 0,0,0 to the vector position, the length of that line would be exactly 1.

Now, interestingly, any vector (other than 0,0,0, which has no direction at all) can be ‘normalized’. For example, a vector of 0.1, 0.8, 0.1 points in exactly the same direction as 100, 800, 100. One is just a ‘smaller’ version of the other. Neither of them is normalized, though, because neither has a length of 1. To normalize a vector, take the length of the line from 0,0,0 to the vector position (you can use Pythagoras’ theorem for that: length = sqrt((x * x) + (y * y) + (z * z))) and then divide each of the axis values by that length. Both of the vectors above come out as roughly 0.123, 0.985, 0.123.
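
The normalizing recipe is easy to sketch in Python. This is a minimal illustration rather than any particular engine’s code; the function names `length` and `normalize` are just made up for the example, and it refuses to normalize the zero vector, since that has no direction.

```python
import math


def length(v):
    """Length of a 3D vector, using Pythagoras' theorem."""
    x, y, z = v
    return math.sqrt(x * x + y * y + z * z)


def normalize(v):
    """Return a vector pointing the same way as v, but with a length of exactly 1."""
    n = length(v)
    if n == 0.0:
        raise ValueError("the zero vector has no direction, so it can't be normalized")
    x, y, z = v
    return (x / n, y / n, z / n)


small = normalize((0.1, 0.8, 0.1))
big = normalize((100.0, 800.0, 100.0))
print(small)          # roughly (0.123, 0.985, 0.123)
print(big)            # same direction, so the same values (up to rounding)
print(length(small))  # 1.0, give or take floating-point rounding
```
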

Now why bother? Why do we do this? Well, it’s because certain other things you can do with vectors, like the dot product and cross product (math functions that I’m not going to go into now, but that are really useful down the line for trigonometry), rely on the vectors being normalized to give useful results. For example, the dot product of two normalized vectors is the cosine of the angle between them. Normalized vectors are useful because all they represent is a direction, not a position. They are, by definition, an offset from 0,0,0. You can then multiply a normalized vector by any number to get a line of that length, which is really useful for projecting a position out into the world.
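
To make the dot product point concrete, here’s a small sketch (again, the function names are only illustrative). Because both inputs are normalized first, their dot product is the cosine of the angle between them; the cross product of two directions gives a third direction at right angles to both.

```python
import math


def normalize(v):
    """Scale v to length 1 (assumes v isn't the zero vector)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    """Dot product: multiply matching components and add them up."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product: a vector at right angles to both a and b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def angle_between_degrees(a, b):
    """Angle between two directions. After normalizing, the dot product is cos(angle)."""
    c = dot(normalize(a), normalize(b))
    c = max(-1.0, min(1.0, c))  # rounding can push c fractionally outside [-1, 1]
    return math.degrees(math.acos(c))


print(angle_between_degrees((1, 0, 0), (0, 1, 0)))  # about 90
print(angle_between_degrees((5, 0, 0), (3, 3, 0)))  # about 45, lengths don't matter
print(cross((1, 0, 0), (0, 1, 0)))                  # (0, 0, 1)
```
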

Back to Quaternions.

So ok, we’ve defined what the vector, or direction, part of a quaternion is: it’s a 3D point measured from 0,0,0, which is normalized, ie it has a length of 1. What’s the rotation value, and what does it mean? Let’s go back to the finger pointing exercise. Point your finger in a direction. Any direction. Now rotate your wrist. Your finger still points in the same direction, but your hand has turned around it. So the rotation value of the quaternion describes how far you turn around the axis the vector is pointing along.
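
One detail worth knowing: a quaternion doesn’t literally store ‘a vector plus an angle’. In the usual formulation, a rotation of angle θ around a normalized axis (ax, ay, az) is stored as four numbers, (w, x, y, z) = (cos(θ/2), ax·sin(θ/2), ay·sin(θ/2), az·sin(θ/2)). This sketch assumes that (w, x, y, z) order (some engines store (x, y, z, w) instead), and the function name is illustrative.

```python
import math


def quat_from_axis_angle(axis, angle_degrees):
    """Unit quaternion (w, x, y, z) for a rotation of angle_degrees around axis.

    The axis is normalized first. Note the half angle: the stored numbers are
    cos(angle / 2) and the axis multiplied by sin(angle / 2), not the raw angle.
    """
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    ax, ay, az = ax / n, ay / n, az / n
    half = math.radians(angle_degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), ax * s, ay * s, az * s)


print(quat_from_axis_angle((0, 0, 1), 90))  # about (0.7071, 0.0, 0.0, 0.7071)
print(quat_from_axis_angle((0, 1, 0), 0))   # (1.0, 0.0, 0.0, 0.0): no rotation, whatever the axis
```

Two useful consequences: every rotation quaternion has a length of 1, and an angle of 0 always gives (1, 0, 0, 0), regardless of the axis.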

Why would we even care? The direction is still the direction; that hasn’t changed. What has changed is the orientation around it. Try this. Point your finger in a direction, and now extend your middle finger left, and your thumb up. These represent the X and Y axes attached to your hand: the middle finger is pointing along the X axis, and the thumb along the Y axis. Now rotate your wrist again. You’ll notice that your thumb and middle finger are now pointing in different directions, even though your pointing finger hasn’t moved. This means that the rotation has changed what counts as ‘up/down’ and ‘left/right’ for everything attached to your hand. If you rotate your wrist 90 degrees, X becomes Y, because it’s now pointing up, and Y becomes X, because it’s now pointing left/right. You can see how the rotation changes things.
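
Here’s the wrist-twist experiment as a sketch. It assumes the pointing finger lies along Z, quaternions stored as (w, x, y, z), and the standard way of rotating a vector with a unit quaternion: q times v times the conjugate of q, using the Hamilton product, with positive angles turning counter-clockwise in a right-handed system. The helper names are illustrative.

```python
import math


def quat_from_axis_angle(axis, angle_degrees):
    """Unit quaternion (w, x, y, z); axis is assumed to be normalized already."""
    half = math.radians(angle_degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a, b):
    """Hamilton product a * b of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q, computed as q * v * conjugate(q)."""
    w, x, y, z = q
    p = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return p[1:]  # drop the w part, which is (about) zero


# Pretend the pointing finger lies along Z, then twist the wrist 90 degrees around it.
twist = quat_from_axis_angle((0.0, 0.0, 1.0), 90.0)
print(rotate(twist, (0.0, 0.0, 1.0)))  # pointing finger: still about (0, 0, 1)
print(rotate(twist, (1.0, 0.0, 0.0)))  # old X axis: now about (0, 1, 0), it has become Y
print(rotate(twist, (0.0, 1.0, 0.0)))  # old Y axis: now about (-1, 0, 0), it has become X
```
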

This becomes important when you start putting quaternions on top of quaternions, because each child’s rotation is applied on top of the frame of reference its parent has already rotated. I’ll go into that more in the next bit.

What can I use Quaternions for?

The most common use of quaternions is in animation systems. A quaternion can represent a bone’s orientation in a hierarchical model of bones in a skeleton. (By hierarchical, I mean that bones have a parent/child relationship. A wrist bone is parented by the forearm, and that in turn is parented by the upper arm/shoulder bone. You end up with a tree of bones, each having children and parents. The reason for this is that bone positions and orientations accumulate down the tree: each bone inherits both rotation and position from its parent. If you rotate a shoulder, then all the bones underneath it move along with it, because they are attached to their parents. Then each bone underneath applies its own rotation on top of the rotation the parent already has. With quaternions, ‘on top of’ means multiplying the two together, not adding them.)
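
Here’s a sketch of that combining step. It assumes the common convention that a bone’s world rotation is parent_world × child_local (Hamilton product, vectors rotated as q times v times the conjugate of q), so the child’s own rotation is applied first and the parent’s on top. The shoulder and elbow angles are made up for the example.

```python
import math


def quat_from_axis_angle(axis, angle_degrees):
    """Unit quaternion (w, x, y, z); axis is assumed to be normalized already."""
    half = math.radians(angle_degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a, b):
    """Hamilton product a * b of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q, computed as q * v * conjugate(q)."""
    w, x, y, z = q
    return quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))[1:]


# Hypothetical angles: the shoulder turns 30 degrees about Z, and the elbow
# turns a further 60 degrees about Z relative to the upper arm.
shoulder_world = quat_from_axis_angle((0.0, 0.0, 1.0), 30.0)
elbow_local = quat_from_axis_angle((0.0, 0.0, 1.0), 60.0)
forearm_world = quat_mul(shoulder_world, elbow_local)  # combined by multiplying, not adding
print(rotate(forearm_world, (1.0, 0.0, 0.0)))  # about (0, 1, 0): 90 degrees in total

# Order matters once the axes differ: these two give different answers.
about_x = quat_from_axis_angle((1.0, 0.0, 0.0), 90.0)
about_z = quat_from_axis_angle((0.0, 0.0, 1.0), 90.0)
print(rotate(quat_mul(about_x, about_z), (1.0, 0.0, 0.0)))  # about (0, 0, 1)
print(rotate(quat_mul(about_z, about_x), (1.0, 0.0, 0.0)))  # about (0, 1, 0)
```
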

So for each bone, you don’t store a position, since you don’t need one. Your original skeleton definition already has position offsets for each bone from its parent, and those offsets don’t change frame to frame. The length of a bone doesn’t change; it’s a set thing for all animations. What you do have, per bone, is a quaternion, which describes the rotation of each limb relative to its parent, starting from wherever the end of the parent bone ends up in the world. So, to put it in more real-world terms, I know that my upper arm is 10 units long, because that’s in the root skeleton definition. I also know that the default direction of the upper arm bone is straight out along X, because root skeletons commonly define the arms as being flat out, stretched to the side. This is known as the T-pose, and it’s a widely used convention for the default definition of a skeleton. Now, the rotation in the quaternion is an offset from that raw skeleton position. So for the bone to point down, as it would for a ‘normal’ standing pose, we need to rotate the bone down by 90 degrees. The important thing is that the quaternion’s vector is the axis the bone swings around, not the direction the bone ends up pointing. To swing an arm from pointing along X to pointing down, the axis is Z (the axis going into or out of the screen), and the rotation is 90 degrees, with its sign chosen so that +X swings down to -Y (in a right-handed system, that’s -90 degrees). This is saying “Swing the bone down”. A quaternion with a rotation of 0 does nothing at all, whatever its vector, so it’s the rotation that does the work here.

What then happens is that, knowing the bone length is 10, you’d take the model position of the shoulder (which, again, we’d know from the root skeleton definition), take the bone’s default direction (+X), rotate it by the quaternion (combined with any rotations its parent bones already have), scale the result by 10, and add that to the shoulder position. That gives us the new position of where the upper arm ends, ie where the forearm begins.
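
Putting the upper-arm example together as a sketch: rotate the T-pose direction by the bone’s quaternion, scale it by the bone length, and add it to the shoulder position. The shoulder position is a made-up number, the arm is assumed to have no parent rotation (otherwise you’d rotate by the combined parent × child quaternion), and the -90 degree sign assumes a right-handed, counter-clockwise-positive convention with quaternions stored as (w, x, y, z).

```python
import math


def quat_from_axis_angle(axis, angle_degrees):
    """Unit quaternion (w, x, y, z); axis is assumed to be normalized already."""
    half = math.radians(angle_degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a, b):
    """Hamilton product a * b of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q, computed as q * v * conjugate(q)."""
    w, x, y, z = q
    return quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))[1:]


# Hypothetical rest-pose data for the upper-arm example.
SHOULDER_POS = (2.0, 15.0, 0.0)  # made-up model-space position of the shoulder
REST_DIR = (1.0, 0.0, 0.0)       # T-pose: the upper arm points straight out along +X
UPPER_ARM_LENGTH = 10.0

# Swing the bone -90 degrees about Z, which takes +X down to -Y.
arm_down = quat_from_axis_angle((0.0, 0.0, 1.0), -90.0)

direction = rotate(arm_down, REST_DIR)  # about (0, -1, 0)
elbow = tuple(p + UPPER_ARM_LENGTH * d for p, d in zip(SHOULDER_POS, direction))
print(elbow)  # about (2.0, 5.0, 0.0): where the forearm begins

# A zero-angle quaternion leaves the arm where it was, whatever its axis.
print(rotate(quat_from_axis_angle((0.0, -1.0, 0.0), 0.0), REST_DIR))  # about (1, 0, 0)
```
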

This is better than storing a real transformation matrix per bone (which can do the same thing, but in different ways) for two reasons. One is that it’s smaller. A full 4×4 matrix is 16 floats, and even the common compact form, a 3×4 matrix that drops the constant bottom row, is 12 floats. Per bone. If you have fifty bones in a model (and that’s a conservative estimate for a biped, for example, once you start including fingers), that’s 50×12 floats, or 50*12*4 = 2400 bytes with 4-byte floats, per frame of animation. A quaternion is only four floats per bone, so a single frame of animation is considerably smaller: 50*4 floats, or 50*4*4 = 800 bytes per frame. That’s a saving of 1600 bytes per frame, which, if you have thousands of frames (and most modern games do), is a significant saving.
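
The same arithmetic as a quick sketch, assuming 4-byte floats and the 50-bone skeleton from the text:

```python
BONES = 50
BYTES_PER_FLOAT = 4  # assuming 32-bit floats

full_matrix_bytes = BONES * 16 * BYTES_PER_FLOAT  # full 4x4 matrix per bone
matrix_bytes = BONES * 12 * BYTES_PER_FLOAT       # compact 3x4 matrix per bone
quat_bytes = BONES * 4 * BYTES_PER_FLOAT          # one quaternion per bone
saving = matrix_bytes - quat_bytes

print(full_matrix_bytes, matrix_bytes, quat_bytes, saving)  # 3200 2400 800 1600
print(saving * 1000)  # bytes saved over 1,000 frames: 1600000
```
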

But the other reason is even more significant. Rotation matrices can’t be meaningfully interpolated, and quaternions can.

What does that actually mean, though? I mean, it sounds good, but it’s really gobbledygook, isn’t it? Let’s go through it.

In our animation example, imagine we have three frames of animation. The animation runs at 12 Hz. Assuming we are rendering at 60 fps, that means we get a new frame of animation every 5 display frames. So for display frames 1 to 5 we show animation frame 1, on the 6th display frame we start showing animation frame 2, on the 11th we show animation frame 3, and so on.

But that’s not how animation systems actually work. What they do is actually interpolate between frames, based on how close you actually are to each frame.

So in our example, we have five frames of display, but not enough animation frames for each frame of display. So what animation systems do is take a percentage of frame 1 and frame 2, dependent on how close the rendered frame is to either, and then add those together. That’s not really any clearer, is it?

Ok, so in our example, we are rendering 5 display frames using animation frame 1. What we actually do is this: for display frame 1, we render 100 percent of animation frame 1. For display frame 2, because we have moved on in time towards animation frame 2, we take 4/5 (or 80%) of what animation frame 1 represents and 1/5 (or 20%) of what is in animation frame 2, and blend them together to generate a merged frame. We are, in effect, generating a new frame of animation from two others. This is called linear interpolation, or in game dev parlance, lerping.
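
Here’s a sketch of how those blend weights fall out of the two rates. The names are illustrative, and frames are numbered from 1 to match the text.

```python
ANIM_HZ = 12      # animation frames per second
DISPLAY_FPS = 60  # rendered (display) frames per second


def sample_point(display_frame):
    """For a display frame (numbered from 1), return
    (earlier_anim_frame, later_anim_frame, weight_of_later_frame).

    Animation frames are numbered from 1 too, to match the text.
    """
    t = (display_frame - 1) * ANIM_HZ / DISPLAY_FPS  # time, measured in animation frames
    base = int(t)
    weight = t - base
    return base + 1, base + 2, weight


for f in range(1, 7):
    a, b, w = sample_point(f)
    print(f"display frame {f}: {1 - w:.0%} of animation frame {a}, {w:.0%} of animation frame {b}")
# e.g. display frame 2: 80% of animation frame 1, 20% of animation frame 2
```
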

For quaternions, the actual effect is basically saying “Take all four numbers of the frame 1 quaternion and scale them by 0.8, do the same for frame 2, only scale by 0.2, then add those two together, re-normalize the result back to a length of 1, and that’s your interpolated frame”. Note that you blend the quaternion as a whole; you don’t scale the vector and the rotation angle separately.

Then, for display frame 3, the amounts you scale by change, so now it’s 0.6 for animation frame 1 and 0.4 for animation frame 2, because we are now getting further away from frame 1 and closer to frame 2. And so on.
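
A minimal sketch of that blend, usually called nlerp (normalized lerp), assuming quaternions stored as (w, x, y, z); the function names are illustrative. It adds one wrinkle: q and -q describe the same rotation, so if the two inputs have a negative dot product, one is flipped first so that the blend takes the short way round.

```python
import math


def quat_from_axis_angle(axis, angle_degrees):
    """Unit quaternion (w, x, y, z); axis is assumed to be normalized already."""
    half = math.radians(angle_degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def nlerp(q1, q2, t):
    """Blend two unit quaternions: t = 0 gives q1, t = 1 gives q2.

    Scales all four components, adds them, then renormalizes to length 1.
    """
    if sum(a * b for a, b in zip(q1, q2)) < 0.0:
        q2 = tuple(-c for c in q2)  # same rotation, on q1's side: take the short way round
    blended = tuple((1.0 - t) * a + t * b for a, b in zip(q1, q2))
    n = math.sqrt(sum(c * c for c in blended))
    return tuple(c / n for c in blended)


frame1 = quat_from_axis_angle((0.0, 0.0, 1.0), 0.0)   # bone at rest
frame2 = quat_from_axis_angle((0.0, 0.0, 1.0), 90.0)  # bone turned 90 degrees about Z
print(nlerp(frame1, frame2, 0.2))  # 80% of frame 1 plus 20% of frame 2, renormalized
print(nlerp(frame1, frame2, 0.5))  # about (0.924, 0, 0, 0.383): 45 degrees about Z
```

nlerp is cheap but doesn’t turn at a perfectly even speed across the blend; slerp (spherical linear interpolation) does, at a slightly higher cost. For closely spaced animation frames the difference is usually small.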

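The thing is, you can do this interpolation for quaternions. You cannot do it for rotation matrices: blending two of them number by number gives a matrix that is generally no longer a pure rotation, so the model shrinks or distorts partway through the blend, and in the worst case (halfway between two rotations 180 degrees apart) it is flattened completely. This is one important way that quaternions score over ‘real’ matrices.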
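
A small sketch that shows the flattening numerically. It blends two rotation matrices about Z entry by entry and checks the determinant, which is exactly 1 for a pure rotation and measures how much a matrix scales volume. The helper names are illustrative.

```python
import math


def rot_z(angle_degrees):
    """3x3 rotation matrix about Z, stored as a list of rows."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def lerp_matrix(m1, m2, t):
    """Naive blend: interpolate every entry of the two matrices separately."""
    return [[(1.0 - t) * a + t * b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def det3(m):
    """Determinant of a 3x3 matrix (how much it scales volume): 1 for a pure rotation."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


print(det3(rot_z(90)))                               # about 1: a real rotation
print(det3(lerp_matrix(rot_z(0), rot_z(90), 0.5)))   # about 0.5: the model shrinks mid-blend
print(det3(lerp_matrix(rot_z(0), rot_z(180), 0.5)))  # about 0: X and Y collapse, the model is flattened
```
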

Drawbacks of using Quaternions

1) They don’t have position built in. They are purely a direction and a rotation around it. The root position needs to be held elsewhere. A ‘real’ matrix has position built in (which is one reason why it’s larger). In practice that’s fine, since in hierarchical situations like a skeleton, the result of the parent dictates where each bone’s starting position in the world actually is.

2) They don’t have scale built in. A ‘real’ matrix has scale built in, for each axis, so you can scale a model along each axis individually. You can say “I want this model to be fatter on the X axis, but not on the Y or Z axis”, and a matrix can handle that; a quaternion cannot. Incidentally, you may ask why you’d want to do that. One place it comes up is the viewport’s aspect ratio: if the display isn’t square, something has to stretch X relative to Y so that models don’t look squashed. In practice, games normally handle that once, in the camera’s projection matrix, rather than by scaling each model, but either way it’s a matrix, not a quaternion, doing the non-uniform scaling.

Now, can a quaternion represent scale at all? We talked about quaternions being normalized, ie having a length of 1. What if that is not true? What if the length is 2? Or 10? It’s tempting to think of that as a scale along the vector, but that isn’t what happens. If you rotate a point with a non-unit quaternion using the standard formula (the quaternion, times the point, times the quaternion’s conjugate), the point is rotated and also scaled by the square of the quaternion’s length, so a length of 10 gives a scale of 100, not 10. And that scale is the same in every direction, so in our animation example it would make the whole limb bigger, not just longer along the bone. If you use the quaternion’s inverse instead of its conjugate, the length cancels out and you get no scaling at all. Either way, it’s different from how a matrix stores scale, because a matrix can store a different scale for each of x, y and z. In practice, engines keep their quaternions normalized and store scale separately, just like position.

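
A quick sketch that checks those claims numerically, assuming quaternions stored as (w, x, y, z) and the Hamilton product (the helper names are illustrative). It scales a 90-degree rotation about Z up to a length of 10 and applies it both ways: with the conjugate, every vector comes out 100 times longer, including one lying along the rotation axis; with the inverse, it’s a plain rotation.

```python
import math


def quat_mul(a, b):
    """Hamilton product a * b of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def apply_with_conjugate(q, v):
    """q * v * conjugate(q): a rotation, plus a uniform scale by |q| squared if q isn't unit length."""
    w, x, y, z = q
    return quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))[1:]


def apply_with_inverse(q, v):
    """q * v * inverse(q), where inverse(q) = conjugate(q) / |q|^2, so the length cancels out."""
    w, x, y, z = q
    n2 = w * w + x * x + y * y + z * z
    return quat_mul(quat_mul(q, (0.0, *v)), (w / n2, -x / n2, -y / n2, -z / n2))[1:]


half = math.radians(90.0) / 2.0
unit_q = (math.cos(half), 0.0, 0.0, math.sin(half))  # 90 degrees about Z, length 1
big_q = tuple(10.0 * c for c in unit_q)              # same rotation, length 10

for v in [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]:
    scaled = apply_with_conjugate(big_q, v)
    print(v, "->", scaled, "length", math.sqrt(sum(c * c for c in scaled)))  # length about 100
    print(v, "->", apply_with_inverse(big_q, v))  # plain rotation, length about 1
```
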
Ok, so that’s a basic primer of what a quaternion is, what it can do, and what some of the advantages and disadvantages are. Hope that helps.

This entry was posted in Animation, Game Development.
