Advanced Data Structures

Advanced Data Structures.

Time for us to go into a bit more detail regarding data structures. We will cover Typedef here, memory setting and memory allocation and de-allocation. That should be enough for now.

Typedef.

This is quite a subject. In order to really understand what you are doing with typedef you really need to know what a type is. Types – at their most basic – are those things we defined as variable types back in the variables lesson. An int, or a float is a type. Typedef-ing allows us to re-define these types using new names as we see fit. For instance

typedef int blah;

will allow us to use the type blah instead of int. int = blah, blah = int. Get it? Not that powerful really, but there are some circumstances that allow us to use this feature.
The really powerful bit is the ability to group together more than one type into a structure, and use these as variable structures. As an example

Typdef struct
{
int            key;
char            name[50];
char            address1[50];
char            address2[50];
char            address3[50];
char            phone[20];
} employee_record_t;

This requires some explaining since there are some concepts to get across here. What we are doing here is creating a new variable type called employee_record_t. We can use this variable type in our functions just like we us int and char and float. However, when we create a variable using this structure, we get a whole bunch of variables that hang off the main one. To explain showing some code using the definition as above.

void assign_record(int key, char name[50], char add1[50], char add2[50], char add3[50], char phone[20], FILE* fp)
{
employee_record_t            emp_rec;
emp_rec.key = key;
// copy address lines of text and phone number into record
strcpy(emp_rec.name, name);
strcpy(emp_rec.address1, add1);
strcpy(emp_rec.address2, add2);
strcpy(emp_rec.address3, add3);
strcpy(emp_rec.phone, phone);
// write out address variable
fwrite(emp_rec, sizeof(emp_rec), 1, fp);
}

Ok –this is a bit complicated. Don’t worry what strcpy does – it just copies strings from one variable to another. Fwrite is also covered in a later lesson – it writes out data from memory to a file held on disc.
The point of what I’m trying to demonstrate here is that you can access each part of the emp_rec data structure – all those parts we defined up above, like key and name are all accessible. The idea of the data structure is to group together all the elements for this record into one place. The variable we are using is called emp_rec, but it contains a bunch of smaller variables that we need access to. We only define the variable emp_rec, but we get all the others along with it.  In order to access the data you use a period between the name of the variable you defined, and the data element you want to access. For example if I want to look at the name string, I would use emp_rec.name.
Now, why would all this be useful? Well, there are a few reasons, and we’ll go into some of them here.

Nesting

You can nest variable definitions. As an example, it’s like looking at the contents page of a reference book. You get chapter headings, then the contents of that chapter, then often lower divisions of these parts of the chapter. Imagine that each level of this is a different data structure definition.
Using our example of the employee record above, we might use that data within the data structure a department within a company – for example.

Typedef struct
{
char            dept_name[100];
int            num_employees;
employee_record_t            emp_recs[100];
}dept_rec_t;

Notice it’s quite possible to make an array using the data structure we defined.
Notice also that I use a naming convention for the data structure types. I always stick a ‘_t’ on the end to remind myself that this is a user built structure.
So how would we access say, the name of the 3rd employee in this department? Well, assuming we define the variable as

dept_rec_t            department;

we would get to it this way. department.emp_recs[2].name, remembering of course that 0 is actually a number as far as arrays go. 0 =first entry, 1 = second and 2 = third.

Congruency in memory

One of the nicest things about typedefing a structure is that they are all built congruent in memory. That means they are all next to each other in memory. Using the example above, if emp_rec was created at location 100, then the variable emp_rec.key would be found at location 100, emp_rec.name at 104 (since an integer is almost always 32bits, or 4 bytes – memory location counting almost always goes up in bytes – there are exceptions to this, and we’ll get to that in the pointer lesson), emp_rec.address1 would be at 154 – since key is 4 bytes, name is 50 bytes – you get the picture.
This is extremely useful when it comes to clearing the whole data structure in one go, copying it to other memory or file locations, or just examining it in the debugger.

Memory usage

When you declare a defined data structure like emp_rec, the system itself will go away and allocate the memory that is necessary to hold that data structure. In the same way that when you declare an int you don’t have to ask the C runtime system for 4 bytes of memory to store the data in, the C runtime system will automatically allocate memory space for this data structure.

Sizeof

There are occasions when you will need to know the size of the data structure you have created,  – we will see some of them in a moment. In order to get this size you can use the c function sizeof. ‘sizeof’ returns a integer which records how big the structure you gave it is, in bytes. If you look back at the practical lessons, you can see some examples of its use. To find out how big emp_rec is you would use this line.

Size = sizeof(emp_rec);

Size would equal 124, since we have 4+50+50+50+50+20 bytes. Using sizeof rather than using an absolute number means you can modify the structure of the employee_record_t data type and the code will automatically cope with this when you re-compile.

Memset

Memset at it’s most basic will set a specific range of memory to a pre-defined value. An example

memset (emp_rec, 0, sizeof(emp_rec));

The first value passed into the function is the pointer to memory where we want to start setting memory. The next value is the byte value we want to set memory to, and the last is the length of memory (in bytes) from the initial location we want to set. Since data structures are congruent in memory, this enables us to zero the whole data structure with one command. You can do this for any area of memory, and sometimes you will need to do this in other circumstances.
NOTE – this only works for bytes and integers. If you have floats in your data structure, zeroing the data structure with memset will NOT set floats to zero. This has to do with the way that floats are represented in a 32 bit location. I won’t get into why, as long as you remember this.

Memcpy

This is similar to memset, except, surprise surprise, it copies memory from one location to another. The syntax is

memcpy( from, to, size);

So if we had two employee_record_t variables defined, like emp_rec1 and emp_rec2, and we wanted to copy from emp_rec1 to emp_rec2, we would write

memcpy (emp_rec1, emp_rec2, sizeof(emp_rec1));

Memcpy does allow us to overwrite our own memory. See if you can figure out what would happen if we did this

memcpy(emp_rec_1, emp_rec1+10, sizeof(emp_rec1));

Unions

One last thing to cover is the idea of unions. Remember the bit on congruency in memory? Unions allow us to actually get variables to look at the same bit of memory. For instance

typedef struct
{
union
{
struct
{
unsigned char r,g,b,a;
};
}; unsigned c;
} paletteRGBA_t;

would create a variable type called paletteRGBA_t. When used, like this paletteRGBA_t color; we would access the members as usual, color.c, color.r, color,b etc as usual. However, what is different here is that while the member c is a int long (4 bytes), each of the r, g, b and a elements are just one byte long, and they share the same memory as the c element. If you look at element r, it’s the same as the top byte in the element c. This facility is nice when you sometimes need to manipulate data in a large fashion, or be able to just access the whole thing in one go as a direct variable. It’s also nice where memory is limited and you need to save it, re-using variable space. Basically it allows you to re-name variables with different types so every time you use them they have meaningful names. There is nothing to stop you swapping the name out several times in one process, although that kind of practice tends to be frowned upon, since it’s easy to get confused and start thinking that the variables are all different, and start expecting them to behave that way.

Ok – enough of that. We are starting to get into pointer territory now, so it’s off to the next lesson to learn what a point is, implications of pointers and all that good stuff.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>