Big EXEs

The question of executable size is one that crops up with disturbing frequency among VB programmers of all experience levels. The thrust, usually, is 'my EXE is too big so I need to find ways of breaking it up', and 'breaking it up' is typically phrased in terms of ActiveX components. This question, more than any other, demonstrates the curious mixture of sophistication and naivety that is the hallmark of Visual Basic developers. The documentation doesn't help, as the following extract from the VB6 Programmer's Guide illustrates:

    "Visual Basic loads modules on demand — that is, it loads a module into memory only when your code calls one of the procedures in that module. If you never call a procedure in a particular module, Visual Basic never loads that module. Placing related procedures in the same module causes Visual Basic to load modules only as needed."

Like much of the VB documentation when it comes down to details, this is tantalizingly vague, yet it sounds sufficiently precise as to be often quoted as gospel with 'module' taken to mean the contents of a BAS file. There is a germ of truth in this, as you'll see when we discuss paging, but it's not as simple as it sounds. Here's another extract, from a section that talks about reasons for splitting an application into ActiveX components:

    "The components are loaded on demand and can be unloaded when no longer needed"

This may be true, but if we read it alongside the previous extract we see something of a contradiction – if code modules are loaded on demand, why do we need to bother splitting our code into components? This vagueness hints that the kinds of issues affecting our application's architecture are actually much more subtle than we're being told.

It's important to understand that we're not talking about keeping the overall code size to a minimum; the wisdom of this is self-evident – it doesn't matter how much memory we have or how it's managed, if we have smaller code it's going to make better use of a finite resource. What we are talking about is a perception of Windows that suggests a modular program composed of several smaller components will use memory more efficiently than an equivalent monolithic executable. For the purposes of discussion we will assume that such components are ActiveX DLLs, that the only reason for using DLLs is to split a monolithic EXE into smaller chunks, and that the split simply takes the EXE's code and redistributes it in smaller packages. There are, of course, many subtle differences between the two architectures, but the issue we're addressing here is anything but subtle.

First, a bit of background. Win32 sees everything in terms of virtual memory, which is the sum of physical memory and a proportion of our disk space (usually in the swap file). Each Win32 process is allocated a private 4 GB address space, and bits of virtual memory are mapped into various portions of that address space. At any time most of the address space is unmapped, but we can think of certain areas as having chunks of virtual memory allocated to them – these are the chunks of code and data associated with a task that is running in the process.

The distinction between physical and virtual memory is pretty well hidden, and as far as the task is concerned, there is no difference. The Win32 virtual memory manager takes care of loading data from the swap file to physical memory as it's required – this is called 'paging'. When a task tries to read or write a piece of virtual memory that isn't mapped to physical memory, Win32 generates a 'page fault'; this causes the virtual memory manager first to commit a piece of physical memory to that portion of the address space, and then to load the required page into it from disk. A 'page' is a 4K chunk of code or data (on x86 hardware). The virtual memory manager also takes care of reclaiming physical memory when required, by writing modified pages back to the swap file or discarding unmodified pages.
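The page arithmetic is worth making concrete. What follows is a minimal sketch for a BAS module, assuming the 4K page size quoted above. The names PAGE_SIZE, PageIndex and PagesSpanned are invented for illustration, and the offsets used in Demo are made-up numbers rather than measurements of any real executable.

```vb
Option Explicit

' Assumed page size: the 4K figure quoted in the text.
Private Const PAGE_SIZE As Long = 4096

' Zero-based index of the page that contains a given byte offset.
Public Function PageIndex(ByVal Offset As Long) As Long
    PageIndex = Offset \ PAGE_SIZE
End Function

' Number of pages touched by a block of Length bytes starting at Offset.
Public Function PagesSpanned(ByVal Offset As Long, ByVal Length As Long) As Long
    If Length <= 0 Then
        PagesSpanned = 0
    Else
        PagesSpanned = PageIndex(Offset + Length - 1) - PageIndex(Offset) + 1
    End If
End Function

Public Sub Demo()
    ' Two hypothetical small routines 300 bytes apart: both land on page 2.
    Debug.Print PageIndex(8200), PageIndex(8500)
    ' A hypothetical 5000-byte routine at the same offset straddles 2 pages.
    Debug.Print PagesSpanned(8200, 5000)
End Sub
```

The second case is the interesting one: any routine longer than a page must touch at least two pages wherever it starts, whereas a handful of small routines can easily sit together on one.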

A further wrinkle is that Win32 treats EXEs as memory-mapped files. This means that when we start a program the EXE file itself is mapped into the process address space, and page faults cause pages to be loaded directly from the EXE file instead of the swap file. The upshot of all this is that whenever a bit of VB code is called, one or more page faults may be generated, which in turn may cause one or more 4K pages to be loaded from the EXE file. And that's really the point – the pages are only loaded as they are required, and since code from only one page can ever be executing at a time the actual physical memory requirement is, in principle, very small. In practice, of course, a large code-to-memory ratio can result in excessive paging, particularly if we're running many applications simultaneously.
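The demand-loading behaviour can be modelled in a few lines. This is only a toy simulation for a BAS module, not a description of how Windows is implemented: a Collection stands in for the set of resident pages, and Touch (a hypothetical name, like everything else here) reports whether an access would have caused a page fault. It ignores the reclaiming of pages entirely.

```vb
Option Explicit

Private Const PAGE_SIZE As Long = 4096   ' assumed 4K pages, as in the text

' Keys are page numbers; a key is present once that page is "resident".
Private Resident As New Collection

' Simulate executing code at Offset. Returns True if the access faults,
' i.e. the page had to be "loaded" because it was not yet resident.
Public Function Touch(ByVal Offset As Long) As Boolean
    On Error Resume Next
    ' Adding a duplicate key fails, which tells us the page is already in.
    Resident.Add True, CStr(Offset \ PAGE_SIZE)
    Touch = (Err.Number = 0)
    On Error GoTo 0
End Function

Public Sub Demo()
    Debug.Print Touch(100)     ' True:  first use of page 0 faults
    Debug.Print Touch(200)     ' False: same page, already resident
    Debug.Print Touch(5000)    ' True:  page 1 has not been touched yet
    Debug.Print Touch(100)     ' False: however often we come back
End Sub
```

The model makes no distinction between an EXE and a DLL, and that is deliberate: in this simplified picture what gets loaded depends on which pages are touched, not on which file they happen to live in.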

We can watch paging taking place using the System Monitor program that ships with Windows. It's instructive to monitor page faults, page-in and page-out statistics while manipulating a VB application, but watching the 'unused physical memory' stats isn't very useful because Windows is keen to keep all physical memory in use. The paging behaviour outlined above will be immediately apparent and requires little further comment (although in practice the paging algorithms are rather complicated). To illustrate, we'll have a quick look at paging in a program written to disprove another popular myth – that creating multiple instances of a class loads multiple copies of the code. I have heard this used as an argument to favour BAS modules over Classes, so it's another very damaging misconception. This program creates instances of two different classes, and it's pretty short:

 

Example Code:
Successive Class Instances

Form1

Option Explicit

Private Arr1() As Class1
Private Arr2() As Class2

Private Sub Command1_Click()
    ReDim Preserve Arr1(0 To UBound(Arr1) + 1)
    Set Arr1(UBound(Arr1)) = New Class1
    Command1.Caption = "Class1 (" & CStr(UBound(Arr1)) & ")"
End Sub

Private Sub Command2_Click()
    ReDim Preserve Arr2(0 To UBound(Arr2) + 1)
    Set Arr2(UBound(Arr2)) = New Class2
    Command2.Caption = "Class2 (" & CStr(UBound(Arr2)) & ")"
End Sub

Private Sub Form_Load()
    ' Give each array one unused slot so UBound works on the first click
    ReDim Arr1(0 To 0)
    ReDim Arr2(0 To 0)
End Sub

Class1

Option Explicit

Private x As Integer

Private Sub Class_Initialize()
    x = 0
End Sub

Class2

Option Explicit

Private Sub Stuff()

    Dim i As Integer

    For i = 1 To 10
        Debug.Print CStr(i)
    Next i
    For i = 1 To 10
        Debug.Print CStr(i)
    Next i

    ' This procedure has to compile to more than
    ' 4K bytes. You can just put many repetitions
    ' of the same thing here if you compile without
    ' optimization.

End Sub

Private Sub Class_Initialize()
    Stuff
End Sub
 

There are a couple of things to be aware of if you want to try this for yourself. Notice that Class1 is trivial but Class2 contains lots of code. If both classes are small, their code ends up on the same page so they get loaded together and the test doesn't work. You also need to load Class1 (the small class) first, since the two classes will almost certainly share the page that contains Class1's code. You should compile the program, since in the VB IDE the process persists between runs and hence the paging behaviour isn't realistic. Compiling to native code and disabling optimization lets you just repeat the same code over and over to ensure that Class2 overflows onto more than one page.

You also need to appreciate that bits of the VB runtime are going to be paged in too, so not all the pages you see loading are necessarily from your VB code. Finally, you need to make sure there's not much else running on the machine, as lots of background programs can cause spurious paging that makes it difficult to interpret the results.

All this code does is let us create successive instances of Class1 and Class2, trivial classes that don't do anything useful. Each class has some code in its Class_Initialize event so that creating an instance actually executes something. The point is to see whether creating multiple instances of a class causes multiple copies of the same code to be loaded. The class instances are created by clicking the two buttons, and references to them are kept in module-level arrays so that the objects stay alive; VB would otherwise destroy each object as soon as its last reference was released.

By setting things up carefully and then observing page-in events with System Monitor, we should be able to see pages being loaded when we create the first instance of a class and execute some of its code. What we should NOT see is the same (or any) paging-in when we create and activate subsequent instances of the same class. We don't really need two classes for this, but it's more convincing if we can demonstrate nothing happening as we repeatedly click button1, and then some definite paging activity when we click button2. (Incidentally, adding a button3 with no code behind it pages in some VB runtime code, which presumably has something to do with clicking a button. This reduces the amount of paging activity generated by the first click on one of our live 'buttons', but I've left it out here for simplicity.) Here's the Sysmon log:

[Figure: System Monitor log (image not reproduced)]
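On NT-family versions of Windows the page fault count can also be read from inside the program, which gives a rough cross-check on what System Monitor shows. The sketch below is an assumption-laden alternative, not part of the original experiment: it relies on the Win32 GetProcessMemoryInfo function in psapi.dll, which is not present on Windows 9x, and the helper name CurrentPageFaults is invented here. Note too that this counter includes 'soft' faults that are satisfied without reading from disk, so it will not match page-in figures exactly.

```vb
Option Explicit

' Layout of the Win32 PROCESS_MEMORY_COUNTERS structure (32-bit process).
Private Type PROCESS_MEMORY_COUNTERS
    cb As Long
    PageFaultCount As Long
    PeakWorkingSetSize As Long
    WorkingSetSize As Long
    QuotaPeakPagedPoolUsage As Long
    QuotaPagedPoolUsage As Long
    QuotaPeakNonPagedPoolUsage As Long
    QuotaNonPagedPoolUsage As Long
    PagefileUsage As Long
    PeakPagefileUsage As Long
End Type

Private Declare Function GetCurrentProcess Lib "kernel32" () As Long
Private Declare Function GetProcessMemoryInfo Lib "psapi.dll" _
    (ByVal hProcess As Long, _
     ByRef Counters As PROCESS_MEMORY_COUNTERS, _
     ByVal cb As Long) As Long

' Hypothetical helper: cumulative page fault count for this process,
' or -1 if the API call reports failure.
Public Function CurrentPageFaults() As Long
    Dim pmc As PROCESS_MEMORY_COUNTERS
    pmc.cb = Len(pmc)
    If GetProcessMemoryInfo(GetCurrentProcess(), pmc, Len(pmc)) <> 0 Then
        CurrentPageFaults = pmc.PageFaultCount
    Else
        CurrentPageFaults = -1
    End If
End Function
```

To use it, read CurrentPageFaults before and after the Set ... = New line in a button's Click event and show the difference in the button caption; Debug.Print is no use here because the test has to run as a compiled EXE.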

These days, programming Windows applications demands a lot less technical savvy than it used to, at least if we're using a high-level programming system such as Visual Basic. This is a double-edged sword, because while it means that just about anyone can tinker together a working program, it makes no guarantees about quality. Visual Basic has added an extra layer of software to the Windows platform; where traditional Windows programmers interacted directly with the operating system, VB programmers are in a very real sense targeting a virtual machine that we might call WinVB. This remoteness from the operating system is at the heart of the problem we've been looking at here.
 

Key Spinner

© 1998 - 2009 Mark Hurst. All rights reserved.   Updated March 01, 2009