The question of executable size is one that crops up with disturbing frequency among VB programmers of all experience levels. The thrust, usually, is 'my EXE is too big so I need to find ways
of breaking it up', and 'breaking it up' is typically phrased in terms of ActiveX components. This question, more than any other, demonstrates the curious mixture of sophistication and naivety that is the hallmark of
Visual Basic developers. The documentation doesn't help, as the following extract from the VB6 Programmer's Guide illustrates:
"Visual Basic loads modules on demand — that is, it loads a module into memory only when your code calls one of the procedures in that module. If you never call a procedure in a particular module,
Visual Basic never loads that module. Placing related procedures in the same module causes Visual Basic to load modules only as needed."
Like much of the VB documentation when it comes down to details, this is tantalizingly vague, yet it sounds sufficiently precise as to be often quoted as gospel with 'module' taken to mean the contents
of a BAS file. There is a germ of truth in this, as you'll see when we discuss paging, but it's not as simple as it sounds. Here's another extract, from a section that talks about reasons for splitting an application
into ActiveX components:
This may be true, but if we read it alongside the previous extract we see something of a contradiction – if code modules are loaded on demand, why do we need to bother splitting our code into components? This
vagueness hints that the kinds of issues affecting our application's architecture are actually much more subtle than we're being told. It's important to understand that we're not talking about keeping the overall
code size to a minimum; the wisdom of this is self-evident – it doesn't matter how much memory we have or how it's managed, if we have smaller code it's going to make better use of a finite resource. What we are
talking about is a perception of Windows that suggests a modular program composed of several smaller components will use memory more efficiently than an equivalent monolithic executable. For the purposes of discussion we will assume that such components are ActiveX DLLs, that the only reason for using DLLs is to split a monolithic EXE into smaller chunks, and that the split simply takes the EXE's code and redistributes it in smaller packages. There are, of course, many subtle differences between the two architectures, but the issue we're addressing here is anything but subtle.
First, a bit of background. Win32 sees everything in terms of Virtual Memory, which is the sum of physical memory and a proportion of our disk space (usually in the swap file). Each Win32 process is allocated a
private 4 GB address space, and bits of virtual memory are mapped into various portions of that address space. At any time most of the address space is unmapped, but we can think of certain areas as having chunks of
virtual memory allocated to them; these are the chunks of code and data associated with a task that is running in the process. The distinction between physical and virtual memory is pretty well hidden, and as far as
the task is concerned, there is no difference. The Win32 virtual memory manager takes care of loading data from the swap file to physical memory as it's required; this is called 'paging'. When a task tries to read or
write a piece of virtual memory that isn't mapped to physical memory, Win32 generates a 'page fault'; this causes the virtual memory manager to first commit a piece of physical memory to that portion of the address
space, and then to load the required page into it from disk. A 'page' is a 4K chunk of code or data. The virtual memory manager also takes care of reclaiming physical memory when required by writing modified pages back
to the swap file or discarding unmodified pages. A further wrinkle is that Win32 treats EXEs as memory-mapped files. This means that when we start a program the EXE file itself is mapped into the process address
space, and page faults cause pages to be loaded directly from the EXE file instead of the swap file. The upshot of all this is that whenever a bit of VB code is called, one or more page faults may be generated, which in
turn may cause one or more 4K pages to be loaded from the EXE file. And that's really the point – the pages are only loaded as they are required, and since code from only one page can ever be executing at a time the
actual physical memory requirement is, in principle, very small. In practice, of course, a large code-to-memory ratio can result in excessive paging, particularly if we're running many applications simultaneously. We
can watch paging taking place using the System Monitor program that ships with Windows. It's instructive to monitor page faults, page-in and page-out statistics while manipulating a VB application, but watching the
'unused physical memory' stats isn't very useful because Windows is keen to keep all physical memory in use. The paging behaviour outlined above will be immediately apparent and requires little further comment (although
in practice the paging algorithms are rather complicated). To illustrate, we'll have a quick look at paging in a program written to disprove another popular myth – that creating multiple instances of a class loads
multiple copies of the code. I have heard this used as an argument to favour BAS modules over Classes, so it's another very damaging misconception. This program creates instances of two different classes, and it's
pretty short:
Example Code: Successive Class Instances

Form1

    Option Explicit

    Private Arr1() As Class1
    Private Arr2() As Class2

    Private Sub Command1_Click()
        ReDim Preserve Arr1(0 To UBound(Arr1) + 1)
        Set Arr1(UBound(Arr1)) = New Class1
        Command1.Caption = "Class1 (" & CStr(UBound(Arr1)) & ")"
    End Sub

    Private Sub Command2_Click()
        ReDim Preserve Arr2(0 To UBound(Arr2) + 1)
        Set Arr2(UBound(Arr2)) = New Class2
        Command2.Caption = "Class2 (" & CStr(UBound(Arr2)) & ")"
    End Sub

    Private Sub Form_Load()
        ReDim Preserve Arr1(0 To 0) As Class1
        ReDim Preserve Arr2(0 To 0) As Class2
    End Sub

Class1

    Option Explicit

    Private x As Integer

    Private Sub Class_Initialize()
        x = 0
    End Sub

Class2

    Option Explicit

    Private Sub Stuff()
        Dim i As Integer
        For i = 1 To 10
            Debug.Print CStr(i)
        Next i
        For i = 1 To 10
            Debug.Print CStr(i)
        Next i
        ' This function has to compile to more than
        ' 4K bytes. You can just put many repetitions
        ' of the same thing here if you compile without
        ' optimization.
    End Sub

    Private Sub Class_Initialize()
        Stuff
    End Sub

There are a couple of things to be aware of if you want to try this for yourself. Notice that Class1 is trivial but Class2
contains lots of code. If both classes are small, their code ends up on the same page so they get loaded together and the test doesn't work. You also need to load Class1 (the small class) first, since the
two classes will almost certainly share the page that contains Class1's code. You should compile the program, since in the VB IDE the process persists between runs and hence the paging behaviour isn't
realistic. Compiling to native code and disabling optimization lets you just repeat the same code over and over to ensure that Class2 overflows onto more than one page. You also need to appreciate that
bits of VB runtime are going to be paged in too, so not all the pages you see load are necessarily from your VB code. Finally you need to make sure there's not much else going on on the machine, as lots of
background programs can cause spurious paging that makes it difficult to interpret the results.
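The page arithmetic behind these notes can be sketched with a toy model. It is written in Python rather than VB because it models what Windows does, not what our program does, and it is only a sketch under stated assumptions: the names `pages_for` and `ToyProcess` are hypothetical, the code offsets and sizes are invented for illustration, and the model ignores the VB runtime, the working-set limits and everything else that complicates real paging. It simply assumes that a page is loaded the first time any code on it is touched and stays resident afterwards.

```python
PAGE_SIZE = 4096  # the 4K page size quoted in the text


def pages_for(offset, size):
    """Return the set of page numbers covered by code at [offset, offset + size)."""
    first = offset // PAGE_SIZE
    last = (offset + size - 1) // PAGE_SIZE
    return set(range(first, last + 1))


class ToyProcess:
    """Tracks which code pages are resident and counts page-ins (toy model only)."""

    def __init__(self, layout):
        self.layout = layout    # class name -> (offset, size) of its compiled code
        self.resident = set()   # pages already loaded from the EXE file
        self.page_ins = 0       # running total of pages loaded

    def create_instance(self, class_name):
        """Touch the class's code; return how many pages had to be loaded."""
        offset, size = self.layout[class_name]
        missing = pages_for(offset, size) - self.resident
        self.resident |= missing
        self.page_ins += len(missing)
        return len(missing)


# Assumed layout (invented numbers): Class1 is tiny and sits at the start of a
# page; Class2 follows it directly and is big enough to spill onto later pages.
layout = {"Class1": (0, 200), "Class2": (200, 9000)}

proc = ToyProcess(layout)
clicks = ["Class1", "Class1", "Class1", "Class2", "Class2"]
log = [(name, proc.create_instance(name)) for name in clicks]

for name, loaded in log:
    print(f"New {name}: {loaded} page(s) loaded")
print(f"Total page-ins: {proc.page_ins}")
```

In this model only the first instance of each class costs any page-ins; later instances reuse the pages that are already resident. It also shows why the order matters: if Class2 were created first, the page it shares with Class1 would already be loaded, and creating Class1 afterwards would show no paging at all.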
All this code does is let us create successive instances of Class1 and Class2, neither of which does anything useful. Each class has some code in its
Class_Initialize event so that creating an instance actually executes code from that class. The point is to see whether creating multiple instances of a class causes multiple copies of the same
code to be loaded. The class instances are created by clicking the two buttons, and references to them are kept in module-level arrays so that the objects stay alive; VB would otherwise destroy each object as soon as its last reference was released.
By setting things up carefully and then observing page-in events with System Monitor, we should be able to see pages being loaded when we create the first instance of a class and execute some of
its code. What we should NOT see is the same (or any) paging-in when we create and activate subsequent instances of the same class. We don't really need two classes for this, but it's more
convincing if we can demonstrate nothing happening on repeated clicks of Command1, and then some definite paging activity when we click Command2. (Incidentally, adding a third button with no
code behind it pages in some VB runtime code, which presumably has something to do with clicking a button. This reduces the amount of paging activity generated by the first click on one of
our live 'buttons', but I've left it out here for simplicity.) Here's the Sysmon log: