Come let us explore together…

Archive for October, 2010

Understanding Script Garbage Collection


In the course of test automation programming, we declare many variables, arrays and objects and use them throughout script (test) execution. So my questions are:

  1. Is there any way to keep track of which variables are declared, assigned and released?
  2. Does this depend on the language or platform?
  3. If so, who is responsible for it: the automation tool, the host environment, or the code itself?
  4. What is Garbage Collection (GC) and how does it work?

Garbage collection in scripting languages is still a far less explored area than in development languages such as Java and the .NET languages. Any ‘Objections’ here?

The answer to the first question is YES. There is a way to track the variables, arrays, objects or whatever else you define in your scripts.

This tracking can be based on a few properties, such as:

  1. Scope of the variable
  2. Type of the variable
  3. Validity of the variable

For example, we don’t want a variable that is declared as private to a function to exist outside it, nor a variable/object that was explicitly handed to the GC for clearing to remain usable. For me, the simplest way is to check its validity, either through an existence check or by comparing it with null or undefined in JScript (or Nothing in VBScript).
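A minimal sketch of such a validity check in JScript-style JavaScript. The helper name `isValid` and the `report` object are made up for illustration; they are not part of any tool's API.

```javascript
// Returns true when a value is still usable, i.e. neither undefined nor null.
function isValid(value) {
    return typeof value !== "undefined" && value !== null;
}

var report = { name: "smoke test" };  // hypothetical object used by a test
var before = isValid(report);         // the object is assigned

report = null;                        // explicitly released
var after = isValid(report);          // nothing left to use

var neverAssigned;                    // declared but never assigned
var unassigned = isValid(neverAssigned);
```

Here `before` is true while `after` and `unassigned` are false, so the same check covers both a released object and a variable that was never given a value.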

Dependency

Yes, it depends on the language you are using. A few languages, such as C and C++, don’t have a GC mechanism of their own; we need to allocate and release memory ourselves. Whereas JScript and VBScript have…  More down in the post…

Who is responsible?

Recently, when I asked the same question to an AutomatedQA support agent (I was using the TestComplete automation tool with JScript), the answer was “TestComplete doesn’t have any Garbage Collection mechanism and relies on Windows Script Host (WSH) in Windows environment.” [Do you know of any automation tool that has its own garbage collection mechanism? Please comment…]

Now, how does WSH actually do it?

GC in WSH depends on the language used. For example, ‘JScript GC’ is different from ‘VBScript GC’.

The JScript engine triggers the GC based on the following parameters:

  • The number of variable allocations in the script
  • The number of literal values that are used in the script
  • The total size of the string values that are allocated in the script

When thresholds for these values are exceeded, garbage collection occurs.

JScript uses a non-generational mark-and-sweep garbage collector. It works like this:

  • Every variable which is “in scope” is called a “scavenger”. A scavenger may refer to a number, an object, a string, whatever. It maintains a list of scavengers — variables are moved on to the scav list when they come into scope and off the scav list when they go out of scope.
  • Every now and then the garbage collector runs. First it puts a “mark” on every object, variable, string, etc – all the memory tracked by the GC. (JScript uses the VARIANT data structure internally and there are plenty of extra unused bits in that structure, so we just set one of them.)
  • Second, it clears the mark on the scavengers and the transitive closure of scavenger references. So if a scavenger object references a nonscavenger object then we clear the bits on the nonscavenger, and on everything that it refers to.
  • At this point we know that all the memory still marked is allocated memory which cannot be reached by any path from any in-scope variable. All of those objects are instructed to tear themselves down, which destroys any circular references.
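The steps above can be sketched as a toy simulation. This is only a model of the description, not the JScript engine's real code; the names `makeObject`, `clearMarks` and `collect` are invented for the example.

```javascript
// Toy heap: every tracked object has a "marked" bit and a list of references.
function makeObject(name) {
    return { name: name, refs: [], marked: false };
}

// Clear the mark on an object and on everything reachable from it.
function clearMarks(obj) {
    if (!obj.marked) { return; }      // already visited; this also stops on cycles
    obj.marked = false;
    for (var i = 0; i < obj.refs.length; i++) {
        clearMarks(obj.refs[i]);
    }
}

// heap: all tracked objects; scavengers: objects held by in-scope variables.
function collect(heap, scavengers) {
    var i, survivors = [], garbage = [];
    for (i = 0; i < heap.length; i++) { heap[i].marked = true; }            // mark everything
    for (i = 0; i < scavengers.length; i++) { clearMarks(scavengers[i]); }  // unmark reachable
    for (i = 0; i < heap.length; i++) {                                     // still marked = garbage
        (heap[i].marked ? garbage : survivors).push(heap[i]);
    }
    for (i = 0; i < garbage.length; i++) { garbage[i].refs = []; }          // tear down
    return { survivors: survivors, garbage: garbage };
}

// Collect the names of a list of objects into one comma-separated string.
function names(list) {
    var out = [];
    for (var i = 0; i < list.length; i++) { out.push(list[i].name); }
    return out.join(",");
}

var a = makeObject("a"), b = makeObject("b");
var c = makeObject("c"), d = makeObject("d");
a.refs.push(b);                   // a -> b, reachable from an in-scope variable
c.refs.push(d); d.refs.push(c);   // c <-> d, a cycle nothing in scope points to

var result = collect([a, b, c, d], [a]);
var garbageNames = names(result.garbage);
var survivorNames = names(result.survivors);
```

With only `a` in scope, `garbageNames` comes out as "c,d" and `survivorNames` as "a,b": the unreachable cycle is found and torn down, which is the point of the mark-and-sweep approach.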

More on how the VBScript GC differs:

VBScript, on the other hand, has a much simpler stack-based garbage collector. Scavengers are added to a stack when they come into scope, removed when they go out of scope, and any time an object is discarded it is immediately freed.

You might wonder why we didn’t put a mark-and-sweep GC into VBScript. There are two reasons. First, VBScript did not have classes until version 5, but JScript had objects from day one; VBScript did not need a complex GC because there was no way to get circular references in the first place! Second, VBScript is supposed to be like VB6 where possible, and VB6 does not have a mark-n-sweep collector either.

The VBScript approach pretty much has the opposite pros and cons. It is fast, simple and predictable, but circular references of VBScript objects are not broken until the engine itself is shut down.
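To see why a cycle survives under this scheme, here is a toy model written in JavaScript rather than VBScript. It assumes that "immediately freed" can be modelled with simple reference counts; that is an assumption made for illustration, and the names `makeCounted`, `addRef` and `release` are invented.

```javascript
// Toy model: each object carries a count of the references that point at it.
function makeCounted(name) {
    return { name: name, count: 0, refs: [], freed: false };
}

// Take one more reference to the target.
function addRef(target) {
    target.count++;
    return target;
}

// Drop one reference; free the object at once when nothing points at it.
function release(target) {
    target.count--;
    if (target.count === 0 && !target.freed) {
        target.freed = true;
        var held = target.refs;
        target.refs = [];
        for (var i = 0; i < held.length; i++) { release(held[i]); }
    }
}

// A lone object: one variable holds it, and releasing that frees it at once.
var single = addRef(makeCounted("single"));
release(single);

// Two objects that point at each other, each also held by one variable.
var parentObj = addRef(makeCounted("parent"));
var childObj = addRef(makeCounted("child"));
parentObj.refs.push(addRef(childObj));
childObj.refs.push(addRef(parentObj));

release(parentObj);               // the variables go out of scope...
release(childObj);                // ...but each object still holds the other
var leaked = !parentObj.freed && !childObj.freed;
```

`single.freed` ends up true, while `leaked` is true for the pair: nothing in scope refers to them any more, yet neither count ever reaches zero.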

In spite of all this, we can force a collection from code at any time using the JScript function CollectGarbage() (there is no equivalent in VBScript), which makes the GC run and collect the garbage objects.

(Taken from Eric Lippert’s blog)
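A hedged sketch of calling it defensively. The wrapper name `forceCollection` is made up; the guard is there because CollectGarbage() exists only in the JScript engine, so on any other JavaScript host the wrapper does nothing.

```javascript
// Ask the host to collect now if it exposes CollectGarbage(); otherwise do nothing.
function forceCollection() {
    if (typeof CollectGarbage !== "undefined") {
        CollectGarbage();
        return true;
    }
    return false;
}

var bigList = [];
for (var i = 0; i < 10000; i++) { bigList.push("row " + i); }

bigList = null;                      // drop the only reference first
var collected = forceCollection();   // true only where CollectGarbage() is available
```

Dropping the reference before the call matters: a forced collection can only reclaim objects that nothing in scope still points to.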

Here is the question on our coding practices:

We have a practice of tearing down (un-assigning) a used variable by setting it to ‘Nothing’ (in VBScript) or ‘undefined’/‘null’ (in JScript). Are these statements really needed?

Can I say they are not needed when someone else is taking responsibility for the GC operation and clearing memory when needed?

It may be a good practice to code it, but when to use it needs critical examination.

  1. Use it when an expensive object is declared, for example file pointers, ADO instances, etc.
  2. Use it for globally declared objects. Since they remain valid throughout the execution, they can be explicitly torn down once they are no longer needed.
  3. Use it when you suspect the declared object may lead to circular references and may not be cleared properly even though the GC can handle it. Be careful here about the order in which the objects are discarded.
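For the first case, one possible shape for the teardown is a try/finally block. This is only a sketch: `openResource` is a hypothetical stand-in for a real file or ADO object, and `closedLog` exists only so the example can show what was closed.

```javascript
var closedLog = [];   // records which resources were closed, for illustration only

// Hypothetical stand-in for an expensive object such as a file or ADO connection.
function openResource(name) {
    return {
        name: name,
        close: function () { closedLog.push(name); }
    };
}

// Run one unit of work against the resource and always tear it down afterwards.
function runStep(work) {
    var conn = openResource("results-db");
    try {
        return work(conn);
    } finally {
        conn.close();   // release the underlying resource even if work() throws
        conn = null;    // then drop the reference so the object can be collected
    }
}

var outcome = runStep(function (conn) { return "used " + conn.name; });
```

The order inside `finally` is deliberate: close first, then clear the variable, because once the reference is gone there is nothing left to call `close()` on.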

I welcome your experiences and opinions on the topic which can reveal new cases.

References:

http://support.microsoft.com/kb/942840

http://blogs.msdn.com/b/ericlippert/archive/2004/04/28/when-are-you-required-to-set-objects-to-nothing.aspx

Hope there are 101 ways to automate…
Thanks & Regards,
Giri Prasad

 

Data truncation problem in Excel sheets


Hi All,

Problem

Recently I faced a problem where automation scripts (I used JScript, but the problem is independent of the scripting language) were not able to read more than 255 characters from a cell in a Microsoft Excel column that holds strings.

Basically, I need to store strings in a particular column and read them from within scripts…

Excel itself was able to hold the strings when we stored them (whether longer or shorter than 255 characters); the problem occurs only while reading them through scripts.

Finding

The Excel driver used for reading scans the first eight rows (or 16) of a column to determine its data type and then acts upon that data type.

In my Excel sheet, none of the first 8 rows of the desired column holds a string longer than 255 characters. Thus the column is expected to hold at most 255 characters, and the ‘String’ data type is assigned to it (determined by scanning the first 8 rows).

Up to 255 characters – ‘String’ type

More than 255 characters – ‘Memo’ type

Whatever data is stored in the rest of the rows (even with more than 255 characters) is cut off, and only the first 255 characters are fetched when scripts read it through any Excel driver.

A similar conflict may happen with numeric and alphanumeric data types when you store numeric data in the first 8/16 rows of a particular column and have alphanumeric data in the rows below. The column is assumed to hold only numeric data, and any letters found in the following rows are truncated.
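The behaviour can be imitated with a small simulation. It is a simplified model of what is described here, not the driver's actual algorithm, and the names `guessColumnType`, `readColumn` and `repeatChar` are invented for the example.

```javascript
var TEXT_LIMIT = 255;

// Guess the column type from the first `guessRows` values only.
function guessColumnType(values, guessRows) {
    var limit = Math.min(guessRows, values.length);
    for (var i = 0; i < limit; i++) {
        if (String(values[i]).length > TEXT_LIMIT) { return "memo"; }
    }
    return "string";
}

// Read the whole column; a "string" column is cut to 255 characters per cell.
function readColumn(values, guessRows) {
    var type = guessColumnType(values, guessRows);
    var out = [];
    for (var i = 0; i < values.length; i++) {
        var text = String(values[i]);
        out.push(type === "string" ? text.substring(0, TEXT_LIMIT) : text);
    }
    return out;
}

// Build a string of n copies of ch.
function repeatChar(ch, n) {
    return new Array(n + 1).join(ch);
}

var column = [];
for (var r = 0; r < 8; r++) { column.push("short value " + r); }
column.push(repeatChar("x", 300));    // the long value sits in row 9

var guessedType = guessColumnType(column, 8);
var lengthRead = readColumn(column, 8)[8].length;
```

Because the eight sampled rows are all short, `guessedType` is "string" and `lengthRead` is 255 even though the stored value has 300 characters.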

To overcome it, there are workarounds [http://support.microsoft.com/kb/189897]. It suggests tweaking the registry values.

The workaround may not be feasible for the following reasons…

1. Security policies may not allow you to change registry entries, or a typo may cause the application to crash

2. Setting the ‘TypeGuessRows’ DWORD to 0 may affect performance, because all the rows are then scanned to determine the data type

A simple solution is to place data that exactly reflects your intended data type in the first 8 rows of the column, so that the scan picks up the right data type. (This worked for me.)

Have you faced a similar kind of problem before? What other workarounds would you suggest for this problem?

Any suggestions/corrections are welcome…
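Before relying on this, it can help to check whether a column actually needs such a seed row. Below is a rough sketch, assuming the column values have already been collected into an array by some other means; the function name `rowsAtRisk` and its parameters are made up for the example.

```javascript
// Returns the 1-based row numbers of long values that lie outside the sampled rows,
// or an empty list when a sampled row is already long (no truncation expected).
function rowsAtRisk(values, guessRows, limit) {
    var sampleHasLong = false, risky = [];
    for (var i = 0; i < values.length; i++) {
        if (String(values[i]).length > limit) {
            if (i < guessRows) { sampleHasLong = true; }
            else { risky.push(i + 1); }
        }
    }
    return sampleHasLong ? [] : risky;
}

var longText = new Array(301).join("x");   // 300 characters
var sheetColumn = ["a", "b", "c", "d", "e", "f", "g", "h", "i", longText];

var atRiskBefore = rowsAtRisk(sheetColumn, 8, 255);   // long value only in row 10
sheetColumn[0] = longText;                            // seed the sample with a long value
var atRiskAfter = rowsAtRisk(sheetColumn, 8, 255);
```

`atRiskBefore` contains row 10, and after seeding the first row `atRiskAfter` is empty, which mirrors the fix described above.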

 

Thanks & Regards,
Giri Prasad