Friday, August 13, 2010

Secure programming languages...

I think many developers these days assume that only VM-based languages, such as Java and C#, can be secure. It seems like a logical idea, but in reality a VM is simply a run-time compiler whose output runs as native code like any other code.

The huge downside of a VM system is that all the security is *not* a language feature, but it comes from carefully crafted runtime machine code. The code running on the processor is no more secure than C code, it's just limited. The only actual advantage is run-time permission checks, but it could be argued that such checking is more apt to be a promise, rather than a guarantee.

On the other hand, native C/C++ compilers have issues with "bare buffers" and stack overflows. Such problems are simple to overcome with "good programming practices" - but rarely do programmers take the time to implement these practices. And likely, they were taught that such practices are the specialty of others.

I've been developing security software professionally for a decade now, and I've come to realize that the only way to do it right is to simply always do it. Always check your bounds, always check your inputs, always validate pointer references.
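As a rough illustration of what "always do it" means in plain C, here is a minimal sketch of a copy routine that refuses to run unless every precondition holds. The name checked_copy and its return convention are my own assumptions for this example, not part of Plato or any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copy src_len bytes of src
   into dst and NUL-terminate, but only if every precondition holds.
   Returns 0 on success, -1 if a pointer is NULL or the data will not fit. */
static int checked_copy(char *dst, size_t dst_size,
                        const char *src, size_t src_len)
{
    /* Always validate pointer references. */
    if (dst == NULL || src == NULL)
        return -1;

    /* Always check your bounds: one byte is reserved for the NUL. */
    if (src_len >= dst_size)
        return -1;

    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return 0;
}
```

The caller has to pass the destination size explicitly, and that is the discipline a language could enforce instead of leaving it to habit.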

Plato has type safety built into every part of the system. The size and length of buffers are *always* passed on the stack. Iterators use a special size type which is bounds-checked. Strings are immutable.

This will require a new type of calling convention. I'm calling it a safecall for the time being. A function's arguments on the stack will look like this (called with a heap buffer reference):

_printf:
; myBuf buffer
.size 4 myBuf.type ; Type lookup
.size 4 myBuf.len ; length
.size 4 myBuf.size ; size
.size 4 myBuf ; pointer to buffer

This gives us a lot of flexibility. For instance, we can now check boundaries at run time, or check the validity of type casts at run time. Of course, guaranteed-safe functions can still be called with normal C calling conventions if the compiler can prove they are safe.
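To make the run-time checks concrete, here is a small C sketch of a descriptor carrying the same four fields as the safecall layout. The type tags, the helper names, and the reading of "length" as elements in use versus "size" as allocated capacity are all assumptions made for illustration; none of this is Plato's actual design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; how Plato encodes types is not specified. */
enum { TYPE_CHAR = 1, TYPE_INT32 = 2 };

/* Mirrors the four safecall fields: type, length, size, pointer.
   Assumption: len = elements in use, size = allocated capacity. */
typedef struct {
    uint32_t type;  /* type lookup tag */
    uint32_t len;   /* length */
    uint32_t size;  /* size */
    void    *data;  /* pointer to buffer */
} safe_buf;

/* Bounds-checked read of one char. Stores the value in *out and
   returns 0 on success; returns -1 on any failed check. */
static int buf_get_char(const safe_buf *b, uint32_t index, char *out)
{
    if (b == NULL || out == NULL || b->data == NULL)
        return -1;
    if (b->type != TYPE_CHAR)                  /* wrong element type */
        return -1;
    if (b->len > b->size || index >= b->len)   /* corrupt or out of range */
        return -1;
    *out = ((const char *)b->data)[index];
    return 0;
}

/* Checked cast: yields an int32 view only when the tag says the
   buffer really holds int32 values, otherwise NULL. */
static int32_t *buf_as_int32(const safe_buf *b)
{
    if (b == NULL || b->type != TYPE_INT32)
        return NULL;
    return (int32_t *)b->data;
}
```

Because the tag and the lengths travel with the pointer, the callee can reject a bad index or a bad cast without trusting the caller.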

For instance a function like so:

// table is a buffer of 256 characters
void printCharTable(buffer table)
{
    for( int i=0; i < table.length; i++)
        printf("%c", i);
}

This can always be evaluated as safe, and therefore a normal calling convention like so will work:

_printCharTable:
; myBuf buffer
.size 4 myBuf.len ; length
.size 4 myBuf ; pointer to buffer
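In C terms, the slimmer layout might look like the sketch below: just a length and a pointer. The names slim_buf and print_char_table are hypothetical, and unlike the loop above this version reads the table's contents, which is my guess at a typical use. Every index is below the length by construction, so no per-access check is needed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical slim descriptor: only the two fields that the
   normal calling convention passes (length and pointer). */
typedef struct {
    size_t      len;   /* length */
    const char *data;  /* pointer to buffer */
} slim_buf;

/* The loop condition keeps i below table.len, so every access is in
   range without a run-time check. Returns the number of chars printed. */
static size_t print_char_table(slim_buf table)
{
    size_t printed = 0;
    for (size_t i = 0; i < table.len; i++) {
        putchar(table.data[i]);
        printed++;
    }
    return printed;
}
```

The type and size fields can be dropped here because nothing in the function casts the buffer or writes past its length.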

So there you have it.

Project Status:
I've got a tokenizer completed, and I'm working on the type management. After that, I'll have an Abstract Syntax Tree to work on!