Finally got the lexer-to-tree functionality working... yes, I took a year off. It's looking very nice!
This is a single-pass dictionary-based lexing system. The lexer creates a stack of matches and then commits the stack into the tree as child nodes. Recursive parsing yanks out code blocks and adds them to the parent's node. Right now we're only scanning import, class and method blocks.
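Here's a minimal C sketch of the commit step. It rests on my own assumptions: each match is just a token kind plus its matched text, and committing the stack makes each match a child of the match before it. That is the chain shape the sample scan shows (for example `SCAN_METHOD [(null)]`, then `static`, `int`, `_cdecl`, `main`, each one level deeper). `Node`, `node_new`, `commit_stack` and `dump_tree` are hypothetical names, not the actual Plato parser code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Token kinds named after the ones in the scan dump (values are hypothetical). */
typedef enum { SCAN_NONE, SCAN_IMPORT, SCAN_TERM, SCAN_CLASS, SCAN_STORAGE, SCAN_METHOD } ScanKind;

static const char *const kind_name[] = {
    "SCAN_NONE", "SCAN_IMPORT", "SCAN_TERM", "SCAN_CLASS", "SCAN_STORAGE", "SCAN_METHOD"
};

typedef struct Node {
    ScanKind kind;
    const char *text;      /* matched text; NULL is printed as "(null)" */
    struct Node *child;    /* first child */
    struct Node *next;     /* next sibling */
} Node;

static Node *node_new(ScanKind kind, const char *text) {
    Node *n = calloc(1, sizeof *n);   /* child and next start out NULL */
    if (n != NULL) {
        n->kind = kind;
        n->text = text;
    }
    return n;
}

/* Attach child as the last child of parent. */
static void add_child(Node *parent, Node *child) {
    Node **slot = &parent->child;
    while (*slot != NULL)
        slot = &(*slot)->next;
    *slot = child;
}

/* Commit the match stack: stack[0] becomes a child of parent,
   stack[1] a child of stack[0], and so on (one chain per statement). */
static void commit_stack(Node *parent, Node *const *stack, size_t count) {
    for (size_t i = 0; i < count; i++) {
        add_child(parent, stack[i]);
        parent = stack[i];
    }
}

/* Write the tree as "depth-KIND [text]" lines, a simplified form of the dump. */
static void dump_tree(const Node *n, int depth, char *out, size_t cap, size_t *used) {
    for (; n != NULL; n = n->next) {
        if (*used < cap) {
            int w = snprintf(out + *used, cap - *used, "%d-%s [%s]\n",
                             depth, kind_name[n->kind], n->text ? n->text : "(null)");
            if (w > 0)
                *used += (size_t)w;
            if (*used > cap)
                *used = cap;   /* output was truncated */
        }
        dump_tree(n->child, depth + 1, out, cap, used);
    }
}
```

Calling `commit_stack` once per matched statement and then `dump_tree(root.child, 0, ...)` on a dummy root yields lines such as `0-SCAN_IMPORT [import]` followed by `1-SCAN_TERM [std.Math]`, mirroring the first two entries of the dump.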
I've also expanded the dictionary system to support sub-dictionaries. This will make expression parsing faster, since the parser won't have to check parse rules that don't apply.
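A rough C sketch of the sub-dictionary idea, assuming a dictionary is a flat list of literal rules and that a rule may name a sub-dictionary to switch to once it matches. After `class` matches, only the class rules are searched, which is where the speedup would come from. `Rule`, `Dict`, `dict_match`, `scan_words` and the `K_*` kinds are all illustrative, not Plato's real structures.

```c
#include <stddef.h>
#include <string.h>

typedef struct Dict Dict;

/* One parse rule: a literal to match, the token kind it produces, and an
   optional sub-dictionary that replaces the current one after a match. */
typedef struct {
    const char *word;
    int kind;
    const Dict *sub;
} Rule;

struct Dict {
    const Rule *rules;
    size_t count;
};

/* Hypothetical token kinds; K_NONE means "no rule matched". */
enum { K_NONE = 0, K_IMPORT, K_CLASS, K_PUBLIC, K_PROTECTED };

static const Rule class_rules[] = {
    { "public", K_PUBLIC, NULL },
    { "protected", K_PROTECTED, NULL },
};
static const Dict class_dict = { class_rules, sizeof class_rules / sizeof class_rules[0] };

static const Rule top_rules[] = {
    { "import", K_IMPORT, NULL },
    { "class", K_CLASS, &class_dict },   /* inside a class, only class rules apply */
};
static const Dict top_dict = { top_rules, sizeof top_rules / sizeof top_rules[0] };

/* Linear search of a single dictionary. */
static const Rule *dict_match(const Dict *d, const char *word) {
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->rules[i].word, word) == 0)
            return &d->rules[i];
    return NULL;
}

/* Classify each word, switching to a rule's sub-dictionary after it matches. */
static void scan_words(const char *const *words, size_t n, int *kinds) {
    const Dict *current = &top_dict;
    for (size_t i = 0; i < n; i++) {
        const Rule *r = dict_match(current, words[i]);
        kinds[i] = r ? r->kind : K_NONE;
        if (r != NULL && r->sub != NULL)
            current = r->sub;
    }
}
```

In this sketch `public` produces no token at the top level but matches once `class` has switched the scanner into the class dictionary. A real lexer would also need a way to pop back out of a sub-dictionary (for example at a closing brace), which is left out here.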
Anyway, sample scan below:
Scan complete: /Users/rdavenpo/git/plato/plato_parser/buffer.plato
0-Node 0:SCAN_IMPORT [import]
1-Node 0:SCAN_TERM [system.libc]
0-Node 0:SCAN_IMPORT [import]
1-Node 0:SCAN_TERM [std.Math]
0-Node 0:SCAN_METHOD [(null)]
1-Node 0:SCAN_METHOD [static]
2-Node 0:SCAN_METHOD [int]
3-Node 0:SCAN_METHOD [_cdecl]
4-Node 0:SCAN_METHOD [main]
5-Node 0:SCAN_NONE [{
printf("Hello, World!\n");
}]
0-Node 0:SCAN_CLASS [class]
1-Node 0:SCAN_STORAGE [Buffer]
2-Node 0:SCAN_NONE [(null)]
1-Node 0:SCAN_METHOD [(null)]
2-Node 0:SCAN_METHOD [Buffer]
3-Node 0:SCAN_NONE [{
buffer = malloc(size);
if(buffer == null){
Err.Warning("Out of memory");
}
printf("// Hello my friend!\n");
length=0;
bufsize = size;
return;
}]
1-Node 0:SCAN_METHOD [(null)]
2-Node 0:SCAN_METHOD [public]
3-Node 0:SCAN_METHOD [Set]
4-Node 0:SCAN_NONE [{
memset(buffer, set, bufsize);
return;
}]
1-Node 0:SCAN_METHOD [(null)]
2-Node 0:SCAN_METHOD [public]
3-Node 0:SCAN_METHOD [Zero]
4-Node 0:SCAN_NONE [{
Set(0);
}]
1-Node 0:SCAN_METHOD [(null)]
2-Node 0:SCAN_METHOD [public]
3-Node 0:SCAN_METHOD [Buffer]
4-Node 0:SCAN_METHOD [Copy]
5-Node 0:SCAN_NONE [{
Buffer newBuf = new Buffer(bufsize);
this.CopyTo(newBuf, 0, newBuf.bufsize);
return newBuf;
}]
1-Node 0:SCAN_METHOD [(null)]
2-Node 0:SCAN_METHOD [public]
3-Node 0:SCAN_METHOD [CopyTo]
4-Node 0:SCAN_NONE [{
memcpy(target.buffer, buffer+targetStart, targetLength);
}]
0-Node 0:SCAN_METHOD [(null)]
1-Node 0:SCAN_METHOD [int]
2-Node 0:SCAN_METHOD [end]
3-Node 0:SCAN_NONE [{
printf("Goodbyte, World!\n");
}]
Wednesday, August 31, 2011
Friday, August 13, 2010
Secure programming languages...
I think many developers these days consider only VM-based languages, such as Java and C#, to be secure programming languages. It seems like a logical idea, but in reality a VM is simply a run-time compiler that runs native code like any other code.
The huge downside of a VM system is that the security is *not* a language feature; it comes from carefully crafted runtime machine code. The code running on the processor is no more secure than C code; it's just more limited. The only real advantage is run-time permission checks, and it could be argued that such checks are more of a promise than a guarantee.
On the other hand, native C/C++ compilers have issues with "bare buffers" and stack buffer overflows. Such problems are simple to overcome with "good programming practices", but programmers rarely take the time to apply them. More likely, they were taught that such practices are someone else's specialty.
I've been developing security software professionally for a decade now, and I've come to realize that the only way to do it right is to simply always do it. Always check your bounds, always check your inputs, always validate pointer references.
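To make "always do it" concrete, here's a small C sketch of accessors that validate the pointer and the index on every call instead of trusting the caller. `checked_get` and `checked_set` are hypothetical helpers, not part of any library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Read one byte: validate the pointers, then the bounds, then touch memory. */
static bool checked_get(const unsigned char *buf, size_t len, size_t index, unsigned char *out) {
    if (buf == NULL || out == NULL)   /* validate pointer references */
        return false;
    if (index >= len)                 /* check bounds */
        return false;
    *out = buf[index];
    return true;
}

/* Write one byte under the same rules. */
static bool checked_set(unsigned char *buf, size_t len, size_t index, unsigned char value) {
    if (buf == NULL || index >= len)
        return false;
    buf[index] = value;
    return true;
}
```

Every failure path returns `false` rather than reading or writing, so a bad index or a null pointer turns into an error the caller must handle instead of silent memory corruption.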
Plato has type safety built into every part of the system. The size and length of buffers are *always* passed on the stack. Iterators use a special bounds-checked size type. Strings are immutable.
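The post doesn't spell out how the bounds-checked size type works, so this C sketch is only a guess at the idea: an index that carries its own limit and refuses to move past it. `CheckedIndex`, `ci_begin`, `ci_valid`, `ci_advance` and `sum_bytes` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "checked size": an index that carries its own upper bound.
   Invariant: value <= limit. */
typedef struct {
    size_t value;   /* current position */
    size_t limit;   /* one past the last valid position (the buffer length) */
} CheckedIndex;

static CheckedIndex ci_begin(size_t limit) {
    CheckedIndex ci = { 0, limit };
    return ci;
}

/* True while the index refers to a valid element. */
static bool ci_valid(CheckedIndex ci) {
    return ci.value < ci.limit;
}

/* Advance by n; refuses (returns false, index unchanged) if it would pass the limit.
   Comparing against limit - value avoids size_t overflow. */
static bool ci_advance(CheckedIndex *ci, size_t n) {
    if (n > ci->limit - ci->value)
        return false;
    ci->value += n;
    return true;
}

/* Example use: the index, not the programmer, enforces the bound. */
static unsigned sum_bytes(const unsigned char *buf, size_t len) {
    unsigned total = 0;
    for (CheckedIndex i = ci_begin(len); ci_valid(i); ci_advance(&i, 1))
        total += buf[i.value];
    return total;
}
```

The loop in `sum_bytes` only reads `buf[i.value]` while `i.value < len`, so the bound travels with the index rather than living in a separate variable that could get out of sync.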
Passing buffer sizes and lengths on the stack will require a new calling convention, which I'm calling "safecall" for the time being. When a function is called with a heap buffer reference, its arguments on the stack will look like this:
_printf:
; myBuf buffer
.size 4 myBuf.type ; Type lookup
.size 4 myBuf.len ; length
.size 4 myBuf.size ; size
.size 4 myBuf ; pointer to buffer
This gives us a lot of flexibility. For instance, we can now check boundaries and validate type casts at run time. Of course, guaranteed-safe functions can still be called with the normal C calling convention when the compiler can prove they are safe.
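One way to picture those run-time checks in C, assuming the four safecall slots map onto a struct and that the type lookup is a small integer id. The listing uses 4-byte slots (a 32-bit layout); this sketch uses `size_t` and a real pointer so it also builds on 64-bit targets. `SafeArg`, the `TYPE_*` ids and the `sc_*` functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the safecall argument block:
   type lookup, length, size, pointer to buffer. */
typedef struct {
    uint32_t type;   /* type lookup id */
    size_t   len;    /* bytes currently in use */
    size_t   size;   /* bytes allocated */
    void    *ptr;    /* pointer to the buffer */
} SafeArg;

enum { TYPE_BUFFER = 1, TYPE_STRING = 2 };   /* hypothetical type ids */

/* Run-time boundary check: is index inside the used part of the buffer? */
static bool sc_in_bounds(const SafeArg *a, size_t index) {
    return a->ptr != NULL && index < a->len;
}

/* Run-time cast check: hand out the pointer only if the type id matches. */
static void *sc_cast(const SafeArg *a, uint32_t wanted) {
    return a->type == wanted ? a->ptr : NULL;
}

/* Append a byte, refusing to write past the allocated size. */
static bool sc_append(SafeArg *a, unsigned char byte) {
    if (a->ptr == NULL || a->len >= a->size)
        return false;
    ((unsigned char *)a->ptr)[a->len++] = byte;
    return true;
}
```

Because `len` and `size` travel with the pointer, the callee can tell "past the data" from "past the allocation", which a bare C pointer cannot.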
As an example, take a function like this one:
// table is a buffer of 256 characters
void printCharTable(buffer table)
{
    // i never reaches table.length, so table[i] is always in bounds
    for (int i = 0; i < table.length; i++)
        printf("%c", table[i]);
}
This can always be evaluated as safe, and therefore a normal calling convention like this will work:
_printCharTable:
; myBuf buffer
.size 4 myBuf.len ; length
.size 4 myBuf ; pointer to buffer
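For comparison, a C sketch of what `printCharTable` reduces to under the plain convention: only the length and the pointer come in. The loop bound is the length itself, so every access is in bounds and no type id or run-time check is needed. The `FILE *` parameter is an addition of mine so the output can go to any stream; `print_char_table` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/* Plain-convention version: just (length, pointer), as in the two slots above.
   Since i < len on every iteration, table[i] can never be out of bounds. */
static void print_char_table(FILE *out, size_t len, const char *table) {
    for (size_t i = 0; i < len; i++)
        fputc(table[i], out);
}
```

`print_char_table(stdout, 256, table)` would print the whole 256-character table; the safety argument lives entirely in the loop condition.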
So there you have it.
Project Status:
I've got the tokenizer completed, and I'm working on type management. After that, I'll have an Abstract Syntax Tree to work on!
Friday, July 30, 2010
Some sample code... still evolving.
This bit of useful code will eventually be used for buffer management, which is why it imports the C library.
I think the interesting part is the separation of a method's functional code from its safety and unit-test code; the latter comes after the declaration and before the method body.
// Plato buffer class
// (C) 2009 Adamantine Software, Roger Davenport
/* Module: Buffer
*
* Description: The buffer class handles native buffers
*/
import system.libc;
import std.Math /* comment in the middle */;
static int _cdecl main()
{
    printf("Hello, World!\n");
}
// This is the Buffer class
class Buffer {
    pointer buffer;
    public size_t length;
    protected size_t bufsize;
    Buffer(size_t size)
        Buffer.desc: "A buffer of memory";
        Buffer.castTo: *;
        size.desc: "Size of the buffer";
        size.limit: (size <= 0) = Err.Error("Buffer size must be greater than zero");
    {
        buffer = malloc(size);
        if(buffer == null){
            Err.Warning("Out of memory");
        }
        printf("// Hello my friend!\n");
        length=0;
        bufsize = size;
        return;
    }
    public Set(char set)
        Set.desc: "Sets the buffer contents to a character or byte value";
    {
        memset(buffer, set, bufsize);
        return;
    }
    public Zero()
        Zero.desc: "Zeroes out a buffer";
    {
        Set(0);
    }
    public Buffer Copy() {
        Buffer newBuf = new Buffer(bufsize);
        this.CopyTo(newBuf, 0, newBuf.bufsize);
        return newBuf;
    }
    public void CopyTo(Buffer target, size_t targetStart, size_t targetLength)
        CopyTo.desc: "Copies one buffer to another";
        target (
            desc: "Buffer that the contents are copied to";
            limit: null = Err.Error("Target can't be null");
        );
        targetStart (
            desc: "Start position in the buffer";
            limit: (targetStart <>
            limit: (targetStart > bufsize) = Err.Error("targetStart is out of bounds");
        );
        targetLength (
            desc: "Length to copy";
            limit: (targetLength == 0) = return;
        );
        CopyTo.limit: (targetStart + targetLength > bufsize) = Err.Error("targetStart+targetLength would copy out of bounds");
    {
        memcpy(target.buffer, buffer+targetStart, targetLength);
    }
}
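To show what the `limit` clauses on `CopyTo` amount to, here's a hypothetical C lowering in which each clause becomes a guard ahead of the unchanged `memcpy` body. This is only an illustration, not how the Plato compiler actually emits code. The error codes and the `Buffer` struct are assumptions, and the destination-size guard is an extra check that the Plato listing doesn't have.

```c
#include <stddef.h>
#include <string.h>

typedef enum { COPY_OK, COPY_NOTHING, ERR_NULL_TARGET, ERR_START, ERR_RANGE, ERR_TARGET_SMALL } CopyResult;

/* Hypothetical C layout of the Plato Buffer class. */
typedef struct {
    unsigned char *buffer;
    size_t length;
    size_t bufsize;
} Buffer;

static CopyResult buffer_copy_to(const Buffer *self, Buffer *target,
                                 size_t targetStart, size_t targetLength) {
    /* target: limit: null = Err.Error(...) */
    if (target == NULL)
        return ERR_NULL_TARGET;
    /* targetStart: limit: (targetStart > bufsize) */
    if (targetStart > self->bufsize)
        return ERR_START;
    /* targetLength: limit: (targetLength == 0) = return */
    if (targetLength == 0)
        return COPY_NOTHING;
    /* CopyTo.limit: (targetStart + targetLength > bufsize), rearranged to avoid size_t overflow */
    if (targetLength > self->bufsize - targetStart)
        return ERR_RANGE;
    /* Extra guard, not in the Plato listing: the destination must be big enough. */
    if (targetLength > target->bufsize)
        return ERR_TARGET_SMALL;
    /* Functional body, same as the listing. */
    memcpy(target->buffer, self->buffer + targetStart, targetLength);
    return COPY_OK;
}
```

Keeping the guards in one block, ahead of a body that only does the copy, is the same separation the Plato syntax makes between the declaration and the method body.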
Sunday, March 1, 2009
First pass at the Plato Language...
What is it? Plato is a computer language, like Perl, C, and Java. It's strongly typed, but flexible enough to handle both low-level and high-level software without compromising either.
Some interesting ideas I have designed into the language are:
- Everything defaults to a secure type.
- Unlike Java (which has no explicit pointer type), I have pointers as a full-blown type. Unlike C, you can't point into arbitrary memory. Pointers must point to an existing type and are deterministic.
- Multi-thread safe. All lists and structures use "lockless locking" where possible. All other accesses are defined through a transaction, which signals the compiler to modify a sequence of variables safely.
- Garbage-collected memory management
- The base types are:
  - Numbers: signed/unsigned int, char, short, float, double, boolean
  - Buffers: string, buffer, memmap
  - Storage: class, structure, enumerator, array (as a type modifier)
  - Lists: vector, list, stack, hash map
  - Threads: thread, rwlock, mutex, transaction
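The post doesn't say how "lockless locking" will be implemented. The closest well-known technique is a compare-and-swap retry loop, shown here as a C11 sketch of a lock-free (Treiber-style) stack push. It's a swapped-in technique, not Plato's design, and `LockFreeStack`, `lf_init`, `lf_push` and `lf_count` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct LNode {
    int value;
    struct LNode *next;
} LNode;

typedef struct {
    _Atomic(LNode *) head;
} LockFreeStack;

static void lf_init(LockFreeStack *s) {
    atomic_init(&s->head, NULL);
}

/* Push without a mutex: retry the compare-and-swap until no other thread
   has changed head between our read and our write. On failure,
   atomic_compare_exchange_weak reloads the current head into old. */
static void lf_push(LockFreeStack *s, LNode *n) {
    LNode *old = atomic_load(&s->head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&s->head, &old, n));
}

/* Count nodes; only meaningful once no other thread is pushing. */
static size_t lf_count(LockFreeStack *s) {
    size_t count = 0;
    for (LNode *p = atomic_load(&s->head); p != NULL; p = p->next)
        count++;
    return count;
}
```

Pop is left out on purpose: a naive compare-and-swap pop suffers from the ABA problem and needs extra machinery such as tagged pointers or hazard pointers.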
That's about it for now.