Project Goals

Managed Code

The original PST SDK is written in standards-compliant C++, with the intent of cross-platform compatibility and as few external dependencies as possible. This is great for C++ programmers, but not so useful for the rest of us who use managed languages.

Why C# and not MC++

The goal of this project is not simply to allow access to the SDK for .NET developers. The larger goal is to express the SDK in a managed OO language like C#. This will facilitate porting it to Java, Python, etc., as the same code constructs, syntax, and strategies can be more easily understood in C# than in C++. In other words, once there is a working implementation in any managed language, translating from that language to other managed languages should be relatively simple. Ideally this will be the first in a series of ports to other languages and platforms.

That said, we're also hoping to help .NET developers get access to this code sooner rather than later, so we're approaching this in multiple stages. The first stage will be an MC++ wrapper around the unmanaged library. I know I just said that we're not going to do that. What I should have said is that we're not going to stop at that. MC++ has some downsides: a) it still does the real work in unmanaged C++, so debugging is very difficult and you don't get any of the GC and JIT benefits that a normal 100% .NET code library would have, and b) it only works on Windows with the MS .NET CLR. We'd like this to run on Mono.

Why not bind to a C++ DLL and wrap that in .NET using DllImport?

The goal is that this is 100% .NET code and only requires a C# compiler and a 2.0-compatible .NET runtime environment to execute. When porting to Java/Python, the intention will be the same: pure Java or pure Python code, not wrappers around a C++ library. Also, we don't want to dumb down interaction to the lowest common denominator: static C functions. The unmanaged SDK is nice, flexible, well-structured OO code. We want to maintain that experience.

Cross Platform Compatibility

One of the primary goals for the project is that it can run under Mono as well as the Microsoft CLR. For the same reason, we want the code to be Framework v2.0 compliant (yes, we know Mono can support later versions). This will keep it widely usable.

Operationally Equivalent

The C# SDK should be operationally equivalent and produce the same results in the unit tests and demo tools as the C++ SDK.

Similar Code

Note: I've abandoned this goal in favour of a more intuitive C# expression.
Rather than re-writing the SDK to be merely functionally equivalent, there is a strong emphasis on having the C# code be as similar to the C++ code as possible. This will facilitate later maintenance, as improvements that occur in the C++ code can be more directly mapped to the C# code. There may be a branch in the future which is a re-write that expresses the same ideas in a way that is more "C#-like" and less "C++-like". This goal is in some ways at odds with the "Managed Code" project goal. This disparity will need to be resolved at some point... ;)

Comments, Citations and Credit

Wherever possible, the original C++ comments are copied over and citations from the MS published specs are included. Credit is given to the C++ SDK authors in the file header, along with the original header comments.

Challenges

Templates and Typedef

The PST SDK uses C++ templates and typedefs heavily, in ways that are difficult to express in C#. Generics unfortunately do not cover it. This has led to some kludgy workarounds. One of those is C#-style unions: structs marked with LayoutKind.Explicit whose fields carry FieldOffsetAttribute, with some fields occupying the same offsets. Since the meaningful size of such a struct depends on which of the overlapping fields is in use, the sizeof() operator can't be used to get it. Instead, constants are included in the struct to specify the Unicode size, the ANSI size, and so on.
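To make the union idea concrete, here is a minimal sketch. The type and member names (BlockId, UnicodeSize, AnsiSize) are made up for illustration and are not taken from the SDK; the sketch only assumes a value that is 8 bytes wide in one file flavour and 4 bytes wide in the other.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example type, not part of the SDK: both fields start at
// offset 0, so they share the same bytes, like a C/C++ union.
[StructLayout(LayoutKind.Explicit)]
public struct BlockId
{
    // Size constants that stand in for sizeof(), one per flavour.
    public const int UnicodeSize = 8;
    public const int AnsiSize = 4;

    [FieldOffset(0)] public ulong Unicode; // 64-bit view
    [FieldOffset(0)] public uint Ansi;     // 32-bit view of the same bytes
}

public static class UnionDemo
{
    public static void Main()
    {
        BlockId id = new BlockId();
        id.Unicode = 0x1122334455667788UL;

        // On little-endian hardware the 32-bit view aliases the low-order
        // four bytes, so this prints 55667788 there.
        Console.WriteLine("{0:X}", id.Ansi);
    }
}
```

Callers pick BlockId.UnicodeSize or BlockId.AnsiSize depending on the file they are reading, rather than asking the runtime for the size of the struct.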

The lack of typedefs and templates has forced some hardcoding of types, which could cause errors on certain platforms. This almost entirely concerns integer types of various sorts, and the ability to inject such a type into a struct, modifying its size and layout depending on which template you're using. To synthesize that, we're using generics where possible. However, .NET has a bunch of limitations on struct inheritance, generic type constraints, referring to generic type arguments, and inheritance from integral types, which make all of this much more complex. Where generics fall down, some of the tactics involve creating types that mimic inheritance from integral types, using operator overloading and generics to allow the injection of integral types.
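As a rough illustration of the operator-overloading tactic only (the generic injection part is not shown), here is a sketch of a wrapper that behaves like a distinct integral type. NodeId is a hypothetical name chosen for the example, not a type from the SDK.

```csharp
using System;

// Hypothetical wrapper, not part of the SDK: a distinct type that converts
// to and from UInt32, roughly what a C++ typedef over an integer gives you.
public struct NodeId : IEquatable<NodeId>
{
    private readonly uint value;

    public NodeId(uint value) { this.value = value; }

    // Implicit conversions let the wrapper stand in wherever a uint is used.
    public static implicit operator NodeId(uint value) { return new NodeId(value); }
    public static implicit operator uint(NodeId id) { return id.value; }

    public static bool operator ==(NodeId a, NodeId b) { return a.value == b.value; }
    public static bool operator !=(NodeId a, NodeId b) { return a.value != b.value; }

    public bool Equals(NodeId other) { return value == other.value; }
    public override bool Equals(object obj) { return obj is NodeId && Equals((NodeId)obj); }
    public override int GetHashCode() { return value.GetHashCode(); }
    public override string ToString() { return value.ToString(); }
}

public static class NodeIdDemo
{
    public static void Main()
    {
        NodeId id = 0x21u;               // uint -> NodeId
        uint raw = id;                   // NodeId -> uint
        NodeId same = new NodeId(raw);
        Console.WriteLine(id == same);   // True
    }
}
```

The cost of this approach is boilerplate: every wrapped integral type needs its own conversions and equality members, which is part of why it counts as a workaround rather than a replacement for typedef.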

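When forced to hardcode types, the following assumptions are made (the names on the left are the C++ types, the names on the right are the .NET types):
  • Use the LLP64 data model
  • short/ushort = Int16/UInt16
  • int/uint = Int32/UInt32
  • long/ulong = Int32/UInt32 (a C++ long is 32 bits under LLP64, unlike a C# long)
  • longlong/ulonglong = Int64/UInt64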
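One possible way to keep these assumptions visible in the code is a set of using aliases at the top of a source file. This is only a sketch: the cpp_ prefixed names are invented for the example, and C# aliases are scoped to the file that declares them, so they would have to be repeated per file.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical alias names recording the LLP64 mapping listed above.
using cpp_ushort = System.UInt16;
using cpp_uint = System.UInt32;
using cpp_ulong = System.UInt32;      // C++ unsigned long is 32 bits under LLP64
using cpp_ulonglong = System.UInt64;

public static class AliasDemo
{
    public static void Main()
    {
        // Marshal.SizeOf reports the marshalled size of each aliased type.
        Console.WriteLine(Marshal.SizeOf(typeof(cpp_ushort)));    // 2
        Console.WriteLine(Marshal.SizeOf(typeof(cpp_uint)));      // 4
        Console.WriteLine(Marshal.SizeOf(typeof(cpp_ulong)));     // 4
        Console.WriteLine(Marshal.SizeOf(typeof(cpp_ulonglong))); // 8
    }
}
```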

No Struct Inheritance in C#

Since C# doesn't allow struct inheritance, we're using classes and generics as much as possible to replicate the template structure. The problem with this is that classes lack the value semantics inherent in structs, and this will slow down disk reads when attempting to map a byte array to a struct. This is a design point open to exploration and discussion. Currently this is implemented by the following conventions:
  • Provide FromBytes(byte[] bytes) and byte[] GetBytes() methods for mapping the data to and from these classes
    • note: This will become an interface later and tooling will be provided for mapping data in and out of these classes
  • Provide a constant size_of value in the type that can be used to calculate its byte size where Marshal.SizeOf or sizeof() would fail.
  • Where possible, calculate derived size values and byte array limits with statics instead of constants.
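The conventions above could look something like the following sketch. RecordHeader and its two fields are hypothetical and do not correspond to a real PST structure, and the exact shape of FromBytes (an instance method returning void) is an assumption, since only the method names are fixed by the list above.

```csharp
using System;

// Hypothetical on-disk record, not part of the SDK: a 4-byte id followed
// by a 2-byte flags field.
public class RecordHeader
{
    // Stands in for sizeof() / Marshal.SizeOf.
    public const int size_of = 6;

    public uint Id;
    public ushort Flags;

    // Fill this instance from the first size_of bytes of the buffer.
    // BitConverter uses the host byte order, so this sketch assumes a
    // little-endian host when reading little-endian file data.
    public void FromBytes(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        if (bytes.Length < size_of) throw new ArgumentException("Buffer too small.", "bytes");
        Id = BitConverter.ToUInt32(bytes, 0);
        Flags = BitConverter.ToUInt16(bytes, 4);
    }

    // Serialize this instance back into a new size_of-byte array.
    public byte[] GetBytes()
    {
        byte[] bytes = new byte[size_of];
        BitConverter.GetBytes(Id).CopyTo(bytes, 0);
        BitConverter.GetBytes(Flags).CopyTo(bytes, 4);
        return bytes;
    }
}

public static class RecordHeaderDemo
{
    public static void Main()
    {
        RecordHeader original = new RecordHeader();
        original.Id = 0x21;
        original.Flags = 0x0003;

        RecordHeader copy = new RecordHeader();
        copy.FromBytes(original.GetBytes());

        Console.WriteLine(copy.Id == original.Id && copy.Flags == original.Flags); // True
    }
}
```

Every read goes through a field-by-field copy here, which is the overhead mentioned above compared with mapping a byte array straight onto a struct.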

Boost Libraries and Differences Between STL and .NET BCL

The use of third-party libraries such as Boost's Iostreams library will demand more re-expressing of code, as there is no direct equivalent of Boost in C#. Many of the features of that library are already present in the .NET BCL. This means either re-writing the C++ to work in terms of the .NET BCL System.IO classes, which conflicts with the "Similar Code" goal, or wrapping those classes to have the same interface/behaviour as the Boost IO libraries, which is yet another kludge, as the two libraries have significant differences.
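As a small taste of the wrapping option, here is a sketch of an adapter that gives a .NET Stream a Boost-style read. StreamDevice is a hypothetical name, and only one difference is handled: a Boost.Iostreams source returns -1 at end of stream, while Stream.Read returns 0.

```csharp
using System;
using System.IO;

// Hypothetical adapter, not part of the SDK: exposes a Boost.Iostreams-style
// read() on top of a System.IO.Stream.
public class StreamDevice
{
    private readonly Stream inner;

    public StreamDevice(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    // Returns the number of bytes read, or -1 at end of stream.
    // Stream.Read signals end of stream by returning 0 instead.
    public int read(byte[] buffer, int n)
    {
        if (n == 0) return 0;
        int count = inner.Read(buffer, 0, n);
        return count == 0 ? -1 : count;
    }
}

public static class StreamDeviceDemo
{
    public static void Main()
    {
        StreamDevice device = new StreamDevice(new MemoryStream(new byte[] { 1, 2, 3 }));
        byte[] buffer = new byte[8];

        Console.WriteLine(device.read(buffer, buffer.Length)); // 3
        Console.WriteLine(device.read(buffer, buffer.Length)); // -1
    }
}
```

Seeking, writing, and buffering differ between the two libraries as well, so a full wrapper would be considerably larger than this.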

More To Come...

This project is still very young, and has only two developers, who are doing this in their spare time. If this is interesting to you, please contribute with code, commentary, or whatever else you feel will improve the project. Starting another project to port to Java/Python would be a great start, and we can share efforts and code.

Last edited Oct 13, 2010 at 10:18 PM by thoward, version 6
