SimpleORM White Paper

Simple Java Object Relational Mapping
Logo
Power without complexity.

Dr Anthony Berglas
October 2007


Why SimpleORM?

The key feature of SimpleORM is the fact that it represents individual records as a map of fields to values.  The use of rich record structures enables greater control of ORM functionality while simplifying their use.  No XML is required, and SimpleORM does not use reflection, byte code post processors or other obscuring techniques.

Thus SimpleORM does not store data records in pseudo POJOs (Plain Old Java Objects).  SimpleORM Automates persistence, but unlike Hibernate and JDO it does not pretend to make persistence Transparent.  SimpleORM uses ordinary Java data structures which are real POJOs, with no magic reflective techniques.  Certainly no complex EJB-like constructs are used.

Record and field definitions are just instances of SimpleORM classes which are normally declared as constants.  For example:-

  class Employee...
private static final SRecordMeta EMPLOYEE // the record
= new SRecordMeta(Employee.class, "EMPLOYEE_TABLE");
...
public static final SFieldString PHONE_NR // a field
= new SFieldString(EMPLOYEE, "PHONE_NR", 30);

public String getPhoneNr() { // Optional get method if you like them
meta.getString(PHONE_NR);
}

EMPLOYEE contains an object that represents the Employee record (SQL Table), and PHONE_NR  contains meta data for a Phone Number field (SQL Column) that might be contained within that record.

One can then access the field as
  Employee employee = findOrCreate("MYEMP");
String phone = employee.getString(Employee.PHONE_NR);
 String phone = employee.getPhoneNr(); // Alternative if get* defined.

This structure has the following advantages:-

Transparent persistence attempts to hide persistence issues from the business logic.  However, for most information systems persisting and querying the information is pervasive and tends to dominate the "business logic".  Thus we believe that the advantage of having a rich yet simple record structure easily outweighs the loss of transparency for most applications.    It is also easy to implement unit tests in SimpleORM, but most of those involve persistence in practice. 

It should be noted that SimpleORM is a "real" ORM.  There is an object cache, and each record is only represented once within each transaction.  Object identity is tied to database identity.  This means that business rules can be written largely independently of each other without the need to coordinate data retrieval.  This distinguishes SimpleORM from iBATIS, Ruby on Rails, Butler etc. 

SimpleORM is full featured.  For example, it is also possible to detach records and then reattach them to a new transaction using optimistic locks.  Several primary key generators are provided, but unlike most ORMs it also provides very good support for identifying foreign keys.

SimpleORM is open source with an old Apache style licence.

Why Use Object / Relational Mapping

JDBC provides an efficient way for Java programs to persist data in relational databases. It provides the power of SQL to perform advanced queries that can efficiently join and aggregate data on the server.  It also provides precise and flexible control over transactions and locking that are only restricted by the SQL standard and the underlying database implementation.

However, applications that do a significant amount of database interaction will want to encapsulate business rules and information about different database tables in Java classes. For example, there might be a Employee class that corresponds to an Employee table, and a Department class that corresponds to a Department table. Methods could be written to validate and process data in those classes.

An Object / Relational Mapping tool automates the mapping between these classes in memory and the database by providing methods to automatically select, insert, update and delete rows. They should also keep a table of primary keys to ensure that there is at most one object in memory for each row in the database, and they should automatically detect updates to the in memory classes and flush them to the database when transactions commit. A good tool will also handle foreign key relationships and provide a measure of database independence.

These are important services, so any substantial Java database oriented application would definitely benefit from a good O/R tool.

Simple Example

The following code fragments show everything that is required to declare and use a mapping to an Employee table. Each SRecordInstance represents an individual record instances while SRecordMeta objects represent the meta-data associated with each instance. Each SFieldMeta object augments this meta data.

Note that in SimpleORM a database has "tables" and "columns", while a persisted class is a "record" which has "fields". A non persisted class is just a "class" which has "instance variables".

public class Employee extends SRecordInstance { // ie. a class mapped to a table.

private static final SRecordMeta meta
= new SRecordMeta(Employee.class, "XX_EMPLOYEE");
// SRecordMeta objects describe SRecordInstances
 
public static final SFieldString EMPEE_ID
= new SFieldString(meta, "EMPEE_ID", 20, SFD_PRIMARY_KEY);

public static final SFieldString NAME
= new SFieldString(meta, "NAME", 40, SFD_DESCRIPTIVE);

public static final SFieldString PHONE_NR
= new SFieldString(meta, "PHONE_NR", 30);

public static final SFieldReference DEPARTMENT
= new SFieldReference(meta, Department.meta);

public static final SFieldBoolean IMPORTANT
= new SFieldBooleanCharYN("IMPORTANT");

public SRecordMeta getMeta() { return meta };
// (Abstract SRecordInstance method)

public static Employee findOrCreate(String empeeId) { // Convenience
return (Employee)meta.findOrCreate(empeeId);

// Completely optional get/set methods.
String getPhoneNumber() {
return getString(PHONE_NR);
}
String setPhoneNumber(String value) {
setString(PHONE_NR, value); }
}

The Employee record is mapped to the XX_EMPLOYEE table. It has a primary key EMPEE_ID, a NAME and a PONE_NR field, and a direct reference to the Department record. All these definitions simply define static constants which are threaded together via the meta variable.  The meta object is tied to the Employee.class using the getMeta method.

The following code then manipulates this data.

  SConnection.attach(myJDBCConnection, null);
SConnection.begin();
...
Employee employee =
(Employee)Employee.meta.mustFind("123-45-6789");

String name = employee.getString(Employee.Name);
String phone = employee.getPhoneNr(); // if getter defined
...
employee.setPhoneNr("(123) 456 7890");
...
Department department =
(Department)employee.getReference(employee.DEPARTMENT);
...
oldEmployee.delete();
...
SConnection.commit();
SConnection.detachAndClose();

A SimpleORM connection is created based on an existing JDBC Connection, and associates it with the current thread.  SimpleORM does not care how the JDBCConnection was created, normally one  would use a connection pool such as the one supplied with Tomcat.  A new transaction is then started.

<>mustFind(keys) lazily finds a employee object based on the primary key. If the object had been previously retrieved in the same transaction a pointer to it is returned, otherwise a JDBC query is issued to retrieve the row. If "123-45-6789" does not exist in the database then a meaningful exception is thrown whose message includes the unfound primary key.  find(keys) returns null if the row was not found, create(keys) creates a new record, while findOrCreate(keys) can be used to insert a new row using exactly the same techniques as used to update a record.

The NAME and PHONE_NR fields are then retrieved from the record, the first using the generalized syntax and the second using the get method that had been defined.  The PHONE_NR, is then set to a new value. The generalized design makes it easy for SimpleORM to mark the field and record as dirty when the PHONE_NR is set.  One never needs to compare 100 records at commit time just to find the 10 that have changed.

The employee's DEPARTMENT record is then retrieved from either the cache or database. Note that unlike other O/R mapping systems this is normally performed lazily so that the Department row is only retrieved from the database if it is needed.

An oldEmployee is then flagged for deletion before the transaction is finally committed. As part of this process SimpleORM flushes all pending dirty objects to the database. Thus there is no need to manually track which records need to be updated. There is also no risk that separate parts of the same application will accidentally override changes to different in memory copies of the same database record.

The generalized design also makes it easy to integrate sophisticated UI layers with the ORM.  See SimpleWebApp.

Note that as no reflection is used one can organize the record definitions in more sophisticated ways.  For example, one could create 13 MONTH_nn fields (ie. columns) using the following code:-

  SFieldBigDecimal MONTHLY_TOTALS = new SFieldBigDecimal[13];
  static {
     for (int mx=0; mx<SFieldBigDecimal.length; mx++)
        MONTLY_TOTALS[mx] = new SFieldBigDecimal(meta, "MONTH_" + mx);
  }
   ...
  total = myRecord.getBigDecimal(MonthlyTotals[3]);

There is nothing special about this code, it just creates SFieldMeta objects and references them using an array instead of individual constants.  Simple.

Queries

findOrCreate(...) can can retrieve records based on equality of their primary key(s).   But a relational database can also retrieve rows based on arbitrary boolean expressions.  Consider the following query which selects employees with Name starting with a letter after "J" in myDept.  The equivalent SQL is shown after the query.

SResultSet res = Employee.meta.newQuery()
.gt(Employee.NAME, "'J'")
// .and() implied. .or() etc. override.
.eq(Employee.DEPARTMENT, myDept)
.decending(Employee.NAME)
.execute();

// ps = Prepare("SELECT * FROM XX_EMPLOYEE
// WHERE NAME > 'J' AND DEPT_ID = ?
// ORDER BY NAME DESC"<br>
// ps.setString(1, myDept.DEPT_ID);
// res = ps.executeQuery();

while (res.hasNext()) {
Employee emp = res.getRecord();...
}

// ArrayList recs = res.getArrayList(100000); // Alternative

Note that the Department is queried directly, and SimpleORM determines that the key to Department is DEPT_ID ( it could also be a multi column key).  The records are normally retrieved individually using a normal hasNext() loop, but they can also be retrieved as a set using  getArrayList which takes a parameter that limits the number of rows retrieved (else an exception is thrown).

It is also possible to use direct SQL SimpleORM tries to form a very thin layer on top of JDBC. Thus queries are performed using ordinary SQL queries, with SimpleORM providing default Select and From clauses. The general flow is very similar to a JDBC query. For example:-

  SPreparedStatement stmt = Employee.meta.select(
"NAME > 'j' AND DEPT_ID = ?", "NAME");
// SELECT * FROM XX_EMPLOYEE
// WHERE NAME > 'j' AND DEPT_ID = ? ORDER BY NAME

stmt.setInt(1, myDept.getInt(DEPT_ID));
SResultSet res = stmt.execute();

while (res.hasNext()) {
Employee emp = res.getRecord();
... emp.getString(Employee.NAME) ...
}

By default all SFields in a record are retrieved unless they have the SFD_UNQUERIED property set. This is used for large, rarely used fields. However, if the SQY_UNQUERIED flag is specified on the query then unqueried fields also be retrieved. Naming fields such as DEPARTMENT.NAME can be identified by having their SFD_DESCRIPTIVE property set in which case specifying the SQY_DESCRIPTIVE flag causes just those descriptive fields to be retrieved. Alternatively, one can specify an explicit list of fields to retrieve for each query.  Fields that are not queried are marked as invalid, and attempts to access them produce fail fast exceptions.

Thus one never has to retrieve all 100 columns of an Employee Table just to obtain the Employee's Name.  And any attempt to access a field that has not been queried produces fail fast exceptions. (All these options also apply to foreign key getRecord(), findOrCreate(), etc.)

No attempt has been made to concoct some new query language that is then transformed into SQL in ill defined ways. Such new query languages lack key features such as sort order or leave important details poorly defined such as whether joins are inner or outer. They also may reflect a lack of understanding about the nature of query languages. For example, EJB QL tries to navigate a database with a mysterious IN operator in queries like

SELECT p FROM Player p, IN(p.team) AS t WHERE t.city  = ?1
rather than the more obvious
SELECT p FROM Player p WHERE p.team.city  = ?1

The Cache and the Database

Perhaps the most important function of an O/R system is to efficiently synchronize objects cached in memory with rows persisted in the database. SimpleORM does this very carefully to minimize inconsistent query results and update anomalies.

In particular, if two separate queries in the same transaction would both return the same row in the database, then SimpleORM ensures that there is only one record in memory that corresponds to that row. The second query simply returns a pointer to the record created by the first query. SimpleORM then automatically updates the row in the database when the transaction commits. This prevents changes to a row made by one business rule from accidentally overwriting changes to the same row made by other business rules in the same transaction.

These records are stored in the main "Transaction Cache" which is simply the set of records read during a transaction. It is not a separate structure, but just a hash map of each SRecordInstance's primary key and a pointer to the record's object. A separate Transaction Cache (ie. hash map) is created for each transaction, and data is never shared between two thread's transaction caches. Thus two threads on the same JVM behave identically to two threads on different JVMs, and this is achieved without the need for any fancy SimpleORM distributed locking mechanism. The database is used to do all the locking, and so non-SimpleORM and importantly even non-Java applications can co-exist with SimpleORM.

Changes to records in memory are not normally flushed to the database until a transaction commits. The flushing operation performs inserts, updates and deletes in the order that records were first made dirty. Records are made dirty when a field is set to a new value or they are deleted.

Maintaining the dirty flushing order generally avoids putting the database temporarily into an inconsistent state and so violate database referential integrity constraints. We strongly suggest that SQL FOREIGN KEY constraints be declared for all foreign key relationships and SimpleORM supports this. If the default order is not optimal for an unusual transaction records can be manually flushed or markDirty()ed.  Occassionally it might even be necessary to flush a record multiple times if mutually referencing foreign keys are used.  (The main potential source of trouble arises if parents and children are inserted in the same transaction. Simply make sure that the parents are created first -- eg. insert Departments before Employees, just like you would with JDBC.)

Once a transaction commits all records are flushed and then normally purged and destroyed. Destroyed records are removed them from the cache hash map and their values are removed. Any attempt to access destroyed records produces fail fast exceptions.

When a record is delete()ed, it is destroyed at the time that the delete() is specified. Thus one cannot access deleted records, even though the actual database deletion will not occur until the transaction is flushed. A findOrCreate() for the same key as a just deleted record will return the destroyed record, and its fields cannot be referenced. Deleting a newly created record is allowed and does nothing, with no SQL update being issued at flush time.

One important and usually undocumented problem with most O/R systems is that records are cached in memory yet queries are made against the database. For example suppose a transaction increases Employee Fred's salary from $45,000 to $55,000, and then within the same transaction makes a query for employee's with SALARY < $50,000. Would Fred be included in the query's result?

SimpleORM guarantees that queries made via findOrCreate and look ups via getReference are always made on the cached values. This means that they always reflect the current state of the transaction in memory.  However, like most other O/R tools queries performed via result sets are always made on the database, and so may not reflect the current cached values. This enables the full power of SQL to be used without having to implement a special in-memory query engine. As a result, Fred would be included in the result set. SimpleORM does make sure that retrieved records are consistent with the cache, so Fred's existing record would be returned and yet it would have the current salary of $55,000.  In rare cases where this is a problem SimpleORM enables dirty rows to be explicitly flushed to the database without having to commit the transaction. 

SimpleORM also allows the user to explicitly purge (and flush and destroy) specific records from the cache after they have been flushed to the database. This enables direct JDBC commands to be used to update the database within a SimpleORM transaction without any danger of cache inconsistencies. Raw JDBC can be much more efficient for certain types of operations. For example a single SQL DELETE statement that deletes all transaction details over 12 months old is much more efficient than having to read each of the transaction details into the JVM and then delete them individually.

Primary Keys and Foreign Keys

Having unique, immutable primary keys are fundamental to any ORM.  It is the keys that are used to link specific records in memory to specific rows in a database.  They are also used to access records in the transaction cache.  Very occasionally it might be useful to have a transaction table that has no primary key, but such a table will have to be updated directly with JDBC, not an ORM.

SimpleORM links records together using both a direct pointer and the key values.  So the following line from the Employee definition actually defines towo SFieldMeta objects.  The first contains the reference field that links to the Department record, and a reference to it is stored in the DEPARTMENT constant.  There is a second slightly hidden scalar field that contains the actualy DEPARTMENT_NR field because that is the primary key of the Department record.

public static final SFieldReference DEPARTMENT
  = new SFieldReference(meta, Department.meta);

When the following line is executed, the department row is read from the database if it has not already been read into the current transaction.

Department department =
  (Department)employee.getReference(employee.DEPARTMENT);

employee.IsNull(employee.DEPARTMENT) means that there is no Department associated with this Employee.  It does Not mean that the Department has simply not been loaded into memory.  It is slightly different from just sayng department == null because it does not acually need to retrieve the Department, it just needs to see whether the scalar DEPARTMENT_NR field is null.Use getReferenceNoQuery(employee.DEPARTMENT) if you just want to see whether the reference has been loaded, useful for Detached records (below).  DEPARTMENT.foreignKeyField(0) will return the actual scalar foreign key.

Note that there are many different ways to access a field in SimpleORM other than just getting and setting its value.  Note how how the generalized design makes this easy and natural to do.  Records are not POJOs.

SimpleORM fully supports multi column primary keys. So if the Department table actually had keys (CompanyNr, DepartementName) then both these scalar fields would be automatically included in the Employee record.  Contrary to popular belief this introduces little overhead as long as indexes are declared properly.

Further, SimpleORM supports multi column Identifying foreign keys. In other words, references to one table can be used as part of the key to another. This means that references may cause foreign keys to be propogated recursively. For example, if the key to PaySlip is (Employee, Period), and the key to Period is (Year, PeriodNr) then the actual primary key columns for PaySlip would be (EMPEE_ID, YEAR, PERIOD_NR). A PayslipDetail could thus have the key (EMPEE_ID, YEAR, PERIOD_NR, DETAIL_TYPE).

The programmer mainly works with the abstract references, with the underlying foreign keys being manipulated automatically.  So one can say
  paySlipDetail = findOrCreate(new Object[]{myPaySlip, myDetialType})
And this will create a record with 4 valued primary key.

Many people beleive that all records should have surrogate keys that have no meaning to the application.  These are usually just integers that are generated as records are created.  So in the above example PaySlip would have a key PaySlipNumber and extra non-key columns EmployeeNumber, PeriodNumber.  Likewise PaySlipDetail would have key PaySlipDetailNumber and non key columns (PaySlipNumber, DetailType).  We recommend against this practice because:-
However, there are many situations in which there are no good natural keys.  Sometimes one can have the operator/user choose their own fairly arbitrary immutable keys, as is done when obtaining a web based E-Mail account on Yahoo etc.   But it is often necessary to generate numeric keys such as Order Numbers automatically.

This is suprisingly difficult to do, and there are many techniques none of which are entirely satisfactory.  SimpleORM lets you specify the type of generation algorithm in a flexible way.  For example, the following fragment specifies that the SelectMax method be used for the INVOICE_NR.  One then uses createWithGeneratedKey instead of findOrCreate to create the records.
   public static final SFieldInteger INVOICE_NR =
    new SFieldInteger(meta, "INVOICE_NR", SFD_PRIMARY_KEY,
       SGENERATED_KEY.pvalue(new SGeneratorSelectMax(meta)));

This uses the obvious techniques for generating keys using "SELECT MAX(MY_KEY)+1...".  However while effective it is limited because it will create locking problems in a multi user environment.  Other alternatives tend to be complex and/or database dependent.   They include:-
See the subtypes of SGenerator for more options and details.

Associations and Class Mappings

One of the fundamental differences between Java and databases is that Java has no real concept of association.  In a relational database if you
   UPDATE EMPLOYEE  E SET DEPARTMENT = 'SALES' WHERE E.ID='FRED'
then employee FRED is automatically an implicilty in
   SELECT * FROM EMPLOYEES WHERE DEPARTMENT='SALES'
In particualr, there is no separate structure DEPARTMENT.EMPLOYEES.  If you want to find the employees in a department you just query the database.  The database would normally provide an index on EMPLOYEE.DEPARTMENT,  and that index is updated automatically when you update the EMPLOYEE record.

In Java each end of the relationship would normally be updated as distinct statements:-
  sales.getEmployees().add(fred);
fred.setDepartment(sales);
Some transparent ORMs such as Hibernate actually attempt to map both these statements into the one EMPLOYEE.DEPARTMENT column.  But in Java there is no need for these two statements to be consistent, one could write:-
  sales.getEmployees().add(fred);
fred.setDepartment(engineering);
The semantics of the above statements is not well defined.  The problem is that Java makes you have to update the department.employees index manually, where a database does it automatically.

(In Hibernate the semantics seems to depend upon the order in which they each happen to update the database.  One can also specify inverse="true" which means just ignore one end of the association. The excellent Hibernate In Action book is somewhat appologetic on this point,  p106.)

A better approach would be to have setDepartment automatically update the getEmployees collection and make the getEmployees.add method private.  In normal Java this requires copied code or reflection.  But it could be done quite easily using SimpleORMs generalized structures (without the need for any fancy containers).  To retrieve all the Employees whose Department is sales one would then write
  emps = sales.getChildren(Employee.DEPARTMENT); // not provided

However, SimpleORM does not provide this method.  The reason is that the semantics are still unclear.  Supose Annie and Tania are members of the Sales departement and one executed
  SConnection.Begin()
Departement sales = ...
SResultSet result = Employee.meta.newQuery()
.gt(Employee.NAME, "'J'")
.eq(Employee.DEPARTMENT, sales)
.execute();
This would retrieve Tania but not Annie.  So should Annie be included in sales.getChildren?  Really we need two methods, getAllChildren and getRetrievedChildren.  If we want to show the second page of a list of employees with names after 'J', then getRetrievedChildren is more useful.  But its semantics are vague to say the least -- what if some unrelated query in the same transaction happened to retrieve Annie, should she be included?  And what if Tania where to change her name to "Betty"?  And what order should the collection be sorted by?   getAllChildren has fairly well defined semantics but is often less useful particularly as a sort order is almost always required in practice.

So SimpleORM takes the simple approach to these thorny issues.  If you want to have a list of employees you simply write
  ArrayList recs = result.getArrayList(100000);
The user has complete control, the semantics are clear, and there is nothing to stop the ArrayList from being stored in an Employee instance variable.  SimpleORM does not attempt to track the many end of relationships.  In practice this works very well and avoids many possible misunderstandings and bugs.  It also largely avoids the "n+1" select problem.

Many to many relationships are handled in a similar way, but the intersection table needs to be explicitly referenced.  This is almost always required in practice anyway because there is almost always important information pertaining to the relationship itself.  For example many Students may be enrolled in many Subjects, but one probably wants to know the grade achieved for each Student for each Subject.  Likewise many Players may play for many Teams, but one probably wants to record the position of each Player in each Team.

There is no attempt to cascade deletion of parents to their children.  This avoids many thorny problems.  So it is the programmer's responsibility to explicitly delete any dependent rows. We also recommend against declaring DELETE CASCADE at the database level as this will confuse SimpleORM (and most other tools).

Likewise, other complex mapping between Java (sub)classes and SQL tables are not supported.  There are many subtle and confusing issues with most complex mappings.  We assume that the relational data design is reasonable.  If it is not and cannot be changed then you need to be very careful, and might be better off with JDBC.  If it is, then you are generally much better off keeping the mapping simple and transparent.

Connections, Transactions, Swing and J2EE

A SimpleORM connection is simply wrapped around an existing JDBC connection. SimpleORM does not concern itself with JDBC drivers, connection pooling, or other related issues. This makes SimpleORM easy to integrate into existing frameworks and application servers. (You would normally write a small wrapper method that obtains the connection from the pool and then calls SConnection.attach -- see the examples.)

There are slightly different SimpleORM drivers for each database that account for different locking strategies, performance trade offs and support for JDBC 2. SDatabase.connect(Connection con) can invoke the correct driver using con.getMetaData(). All drivers extend SGenericDriver which should work on any vaguely standard SQL database.

The connection is associated with the current thread. This avoids the need to specify the connection object for every SimpleORM method call. More importantly it makes it very difficult to directly access the same transaction via two different threads at the same time which would be very dangerous. It also makes it easy to trap connection pool problems caused by connections that are never closed which can otherwise be difficult to find.

Many JDBC programmers use far too many transactions. As a rule, a single database transactions should be used for one user interaction event such as submitting a web page. This is partially because committing a transaction is several times more expensive than updating a row, but also because it is important to see and maintain a consistent view of the database in a multi user environment. Associating the transaction with the current thread encourages proper transaction design by avoiding the need to pass a JDBC connection paramater.

AWT/Swing is essentially single threaded, which is good, but what is surprising is that the thread it uses is not the main thread. Swing/AWT programs often use the main thread to set up the forms, and then exit, and then the second AWT thread starts to do the real work. In other words a simple single threaded application pointlessly using two threads, one after the other!  (.Net does not make this mistake.)

SimpleORM works well with Swing but it is best to execute all SimpleORM operations in the same thread. This can be easily achieved using invokeAndWait() to set up the forms the AWT thread. The main thead is hardly used at all. An example of this cleaner approach has been provided.

SimpleORM can be used with J2EE EJBs and JTA.  SConnectionEJB is a subclass of SConnection that associates itself with the current JTA object rather than the current thread.  A Synchronization object then tracks the JTA commit/rollback methods.

However, it should be noted that CMT is often misused, and can result in many unecessary transactions. Even the experts that wrote the Weblogic 7.0 basic examples did this. This has a serious performance problem, but more importantly it can produce unreproducable incorrect results. Further, two phase commit algorithms can alway produce nasty "in doubt" transactions. Thus the use of CMT in these examples is not an endorsement of the use of CMT.

Record Locking

Locking is very important  not because locking problems cause many errors, but rather because they cause very few errors! Badly constructed transactions that do not lock data properly will work perfectly when unit tested. They will also work pretty well in production. But every so often records will be corrupted in ways that are impossible to reproduce. And worse, the problems usually only arise once after the project goes live and gets heavily used. A nightmare.

We therefor strongly recommend that locking be used very conservatively, with Isolation set to Serializable.  Contrary to popular belief, there is usually very little overhead in doing this.  If after testing an overhead is suspected the isolation level can be temporarily reduced and the results remeasured.  But careful investigation should be made before simple reducing the isolation level.  The precise semantics of the isolation levels is subtle and varies between databases, particularly those with Multi Version Concurrency Control (MVCC).  (Locking caused a lot of performance issues with databases that locked pages instead of records.  Older versions of Microsoft SQL Server do this.  But hopefully this is a thing of the past.)

Many databases now include Multi Version Concurrency Control (MVCC) systems to provide better isolation.  These include Oracle, PostgreSQL, DaffodilDB, the very latest MS SQL but not Cloudscape/Derby, DB2.  This means that if transaction T updates X from 2 to 3 and does not commit, then transaction U can still read X and will see the previously committed value, 2.  This is great for reports because readers do not have to wait for all writers to commit.  Bit it is not OK if transaction U wishes to add 1 to X, which should then be 4 not 3!

To avoid this problem the databases generally provide a SELECT FOR UPDATE statement.  This will wait until no other transaction has a lock on the row, and then will lock the row.  So this is the default for SimpleORM which is normally used to update values.  However it is also possible to specify a SQY_UNLOCKED flag which causes records to be selected without the FOR UPDATE clause and so not obtaining any locks. This should be used with caution. 

(The exact behaviour of MVCC varies between databases and their default isolation levels.  For example, Oracle will happily return X=2 for a SELECT and then X=3 for the SELECT FOR UPDATE in the same transaction.  A subsequent SELECT in the same transaction will surprisingly return X=3.  But COMMIT and then SELECT will return X=2, which is consistent if you think about it.  PostgreSQL tends to throw an exception if a SELECT FOR UDATE would be different from the SELECT. Exceptions are easy to diagnose, corrupted data is much harder.  Be conservative and careful.)

A third alternative is to use the SQY_OPTIMISTIC flag which leaves the record unlocked but remembers the column values when the record was read.  When the record is updated SimpleORM checks that no other transaction has changed the record.  Thus an optimistic "lock" does not actually lock the record at all, it is just optimistic that no other transaction will update it, but if it does it throws an exception to prevent data corruption.  It is up to the user to deal with this exception.

Optimistic locks are implemented with virtually no overhead by adding updated colunms redundantly to the WHERE clause of the UPDATE statement -- no additional query is required.  Only the primary key fields and fields that were actually updated to a different value are checked, so in practice this provides column level locking. The old values of the fields are stored directly in the rich non-POJO record structure.

Some databases do not allow certain data types to appear in where clauses. For example, Oracle does not allow "long" varchar fields that are over 2000 chars to be used. Such fields can be flagged SOPTIMISTIC_UNCHECKED in which case they will not used in the WHERE clause. But be sure to manually check for concurrent updates of such fields.  It is also possible to add special timestamp or update number columns to implement optimistic locking, but this is generally unnecessary with SimpleORM.

Detached Records

SimpleORM  records to be detached from the current transaction, possibly serialized and sent to a different tier, and eventually be reattached to a new transaction which can then be used to update the database.   Thus one never has to create special Data Transfer Objects (DTOs).  This is particularly useful in Web applications in which data is shown to the user in one transaction and then updated when the user commits the form.

The records are optimistically locked, ie. they contain both the old and new values of the fields, and these are compared to values in the database during an update.  The old values are stored directly with the records (they are not POJOs) so there is no need for some inter-transaction session object.  Note that DTOs as normally implemented do not handle locking and are therefor plain wrong.

When a record is detached, the direct references to non-detached records are normally nulled, with just the scalar key values of those references being retained. This avoids serializing more than one expected.  So suppose Employee e has Department d and Manager m, and then d.detach() and e.detach() but not m.detach(). If e is serialized then d but not m will also be serialized. In the unattached state e.Department will be available, but e.Manager would not normally be available.   Should e be reattached later, then e.Manager can be retrieved into the cache. See LongTransactionTest for details.

(Advanced users can also use an interface getReferenceWhileDetached to fetch e.Manager, possibly accross multiple tiers.  So one does not have to decide exactly which records to detach before commiting the first transaction.)

The following fragment shows how long transactions are managed by detatching selected records from the current transaction.  Such records are not marked as invalid when the transaction commits, and so they can then be associated with the session state. If a modified web page is eventually submitted, the long lived records can be attached to the current transaction. Its fields can then be updated from the web form's input.

SConnection.begin();
Employee employee1 = Employee.meta.findOrCreate("123-45-6789");
employee1.detach(); // Must not be dirty.
SConnection.commit();
...
<input value="<%= employee1.getString(Employee.PHONE_NR)%>">
session.putAttribute("employeeRecord", employee1);
// Could also send to another tier
...
Employee employee2 = session.getAttribute("employeeRecord");
employee2.setString(Employee.PHONE_NR, request.getAttribute("PhoneNr");
...
SConnection.begin();
employee2.attach();
SConnection.commit(); // Checks optimistic locks for employee2

Following the way SimpleORM handles associations, we leave it to the user to maintain a list of detached records as appropriate.  This may involve adding ArrayList to parent records.  There is no automatic and potentially surprising transitive de/ataching.  In practice this adds little code but provides much greater clarity to the real semantics.

A record must not be dirty when it is detached, and optimistic locking is automatically enabled. So if the record was modified and is then reattached to a second transaction, SimpleORM checks that no other session has modified the record between the two transactions.

Detatched SimpleORM records are Serializable so that they can be sent to a different tier using RMI etc. When a record is serialized, references to other records are also serialized provided that they are also detatched. If the record is reattached later then the references are also reattached. And getReference can then automatically find any referenced records that had not been detatched and so were not serialized.

Long transactions are oriented towards HTTP page at a time updates. If more highly interactive Java applications are required then it may be more appropriate to use short transactions but avoiding exclusive locks. The first time a user modifies a record its lock can be promoted to exclusive locks by using findOrCreate on the existing record's key but without the UNLOCKED flag.

Fields and Enumerations

SimpleORM retrieves values from records in a very similar mechanism to ordinary JDBC.  Multiple get* methods retrieve values in different ways. getString returns the value as a String, getInt returns an int (not an Integer) or 0 if the column is null.  isNullcan then be used to determine whether the column is actually null.  Note that the actual column type is somewhat independent of the get method used, eg. getString can be used on an integer column -- it will just return the value as a string.  Like basic JDBC, getObject provides the most general access and can be used for any type, inlcuding Blobs, Clobs, Arrays and any new ones that may be added in the future.

Internally each field type is processed by its own driver which handles conversions etc. This makes it easy to add new types. Existing types include TimeStamp and BigDecimal.  There are several SFieldBoolean* types depending on how one wants to represent a boolean, eg. 'Y'/'N', 'T'/'F', 1/0 (SQL does not have an intrinsic boolean type).

[Enumerations and are not implemented yet.  When Java 1.5 becomes more widely used we will use it.]


Business Rule Framework

The SRecordInstance class provides a simple framework for adding business rules to database operations. Each method in the framework is called after a specific event such as when a field is set to a new value or when the record is about to be written to the database. The SFieldMeta object is passed as an explicit parameter to methods that work on individual fields which avoids the need for reflection and so makes it easy to extend and modify the framework.

As discussed in the previous section most field level validations are performed declaratively using SFieldMeta subtypes such as SEnum. Internally those classes implement the validate method which is called every time SRecordInstance.set*(SFieldMeta, value) is called. validate methods can also change the value stored in the record object. For example, SEnum.validate() is passed the string returned from the database but returns a SEnum object which is actually set into the SRecordInstance.

Per record field level validation is performed by specializing validateField which is called after SFieldMeta.validate(). Like SFieldMeta.validate() there is only one method used for all the fields, with the first parameter used to determine which field is being updated. The following example validates three fields:-

class Employee extends SRecordInstance {
...
protected void validateField(SFieldMeta field, Object newValue) {
if ( field == SALARY) {
Department dept = getReference(DEPARTMENT);
if (newValue > dept.getDouble(MAX_SALARY))
throw new SValidationException(
"Salary {0} is greater than the Department's Maximum {1}."
new Object[]{newValue, maxSalary});
}
if ( (field == BEGIN_DATE || field == END_DATE)
&& beginDate.after(endDate) )
...
}
}

An alternative mechanism [yet to be implemented] is to associate a validator directly with a field.  Thus the following is an alternative more conventional (but also slightly longer) way to write the above using an anonymouse inner class:-

class Employee extends SRecordInstance {
...
static {
SALARY.addValidator(new SFieldValidator() {
protected void validateAField(SRecordInstance record, SFieldMeta field, Object newValue) {
Department dept = getReference(DEPARTMENT);
if (newValue > dept.getDouble(MAX_SALARY))
throw new SValidationException(
"Salary {0} is greater than the Department's Maximum {1}."
new Object[]{newValue, maxSalary});
}
}
}
}

Note how easy it is to write meta level code due to the generic design.  No need for reflection.

This allows data oriented business rules to be associated with the data. No matter what modules use Employee, the salary rule will be enforced. Note the way that like most non-trivial validations, this one involved accessing data from a different record (Department). This is why it is important to have an record cache to minimize JDBC calls while ensuring that the retrieved values are current.

It is, of course, also possible to add arbitrary methods on the Employee class for business rules related to Employees that are explicitly called.  The goal is to reduce the amount of indirection.  We do not recommend a DAO approach (although it is certainly possible).  Try, in general, to reduce the amount of indirection and code.  Just get the job done directly.

It also should be noted that this approach means that one only needs to add code for real business rules. The traditional approach of writing getter and setter methods means that one needs to write two dummy methods for every field in an object just to allow for the possibility of adding a rule later. This framework makes that unnecessary because the generalized get* and set* methods are used in the same way regardless of whether there is a validate etc. method written for the field. In practice this substantially reduces the amount of code that needs to be written.

The following events are (or will be) supported:-

SRecordInstance.validateField(
  SFieldMeta field,
  Object newValue)
The main field validation method, this is specialized for records that need to perform validation. Throws an SValidationException if not OK.

SRecordInstance.validateRecord()
Invoked just before a record is inserted or updated.  Used for more complex validations that involve the relationships between several fields. Is normally called just before a record is flushed but may also be called directly. It is only called for dirty records, and should not change the values of fields.
SFieldMeta.validate(
  SRecordInstance instance,
  Object newValue)
Invoked after every set*() before SRecordInstance.validate. Used by SFieldMeta subclasses such as SEnum to perform declarative style validation. Returns the new value to assign to the field. Throws (a subclass of) SException.SValidate if invalid. Returns the actual value returned by get*()
SFieldMeta.mapGetValue(
  SFieldMeta field, Object cacheValue)
Called every time a value is retrieved from a record. Can be used to return a different value than actually stored in the SRecordMeta. (getRawObject() returns the value acutally stored.)
SSFieldMeta.mapSetValue(
  SFieldMeta field, Object cacheValue)
Called every time a value is set in a record. Can be used to return a different value than actually stored in the SRecordMeta.
SRecordInstance.watchField(
  SFieldMeta field, Object newValue)
Invoked after validateField(), but also just after a row is retrieved.  May have side affects such as updating a user interface to reflect the changed values.  (The distinction from valdiateField() becomes important when advanced sub-types are used.)
SRecordInstance.derivation(
  SFieldMeta field)
A field may be declared to be derived in terms of other fields, in which case this method calculate's its value.  The calculated value may be cached the first time it is calcualted.  Can be used for lookups.
SRecordInstance.synchronizeRecord()
Invoked just before validateRecord, and used to update fields as necessary before a record is written to the database. If enabled, then synchronizeRecord is invoked for all retrieved records regardless of whether they have been updated, and so can be used to synchronize a record with an external object -- see the JDO section.
SRecordInstance.preUpdate() Analogous to watchField() -- ie. called just after the record is validated but before any DML.
SRecordInstance.doUpdate() This replaces the method that normally does the actual update or delte to the database. It enables some other mechanism to be used to flush the row, for example to call a database procedure.
SRecordInstance.postUpdate() Invoked just after an update or insert to the databse during a flush operation.
postQuery() Invoked just after a record is returned from the database, before watchField().


A future version will extend the framework to provide a more advanced sub-type mechanism. This will go beyond the basic Java subclasses to enable objects to belong to more than one sub-type at a time, and to enable an object to change its sub-type at run time. These are important for persisted storage. For example, an employee starts as a Intern but later change to being a Manager. They may also become an Engineer and be Disabled, all at the same time. That cannot be represented directly in the very plain Java object model because a Java instance may never change its class and may only belong to one class.

The generalized framework lends itself to doing this. One would write something like:-

class Contractor extends Employee {
SRecordMeta mgrMeta = new SSubRecord(meta, Contractor.class);
...
boolean isMember() {
EmployeeType et = (EmployeeType)this.getObject(EE_TYPE);
return et == EmployeeType.CONTRACTOR;
}
...
}

SimpleORM could then determine the type(s) of each record, and apply validations appropriately. However, the details of this functionality are yet to be determined.

Software Development Process and DDL Generation

SimpleORM requires a minimum of code and so it also requires minimal use of other tools and techniques to maintain that code. Certainly there is no need for generators such as XDoclet, nor any special compilation techniques or post processing.

SimpleORM itself can be built with Ant.  But it minimizes the reliance on Ant so as to be easy to use with IDEs.  For example, the test cases are classes with a main method, and there is one TestAll case that simply calls the others.

There should be a single source of truth for the database table definitions. If most of the development is to be done with SimpleORM then this source of truth should be the SimpleORM Record definitions.  SRecordMeta.createTableSQL returns a string that contains the CREATE TABLE statement. It is also possible to augment and override the basic SimpleORM definitions with database specific physical clauses. This makes it easy to generates a complete .sql script that recreates the database.

There is also a separate SimpleORMGenerate package that can generate SimpleORM definitions from an existing database schema.

Tight integration with and Entity-Relationship style tool remains future work. Existing tools do not seem to have the flexibility to add SimpleORM specific annotations. UML tools tend to be too low level for database modeling.

If you wish to temporarily extend SimpleORM itself then note that most local variables and methods have package scope rather than private scope, and the .jar file is not sealed. Thus rather than changing the actual source, you can usually just poke extra classes into the SimpleORM.core package and then access everything with static methods. This is obviously not supported, and you should make sure that your changes are folded back into the main source.

One more feature is a convenient way to bulk load data into the database, which is particularly useful for testing. This is provided in the DataLoad class. The generalized design makes this easy to implement and use.

SimpleORM provides accurate, selectable tracing of all its operations, including all the SQL that is issues.  This can be enabled in the SLog class.  (It does not use Log4J to avoid the dependency, but it is trivial to subclass if required.)

SimpleORM has been successfully ported to J#.Net!  It uses the same source code, with the SJSharp class providing a few compatiblity methods to ensure that it can also be called from C# and VB.Net.  (Ie. the same code runs on Java and .Net).  This means that non Java 1.1 code needs to be factored out, which has turned out to be easy in practice.

Meta Properties and Automated User Interfaces

Internally all meta data properties about SFieldMetas and SRecordMetas are stored as SProperty values. This is a distinct subsystem that enables properties to be given default values, and to inherit values from other properties and objects. In particular, it allows UI fields to inherit default values for properties such as a Prompt from a data base field. The UI modules will put this to very good use.

For example, the following would retrieve the default prompt meta data that could be associated with the DEPT_ID field.

 String deptIdLab = Department.DEPT_ID.getString(SUI_PROMPT)

<>This produces more consistent user intefaces because the prompt need only be defined once, and this can be easily internationalized.  More importantly, other structural information can be associated with records. For example, other properties can return a list of the descriptive fields to display in a lookup.

SimpleORM's generalized design makes it easy to write generic code without reflection along the following lines:-

 void displayField(SRecordInstance rec, SFieldMeta theField) { ...
String thePrompt = theField.getString(SUI_PROMPT);
cout.println("<td>" + thePrompt + "</td>");
cout.println("<td><input value='" + rec.getFieldString(theField) "'></td>");
... }

Many database application mainly consiste of letting the user update the data subject to constraints.  If a UI module like this is used the the programmer would rarely ever even call the get*/set* methods -- it will mainly be completely automatic.

This work is on going, but tight integration between the MVC tiers is an important longer term goal of SimpleORM.

Performance and Benchmarks

Last but by no means least it is very important that an ORM not add any significant overheads to an application that is running in an enterprise application. To check that that is the case a careful series of benchmarks have been constructed.

The tests are based on the classic Department/Employee schema. Rows are inserted, queried and updated, both sequentially and randomly. The results are for PostgreSQL. (It would be interesting to see how they compare with other datbases but care is required, eg. HSQL has checkpointing overhead.)

The tests were done on a single PC with no network overheads. Each test is run in its own single transaction. All the benchmarks actually do something, eg. calculate a total salary and budget, so that one can have confidence that they are all actually working correctly.

The tests and times are are as follows. All times are quoted in terms of the number of milliseconds per iteration, with at least 1,000 iterations depending on the test. (The times vary about 10% between runs due to database buffer state etc.)

Test Description Raw JDBC SimpleORM
Insert Insert Emp rows. 2.4 2.5
Query Sequential Retrieve all the Employees and Departments sequentially. 0.31 0.39
Query Random Query the Employees randomly. 5.4 1.7
Query Field Queries five fields in one row, repeatedly. 0.12 0.0050
Update Random Give random Employees pay rises. 7.8 3.4
Update Bulk Update all rows in single Update statement. 0.17 0.18

The first result is that SimpleORM adds no measurable overhead to using raw JDBC. The queries for the Insert and QuerySequential were very similar, and so were the results. The over head for building the SRecordInstant structure and adding it to the cache was a negligable 0.1 milliseconds/record.

The cache then made SimpleORM over 3.0 milliseconds per record faster than JDBC on the QueryRandom and UpdateRandom trials because the cache minimized expensive database hits. If a network had been used the reduction in database hits would have produced even more dramatic improvements. SimpleORM is also faster for repeated field access, which is significant because there are typically many fields per record.

The bulk update test updates many rows with one SQL statement. The times are the same because they are the same SQL statement. But note that this is an order of magnitude faster than using individual update statements for each row, even without the network overhead. Thus it is important that an ORM allow the buffer to be flushed and purged so that this can be done safely.

<>But probably the biggest performance issue is not tested by the benchmark, namely transaction management. JDBC programmers on large systems tend to loose track of the connection object across modules, and so create far too many transactions. EJB users often use CMT, which also tends to produce far too many transactions (eg. the Weblogic demos). By associating the connection with the current thread, but still giving explicit control over transactions, SimpleORM encourages transaction structures that are efficient and correct. This performance consideration can dwarf all others

Conclusion

A good object-relational mapping tool helps manage the interaction between objects in memory and data in a database. SimpleORM achieves this by providing a thin layer over JDBC. It provides methods to retrieve data into objects and flush changes back to the database. Objects in memory are indexed to ensures that each row in the database is represented by at most one object in memory.

SimpleORM provides Automated persistence, but not Transparent persistence.  The generalized design makes it easy for it to intercept get* and set* methods to object instance variables. This means that it can easily detect when objects are changed, lazily retrieve related objects, and raise exceptions if attempts are made to access stale or undefined data. This is all achieved with a tiny fraction of the amount of user written code required by other O/R systems. The small and simple design also produces very good run time performance.

SimpleORM takes great care with persistent database semantics. The behaviour of the cache is well defined, and it is possible to both flush and purge the cache selectively. Records are locked appropriately, and any attempts to update unlocked data are detected. The caching strategy allows for access to a database from multiple JVMs and even from non-Java applications. And tracing and error handling are taken seriously.

Advanced features include a well defined mechanism for handling inter-transaction updates as occur in typical web applications. Common business rules can be specified declaratively, and more complex rules can be specified a both a field and record level.

The ability to easily build UI layers on SimpleORM offers the possibility of building very efficient development tools.

And last but not least, SimpleORM is open source, ie. free. It can be integrated into projects without concern for licencing costs. Moreover, great care has been taken to keep the source code concise, transparent and documented so that it can be easily understood. There are no nasty tricks, not even any XML and hardly any reflection.

We believe that SimpleORM is a natural choice for most Object/Relational mapping requirements.


Just because it is possible to push twigs along the ground with ones nose does not necessarily mean that is the best way to collect firewood.

Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems. -- Jamie Zawinski / James Robertson

The superior pilot uses his superior judgment to avoid the need for his superior skills. -- Should also apply to software!

Some programmers are clever enough to write clever code, other programmers are clever enough not to.