SimpleORM White Paper/Tutorial

Simple Java Object Relational Mapping

Power without complexity.


Dr Anthony Berglas
August 2009
SimpleORM v 3

Just because it is possible to push twigs along the ground with one's nose does not necessarily mean that is the best way to collect firewood.

 

Why SimpleORM?

SimpleORM provides a full featured object/relational mapping and persistence system in a tiny package that is easy to understand and safe to work with.

A key technique is to represent objects as essentially a strongly typed map of attributes to their values. This avoids the complex reflection and byte code generation needed to represent the private fields of the pseudo POJOs used by other ORMs.  SimpleORM objects can still look like POJOs, with values accessed by conventional (but optional) public get and set methods.  The simple structure facilitates a clean, string-free query language and provides access to extended field state information such as the initial queried value.

SimpleORM also packages objects into "DataSets", which enables it to provide well defined semantics for relationships and detached records (unlike other ORMs).   Records in DataSets can be manipulated independently of any database connection.  SimpleORM also provides transparent access to user extensible meta data, and different modules can extend that meta data, which enables very flexible application architectures.

Other, less flexible ORMs are far too complex for ordinary developers to understand in detail, either their semantics or their implementation.  This introduces substantial technical risk when problems arise.  By contrast, the absence of complex reflection, query parsers, byte code generation etc. makes SimpleORM's behavior and source code very transparent.  If necessary any competent developer can trace from the SimpleORM API to the underlying JDBC calls in a few clicks of the mouse.   The two SimpleORM jars are just 79K and 52K in size, with only one small and optional dependency (Slf4j).  (Hibernate is over 2400K plus about 2000K of dependent jars.)

It should be noted that although simply implemented, SimpleORM is a full featured ORM and not just a database row reader/writer.  Queries can be specified safely in terms of Java objects, object identity is aligned with database keys, relationships between objects are maintained, and modified objects are automatically flushed to the database with optimistic locks.   Optimistic locks are critical for the integrity of multi-user web applications, and object identity is important in larger systems to prevent independent business rules from overwriting each other's data.  SimpleORM functionality is therefore much closer to Hibernate and entity beans than to basic tools such as iBATIS, DdlUtils, Butler, JaQu, Persist, Ammentos etc.

SimpleORM Generalized Records

SimpleORM represents individual records as a map of fields to values.  The use of rich record structures enables greater control of ORM functionality while simplifying their use.

Record and field definitions are just instances of SimpleORM classes which are normally declared as constants.  For example:-

 class Employee...
 public static final SRecordMeta EMPLOYEE // the record
   = new SRecordMeta(Employee.class, "EMPLOYEE_TABLE");
 ...
 public static final SFieldString PHONE_NR // a field
   = new SFieldString(EMPLOYEE, "PHONE_NR", 30);
 
 public String getPhoneNr() { // Optional get method if you like them
   return getString(PHONE_NR);
 }

EMPLOYEE contains an object that represents the Employee record (SQL Table) meta data, and PHONE_NR  contains meta data for a Phone Number field (SQL Column) that might be contained within that record.

One can then access the field as

 Employee employee = session.mustFind(Employee.EMPLOYEE, "Emp101");
 String phone = employee.getString(Employee.PHONE_NR);
 String phone = employee.getPhoneNr(); // Indistinguishable from a complex pseudo POJO
<c:out value="${employee.NAME}"/> // Integrated with JSP etc. without need for get/set methods.

This generalized structure has the following advantages:-

* Queries can be expressed safely in terms of the field constants, e.g.

 newQuery(EMPLOYEE).greaterThan(Employee.NAME, "J")

instead of run-time string based queries like

 createQuery("from EMPLOYEE E where E.NAME > 'J'");

or

 createCriteria(EMPLOYEE.class).add(Expression.greaterThan("NAME", "J"));

* Arbitrary user properties can be attached to the field meta data, e.g.

 EMPEE_ID...putUserProperty("DISPLAY_LABEL", "Telephone Number")...

* The meta data itself can be interrogated at run time, e.g.

 ... EMPEE_ID.getMaxSize() ...


SimpleORM automates persistence but, unlike Hibernate and JPA, it does not pretend to make persistence transparent.  Transparent persistence attempts to hide persistence issues from the business logic.  However, for most information systems persisting and querying the information is pervasive and tends to dominate the "business logic", so running the "business logic" without the persistence layer is meaningless in practice.  SimpleORM's independent DataSet layer facilitates unit tests and detached processing without a database connection, and get/set methods make records look identical to POJOs anyway.  SimpleORM does not pretend that persistence can be added to an existing application as an afterthought.

It should be noted that SimpleORM is a "real" ORM.  DataSets provide a proper object cache, and each record is only represented once within each transaction.  Object identity is tied to database identity.  This means that business rules can be written largely independently of each other without the need to coordinate data retrieval.  This distinguishes SimpleORM from iBATIS, Apache DdlUtils, Butler etc. 

SimpleORM is full featured.  For example, it is possible to detach DataSets and then reattach them to a new transaction using optimistic locks.  Several primary key generators are provided.  And unlike many other ORMs it also provides good support for multi column primary keys without the need to create special key objects.

SimpleORM is open source with an easy Apache style license.

Data Definition Example

The following code fragments show everything that is required to declare and use a mapping to an Employee table. Each SRecordInstance represents an individual record instance, while SRecordMeta objects represent the meta-data associated with each instance. SFieldMeta objects describe the fields within the record.

Note that in SimpleORM a database has "tables" and "columns", while a persisted class is a "record" which has "fields". A non persisted class is just a "class" which has "instance variables".

public class Employee extends SRecordInstance { // ie. a class mapped to a table.
 
 public static final SRecordMeta<Employee> EMPLOYEE = new SRecordMeta<Employee>(Employee.class, "XX_EMPLOYEE");
 
 public static final SFieldString EMPEE_ID
   = new SFieldString(EMPLOYEE, "EMPEE_ID", 20, SPRIMARY_KEY);

 public static final SFieldString NAME
   = new SFieldString(EMPLOYEE, "ENAME", 40, SMANDATORY, SDESCRIPTIVE);
 
 public static final SFieldString PHONE_NR
   = new SFieldString(EMPLOYEE, "PHONE_NR", 20)
     .putUserProperty("DISPLAY_LABEL", "Telephone Number");

 public static final SFieldDouble SALARY
   = new SFieldDouble(EMPLOYEE, "SALARY").addValidator(new SValidatorGreaterEqual(0));

 public static final SFieldString DEPT_ID = new SFieldString(EMPLOYEE, "DEPT_ID", 10);
       
 static final SFieldReference<Department> DEPARTMENT
   = new SFieldReference(EMPLOYEE, Department.DEPARTMENT, "DEPT");

 public enum EType{PERMANENT, CASUAL, CONTRACT};
 static final SFieldEnum<EType> ETYPE = new SFieldEnum<EType>(EMPLOYEE, "ETYPE", EType.class);

 static final SFieldString RESUME // Curriculum Vitae
   = new SFieldString(EMPLOYEE, "RESUME", 200, SUNQUERIED)
     .overrideSqlDataType("VARCHAR(200)"); // Maybe LONG VARCHAR for Oracle etc.

 public @Override SRecordMeta<Employee> getMeta() {return EMPLOYEE;};

 // Completely optional get/set methods. NOT needed for JSP access.
 public String getPhoneNumber() {return getString(PHONE_NR);}
 public void setPhoneNumber(String value) {setString(PHONE_NR, value);}
}

The Employee record is mapped to the XX_EMPLOYEE table. It has a primary key EMPEE_ID, a NAME and a PHONE_NR field, and a direct reference to the Department record. All these definitions simply define static constants which are threaded together via the EMPLOYEE object.  The EMPLOYEE object is tied to the Employee.class using the getMeta method.

Data Manipulation Example

The following code then manipulates this data.

 SSessionJdbc session = SSessionJdbc.open(myJdbcConnection, "MyInformativeLoggingLabel"); 
// or SSessionJdbc.open(myDataSource, "...");
 session.begin();
 ...
 Employee employee = session.mustFind(Employee.EMPLOYEE, SFOR_UPDATE, "123-45-6789");

 String name = employee.getString(Employee.NAME);
 String phone = employee.getPhoneNumber(); // if getter defined
 employee.setPhoneNumber("(123) 456 7890");
 Employee.EType etype = employee.getEnum(Employee.ETYPE);
 ...
 println(employee.getString(Employee.DEPT_ID));
 Department department = employee.findReference(employee.DEPARTMENT);
 ...
 oldEmployee.delete();
 ...
 session.commit();
 session.begin(); ... session.commit();
 session.close();

A SimpleORM session is created based on an ordinary java.sql.Connection (or javax.sql.DataSource).  A new transaction is then started.

session.mustFind(Employee.EMPLOYEE, SFOR_UPDATE, "123-45-6789") finds an Employee object based on the primary key. If the object had been previously retrieved in the same transaction a pointer to it is returned, otherwise a JDBC query is issued to retrieve the row. If "123-45-6789" does not exist in the database then a meaningful exception is thrown whose message includes the primary key that was not found.

find(...) returns null if the row was not found, create(...) creates a new record, while findOrCreate(...) can be used to insert a new row using exactly the same techniques as used to update a record.   SFOR_UPDATE adds a locking clause to the SQL for MVCC databases, but it is not strictly necessary as optimistic locking is always used.  If there is a second column in the primary key then its value is simply provided as the fourth parameter – no need to create any special JPA/Hibernate-like key object.
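
For illustration, a minimal sketch of these alternatives using the Employee mapping above (the two column key in the last comment is purely hypothetical, since Employee has a single column key):

 Employee maybe  = session.find(Employee.EMPLOYEE, "123-45-6789");         // null if no such row
 Employee fresh  = session.create(Employee.EMPLOYEE, "999-99-9999");       // new record, INSERTed at flush
 Employee either = session.findOrCreate(Employee.EMPLOYEE, "123-45-6789"); // query, or create if absent
 // Hypothetical two column key: the second key value is simply appended,
 // e.g. session.mustFind(OrderLine.ORDER_LINE, SFOR_UPDATE, "ORD-1", 3);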

The NAME and PHONE_NR fields are then retrieved from the record, the first using the generalized syntax and the second using the get method that had been defined.  The PHONE_NR is then set to a new value. The generalized design makes it easy for SimpleORM to mark the field and record as dirty when the PHONE_NR is set, so SimpleORM never needs to compare 200 records at commit time just to find the 2 that have changed.

The employee's DEPARTMENT record is then retrieved from either the cache or database by findReference.  Note that the actual scalar foreign key value Employee.DEPT_ID is also available and can be accessed without the need to retrieve the department record.

An oldEmployee is then flagged for deletion before the transaction is finally committed. As part of this process SimpleORM flushes all pending dirty objects to the database, so there is no need to manually track which records need to be updated. There is also no risk that separate parts of the same application will accidentally overwrite changes made to different in-memory copies of the same database record.

Instance Queries

SimpleORM queries are defined as a SQuery object that can be executed in a SSession to produce a List of records.  For example:-

Department d100q = session.findOrCreate(Department.DEPARTMENT, "100"); 
SQuery<Employee> query = new SQuery<Employee>(Employee.EMPLOYEE)
 .eq(Employee.DEPARTMENT, d100q) // and
 .like(Employee.NAME, "%One%")
 .descending(Employee.SALARY);
List<Employee> emps = session.query(query);

The query object contains details of the query, and is independent of the session.  It can then be executed to produce a list of records.  By default the cache is flushed (but not committed) before the query is executed so that query results take into account any changed values.

Note that fields are referred to using the constants defined as part of the SRecord definitions.  So we have Employee.NAME rather than the string "NAME".  The .eq also refers directly to Department, and SimpleORM determines that the key to Department is DEPT_ID (it could also be a multi column key).

Records can be joined in to retrieve parents along with children and so avoid the "N+1" problem.  In the following example all Employees are retrieved along with their Departments in a single query.  (If the number of parents is substantially less than the number of children then it may be faster to simply fetch the parents lazily using findReference().)  Recursive joins such as Employees to Managers are also supported with aliases.

SQuery<Employee> joinQ1 = new SQuery<Employee>(Employee.EMPLOYEE)
  .leftJoin(Employee.DEPARTMENT);

Not all columns need to be retrieved by a query.  Both SQuery and findOrCreate() calls can take a list of fields to retrieve, or a SSelectMode which defines which types of fields to retrieve.   In the Employee example above RESUME is flagged as SUNQUERIED so it is not retrieved by default, unless SSelectMode.ALL is specified.  Likewise, if SSelectMode.DESCRIPTIVE is specified then only fields marked as SDESCRIPTIVE are returned, ie. NAME in the example.  Thus one never has to retrieve all 100 columns of an Employee table just to obtain the Employee's name.   Any attempt to access a field that has not been queried produces a fail-fast exception.
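
A hedged sketch of the select modes described above (the exact findOrCreate overload and SSelectMode constant spellings are assumptions and should be checked against the API):

 // Fetch only the fields flagged SDESCRIPTIVE, ie. just NAME for the Employee example.
 Employee brief = session.findOrCreate(Employee.EMPLOYEE, SSelectMode.DESCRIPTIVE, "123-45-6789");
 String name = brief.getString(Employee.NAME);  // OK
 // brief.getString(Employee.RESUME);           // fail-fast: RESUME was not queried
 // Fetch every column, including the SUNQUERIED RESUME:
 Employee full = session.findOrCreate(Employee.EMPLOYEE, SSelectMode.ALL, "987-65-4321");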

SQuery.rawPredicate(query, parameters...) can be used to append an arbitrary predicate to the WHERE clause.   SimpleORM also provides convenience methods for performing ad hoc business intelligence JDBC queries that do not produce normal objects as a result.  For example:-

 List<Map> sals = ses.rawQueryMaps(
   "SELECT E.DEPT_ID, SUM(E.SALARY) as \"SUMSAL\" FROM XX_EMPLOYEE E " +
   " WHERE E.DEPT_ID IS NOT NULL" +
   " GROUP BY E.DEPT_ID ORDER BY E.DEPT_ID");
 for (Map sal: sals) {
  ...sal.get("SUMSAL")...}  

Transient Queries

Instance Queries created by SQuery above return normal SRecordInstances such as Employee and Department.  They are identified by a primary key and stored in the DataSet.  If two queries would return the same row of a table then only one instance is created and added to the DataSet.

SimpleORM also allows Transient records to be returned by ad hoc queries; these do not have a primary key and thus cannot be added to a DataSet.  They are mainly used to implement aggregations, although arbitrary queries can be specified.  For example:-

SQuery<Department> leftQ = new SQuery(Department.DEPARTMENT).as("dept")
  .innerJoin("dept", Employee.DEPARTMENT).as("emp");
SQueryTransient aggQry = new SQueryTransient(leftQ)
  .sum("emp", Employee.SALARY).as("sum_sal")
  .groupBy("dept", Department.NAME).as("dname")
  .ascending("dname");
List<SRecordTransient> result = ses.queryTransient(aggQry);
... result.get(0).getDouble("sum_sal") ...

Internally, SRecordTransients are implemented as a Map of query aliases to values, whereas SRecordInstances are implemented more efficiently as arrays indexed by SFieldMeta.index.   However, both SRecordInstance and SRecordTransient extend SRecordGeneric, which implements the Map interface.  This means that field values can be accessed directly from tools like JSP without the need to write explicit get and set methods.

The following is an extract from extras/Department.jsp that utilizes the Map feature to access the DEPT_ID field directly even though no Department.getDEPT_ID method has been defined:

   SQueryResult<Department> res  = ses.query(new SQuery(Department.DEPARTMENT));
   pageContext.setAttribute("Depts", res);
   ...
  <c:forEach items="${Depts}" var="dept">
    Dept: <c:out value="${dept.DEPT_ID}"/>: <c:out value="${dept.DNAME}"/><br>
  </c:forEach>

SimpleORM Transactions

The following shows the general flow:-

 SSessionJdbc session0 = SSessionJdbc.open(myConnection, "MyInformativeLoggingLabel"); 
 ...
 SSessionJdbc session = SSessionJdbc.getThreadLocalSession();
 session.begin();
 ... SSessionJdbc session = SSessionJdbc.getThreadLocalSession(); ...
 session.commit();
 session.begin(); ... session.commit();
 session.close();

All SimpleORM database manipulations occur within a SSessionJdbc object, which encapsulates an ordinary JDBC Connection.  When sessions are opened they are automatically associated with the current thread.   This means that they can be easily retrieved using getThreadLocalSession without the need to pass the session object around explicitly.   It also means that unclosed sessions left over from previous uses of the thread can be detected and fail-fast exceptions thrown.  Normally a given session can only be accessed by one thread, otherwise exceptions are thrown (detachFromThread can override this).  The session description is displayed in logging messages and so can be helpful in multi-threaded applications.  An optional third parameter can override the default SimpleORM database driver, which is normally determined from the URL in the Connection.

SimpleORM guarantees that each session will only contain one record with a given primary key (and the same SRecordMeta).  Thus if a record is retrieved several times by different parts of an application within the same session then the same SRecordInstance object will be returned each time.  (If a record with the same key is retrieved in different sessions then different SRecordInstance objects will be returned, and normal database and optimistic locking will maintain database integrity.)

Records are normally flushed from the session to the database when a transaction commits.  Records are also normally flushed before queries are executed to ensure that their results represent the current state of the database.  Records can also be flushed manually without committing the transaction.   They are flushed in the order that they were first made dirty, which provides a simple and predictable update order.  Fail fast exceptions are raised if this would create a referential integrity violation.

Records may also be purged from a DataSet to force a re-query should the database be updated using raw JDBC calls or stored procedures.  Raw JDBC can be much more efficient than any ORM for certain types of operations. For example a single SQL DELETE statement that deletes all transaction details over 12 months old is much more efficient than having to read each of the transaction details into the JVM and then delete them individually.
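
For instance, a minimal raw JDBC sketch of such a bulk delete (the table, columns and date arithmetic are illustrative only and depend on the target database):

 java.sql.Statement stmt = myJdbcConnection.createStatement();
 stmt.executeUpdate(
   "DELETE FROM XX_INVOICE_DETAIL WHERE CREATED_DATE < DATEADD('MONTH', -12, CURRENT_DATE)");
 stmt.close();
 // Any affected records already loaded into the DataSet should then be purged to force a re-query.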

SimpleORM retrieves values from records in the same way as ordinary JDBC.  Multiple get* methods retrieve values in different ways. getString returns the value as a String, getInt returns an int or 0 if the column is null.  isNull can then be used to determine whether the column is actually null.  Note that the actual column type is somewhat independent of the get method used, eg. getString can be used on an integer column -- it will just return the value as a string.  Like basic JDBC, getObject provides the most general access and can be used for any type, including Blobs, Clobs, Arrays and any new ones that may be added in the future.
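
For example, using the Employee mapping above (isNull is assumed to take the field as its argument):

 double salary = employee.getDouble(Employee.SALARY);
 String asText = employee.getString(Employee.SALARY);  // same column, converted to a String
 boolean noPay = employee.isNull(Employee.SALARY);     // distinguishes NULL from 0
 Object raw    = employee.getObject(Employee.RESUME);  // most general access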

Internally each field type is processed by its own driver which handles conversions etc. This makes it easy to add new types. Existing types include TimeStamp and BigDecimal.  There are several SFieldBoolean* types depending on how one wants to represent a boolean, eg. 'Y'/'N', 'T'/'F', 1/0 (SQL does not have an intrinsic boolean type).

SimpleORM can be used to automatically generate keys using a number of different approaches, depending on what the database supports.  For example, the following code causes INVOICE_NR key values to be generated automatically.

 public static final SFieldInteger INVOICE_NR
   = new SFieldInteger(INVOICE, "INVOICE_NR", SFieldFlags.SPRIMARY_KEY)
     .setGeneratorMode(SSEQUENCE, "invoice_seq");
 ...
 Invoice inv1 = session.createWithGeneratedKey(Invoice.INVOICE);

Generating sequence numbers is difficult due to record locking problems.  Sequence mode uses a database specific sequence object to do this, which is named invoice_seq in the example.  Other methods are provided, including a generic SELECT_MAX method that will work on any database, albeit with dubious locking semantics.  The identities are normally generated when the record is first created by SimpleORM, which avoids problems with the identities of unsaved pseudo objects.

(While generated "surrogate" keys can be very useful, we would suggest that they be avoided where solid natural keys are available.  Excessive use of hidden surrogate keys complicates the design, makes it hard to do ad hoc queries, and is somewhat less efficient.  If secondary unique indexes are not declared appropriately then surrogate keys can also enable invalid data to be stored.  Solid application keys such as Product Numbers do not change, because they are distributed across many databases, including those of suppliers, customers and regulators.  Updating one database would be the least of the issues involved if they were to change.)

Record Locking and Isolation

Proper record locking and transaction isolation is essential to maintain database integrity in a multi transaction production environment.  But it is also very complex, and has subtly but importantly different semantics in each type of database.  SimpleORM avoids much of this complexity by using optimistic locks that work largely independently of individual database locking behaviors.

When a record is to be updated in the database, an optimistic locking strategy simply checks that the record has not in fact been changed since it was queried.  This can be performed efficiently by adding redundant terms to the WHERE clause of UPDATE and DELETE statements.   If a value has been changed by a different transaction then an exception is thrown.
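
For instance, updating the PHONE_NR of the Employee queried earlier might generate SQL of roughly this shape (illustrative only; the exact statement SimpleORM emits will differ, and the old value shown is made up):

 UPDATE XX_EMPLOYEE
    SET PHONE_NR = '(123) 456 7890'
  WHERE EMPEE_ID = '123-45-6789'
    AND PHONE_NR = '(111) 111 1111'   -- the value as it was originally queried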

These exceptions can be avoided by explicitly locking the record when it is queried using SQueryMode.SFOR_UPDATE. This adds FOR UPDATE clauses on many databases, but the optimistic lock checks are still performed for safety.  SFOR_UPDATE reduces broken optimistic lock exceptions, but can increase deadlock exceptions.  It should be used on records that are likely to be updated, or whose current value is critical for an update, but not for general queries.

SimpleORM always sets AutoCommit Off, but it does not change the database's default isolation level (see session.setIsolationLevel).   But one should strongly consider raising it to TRANSACTION_SERIALIZABLE to avoid potential integrity issues.  In particular if a transaction is summing a value over a number of detail records to store the total in a master record, then the detail records should be retrieved SFOR_UPDATE and the transaction should be serializable to avoid the details being changed while the master is being updated.  The performance and concurrency issues with serializable transactions are often greatly overstated -- start safe and then tune for performance based on actual measurements.
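
As a one-line sketch (setIsolationLevel is mentioned above; the java.sql.Connection constant used as its argument is an assumption):

 ses.setIsolationLevel(java.sql.Connection.TRANSACTION_SERIALIZABLE); // assumed argument type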

To understand the effect of SFOR_UPDATE one needs to understand the Multi Version Concurrency Control (MVCC) now provided by most databases, including Oracle, PostgreSQL, DaffodilDB and MS SQL Server after 2005 ("Snapshots"), but not Cloudscape/Derby or DB2.  We describe it below because much of what has been written about locking and isolation levels is very misleading: it implicitly assumes conventional, non-MVCC locking mechanisms.

MVCC means that queries normally retrieve values at the time that the transaction started, not their current values.  Consider the following sequence:-

 Step 1, Transaction A:  BEGIN; SELECT something...
 Step 2, Transaction B:  BEGIN
 Step 3, Transaction B:  SELECT BALANCE INTO :BAL FROM LEDGER WHERE ID=123   -- returns 1000
 Step 4, Transaction B:  UPDATE LEDGER SET BALANCE = BAL + 100 WHERE ID=123  -- set to 1100
 Step 5, Transaction A:  SELECT BALANCE INTO :BAL FROM LEDGER WHERE ID=123   -- MVCC returns 1000, not 1100
 Step 6, Transaction B:  COMMIT
 Step 7, Transaction A:  UPDATE LEDGER SET BALANCE = BAL + 200 WHERE ID=123  -- set to 1200, NOT 1300
 Step 8, Transaction A:  COMMIT                                              -- database corrupted


In a conventional locking system the query in step 5 would normally wait until transaction B commits, and then return 1100.  However,  an MVCC system would immediately return 1000 in step 5  because that was the value of the balance at the beginning of the transaction in step 1.  So MVCC provides a consistent snapshot of the data, which is very good for reporting.  However, for transaction processing it can lead to lost and inconsistent data as the update from Transaction B is clobbered by the update in Transaction A.

The precise behavior is database dependent.  Oracle will detect the anomaly if and only if it is set to the strictest Serializable transaction level; otherwise it will be silently ignored unless optimistic locks are also used.  Many authors dangerously advise against Serializable mode.  (Note that this use of Serializable has nothing to do with "Phantom" records.)

SimpleORM's optimistic locking would change the update in step 7 to UPDATE LEDGER SET BALANCE = BAL + 200 WHERE ID=123 AND BALANCE = 1000.  All databases seem to apply the WHERE condition to the current value that is in the database (unlike the SELECT in step 5), so the update will fail because by step 7 the balance is 1100.  This failure is detected by SimpleORM and an exception thrown.  (It is also possible to add special TimeStamp or update number columns to implement optimistic locking, but this is generally unnecessary with SimpleORM because it does not use pseudo POJOs and therefore always knows the value of each column as it was retrieved.)

The exception can be prevented by adding a FOR UPDATE-like clause to SELECT statements.   This produces conventional locking behavior in which Transaction A waits at step 5 for Transaction B to commit.  (The lock is also normally an exclusive lock, not a shared read lock.)   However, if too many FOR UPDATE locks are held then concurrency will be severely reduced and excessive deadlocks will arise.  (There will always be some exceptions in a highly concurrent database system; applications have to be programmed to cope with them.)

Locking is very important not because locking problems cause many errors, but rather because they cause very few errors. Badly constructed transactions that do not lock data properly will work perfectly when unit tested. They will also work most of the time in production. But every so often records will be corrupted in ways that are impossible to reproduce and thus extremely difficult to fix. And worse, the problems usually only arise after the project becomes heavily used.

SimpleORM DataSets

SimpleORM actually consists of two distinct modules, namely the DataSet module that actually stores the SRecordInstances and the SSessionJdbc module that moves data between the DataSet and the JDBC accessed database.

A DataSet is simply a collection of heterogeneous SRecordInstances and their meta data.  Records can be added, removed or updated within a dataset. SimpleORM SRecordInstances are only ever stored within an SDataSet.  DataSets are independent of any database manipulation.

By default, each SSessionJdbc has a default SDataSet that is created each time a transaction is begun.  The DataSet stores the "cache" of records.   It is normally destroyed when the transaction is committed or rolled back.  When the DataSet is destroyed, so are all the records within it and any attempt to access them will result in a fail fast exception (unlike pseudo POJO based ORMs).

It is also possible to detach the DataSet from a transaction, serialize it, move it to a different JVM, manipulate it without access to the database, and then reattach it to a different transaction.  The DataSet automatically tracks which records have changed, together with the previous field values used for optimistic locking when the records are reattached.  This is much easier and more correct than creating special Data Transfer Objects (DTOs).

For Example:-

ses.begin();

Department dept400a = ses.mustFind(Department.DEPARTMENT, "400");
Employee emp200e = ses.mustFind(Employee.EMPLOYEE, "200"); ...
Department dept500a = ses.mustFind(Department.DEPARTMENT, "500");

SDataSet ds = ses.commitAndDetachDataSet();
ses.close();

// DataSet ds now contains the three records.
// ds has been detached from database, can be moved far away.
 
Department d400d = ds.find(Department.DEPARTMENT, "400");
d400d.setDouble(d400d.BUDGET, 50001);
Employee e200d = ds.find(Employee.EMPLOYEE, "200");
e200d.setReference(e200d.DEPARTMENT, d400d);
Department d500d = ds.find(Department.DEPARTMENT, "500");
d500d.deleteRecord(); // Deletion will actually happen upon reattachment.
 
Department d600 = ds.create(Department.DEPARTMENT, "600");...
Department d700 = ds.createWithNullKey(Department.DEPARTMENT); // Can set the key after attaching
 
// Reattach the dataset to a new transaction.
session.begin(ds);
d700.setString(d700.DEPT_ID, "700"); // Must set key before commit (or be generated).
session.commit();


There are several semantic issues that are resolved by having a well defined concept of a DataSet.  For example, records can only reference other records within the same DataSet, which must be detached as a whole.  This resolves the issue of dangling references to non-detached records.  It also clarifies the meaning of the "many" end of associations, which is discussed below.  (Microsoft ADO uses a similar approach.)

DataSets also provide a clean architecture for SimpleORM internally.  The two modules are packaged into two jars, namely simpleorm-dataset.jar and simpleorm-sessionJdbc.jar.  The DataSet module contains the record and field meta data, plus a list of all the record instances.   The sessionJdbc module is responsible for generating SQL and communicating with the JDBC driver.   There is no dependency from DataSet to sessionJdbc, so other session modules could be developed, such as one for XML.  simpleorm-dataset.jar may also be used on its own for unit testing without a database.

(ADO.NET uses a similar architecture, and shares its benefits for remoting etc.  But it does not have SimpleORM's SFieldMeta objects, using raw strings to access data instead. The APIs are also a bit more clumsy, IMHO.)

Associations Between Records

SimpleORM stores relations between records in SFieldReference objects that point directly to the referenced record and also track the underlying scalar fields that make up the foreign key.  

By default the scalar field is assumed to have the same name as the primary key column(s) of the referenced record.  So in the following example the EMPLOYEE.DEPARTMENT foreign key uses the DEPT_ID string field because the primary key of DEPARTMENT is named "DEPT_ID".

 public static final SFieldString DEPT_ID = new SFieldString(EMPLOYEE, "DEPT_ID", 10);
 static final SFieldReference<Department> DEPARTMENT
 = new SFieldReference(EMPLOYEE, Department.DEPARTMENT, "DEPT");


findReference can then be used to retrieve the record lazily from either the database or the DataSet if it had already been retrieved.  (Finding a record in the DataSet is just a pointer dereference and so very fast.)

 Department department = employee.findReference(employee.DEPARTMENT);


The normal way to find all Employees in a Department (say) is to simply query the database

   ...new SQuery(Employee.EMPLOYEE).eq(Employee.DEPARTMENT, department)...


But it is also possible to query a dataset directly which is useful if the dataset is detached from the database.  For example

 List<Employee> emps = dataset.queryReferencing(department, Employee.DEPARTMENT)

retrieves all the Employees in department that are currently in the dataset.  So if only Employees matching .gt(Employee.SALARY, 100000).ascending(Employee.NAME) had been loaded into the dataset then just those would be returned, and in that sorted order.  It does not attempt to retrieve all the Employees in the department in a random order.  In this way datasets provide sound and useful semantics for associations.  queryReferencing can never throw a LazyInitializationException.

Many to many relationships are handled in a similar way, but the intersection table needs to be explicitly referenced.  This is almost always required in practice anyway because there is almost always important information pertaining to the relationship itself.  For example many Students may be enrolled in many Subjects, but one probably wants to know the grade achieved by each Student for each Subject.  Likewise many Players may play for many Teams, but one probably wants to record the position of each Player in each Team.
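
As a sketch, a hypothetical Enrolment intersection record might be declared as follows, using exactly the same constructs as the Employee example above (Student and Subject are assumed to be SimpleORM records whose primary keys are named STUDENT_ID and SUBJECT_ID):

public class Enrolment extends SRecordInstance {
  public static final SRecordMeta ENROLMENT = new SRecordMeta(Enrolment.class, "XX_ENROLMENT");

  // The two scalar foreign key fields form a composite primary key.
  public static final SFieldString STUDENT_ID = new SFieldString(ENROLMENT, "STUDENT_ID", 20, SPRIMARY_KEY);
  public static final SFieldString SUBJECT_ID = new SFieldString(ENROLMENT, "SUBJECT_ID", 20, SPRIMARY_KEY);

  static final SFieldReference<Student> STUDENT = new SFieldReference(ENROLMENT, Student.STUDENT, "STUDENT");
  static final SFieldReference<Subject> SUBJECT = new SFieldReference(ENROLMENT, Subject.SUBJECT, "SUBJECT");

  // Information that pertains to the relationship itself.
  public static final SFieldString GRADE = new SFieldString(ENROLMENT, "GRADE", 2);

  public @Override SRecordMeta getMeta() { return ENROLMENT; }
}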

There is no attempt to cascade deletion of parents to their children.  This avoids many thorny problems.  So it is the programmer's responsibility to explicitly delete any dependent rows. We also recommend against declaring DELETE CASCADE at the database level as this will confuse SimpleORM (and most other tools).

It should be noted that trying to model associations as POJO collections has major problems, because each end of the association normally needs to be updated by distinct statements:-

  sales.getEmployees().add(fred);
  fred.setDepartment(sales);

Some pseudo POJO ORMs such as Hibernate actually attempt to map both of these statements onto the one EMPLOYEE.DEPARTMENT column.  But in Java there is no need for the two statements to be consistent; one could write:-

  sales.getEmployees().add(fred);
  fred.setDepartment(engineering);

The semantics of the above statements are not well defined.  The essential problem is that POJOs force you to update the department.employees index manually, whereas a database does it automatically.

(In Hibernate the semantics seem to depend upon the order in which the two statements happen to update the database.  One can also specify inverse="true", which means just ignore one end of the association. The excellent Hibernate In Action book is somewhat apologetic on this point (p106).)

Flexible Record Definitions

SimpleORM's generalized record definitions allow plug in modules to extend the schema of a core application with extra information that they need.  For example, a security plug in could include code such as the following.



static SFieldString PUBLIC_KEY; // not final, as it is assigned when the plug in is initialized
@Override void extendApplication() {
  PUBLIC_KEY = new SFieldString(Employee.EMPLOYEE, "PUBLIC_KEY", 200);
}
...
  ses.begin();
  Employee fred = ses.create(Employee.EMPLOYEE, "100");
  fred.setString(fred.NAME, "Fred");
  fred.setString(PUBLIC_KEY, "ae6d82f9a7e9c...");
  ses.commit();

The core application calls SecurityPlugIn.extendApplication at the appropriate time, and the PUBLIC_KEY field is added.  The plug in can then access the new field.  The new field will be stored in the database etc.  

The ability to extend a core application with plug ins is important for modern, flexible software.  But a plug in is of little value if it cannot store additional information in the database.  SalesForce.com uses this technique extensively in its extension architecture.  This type of extensibility cannot be achieved with rigid pseudo POJOs.

Another example of the generalized structures is the following code which creates 13 MONTH_nn columns:-


 static final SFieldBigDecimal[] MONTHLY_TOTALS = new SFieldBigDecimal[13];
 static {
   for (int mx = 0; mx < MONTHLY_TOTALS.length; mx++)
     MONTHLY_TOTALS[mx] = new SFieldBigDecimal(LEDGER, "MONTH_" + mx);
 }
 ...
 total = myRecord.getBigDecimal(MONTHLY_TOTALS[3]);

There is nothing special about this code; it just creates SFieldMeta objects and references them using an array instead of individual constants.  Simple.

Basic subtypes and inheritance can be easily implemented using SimpleORM's generalized structure, as demonstrated in examples.SubTypeTest.  But more complex mappings between Java (sub)classes and SQL tables are not supported.  There are many subtle and confusing issues that arise with more complex mappings, and simple relational views can provide much more powerful mappings than any ORM should they really be required.

Business Rule Framework

SimpleORM provides events that allow business rules to be added to database operations.   In particular SRecordInstance.onValidateField is called each time a field is set to a value, SRecordInstance.onValidateRecord whenever a record is about to be flushed to the database, and SRecordInstance.onQueryRecord when a record is read from the database.  These can simply be overridden to provide extra validation or other functionality.

The following example checks that an employee's salary does not exceed the department's maximum salary.  (It needs to run on the record flush rather than on an individual field set, to ensure that both DEPARTMENT and SALARY are valid.)

static public class Validated extends SRecordInstance {...
  @Override public void onValidateRecord() {
    double sal = getDouble(SALARY);
    Department dept = findReference(Validated.DEPARTMENT);
    double max = dept.getDouble(Department.MAX_SALARY);
    if (sal > max) throw new SException.Validation(
        "Salary " + sal + " is greater than the Department's Maximum " + max, max);
  }
}

It is also possible to add validators to individual fields.

  public static final SFieldString THIS_THAT = new SFieldString(RECORD, "THIS_THAT", 20)
   .addValidator(new SValidatorEnumeratedValues("THIS", "THAT"));

If an SException.Validation is thrown then SimpleORM adds extra information to the exception, such as the record instance and the new field value.  These exceptions can be trapped at a higher level, or simply reported directly to the end user.
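
A minimal sketch of trapping such a failure (reportToUser is hypothetical application code, and the accessors for the extra information are not shown because their exact names should be checked against SException.Validation):

 try {
   emp.setDouble(Employee.SALARY, -1);  // rejected by the SValidatorGreaterEqual(0) declared earlier
   ses.commit();                        // onValidateRecord rules also run when the record is flushed
 } catch (SException.Validation ve) {
   reportToUser(ve.getMessage());       // reportToUser is application code, not part of SimpleORM
 }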

This allows data oriented business rules to be associated with the data. No matter which modules use Employee, the salary rule will be enforced.   It is, of course, also possible to add arbitrary, explicitly called methods to the Employee class for business rules related to Employees.

The goal is to reduce the amount of indirection.  We do not recommend a DAO approach (although it is certainly possible).  Try, in general, to reduce the amount of indirection and code, and just get the job done directly.

It also should be noted that this approach means that one only needs to add code for real business rules. The traditional approach of writing getter and setter methods means that one needs to write two dummy methods for every field in an object just to allow for the possibility of adding a rule later. This framework makes that unnecessary because the generalized get* and set* methods are used in the same way regardless of whether a validation method is actually written for the field. In practice this substantially reduces the amount of code that needs to be read.

Configuration

There isn't any.  At least, nothing beyond occasionally setting a property on a class.

The SimpleORM connection is derived from an ordinary java.sql.Connection or javax.sql.DataSource object, which would normally be obtained from connection pool software.  If the number of transactions is not large it can be simpler and safer to just use the old java.sql.DriverManager to create non-pooled connections (examples/A0TinyExample.java has a few lines of code that demonstrate how to do this).  But SimpleORM itself does not care how the Connection has been created.  The user simply writes a small method such as TestUte.initializeTest to provide it.
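
For example, a minimal non-pooled setup might look like this (the Hsqldb driver class and in-memory URL are illustrative; see examples/A0TinyExample.java for the real code):

 Class.forName("org.hsqldb.jdbcDriver");            // illustrative Hsqldb driver
 java.sql.Connection con = java.sql.DriverManager.getConnection(
   "jdbc:hsqldb:mem:ormtest", "sa", "");
 SSessionJdbc ses = SSessionJdbc.open(con, "TestSession");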

SLog.setLogClass can be used to provide more complex logging; the provided SLogSlf4j interface is recommended.

SimpleORM itself is built with Ant.  build.xml is very simple, and the default task compiles SimpleORM and runs all the tests on the provided Hsqldb database.   Examples are classes with a main method, and there is one TestAll class that simply calls the others so it is easy to run directly from IDEs.   The SDataLoader is used to load test data (and the generalized design makes this easy to implement and use).

There should be a single source of truth for the database table definitions. If most of the development is to be done with SimpleORM then this source of truth should be the SimpleORM record definitions.  SRecordMeta.createTableSQL returns a string that contains the CREATE TABLE statement. It is also possible to augment and override the basic SimpleORM definitions with database specific physical clauses. This makes it easy to generate a complete .sql script that recreates the database.
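
For example, a sketch that executes the generated DDL through plain JDBC (createTableSQL is described above; any java.sql.Statement will do):

 String ddl = Employee.EMPLOYEE.createTableSQL();   // the CREATE TABLE statement as a String
 java.sql.Statement stmt = myJdbcConnection.createStatement();
 stmt.executeUpdate(ddl);
 stmt.close();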

There is also a separate SimpleORMGenerate package that can generate SimpleORM definitions from an existing database schema.

An Entity-Relationship tool can be used to view the resulting database, but tight integration remains future work. Existing tools do not seem to have the flexibility to add SimpleORM specific annotations. UML tools tend to be too low level for database modeling.

If you wish to temporarily extend SimpleORM itself then note that most member variables and methods have package scope rather than private scope, and the .jar file is not sealed. Thus rather than changing the actual source, you can usually just poke extra classes into the SimpleORM.core package and then access everything with static methods. This is obviously not supported, and you should make sure that your changes are folded back into the main source.

Conclusion

SimpleORM provides a simple yet powerful  persistence framework without the need for complex configuration, reflection, parsing, byte code processing etc.   It is full featured, yet only about 5% of the size of comparable frameworks such as Hibernate.  This makes it easy to understand and work with and greatly reduces technical risk.

SimpleORM represents objects as a mapping from field objects to their values.  This avoids many of the complexities of attempting to represent them as pseudo POJOs.  It also provides solid handles to fields and meta data.  If get and set methods are added then they can also look exactly like POJOs to other classes should that be a requirement.

The objects are packaged into DataSets, which provide sound semantics for objects detached from a connection.  DataSets also ensure that associations are properly defined and that parents and children are always consistent.  Internally, DataSets are independent of the database access layer, which facilitates other interfaces.

The explicit use of meta data makes it easy to create meta data driven user interfaces such as SimpleWebApp.  It also enables very flexible architectures to be constructed in which a core package's data schema can be cleanly extended by plug ins.

The superior pilot uses his superior judgment to avoid the need for his superior skills. -- Aviation adage that also applies to software.


"Simple. Flexible. Open Source. Doesn't require me to do a crash course in yet-another-application-server to get practical ORM. Doesn't require me to go on a search for tools to help me code due to excessive implementation complexity (you want how many classes for one ejb?). XML is great, but XML is not a programming language. Yay for less XML. I like that it allows the use of specific SQL when that is the better option. I like the performance, especially Query Random and Query Field. I like the adaptability, I can use/put it anywhere."
Luke