SimpleORM White PaperSimple Java Object Relational Mapping |
Power without complexity. |
class Employee...
private static final SRecordMeta EMPLOYEE // the record
= new SRecordMeta(Employee.class, "EMPLOYEE_TABLE");
...
public static final SFieldString PHONE_NR // a field
= new SFieldString(EMPLOYEE, "PHONE_NR", 30);
public String getPhoneNr() { // Optional get method if you like them
meta.getString(PHONE_NR);
}
EMPLOYEE contains an object that represents the Employee
record (SQL Table), and PHONE_NR contains meta data
for a Phone Number field (SQL Column) that might be contained within
that record. Employee employee = findOrCreate("MYEMP");
String phone = employee.getString(Employee.PHONE_NR);
String phone = employee.getPhoneNr(); // Alternative if get* defined.
newQuery().greaterThan(Employee.NAME, "'J'")instead of
newQuery().greaterThan("Employee.Name", "'J'")likewise
isNull(Employee.MANAGER_NR), isValid(Employee.MANAGER),
addListener(Employee.PHONE_NR,
listener) etc. There are many things that one needs
to do with persisted fields
other than get and set their values.session.load(MyClass.class, 13)
does not normally return a MyClass object, rather it returns a magic
byte coded subclass of MyClass to handle lazy loading. This can
cause nasty bugs if it
is processed by other reflecting software such as a web service
serializer.<sj:Input record="Employee" field="NAME">(The above is illustrative only. See SimpleWebApp for details)
makeInput(SFieldMeta theField).
One can simply write code like getString(theField)
without the need for reflection. One can even associate default
UI properties with the field objects.However, applications that do a significant amount of database interaction will want to encapsulate business rules and information about different database tables in Java classes. For example, there might be a Employee class that corresponds to an Employee table, and a Department class that corresponds to a Department table. Methods could be written to validate and process data in those classes.
An Object / Relational Mapping tool automates the mapping between these classes in memory and the database by providing methods to automatically select, insert, update and delete rows. They should also keep a table of primary keys to ensure that there is at most one object in memory for each row in the database, and they should automatically detect updates to the in memory classes and flush them to the database when transactions commit. A good tool will also handle foreign key relationships and provide a measure of database independence.
These are important services, so any substantial Java database oriented application would definitely benefit from a good O/R tool.
Note that in SimpleORM a database has "tables" and "columns", while a persisted class is a "record" which has "fields". A non persisted class is just a "class" which has "instance variables".
public class Employee extends SRecordInstance { // ie. a class mapped to a table.
private static final SRecordMeta meta
= new SRecordMeta(Employee.class, "XX_EMPLOYEE");
// SRecordMeta objects describe SRecordInstances
public static final SFieldString EMPEE_ID
= new SFieldString(meta, "EMPEE_ID", 20, SFD_PRIMARY_KEY);
public static final SFieldString NAME
= new SFieldString(meta, "NAME", 40, SFD_DESCRIPTIVE);
public static final SFieldString PHONE_NR
= new SFieldString(meta, "PHONE_NR", 30);
public static final SFieldReference DEPARTMENT
= new SFieldReference(meta, Department.meta);
public static final SFieldBoolean IMPORTANT
= new SFieldBooleanCharYN("IMPORTANT");
public SRecordMeta getMeta() { return meta };
// (Abstract SRecordInstance method)
public static Employee findOrCreate(String empeeId) { // Convenience
return (Employee)meta.findOrCreate(empeeId);
// Completely optional get/set methods.
String getPhoneNumber() {
return getString(PHONE_NR);
}
String setPhoneNumber(String value) {
setString(PHONE_NR, value); }
}
The Employee record is mapped to the XX_EMPLOYEE
table. It has a primary key EMPEE_ID, a NAME
and a PONE_NR
field, and a direct reference to the Department
record. All these
definitions simply define static constants which are threaded
together via the meta variable. The meta
object is tied to the Employee.class using the getMeta
method.
The following code then manipulates this data.
SConnection.attach(myJDBCConnection, null);
SConnection.begin();
...
Employee employee =
(Employee)Employee.meta.mustFind("123-45-6789");
String name = employee.getString(Employee.Name);
String phone = employee.getPhoneNr(); // if getter defined
...
employee.setPhoneNr("(123) 456 7890");
...
Department department =
(Department)employee.getReference(employee.DEPARTMENT);
...
oldEmployee.delete();
...
SConnection.commit();
SConnection.detachAndClose();
A SimpleORM connection is created based on an existing JDBC Connection, and associates it with the current thread. SimpleORM does not care how the JDBCConnection was created, normally one would use a connection pool such as the one supplied with Tomcat. A new transaction is then started.
<>mustFind(keys) lazily finds a employee
object based on the
primary key. If the object had been previously retrieved in the same
transaction a pointer to it is returned, otherwise a JDBC query is
issued to retrieve the row. If "123-45-6789" does
not exist
in the database then a meaningful exception is thrown whose message
includes the unfound primary key. find(keys) returns
null if the row was not found, create(keys) creates a new
record, while findOrCreate(keys) can be used to insert a
new row using exactly the same
techniques as used to update a record.The NAME and PHONE_NR fields
are
then retrieved from the record, the first using the generalized syntax
and the second using the get method that had been defined. The PHONE_NR,
is then set to a new value. The generalized design makes it easy for
SimpleORM to mark the field and record as dirty when the PHONE_NR
is set. One never needs to compare 100
records at commit time just to find the 10 that have changed.
The employee's DEPARTMENT record is then retrieved
from either
the cache or database. Note that unlike other O/R mapping systems this
is normally performed lazily so that the Department row
is only retrieved from the database if it is needed.
An oldEmployee is then flagged for deletion
before the
transaction is finally committed. As part of this process SimpleORM
flushes all pending dirty objects to the database. Thus there is no
need to manually track which records need to be updated. There is
also no risk that separate parts of the same application will
accidentally override changes to different in memory copies of the
same database record.
The generalized design also makes it easy to integrate sophisticated UI layers
with the ORM. See SimpleWebApp.
Note that as no reflection is used one can organize the record
definitions in more sophisticated ways. For example, one could
create 13 MONTH_nn fields (ie. columns) using the
following code:-
SFieldBigDecimal MONTHLY_TOTALS = new
SFieldBigDecimal[13];
static {
for (int mx=0; mx<SFieldBigDecimal.length;
mx++)
MONTLY_TOTALS[mx] = new
SFieldBigDecimal(meta, "MONTH_" + mx);
}
...
total = myRecord.getBigDecimal(MonthlyTotals[3]);
findOrCreate(...) can can retrieve records based on
equality of their primary key(s). But a relational database
can also retrieve rows based on arbitrary boolean expressions.
Consider the following query which selects employees with Name starting
with a letter after "J" in myDept. The equivalent SQL is shown
after the query.SResultSet res = Employee.meta.newQuery()
.gt(Employee.NAME, "'J'")
// .and() implied. .or() etc. override.
.eq(Employee.DEPARTMENT, myDept)
.decending(Employee.NAME)
.execute();
// ps = Prepare("SELECT * FROM XX_EMPLOYEE
// WHERE NAME > 'J' AND DEPT_ID = ?
// ORDER BY NAME DESC"<br>
// ps.setString(1, myDept.DEPT_ID);
// res = ps.executeQuery();
while (res.hasNext()) {
Employee emp = res.getRecord();...
}
// ArrayList recs = res.getArrayList(100000); // Alternative
Note that the Department is queried directly, and SimpleORM
determines that the key to Department is DEPT_ID ( it
could also be a multi column key). The records are normally
retrieved individually using a normal hasNext() loop, but
they can also be retrieved as a set using getArrayList
which takes a parameter that limits the number of rows retrieved (else
an exception is thrown).
It is also possible to use direct SQL SimpleORM tries to form a very thin layer on top of JDBC. Thus queries are performed using ordinary SQL queries, with SimpleORM providing default Select and From clauses. The general flow is very similar to a JDBC query. For example:-
SPreparedStatement stmt = Employee.meta.select(
"NAME > 'j' AND DEPT_ID = ?", "NAME");
// SELECT * FROM XX_EMPLOYEE
// WHERE NAME > 'j' AND DEPT_ID = ? ORDER BY NAME
stmt.setInt(1, myDept.getInt(DEPT_ID));
SResultSet res = stmt.execute();
while (res.hasNext()) {
Employee emp = res.getRecord();
... emp.getString(Employee.NAME) ...
}
By default all SFields in a
record are retrieved unless they have the SFD_UNQUERIED
property set.
This is used for large, rarely used fields.
However, if the SQY_UNQUERIED flag is specified
on the query then unqueried fields also be retrieved. Naming fields
such as DEPARTMENT.NAME can be identified by
having their
SFD_DESCRIPTIVE property set in which case
specifying the
SQY_DESCRIPTIVE flag causes just those
descriptive fields to
be
retrieved. Alternatively, one can specify an explicit list of fields
to retrieve for each query. Fields that are not queried are
marked as invalid, and attempts to access them produce fail fast
exceptions.
Thus one never has to
retrieve all 100
columns of an Employee Table just to obtain the Employee's
Name. And any attempt to access a field that has not been
queried produces fail fast exceptions. (All these options also apply
to foreign key getRecord(), findOrCreate(),
etc.)
No attempt has been made to concoct some new query language that is then transformed into SQL in ill defined ways. Such new query languages lack key features such as sort order or leave important details poorly defined such as whether joins are inner or outer. They also may reflect a lack of understanding about the nature of query languages. For example, EJB QL tries to navigate a database with a mysterious IN operator in queries like
SELECT p FROM Player p, IN(p.team) AS t WHERE t.city = ?1rather than the more obvious
SELECT p FROM Player p WHERE p.team.city = ?1
In particular, if two separate queries in the same transaction would both return the same row in the database, then SimpleORM ensures that there is only one record in memory that corresponds to that row. The second query simply returns a pointer to the record created by the first query. SimpleORM then automatically updates the row in the database when the transaction commits. This prevents changes to a row made by one business rule from accidentally overwriting changes to the same row made by other business rules in the same transaction.
These records are stored in the main "Transaction Cache" which is
simply the set of records read during a transaction. It is not a
separate structure, but just a hash map of each
SRecordInstance's primary key and a pointer to
the record's
object. A separate Transaction Cache (ie. hash map) is created for
each transaction, and data is never shared between two thread's
transaction caches. Thus two threads on the same JVM behave
identically to two threads on different JVMs, and this is achieved
without the need for any fancy SimpleORM distributed locking
mechanism. The database is used to do all the locking, and so
non-SimpleORM and importantly even non-Java applications can co-exist
with SimpleORM.
Changes to records in memory are not normally flushed to the database until a transaction commits. The flushing operation performs inserts, updates and deletes in the order that records were first made dirty. Records are made dirty when a field is set to a new value or they are deleted.
Maintaining the dirty flushing order generally avoids putting the
database
temporarily into an inconsistent state and so violate database
referential integrity constraints. We strongly suggest that SQL
FOREIGN KEY constraints be declared for all
foreign key
relationships and SimpleORM supports this. If the default order is
not optimal for an unusual transaction records can be manually flushed
or
markDirty()ed. Occassionally it might even be necessary
to flush a record multiple times if mutually referencing foreign keys
are used. (The main potential source of trouble arises
if parents and children are inserted in the same transaction. Simply
make sure that the parents are created first -- eg. insert
Departments before Employees, just like you would with JDBC.)
When a record is delete()ed, it is destroyed
at the time
that
the delete() is specified. Thus one cannot access deleted records,
even though the actual database deletion will not occur until the
transaction is flushed. A findOrCreate() for the
same key as
a just deleted record will return the destroyed record, and its fields
cannot be referenced. Deleting a newly created record is allowed and
does nothing, with no SQL update being issued at flush time.
One important and usually undocumented problem with most O/R systems is that records are cached in memory yet queries are made against the database. For example suppose a transaction increases Employee Fred's salary from $45,000 to $55,000, and then within the same transaction makes a query for employee's with SALARY < $50,000. Would Fred be included in the query's result?
SimpleORM guarantees that queries made viafindOrCreate
and
look ups via getReference are always made
on the cached values. This means that they always reflect the current
state of the transaction in memory. However, like most other
O/R tools queries performed via result sets are always made on the
database, and so may not reflect the current cached values. This
enables the full power of SQL to be used without having to implement a
special in-memory query engine. As a result, Fred would
be
included in the result set. SimpleORM does make sure that retrieved
records are consistent with the cache, so Fred's existing record would
be returned and yet it would have the current salary of $55,000.
In rare cases where this is a problem SimpleORM enables dirty rows to
be
explicitly flushed to the database without having to commit the
transaction. DELETE statement that deletes all transaction
details over 12
months old is much more efficient than having to read each of the
transaction details into the JVM and then delete them
individually.Having unique, immutable primary keys are fundamental to any
ORM.
It is the keys that are used to link specific records in memory to
specific rows in a database. They are also used to access records
in
the transaction cache. Very occasionally it might be useful to
have a
transaction table that has no primary key, but such a table will have
to be updated directly with JDBC, not an ORM.
SimpleORM links records together using both a direct pointer and the
key values. So the following line from the Employee
definition actually defines towo SFieldMeta
objects. The first contains the reference
field that links to the Department record, and a
reference to it is stored in the DEPARTMENT
constant. There is a second slightly hidden scalar field that contains the
actualy DEPARTMENT_NR field because that is the primary
key of the Department record.
public static final SFieldReference DEPARTMENT
= new SFieldReference(meta, Department.meta);
When the following line is executed, the department row is read from
the database if it has not already been read into the current
transaction.
Department department =
(Department)employee.getReference(employee.DEPARTMENT);
employee.IsNull(employee.DEPARTMENT) means that there
is no Department
associated with this Employee. It does Not mean that the
Department has simply not been loaded into memory. It is slightly
different from just sayng department == null because it
does not acually need to retrieve the Department, it just needs to see
whether the scalar DEPARTMENT_NR field is null.Use getReferenceNoQuery(employee.DEPARTMENT) if you just want to see
whether the reference has been loaded, useful for Detached records
(below). DEPARTMENT.foreignKeyField(0) will return
the actual scalar foreign key.
Note that there are many different ways to access a field in
SimpleORM other than just getting and setting its value. Note how
how
the generalized design makes this easy and natural to do. Records
are not POJOs.
SimpleORM fully supports multi column primary keys. So if the Department
table actually had keys (CompanyNr, DepartementName) then
both these scalar fields would be automatically included in the
Employee record. Contrary to popular
belief this introduces little overhead as long as indexes are declared
properly.
Further, SimpleORM supports multi column Identifying foreign
keys. In other words, references to one table can be used as part of
the key to another. This means that references may cause foreign keys
to be propogated recursively. For example, if the key to
PaySlip is (Employee, Period), and the key
to Period is (Year, PeriodNr) then the
actual primary key columns for PaySlip would be
(EMPEE_ID, YEAR, PERIOD_NR). A PayslipDetail could thus
have the key (EMPEE_ID, YEAR, PERIOD_NR, DETAIL_TYPE).
paySlipDetail = findOrCreate(new Object[]{myPaySlip, myDetialType}) PaySlip would have a key PaySlipNumber
and extra non-key columns EmployeeNumber, PeriodNumber.
Likewise PaySlipDetail would have key PaySlipDetailNumber
and non key columns (PaySlipNumber, DetailType). We
recommend against this practice because:-PaySlipNumbers instead of externally defined
and permanent EmployeeNrs and Periods.SelectMax
method be
used for the INVOICE_NR. One then uses
createWithGeneratedKey instead of findOrCreate
to
create the records.
public static final SFieldInteger INVOICE_NR =
new SFieldInteger(meta, "INVOICE_NR",
SFD_PRIMARY_KEY,
SGENERATED_KEY.pvalue(new
SGeneratorSelectMax(meta)));
SELECT
MAX(MY_KEY)+1...".
However while effective it is limited because it will create locking
problems in a multi user environment. Other alternatives tend to
be complex and/or database dependent. They include:-UPDATE EMPLOYEE E SET DEPARTMENT = 'SALES' WHERE E.ID='FRED'then employee
FRED is automatically an implicilty in SELECT * FROM EMPLOYEES WHERE DEPARTMENT='SALES'In particualr, there is no separate structure
DEPARTMENT.EMPLOYEES. If you want to find the
employees in a
department you just query the database. The database would
normally provide an index on EMPLOYEE.DEPARTMENT,
and that index
is updated automatically when you update the EMPLOYEE
record.sales.getEmployees().add(fred);Some transparent ORMs such as Hibernate actually attempt to map both these statements into the one
fred.setDepartment(sales);
EMPLOYEE.DEPARTMENT
column. But in Java there is no need for these two statements to
be consistent, one could write:-sales.getEmployees().add(fred);The semantics of the above statements is not well defined. The problem is that Java makes you have to update the
fred.setDepartment(engineering);
department.employees index manually, where a database does it
automatically.inverse="true"
which means just ignore one end of the association. The excellent
Hibernate In Action book is
somewhat appologetic on this point,
p106.) setDepartment
automatically update the getEmployees collection and make
the getEmployees.add
method private. In normal Java this requires copied code or
reflection. But it could be done quite easily using SimpleORMs
generalized structures (without the need for any fancy
containers). To retrieve all the Employees whose Department
is sales one would then writeemps = sales.getChildren(Employee.DEPARTMENT); // not provided
SConnection.Begin()This would retrieve Tania but not Annie. So should Annie be included in
Departement sales = ...
SResultSet result = Employee.meta.newQuery()
.gt(Employee.NAME, "'J'")
.eq(Employee.DEPARTMENT, sales)
.execute();
sales.getChildren? Really we need two
methods, getAllChildren and getRetrievedChildren.
If we want to show the second page of a list of employees with names
after 'J', then getRetrievedChildren
is more useful. But its semantics are vague to say the least --
what if some unrelated query in the same transaction happened to
retrieve Annie, should she be included? And what if Tania where
to change her name to "Betty"? And what order should the
collection be sorted by? getAllChildren has
fairly well defined semantics but is often less useful particularly as
a sort order is almost always required in practice.ArrayList recs = result.getArrayList(100000);The user has complete control, the semantics are clear, and there is nothing to stop the ArrayList from being stored in an Employee instance variable. SimpleORM does not attempt to track the many end of relationships. In practice this works very well and avoids many possible misunderstandings and bugs. It also largely avoids the "n+1" select problem.
There is no attempt to cascade deletion of parents to their
children. This avoids many thorny problems. So
it is the programmer's responsibility to explicitly delete any
dependent rows. We also recommend against declaring DELETE
CASCADE at the database level as this will confuse
SimpleORM (and
most other tools).
A SimpleORM connection is simply wrapped around an existing JDBC
connection. SimpleORM does not concern itself with JDBC drivers,
connection pooling, or other related issues. This makes SimpleORM easy
to integrate into existing frameworks and application servers. (You
would normally write a small wrapper method that obtains the
connection from the pool and then calls
SConnection.attach -- see the examples.)
There are slightly different SimpleORM drivers for each database
that account for different locking strategies, performance trade offs
and support for JDBC 2. SDatabase.connect(Connection con)
can
invoke the correct driver using con.getMetaData().
All
drivers extend SGenericDriver which should work
on any
vaguely standard SQL database.
The connection is associated with the current thread. This avoids the need to specify the connection object for every SimpleORM method call. More importantly it makes it very difficult to directly access the same transaction via two different threads at the same time which would be very dangerous. It also makes it easy to trap connection pool problems caused by connections that are never closed which can otherwise be difficult to find.
Many JDBC programmers use far too many transactions. As a rule, a single database transactions should be used for one user interaction event such as submitting a web page. This is partially because committing a transaction is several times more expensive than updating a row, but also because it is important to see and maintain a consistent view of the database in a multi user environment. Associating the transaction with the current thread encourages proper transaction design by avoiding the need to pass a JDBC connection paramater.
AWT/Swing is essentially single threaded, which is good, but what
is surprising is that the thread it uses is not the main thread.
Swing/AWT programs often use the main thread to set up the forms, and
then exit, and then the second AWT thread starts to do the real work.
In other words a simple single threaded application pointlessly using
two threads, one after the other! (.Net does not make this
mistake.)
SimpleORM works well with Swing but it is best to execute all
SimpleORM operations in the same thread. This can be easily achieved
using invokeAndWait() to set up the forms the AWT thread.
The main thead is hardly used at all. An example of this cleaner
approach has been provided.
SimpleORM can be used with J2EE EJBs and JTA. SConnectionEJB
is a subclass of SConnection that associates itself with the current
JTA object rather than the current thread. A Synchronization
object then tracks the JTA commit/rollback methods.
However, it should be noted that CMT is often misused, and can result in many unecessary transactions. Even the experts that wrote the Weblogic 7.0 basic examples did this. This has a serious performance problem, but more importantly it can produce unreproducable incorrect results. Further, two phase commit algorithms can alway produce nasty "in doubt" transactions. Thus the use of CMT in these examples is not an endorsement of the use of CMT.
SELECT
FOR UPDATE
statement. This will wait until no other transaction has a lock
on the row, and then will lock the row. So this is the default
for SimpleORM which is normally used to update values. However it
is also possible to specify a SQY_UNLOCKED flag
which causes records to be selected without the FOR UPDATE
clause and so not obtaining
any locks. This should be used with caution. SELECT and then X=3 for the SELECT FOR UPDATE
in the same transaction. A subsequent SELECT in the
same transaction will surprisingly return X=3. But COMMIT
and then SELECT will return X=2, which is consistent if
you think about it. PostgreSQL tends to throw an exception if a SELECT
FOR UDATE would be different from the SELECT.
Exceptions are easy to diagnose, corrupted data is much harder.
Be conservative and careful.)A third alternative is to use the SQY_OPTIMISTIC
flag
which leaves the record unlocked but remembers the column values when
the record was read. When the record is updated SimpleORM checks
that no other transaction has changed the record. Thus an
optimistic "lock" does not actually lock the
record at all, it is just optimistic that no other transaction will
update it,
but if it does it throws an exception to prevent data corruption.
It is up to the user to deal with this exception.
Optimistic locks are implemented with virtually no overhead by
adding updated colunms redundantly to the WHERE
clause of the UPDATE statement -- no additional
query is
required. Only the primary key fields and fields that were
actually updated to
a different value are checked, so in practice this
provides column level locking. The old values of the fields are stored
directly in the rich non-POJO record structure.
Some databases do not allow certain data types to appear in
where clauses. For example, Oracle does not allow "long"
varchar fields that are over 2000 chars to be used. Such fields can
be flagged SOPTIMISTIC_UNCHECKED in which case
they will not
used in the WHERE
clause. But be sure to manually check for
concurrent updates of such fields. It is also possible to add
special timestamp or update number columns to implement optimistic
locking, but this is generally unnecessary with SimpleORM.
d.detach() and e.detach() but not m.detach().
If e is serialized
then
d but not m will also be serialized. In the
unattached state
e.Department will be available, but e.Manager
would not normally be available. Should e be
reattached later, then e.Manager can be retrieved into
the cache. See
LongTransactionTest for details.getReferenceWhileDetached
to fetch e.Manager,
possibly accross multiple tiers. So one does not have to decide
exactly which records to detach before commiting the first transaction.)The following fragment shows how long transactions are managed by detatching selected records from the current transaction. Such records are not marked as invalid when the transaction commits, and so they can then be associated with the session state. If a modified web page is eventually submitted, the long lived records can be attached to the current transaction. Its fields can then be updated from the web form's input.
SConnection.begin();
Employee employee1 = Employee.meta.findOrCreate("123-45-6789");
employee1.detach(); // Must not be dirty.
SConnection.commit();
...
<input value="<%= employee1.getString(Employee.PHONE_NR)%>">
session.putAttribute("employeeRecord", employee1);
// Could also send to another tier
...
Employee employee2 = session.getAttribute("employeeRecord");
employee2.setString(Employee.PHONE_NR, request.getAttribute("PhoneNr");
...
SConnection.begin();
employee2.attach();
SConnection.commit(); // Checks optimistic locks for employee2
Following the way SimpleORM handles associations, we leave it to the
user to maintain a list of detached records as appropriate. This
may involve adding ArrayList to parent records. There is no
automatic and potentially surprising transitive de/ataching. In
practice this adds little code but provides much greater clarity to the
real semantics.
A record must not be dirty when it is detached, and optimistic locking is automatically enabled. So if the record was modified and is then reattached to a second transaction, SimpleORM checks that no other session has modified the record between the two transactions.
Detatched SimpleORM records are Serializable so that they can be sent to a different tier using RMI etc. When a record is serialized, references to other records are also serialized provided that they are also detatched. If the record is reattached later then the references are also reattached. And getReference can then automatically find any referenced records that had not been detatched and so were not serialized.
Long transactions are oriented towards HTTP page at a time
updates. If more highly interactive Java applications are required
then it may be more appropriate to use short transactions but avoiding
exclusive locks. The first time a user modifies a record its lock can
be promoted to exclusive locks by using findOrCreate on the
existing record's key but without the UNLOCKED flag.
SimpleORM retrieves values from records in a very similar mechanism
to ordinary JDBC. Multiple get* methods
retrieve values in
different ways. getString returns the value as a
String,
getInt returns an int (not an Integer) or 0 if
the column
is null. isNullcan then be used to
determine whether the
column is actually null. Note that the actual column type is
somewhat independent of the get method used, eg. getString
can be used on an integer column -- it will just return the value as a
string. Like basic JDBC, getObject
provides the most general access
and can be used for any type, inlcuding Blobs, Clobs, Arrays and any
new ones that may be added in the future.
Internally each field type is processed by its own driver which
handles conversions etc. This makes it easy to add new types.
Existing types include TimeStamp and BigDecimal.
There are several SFieldBoolean*
types depending on how one wants to represent a boolean, eg. 'Y'/'N',
'T'/'F', 1/0 (SQL does not have an intrinsic boolean type).
[Enumerations and are not implemented yet. When Java 1.5 becomes more widely used we will use it.]
The SRecordInstance class provides a simple
framework for
adding business rules to database operations. Each method in the
framework is called after a specific event such as when a field
is set to a new value or when the record is about to be written
to the database. The SFieldMeta object is
passed as an
explicit parameter to methods that work on individual fields
which avoids the need for reflection and so makes it easy to
extend and modify the framework.
As discussed in the previous section most field level validations
are performed declaratively using SFieldMeta subtypes such as
SEnum. Internally those classes implement the
validate method which is called every time
SRecordInstance.set*(SFieldMeta, value) is
called. validate methods can also change the value stored in
the record object. For example, SEnum.validate() is passed
the string returned from the database but returns a SEnum
object which is actually set into the SRecordInstance.
Per record field level validation is performed by specializing
validateField which is called after SFieldMeta.validate().
Like SFieldMeta.validate() there is only one
method used for
all the fields, with the first parameter used to determine which
field is being updated. The following example validates three
fields:-
class Employee extends SRecordInstance {
...
protected void validateField(SFieldMeta field, Object newValue) {
if ( field == SALARY) {
Department dept = getReference(DEPARTMENT);
if (newValue > dept.getDouble(MAX_SALARY))
throw new SValidationException(
"Salary {0} is greater than the Department's Maximum {1}."
new Object[]{newValue, maxSalary});
}
if ( (field == BEGIN_DATE || field == END_DATE)
&& beginDate.after(endDate) )
...
}
}
An alternative mechanism [yet to be implemented] is to associate a
validator directly with a field. Thus the following is an
alternative more conventional (but also slightly longer) way to write
the above using an anonymouse inner class:-
class Employee extends SRecordInstance {
...
static {
SALARY.addValidator(new SFieldValidator() {
protected void validateAField(SRecordInstance record, SFieldMeta field, Object newValue) {
Department dept = getReference(DEPARTMENT);
if (newValue > dept.getDouble(MAX_SALARY))
throw new SValidationException(
"Salary {0} is greater than the Department's Maximum {1}."
new Object[]{newValue, maxSalary});
}
}
}
}
Note how easy it is to write meta level code due to the generic
design. No need for reflection.
This allows data oriented business rules to be associated with the data. No matter what modules use Employee, the salary rule will be enforced. Note the way that like most non-trivial validations, this one involved accessing data from a different record (Department). This is why it is important to have an record cache to minimize JDBC calls while ensuring that the retrieved values are current.
It is, of course, also possible to add arbitrary methods on the
Employee class for business rules related to Employees that are
explicitly called. The goal is to reduce the amount of
indirection. We do not recommend a DAO approach (although it is
certainly possible). Try, in general, to reduce the amount of
indirection and code. Just get the job done directly.
It also should be noted that this approach means that one only
needs to add code for real business rules. The traditional approach
of writing getter and setter methods means that one needs to write two
dummy methods for every field in an object just to allow for
the possibility of adding a rule later. This framework makes
that unnecessary because the generalized get*
and
set* methods are used in the same way regardless
of whether
there is a validate etc. method written for the
field. In
practice this substantially reduces the amount of code that needs to
be written.
The following events are (or will be) supported:-
| SRecordInstance.validateField( SFieldMeta field, Object newValue) |
The main field validation method, this is
specialized for records that need to perform validation. Throws an
SValidationException if not OK.
|
| SRecordInstance.validateRecord() |
Invoked just before a record is inserted or updated. Used for more complex validations that involve the relationships between several fields. Is normally called just before a record is flushed but may also be called directly. It is only called for dirty records, and should not change the values of fields. |
| SFieldMeta.validate( SRecordInstance instance, Object newValue) |
Invoked after every set*() before SRecordInstance.validate. Used by SFieldMeta subclasses such as SEnum to perform declarative style validation. Returns the new value to assign to the field. Throws (a subclass of) SException.SValidate if invalid. Returns the actual value returned by get*() |
| SFieldMeta.mapGetValue( SFieldMeta field, Object cacheValue) |
Called every time a value is retrieved from a record. Can be used to return a different value than actually stored in the SRecordMeta. (getRawObject() returns the value acutally stored.) |
| SSFieldMeta.mapSetValue( SFieldMeta field, Object cacheValue) |
Called every time a value is set in a record. Can be used to return a different value than actually stored in the SRecordMeta. |
| SRecordInstance.watchField( SFieldMeta field, Object newValue) |
Invoked after validateField(), but also just after a row is retrieved. May have side affects such as updating a user interface to reflect the changed values. (The distinction from valdiateField() becomes important when advanced sub-types are used.) |
| SRecordInstance.derivation( SFieldMeta field) |
A field may be declared to be derived in terms of other fields, in which case this method calculate's its value. The calculated value may be cached the first time it is calcualted. Can be used for lookups. |
| SRecordInstance.synchronizeRecord() |
Invoked just before validateRecord, and used to update fields as necessary before a record is written to the database. If enabled, then synchronizeRecord is invoked for all retrieved records regardless of whether they have been updated, and so can be used to synchronize a record with an external object -- see the JDO section. |
| SRecordInstance.preUpdate() | Analogous to watchField() -- ie. called just after the record is validated but before any DML. |
| SRecordInstance.doUpdate() | This replaces the method that normally does the actual update or delte to the database. It enables some other mechanism to be used to flush the row, for example to call a database procedure. |
| SRecordInstance.postUpdate() | Invoked just after an update or insert to the databse during a flush operation. |
| postQuery() | Invoked just after a record is returned from the database, before watchField(). |
A future version will extend the framework to provide a more
advanced sub-type mechanism. This will go beyond the basic Java
subclasses to enable objects to belong to more than one sub-type at a
time, and to enable an object to change its sub-type at run
time. These are important for persisted storage. For example, an
employee starts as a Intern but later change to being a
Manager. They may also become an Engineer and be
Disabled, all at the same time. That cannot be represented
directly in the very plain Java object model because a Java instance
may never change its class and may only belong to one class.
The generalized framework lends itself to doing this. One would write something like:-
class Contractor extends Employee {
SRecordMeta mgrMeta = new SSubRecord(meta, Contractor.class);
...
boolean isMember() {
EmployeeType et = (EmployeeType)this.getObject(EE_TYPE);
return et == EmployeeType.CONTRACTOR;
}
...
}
SimpleORM could then determine the type(s) of each record, and apply validations appropriately. However, the details of this functionality are yet to be determined.
SimpleORM requires a minimum of code and so it also requires minimal use of other tools and techniques to maintain that code. Certainly there is no need for generators such as XDoclet, nor any special compilation techniques or post processing.
SimpleORM itself can be built with Ant. But it minimizes the
reliance on Ant so as to be easy to use with IDEs. For example,
the test cases are classes with a main method, and there is one TestAll
case that simply calls the others.
There is also a separate SimpleORMGenerate package that can generate
SimpleORM definitions from an existing database schema.
Tight integration with and Entity-Relationship style tool remains future work. Existing tools do not seem to have the flexibility to add SimpleORM specific annotations. UML tools tend to be too low level for database modeling.
If you wish to temporarily extend SimpleORM itself then note that
most local variables and methods have package scope rather than
private scope, and the .jar file is not sealed. Thus rather than
changing the actual source, you can usually just poke extra classes
into the SimpleORM.core package and then access everything
with static methods. This is obviously not supported, and you should
make sure that your changes are folded back into the main source.
SJSharp class providing a few
compatiblity methods to ensure that it can also be called from C# and
VB.Net. (Ie. the same
code runs on Java and .Net). This means that non Java 1.1 code
needs to be factored out, which has turned out to be easy in practice.Internally all meta data properties about SFieldMetas
and SRecordMetas are
stored as SProperty values. This is a distinct
subsystem
that enables properties to be given default values, and to inherit
values from other properties and objects. In particular, it allows UI
fields to inherit default values for properties such as a Prompt from
a data base field. The UI modules will put this to very good use.
For example, the following would retrieve the default prompt meta
data that could be associated with the DEPT_ID field.
String deptIdLab = Department.DEPT_ID.getString(SUI_PROMPT)<>This produces more consistent user intefaces because the prompt need only be defined once, and this can be easily internationalized. More importantly, other structural information can be associated with records. For example, other properties can return a list of the descriptive fields to display in a lookup.
SimpleORM's generalized design makes it easy to write generic code without reflection along the following lines:-
void displayField(SRecordInstance rec, SFieldMeta theField) { ...
String thePrompt = theField.getString(SUI_PROMPT);
cout.println("<td>" + thePrompt + "</td>");
cout.println("<td><input value='" + rec.getFieldString(theField) "'></td>");
... }
get*/set*
methods -- it will mainly be completely automatic.The tests are based on the classic Department/Employee schema. Rows are inserted, queried and updated, both sequentially and randomly. The results are for PostgreSQL. (It would be interesting to see how they compare with other datbases but care is required, eg. HSQL has checkpointing overhead.)
The tests were done on a single PC with no network overheads. Each test is run in its own single transaction. All the benchmarks actually do something, eg. calculate a total salary and budget, so that one can have confidence that they are all actually working correctly.
The tests and times are are as follows. All times are quoted in terms of the number of milliseconds per iteration, with at least 1,000 iterations depending on the test. (The times vary about 10% between runs due to database buffer state etc.)
| Test | Description | Raw JDBC | SimpleORM |
| Insert | Insert Emp rows. | 2.4 | 2.5 |
| Query Sequential | Retrieve all the Employees and Departments sequentially. | 0.31 | 0.39 |
| Query Random | Query the Employees randomly. | 5.4 | 1.7 |
| Query Field | Queries five fields in one row, repeatedly. | 0.12 | 0.0050 |
| Update Random | Give random Employees pay rises. | 7.8 | 3.4 |
| Update Bulk | Update all rows in single Update statement. | 0.17 | 0.18 |
The first result is that SimpleORM adds no measurable overhead to using raw JDBC. The queries for the Insert and QuerySequential were very similar, and so were the results. The over head for building the SRecordInstant structure and adding it to the cache was a negligable 0.1 milliseconds/record.
The cache then made SimpleORM over 3.0 milliseconds per record faster than JDBC on the QueryRandom and UpdateRandom trials because the cache minimized expensive database hits. If a network had been used the reduction in database hits would have produced even more dramatic improvements. SimpleORM is also faster for repeated field access, which is significant because there are typically many fields per record.
The bulk update test updates many rows with one SQL statement. The times are the same because they are the same SQL statement. But note that this is an order of magnitude faster than using individual update statements for each row, even without the network overhead. Thus it is important that an ORM allow the buffer to be flushed and purged so that this can be done safely.
<>But probably the biggest performance issue is not tested by the benchmark, namely transaction management. JDBC programmers on large systems tend to loose track of the connection object across modules, and so create far too many transactions. EJB users often use CMT, which also tends to produce far too many transactions (eg. the Weblogic demos). By associating the connection with the current thread, but still giving explicit control over transactions, SimpleORM encourages transaction structures that are efficient and correct. This performance consideration can dwarf all othersA good object-relational mapping tool helps manage the interaction between objects in memory and data in a database. SimpleORM achieves this by providing a thin layer over JDBC. It provides methods to retrieve data into objects and flush changes back to the database. Objects in memory are indexed to ensures that each row in the database is represented by at most one object in memory.
SimpleORM provides Automated
persistence, but not Transparent
persistence. The generalized design makes it easy for it to
intercept get* and set*
methods to object
instance variables. This means that it can easily detect when objects
are changed, lazily retrieve related objects, and raise exceptions if
attempts are made to access stale or undefined data. This is all
achieved with a tiny fraction of the amount of user written code
required by other O/R systems. The small and simple design also
produces very good run time performance.
SimpleORM takes great care with persistent database semantics. The behaviour of the cache is well defined, and it is possible to both flush and purge the cache selectively. Records are locked appropriately, and any attempts to update unlocked data are detected. The caching strategy allows for access to a database from multiple JVMs and even from non-Java applications. And tracing and error handling are taken seriously.
Advanced features include a well defined mechanism for handling
inter-transaction updates as occur in typical web applications. Common
business rules can be specified declaratively, and more complex rules
can be specified a both a field and record level.
The ability to easily
build UI layers on SimpleORM offers the possibility of building very
efficient development tools.
And last but not least, SimpleORM is open source, ie. free. It can be integrated into projects without concern for licencing costs. Moreover, great care has been taken to keep the source code concise, transparent and documented so that it can be easily understood. There are no nasty tricks, not even any XML and hardly any reflection.
We believe that SimpleORM is a natural choice for most Object/Relational mapping requirements.
Just because it is possible to push twigs along the ground with ones nose does not necessarily mean that is the best way to collect firewood.
Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems. -- Jamie Zawinski / James Robertson