Overview


Motivation

While crafting Health IT systems, we came to understand the importance of a properly chosen domain model. FHIR is an open, lightweight, next-generation standard for health data interoperability, which (we hope) can serve as a foundation for Health IT systems. FHIR is built around the concept of a resource.

FHIR® is a next generation standards framework created by HL7. FHIR combines the best features of HL7 Version 2, Version 3 and CDA® product lines while leveraging the latest web standards and applying a tight focus on implementability.

We have also learned that data is the heart of any information system and should be reliably managed. PostgreSQL is a battle-proven open source database that supports structured documents (jsonb) while preserving ACID guarantees and the richness of the SQL query language.

PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness.

Here is the list of PostgreSQL features that we use:

- jsonb documents for resource storage
- GiN and GiST indexes for fast searches
- PLV8 stored procedures

We actively collaborate with PostgreSQL lead developers to craft production-ready storage for FHIR.

Why are we doing this inside a database?

We decided to implement most of the FHIR specification inside the database for scalability reasons: all data operations are performed efficiently within the database.

This approach also makes it possible to use FHIRbase from your preferred language/platform (.NET, Java, Ruby, Node.js, etc.). We have implemented a FHIR-compliant server in Clojure with a small amount of code - FHIRPlace.

There is also the option to break out of the FHIR abstraction (if required) and work with the database directly through the generic SQL interface to complete your business task.

Features

FHIRbase implements 80% of the FHIR specification inside the database as stored procedures:

Roadmap

License

Copyright © 2014 Health Samurai.

FHIRbase is released under the terms of the MIT License.


Performance


FHIRBase Performance

FHIRbase performance depends mainly on the PostgreSQL configuration (TODO: explain configuration options). Users should expect FHIRbase to be a bit slower at CRUD operations than a general-purpose PostgreSQL setup, because it stores additional metadata for each resource and uses stored procedures instead of vanilla SQL statements.
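For illustration, here is a minimal sketch of calling the CRUD stored procedures from plain SQL. The procedure names match the benchmark table below, but the argument shapes shown here are assumptions and may differ in your FHIRbase version:

```sql
-- Hypothetical sketch of FHIRbase CRUD through stored procedures.
-- Requires a database with FHIRbase installed; the argument shapes
-- are assumptions, not the documented signatures.
SELECT fhir_create_resource(
  '{"resourceType": "Patient", "name": [{"given": ["John"]}]}'::jsonb);

SELECT fhir_read_resource(
  '{"resourceType": "Patient", "id": "some-id"}'::jsonb);
```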

Resources stored in FHIRbase consume a bit more disk space than their raw JSON representation - about 1.3 times the original size. For example, if you store 1 million resources (1KB each), the table will consume about 1.3GB, not 1GB as you might expect. Additional indexes for fast searches consume extra space depending on the index type.

Searching, one of the most important operations, is well optimized thanks to PostgreSQL GiN and GiST indexes. It performs well even on fairly complex searches with sorting and chained parameters.
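As a plain-PostgreSQL illustration of the index type involved (the table and column names here are hypothetical, not FHIRbase's actual storage layout):

```sql
-- A GIN index over a jsonb column lets containment queries (@>) use an
-- index scan instead of a sequential scan. FHIRbase builds comparable
-- indexes through its own procedures; this table/column is hypothetical.
CREATE INDEX patient_content_gin
  ON patient USING GIN (content jsonb_path_ops);

SELECT * FROM patient WHERE content @> '{"gender": "male"}';
```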

The following table shows detailed timings measured on an m3.large Amazon EC2 instance (https://aws.amazon.com/ec2/instance-types/#M3).

| Operation | Elapsed time (ms) |
|-----------|-------------------|
| fhir_create_resource called just one time | 14 |
| fhir_create_resource called 1000 times in batch | 102 |
| fhir_read_resource called just one time | 6 |
| fhir_read_resource called 1000 times in batch | 1073 |
| disk usage right after generation of seed data | 3 |
| updating single patient with fhir_update_resource | 5 |
| fhir_delete_resource called one time | 3 |
| fhir_delete_resource called 1000 times in batch | 2360 |
| searching for non-existent name without index | 51268 |
| building Patient.name index | 31537 |
| building Patient.gender index | 21758 |
| building Patient.address index | 42335 |
| building Patient.telecom index | 39600 |
| building Practitioner.name index | 52 |
| building Organization.name index | 55 |
| building Encounter.status index | 16194 |
| building Encounter.patient index | 42495 |
| building Encounter.participant index | 26344 |
| building Encounter.practitioner index | 26377 |
| building Patient.organization index | 28969 |
| building Patient.birthdate index | 36464 |
| searching for patient with unique name | 5 |
| searching for all Johns in database | 44 |
| searching Patient with name=John&gender=female&_count=100 (should have no matches at all) | 217 |
| searching Patient with name=John&gender=male&_count=100 | 137 |
| searching Patient with name=John&gender=male&active=true&address=YALUMBA&_count=100 | 7 |
| searching Patient with name=John&gender=male&_count=100&_sort=name | 161 |
| searching Patient with name=John&gender=male&_count=100&_sort=active | 159 |
| searching Encounter with patient:Patient.name=John&_count=100&status=finished&practitioner:Practitioner.name=Alex | 10 |
| searching Encounter with patient:Patient.name=John&_count=100&patient:Patient.organization:Organization.name=Mollis | 58831 |

Generating test data and running benchmarks on your machine

The fhirbase perf utility can be used to generate test data on your machine:

path/to/fhirbase/perf/perf --verbose=3 \
                           --number-of-patients=1000000 \
                           --pgdatabase=your_db_name \
                           --pghost=localhost \
                           --pgpassword=your_password \
                           --pgport=5432 \
                           --pguser=your_user_name

This command will generate 1 million Patients, 1.3 million Encounters, 200 Practitioners, and 400 Organizations, and will cross-link them with ResourceReferences. Generating this amount of data usually takes 5 to 20 minutes.

Now you can run the benchmark suite:

echo "SET plv8.start_proc = 'plv8_init'; SELECT fhir_benchmark('{}'::json);" \
     | psql postgres://your_user_name:your_password@localhost:5432/your_db_name

This operation can take some time, so be patient.


Development


Development Guide

Quick installation on Ubuntu:

sudo apt-get install -qqy postgresql-9.4 postgresql-contrib-9.4 curl python
sudo su postgres -c 'createuser -s <your-local-user>'
export PGUSER=<your-local-user>
export DB=test

git clone https://github.com/fhirbase/fhirbase
cd fhirbase
./runme integrate

Project structure

The code is split into two categories: schema-changing code (migrations in the schema/ directory) and reloadable functions (in src/).

To reduce SQL boilerplate and modularize the application, we use a simple SQL preprocessor written in Python (in the ql/ directory).

runme

All fhirbase tasks can be performed with the runme script, in the form:

env DB=test ./runme <subcommand>

Here is the list of subcommands used in this guide:

- integrate - set up fhirbase in the database from scratch (see installation above)
- load <file> - load a module and its dependencies into the database
- compile <file> - show the preprocessor output for a source file
- migrate - apply all pending migrations

Pre-processor & Dependency management

PostgreSQL has one-level namespaces - schemas. We use schemas to implement code modules, which allows us to create and reload a module as a whole. Every file in src/ is mapped to a schema with the same name as the file. For example, functions defined in src/fhir.sql will live in the fhir schema.
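Concretely, a sketch of what the loader emits for a hypothetical src/myutil.sql (the module name is invented for illustration):

```sql
-- The schema is dropped and recreated, so reloading the file
-- replaces the whole module atomically.
drop schema if exists myutil cascade;
create schema myutil;
-- functions defined in src/myutil.sql then live as myutil.<function_name>
```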

Having modules allows us to introduce dependencies between them. Dependencies are implemented as magic comments in source files and are resolved by the loader (./runme).

For example, suppose you have created a module a.sql with a function util. Then you want to create a module b.sql, which depends on module a:

-- file src/b.sql
-- #import ./a.sql

...
  a.util()
...

Then you can load module b into the database:

./runme load src/b.sql

The loader reads #import instructions and resolves dependencies recursively, then loads all modules in the right order and writes each module's status into a special table, modules. The loader also calculates the sha1 hash of each module file's content and reloads a module only if the hash has changed (caching). If you run the load command again without changing any files, it will do nothing.

To reduce SQL boilerplate, you can use some macro expansions in source files.

func and func!

You can write SQL functions in Python style, i.e. the body of the function should be indented:

-- FILE: src/coll.sql
func _butlast(_ar_ anyarray) returns anyarray
  SELECT _ar_[array_lower(_ar_,1) : array_upper(_ar_,1) - 1]

The preprocessor will produce:

-- FILE: src/coll.sql
drop schema if exists coll cascade;
create schema coll;
CREATE OR REPLACE
function coll._butlast(_ar_ anyarray) RETURNS anyarray
LANGUAGE sql AS $$
  SELECT _ar_[array_lower(_ar_,1) : array_upper(_ar_,1) - 1] -- coll:3
$$ IMMUTABLE;

You can always inspect the result of the preprocessor by running ./runme compile src/file.sql.

Here is the list of macro expansions:

Migrations

To allow updating an existing fhirbase installation to the next version, we track schema changes using a migrations approach: all schema changes live in the schema/ directory and are enumerated using timestamps. When you run ./runme migrate, all pending migrations will be applied.

Tests

There is some preprocessor sugar for tests:

-- one-line test
select 1 => 1

---will be compiled into:

SELECT tests.expect('', 'test/<file>.sql:<line-number>',(select 1),(1));

There is also a multiline form (the body should be indented):

expect "Comment"
  select 1
=> 1

-- compiled

SELECT tests.expect('Comment', 'test/<file>.sql:<line-number>',(select 1),(1));

To test for exceptions, there is a special syntax:

expect_raise 'does not exist'
  SELECT crud.delete('{}'::jsonb, 'Patient', 'nonexisting')

--- compiled

SELECT tests.expect_raise('does not exist', 'test/fhir_crud_spec.sql:151',
($SQL$
  SELECT crud.delete('{}'::jsonb, 'Patient', 'nonexisting') -- fhir_crud_spec:151
$SQL$));

The source code of the expect functions is in src/tests.sql.