Modularization
Refactor parts of ID3C which are generic from parts which are specific to the Seattle Flu Study, making a core component which is extended by customizations and plugins. This is a standard design approach and will probably be easier to do sooner than later.
The common core (epiphyla/id3c?) would contain:
- base CLI
- base web API
- ETL framework
- sqitch project with warehouse schema and some of receiving (FHIR, RDML, sequence read sets, genomes, etc)
Extension packages (seattleflu/id3c?) would contain:
-
custom ETL routines, loaded as command plugins (via setuptools entrypoints) and using id3c Python libraries
-
custom database schema (receiving, shipping) as a sqitch project with cross-project dependencies (these are supported!)
-
code to handle the “edges”, the places where the core meets the outside world
Motivation
-
Replace fauna and power the next generation of Nextstrain work.
-
Be a re-usable data system like Augur is a re-usable bioinformatics toolkit and Auspice is a re-usable visualization tool.
Challenges
-
Adds complexity during development with an additional repo to touch.
-
After doing the work, there will be a one-time cutover/flag day where we go through the incantations to make it so in production. This will involve things like making use of sqitch’s log-only deploys to pick up new names for already-deployed changes.