Informatics Matters Overview

Workflows infrastructure

Fragalysis stack to kubernetes

Fragalysis stack CI/CD

Fragment network to kubernetes

Fragment network CI/CD

Squonk to galaxy link

Squonk to fragalysis link

Scoring containerised

Finalise docking workflow

Scoring tool in galaxy

Workflow in galaxy


Share time spreadsheets (TD)
Create github repo for fragalysis documentation (RS)
Send Rachael a list of all github repos (& context) (TD)

Major tasks:

  • Fragalysis stack → kubernetes prototype

    • Backend neo4j

    • Deploy JupyterHub (ORN version)

    • Switch to postgres

    • Fragalysis backend to Fragment Network by API

  • Fragment network → kubernetes DONE

  • Squonk → kubernetes DONE

  • Galaxy → kubernetes PLANNING

  • Migrating Verne to K8s PLANNING

  • Fragalysis stack CI/CD PENDING

  • Fragment network CI/CD PENDING

  • Openshift K8s HOLD


  • Early March for K8S

  • Don’t need CI/CD for the demo on kubernetes – should we hold off for 2–3 months? (Travis vs. Jenkins)

  • Testing of documentation (Janssen)

  • A deployment process exists for AWS but not yet for OpenStack (for DLS) – 1 week after start (as long as networking, accessibility etc. work)

  • Need to discuss Galaxy:

    • use helm charts instead of playbooks?

  • Fragalysis questions: Mechanisms for running the data loaders?

  • Janssen: Scott will have a better idea of whether it will be applicable outside of the Fragalysis scope/use (still investigate)

  • Pin down date for starting deployment

  • Long TC for deployment on AWS

  • M2M/Janssen can deploy themselves – run the playbook, entering the version of the stack to deploy to the cluster



Follow-up on technology stack questions with Janssen (TD)
Second technology stack meeting - arrange (RS) [week after next]
Send containers to Janssen so they can run through tests (TD) - keep sending
Link Janssen up with Björn for Galaxy deployment
Documentation Repo (for deployment) (RS)
Dates for STFC for deployment (openstack) (TD/AC)

Major tasks:

  • Squonk to galaxy link unknown

  • Fragalysis to squonk link unknown


Review timeline/roadmap

Major tasks:

  1. Scoring method containerised DONE

  2. Finalise docking workflow in progress

  3. Insert Bayesian optimisation discovery

  4. Implement scoring into Galaxy tool unknown

  5. Docking+scoring workflow in galaxy unknown

  6. Improve container HOLD


  • Currently putting full workflow together before it goes into squonk/galaxy

  • Orchestration between layers needs thought

  • Options should be configurable (e.g. number of poses to dock)
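A minimal sketch of how the configurability point above might look, assuming the workflow is driven by a Python entry point. The names `DockingOptions`, `--num-poses` and `--no-scoring`, and the default of 10 poses, are hypothetical and not taken from the existing workflow.

```python
import argparse
from dataclasses import dataclass

DEFAULT_NUM_POSES = 10  # assumed default, for illustration only


@dataclass(frozen=True)
class DockingOptions:
    """Hypothetical option set for the docking + scoring workflow."""
    num_poses: int = DEFAULT_NUM_POSES  # number of poses to dock per ligand
    score_poses: bool = True            # run the scoring step after docking


def parse_options(argv=None):
    """Build DockingOptions from command-line style arguments."""
    parser = argparse.ArgumentParser(description="Docking workflow options (sketch)")
    parser.add_argument("--num-poses", type=int, default=DEFAULT_NUM_POSES,
                        help="number of poses to dock per ligand")
    parser.add_argument("--no-scoring", action="store_true",
                        help="skip the scoring step")
    args = parser.parse_args(argv)
    if args.num_poses < 1:
        parser.error("--num-poses must be at least 1")
    return DockingOptions(num_poses=args.num_poses,
                          score_poses=not args.no_scoring)
```

For example, `parse_options(["--num-poses", "25"])` would give an options object with 25 poses and scoring enabled. The same fields could equally be exposed as Galaxy tool parameters or Squonk job options.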


Chat to Galaxy guys to include GPU node and configure to target it (TD)
Check current docking workflow (RS) – arrange f2f @SGC meeting for Friday afternoon (RS)
Scope out enumeration step – should we add as galaxy tool? (TD)
Scores and poses into SDF file as output of scoring step (TD)
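A rough sketch of the last action (scores and poses into an SDF file), assuming each pose is already available as a molblock string. The field names `pose_id` and `score` and both function names are hypothetical; the real scoring step may well use a cheminformatics toolkit instead of writing records by hand.

```python
def sdf_record(molblock, properties):
    """Return one SDF record: the molblock, its data items, then '$$$$'."""
    lines = [molblock.rstrip("\n")]
    for name, value in properties.items():
        lines.append(f">  <{name}>")  # data item header
        lines.append(str(value))      # data item value
        lines.append("")              # blank line terminates each data item
    lines.append("$$$$")              # record delimiter
    return "\n".join(lines) + "\n"


def write_scored_poses(path, poses):
    """Write (molblock, score) pairs to an SDF file, one record per pose."""
    with open(path, "w") as handle:
        for index, (molblock, score) in enumerate(poses, start=1):
            handle.write(sdf_record(molblock, {"pose_id": index, "score": score}))
```

Keeping the score as a data item on each record means downstream steps (e.g. in Galaxy or Squonk) can read poses and scores from a single file.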

Major tasks:

  1. Improve fragmentation algorithm in progress

  2. Define data-building workflow in progress

  3. Implement API HOLD

  4. Add compounds into network via API in progress


  • Relational database – used to index molecules from all datasets that are processed; molecules are stored there before the graph network is created

  • Upper limit (a guess) for adding molecules directly into the network is ~10,000