Tuesday, January 23, 2018

Having fun with Snomed expression constraints (and learning something in the meantime)

This article is meant as a fun introduction to the Snomed expression constraint language and a demonstration of its capabilities. It assumes no prior knowledge of the language, so it starts with a short introduction (if you already know the Snomed expression constraint language, it should be safe to jump ahead to the 'Having Fun' section). Both the IHTSDO Snomed browser and VeraTech SNQuery are used in this post.

Snomed Expression Constraint Language basics

In this section a few operators from the Snomed Expression Constraint Language will be explained. For a complete explanation of the Snomed Expression Constraint Language visit the official documentation.

Simple expression constraints

The following simple operators already provide great functionality for querying the Snomed hierarchy:

Descendant of: The constraint is satisfied by all the transitive descendants of a given Snomed concept. This is denoted by the 'less than' operator (<). For example, the expression < 64572001 | Disease (disorder) | returns about 74k concepts, including Anemia, Hematoma, or Inflammatory fibroid polyps of stomach.

Descendant or self: Similar to the 'descendant of' operator, the 'descendant or self' operator (denoted by two 'less than' symbols, <<) is satisfied by all the transitive descendants of a given Snomed concept plus the concept itself. For example, << 11466000 |Cesarean section (procedure)| includes both the descendants of cesarean section and the cesarean section term itself.

There are more simple operators, but probably these two are the most used by far.
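To make the difference between the two operators concrete, here is a minimal Python sketch over a toy is-a hierarchy. The hierarchy, the concept names, and the function names are all made up for illustration; this is not real Snomed content and not how any particular terminology server implements the operators.

```python
# Toy is-a hierarchy (child -> list of parents).
# Illustrative names only, NOT an extract of real Snomed content.
PARENTS = {
    "Anemia": ["Disease"],
    "Hematoma": ["Disease"],
    "Iron deficiency anemia": ["Anemia"],
    "Disease": ["Clinical finding"],
    "Clinical finding": [],
}


def descendants_of(concept, parents=PARENTS):
    """Concepts matched by '< concept': all transitive descendants."""
    # Invert child -> parents into parent -> children.
    children = {}
    for child, child_parents in parents.items():
        for parent in child_parents:
            children.setdefault(parent, set()).add(child)

    # Walk down the hierarchy, collecting every concept reached.
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def descendants_or_self(concept, parents=PARENTS):
    """Concepts matched by '<< concept': descendants plus the concept itself."""
    return descendants_of(concept, parents) | {concept}


print(sorted(descendants_of("Disease")))
print(sorted(descendants_or_self("Disease")))
```

The only difference between the two results is the focus concept itself: it is missing from the first set and present in the second.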

Refinements

A refinement in a Snomed expression filters the resulting set using one or more attribute constraints.

One of the great things about Snomed is that terms themselves can be defined by refining existing terms (see Snomed compositional grammar). For example, Hepatitis A (40468003) is a disease (64572001) found (363698007) at the liver structure (10200004) with an inflammation (23583003) morphology (116676008) caused by (246075003) Hepatitis A virus (32452004). Note that these expressions contain not only clinical terms such as hepatitis A or disease, but also attributes such as associated morphology and finding site, which have their own Snomed codes. These attributes can be used to refine defined sets.

Attribute refinement restricts the set of clinical meanings to those satisfying the refinement condition. As in the Snomed compositional grammar, a colon (':') is used in the expression.

As an example, pulmonary diseases could be defined as <64572001 |Disease (disorder)| : 363698007 |Finding site| = 39607008 |Lung structure (body structure)|

Note that these attributes have a "direction": the above expression returns all the diseases whose finding site is the lung structure. There will be times when we want to select the target term of a relationship and constrain the source. We can achieve this by using the 'reverse' operator ('R'). For example, with <64572001 |Disease (disorder)| : 246075003 |Causative agent| = 49872002 |Virus (organism)| we can obtain all the diseases caused by viruses, and by reversing the attribute with an expression such as <49872002 |Virus (organism)| : R 246075003 |Causative agent| = <64572001 |Disease (disorder)| we can obtain the subset of viruses that cause diseases.
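The direction of an attribute can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the relationships are toy triples with made-up names (not real Snomed content), the function names are hypothetical, and the value set is taken to be the descendants of 'Virus' rather than a single concept.

```python
# Toy data, NOT real Snomed content: a single-parent is-a map
# and (source, attribute, target) relationship triples.
IS_A = {
    "Hepatitis A": "Disease",
    "Influenza": "Disease",
    "Pneumonia": "Disease",
    "Hepatitis A virus": "Virus",
    "Influenza virus": "Virus",
    "Rhinovirus": "Virus",
}
RELATIONSHIPS = [
    ("Hepatitis A", "Causative agent", "Hepatitis A virus"),
    ("Hepatitis A", "Finding site", "Liver structure"),
    ("Influenza", "Causative agent", "Influenza virus"),
    ("Pneumonia", "Finding site", "Lung structure"),
]


def descendants_of(concept):
    """'< concept': every concept whose ancestor chain reaches `concept`."""
    result = set()
    for child in IS_A:
        ancestor = IS_A.get(child)
        while ancestor is not None:
            if ancestor == concept:
                result.add(child)
                break
            ancestor = IS_A.get(ancestor)
    return result


def refine(focus, attribute, values):
    """'focus : attribute = values': keep the SOURCES of matching relationships."""
    return {s for s, a, t in RELATIONSHIPS
            if s in focus and a == attribute and t in values}


def refine_reversed(focus, attribute, sources):
    """'focus : R attribute = sources': keep the TARGETS of matching relationships."""
    return {t for s, a, t in RELATIONSHIPS
            if t in focus and a == attribute and s in sources}


diseases = descendants_of("Disease")
viruses = descendants_of("Virus")

# Diseases caused by some virus.
print(sorted(refine(diseases, "Causative agent", viruses)))
# Viruses that cause some disease (the toy 'Rhinovirus' has no relationship, so it is left out).
print(sorted(refine_reversed(viruses, "Causative agent", diseases)))
```

Both functions scan the same triples; the reversed version simply returns the other end of each matching relationship.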

There are more ways to refine a Snomed expression, but with these basic ones we can start 'playing' with Snomed.

Having Fun

With these operators and refinements in mind, we can start navigating Snomed without a deep knowledge of the underlying Snomed conceptual model, i.e. what attributes are valid in each hierarchy.

Finding what to look for

We defined an application that needed a list of cancer diagnoses (coded in ICD-10), but also the location where each cancer was found. Could we use Snomed to provide us with a (tentative) set of terms to fill this field?

Even if you don't know the Snomed conceptual model, you probably know examples of what you are looking for. I will use 'lung cancer' as an example.

Navigating Snomed

Searching for the term in the Snomed browser allows us to dig into the different terms that make up its meaning. 'Lung cancer' is a synonym of 'Primary malignant neoplasm of lung (disorder)', with concept ID 93880001.

In the concept details we can open the 'Expression' tab and then look at 'Expression from Stated Concept Definition'. That expression precisely defines the Snomed term from other existing Snomed terms. What we want to learn from it is which attributes are valid for this kind of concept, in our case disorders. By that definition, lung cancer (93880001) is a disease (64572001) found (363698007) at the lung structure (39607008) with a malignant neoplasm (86049000) morphology (116676008). We can generalize that expression by navigating the Snomed hierarchy. For example, we could ask for all the primary cancers that have a finding site in any body structure: <64572001 |Disease (disorder)| :{363698007 |Finding site| = <123037004 | Body structure (body structure) |,  116676008 |Associated morphology| = 86049000 |Malignant neoplasm, primary (morphologic abnormality)|} which results in a subset of ~600 terms.
An alternative is to look for all primary, secondary, or other cancers with a finding site in any body structure: <363346000 |Malignant neoplastic disease (disorder)| :363698007 |Finding site| = <123037004 | Body structure (body structure) | which contains ~3700 terms.

In addition to giving us this subset list, SNQuery allows us to simplify expressions in order to reduce their processing time. A simplified query returns the same terms (the same subset) but uses more specific codes, which makes the expression simpler and clearer. As an example, the expression <64572001 |Disease (disorder)| :{363698007 |Finding site| = <123037004 | Body structure (body structure) |,  116676008 |Associated morphology| = 86049000 |Malignant neoplasm, primary (morphologic abnormality)|} can be simplified as <372087000 |Primary malignant neoplasm (disorder)|:{363698007 |Finding site| = <123037004 |Body structure (body structure)|, 116676008 |Associated morphology| = 86049000 |Malignant neoplasm, primary (morphologic abnormality)|}. The first expression takes more than 5 seconds to process, while the second takes about 150 milliseconds.

Going in reverse

Once we have found a suitable expression we can reverse it to get the desired results. In our case, instead of looking for 'all primary, secondary, or other cancers with a finding site in any body structure' we will reverse it to express 'all body structures that are finding sites in primary, secondary, or other cancers'. This subset can be expressed as <91723000 |Anatomical structure (body structure)|: R 363698007 |Finding site (attribute)| =<363346000 |Malignant neoplastic disease (disorder)| and contains ~880 terms.
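If expressions like this one are assembled from application code rather than typed by hand, a few small string helpers keep them readable. The following Python sketch is only an illustration: the helper names are hypothetical, and the 'ecl' query parameter is an assumption about how a terminology server might accept the expression, not the documented interface of SNQuery or the Snomed browser.

```python
from urllib.parse import urlencode


def concept(sctid, term):
    """Render a concept reference as 'id |term|'."""
    return f"{sctid} |{term}|"


def descendant_of(sctid, term):
    """Render the 'descendant of' operator applied to a concept."""
    return f"< {concept(sctid, term)}"


def refine(focus, attribute, value, reverse=False):
    """Render 'focus : [R] attribute = value'."""
    prefix = "R " if reverse else ""
    return f"{focus} : {prefix}{attribute} = {value}"


# All body structures that are finding sites of malignant neoplastic diseases.
body_sites = refine(
    descendant_of(91723000, "Anatomical structure (body structure)"),
    concept(363698007, "Finding site (attribute)"),
    descendant_of(363346000, "Malignant neoplastic disease (disorder)"),
    reverse=True,
)
print(body_sites)

# URL-encode the expression. The parameter name 'ecl' is an assumption;
# check the documentation of the server you actually use.
query_string = urlencode({"ecl": body_sites})
print(query_string)
```

Dropping reverse=True and swapping the focus and value arguments gives back the forward query for cancers with a finding site in a body structure.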

We could use this subset list as a first approach to populate our user interface and add Snomed codes into the mix. If we have an ICD code we could potentially use the official Snomed mapping to validate the other fields (in this case, diagnoses with their locations).