Module 3

Lecture

Lab

This markdown file will serve as the central guide throughout the tutorial. Participants can follow along as the guide will provide example code, references, example files as well as explanations of each step.

The tutorial is designed to instruct users on how to structure and validate data using JSON Schema. The aim is to highlight the benefits of a machine-readable language for harmonizing data, as well as other data management practices. While the JSON context is secondary, participants will benefit from recognizing and learning to work with JSON, as it has been widely adopted across industries.

The tutorial is designed to work mostly locally on your computer, but several components need an online connection. A few requirements must also be installed by the user to fully benefit from the tutorial. These are highlighted in the requirements section.

We are conscious of the varying knowledge levels of workshop participants throughout. As such, participants are encouraged to move on ahead to different sections or help out others. Please refrain from asking questions regarding upcoming content until appropriate.

All code provided assumes you are working from the directory module3_tutorial.

Due to time constraints not all sections of the tutorial will be showcased. Participants are encouraged to explore and check out the sections on their own time.

Sections covered:

    3. Introducing JSON
    4. Validating Schemas
    • 4.2. Bash
    • 4.3. VScode
    5. JSON Syntax
    7. External Ontology validation
    • 7.1 Exploration
    • 7.2 Incorporating external ontology validation

1. Organization

Content is organized into sections and subsections, each containing the following folders (when applicable):

| Folder | Significance |
| --- | --- |
| work | This directory will serve as the work directory for participants to freely write their code in |
| test | Any prepared files will be shared here |
| happy_path | When relevant, an example of the happy_path result will be provided to demonstrate full results of the lesson |

2. Requirements

The following items are needed to fully benefit from the tutorial.

Required :

    • sudo apt-get install jq
    • pipx install check-jsonschema

Optional :

3. Introducing JSON

Working directory : module3_tutorial/3_json

JSON stands for JavaScript Object Notation. It serves as a format to store and handle data.

Important to note, the way you manage your data model and ETL (Extract, Transform, Load) pipelines will influence the tools you use and whether JSON is relevant.

That being said, we’ll be exploring how to validate JSON through schemas and many concepts will be cross applicable regardless of implementation.

For the purposes of this tutorial we will go over JSON because we can quickly prototype data validation.

A JSON file typically has the suffix .json e.g. example.json

The contents of a JSON file are usually wrapped by curly braces {...}

It can be as simple as:

{
    "key":"value"
}

Or nested such as so:

{
    "ParentA":{
        "propertyA_1":{...}
    },
    "ParentB":[
        {
            "childB_1":{"propertyB_1_1":{...}}
        },
        {
            "childB_2":{"propertyB_1_2":{...}}
        }
    ]
}
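As a quick illustration of how such nesting is read by a program, the sketch below parses a small nested document with Python's standard json module. The key names and values are hypothetical and chosen only to resemble the nested example; objects become dictionaries and arrays become lists.

```python
import json

# A small nested document: one key holds an object, the other an array of objects.
# Key names and values are made up for illustration.
text = """
{
    "ParentA": {"propertyA_1": {"unit": "cm"}},
    "ParentB": [
        {"childB_1": {"propertyB_1_1": 1}},
        {"childB_2": {"propertyB_1_2": 2}}
    ]
}
"""

data = json.loads(text)  # JSON object becomes a dict, JSON array becomes a list

unit = data["ParentA"]["propertyA_1"]["unit"]  # walk nested objects key by key
first_child = data["ParentB"][0]["childB_1"]   # index into the array, then use a key

print(type(data).__name__, type(data["ParentB"]).__name__)
print(unit, first_child)
```

Objects are addressed by key and arrays by position, which is why a key-value pair cannot sit directly inside square brackets.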

We’ll quickly investigate two uses of JSON: 1) as a schema and 2) as a record/instance, to see how similar yet different they are.

3.1. Basic Schema

Working directory : module3_tutorial/3_json/work

Let’s investigate what a basic schema looks like. First we open the file 3_json/work/schema.json

We add the contents as follows:

{
    "title": "Person",
    "type": "object",
    "properties": {
      "firstName": {
        "type": "string",
        "description": "The person's first name."
      },
      "lastName": {
        "type": "string",
        "description": "The person's last name."
      },
      "age": {
        "description": "Age in years which must be equal to or greater than zero.",
        "type": "integer",
        "minimum": 0
      }
    }
}
3.1.1. Code Breakdown
"title": "Person"

The title field declares the name of our schema. For those familiar with entity relationships, Person will serve as our central node, from which all other nodes branch out.

As an example comparison to dataframes, the title would be the name of the dataframe.

"type":"object"

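This piece of code is declared multiple times throughout the schema. JSON itself defines the data types listed at https://www.w3schools.com/js/js_json_datatypes.asp, and JSON Schema additionally accepts integer as a type (used for age in our schema). The value of type can therefore be one of the following:

string
number
integer
object
array
boolean
null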

We declare our Person schema as an object because that enables us to add other fields and properties to our schema. Said properties cannot be represented by other data types.

Additionally, because an object follows the notation {"key":"value"}, it allows us to add properties and give the fields and properties informative names.

"properties": {
    "firstName": {...},
    "lastName": {...},
    "age": {...}
    }

If title was the name of the dataframe, properties would be the columns of the dataframe. Within properties we can assign key-value pairs corresponding to the data we would like to collect/validate.

"firstName": {
        "type": "string",
        "description": "The person's first name."
      },

Our first example is firstName. We declare that the property firstName of a person must be a string. We also provide an optional, brief description of the field.

      "age": {
        "description": "Age in years which must be equal to or greater than zero.",
        "type": "integer",
        "minimum": 0
      }

In contrast, for the property age we declare the type as integer with a minimum value of 0.

3.2. Basic record/instance

Let’s construct a record to demonstrate the schema

Starting off with curly braces

{
}

Based on the schema we add the first property by declaring it firstName and giving it a corresponding value:

{
    "firstName" : "JoeExample"
}

Followed by our second property age

{
    "firstName" : "JoeExample",
    "age": 1
}

Let’s demonstrate how the schema works by flipping the values:

{
    "firstName" : 1,
    "age":"JoeExample"
}

Having the wrong values in our instance can be problematic.
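To make the idea concrete before turning to real validators, the following is a minimal hand-written sketch in Python of the two rules our schema places on firstName and age. The function name check_person and the message wording are hypothetical; this is not a JSON Schema implementation and only illustrates the kind of check a validator performs for us.

```python
# Hand-written checks for illustration only; this is NOT a JSON Schema validator.
def check_person(record):
    """Return a list of problems found in a Person-like record."""
    problems = []

    # "type": "string" for firstName (only checked when the key is present)
    if "firstName" in record and not isinstance(record["firstName"], str):
        problems.append("firstName: expected string")

    # "type": "integer" and "minimum": 0 for age
    if "age" in record:
        age = record["age"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(age, bool) or not isinstance(age, int):
            problems.append("age: expected integer")
        elif age < 0:
            problems.append("age: must be >= 0")

    return problems


good = {"firstName": "JoeExample", "age": 1}
flipped = {"firstName": 1, "age": "JoeExample"}

print(check_person(good))
print(check_person(flipped))
```

Writing such checks by hand quickly becomes tedious as the number of fields grows, which is the motivation for declaring the rules once in a schema and letting a validator apply them.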

Let’s step back and see how we can use our newly generated schema to validate our instance.

4. Validating Schemas

A core function and purpose of utilizing JSON format and JSON schema is data validation.
The JSON format provides a structured method of wrangling the data into a machine readable format.
The JSON Schema validates said data.
Both are customizable and can be authored to suit specific needs.

This function serves to aid data harmonization and ensures it meets the requirements structurally and semantically.

Three methods will be highlighted to demonstrate how to perform validation.

4.1. Python

For time purposes this section will not be covered live and will be optional

Working directory : module3_tutorial/4_validating_schemas/4_1_python/work

As a higher level programming language, Python is quick to mock up and useful for a variety of data parsing and data wrangling needs.

One can easily validate an instance against a schema with a few lines of code, as follows:

cd 4_validating_schemas/4_1_python/
python test/validate.py

or paste the following code into a Python session:

import json
from jsonschema import validate

with open('test/schema.json', 'r') as f:
    schema = json.load(f)

with open('test/bad_example.json', 'r') as f:
    instance = json.load(f)

validate(instance=instance, schema=schema)
4.1.1. Breaking down the code
import json
from jsonschema import validate

This imports the libraries Python needs to work with JSON data. The json module reads our files and converts their contents into dictionaries, and validate from the third-party jsonschema package performs the validation.

with open('test/schema.json', 'r') as f:
    schema = json.load(f)

with open('test/bad_example.json', 'r') as f:
    instance = json.load(f)

Next, the instance and schema are ingested by opening the file and parsing them as JSON. This is done per schema and instance.

validate(instance=instance, schema=schema)

This performs the validation. If the instance is properly structured according to the schema, it will silently pass.

4.1.2. Understanding the error

Otherwise an error is reported. Validating the bad example produces the following result:

Traceback (most recent call last):
  File "RDM_workshop/validate.py", line 11, in <module>
    validate(instance=instance, schema=schema)
  File "/opt/miniconda3/lib/python3.12/site-packages/jsonschema/validators.py", line 1332, in validate
    raise error
jsonschema.exceptions.ValidationError: 1 is not of type 'string'

Failed validating 'type' in schema['properties']['lastName']:
    {'type': 'string', 'description': "The person's last name."}

On instance['lastName']:
    1

Note the error produced, jsonschema.exceptions.ValidationError: 1 is not of type 'string', indicating that one of our fields does not conform to the schema. The field in question is lastName, as stated in the line Failed validating 'type' in schema['properties']['lastName'].

4.2. Bash

Working directory : module3_tutorial/4_validating_schemas/4_2_bash

Similar to Python, we can perform validation in the terminal through bash.

check-jsonschema is a powerful tool that validates several types of files including JSON. To validate, see the code example below:

cd module3_tutorial/4_validating_schemas/4_2_bash
check-jsonschema --schemafile test/schema.json test/good_example.json

4.2.2. Code breakdown

check-jsonschema invokes the tool. The --schemafile option specifies the schema to validate against, and the remaining argument is the file to validate.

The following is the happy path result :

ok -- validation done

4.2.3. Understanding the error

If we validated the bad example:

check-jsonschema --schemafile test/schema.json test/bad_example.json

The following is the bad path result :

Schema validation errors were encountered.
  test/bad_example.json::$.lastName: 1 is not of type 'string'

As noted in the error message, the lastName field value is 1, which does not match the type string.

4.2.4. Validating multiple files

We can also utilize the command to validate multiple files:

check-jsonschema --schemafile test/schema.json test/good_example.json test/bad_example.json

and the result will indicate which of the JSON files contain errors:

test/bad_example.json::$.lastName: 1 is not of type 'string'

This is especially useful to parse multiple files

4.3. VScode

Working directory : module3_tutorial/4_validating_schemas/4_3_vscode/work

With the VScode plug-in, when we open 4_validating_schemas/4_3_vscode/test/bad_example.json, it dynamically applies the schema for us and highlights errors

{
  "$schema" : "./schema.json",
  "firstName": "John",
  "lastName": 1, Incorrect type. Expected "String".
  "age": 21
}

Removing line 2 aka the schema reference removes the flagged error:

{
  "firstName": "John",
  "lastName": 1,
  "age": 21
}

The error disappears because the editor no longer has the context that lastName must be a string.

Alternatively, we fix the error:

{
  "$schema" : "./schema.json",
  "firstName": "John",
  "lastName": "Smith",
  "age": 21
}

This method is useful for dynamic coding such as crafting example datasets to sanity check your schema.

5. JSON Syntax

Now that we know how to validate our JSON objects, let’s explore more syntax to assist in structuring our data. The examples we explored so far simply constrain data types, whereas the following allow for conditional rules and more advanced structure.

5.1. Controlled List and arrays

Working directory : module3_tutorial/5_json_syntax/5_1_controlled_lists_and_arrays/work

Starting off with our previous schema:

{
    "$id": "https://example.com/person.schema.json",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Person",
    "type": "object",
    "properties": {
      "firstName": {
        "type": "string",
        "description": "The person's first name."
      },
      "lastName": {
        "type": "string",
        "description": "The person's last name."
      },
      "age": {
        "description": "Age in years which must be equal to or greater than zero.",
        "type": "integer",
        "minimum": 0
      }
    }
  }

Given our example:

{
    "$schema":"./schema.json",
    "firstName" : 1,
    "age": "JoeExample",
    "honorific":"Mr"
}

The honorific property is not flagged, even though the schema does not define it.

We can add the property :

"additionalProperties": false

to our schema such that it becomes:

{
    "$id": "https://example.com/person.schema.json",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Person",
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "firstName": {
        "type": "string",
        "description": "The person's first name."
      },
      "lastName": {
        "type": "string",
        "description": "The person's last name."
      },
      "age": {
        "description": "Age in years which must be equal to or greater than zero.",
        "type": "integer",
        "minimum": 0
      }
    }
  }

This restricts data to only have properties defined by the schema.

{
    "$schema":"./schema.json",
    "firstName" : 1, Incorrect type. Expected "string"
    "age": "JoeExample", Incorrect type. Expected "integer"
    "honorific":"Mr" Property honorific not allowed
}

But we want to add a schema property for honorific.

{
    "$id": "https://example.com/person.schema.json",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Person",
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "firstName": {
        "type": "string",
        "description": "The person's first name."
      },
      "lastName": {
        "type": "string",
        "description": "The person's last name."
      },
      "age": {
        "description": "Age in years which must be equal to or greater than zero.",
        "type": "integer",
        "minimum": 0
      },
      "honorific": {
        "type": "string",
        "description": " A title that conveys esteem, courtesy, or respect for position or rank when used in addressing or referring to a person"
      }
    }
  }

Let’s validate our example:

{
    "$schema":"./schema.json",
    "firstName" : 1, Incorrect type. Expected "string"
    "age":"JoeExample", Incorrect type. Expected "integer"
    "honorific":"Lord Paramount"
}

While correct, Lord Paramount is an uncommon title. Let’s add a controlled list to restrict the types of honorifics.

      "honorific": {
        "enum": ["Mr","Miss","Mrs","Mx"]
      }

We utilize the enum property and fill the array with the values we want to accept. This is useful in limiting the values we receive. It should be noted that the contents are not limited to the string data type and could include any data type, e.g. a mix of strings and numbers.

The result on our validated example:

    "honorific":"Lord Paramount" Value is not accepted. Valid values : "Mr","Miss","Mrs","Mx"

Validating our previous example shows Lord Paramount is no longer valid. We correct the honorific to an accepted value:

{
    "$schema":"./schema.json",
    "firstName" : 1, Incorrect type. Expected "string"
    "age": "JoeExample", Incorrect type. Expected "integer"
    "honorific":"Mx"
}

What if we want to provide multiple values for a field? Typically that’s done through an array. Let’s add a new field called diagnosis to our schema.

      "diagnosis": {
        "type": "array",
        "description": "Identification of a disease, condition, or injury based on signs, symptoms, medical history, and physical exams",
        "items" : {"type" : "string"}
      }
  • "type": "array", like the other data types, states that the record must provide the field as an array/list.
  • "items" : {"type" : "string"} states that every item within the array must match specific criteria, in this case each must be a string

Let’s test on an example:

{
    "$schema":"./schema.json",
    "firstName" : 1, Incorrect type. Expected "string"
    "age": "JoeExample", Incorrect type. Expected "integer"
    "honorific":"Lord Paramount", Value is not accepted. Valid values : "Mr","Miss","Mrs","Mx"
    "diagnosis" : []
}

Our validation works but that’s not informative if no diagnosis is given. Let’s add a requirement that at least one diagnosis be provided.

    "diagnosis": {
      "type": "array",
      "description": "Identification of a disease, condition, or injury based on signs, symptoms, medical history, and physical exams",
      "items" : {"type" : "string"},
      "minItems": 1
    }

Validating our example again now shows the following error:

    "diagnosis" : [] Array has too few items. Expected 1 or more

A fully valid example should look like:

{
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}
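The three keywords introduced in this section (enum, items, and minItems) can be summarized in a short Python sketch. The helper names check_honorific and check_diagnosis are hypothetical, and the code is a hand-written approximation of what a validator does, not part of any JSON Schema library.

```python
ALLOWED_HONORIFICS = ["Mr", "Miss", "Mrs", "Mx"]  # mirrors the enum in the schema


def check_honorific(value):
    """enum: the value must equal one of the listed entries."""
    return value in ALLOWED_HONORIFICS


def check_diagnosis(value, min_items=1):
    """type array + items of type string + minItems."""
    if not isinstance(value, list):       # "type": "array"
        return False
    if len(value) < min_items:            # "minItems": 1
        return False
    return all(isinstance(item, str) for item in value)  # "items": {"type": "string"}


print(check_honorific("Mx"), check_honorific("Lord Paramount"))
print(check_diagnosis(["cold"]), check_diagnosis([]), check_diagnosis(["cold", 3]))
```

Each rule is independent, so a record can fail one check while passing the others, just as the editor flags each property separately.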

5.2. Values by Pattern/Regex

Working Directory : module3_tutorial/5_json_syntax/5_2_patterns_regex/work

Another useful feature is applying a regex or pattern check on values. This is particularly useful to ensure strings follow a particular pattern. For example, currently the field lastName in the schema has the contents:

    "lastName": {
      "type": "string",
      "description": "The person's last name."
    }

This means even an example with a string using numbers works:

{
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "lastName": "12",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}

To restrict lastName to letters, hyphens, and spaces, we can amend the field:

    "lastName": {
      "type": "string",
      "description": "The person's last name.",
      "pattern": "[A-Za-z- ]+"
    }

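We add the field pattern to enforce a pattern check. The pattern field uses a regular expression, or regex (for more info see https://regex101.com/).

We use the expression [A-Za-z- ]+. Breaking down this regex:
- [...]+ matches one or more of the characters listed within the brackets
- A-Z allows for capital letters
- a-z allows for lower case letters
- - allows for hyphens
- the space character allows for spaces

Note that JSON Schema does not implicitly anchor patterns: a value passes if any part of it matches. As written, a value such as Smith12 is still accepted. To require that the entire value consist of these characters, anchor the expression as ^[A-Za-z- ]+$.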

If we recheck our last example against the schema:

{
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "lastName": "12", String does not match pattern of "[A-Za-z- ]+"
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}

A happy path example:

{
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "lastName": "Foster-Smith",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}
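The behaviour of the pattern can be explored outside the editor with Python's re module. The sketch below assumes that re.search, which looks for a match anywhere in the string, is a reasonable stand-in for how the pattern keyword is applied; the helper name matches and the anchored variant of the expression are illustrative additions rather than part of the tutorial files.

```python
import re

UNANCHORED = r"[A-Za-z- ]+"   # the expression as written in the schema
ANCHORED = r"^[A-Za-z- ]+$"   # same character class, but the whole value must match


def matches(pattern, value):
    # re.search succeeds if the pattern matches anywhere in the value
    return re.search(pattern, value) is not None


for value in ["12", "Foster-Smith", "Smith12"]:
    print(value, matches(UNANCHORED, value), matches(ANCHORED, value))
```

The value 12 is rejected by both forms because it contains no permitted character, whereas Smith12 is rejected only once the expression is anchored with ^ and $.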

5.3. Required vs optional values

Working Directory module3_tutorial/5_json_syntax/5_3_required_optional_values/work

For the current schema, we can provide records with missing fields. This is not ideal as we curate/build datasets that require mandatory fields. Our current schema view is :

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "additionalProperties": false,
  "properties": {...} // Collapsed/Abbreviated Code
}

To demonstrate this flaw:

{
    "$schema":"./schema.json",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}

To ensure firstName and lastName are provided with all our persons, we add the field required

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "required": ["firstName","lastName"],
  "additionalProperties": false,
  "properties": {...} // Collapsed/Abbreviated Code
}

Within required we list, as an array, the fields that must be present.

Let’s confirm with an example:

{
    Missing property "lastName"
    "$schema":"./schema.json",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}

If we add a firstName, lastName is still reported as missing:

{
    Missing property "lastName"
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"]
}

5.4. Conditional rules

Working Directory : module3_tutorial/5_json_syntax/5_4_conditional_rules/work

What if we want one field to influence another? We can introduce conditional rules.

Let’s add two new fields into our schema

    "country": {
      "enum": ["USA","Canada"]
    },
    "province/territory/state": {
      "enum": ["New York","British Columbia","Washington","Ontario"]
    }

We set country to be an enum list of USA and Canada. Depending on country, we want province/territory/state to be a specific enum list.

If we simply define country and province/territory/state with enum lists, a mismatch can occur such as:

{
    Missing property "lastName"
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"],
    "country" : "USA",
    "province/territory/state" : "Ontario"
}

More logic is needed to ensure the country and province/territory/state match. We can add a layer of control via if, then, and else as properties within the schema:

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "additionalProperties": false,
  "required": ["firstName","lastName"],
  "properties": {...}, // Collapsed/Abbreviated Code
  "if":{...},
  "then":{...},
  "else":{...}
}

Filling out the details of the if, then, and else:

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "additionalProperties": false,
  "required": ["firstName","lastName"],
  "properties": {...}, // Collapsed/Abbreviated Code
  "if":{
    "properties":{
        "country":{...}
    }
  },
  "then":{...},
  "else":{...}
}

Let’s add

  "if": {
    "properties": {
        "country": { 
          "enum": ["USA"] 
        }
    }
  }

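In our if we specify which properties the conditional logic should check, in our example country. The condition on country is itself a schema, so it could be an enum, a pattern, or a numeric constraint. In our example, we want the logic to apply if country is USA. Note that properties only constrains keys that are present, so a record with no country at all also satisfies this if; adding "required": ["country"] inside the if avoids that.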

If country is USA the then code portion occurs

  "then": {
    "required": ["province/territory/state"],
    "properties": {
      "province/territory/state": {
        "enum": ["Wisconsin", "Virginia", "Dakota"]
      }
    }
  },

As with if, in then we specify which properties are affected when the condition is met. In our example it would be province/territory/state.

We specify that when USA is provided, province/territory/state is mandatory and its value must come from the enum list. These constraints apply in addition to, not instead of, the definition of province/territory/state under the top-level properties, so that definition must not carry a conflicting enum of its own.

Let’s use a bad example to see if validation works.

{
    Missing property "province/territory/state"
    "$schema":"./schema.json",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"],
    "country" : "USA"

}

The conditional required field works. What about the enum value?

{
    "$schema":"./schema.json",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"],
    "country" : "USA",
    "province/territory/state" : "Quebec" Value is not accepted. Valid values : "Wisconsin", "Virginia", "Dakota"
}

What if we switch country from USA to Canada, and Quebec to Dakota? What would be the result?

{
    "$schema":"./schema.json",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"],
    "country" : "Canada",
    "province/territory/state" : "Dakota"
}

Clearly this is the wrong pairing, but the schema does not flag it, so let’s fix that.

Our conditional logic so far covers the if and then. Now we need to add an else.

As with then, we can format else similarly but swap out the values:

  "if": {
    "properties": {
        "country": { 
          "enum": ["USA"] 
        }
    }
  },
  "then": {
    "required": ["province/territory/state"],
    "properties": {
      "province/territory/state": {
        "enum": ["Wisconsin", "Virginia", "Dakota"]
      }
    }
  },
  "else": {
    "required": ["province/territory/state"],
    "properties": {
      "province/territory/state": {
        "enum": ["Quebec","Ontario","British Columbia"]
      }
    }
  }

Let’s revalidate our previous example:

{
    "$schema":"./schema.json",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"],
    "country" : "Canada",
    "province/territory/state" : "Dakota" Value is not accepted. Valid values : "Quebec","Ontario","British Columbia"
}
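The decision flow of if, then, and else can be traced with a small Python sketch. The function name check_region and the message wording are hypothetical, and the code is a hand-written approximation for this one rule rather than a general validator. It deliberately reproduces one subtle point: a record without a country key satisfies the if, so the then branch applies.

```python
US_REGIONS = ["Wisconsin", "Virginia", "Dakota"]           # enum from "then"
OTHER_REGIONS = ["Quebec", "Ontario", "British Columbia"]  # enum from "else"
REGION_KEY = "province/territory/state"


def check_region(record):
    """Return a list of problems for the country/region pairing."""
    # "if": properties only constrains keys that are present, so a record
    # without "country" also satisfies the condition.
    condition_holds = "country" not in record or record["country"] == "USA"
    allowed = US_REGIONS if condition_holds else OTHER_REGIONS

    # Both branches declare the region as required.
    if REGION_KEY not in record:
        return [f"missing property {REGION_KEY!r}"]
    if record[REGION_KEY] not in allowed:
        return [f"{record[REGION_KEY]!r} is not one of {allowed}"]
    return []


print(check_region({"country": "USA"}))
print(check_region({"country": "Canada", REGION_KEY: "Dakota"}))
print(check_region({"country": "Canada", REGION_KEY: "Quebec"}))
```

Because else covers every value other than USA, this two-branch form only works while country is limited to two values; more countries would need additional conditions.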

6. Mermaid

For time purposes this section will not be covered live and will be optional

Data models can become complex and when written in code, they become hard to navigate. One way around that is via diagrams that visualize the relationships of our data model.

There are a variety of ways to do so, the following is one suggestion.

Given our JSON

{
    "$schema":"./schema.json",
    "firstName" : "Joe",
    "lastName": "Foster-Smith",
    "age": 18,
    "honorific":"Mx",
    "diagnosis" : ["cold"],
    "country" : "Canada",
    "province/territory/state" : "Quebec"
}

We can pass that into the following online converter. As mentioned, there are multiple ways of doing the conversion; this is a quick one.

We utilize Mermaid, a JavaScript-based language for constructing charts and diagrams.

We navigate to the site: https://toolshref.com/json-to-mermaid-generator/

Paste our JSON, set to ER diagram and receive the following output.

classDiagram
class C0 {
  +_schema : ./schema.j
  +firstName : Joe
  +lastName : Foster-Smi
  +age : 18
  +honorific : Mx
  +diagnosis[] Array
  +country : Canada
  +province_territory_state : Quebec
}

We can see our instance has the following attributes.

What if we tried a more complicated example?:

{
        "person_identification":{
            "firstName" : "Joe",
            "lastName": "Foster-Smith",
            "age": 18,
            "honorific":"Mx"
        },
        "treatment":{
            "medication":[
                {
                    "medication_name":"acetaminophen",
                    "dose":"5mg"
                },
                {
                    "medication_name":"dextromethorphan",
                    "dose":"5mg"
                },
                {
                    "medication_name":"phenylephrine",
                    "dose":"5mg"
                }
            ]

        },
        "residence":{
        "country" : "Canada",
        "province/territory/state" : "Quebec"
        },
        "diagnosis" : ["cold","dizziness"],
        "occupation" : "hooligan"
}

Copying the above yields the following:

erDiagram
E1 {
  string firstName
  string lastName
  number age
  string honorific
}
E0 ||--|| E1 : contains
E2 {
  ARRAY medication
}
E0 ||--|| E2 : contains
E3 {
  string country
  string province_territory_state
}
E0 ||--|| E3 : contains
E0 {
  ARRAY diagnosis
  string occupation
}
  • E0 represents our person who has the properties diagnosis and occupation
  • additionally our person has the properties: person_identification,treatment and residence as E1,E2, and E3 respectively.

7. External Ontology validation

7.1. Exploration

Ontologies are briefly explored here, with several examples examined for utility and purpose. Utilizing them can often be tricky as ontologies can be implemented differently.

As such, depending on the resource, it can be difficult to incorporate ontology validation in our working example of JSON schemas.

While a schema can enforce patterns, it cannot verify that a value actually exists in an external vocabulary. We will explore how to implement custom code to do so. Note this will differ according to circumstance.

For the tutorial we’ll be exploring ICD10 and some resources affiliated with the project.

ICD10 stands for International Statistical Classification of Diseases and Related Health Problems 10th Revision. It codes medical diagnoses, symptoms, and external causes of injury. ICD10 codes:
- can be explored at https://icd.who.int/browse10/2019/en, e.g. https://icd.who.int/browse10/2019/en#/C34 and https://icd.who.int/browse10/2019/en#/U10.9
- follow a pattern structure of ^[A-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$, e.g. C34 and U10.9
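The pattern can be tried out directly in Python before it is used anywhere else. In the sketch below the function name looks_like_icd10 is hypothetical; the expression is the one quoted in the text. A match only shows that a string has the right shape, not that the code exists.

```python
import re

# Pattern quoted from the text; it describes the shape of a code only.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$")


def looks_like_icd10(code):
    """True when the string has the shape of an ICD10 code (not proof that it exists)."""
    return ICD10_PATTERN.match(code) is not None


for code in ["C34", "U10.9", "A0001", "c34"]:
    print(code, looks_like_icd10(code))
```

A0001 fails because the characters after the third position must be introduced by a dot, and c34 fails because the first character must be an upper-case letter.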

We can verify codes utilizing the resource : icd10api.com

  • It provides a useful interface for searching up codes
  • The backend is powered by an API (application programming interface) that allows us to use a search function to verify ICD10 codes.
  • Note APIs often have rate limits (queries allowed), so strategize and plan accordingly when utilizing public resources

Let’s explore how to utilize the API.

Going to https://icd10api.com/#:~:text=JSONP%20callback%20name.-,Examples,-Code%20Lookup we can perform a code look up.

If we provide C34 we get the following response:

http://icd10api.com/?code=C34&desc=short&r=json
{
"Response":
  {
      "Name": "C34",
      "Description": "Malignant neoplasm of bronchus and lung",
      "Valid": "0",
      "Inclusions": [],
      "ExcludesOne": [
          "Kaposi's sarcoma of lung (C46.5-)",
          "malignant carcinoid tumor of the bronchus and lung (C7A.090)"
      ],
      "ExcludesTwo": [],
      "Type": "ICD-10-CM",
      "Response": "True"
  }
}

The lookup shows both the request URL and the response. The request URL will be useful later as we automate the check. For now let’s explore the response.

The response is provided in JSON format and includes multiple fields.

We can get a breakdown of the contents : https://icd10api.com/#:~:text=RESET-,Fields,-When%20validating%20codes

  • Name returns the queried code
  • Description describes the diagnosis and its classification
  • Response is a binary status indicating whether our code is in the database

Let’s explore an example of querying a nonexistent code - Paste the following into your browser:

https://icd10api.com/?code=A0001&desc=short&r=json

We get the following response:

{"Response":"False","Error":"Incorrect ICD Code"}
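Interpreting these responses in code comes down to reading the Response field. The Python sketch below works on response text pasted in as strings, so it needs no network access. The function name code_exists is hypothetical, and because the successful sample above nests its fields under a top-level Response key while the error response is flat, the helper accepts either shape as an assumption rather than a statement about the API.

```python
import json


def code_exists(response_text):
    """Interpret a lookup response; True only when the code is reported as known."""
    payload = json.loads(response_text)
    status = payload.get("Response")
    # Assumption: a successful payload may nest its fields under "Response"
    # (as in the sample in the text) or be flat; handle both shapes.
    if isinstance(status, dict):
        status = status.get("Response")
    return status == "True"  # the status is the string "True", not a JSON boolean


nested_found = '{"Response": {"Name": "C34", "Valid": "0", "Response": "True"}}'
flat_found = '{"Name": "C34", "Valid": "0", "Response": "True"}'
not_found = '{"Response":"False","Error":"Incorrect ICD Code"}'

print(code_exists(nested_found), code_exists(flat_found), code_exists(not_found))
```

It is worth checking the shape of the live response in a browser first, since a filter written for one shape will silently report failure on the other.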

Now that we have an understanding of how to validate ICD10 codes, let’s incorporate it into our validation flow.

7.2. Incorporating external ontology validation

Working Directory : 7_external_ontology_validation/7_2_incorporating_external_ontology_validation/work

In this section we’ll explore how to perform both schema and external ontology validation on a set of files programmatically.

Note the following section is not the only method for validating your data in bulk. It is conceptual, combining multiple validations to demonstrate a flow.

Navigating to module3_tutorial/7_external_ontology_validation/7_2_incorporating_external_ontology_validation/work, we can view contents by performing:

ls

We’ll see we have three examples of objects/records to be validated

A.json
B.json
C.json

We also have the schema.json. If following the tutorial, the contents are recognizable aside from the addition

    "icd10": {
    "description": "Identification of a disease, condition, or injury based on signs, symptoms, medical history, and physical exams. Using ICD10 codes",
    "type": "string"
    }

Let’s create a few folders to track our results

mkdir schema_validation
mkdir icd10_validation
mkdir validation_results

Check all folders are empty:

ls schema_validation
ls icd10_validation
ls validation_results

We know we can validate our schema, for example using:

check-jsonschema --schemafile schema.json A.json

But let’s capture the result

check-jsonschema --schemafile schema.json A.json && touch schema_validation/A.SUCCESS || touch schema_validation/A.FAILURE 
  • check-jsonschema --schemafile schema.json A.json runs our command as before.
  • && touch schema_validation/A.SUCCESS AND, if successful, we generate a schema_validation/A.SUCCESS file
  • || touch schema_validation/A.FAILURE OR, if not successful, we generate a schema_validation/A.FAILURE file

Let’s continue with external ontology validation:

jq -r '.icd10' A.json \
  | xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
  | jq -e '.Response == "True"' \
  && touch icd10_validation/A.SUCCESS || touch icd10_validation/A.FAILURE
  • jq is a command-line tool for parsing JSON.
  • jq -r '.icd10' A.json: reads A.json and returns the raw value of the icd10 field.
  • xargs -I{}: runs the command that follows once per line of input, much like a for loop.
  • curl -s "https://icd10api.com/?code={}&desc=short&r=json": the icd10 value from A.json is substituted into the query wherever {} appears.
  • curl is a tool for retrieving data from a URL; -s silences its progress output.
  • The response from the icd10api is then piped into a second jq call.
  • jq -e '.Response == "True"': evaluates whether the Response field of the returned JSON equals "True"; with -e, jq’s exit status reflects the result.
  • && touch icd10_validation/A.SUCCESS: if so, creates the .SUCCESS file.
  • || touch icd10_validation/A.FAILURE: otherwise, creates the .FAILURE file.
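
The pipeline above can also be sketched as two small functions, which makes the lookup easier to reuse. The function names fetch_icd10 and check_icd10 are hypothetical; the URL and the Response field are taken from the command above. One assumption to flag: a record with no icd10 field is treated as a failure without contacting the API.

```shell
# fetch_icd10 CODE: print the API's JSON response for a single code.
# (Hypothetical helper; the URL is the one used in the tutorial command.)
fetch_icd10() {
  curl -s "https://icd10api.com/?code=$1&desc=short&r=json"
}

# check_icd10 FILE: succeed only if FILE's icd10 code is recognised by the API.
check_icd10() {
  local code
  # '// empty' makes jq print nothing when the field is missing or null.
  code=$(jq -r '.icd10 // empty' "$1") || return 1
  # Assumption: a missing code counts as a failure, with no API call made.
  [ -n "$code" ] || return 1
  # The function's exit status is that of the final jq -e test.
  fetch_icd10 "$code" | jq -e '.Response == "True"' > /dev/null
}
```

The result can then be recorded as before, for example check_icd10 A.json && touch icd10_validation/A.SUCCESS || touch icd10_validation/A.FAILURE.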

To combine our results into a single summary:

[ -f schema_validation/A.SUCCESS ] && [ -f icd10_validation/A.SUCCESS ] && touch validation_result/A.SUCCESS || touch validation_result/A.FAILURE
  • -f schema_validation/A.SUCCESS: checks that the file schema_validation/A.SUCCESS exists.
  • [ -f schema_validation/A.SUCCESS ] && [ -f icd10_validation/A.SUCCESS ]: combining the two tests means both SUCCESS files must exist.
  • && touch validation_result/A.SUCCESS: if they do, records the summary as validation_result/A.SUCCESS.
  • || touch validation_result/A.FAILURE: otherwise, records a failure.
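
The same combination can be sketched as a function that takes the record name as an argument. The name summarize is hypothetical; the folder names are the ones used in the commands above.

```shell
# summarize NAME: write validation_result/NAME.SUCCESS only when both
# earlier checks left a SUCCESS marker; otherwise write NAME.FAILURE.
summarize() {
  local name=$1
  mkdir -p validation_result
  if [ -f "schema_validation/$name.SUCCESS" ] && [ -f "icd10_validation/$name.SUCCESS" ]; then
    touch "validation_result/$name.SUCCESS"
  else
    touch "validation_result/$name.FAILURE"
  fi
}
```

Calling summarize A then has the same effect as the one-line command above.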

7.3. Incorporating external ontology validation for all examples

Due to time constraints, this section is optional and will not be covered live.

Working Directory : 7_external_ontology_validation/7_2_incorporating_external_ontology_validation/work

Running schema validation for all three records:

check-jsonschema --schemafile schema.json A.json && touch schema_validation/A.SUCCESS || touch schema_validation/A.FAILURE 
check-jsonschema --schemafile schema.json B.json && touch schema_validation/B.SUCCESS || touch schema_validation/B.FAILURE 
check-jsonschema --schemafile schema.json C.json && touch schema_validation/C.SUCCESS || touch schema_validation/C.FAILURE 
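
Because the three commands differ only in the record name, they can be folded into a loop. A minimal sketch, assuming each record NAME is stored as NAME.json as above; the function name schema_validate_all is hypothetical.

```shell
# schema_validate_all SCHEMA NAME...
# Validates NAME.json against SCHEMA for every NAME given and records
# the outcome as schema_validation/NAME.SUCCESS or NAME.FAILURE.
schema_validate_all() {
  local schema=$1 name
  shift
  mkdir -p schema_validation
  for name in "$@"; do
    if check-jsonschema --schemafile "$schema" "$name.json"; then
      touch "schema_validation/$name.SUCCESS"
    else
      touch "schema_validation/$name.FAILURE"
    fi
  done
}
```

Running schema_validate_all schema.json A B C is then equivalent to the three commands above.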

Inspecting our results:

ls schema_validation

shows:

A.SUCCESS  B.FAILURE  C.SUCCESS

A and C pass schema validation, but B fails.

Let’s continue with external ontology validation for all our examples:

jq -r '.icd10' A.json \
  | xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
  | jq -e '.Response == "True"' \
  && touch icd10_validation/A.SUCCESS || touch icd10_validation/A.FAILURE

jq -r '.icd10' B.json \
  | xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
  | jq -e '.Response == "True"' \
  && touch icd10_validation/B.SUCCESS || touch icd10_validation/B.FAILURE

jq -r '.icd10' C.json \
  | xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
  | jq -e '.Response == "True"' \
  && touch icd10_validation/C.SUCCESS || touch icd10_validation/C.FAILURE

Now let’s combine our results for all three records:

[ -f schema_validation/A.SUCCESS ] && [ -f icd10_validation/A.SUCCESS ] && touch validation_result/A.SUCCESS || touch validation_result/A.FAILURE
[ -f schema_validation/B.SUCCESS ] && [ -f icd10_validation/B.SUCCESS ] && touch validation_result/B.SUCCESS || touch validation_result/B.FAILURE
[ -f schema_validation/C.SUCCESS ] && [ -f icd10_validation/C.SUCCESS ] && touch validation_result/C.SUCCESS || touch validation_result/C.FAILURE

Let’s review our progress:

ls schema_validation icd10_validation validation_result

We can see the following:

icd10_validation:
A.SUCCESS  B.SUCCESS  C.FAILURE

schema_validation:
A.SUCCESS  B.FAILURE  C.SUCCESS

validation_result:
A.SUCCESS  B.FAILURE  C.FAILURE
  • A was the only record that passed both the schema check and the external ICD10 check.
  • B had a valid ICD10 code but failed the schema check.
  • C passed the schema check but failed the ICD10 validation.
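
To view the same information per record rather than per folder, the marker files can be tabulated. A minimal sketch; the function name report and the MISSING label (used when neither marker file exists) are assumptions for illustration.

```shell
# report NAME...
# Prints one tab-separated line per record, showing the status recorded
# in each of the three result folders.
report() {
  local name dir status
  for name in "$@"; do
    printf '%s' "$name"
    for dir in schema_validation icd10_validation validation_result; do
      if [ -f "$dir/$name.SUCCESS" ]; then
        status=SUCCESS
      elif [ -f "$dir/$name.FAILURE" ]; then
        status=FAILURE
      else
        status=MISSING   # neither marker exists: the step was not run
      fi
      printf '\t%s=%s' "$dir" "$status"
    done
    printf '\n'
  done
}
```

Running report A B C after the steps above prints one line for each of the three records.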

8. Additional challenges

Due to time constraints, this section is optional and will not be covered live.

The following are additional scenarios for participants to test. While these are not covered in the tutorial, participants are encouraged to test what they’ve learnt by attempting the challenges.

8.1. Required object

We’ve explored how to enforce required properties, but what if a required field is not a number or a string but an object?

Would that affect how we handled required properties? What would that look like?

In 8_additional_challenges/8_1_required_object/work you’ll find a schema.json that is missing the occupation object.
Fill in occupation based on example.json, and make all of occupation’s fields required.

8.2. Nested IF ELSE

Our schema uses if else to handle two scenarios. What if we added a third?

How would we account for that? This can be done by nesting another if else under the else of the original.

What would that look like? The following are a few test cases and a starting place for the schema.

Your goal is to produce a nested if else where the following occurs:

  • if country is USA then the states are Wisconsin,Virginia,Dakota
  • if country is Canada then the states are Quebec,Ontario,Vancouver
  • if country is Mexico then the states are Sonora,Puebla,Yucatán

In 8_additional_challenges/8_2_nested_if_else/work you’ll find a starting schema.json and the example files A.json, B.json and C.json to test your schema on.