Module 3
Lab
This markdown file will serve as the central guide throughout the tutorial. Participants can follow along as the guide will provide example code, references, example files as well as explanations of each step.
The tutorial teaches users how to structure and validate data using JSON Schema. The aim is to highlight the benefits of a machine-readable language for harmonizing data and for other data management practices. While JSON itself is secondary, participants will benefit from learning to recognize and work with it, as it has been widely adopted across industries.
The tutorial is designed to mostly work locally on your computer, but several components need an online connection. A few requirements must be installed by the user to fully benefit from the tutorial; these are listed in the Requirements section.
We are conscious of the varying knowledge levels of workshop participants throughout. As such, participants are encouraged to move on ahead to different sections or help out others. Please refrain from asking questions regarding upcoming content until appropriate.
All code provided assumes you are working from the directory module3_tutorial.
Due to time constraints not all sections of the tutorial will be showcased. Participants are encouraged to explore and check out the sections on their own time.
Sections covered:
- Introducing JSON
- Validating Schemas
  - 4.2. Bash
  - 4.3. VScode
- JSON Syntax
- External Ontology validation
  - 7.1. Exploration
  - 7.2. Incorporating external ontology validation
1. Organization
Content is organized into sections and subsections, each containing the following folders (when applicable):
| Folder | Significance |
|---|---|
| work | This directory will serve as the work directory for participants to freely write their code in. |
| test | Any prepared files will be shared here. |
| happy_path | When relevant, an example of the happy_path result will be provided to demonstrate the full results of the lesson. |
2. Requirements
The following items are needed to fully benefit from the tutorial.
3. Introducing JSON
Working directory : module3_tutorial/3_json
JSON stands for JavaScript Object Notation. It serves as a format to store and handle data.
Note that the way you manage your data model and ETL (Extract, Transform, Load) pipelines will influence the tools you use and whether JSON is relevant.
That being said, we’ll be exploring how to validate JSON through schemas and many concepts will be cross applicable regardless of implementation.
For the purposes of this tutorial we will go over JSON because we can quickly prototype data validation.
A JSON file typically has the suffix .json e.g. example.json
The contents of a JSON file are usually wrapped by curly braces {...}
It can be as simple as:
Or nested, like so:
{
  "ParentA": {
    "propertyA_1": {...}
  },
  "ParentB": [
    {
      "childB_1": {
        "propertyB_1_1": {...}
      }
    },
    {
      "childB_2": {
        "propertyB_1_2": {...}
      }
    }
  ]
}
We’ll quickly investigate two uses of JSON: 1) as a schema and 2) as a record/instance, to see how they are similar yet different.
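Before moving on, one quick way to see how nesting maps onto data structures is to load a small document in Python. This is a minimal sketch using only the standard library; the values ("x", "y", 1) are made up for illustration, and each child of ParentB is wrapped in its own object because entries of a JSON array cannot carry keys.

```python
import json

# Hypothetical nested document; key names mirror the placeholders used above.
text = """
{
  "ParentA": {"propertyA_1": {"value": 1}},
  "ParentB": [
    {"childB_1": {"propertyB_1_1": "x"}},
    {"childB_2": {"propertyB_1_2": "y"}}
  ]
}
"""

# JSON objects become Python dicts and JSON arrays become Python lists.
data = json.loads(text)

print(type(data["ParentA"]).__name__)                    # dict
print(type(data["ParentB"]).__name__)                    # list
print(data["ParentB"][0]["childB_1"]["propertyB_1_1"])   # x
```

If the text were not valid JSON (for example, a trailing comma before a closing bracket), json.loads would raise json.JSONDecodeError instead of returning data.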
3.1. Basic Schema
Working directory : module3_tutorial/3_json/work
Let’s investigate what a basic schema looks like. First we open the file 3_json/work/schema.json
We add the contents as follows:
{
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
}
}
3.1.1. Code Breakdown
title field declares the name of our schema. For those familiar with entity relationships, Person will serve as our central node and will be where all other nodes branch out from.
As an example comparison to dataframes, the title would be the name of the dataframe.
The type keyword is declared multiple times throughout the schema. As per https://www.w3schools.com/js/js_json_datatypes.asp, JSON values can be a string, a number, an object, an array, a boolean, or null; JSON Schema additionally accepts integer, which we use for age.
We declare our Person schema as an object because that enables us to add other fields and properties to our schema. Said properties cannot be represented by other data types.
Additionally, because an object follows the notation {"key": "value"}, it allows us to give the fields and properties informative names.
If title was the name of the dataframe, properties would be the columns of the dataframe. Within properties we can assign key-value pairs corresponding to the data we would like to collect/validate.
Our first example is firstName. We declare that the property firstName of a person must be a string. We provide an optional, brief description to describe the field.
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
Conversely, in our second example, age, we declare the type as integer with a minimum value of 0.
3.2. Basic record/instance
Let’s construct a record to demonstrate the schema
Starting off with curly braces
Based on the schema we add the first property by declaring it firstName and giving it a corresponding value:
Followed by our second property age
Let’s demonstrate how the schema works by flipping the values:
Having the wrong values in our instance can be problematic.
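To make the idea of "flipped" values concrete, here is a minimal Python sketch. The record values are made up, and looks_like_person is a hypothetical hand-written stand-in for two of the schema rules, not a JSON Schema validator. For simplicity it also treats both fields as mandatory, which the schema above does not yet do.

```python
import json

# Record that follows the Person schema (values are made up for illustration).
good = {"firstName": "John", "age": 21}
# Same fields with the values flipped: a number where a string is expected, and vice versa.
flipped = {"firstName": 21, "age": "John"}


def looks_like_person(record):
    """Hand-written check of the firstName and age rules; not a real validator."""
    first_ok = isinstance(record.get("firstName"), str)
    age = record.get("age")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    age_ok = isinstance(age, int) and not isinstance(age, bool) and age >= 0
    return first_ok and age_ok


print(json.dumps(good))            # the record serialized as JSON text
print(looks_like_person(good))     # True
print(looks_like_person(flipped))  # False
```

Writing such checks by hand does not scale, which is the motivation for letting a schema and a validator do the work.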
Let’s step back and see how we can use our newly generated schema to validate our instance.
4. Validating Schemas
A core function and purpose of utilizing JSON format and JSON schema is data validation.
The JSON format provides a structured method of wrangling the data into a machine readable format.
The JSON Schema validates said data.
Both are customizable and can be authored to suit specific needs.
Validation aids data harmonization by ensuring the data meets the requirements structurally and semantically.
Three methods will be highlighted to demonstrate how to perform validation.
4.1. Python
For time purposes this section will not be covered live and will be optional
Working directory : module3_tutorial/4_validating_schemas/4_1_python/work
As a higher level programming language, Python is quick to mock up and useful for a variety of data parsing and data wrangling needs.
One can validate an instance against a schema in a few lines of code, as follows
or paste the following code into a Python session
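import json

from jsonschema import validate

# Load the schema that describes a valid Person record
with open('test/schema.json', 'r') as f:
    schema = json.load(f)

# Load the record (instance) to check
with open('test/bad_example.json', 'r') as f:
    instance = json.load(f)

# Raises jsonschema.exceptions.ValidationError if the instance does not conform
validate(instance=instance, schema=schema)
4.1.1. Breaking down the code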
The first two lines import the libraries containing the tools Python needs to wrangle JSON data. json ingests our data and converts it into a dictionary, and validate from jsonschema performs the validation.
with open('test/schema.json', 'r') as f:
    schema = json.load(f)
with open('test/bad_example.json', 'r') as f:
    instance = json.load(f)
Next, the schema and the instance are each ingested by opening the file and parsing it as JSON.
The call validate(instance=instance, schema=schema) performs the validation. If the instance is properly structured according to the schema, it passes silently.
4.1.2. Understanding the error
Otherwise an error is reported. Validating bad_example.json produces the following:
Traceback (most recent call last):
  File "RDM_workshop/validate.py", line 11, in <module>
    validate(instance=instance, schema=schema)
  File "/opt/miniconda3/lib/python3.12/site-packages/jsonschema/validators.py", line 1332, in validate
    raise error
jsonschema.exceptions.ValidationError: 1 is not of type 'string'
Failed validating 'type' in schema['properties']['lastName']:
    {'type': 'string', 'description': "The person's last name."}
On instance['lastName']:
    1
Note the error jsonschema.exceptions.ValidationError: 1 is not of type 'string', which indicates that one of our fields does not conform to the schema, specifically lastName, as per the line Failed validating 'type' in schema['properties']['lastName'].
4.2. Bash
Working directory : module3_tutorial/4_validating_schemas/4_2_bash
Similar to Python, we can perform validation in the terminal through bash.
check-jsonschema is a powerful tool that validates several types of files including JSON. To validate, see below code example:
4.2.2. Code breakdown
check-jsonschema invokes the tool. The --schemafile option specifies which schema file to validate against; the remaining arguments are the files to validate.
The following is the happy path result :
4.2.3. Understanding the error
If we validated the bad example:
The following is the bad path result :
Schema validation errors were encountered.
test/bad_example.json::$.lastName: 1 is not of type 'string'
As noted in the error message, the .lastName field value is 1 and does not match the type string.
4.2.4. Validating multiple files
We can also utilize the command to validate multiple files:
and the result will indicate which of the JSON files contain errors.
This is especially useful for checking many files at once.
4.3. VScode
Working directory : module3_tutorial/4_validating_schemas/4_3_vscode/work
With the VScode plug-in, when we open 4_validating_schemas/4_3_vscode/test/bad_example.json, it dynamically applies the schema for us and highlights errors
{
"$schema" : "./schema.json",
"firstName": "John",
"lastName": 1, Incorrect type. Expected "String".
"age": 21
}
Removing line 2 (the $schema reference) removes the flagged error, as the editor no longer has the context that lastName must be a string.
Alternatively, we fix the error:
This method is useful for dynamic coding such as crafting example datasets to sanity check your schema.
5. JSON Syntax
Now that we know how to validate our JSON objects, let’s explore more syntax to help structure our data. The examples we explored so far simply constrain data types, whereas the following allow for conditional rules and more advanced structure.
5.1. Controlled List and arrays
Working directory : module3_tutorial/5_json_syntax/5_1_controlled_lists_and_arrays/work
Starting off with our previous schema:
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
}
}
In our example, the property honorific, which the schema does not define, is unflagged.
We can add the property "additionalProperties": false
to our schema such that it becomes:
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"additionalProperties": false,
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
}
}
This restricts data to only have properties defined by the schema.
{
"$schema":"./schema.json",
"firstName" : 1, Incorrect type. Expected "string"
"age": "JoeExample", Incorrect type. Expected "integer"
"honorific":"Mr" Property honorific not allowed
}
But we want to add a schema property for honorific.
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"additionalProperties": false,
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
},
"honorific": {
"type": "string",
"description": "A title that conveys esteem, courtesy, or respect for position or rank when used in addressing or referring to a person"
}
}
}
Let’s validate our example:
{
"$schema":"./schema.json",
"firstName" : 1, Incorrect type. Expected "string"
"age":"JoeExample", Incorrect type. Expected "integer"
"honorific":"Lord Paramount"
}
While valid, Lord Paramount is an uncommon title. Let’s add a controlled list to restrict the accepted honorifics.
We utilize the enum keyword and fill its array with the values that we want to accept. This is useful for limiting the values we receive. Note that the contents are not limited to the string data type and can include any data type, e.g. a mix of strings and numbers.
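As a rough illustration, the enum rule boils down to a membership test. The sketch below is plain Python rather than a JSON Schema validator; the four allowed values are the ones listed in the editor message shown later in this section, and honorific_schema is an assumed reconstruction of how the property could look once enum is added.

```python
import json

# Allowed values, taken from the validator message shown later in this section.
HONORIFICS = ["Mr", "Miss", "Mrs", "Mx"]

# Assumed shape of the honorific property after adding the controlled list.
honorific_schema = {"type": "string", "enum": HONORIFICS}


def honorific_allowed(value):
    """Mimic the enum rule: the value must equal one of the listed entries."""
    return value in honorific_schema["enum"]


print(json.dumps({"honorific": honorific_schema}, indent=2))
print(honorific_allowed("Mx"))              # True
print(honorific_allowed("Lord Paramount"))  # False
```

The comparison is exact, so "mx" or " Mx" would be rejected as well.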
The result on our validated example:
Validating our previous example shows Lord Paramount is no longer valid. We should correct the record accordingly:
{
"$schema":"./schema.json",
"firstName" : 1, Incorrect type. Expected "string"
"age": "JoeExample", Incorrect type. Expected "integer"
"honorific":"Mx"
}
What if we want to provide multiple values for a field? Typically that’s done through an array. Let’s add a new field called diagnosis to our schema.
"diagnosis": {
"type": "array",
"description": "Identification of a disease, condition, or injury based on signs, symptoms, medical history, and physical exams",
"items" : {"type" : "string"}
}
- "type": "array", like the other data types, informs us that the record must provide the field as an array/list.
- "items" : {"type" : "string"} indicates that the items within the array must match specific criteria; in this case they must be strings.
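The two array rules can be sketched in plain Python as follows. diagnosis_problems is a hypothetical helper written for illustration, not part of any library and not a JSON Schema validator.

```python
def diagnosis_problems(value):
    """Hand-written version of the two rules declared for diagnosis."""
    # "type": "array" -> the value itself must be a list
    if not isinstance(value, list):
        return ["not an array"]
    # "items": {"type": "string"} -> every entry must be a string
    if not all(isinstance(item, str) for item in value):
        return ["non-string item"]
    return []


print(diagnosis_problems(["cold", "dizziness"]))  # []
print(diagnosis_problems("cold"))                 # ['not an array']
print(diagnosis_problems(["cold", 3]))            # ['non-string item']
print(diagnosis_problems([]))                     # []
```

Note the last call: an empty list satisfies both rules, because there are no items to violate the items constraint.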
Let’s test on an example:
{
"$schema":"./schema.json",
"firstName" : 1, Incorrect type. Expected "string"
"age": "JoeExample", Incorrect type. Expected "integer"
"honorific":"Lord Paramount", Value is not accepted. Valid values : "Mr","Miss","Mrs","Mx"
"diagnosis" : []
}
Our validation passes, but an empty array is not informative when no diagnosis is given. Let’s add a requirement that at least one diagnosis be provided.
"diagnosis": {
"type": "array",
"description": "Identification of a disease, condition, or injury based on signs, symptoms, medical history, and physical exams",
"items" : {"type" : "string"},
"minItems": 1
}
Validating our example again now shows the following error:
A fully valid example should look like:
5.2. Values by Pattern/Regex
Working Directory : module3_tutorial/5_json_syntax/5_2_patterns_regex/work
Another useful feature is applying a regex or pattern check on values. This is particularly useful to ensure strings follow a particular pattern. For example, currently the field lastName in the schema has the contents:
This means even a string made up of digits is accepted:
{
"$schema":"./schema.json",
"firstName" : "Joe",
"lastName": "12",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"]
}
To ensure only letters are used in the lastName, we can amend the field
"lastName": {
"type": "string",
"description": "The person's last name.",
"pattern": "[A-Za-z- ]+"
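}
We add the keyword pattern to enforce a pattern check. The pattern keyword takes a regex, aka regular expression (for more info see https://regex101.com/). Note that JSON Schema patterns are not implicitly anchored: the expression only needs to match somewhere in the string, so a value such as Smith12 would still pass. Anchoring the expression as ^[A-Za-z -]+$ requires the whole value to consist of the allowed characters.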
We use the expression [A-Za-z- ]+. Breaking down this regex:
- [...]+ matches one or more consecutive characters from the set within the brackets
- A-Z allows for capital letters
- a-z allows for lower case letters
- - allows for hyphens
- the space before the closing bracket allows for spaces
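The behavior of the expression can be tried out with Python's re module, which treats this simple pattern the same way. re.search is used because a JSON Schema pattern may match anywhere in the string. The anchored variant and the test value Smith12 are assumptions added for comparison; they are not part of the tutorial's schema.

```python
import re

UNANCHORED = re.compile(r"[A-Za-z- ]+")   # pattern as written in the schema
ANCHORED = re.compile(r"^[A-Za-z -]+$")   # assumed stricter variant for comparison

for name in ["Foster-Smith", "12", "Smith12"]:
    # search() succeeds if the pattern matches any part of the string
    print(name, bool(UNANCHORED.search(name)), bool(ANCHORED.search(name)))

# Foster-Smith True True
# 12 False False
# Smith12 True False
```

The value 12 is rejected either way because it contains no allowed character at all, while Smith12 is only rejected once the pattern is anchored at both ends.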
If we recheck our last example against the schema:
{
"$schema":"./schema.json",
"firstName" : "Joe",
"lastName": "12", String does not match pattern of "[A-Za-z- ]+"
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"]
}
A happy path example:
5.3. Required vs optional values
Working Directory : module3_tutorial/5_json_syntax/5_3_required_optional_values/work
With the current schema, records with missing fields are still accepted. This is not ideal when we curate or build datasets that need mandatory fields. Our current schema is:
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"additionalProperties": false,
"properties": {...} // Collapsed/Abbreviated Code
}
To demonstrate this flaw:
To ensure firstName and lastName are provided with all our persons, we add the field required
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"required": ["firstName","lastName"],
"additionalProperties": false,
"properties": {...} // Collapsed/Abbreviated Code
}
Within required we list, as an array, the fields that must be present.
Let’s confirm with an example:
{
Missing property "lastName"
"$schema":"./schema.json",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"]
}
If we add a firstName
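The effect of required can be sketched in plain Python, including what happens once firstName is added. missing_properties is a hypothetical helper for illustration, and the value "Joe" is borrowed from the earlier examples.

```python
REQUIRED = ["firstName", "lastName"]


def missing_properties(record):
    """List the required property names that are absent from the record."""
    return [name for name in REQUIRED if name not in record]


record = {"age": 18, "honorific": "Mx", "diagnosis": ["cold"]}
print(missing_properties(record))   # ['firstName', 'lastName']

record["firstName"] = "Joe"
print(missing_properties(record))   # ['lastName']
```

Like the required keyword, this only checks that the key is present. Whether the value has the right type is decided separately by the rules under properties.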
5.4. Conditional rules
Working Directory : module3_tutorial/5_json_syntax/5_4_conditional_rules/work
What if we want one field to influence another? We can introduce conditional rules.
Let’s add two new fields into our schema
"country": {
"enum": ["USA","Canada"]
},
"province/territory/state": {
"enum": ["New York","British Columbia","Washington","Ontario"]
}
We set country to be an enum list of USA and Canada. Depending on country, we want province/territory/state to be a specific enum list.
If we simply define country and province/territory/state with enum lists, a mismatch can occur such as:
{
Missing property "lastName"
"$schema":"./schema.json",
"firstName" : "Joe",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"],
"country" : "USA",
"province/territory/state" : "Ontario"
}
More logic is needed to ensure the country and province/territory/state match. We can add a layer of control via the if, then, and else keywords within the schema
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"additionalProperties": false,
"required": ["firstName","lastName"],
"properties": {...}, // Collapsed/Abbreviated Code
"if":{...},
"then":{...},
"else":{...}
}
Filling out the details of the if, then, and else:
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "additionalProperties": false,
  "required": ["firstName","lastName"],
  "properties": {...}, // Collapsed/Abbreviated Code
  "if": {
    "properties": {
      "country": {}
    }
  },
  "then": {...},
  "else": {...}
}
Let’s add the condition. In our if we specify which properties the conditional logic should check, in our example country. The condition can use the same keywords as any other property, such as a regex pattern, a list of values, or numeric limits. In our example, we want the logic to apply if country is USA.
If country is USA the then code portion occurs
"then": {
"required": ["province/territory/state"],
"properties": {
"province/territory/state": {
"enum": ["Wisconsin", "Virginia", "Dakota"]
}
}
},
Like with if, we specify which properties are affected when the condition is met. In our example it is province/territory/state.
We specify that when USA is provided, province/territory/state becomes mandatory and its value must come from the enum list. Note that then does not overwrite the rules declared under properties: both sets apply, so the base definition of province/territory/state must also accept these values (for example by widening or dropping its enum).
Let’s use a bad example to see if validation works.
{
Missing property "province/territory/state"
"$schema":"./schema.json",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"],
"country" : "USA"
}
The conditional required field works. What about the enum value?
{
"$schema":"./schema.json",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"],
"country" : "USA",
"province/territory/state" : "Quebec" Value is not accepted. Valid values : "Wisconsin", "Virginia", "Dakota"
}
What if we switch country from USA to Canada and Quebec to Dakota? What would be the result?
{
"$schema":"./schema.json",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"],
"country" : "Canada",
"province/territory/state" : "Dakota"
}
This is clearly the wrong pairing, but the schema is not flagging it, so let’s fix that.
Our conditional logic so far covers the if and then. Now we need to add an else.
As with then, we can format else similarly but swap out the values:
"if": {
"properties": {
"country": {
"enum": ["USA"]
}
}
},
"then": {
"required": ["province/territory/state"],
"properties": {
"province/territory/state": {
"enum": ["Wisconsin", "Virginia", "Dakota"]
}
}
},
"else": {
"required": ["province/territory/state"],
"properties": {
"province/territory/state": {
"enum": ["Quebec","Ontario","British Columbia"]
}
}
}
Let’s revalidate our previous example:
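To reason about the expected outcome, the branching can be sketched in plain Python. This is a hand-written approximation of the if/then/else block above, not a JSON Schema validator. It mirrors one detail of JSON Schema: the if subschema only constrains country when it is present, so a record without country also takes the then branch.

```python
US_REGIONS = ["Wisconsin", "Virginia", "Dakota"]
CA_REGIONS = ["Quebec", "Ontario", "British Columbia"]
KEY = "province/territory/state"


def region_problems(record):
    """Approximate the if/then/else rules for country and region."""
    # "if": country, when present, must be USA; an absent country passes the test.
    takes_then = "country" not in record or record["country"] == "USA"
    allowed = US_REGIONS if takes_then else CA_REGIONS
    # Both branches make the region mandatory.
    if KEY not in record:
        return [f"missing {KEY}"]
    # Both branches restrict the region to their own list.
    if record[KEY] not in allowed:
        return [f"{record[KEY]!r} is not one of {allowed}"]
    return []


print(region_problems({"country": "Canada", KEY: "Dakota"}))  # now flagged
print(region_problems({"country": "Canada", KEY: "Quebec"}))  # []
print(region_problems({"country": "USA"}))                    # missing region
```

With the else in place, the Canada and Dakota pairing from the previous example is reported, while Canada and Quebec passes.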
6. Mermaid
For time purposes this section will not be covered live and will be optional
Data models can become complex and when written in code, they become hard to navigate. One way around that is via diagrams that visualize the relationships of our data model.
There are a variety of ways to do so, the following is one suggestion.
Given our JSON
{
"$schema":"./schema.json",
"firstName" : "Joe",
"lastName": "Foster-Smith",
"age": 18,
"honorific":"Mx",
"diagnosis" : ["cold"],
"country" : "Canada",
"province/territory/state" : "Quebec"
}
We can pass that into an online converter. As mentioned, there are multiple ways of doing the conversion; this is a quick one.
We utilize Mermaid, a JavaScript-based tool for constructing charts and diagrams.
We navigate to the site: https://toolshref.com/json-to-mermaid-generator/
We paste our JSON, set the output to ER diagram, and receive the following output.
classDiagram
class C0 {
+_schema : ./schema.j
+firstName : Joe
+lastName : Foster-Smi
+age : 18
+honorific : Mx
+diagnosis[] Array
+country : Canada
+province_territory_state : Quebec
}
We can see our instance has the following attributes.
What if we tried a more complicated example?:
{
"person_identification":{
"firstName" : "Joe",
"lastName": "Foster-Smith",
"age": 18,
"honorific":"Mx"
},
"treatment":{
"medication":[
{
"medication_name":"acetaminophen",
"dose":"5mg"
},
{
"medication_name":"dextromethorphan",
"dose":"5mg"
},
{
"medication_name":"phenylephrine",
"dose":"5mg"
}
]
},
"residence":{
"country" : "Canada",
"province/territory/state" : "Quebec"
},
"diagnosis" : ["cold","dizziness"],
"occupation" : "hooligan"
}
Copying the above yields the following:
erDiagram
E1 {
string firstName
string lastName
number age
string honorific
}
E0 ||--|| E1 : contains
E2 {
ARRAY medication
}
E0 ||--|| E2 : contains
E3 {
string country
string province_territory_state
}
E0 ||--|| E3 : contains
E0 {
ARRAY diagnosis
string occupation
}
- E0 represents our person, who has the properties diagnosis and occupation
- additionally, our person has the properties person_identification, treatment and residence as E1, E2, and E3 respectively.
7. External Ontology validation
7.1. Exploration
Ontologies are briefly explored with several examples examined for utility and purpose. Utilizing them can often be tricky as ontologies can be implemented differently.
As such, depending on the resource, it can be difficult to incorporate ontology validation into our working example of JSON schemas.
While a schema can enforce patterns, it cannot verify that a value actually exists. We will further explore how to implement custom code to do so. Note this will differ according to circumstance.
For the tutorial we’ll be exploring ICD10 and some resources affiliated with the project.
ICD10 stands for International Statistical Classification of Diseases and Related Health Problems 10th Revision and codes medical diagnoses, symptoms, and external causes of injury.
- ICD10 codes can be explored at https://icd.who.int/browse10/2019/en, e.g. https://icd.who.int/browse10/2019/en#/C34 and https://icd.who.int/browse10/2019/en#/U10.9
- ICD10 codes follow the pattern ^[A-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$, e.g. C34 and U10.9
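The pattern can be tried out with Python's re module. This is a minimal sketch: c34 is a made-up malformed value, and A0001 is the code queried later in this section.

```python
import re

# Pattern copied from the bullet above.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$")

for code in ["C34", "U10.9", "c34", "A0001"]:
    print(code, bool(ICD10_PATTERN.search(code)))

# C34 True
# U10.9 True
# c34 False
# A0001 False
```

A match only confirms the shape of a code. A well-formed code may still not exist in the classification, which is why a lookup against an external resource is needed.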
We can verify codes utilizing the resource : icd10api.com
- It provides a useful interface for looking up codes
- The backend is powered by an API (application programming interface) that allows us to use a search function to verify ICD10 codes.
- Note that APIs often have rate limits (a cap on the queries allowed), so strategize and plan accordingly when utilizing public resources
Let’s explore how to utilize the API.
Going to https://icd10api.com/#:~:text=JSONP%20callback%20name.-,Examples,-Code%20Lookup we can perform a code lookup.
If we provide C34 we get the following response:
http://icd10api.com/?code=C34&desc=short&r=json
{
"Response":
{
"Name": "C34",
"Description": "Malignant neoplasm of bronchus and lung",
"Valid": "0",
"Inclusions": [],
"ExcludesOne": [
"Kaposi's sarcoma of lung (C46.5-)",
"malignant carcinoid tumor of the bronchus and lung (C7A.090)"
],
"ExcludesTwo": [],
"Type": "ICD-10-CM",
"Response": "True"
}
}
The API generates a request URL and a response. The request URL will be useful later as we automate the check. For now let’s explore the response.
The response is provided in JSON format and includes multiple fields.
We can get a breakdown of the contents at https://icd10api.com/#:~:text=RESET-,Fields,-When%20validating%20codes
- Name returns the queried code
- Description describes the diagnosis and its classification
- Response is a binary status indicating if our code is in the database
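The request and the status check can be sketched in Python without contacting the API. The query parameter names are copied from the example request above, and the sample payload is a shortened copy of the response shown there. lookup_url and code_exists are hypothetical helpers. Because the sample nests the fields under an outer Response key, code_exists accepts both that shape and a flat one; which shape the live service returns should be confirmed before relying on it.

```python
import json
from urllib.parse import urlencode


def lookup_url(code):
    """Build the request URL for a code lookup (no network call is made here)."""
    query = urlencode({"code": code, "desc": "short", "r": "json"})
    return "https://icd10api.com/?" + query


def code_exists(payload):
    """Return True when the response marks the code as found."""
    status = payload.get("Response")
    if isinstance(status, dict):       # fields wrapped as in the sample above
        status = status.get("Response")
    return status == "True"            # the API reports the status as a string


sample = json.loads('{"Response": {"Name": "C34", "Valid": "0", "Response": "True"}}')

print(lookup_url("C34"))     # https://icd10api.com/?code=C34&desc=short&r=json
print(code_exists(sample))   # True
```

Keeping the URL building and the response check separate from the actual download makes it easier to respect rate limits, for example by caching responses for codes that were already queried.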
Let’s explore an example of querying a nonexistent code - Paste the following into your browser:
https://icd10api.com/?code=A0001&desc=short&r=json
We get the following response:
Now that we have an understanding of how to validate ICD10 codes, let’s incorporate it into our validation flow.
7.2. Incorporating external ontology validation
Working Directory : 7_external_ontology_validation/7_2_incorporating_external_ontology_validation/work
In this section we’ll explore how to perform both schema and external ontology validation on a set of files programmatically.
Note the following section is not the only method for validating your data in bulk. It is conceptual, combining multiple validations to demonstrate a flow.
Navigating to module3_tutorial/7_external_ontology_validation/7_2_incorporating_external_ontology_validation/work, we can view contents by performing:
We’ll see we have three examples of objects/records to be validated
We also have the schema.json. If following the tutorial, the contents are recognizable aside from the addition
"icd10": {
"description": "Identification of a disease, condition, or injury based on signs, symptoms, medical history, and physical exams. Using ICD10 codes",
"type": "string"
}
Let’s create a few folders to track our results
Check all folders are empty:
We know we can validate our schema, for example using:
But let’s capture the result
check-jsonschema --schemafile schema.json A.json && touch schema_validation/A.SUCCESS || touch schema_validation/A.FAILURE
- check-jsonschema --schemafile schema.json A.json runs our command as before.
- && touch schema_validation/A.SUCCESS : AND, if successful, we generate a schema_validation/A.SUCCESS file
- || touch schema_validation/A.FAILURE : OR, if not successful, we generate a schema_validation/A.FAILURE file
Let’s continue with external ontology validation:
jq -r '.icd10' A.json \
| xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
| jq -e '.Response == "True"' \
&& touch icd10_validation/A.SUCCESS || touch icd10_validation/A.FAILURE
- jq is a useful unix tool for parsing JSON
- jq -r '.icd10' A.json reads A.json and returns the icd10 field.
- xargs -I{} : our returned response is iterated over like a for loop
- curl -s "https://icd10api.com/?code={}&desc=short&r=json" : each value of icd10 in our A.json is substituted into the query at the position marked by {}. curl is a useful tool to retrieve and return data
- our result from retrieving data from the icd10api is then passed into another jq interpretation
- jq -e '.Response == "True"' : the returned JSON is evaluated for whether the Response field is true or not.
- && touch icd10_validation/A.SUCCESS : if true, generate the .SUCCESS file
- || touch icd10_validation/A.FAILURE : otherwise, generate the .FAILURE file
To combine all our results and summarize :
[ -f schema_validation/A.SUCCESS ] && [ -f icd10_validation/A.SUCCESS ] && touch validation_result/A.SUCCESS || touch validation_result/A.FAILURE
- -f schema_validation/A.SUCCESS checks for the existence of the file schema_validation/A.SUCCESS
- [ -f schema_validation/A.SUCCESS ] && [ -f icd10_validation/A.SUCCESS ] : combining the two means both SUCCESS files must exist
- && touch validation_result/A.SUCCESS : if they do, summarize into validation_result/A.SUCCESS
- || touch validation_result/A.FAILURE : otherwise report as a failure
7.3. Incorporating external ontology validation for all examples
For time purposes this section will not be covered live and will be optional
Working Directory : 7_external_ontology_validation/7_2_incorporating_external_ontology_validation/work
Doing schema validation for all three:
check-jsonschema --schemafile schema.json A.json && touch schema_validation/A.SUCCESS || touch schema_validation/A.FAILURE
check-jsonschema --schemafile schema.json B.json && touch schema_validation/B.SUCCESS || touch schema_validation/B.FAILURE
check-jsonschema --schemafile schema.json C.json && touch schema_validation/C.SUCCESS || touch schema_validation/C.FAILURE
Inspecting our results:
shows:
Success for A and C but schema validation failure on B
Let’s continue with external ontology validation for all our examples:
jq -r '.icd10' A.json \
| xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
| jq -e '.Response == "True"' \
&& touch icd10_validation/A.SUCCESS || touch icd10_validation/A.FAILURE
jq -r '.icd10' B.json \
| xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
| jq -e '.Response == "True"' \
&& touch icd10_validation/B.SUCCESS || touch icd10_validation/B.FAILURE
jq -r '.icd10' C.json \
| xargs -I{} curl -s "https://icd10api.com/?code={}&desc=short&r=json" \
| jq -e '.Response == "True"' \
&& touch icd10_validation/C.SUCCESS || touch icd10_validation/C.FAILURE
Now let’s combine our results, performing this action for all our objects:
[ -f schema_validation/A.SUCCESS ] && [ -f icd10_validation/A.SUCCESS ] && touch validation_result/A.SUCCESS || touch validation_result/A.FAILURE
[ -f schema_validation/B.SUCCESS ] && [ -f icd10_validation/B.SUCCESS ] && touch validation_result/B.SUCCESS || touch validation_result/B.FAILURE
[ -f schema_validation/C.SUCCESS ] && [ -f icd10_validation/C.SUCCESS ] && touch validation_result/C.SUCCESS || touch validation_result/C.FAILURE
Let’s review our progress:
We can see the following:
icd10_validation:
A.SUCCESS B.SUCCESS C.FAILURE
schema_validation:
A.SUCCESS B.FAILURE C.SUCCESS
validation_result:
A.SUCCESS B.FAILURE C.FAILURE
- A was the only record that passed both the schema and external ICD10 checks
- B had a valid ICD10 code but failed the schema
- C passed the schema check but failed the ICD10 validation.
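Repeating the same command per record gets tedious as the number of files grows. The combining step can be sketched as a loop in Python; summarize is a hypothetical helper, and the demo recreates the marker files reported above inside a temporary directory rather than touching the tutorial folders.

```python
import tempfile
from pathlib import Path


def summarize(base, record_ids):
    """Write validation_result/<id>.SUCCESS only when both earlier checks succeeded."""
    out = base / "validation_result"
    out.mkdir(exist_ok=True)
    results = {}
    for rid in record_ids:
        # Both marker files must exist, mirroring the two [ -f ... ] tests.
        passed = all(
            (base / folder / f"{rid}.SUCCESS").is_file()
            for folder in ("schema_validation", "icd10_validation")
        )
        status = "SUCCESS" if passed else "FAILURE"
        (out / f"{rid}.{status}").touch()
        results[rid] = status
    return results


# Demo: recreate the marker files listed above in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    markers = {
        "schema_validation": ["A.SUCCESS", "B.FAILURE", "C.SUCCESS"],
        "icd10_validation": ["A.SUCCESS", "B.SUCCESS", "C.FAILURE"],
    }
    for folder, names in markers.items():
        (base / folder).mkdir()
        for name in names:
            (base / folder / name).touch()
    summary = summarize(base, ["A", "B", "C"])

print(summary)  # {'A': 'SUCCESS', 'B': 'FAILURE', 'C': 'FAILURE'}
```

Pointing base at the tutorial's work directory instead of the temporary one would apply the same logic to the real marker files.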
8. Additional challenges
For time purposes this section will not be covered live and will be optional
The following are additional scenarios for participants to test. While these are not covered in the tutorial, participants are encouraged to test what they’ve learnt by attempting the challenges.
8.1. Required object
We’ve explored how to enforce required properties, but what if those fields were not numbers or strings but rather an object?
Would that affect how we handled required properties? What would that look like?
In 8_additional_challenges/8_1_required_object/work you’ll find an example schema.json missing the object occupation.
Fill in occupation based on the example example.json. Make all occupation’s fields required.
8.2. Nested IF ELSE
Given our schema contains if else to catch two scenarios, what if we added a third?
How would we account for that? This can be done by nesting another if else under the else of the original.
What would that look like? The following are a few test cases and a starting place for the schema.
Your goal is to produce a nested if else where the following occurs:
- if country is USA then the states are Wisconsin, Virginia, Dakota
- if country is Canada then the states are Quebec, Ontario, Vancouver
- if country is Mexico then the states are Sonora, Puebla, Yucatán
In 8_additional_challenges/8_2_nested_if_else/work you’ll find a starting schema.json and the example files A.json, B.json and C.json to test your schema on.