Bash Script for Parsing HL7 Data

John Thuma
3 min read · Mar 26, 2019

First: I am not sure who or what came up with the data format/architecture for HL7, but all I can say is HUH? (scratching my head…) I am sure they attempted to make things easier, but they could have done a little bit more to make it A LOT EASIER TO WORK WITH! (I am sure their intent was not to make it too easy.)

This article is about making HL7 data easy enough to use for analytics, and the approach is really simple. I am sure there are other ways of doing this, but this is a simple one I put together one morning.

Background:

I recently had to work with some HL7 data and get it into a data set that was usable for analytics. I took a look at the data and found some things I didn’t really appreciate or things I had to change:

1. Records are terminated by a \r. I convert these to \n. Simple. (dos2unix)

2. The fields in each record are delimited by a pipe “|”.

3. Each record has its own structure based on its type. Because the structures differ, the record types do not fit into a single table as they are.

4. The records in the file arrive in a specific sequence, so the order of the records is critical.

5. A patient episode will be made up of many ordered record types.

6. Each record type for a patient episode has a counter of the number of records for that record type.

7. As far as I could see, there is no natural business/surrogate key that spans a patient episode. In particular, the OBX type, which contains notes, has nothing that identifies the episode.

8. Each patient episode starts with an ‘MSH’ record. There is no trailing ‘MSH’ record to close out the patient episode's set of records.

9. There is some noise in the data and I have a workaround for that too. (awk)
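Items 1 and 9 can be handled in a short preprocessing step before the main script runs. The sketch below is one way to do it, under stated assumptions: the function name normalize_hl7 is hypothetical, it uses tr rather than dos2unix to turn every \r into \n, and it treats as noise any line that does not start with a three-character record type followed by a pipe. The real noise in the data may look different, so the awk pattern would likely need adjusting.

```shell
# Hypothetical helper (not from the original article): read raw HL7 text on
# stdin and write one clean record per line on stdout.
normalize_hl7() {
  # Turn every carriage return into a newline. A \r\n pair becomes an empty
  # line, which the awk filter below drops.
  tr '\r' '\n' |
    # ASSUMPTION: a valid record starts with a three-character record type
    # (MSH, PID, PV1, OBX, ...) followed by a pipe; anything else is noise.
    awk '/^[A-Z][A-Z0-9][A-Z0-9][|]/'
}
```

It could be used as a filter, for example normalize_hl7 < raw_input.txt > clean_input.txt, where both file names are placeholders.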

So in order to get around these challenges, and with very little/ZERO budget, I wrote a BASH script to get the data into a usable format. (I am not going to share the data, as that would violate PHI and other regulatory compliance rules.)

I am sure you have 100 ways of improving this script, and I welcome your input.

SCRIPT: Making HL7 data useful:

# SET CONSTANTS AND INITIALIZE VARIABLES
nll="MSH"   # record type that starts a new patient episode
c=1         # line counter
z=1         # episode counter, appended to the timestamp to form the key

# LOOP THROUGH CONTENTS OF FILE
while IFS= read -r f; do

  c=$((c + 1))

  # RECORD TYPE IS EVERYTHING BEFORE THE FIRST PIPE
  tt=${f%%|*}

  # IF THE RECORD IS A MESSAGE HEADER, START A NEW SURROGATE KEY
  if [[ "$tt" == "$nll" ]]; then
    b=$(date +%s)
    z=$((z + 1))
  fi

  # PREFIX EVERY RECORD WITH THE SURROGATE KEY AND THE LINE COUNTER
  echo "$b$z|$c|$f" >> /data/hl7/mdm_2_load.txt

done < MDM_origCR.txt
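
Since each record type has its own structure, one possible next step is to split the keyed output into one file per record type, so that each file can be loaded into its own table. This is a minimal sketch and not part of the original script: the function name split_by_type and its two arguments are hypothetical, and it assumes the load file has the layout produced above (surrogate key, line counter, then the original pipe-delimited record).

```shell
# Hypothetical helper (not from the original script): split a keyed load file
# into one file per record type, e.g. MSH.txt, PID.txt, OBX.txt.
# ASSUMPTION: each line looks like <key>|<counter>|<record type>|<rest of record>
split_by_type() {
  local infile=$1 outdir=$2
  mkdir -p "$outdir"
  # Field 3 is the record type because the script prepends two fields.
  awk -F'|' -v dir="$outdir" '
    $3 ~ /^[A-Z][A-Z0-9][A-Z0-9]$/ {
      out = dir "/" $3 ".txt"
      print >> out   # append, so earlier records of this type are kept
      close(out)     # avoid holding many files open at once
    }
  ' "$infile"
}
```

For example, split_by_type /data/hl7/mdm_2_load.txt /data/hl7/by_type would write one file per record type into a by_type directory (that directory name is a placeholder). Because the files are opened in append mode, running it twice against the same output directory duplicates rows, so the directory should be emptied between runs.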
