
8 Encoding (92)

TBD (92.00001): Short Introduction to Encoding

ANSI/HL7 V2 XML, R2-2012

June 19, 2012

HL7 Version 2: XML Encoding Rules, Release 2 (revision of ANSI/HL7 V2 XML-2003 (R2010))

HL7 Version 2: XML Encoding Rules,

Release 2

HL7 Implementable Technology Specifications Work Group Cochairs:

Paul Knapp, Paul Knapp Consulting, Inc.

Dale Nelson, Lantana Consulting Group

Andy Stechishin, CANA Software & Services, Ltd.

Conformance & Guidance for Implementation/Testing Work Group Cochairs:

Wendy Huang, Canada Health Infoway Inc.

Frank Oemig, PhD, HL7 Germany

Robert Snelick, National Institute of Standards & Technology

Copyright © 2012 Health Level Seven International ® ALL RIGHTS RESERVED. The reproduction of this material in any form is strictly forbidden without the written permission of the publisher. HL7 International and Health Level Seven are registered trademarks of Health Level Seven International. Reg. U.S. Pat & TM Off.


HL7 licenses its standards and select IP free of charge. If you did not acquire a free license from HL7 for this document, you are not authorized to access or make any use of it. To obtain a free license, please visit

If you are the individual that obtained the license for this HL7 Standard, specification or other freely licensed work (in each and every instance "Specified Material"), the following describes the permitted uses of the Material.

A. HL7 INDIVIDUAL, STUDENT AND HEALTH PROFESSIONAL MEMBERS, who register and agree to the terms of HL7’s license, are authorized, without additional charge, to read, and to use Specified Material to develop and sell products and services that implement, but do not directly incorporate, the Specified Material in whole or in part without paying license fees to HL7. 

INDIVIDUAL, STUDENT AND HEALTH PROFESSIONAL MEMBERS wishing to incorporate additional items of Special Material in whole or part, into products and services, or to enjoy additional authorizations granted to HL7 ORGANIZATIONAL MEMBERS as noted below, must become ORGANIZATIONAL MEMBERS of HL7.

B. HL7 ORGANIZATION MEMBERS, who register and agree to the terms of HL7's License, are authorized, without additional charge, on a perpetual (except as provided for in the full license terms governing the Material), non-exclusive and worldwide basis, the right to (a) download, copy (for internal purposes only) and share this Material with your employees and consultants for study purposes, and (b) utilize the Material for the purpose of developing, making, having made, using, marketing, importing, offering to sell or license, and selling or licensing, and to otherwise distribute, Compliant Products, in all cases subject to the conditions set forth in this Agreement and any relevant patent and other intellectual property rights of third parties (which may include members of HL7). No other license, sublicense, or other rights of any kind are granted under this Agreement.

C. NON-MEMBERS, who register and agree to the terms of HL7’s IP policy for Specified Material, are authorized, without additional charge, to read and use the Specified Material for evaluating whether to implement, or in implementing, the Specified Material, and to use Specified Material to develop and sell products and services that implement, but do not directly incorporate, the Specified Material in whole or in part.

NON-MEMBERS wishing to incorporate additional items of Specified Material in whole or part, into products and services, or to enjoy the additional authorizations granted to HL7 ORGANIZATIONAL MEMBERS, as noted above, must become ORGANIZATIONAL MEMBERS of HL7.

Please see for the full license terms governing the Material.


Frank Oemig, PhD ( Healthcare GmbH, HL7 Germany

Paul Knapp ( Consulting Inc.

Andy Stechishin ( Software and Services

Dale Nelson ( LLC

Frank Oemig, PhD ( Healthcare GmbH, HL7 Germany

Robert Snelick ( Institute of Standard & Technology

Wendy Huang ( Health Infoway

Ioana Singueranu ( LLC

Implementable Technology Specifications (ITS)

This document supersedes Release 1 and contains additional specifications to accommodate new features introduced beginning with HL7 Version 2.3.1, e.g. the use of choices within message structures. At the time of this writing, the current version is v2.7. This document is valid for all v2.x versions that have passed ballot. Chapter 2 of HL7 Version 2.3.1 and 2.7 [rfHL7v231, rfHL7v27] specifies standard message structures (syntax) and content (semantics), i.e. the message definitions. It also specifies an interchange format and management rules, i.e. the encoding rules for HL7 message instances (see Figure 1). The objective of this document is to present alternate encoding rules for HL7 Version 2.3.1 to 2.7 messages (and a mechanism for determining alternate encoding rules for subsequent HL7 2.x versions) based on the Extensible Markup Language XML [rfXML], for use in environments where senders and receivers both understand XML.

It is not the intent of this document to replace the standard sequence-oriented encoding rules, which use “vertical bars” and other delimiters (so-called “vertical bar encoding”), but rather to provide an alternative way of encoding. The message definitions given in the Version 2.x standard are likewise untouched: this document does not modify the message definitions, only the way they are encoded. However, if you are going to use XML for version 2.x messages, this HL7 normative document describes how to do that.

In principle, many XML encodings could serve as alternate messaging syntaxes for HL7 Version 2.x messages. This document describes the one suggested and standardized by HL7. It primarily addresses the translation between standard encoded and XML encoded HL7 version 2.x, describing the underlying rules and principles. XML schema [rfXMLSchema] definitions are provided for all version 2.x message types. Due to their greater expressiveness, schemas are the preferred way to describe a set of constraints on message instances. The outdated Document Type Definitions (DTDs) are no longer addressed. The algorithms used for this specification to derive the database excerpts and to create schemas are also presented in the informative appendix.

This document is the normative successor of the first release (2003) and of the informative document “HL7 Recommendation: Using XML as a Supplementary Messaging Syntax for HL7 Version 2.3.1 - HL7 XML Special Interest Group, Informative Document” of February 2000 (rfINFO). The former document is replaced by this specification once this document is successfully balloted.

This document assumes a basic understanding of HL7 version 2. However, some background information has been included to aid those without version 2 experience.

This document is the second release of this specification and captures enhancements to the standard. As such, I wish to thank Kai Heitmann, who wrote the first release.

This standard is the result of about two years of intense work through e-mail, telephone conferences and meeting discussions. I wish to thank Bob Dolin and Paul Biron, who wrote the Informative Document.

This work was made possible by Frank Oemig, Lloyd McKenzie, Vassil Peytchev, Ralf Schweiger, Joachim Dudeck, and Wes Rishel. Valuable discussions came from James Case, Ivan Emelin, Susan Abernathy, Peter Rontey, Nick Radov, John Firl, Jennifer Puyenbroek, Chuck Meyer, Tim Barry, Jacub Valenta, Eliot Muir, Grahame Grieve, Koo Weng On, Andrew Hinchley, Dennis Janssen. Special thanks for his support to Tom de Jong.

Thanks also to all members of the ITS Work Group and the InM Work Group for their input during the development process.

General Knowledge

This specification assumes general knowledge of XML technology on the part of readers. Readers unfamiliar with XML may gain the requisite knowledge from the following standards:

Accompanying Material

Subject to technical corrections


The reader is reminded that both examples and XML schema fragments presented within the document for illustrating purposes are informative and do not form a part of the normative content.

9 Introduction (1)

9.1 Background (1.1)

In 1993, the European Committee for Standardization (CEN) studied several syntaxes (including ASN.1, ASTM, EDIFACT, EUCLIDES, and ODA) for interchange formats in healthcare (rfCEN). A subsequent report extended the CEN study to look at SGML (rfDolin1997). By using the same methodology, example scenarios, healthcare data model, and evaluation metrics, the report presented a direct comparison of SGML with the other syntaxes studied by CEN, and found SGML to compare favorably.

In February 1998, XML became a recommendation of the World Wide Web Consortium (W3C). XML was further tested as a messaging syntax for HL7 Version 2.x and Version 3 messages (rfDolin1998). In 1999, Wes Rishel coordinated a 10-vendor HL7-XML interoperability demonstration at the annual HIMSS Conference. All vendors rated the demo a success.

In 1999, the XML SIG, in cooperation with the Control/Query TC, developed the informative document “HL7 Recommendation: Using XML as a Supplementary Messaging Syntax for HL7 Version 2.3.1 - HL7 XML Special Interest Group, Informative Document”, which was approved as an HL7 Informative Document at membership level in February 2000.

In August 2000, at the HL7 Board Retreat meeting in Dresden (Germany), it was decided that XML would become the second normative encoding for versions 2.3.1 and 2.4 and future 2.x versions, i.e., an XML syntax that would be submitted for ANSI approval and that would have the same status as the traditional syntax. Another reason for a normative XML syntax is to support future Claims Attachment messages, which are currently using v2.4 encoding.

With v2.6 and v2.7, new concepts have been introduced into v2.x that require an enhancement of this specification.

For backward compatibility, this document stays with the original strategy for the representation of XML instances.

9.2 Benefits from Using XML as an Alternative v2 Interchange Format (1.2)

There are several benefits to using XML as an interchange format.

The ability to explicitly represent an HL7 requirement in XML confers the ability to parse and validate messages with any XML parser. Many “off-the-shelf” XML tools are available (freeware and commercial), such as parsers, transformation applications and instance viewers, which can perform much of the validation of message/document instances, so that applications don't have to. For the encoding part, personnel trained in XML are much easier to find than experts familiar with vertical bar encoding rules. Of course, explicit knowledge of the underlying semantic assumptions is still essential.

Frequently, a typical healthcare messaging application includes an in-house developed parser (message reader) and generator (message writer) to process traditional (“vertical bar” encoded) HL7 messages, with an almost certain negative impact on development and maintenance costs. The only alternative to in-house tool development, which quite often is not implemented correctly and completely, is to choose from among the limited and often expensive commercial tool sets. Increasingly, the traditional encoding contributes to the isolation of healthcare from the generic data interchange approaches used by other business areas. Adoption of generic, across-the-board message encoding will become critical for cost and error reduction as healthcare and other areas of business increase their daily interactions. Using XML message parsers and generators will help to prepare healthcare for this growing challenge of increasing data interchange commonality with other business areas.

Finally, an XML syntax for v2.x messages will also help vendors and providers transition from the HL7 Version 2 family of standards to Version 3 by encouraging the early retooling of applications to support XML interfaces.

9.3 XML representation derivation from HL7 Database (1.3)

The XML representation of HL7 messages presented here is algorithmically derived directly from the HL7 Database (see below). This avoids manual work, which is often susceptible to errors. Furthermore, deriving the XML representation algorithmically makes it easy to generate schemas for future HL7 v2.x versions.

Underlying the HL7 2.x messaging Standards is a Microsoft Access database (the "HL7 Database") that contains a copy of the official definitions of events, messages, segments, fields, data types, data type components, tables, and table values. The database is designed to have the same content as the paper-based standard documents and to accurately reflect what the membership voted on, including technical corrections.

This database arose as the German HL7 user group undertook a careful analysis of the standard. They became aware that the chapters of the standard had been developed by different groups, and that there had been no distinct rules or guidelines for the development of various parts of the standard. They therefore defined a comprehensive database of the HL7 Standard (including Version 2.1 through Version 2.7 for now) to allow consistency checks of items and to support the application of the standard by the user. All data were drawn from the normative standard documents, largely algorithmically and to a minor extent by hand.

Within the HL7 Database, all data added is checked for its consistency. Referential integrity among relations assures this consistency. The side effect of referential integrity is to modify the data from the standard documents because the standard is defined in the form of a document but not in the form of a relational database.

As a consequence, the database is not an identical equivalent to the standard, but the differences are documented and reflected as technical corrections and new proposals.

While developing the analytic object model for the definition of the comprehensive HL7 Database, the German HL7 user group became aware that two problems are not handled satisfactorily in the standard:

Further details of the HL7 Database as well as known problems encountered in the construction of the database have been documented by Frank Oemig et al. ([rfOemig1996], see also [rfOemig]). Most of the problems have been solved with newer releases of the v2.x standard in the meantime. However, the database has been constructed to maintain all versions and perhaps derivations thereof in parallel.

Ambiguities or errors in the standard are reflected “as is” in the XML encoding. Fixing any such errors in the XML will require making appropriate technical corrections to the HL7 Database. There have been many such fixes, both in the database and in the XML encoding since the last ballot cycle (committee level ballot). The procedures for deriving the schemas are described in the informative appendix.

It should be mentioned that the database itself or extracts of the database are not needed in order to implement or use the XML encoding of version 2 messages as described in this specification. The database and its excerpts are used for the schema creation process only. Implementers should be able to develop v2.xml interfaces having only the schemas and the printed version of both this specification and the HL7 standard. Implementers may also choose to hand-generate or adjust existing schemas to reflect localizations such as Z-segments.

9.4 Scope for HL7 Version 2 (1.4)

This specification presents XML encoding rules starting with HL7 Version 2.3.1 messages. Former versions of the HL7 Version 2 family of message standards are explicitly not covered, because a construct (MSH-9.3 - Message Structure) needed in this specification is not present in versions prior to v2.3.1. Therefore there is no XML encoding support for Versions prior to v2.3.1.

If a supplier claims conformance for V2 messages in XML, the messages must be valid against schemas produced from the HL7 specification by the rules in the v2.xml specification.

9.5 Version 2 Message Definitions (1.5)

9.5.1 Version 2 Hierarchical Message Structure Overview (1.5.1)

A specific HL7 version 2.x message is a hierarchical structure and is initiated by a trigger, representing a real world event. A message is the atomic unit of data transferred between systems and is comprised of a group of segments in a defined sequence. Messages begin with the Message Header Segment MSH and are identified by the message type and the initiating event. A three-character code contained within each message identifies its type. For example the ADT message type is used to transmit portions of a patient’s Admission, Discharge and Transfer (ADT) data from one system to another.

HL7 defines the content of the message as an abstract set of data elements contained in data segments. Segments are ordered sequences of fields and can be declared as required or optional and repeatable or non-repeating. Each segment begins with a three-character literal value that identifies it within a message (segment identifier). For example, an ADT message may contain the following segments: Message Header (MSH), Event Type (EVN), Patient ID (PID), and Patient Visit (PV1).

The semantic content of a message is transferred in the fields of the segment. Fields can be of variable length. Field contents can be required or optional, individual fields may be repeated. Individual data fields are found in the message by their position within their associated segments. Multi-component fields are used for further subdivision of a field and facilitate the transmission of locally related semantic contents.

For each field or field component, a data type is defined. Simple data types include string of characters, number, code etc. Complex data types are comprised of two or more components. Examples are the CE data type (coded elements), whose components are “coded value”, “code designator” and “code system”, and the XPN data type (extended person name), which has several components that are each comprised of several sub-components in order to express the various parts of a person’s name.

9.5.2 Abstract Message Syntax Definitions (1.5.2)

Each message is documented in the standard using a special notation that lists the segment IDs in the order they would appear in the message (see Figure 2). Braces, { ... }, indicate one or more repetitions of the enclosed group of segments. Of course, the group may contain only a single segment. Brackets, [ … ], show that the enclosed group of segments is optional. If a group of segments is optional and may repeat it should be enclosed in brackets and braces, { [ … ] }. Note that [{...}] and {[...]} are equivalent.

Groups with more than a single segment are handled in a special way in this specification (see section “2.4.1. Optional/Repeating Groups of Segments”), because they are named. Such segment group names are uppercase (e.g. “PROCEDURE”, “INSURANCE”) and do not contain spaces or other special characters.

The brackets and braces in the Abstract Message Syntax relate to XML occurrence indicators as shown in the following:

HL7 Abstract Message Syntax    Equivalent Cardinality in XML Schema (minOccurs .. maxOccurs)
[ ]                            0 .. 1
{ }                            1 .. unbounded
{[ ]} = [{ }]                  0 .. unbounded
<, > and |                     1 .. 1 (complexType Choice)
no bracket or brace            1 .. 1
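The optional/repeating rows of this mapping can be read as a small lookup. The following Python sketch is illustrative only; the function name `occurrence` and its two boolean parameters are assumptions introduced here and are not part of the specification. Choice groups are not covered.

```python
def occurrence(optional: bool, repeating: bool) -> tuple[str, str]:
    """Return (minOccurs, maxOccurs) for a segment or segment group.

    optional  -- the item is enclosed in brackets [ ... ]
    repeating -- the item is enclosed in braces { ... }
    """
    min_occurs = "0" if optional else "1"
    max_occurs = "unbounded" if repeating else "1"
    return min_occurs, max_occurs


# [ ] only, { } only, both, and neither
for flags in [(True, False), (False, True), (True, True), (False, False)]:
    print(flags, occurrence(*flags))
```

Because [{...}] and {[...]} are equivalent, a single pair of flags is enough to describe both spellings.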

10 Specification (2)

10.1 Introduction to the XML Representation (2.1)

The XML encoding rules specified here represent HL7 message structures as XML elements. Message structures contain segments, also represented as XML elements. Segments contain fields, again represented as XML elements. A field's data type is stored as a fixed attribute in the field's attribute list, while a field's content model contains the data type components. Other fixed attributes are used to expand abbreviations and indicate HL7 Table value restrictions. In addition, the XML schema annotation mechanism is used to provide the same information as is represented in the fixed attributes of field and data type definitions (please refer to sections “2.5. Fields” and “2.6. Data Types” for details).

10.2 A First Example (2.2)

Here a simple message in the syntax of the standard encoding rules can be seen:

Here is the same message in the syntax of the recommended XML encoding rules:

As is always the case with XML when processed with a validating processor, the extra white space between elements and line breaks (provided to make the message easier for people to read) can be removed in actual message instances, resulting in shorter messages in situations when overall message length is a factor.
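As a rough illustration of removing such layout white space, the following Python sketch drops whitespace-only text between child elements using the standard library's `xml.etree.ElementTree`. The helper name `strip_layout_whitespace` and the tiny instance are assumptions made for this example; the namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def strip_layout_whitespace(elem: ET.Element) -> None:
    """Remove whitespace-only text that sits between child elements."""
    # Only elements that contain child elements are touched, so the text
    # content of leaf elements (the actual data) is never changed.
    if len(elem) and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_layout_whitespace(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None


# Hypothetical, indented fragment used only for illustration.
pretty = "<ADT_A01>\n  <MSH>\n    <MSH.1>|</MSH.1>\n  </MSH>\n</ADT_A01>"
root = ET.fromstring(pretty)
strip_layout_whitespace(root)
compact = ET.tostring(root, encoding="unicode")
print(compact)
```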

The next section describes the stepwise creation of the XML representation.

10.3 Message Identification and Trigger Events (2.3)

A key role is played by the Message Type that is defined in the abstract message definition of a message and also given in the MSH-9 field of the message header segment. This field contains the Message Type, Trigger Event, and the Message Structure ID for the message.

All mentioned tables are defined in chapter 2 or 2C of the standard.

10.3.1 Message Structure IDs (2.3.1)

In the 2.x standard(s), many trigger events share the same abstract message syntax. This fact became standardized in v2.3.1 and was introduced in the form of the Message Structure component of MSH-9 (component 3).

The v2.xml schemas (see also section “3.1.2. List of Schemas”) are based on the described message structure ID. Looking at message definitions in 2.3.1 and later, the abstract message definition (see example a in Figure 3) and the MSH-9 field (see example b in Figure 3) contain the message type, trigger event, and the message structure ID for the message, e.g., ADT^A04^ADT_A01. This indicates that the ADT message with trigger event A04 has the message structure ID ADT_A01 (i.e., it has the same sequence and cardinality of segments). All messages with that structure ID are structurally the same, though they differ in the semantics of the event (A04 in the example case). In detail, message structure code ADT_A01 describes the single abstract message structure used by the trigger events A01, A04, A05, A08, A13, A14, A28 and A31.

As a consequence, encoding an A04 message, which has the ADT_A01 message structure, requires using the schema definition for the ADT_A01 message. The standard documents contain tables where the message structure IDs are listed (see section “3.1.1. List of Messages With Equal Message Structures”).
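A minimal Python sketch of this lookup follows. It covers only the ADT_A01 events listed above. The function name `structure_id`, the table `STRUCTURE_BY_EVENT`, and the fallback to a lookup when MSH-9.3 is absent are assumptions made for illustration, not behavior prescribed by this specification.

```python
# Trigger events that share the ADT_A01 message structure, as listed above.
ADT_A01_EVENTS = ("A01", "A04", "A05", "A08", "A13", "A14", "A28", "A31")

# Hypothetical lookup table: (message type, trigger event) -> structure ID.
STRUCTURE_BY_EVENT = {("ADT", event): "ADT_A01" for event in ADT_A01_EVENTS}


def structure_id(msh9: str, component_separator: str = "^") -> str:
    """Return the message structure ID for a traditionally encoded MSH-9."""
    components = msh9.split(component_separator)
    # Prefer the explicit third component (MSH-9.3) when it is present.
    if len(components) >= 3 and components[2]:
        return components[2]
    if len(components) < 2:
        raise ValueError("MSH-9 needs at least message type and trigger event")
    # Assumed fallback: derive the structure from type and event.
    return STRUCTURE_BY_EVENT[(components[0], components[1])]


print(structure_id("ADT^A04^ADT_A01"))
print(structure_id("ADT^A08"))
```

The returned value identifies which schema definition to validate against.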

The message structure ID is used as the root element for the XML instance documents. As an example the corresponding XML message fragment is shown below. The root element carries the segment elements (see following section) as child elements.

10.4 Segments (2.4)

Message structures contain segments, also represented as XML elements. Segments are ordered sequences of fields. Each segment begins with a three-character literal value that identifies it within a message (segment identifier). The v2.xml schema definition uses the segment identifier as the XML element name. An MSH segment, for example, has MSH as its XML element name, a PID segment has PID, etc.

Considering the ADT_A04 example above, the corresponding XML message fragment is shown below. Each segment element carries the corresponding field elements (see following section) as child elements.
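The relationship between the root element and its segment elements can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `segment_skeleton` and the message content are hypothetical, segments are assumed to be separated by carriage returns as in the traditional encoding, field contents are not translated, and the namespace declaration and segment groups are left out.

```python
import xml.etree.ElementTree as ET


def segment_skeleton(structure_id: str, message: str) -> ET.Element:
    """Build the root element with one empty child element per segment."""
    root = ET.Element(structure_id)
    for segment in message.split("\r"):
        if segment:
            # The first three characters are the segment identifier.
            ET.SubElement(root, segment[:3])
    return root


# Hypothetical, heavily shortened message for illustration only.
message = "MSH|^~\\&|SENDER\rEVN|A04\rPID|1\rPV1|1"
root = segment_skeleton("ADT_A01", message)
print([child.tag for child in root])
```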

10.4.1 Optional/Repeating Groups of Segments (2.4.1)

Some segments are grouped by braces { ... } or brackets [ … ] to denote repetitions or optionality of the segment(s). If a group of segments is optional and may repeat it is enclosed in brackets and braces, { [ … ] }, where [{...}] and {[...]} are equivalent. Upon further consideration by both users and implementers alike (per the clarifying reasoning given below), it has become increasingly persuasive to deepen the XML element hierarchy by the addition of grouping elements.

Groups containing more than a single segment are thus handled in a special way in this specification. For example in Figure 4, a group is denoted by [{ PR1 [{ ROL }] }]. This group is named “PROCEDURE” (see 2nd column in Figure 4 containing “--- group_name begin/end”). Another example is the [{ IN1 [ IN2 ] [{ IN3 }] [{ ROL }] }] group which is named “INSURANCE”. These names also appear in the v2.xml schema definitions of the corresponding messages and thus have to appear also in an XML message instance containing messages of that type, i.e. groups of segments are surrounded with their own tags.

There was no explicit way to express these groups in the traditional v2 “vertical bar” encoding of messages. Introduction of the explicit segment group names marks a major difference between vertical bar and XML encoding. Furthermore, this allows elements to be accessed in a reasonable manner within an XPath expression (see [rfXPATH]). In this way, an application can refer to specific XML items explicitly by name (e.g. ADT_A01/PROCEDURE/PR1.3/CNE.1) or by position (e.g. ADT_A01/PROCEDURE/PR1.3/*[position()=1]). With the latter approach, one no longer has to know the name of the field, data element or data type. See also the section on data types.
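The two access styles can be tried out with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. In the sketch below, access by name uses a path expression and access by position uses child indexing instead of a `position()` predicate. The instance fragment, its values, the group element name prefixed with the message structure ID, and the explicit PR1 segment level are assumptions for illustration; the namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free instance fragment for illustration only.
doc = ET.fromstring(
    "<ADT_A01>"
    "<ADT_A01.PROCEDURE><PR1><PR1.3>"
    "<CNE.1>0001</CNE.1><CNE.2>Example procedure</CNE.2>"
    "</PR1.3></PR1></ADT_A01.PROCEDURE>"
    "</ADT_A01>"
)

# Locate the field inside the named segment group.
field = doc.find("ADT_A01.PROCEDURE/PR1/PR1.3")

# Access by name: the caller must know the data type (CNE).
by_name = field.find("CNE.1")

# Access by position: the first child, whatever its element name is.
by_position = field[0]

print(by_name.text, by_position.tag)
```

Access by position keeps working if the component elements carry a different data type name, which is the point made above.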

Segment group names are uppercase. In almost all cases the names convey the semantics carried by the group of segments itself, for example INx segments are bundled as the “INSURANCE” group, PV1 and PV2 segments are bundled as the “VISIT” group, etc.

Please note: The narrative segment group names used by this specification appear in neither the paper version of v2.3.1 nor that of v2.4. They are drawn from the v2.5 specification.

ADT^A01^ADT_A01: ADT Message

About 400 different groups of that kind could be identified in the standard. Some of the groups have identical content concerning segment sequence; some of the contained segments, however, have different cardinalities. As an example, the group “INSURANCE” could be found in ADR_A19, ADR_A01, ADR_A05, ADR_A06 etc., but the single segments IN1, IN2 etc. have different cardinalities within these groups. Consequently, the v2.xml XML schema segment group naming convention has adopted the use of the owning message structure ID as a prefix for the group name to ensure uniqueness in regard to content.
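The following Python sketch shows how flat PR1 and ROL segments might be wrapped into prefixed PROCEDURE group elements. It is a simplification under stated assumptions: the function name `wrap_procedure_groups` is hypothetical, a "." is assumed as the separator between the message structure ID and the group name, a new group is assumed to start at every PR1, and only the PROCEDURE group is handled.

```python
import xml.etree.ElementTree as ET


def wrap_procedure_groups(structure_id: str, segment_ids: list[str]) -> ET.Element:
    """Nest PR1 [{ ROL }] runs inside <structure_id>.PROCEDURE elements."""
    root = ET.Element(structure_id)
    group = None  # currently open PROCEDURE group, if any
    for seg_id in segment_ids:
        if seg_id == "PR1":
            # Each PR1 opens a new repetition of the group.
            group = ET.SubElement(root, f"{structure_id}.PROCEDURE")
            ET.SubElement(group, seg_id)
        elif seg_id == "ROL" and group is not None:
            # A ROL directly following a PR1 belongs to that group.
            ET.SubElement(group, seg_id)
        else:
            # Any other segment closes the group and stays at top level.
            group = None
            ET.SubElement(root, seg_id)
    return root


root = wrap_procedure_groups(
    "ADT_A01", ["MSH", "EVN", "PID", "PV1", "PR1", "ROL", "PR1"]
)
print([child.tag for child in root])
```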

Considering the ADT_A04 example above, the corresponding XML message fragment with groups is shown below.

The corresponding schema definition fragment for the ADT_A01 message is shown below.

As an example, the corresponding schema definition fragment for the EVN segment is shown below. Please note that, consistent with the processing rule for v2 whereby receivers are to ignore fields not expected, the schemas will also allow additional elements at the end of a segment.

10.4.2 Choice Groups of Segments (2.4.2)

Another way of grouping segments is by a choice, i.e. here a decision has to be made which set of segments should be conveyed in a message: This is indicated by angle brackets: < and >. The different options for a choice are then separated by a vertical bar.

Please note that this vertical bar is independent of the vertical bar used as the standard field delimiter in the conventional encoding.

As an example, the corresponding schema definition fragment for the CLINICAL_HISTORY_OBJECT choice group is shown below.

If more than a single segment is to be used, subgroups must be created.

10.5 Fields (2.5)

The semantic content of a message is transferred in the fields of the segment. Field contents can be required or optional. Individual fields may be repeated. Individual data fields are found in the message by their position within their associated segments and are described in segment tables (see Figure 5 as an example).

Multi-component fields are used for further subdivision of a field and facilitate the transmission of locally related semantic contents.

In the v2.xml specification, individual fields are represented by the three-character literal segment ID of the corresponding segment plus their individual position within the segment (sequence). The first field (Event Type Code) in segment EVN, for example, is named EVN.1, the second EVN.2, etc. An example of an EVN segment, traditionally encoded and v2.xml encoded, is shown below. Please note that the EVN encoding contains time stamp representations (TS) in EVN.2, EVN.3 and EVN.6, which are not primitive but composite data types and which are expressed in the way described in detail in section “2.6.2. Composite Data Types”.

In the traditional sequence-oriented approach, empty fields (containing no data) are denoted as two vertical bars “||” in sequence to express the empty contents. This is essential in sequence-oriented approaches. In v2.xml an element with no contents can simply be omitted (unless explicit use of the "" is required to force a data delete action by the receiving application, see section “2.7.6. Delete Indicators, Empty Values”). In the example above there is no information for EVN.5, thus the element is omitted in the corresponding XML instance.
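The naming and omission rules can be sketched in Python as follows. The sketch is deliberately limited: the function name `encode_primitive_fields` and the sample segment are hypothetical, only fields with primitive data types are handled (composite fields need the nesting described in the data types section), field repetitions are ignored, and the MSH segment, whose first field is the field separator itself, is out of scope.

```python
import xml.etree.ElementTree as ET


def encode_primitive_fields(segment: str, field_separator: str = "|") -> ET.Element:
    """Turn one traditionally encoded segment into a segment element."""
    segment_id, *values = segment.split(field_separator)
    elem = ET.Element(segment_id)
    for position, value in enumerate(values, start=1):
        if value == "":
            continue  # empty field: no element is written
        # Element name is the segment ID plus the field position.
        ET.SubElement(elem, f"{segment_id}.{position}").text = value
    return elem


# Hypothetical segment that populates only EVN.1 and EVN.4.
evn = encode_primitive_fields("EVN|A04|||01")
evn_xml = ET.tostring(evn, encoding="unicode")
print(evn_xml)
```

A delete indicator ("") is a non-empty value in this sketch and is therefore written out as element content rather than omitted.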

The content model of each field is a reference to the field’s data type. In the XML schemas, the field’s item number, table reference, long name, and data type are provided by the annotation mechanism; in addition, a documentation tag is given containing the long name of the field (the language is also defined by the standard xml:lang attribute) as specified in the standard. In addition, the same information is provided as fixed attributes.

The example below shows the XML schema definition of the EVN.1 field element along with its annotations.

If a receiver receives an XML instance that is validated against the schema, the receiving parser can make use of the information that is provided in the annotations appinfo (application information) and documentation (user information) element content of the underlying schema.

The constraints on minLength and maxLength allow for a schema-based validation. Nevertheless, providing this information as attributes in message instances may be useful as well. Therefore, both options are valid. However, for backward compatibility reasons the new constraints should not be used with HL7 versions prior to 2.7, although this is a matter of negotiation between trading partners.

These rules also apply for data types.

10.6 Data Types (2.6)

For each field or field component, a data type is defined. Some data types are primitive, i. e. they have no components. Composite data types are comprised of data type components, which, like fields, have a data type of their own and a long name. Some data type components also specify an HL7 Table that contains enumerated values for use in the component.

10.6.1 Primitive Data Types (2.6.1)

Some data types are primitive, in which case they have no components. Simple data types are for example string of characters, date etc.

A field for which a primitive data type is defined simply contains the information without additional nesting or hierarchy. As an example the 4th field of the EVN segment (see Figure 5) is of type IS (a value drawn from an HL7 defined table), which is a primitive data type. The corresponding XML instance fragment looks like the following example:

The v2.xml schema definitions define all primitive data types as “string” (XML schema).

10.6.2 Composite Data Types (2.6.2)

Complex data types are comprised of two or more components. As an example, consider the CNE data type (coded elements), whose components are “identifier”, “text”, “name of coding system”, etc. The standard defines the individual components of the composite data types in chapter 2 (see example below, alternatively the table shown in Figure 6, not presented in the standard, but used for later reference in this specification).

Analogous to field components, data types components are modeled by specifying the data structure name plus their individual position within the data type component (sequence). As an example, the first component of data type CNE is defined as CNE.1, the second as CNE.2 and so on. This allows individual access to any of the components of a composite data type. The following example shows a CNE data type encoded traditionally (“vertical bar”) and as v2.xml fragment.

Also, empty components may be omitted in the v2.xml encoding, whereas the traditional encoding must represent an empty component by two component delimiters in sequence (“^^”) in order to preserve the sequence.
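The positional naming and the omission of empty components can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the specification: the field element name FLD.1 and the sample value are hypothetical, subcomponents are not handled, and the namespace declaration that a real v2.xml instance would carry is left out.

```python
import xml.etree.ElementTree as ET


def composite_to_xml(field_tag, data_type, er7_value, component_sep="^"):
    """Build a field element with one <TYPE.n> child per non-empty component."""
    field = ET.Element(field_tag)
    for position, component in enumerate(er7_value.split(component_sep), start=1):
        if component == "":
            # Empty components may be omitted; the position lives in the tag name.
            continue
        child = ET.SubElement(field, f"{data_type}.{position}")
        child.text = component
    return field


# Hypothetical field FLD.1 of type CNE; the second component is empty.
field = composite_to_xml("FLD.1", "CNE", "1234^^LN")
fragment = ET.tostring(field, encoding="unicode")
print(fragment)
```

Because the position is carried by the element name (CNE.1, CNE.3), the empty second component needs no placeholder in the XML form.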

Where a field has a data type with multiple components but only a single component is populated with information, the corresponding data type element of the component may not be omitted.

Consider a field of type CWE that carries information in the first component only (i. e. the identifier of a coded element). The correct v2.xml encoding is shown in the following example with an OBX.3 field:



Data type components of composite data types are modeled similarly to fields. The content model of each component contains a reference to the component's data type. An annotation mechanism is used to express the component’s data type, long name, and table, as shown here for CNE.1. In addition, the same information is provided as fixed attributes.

10.6.3 Wildcard (2.6.3)

In the HL7 Standard, a few data types (components) specify “WILDCARD” or “varies” in order to express an undefined type (data of any type can be specified). This data type is modeled to be any HL7 data type.

10.6.4 CM Data Types (2.6.4)

A special data type, the CM data type, was used up to and including HL7 v2.3.1 to express that the explicit data type of the content is undefined (i. e. a type=”anyType” in XML Schema definitions). Use of this data type was deprecated beginning with v2.4 in order to allow more restricted conformance testing (no new fields would use the CM data type).

In v2.5, new data types were created for (and applied to) all existing fields/components using the CM data type. An addendum, for XML encoding, was applied to HL7 v2.3.1 and v2.4, where these renamed data types are listed. These corrected names must be used when encoding CM data types with XML.

10.7 Processing Rules for v2.xml Messages (2.7)

10.7.1 XML Application Processing Rules (2.7.1)

The original and enhanced processing rules described in chapter 2 of the v2.x standard are not affected by this specification. However, concerning the exchange of XML messages between sender and receiver, additional assumptions are made in terms of “well-formed” and “valid” XML instance documents.

The sender of a v2.xml XML message is required to create both well-formed and valid message instances. The instances created should be valid against the corresponding XML schema definitions (see section ”3.1.2. List of Schemas”). However, this does not necessarily imply validation of the transaction at run time. The decision to do so and incur associated overhead should be made on a site-by-site basis or on interface development status.

The receiver who accepts a v2.xml XML message is required to check the well-formedness of the XML instance. The receiver may (but is not required to) validate the message against the schema.
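A minimal sketch of the mandatory well-formedness check, using the Python standard library parser. The function name is hypothetical, and schema validation (the optional step) is deliberately not shown because the standard library does not provide it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<ADT_A01><MSH/></ADT_A01>"))      # properly nested
print(is_well_formed("<ADT_A01><MSH></ADT_A01>"))       # MSH is never closed
```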

10.7.2 Inter-version Backward Compatibility (2.7.2)

The vertical bar encoding provides a certain amount of backward compatibility between versions of the v2.x world. For example, there is no difficulty changing data types to new data types that are backward compatible (as an example, IS to CE data type), or converting a repeating/optional segment into a repeating/optional group of segments. This helps ensure inter-version compatibility. Because the XML encoding makes explicit use of constructs touched by the changes mentioned above, inter-version backward compatibility is not a given. For example, if a data type for a field is changed from IS (primitive) to CE (composite), the CE composite data type introduces its own tags, in other words, the former IS field now has child elements drawn from the composite CE data type.

However, it should be easy to achieve XML transformations from an XML instance for one version to another using corresponding transformation rules or style sheets (which are not provided here).
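As an illustration of such a transformation rule, the following Python sketch moves the content of a formerly primitive field into the first component of a composite data type. It is an assumption-laden example only: the function name is hypothetical, EVN.4 and CE are used merely as sample names, and a real migration would also have to deal with namespaces and with the remaining components.

```python
import xml.etree.ElementTree as ET


def primitive_to_composite(field, composite_type):
    """Wrap the text of a primitive field in a <TYPE.1> child element."""
    if len(field) == 0 and field.text:
        first = ET.SubElement(field, f"{composite_type}.1")
        first.text = field.text
        field.text = None
    return field


old_field = ET.fromstring("<EVN.4>01</EVN.4>")
upgraded = ET.tostring(primitive_to_composite(old_field, "CE"), encoding="unicode")
print(upgraded)
```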

10.7.3 Message Fragmentation and Continuation (2.7.3)

Sometimes, implementation limitations require that large messages or segments be broken into manageable chunks. Message fragmentation and continuation as described in chapter 2 of the v2.x standard is not supported in this v2.xml specification. It is assumed that XML-aware systems, for which this specification is written, are able to accept character stream messages of arbitrary length, i. e. several hundred kilobytes of information or more at once.

10.7.4 Batch Messages (2.7.4)

There are instances when it is convenient to transfer a batch of HL7 messages. Common examples would be a batch of financial posting detail transactions (DFTs) sent from an ancillary to a financial system, a backload of persons, admissions (ADT message batch for initial patient backload), employees, and master files.

Chapter 2 of the standard defines such a mechanism to wrap multiple valid HL7 messages by wrapping control segments in order to form a batch of messages. For that purpose specific file and batch header and trailer control segments FHS, FTS, BHS, BTS are defined.

In the XML encoding, it is also possible to wrap multiple messages with the corresponding control segments. The definitions can be found in the messages schema (batch.xsd). For queries, the QPD segment needs to be defined differently from one query to another. The only way to support batches of queries (e. g. for non-time-critical processing) or responses is therefore to wrap the contents of the batch tags as CDATA. This approach has been used for the general definition of the batch message “payload”, regardless of whether it contains query segments or not.

For further information on batch messages refer to Chapter 2 and 5 of the standard.

10.7.5 Message Delimiters (2.7.5)

In constructing a traditionally encoded v2 message, certain special characters are used. They are the segment terminator, the field separator, the component separator, subcomponent separator, repetition separator, escape, and truncation (as of v2.7) character. The segment terminator is always a carriage return (in ASCII, a hex 0D). The other delimiters are defined in the MSH segment, with the field delimiter in the 4th character position, and the other delimiters occurring as in the field called Encoding Characters, which is the first field after the segment ID. The delimiter values used in the MSH segment are the delimiter values used throughout the entire message. In the absence of other considerations, HL7 recommends the suggested delimiter values.

At any given site, the subset of the possible delimiters may be limited by negotiations between applications. This implies that the receiving applications will use the agreed upon delimiters, as they appear in the Message Header Segment (MSH), to parse the message.

In the v2.xml encoding the message delimiter characters are likewise contained in the MSH.1 and MSH.2 elements of the MSH segment. Although the message delimiter characters are meaningless in the v2.xml encoding, they are represented as shown in the example fragment of the MSH segment, and they can be useful when translating from vertical bar to XML representation and vice versa. They must still be sent, because MSH.1 and MSH.2 are required fields in the v2.x standard. Please note that the special character “&” must be escaped in order to be included in an XML message instance (see also section “2.7.10. Special Characters in Schemas”).
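The escaping of the ampersand in MSH.2 can be sketched in Python as follows. The delimiter values are the commonly suggested ones and are used here only as sample data; the namespace declaration and all other MSH fields are omitted.

```python
import xml.etree.ElementTree as ET

# Build only the two delimiter fields of an MSH segment.
msh = ET.Element("MSH")
ET.SubElement(msh, "MSH.1").text = "|"
ET.SubElement(msh, "MSH.2").text = "^~\\&"

# The serializer escapes "&" as "&amp;" so the instance stays well-formed.
xml_text = ET.tostring(msh, encoding="unicode")
print(xml_text)

# Parsing the fragment again yields the original delimiter characters.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("MSH.2"))
```

Letting an XML library serialize the value avoids hand-written escaping and keeps the round trip lossless.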

10.7.6 Delete Indicators, Empty Values (2.7.6)

Where a sending system can ascertain that a data field has been deleted, then the two double quote marks ("") will be used to define the state of that data field. An encoded field with a value of two double quote marks ("") would instruct the receiving system to delete the contents in the database field.

If the state of a blank or null data field cannot be determined, the sending system will send the empty value or omit the element altogether. An encoded field with an empty value or a missing element instructs the receiving system to bypass processing and does not affect an already existing value in the corresponding receiving database.

The occurrence of an empty element is treated as not existing to keep backward compatibility with ER7.

The following example carries a delete indication in the data type component. Explicit empty (missing) values are expressed by empty (missing) element content. In the example, is omitted (empty).
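How a receiver might distinguish the three cases (delete indicator, empty or missing element, regular value) can be sketched in Python. The function name, the returned labels and the field element FLD.1 are hypothetical, and only primitive components are considered.

```python
import xml.etree.ElementTree as ET

DELETE_INDICATOR = '""'  # two double quote marks


def interpret_component(parent, tag):
    """Classify one component as 'delete', 'no_change' or 'update'."""
    elem = parent.find(tag)
    if elem is None or not (elem.text or ""):
        # Missing and empty elements are treated alike: leave stored data untouched.
        return ("no_change", None)
    if elem.text == DELETE_INDICATOR:
        return ("delete", None)
    return ("update", elem.text)


sample = ET.fromstring(
    '<FLD.1><CNE.1>1234</CNE.1><CNE.2>""</CNE.2><CNE.3/></FLD.1>'
)
for name in ("CNE.1", "CNE.2", "CNE.3", "CNE.4"):
    print(name, interpret_component(sample, name))
```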

10.7.7 Repetition of Segment Groups, Segments and Fields (2.7.7)

Repetition of segment groups, segments and fields is handled by repeating the appropriate tags. This also applies to fields that are comprised of composite data types. The following examples demonstrate correct repetition in the XML encoding.

example 1

example 2

example 3
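For a repeating primitive field, repeating the tag can be sketched in Python as shown here. This is only an illustration: the helper name is hypothetical, NTE.3 is used as a sample repeating field, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET


def add_repetitions(segment, field_tag, values):
    """Append one element with the same tag for every repetition."""
    for value in values:
        ET.SubElement(segment, field_tag).text = value
    return segment


nte = add_repetitions(ET.Element("NTE"), "NTE.3", ["first line", "second line"])
repeated = ET.tostring(nte, encoding="unicode")
print(repeated)
```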

10.7.8 Escape Character Sequences Used in v2 Data Types (2.7.8)

Chapter 2 of the v2.x standard specifies escape character sequences to be used in fields of certain types. When a field of data type TX, FT, or CF is being encoded, these escape characters may be used to signal certain special characteristics of portions of the text field. The escape character is whatever display ASCII character is specified in the escape character component of MSH-2-encoding characters.

For the XML encoding we must differentiate between data type associated escape characters (text formatting), structural escape sequences and character encoding / character set switching characters. They have to be handled differently when using v2.xml.

Text Formatting Escape Sequences

\H\ and \N\ are defined in chapter 2 of the standard as indicating begin and end highlight of text in a text field. In v2.xml these characters are replaced by corresponding XML elements that can easily be processed.

Escape characters as defined in the v2.x standard

An example, v2.x and below the corresponding v2.xml notation
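The replacement of highlighting escape sequences by XML elements can be sketched in Python as follows. The element name `escape` and its attribute `V` are assumptions made for this sketch, since the table defining the actual replacement elements is not reproduced here; the field tag OBX.5 and the function name are likewise only illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Matches \H\ (start highlighting) and \N\ (normal text) and captures the letter.
HIGHLIGHT = re.compile(r"\\(H|N)\\")


def formatted_text_to_xml(tag, er7_text):
    """Turn \\H\\ and \\N\\ sequences into empty marker elements (mixed content)."""
    elem = ET.Element(tag)
    parts = HIGHLIGHT.split(er7_text)  # text, code, text, code, text, ...
    elem.text = parts[0]
    for code, following in zip(parts[1::2], parts[2::2]):
        marker = ET.SubElement(elem, "escape", {"V": code})  # assumed element name
        marker.tail = following
    return elem


converted = formatted_text_to_xml("OBX.5", "Result is \\H\\abnormal\\N\\ today")
print(ET.tostring(converted, encoding="unicode"))
```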

There is also the possibility of specifying troff commands in text fields. They are escaped accordingly. The following table just shows examples and is not complete. Please refer to chapter 2 of the v2.x standard.

Escape characters as defined in the v2.x standard

An example:

is expressed in v2.xml as

Structural Escape Sequences

The escape character sequences \F\, \S\, \T\ and \R\ mentioned in chapter 2 of the v2.x standard, used to indicate the literal field, component, subcomponent and repetition separators, may not be used in the v2.xml encoding. They are superfluous because the XML does not make use of these structural separator characters.

Character References

The vertical bar character encoding mechanism using \Xxxx\ as a character reference, and \Zxxx\ to refer to a locally defined character reference is deprecated in the v2.xml encoding. Instead, the standard XML character reference mechanism for UTF-8 must be used. Even non-printable characters like form feed can be represented that way.

For locally defined character references outside that scope, the private area of Unicode should be addressed.

Character Set Switching

Character set switching as described in chapter 2 of the v2.x standard cannot be addressed in XML. XML has only a single character set - UCS/Unicode. Each XML entity must have a single encoding. An XML document can be made up of several entities, which each may have different encodings, but switching character sets in a single entity is thus not supported.

Escaping XML Markup

It is the responsibility of sending applications to escape all characters occurring in data that may be interpreted by an XML processor as being markup characters. Depending on context, these include all characters listed in the XML specification [rfXML]. For the receiving application, XML markup characters are normally handled by the XML parser.
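A minimal sketch of both sides in Python: the sender escapes markup characters in the data, and the receiver's parser restores them. The sample text and the NTE.3 tag are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "BP < 120 & HR > 60"

# Sender side: escape "&", "<" and ">" before placing the data in an element.
escaped = escape(raw)
fragment = f"<NTE.3>{escaped}</NTE.3>"
print(fragment)

# Receiver side: the XML parser turns the entities back into plain characters.
received = ET.fromstring(fragment).text
print(received)
```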

10.7.9 Message Building Rules (2.7.9)

The message building rules remain the same as described in chapter 2 of the standard. However, there are some exceptions if the v2.xml encoding is used.

A receiver who accepts a v2.xml XML message is required to check well-formedness of the XML instance and may (but is not required to) validate the message against the schemas. As described in chapter 2 of the standard, the receiver

but in terms of validating against the v2.xml schema definition, the cardinality of the components is determined by the v2.xml schema.

Please note that, in line with what the v2 processing rules say about additional content after the end of a segment, the schemas also allow any elements to follow after the end of a segment.

10.7.10 Special Characters in Schemas (2.7.10)

Certain characters within the HL7 Database must be “escaped” before inclusion in a schema. The ampersand is a reserved XML meta character.

Where an ampersand occurs in the long name of a field, it is converted to the XML entity representation “&amp;”. An example is “Critical Range for Ordinal & Continuous Obs”, which becomes “Critical Range for Ordinal &amp; Continuous Obs”.

Because the schema wraps the value of the attribute LongName in single quotes, a single quote occurring in the long name of a field is converted to the XML entity representation “&apos;”, e. g. “Contact's Tel. Number” becomes “Contact&apos;s Tel. Number”.

The same rules apply to XML message instances.
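The two conversions can be sketched with the Python standard library. The helper name is hypothetical; the schemas themselves are generated by other tooling, so this only illustrates the escaping rule for long names.

```python
from xml.sax.saxutils import escape


def schema_long_name(long_name):
    """Escape a long name for use inside a single-quoted attribute value."""
    # escape() handles "&", "<" and ">"; the extra mapping covers the single quote.
    return escape(long_name, {"'": "&apos;"})


ampersand_name = schema_long_name("Critical Range for Ordinal & Continuous Obs")
quote_name = schema_long_name("Contact's Tel. Number")
print(ampersand_name)
print(quote_name)
```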

Please note that the spelling and capitalization of all tags in the XML encoding must be the same as defined in the HL7 database (see section “1.3. XML representation derivation from HL7 Database”). Please refer to the schemas, which reflect these rules.

10.8 Translating Between Standard Encoding and XML Encoding (2.8)

In environments where not all senders and receivers understand this XML encoding it may be necessary to translate instance messages between the standard encoding and this XML encoding and vice versa. This recommendation does not require that any such translations be supported nor does it prescribe how such transformations should be performed in environments where they are supported.

Because of several important differences between the standard encoding and this XML encoding, translations between the two encodings are not entirely straightforward, although they are not hard. The issues described in section “2.7. Processing Rules for v2.xml Messages” need to be taken into account when performing the translations.
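One of these issues, restoring component positions when going from XML back to the vertical bar encoding, can be sketched in Python. This is a sketch under simplifying assumptions, not a prescribed method: the function name and the field element FLD.1 are hypothetical, and subcomponents, repetitions and escape sequences are ignored.

```python
import xml.etree.ElementTree as ET


def field_to_er7(field, component_sep="^"):
    """Serialize one field element to its vertical bar component string."""
    children = list(field)
    if not children:
        return field.text or ""  # primitive field: just the text
    # The component position is the number after the last dot of the tag name.
    by_position = {
        int(child.tag.rsplit(".", 1)[1]): (child.text or "") for child in children
    }
    last = max(by_position)
    # Omitted components reappear as empty strings between delimiters.
    return component_sep.join(by_position.get(i, "") for i in range(1, last + 1))


xml_field = ET.fromstring("<FLD.1><CNE.1>1234</CNE.1><CNE.3>LN</CNE.3></FLD.1>")
er7_value = field_to_er7(xml_field)
print(er7_value)
```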

11 Appendix (3)

11.1 Normative Appendix (3.1)

11.1.1 List of Messages With Equal Message Structures (3.1.1)

As previously mentioned, the v2.xml schemas are based on the message structure ID - a concept introduced in version 2.3.1. The standard documents contain tables with the message structure IDs.



HL7 Table

Message structure table 0354




11.1.2 List of Schemas (3.1.2)

This specification provides the set of constraint definition files:

as shown by the following table. There is a set for each HL7 version supported by the v2.xml specification. In addition, HTML files are provided, one for each message structure, containing a short description of the message and links to the corresponding schemas (in directory xsd).

Please note that the use of XML schema ([rfXMLSchema], a W3C recommendation since May 2001) is recommended by HL7 for all normative specifications. The schemas are not part of the normative specification, but rather added as an informative appendix in order to support vendors with migration from DTDs to XML schemas.

It should be mentioned that DTDs can coexist in the same interface with schemas and not cause any issues. For example, the sending interface can implement XML messages using schemas and the receiving system using DTDs. However, schemas have a much greater expressiveness and should be preferred.

A set of many files in HTML format containing a short description of the message and links to the corresponding schemas.

A set of many schemas each containing the schema definitions for a specific message structure specified by MessageStructureID, for example ADT_A01.xsd contains the definitions for the ADT A01 message structure, ADT_A02.xsd for ADT A02 and so forth.

The schemas import the segments definitions.

schema for all segment definitions, imports fields definitions

schema for all field definitions, imports data type definitions

schema for all data type definitions for v2

schema containing definition of batch messages (refer to section ”2.7.4. Batch Messages”)

schema containing all message definitions together

An XML instance of a specific message should refer to the corresponding schema. The following examples show a schema reference within a v2.xml XML message instance fragment. In both cases is the root element of the instance.

11.1.3 Localization of messages (3.1.3)

The HL7 Standard describes the responsibilities for parties sending and receiving HL7 messages (see also section “2.7.1. XML Application Processing Rules” of this specification). These responsibilities enable exchange of messages that contain localizations (or local variations or Z-segments, Z-fields etc.). The v2.xml specification attempts to provide full support for local encodings (see section ”Localization Issues”). Examples to follow show how to introduce local variations. This has to be done in concordance to the v2.x standard itself, i. e. where localizations are appropriate and allowed.

The mechanism shown here is a non-normative recommendation.

Schema Localization

The schemas provided in this specification can be localized. This is done by redefinitions of the existing definitions.

A corresponding instance message fragment would look like this:

Expanding the FOO message by adding a local Z segment (with own field definitions not mentioned in detail here), let’s say the new content model should be ABC, DEF, ZZZ, GHI, JKL. To achieve this, the entity of the content model describing FOO can be changed in the internal subset like this:

A redefinition containing the new localized content of the FOO content model can then be made on a copy of the schema definition.

Now, the FOO message is redefined by the local modification. The copy of the schema will be used instead of the original version. A new root element can be defined called FOO.LOCAL that serves as a localized version of the original FOO root element.

The corresponding local message instance would look like this:

Please note that it is good practice to separate local content from standard content by using a different namespace. It is therefore recommended to associate Z-segments, Z-fields and other local extensions with another namespace.

11.2 Informative Appendix (3.2)

11.2.1 Design Considerations (3.2.1)

As noted above, there are many possible XML representations of HL7 messages. This section describes those factors considered in deciding on the particular approach presented in this specification.

XML Schema Optimization

XML schema optimization means balancing functional, technical, and practical requirements. Some metrics are fairly straightforward to quantify (e. g. message length), while others are less so. There is a risk that the easily quantifiable measurements will assume significance out of proportion to other metrics. All relevant metrics must be factored together in the determination of the optimal XML representation.

Message Length

Message length minimization techniques are employed to decrease the total number of characters (including data and/or markup) comprising a message. The optimal techniques used to minimize SGML messages are not necessarily the same as those best suited to minimize XML messages. Techniques used here common to both SGML and XML include the use of abbreviations. In some cases modeling components as XML attributes as opposed to elements could result in further minimization. This specification represents HL7 message structures, segments, fields, components and subcomponents as XML elements. A field's data type is represented as a fixed attribute, while data type components are represented as XML elements. Full SGML provides even greater minimization capacity with the use of SHORTTAG, OMITTAG, and SHORTREF techniques, resulting in very small messages that are not valid XML, and are therefore not employed here.

The greater the percentage of data characters (as opposed to markup characters) in an average message, the less important any additional overhead imposed by changing from the standard HL7 encoding rules to XML becomes. Data from the Duke University Medical Center (DUMC) HL7 production environment suggests that on average, for standard ‘vertical bar encoding’, data characters comprise about 70% of overall message length. (Data from DUMC courtesy of Al Stone, and posted to the HL7 SGML/XML SIG List Server 1998-01-15 and 1998-01-16.) The XML encoding recommended here will result in messages that are approximately five to ten times longer, although this estimate has neither been subjected to rigorous testing nor been officially published.

Message length is an issue for bandwidth requirements but also for long-term archiving of the original messages (as done e. g. by some healthcare providers). Compression is considered a solution to both the bandwidth and the archiving issues. With appropriate compression algorithms, XML instances compress very well, for example because start and end tags consist of almost the same sequence of characters. However, describing compression methods is out of scope of this specification.

An example: the large messages shown in section “3.2.4. Algorithms” show 1,426 bytes for the vertical bar encoding and 6,442 bytes for the v2.xml encoding (4.5 times larger). After compression the v2.xml message is 1,714 bytes long. That is about 20% larger than the uncompressed vertical bar variant.
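The effect of repeated tags on compression can be tried out with a few lines of Python. The message below is an artificial, highly repetitive sample built for this sketch, so its ratio says nothing about the figures quoted above; zlib is chosen only because it ships with the standard library.

```python
import zlib

# Artificial sample: the same small OBX fragment repeated many times.
segment = "<OBX><OBX.1>1</OBX.1><OBX.2>NM</OBX.2><OBX.5>98.6</OBX.5></OBX>"
message = ("<ORU_R01>" + segment * 50 + "</ORU_R01>").encode("utf-8")

compressed = zlib.compress(message)
ratio = len(compressed) / len(message)

print(len(message), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print(f"compressed size is {ratio:.1%} of the original")
```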

Furthermore, early v2.xml implementers have reported that performance can be gained (along with the use of less bandwidth) if large batch files are broken into many small batch files.

Structural Complexity

Krueger [rfKrueger] describes the use of “structural complexity” as a metric to analyze HL7 messages. “It would be nice to be able to estimate or compare the time needed by human users to understand or implement different messages or the time needed for a parsing program to analyze different messages.” The exact determinants of structural complexity were outside the scope of Krueger's work, although he comments that “empirical investigations must be carried out to monitor the effort users will take to understand and implement different HL7 messages”. Potential components of this metric are listed below. In some cases, the metric will be the time and/or space complexity required to carry out the functions. We agree with Krueger that “it does not make any sense to expect absolute results. However, relative (i. e. comparable) results could also be a valuable source of information.”

Localization Issues

The HL7 standard describes the responsibilities for parties sending and receiving HL7 messages (see also section “2.7.1. XML Application Processing Rules” of this specification). These responsibilities also enable exchange of messages that contain localizations (or referred to as local variations or Z-segments). Consequent to these requirements, an XML representation needs to fulfill the following design considerations:

The v2.xml schemas are crafted to fulfill these requirements. Please refer to section “3.1.3. Localization of messages” of the informative appendix for further information.

Conformance Testing of Other Business Rules

XML is a formal grammar that can be used to encode HL7 business rules. When an XML processor validates that a message is valid per its schema, it is also validating that a message is conformant to those HL7 rules that are explicitly represented in the XML schema.

Some HL7 rules are easy to explicitly represent within an XML schema, such as the optionality and repetition of a field within a segment.

You can carry HL7-valid messages in the constructs defined by this specification, but you can also carry a lot of HL7-invalid messages. An XML processor can't validate that a message received is a valid HL7 message. The decision in the XML representation presented here is to capture as many HL7 business rules as reasonably possible in terms of XML schemas. This includes enabling a validating parser to verify the optionality, repetition, and ordering of segments within messages and fields within segments, and the correct use of data types and their components within fields. Easing the burden on the application with regard to structural validity (e. g., are all the pieces in the proper place) is itself a big win, despite the fact that the application will still have to perform semantic validation (e. g., whether a code really is a valid SNOMED code, or whether other business rules are met).

Some actions that are supported in vertical bar encoding, such as the forward-adoption of new data types, cannot be handled by the XML encoding.

Automation Considerations

The XML representation of HL7 messages presented here is algorithmically derived directly from the HL7 Database. The algorithms used for this specification to derive the database excerpts and to create schemas are presented in this informative appendix.

The automatic creation process was chosen in order to avoid handcrafting of the schemas, which would have involved a certain danger of introducing errors. Furthermore, necessary refinement of definitions during the development process could be achieved much more easily.

For the second release, backward compatibility with the previous version of this specification should be guaranteed.

11.2.2 Extracting Subsets of the HL7 Database (3.2.2)

The following section describes the methods used for extracting information from the HL7 database that were necessary to generate the schemas.

Messages and Their Segments

HL7 Database Tables Used

The first table is just introduced in order to obtain the correct identifier for a specific variation of a version because the database is capable of maintaining different versions in parallel:

Field Name | Type | Description
 | long integer | version number
 | | HL7 version
 | | Release date of that version

The following HL7 Database table is used in the creation of the message schemas. (Only those fields being queried are shown. The field names and their descriptions are taken verbatim from the HL7 Database.)

Field Name | Type | Description
 | | Code of this event
 | long integer | version number
 | | consecutive increasing number used for 1:n relation
 | | Standard Message Type (Sender)
 | | Standard Message Type (Recipient)
 | | Message Structure (Sender)
 | | Message Structure (Recipient)
 | | Chapter in which this message is described

SQL Query

The following two queries are used to gather together message structures from table HL7EventMessageTypes:

Select Message Structures

From database table HL7MsgStructIDSegments (see below) those message structures were extracted from HL7MsgStructv2xml using the SQL query shown below.

Field Name | Type | Description
 | | Message Structure ID
 | long integer | version number
 | | consecutive increasing number used for each field within the segment
 | | String identifying the repetition of subsequent segments (logical embracement)

*The field names and their descriptions are taken verbatim from the HL7 Database.

This resulting table is exported to messages.txt. This file serves as input for the transformation algorithms (see also section “3.2.3. Options”).

Segments and Fields

HL7 Database Tables Used

The following HL7 Database tables are used in the creation of the segments and fields schema definitions. (Only those fields being queried are shown. The field names and their descriptions are taken verbatim from the HL7 Database.) They are used to generate Segments.xsd.

Field Name | Type | Description
 | | Code for the Segment
 | long integer | version number
 | | The name of the segment
 | | The German interpretation of the name
 | | Is this a segment being visible

Field Name | Type | Description
 | | Code of the Segment
 | long integer | version number
 | | Position within the segment
 | Long Integer | Data Element ID
 | | required/optional/backward compatibility
 | long integer | Number of repetitions

SQL Query

The first query selects all relevant segments:

The following SQL query extracts data from tables HL7SegmentDataElements and HL7DataElements:

Data Elements

HL7 Database Tables Used

The following HL7 Database tables are used in the creation of the data elements schema definitions. (Only those fields being queried are shown. The field names and their descriptions are taken verbatim from the HL7 Database.)

Field Name | Type | Description
 | Long Integer | ID of the Data Element
 | Long Integer | Version number
 | | Field description according to the standard documentation
 | | Name of the Data Structure
 | Long Integer | minimum length (new)
 | Long Integer | maximum length (new) - contains also the length of previous version
 | | conformance length
 | Long Integer | ID assigned table

SQL Query

The following SQL query extracts data from tables HL7SegmentDataElements and HL7DataElements to create Fields.xsd:

Data Types and Their Data Type Components

HL7 Database Tables Used

The following HL7 Database tables are used in the creation of data types schema definitions. (Only those fields being queried are shown. The field names and their descriptions are taken verbatim from the HL7 Database.)

Field Name | Type | Description
 | | logical data type
 | long integer | version number

Field Name | Type | Description
 | | logical data type
 | long integer | version number
 | long integer | consecutive increasing number used for 1:n relation
 | long integer | identifying number of the component
 | long integer | Number of assigned table if different from component (overwrites table number of component)

Field Name | Type | Description
 | Long Integer | Component Number (ID)
 | long integer | version number
 | Long Integer | reference to an assigned Table
 | | Data type

SQL Query

The following SQL query extracts data from tables DataStructures, DataStructureComponents, and Components:

11.2.3 Options (3.2.3)

The database contains a German interpretation of the different elements as well. These can be added to the schemas in the form of further annotations if needed:

11.2.4 Algorithms (3.2.4)

The mapping from HL7 Version 2.x into the XML specification v2.xml is a formal algorithm, driven from the HL7 Database described above.

A VB program generates the XML schema definitions and additional HTML files for further information. The structure of the generated schemas follows from the design considerations described above. The algorithms instantiated in this VB program are not described in detail here and are not part of this specification, but will be available with the HL7 database and on the HL7 website.

11.2.5 Examples (3.2.5)

Schema Fragments

These are actual fragments of the real schemas provided as illustrations. There is not enough of the schemas included here to allow for validation of the example messages. Messages will validate against the complete schema.

V2 and v2.xml Example Messages

Short Example

Long Example

11.3 References (3.3)

(intentionally left blank)