This meeting was held shortly after the award by NLM and AHCPR of a number of cooperative agreements for research and testing sites which will address issues related to implementation of electronic medical records. Some of the areas of interest to NLM and AHCPR include: using emerging standards related to computer-based patient records, linking patient records to other types of information relevant to health care decisions using the UMLS Knowledge Sources, and abstracting data from patient records for use in health services research. The purpose of the meeting was to identify a set of existing vocabularies suitable for testing in patient record systems by the cooperative agreement recipients and, if possible, by other organizations such as the VA. The meeting was also designed to advance the broader agenda of establishing a reasonable starting point for the development and maintenance of a "standard" vocabulary for use in computer-based patient records in the United States.
Dr. Donald Lindberg, Director, NLM, welcomed the attendees. He expressed enthusiasm for the Library's cooperation with AHCPR in supporting research and testing sites related to the patient record. NLM clearly does not have the expertise or resources to lead all aspects of computer-based patient record development. AHCPR has the broader experience and mandate in the field of patient data and practice guidelines. NLM's focus has been on vocabulary issues. A primary goal of NLM's Unified Medical Language System (UMLS) project is to map disparate biomedical terminologies so that patient records can be linked effectively to decision support tools, like practice guidelines, MEDLINE, etc. After a number of years of groundwork, the UMLS Metathesaurus is ready to become a vehicle for distribution of terminology needed for health care and health services research.
Dr. Clifton Gaus, Administrator, AHCPR, also welcomed participants. He said that the NIH and AHCPR were focused on different points on the continuum that goes from clinical research to practice/health care delivery to health services research. When data from computer-based patient records can be integrated and aggregated, much more useful health services research will result. As we move away from encounter-based, fee-based health care to managed care with capitation payments, claims systems should be replaced by computer-based patient record systems. AHCPR is interested in computer-based patient records as a source of detailed and aggregatable data that can be used as a database for research on the quality and cost-effectiveness of health services. Health plans will need the same kind of data to monitor quality of care and to manage the production of cost-effective care. Current administrative systems do not describe what health care practitioners actually do and therefore exclude key data that can be used to measure the performance of different health plans. AHCPR's studies have shown that administrative data can only show gross variations in care. They don't explain what goes on in clinical care. Uniform patient data will allow much better research into what works and what doesn't and may also facilitate large simple trials. Vocabulary is at the heart of uniform patient data. Dr. Gaus commented that his personal equivalent of "landing a man on the moon" will be a national demonstration of compatible patient records. He expressed the hope that the meeting would be a step toward that goal.
Standards for messages or data structures are the most highly developed. There are 6 groups working in this area: ASTM, HL7, X12-N, NCPDP, ACR-NEMA, and IEEE. Their efforts are being coordinated by ANSI/HISPP.
Standard identifiers are needed for people, facilities, and providers. Identifiers for people are essential for pooling data, but are the most politically sensitive due to concerns about privacy. Dr. McDonald favors use of the social security number for health data, but he observed that there is significant opposition to this approach. The most progress has occurred on identifiers for providers. HCFA has developed what appears to be a workable system. Some work has been initiated regarding facilities.
Standard vocabularies are needed to supply the allowable values for the slots in the messages or data structures. Dr. McDonald favors an approach that focuses first on defining what will be used as the standard vocabularies for the various elements of messages that we wish to exchange, starting first with messages related to laboratory tests and values. He stated that a combination of NDC codes and the WHO drug names would handle most of the drugs; ECRI's vocabulary handles devices; LOINC/EUCLIDES (to be addressed on Dec. 6 by Dr. Huff) might provide the best approach for lab test names; and perhaps a combination of ICD-9, SNOMED International, and Read Codes for other elements.
Dr. McDonald concluded by saying that we need agreement on the content and structure of the major objects that have to be shared and exchanged; on definition and use of major data types (we are relatively close here); on a common representation or syntax for messages, such as ASN.1; and on the choice of preferred vocabularies. We also need the cooperation of major Federal agencies that collect health-related data, such as HCFA and FDA.
To the extent practical and useful: (1) explicitly identify concepts in patient records that are registered in the Standard Vocabulary, using both their unique identifiers AND the locally preferred names of the concepts; (2) explicitly identify concepts in patient records that are NOT known to be in the Standard Vocabulary and indicate their relationship to one or more concept(s) in the Standard Vocabulary. This vision does not involve the abstraction or encoding of content so that meaning is lost. The meaning is always represented in the record, whether it can be represented in the standard vocabulary or not.
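One way the two-part vision above could be represented in a record entry is sketched below: each coded element carries the Standard Vocabulary identifier when the concept is registered, always keeps the locally preferred name, and, for concepts not in the Standard Vocabulary, links to one or more related standard concepts. The field names and identifier values are illustrative assumptions, not part of any actual specification from the meeting.

```python
# Hypothetical representation of the meeting's vision: meaning stays in
# the record whether or not the concept is in the Standard Vocabulary.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class RecordConcept:
    local_name: str                    # name actually used in the record
    standard_id: Optional[str] = None  # unique ID if registered in the
                                       # Standard Vocabulary
    related_standard_ids: List[str] = field(default_factory=list)
    # related_standard_ids is used when standard_id is None: it relates
    # the local concept to one or more standard concepts, so meaning is
    # preserved rather than lost through forced encoding.

# Case (1): a concept registered in the Standard Vocabulary
# (identifier value is hypothetical)
known = RecordConcept(local_name="heart attack", standard_id="C0027051")

# Case (2): a local concept NOT in the Standard Vocabulary, related to
# a nearby standard concept (identifier again hypothetical)
novel = RecordConcept(
    local_name="atypical chest tightness, post-exercise",
    related_standard_ids=["C0008031"],
)
print(known.standard_id, novel.related_standard_ids)
```

The key design point, consistent with the text, is that encoding is additive: the locally preferred name is never discarded in favor of a code.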
The purpose of the meeting is to begin plans for a large scale test of clinical vocabularies that will be a useful step toward a standard health care vocabulary. Specifically, the meeting should identify a base set of vocabularies which provide substantial coverage of the concepts and terminology likely to be needed in computer-based patient records; should outline at least some of the steps required to set up testing of these vocabularies in patient record systems -- at a minimum at some NLM/AHCPR funded cooperative agreement sites, but also elsewhere if possible; and should lead to the formation of a small working group to develop procedures for collecting and evaluating feedback from testers.
Ms. Humphreys then summarized some working assumptions about clinical vocabularies and computer-based patient record systems that helped shape the agenda for the meeting:
Based on the results of the contract, NCHS will proceed to develop a "best statistical draft" which will then be turned over to HCFA for two years of testing. "User-friendly" notes on how to use and interpret the classification will have to be developed. There will be a two-year notice to the public before implementation. This should allow time for development of training materials and give industry sufficient time to modify encoding software, etc. It is unlikely that ICD-10 will be implemented for morbidity reporting before the year 2000.
There were comments and questions. Dr. Chute said that he had heard that the World Health Organization's copyright of ICD-10 would limit NCHS's ability to extend it. Ms. Meads said this was not the case. NCHS and WHO had an agreement that whatever extensions were made would be compatible with the basic ICD-10. Dr. J. Cimino asked whether ICD-9-CM could just be mapped to ICD-10 and allow users to continue to use ICD-9-CM. Ms. Meads said that this was not possible because some sections of ICD-10 were in fact very different, e.g., leukemias and lymphomas, and much better than ICD-9-CM. These sections just don't match up very well. Dr. McDonald commented that the process for development of ICD-10 seemed somewhat closed. Ms. Meads said that comments and suggested changes had been solicited from all major U.S. medical societies for a 10-year period. Many changes proposed by these U.S. groups had in fact been incorporated into ICD-10. Dr. E. Hammond said it would be useful for NCHS to circulate its "best statistical draft" widely and as early as possible to get feedback from the community. Ms. Meads said it was difficult to continue to collect feedback while also moving the project forward. Users were currently suffering from not having an up-to-date classification and index.
In the current volume 3, codes are limited to 4 digits. Some categories, e.g., cardiovascular procedures, are too full, and there is no room to add new procedures. The restrictive outdated structure leads to a lack of specificity in coding that is probably having a negative effect on reimbursement and is definitely hampering statistical aggregation and health services research.
The objectives of the revision effort are to produce a coding scheme that:
Ms. Fagan received a number of questions. One attendee asked if HCFA's expectation was that after the system had been developed for inpatient procedures its use would eventually carry over to outpatient procedures. Ms. Fagan said that this was a politically charged issue, and there were no plans for this to happen.
Dr. McDonald asked how the new codes would be structured. Ms. Fagan explained that it would be a 7-digit alphanumeric code that would make use of 24 letters (not i or o) and all 10 numbers. The 1st digit would designate the body system; the 2nd-3rd digits would be allocated to the specific procedure; the 4th digit would be for the body site; the 5th digit would cover the approach; the 6th and 7th digits would cover any medical device used or any qualifier, if applicable. There would be an explicit value in the 6th and 7th digits to indicate if they were not applicable.
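The field layout Ms. Fagan described can be sketched as a simple parser. The positions follow the description above; the example code value and the field names are hypothetical, and the "not applicable" filler value is an assumption for illustration.

```python
# Sketch of the draft HCFA 7-character procedure code layout described
# above: 24 letters (i and o excluded) plus all 10 digits, with each
# position carrying a fixed meaning.
import string

ALLOWED = (set(string.ascii_uppercase) - {"I", "O"}) | set(string.digits)

FIELDS = {
    "body_system":         slice(0, 1),  # 1st character
    "procedure":           slice(1, 3),  # 2nd-3rd characters
    "body_site":           slice(3, 4),  # 4th character
    "approach":            slice(4, 5),  # 5th character
    "device_or_qualifier": slice(5, 7),  # 6th-7th characters; an explicit
                                         # value marks "not applicable"
}

def parse_procedure_code(code: str) -> dict:
    """Split a 7-character code into its named fields, validating characters."""
    c = code.upper()
    if len(c) != 7 or any(ch not in ALLOWED for ch in c):
        raise ValueError(f"invalid procedure code: {code!r}")
    return {name: c[span] for name, span in FIELDS.items()}

# Hypothetical example value (not a real code)
print(parse_procedure_code("A12B3ZZ"))
```

Note that, as Dr. McDonald's comment in the next paragraph suggests, this design embeds meaning in code positions; an alternative would be opaque identifiers with the structure held in an external table.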
Ms. Fagan emphasized that this was the draft proposal that might not be used if something better came along. She said that those with alternative products that they considered better were encouraged to contact HCFA's Office of Research. Dr. McDonald commented that some people thought that it would be far better not to embed meaning in the code itself as HCFA was proposing to do.
Health care is an information intensive industry which needs accurate and complete data for the whole continuum that Dr. Gaus discussed: clinical research, practice, health services research, and clinical epidemiology. Research results can be seriously biased if the data used are insufficiently detailed. Dr. Chute illustrated this point by showing how the addition of a single variable (extent of disease) to data for lung and colon cancer patients improved the accuracy of mortality predictions substantially.
Vocabulary is critical to the collection of accurate and aggregatable health care data and to linking patient records to decision support tools. As William Farr stated a century and a half ago, nomenclature has as much importance to medical care as weights and measures have to the physical sciences. A 1992 GAO study indicated that the development of a standard vocabulary was lagging significantly behind the development of other parts of the essential infrastructure for computer-based patient records.
The CPRI undertook a quantifiable evaluation of existing major codes and vocabularies, initially to evaluate their coverage of clinical terms encountered in patient records. The systems studied were ICD-9-CM, ICD-10, SNOMED International, the Read Code (version 2), the Gabrieli Nomenclature, the UMLS Metathesaurus (Version 1.3), and two more narrowly focused systems, CPT-4 and NANDA. The method selected was to obtain a body of machine-readable clinical text from a number of different institutions. The text came from a range of sources including inpatient and outpatient records, discharge summaries, nursing notes, progress notes, etc. Samples of 1,000 and then another 2,000 clinical text strings were extracted. The text strings were parsed by hand and each segment was assigned to one of thirty categories, such as primary diagnosis, severity modifier, etc. While no claims of perfection are made, the categorization was reviewed by multiple members of the study team.
The categorized strings were then distributed to be coded in the various classifications and vocabularies under study. A three part scoring system was used to indicate the degree to which the system covered the concept. (0 - not at all, 1 - partially covered, 2 - covered.) When the scoring was completed, the categories were collapsed into 5 aggregate categories for data analysis and reporting. Dr. Chute showed overheads that represented the coverage of each system studied (with the exception of the Gabrieli system) of the text in these 5 broad categories: diagnosis, findings, modifiers, treatment and procedures, and other.
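The scoring and aggregation procedure described above can be illustrated with a toy computation: each string receives a score of 0 (not covered), 1 (partially covered), or 2 (covered), and fine-grained categories are collapsed into broader ones before coverage is reported. All category names, mappings, and score values below are invented for illustration, not the study's actual data.

```python
# Toy illustration of the CPRI three-part scoring scheme and the
# collapse of detailed categories into aggregate reporting categories.
from collections import defaultdict

# Hypothetical mapping from detailed categories to aggregate ones
AGGREGATE = {
    "primary diagnosis":  "diagnosis",
    "severity modifier":  "modifiers",
    "lab finding":        "findings",
    "surgical procedure": "treatment and procedures",
}

# Invented (category, score) pairs for some vocabulary under study:
# 0 = not at all, 1 = partially covered, 2 = covered
scores = [
    ("primary diagnosis", 2),
    ("primary diagnosis", 1),
    ("severity modifier", 0),
    ("lab finding", 2),
    ("surgical procedure", 2),
]

totals = defaultdict(lambda: [0, 0])  # aggregate -> [points, max points]
for category, score in scores:
    agg = AGGREGATE[category]
    totals[agg][0] += score
    totals[agg][1] += 2  # full coverage of a string would score 2

for agg, (points, possible) in sorted(totals.items()):
    print(f"{agg}: {100 * points / possible:.0f}% coverage")
```

A percentage computed this way blends full and partial matches, which is one reason the study's distinction between "constructed" and "whole" matches (raised by Dr. McDonald below) matters for interpretation.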
The results showed that SNOMED International had coverage above 90% for text in all categories. ICD-9-CM's coverage was substantially less, as was the combination of ICD-9-CM and CPT-4. The data therefore did not support the view, which has been espoused by some, that ICD-9-CM and CPT-4 together will cover most of what is needed for computer-based patient records. ICD-10's coverage was less than ICD-9-CM's, a not surprising result since ICD-9-CM includes extensive clinical additions made by the U.S. Similar additions have not been made to the basic ICD-10. The performance of the Read Code (version 2) and the UMLS Metathesaurus (Version 1.3) fell between that of SNOMED International and ICD-9-CM. Since NANDA has a relatively narrow focus, it did not cover most of the text examined.
The study did NOT examine or compare the structure of the systems, including such features as relationships among concepts, compositional rules, etc. The results may be affected by the restricted sample size, although the results for the first sample of 1000 and the second sample of 2000 were identical. It is important to note that several of the systems studied have undergone substantial revision and expansion since the study data were compiled. The data for the Gabrieli system are being double-checked now and will be included in the published report of the study.
In the discussion that followed Dr. Chute's presentation, Dr. J. Cimino asked whether plain ICD-9 had been studied. It had not, due to its known lesser clinical coverage (as compared to the CM version) and because it is infrequently used in the U.S. Dr. Kolodner asked whether the text used included a broad cross-section of health problems, including mental health. Dr. Chute indicated that no special efforts were made to ensure this and, in fact, mental health was deliberately excluded because of the heavy use of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM) to represent such data. Dr. Hersh asked whether the data set, including the text strings and categorizations, would be made available to other investigators. It will be made available, but probably with some restrictions since the committee intends to make further research use of it. Dr. Cohn, chair of the Committee, indicated that it probably needs to be expanded to be more representative of different health problems.
Dr. McDonald asked whether the frequency of occurrence of certain concepts was weighted when the results were compiled. Dr. Chute indicated that only one occurrence of the concept was taken from each note, but the same concept might occur in more than one note and, if so, was counted as more than one of the sample concepts. Dr. McDonald asked whether the study data distinguished between concepts that appeared as wholes in a particular vocabulary and those that could be constructed by combining more atomic concepts from that vocabulary. He thought that in some cases a "constructed" match was quite different in character from a "whole" match and used the example of "blue sclera" to illustrate. Dr. Chute said that many matches were "constructed" and that the study data did not distinguish these from other matches.
Dr. Keith Campbell then gave an overview of what was and was not represented in the research literature on SNOMED and other clinical vocabularies. The literature includes studies of domain coverage and of concept redundancy within specific classifications. It also includes proposed solutions for deficiencies identified, e.g., use medical records as source material for creation of terminology, use linguistic tools in thesaurus construction and evaluation, improve the structure and reduce unintended redundancy by expressing compositional rules in an explicit syntax. By and large, the literature does NOT include studies of the quality and completeness of hierarchies, assessments of the relevance of terms present, research on the impact of use of a particular classification on subsequent data retrieval, and discussions of the economic, social, and political factors affecting use of particular classifications. Dr. Campbell stated that this last set of issues needs to be discussed openly so that competing priorities can be evaluated and workable strategies emerge.
Dr. Campbell said that the literature highlighted a problem that had already been discussed by Dr. Chute. All systems change and evolve over time and by the time any study is published it is likely to report results that are not relevant to the current version of what has been studied. This is not only a problem for research results, it is a major maintenance challenge for local systems which make use of these evolving systems. Dr. Campbell advocates careful review by domain experts to evaluate hierarchies and the relevance of terms, despite its subjective and labor-intensive nature. We also need a set of established metrics that can be applied to successive versions of vocabularies so we can see if things are getting better or worse.
In the ensuing discussion, Dr. K. Hammond commented that the mapping projects were laudable, but that we also needed studies of how usable the systems were to people who were trying to browse a problem list and select an appropriate concept. Dr. K. Campbell indicated that such studies were complicated because of the confounding factors, like the interface used. Dr. Hammond said that sometimes it was precise to be imprecise and that a vocabulary system should not force someone to be more precise than the information known at the time. Dr. K. Campbell agreed, but said that this was also a coverage issue. Vocabularies should have terminology in their hierarchies for intermediate hypotheses.
Ms. Humphreys stated that ONE hierarchy cannot suit all purposes. Any useful vocabulary system will have to allow the representation of multiple perspectives. It should also allow concepts to be identified as belonging to a variety of subsets that have been used successfully for different purposes, e.g., the set of problems that the VA includes in its problem list, or the set of interest to pediatricians.
Dr. J. Campbell said that he thought that the mapping/coverage studies done so far were in fact partially done to prompt this kind of meeting and discussion. Dr. Hersh commented that the discussion was analogous to the debate over whether user satisfaction studies or more formal information retrieval studies were better; in fact both are needed. Dr. J. Cimino commented that the Barrows study (listed in the bibliography in Attachment 3) did look at user performance in selecting terms and found many different problems accounted for failure to find appropriate concepts, including coverage of SNOMED II, a poor user interface, lack of synonyms, etc. Dr. Cimino thought there must be a number of other studies of this type.
Mr. Tuttle asked what should be the priorities for vocabulary evaluation studies for the immediate future. Dr. K. Campbell said he thought that retrieval studies were critical. Dr. Henry agreed, but also thought that emphasis should be placed on the development of standardized measures or metrics and on development of larger databases of test data that could be shared by investigators. Dr. Barnett said the focus should be on what physicians find acceptable. Although the use of a controlled vocabulary may in fact modify behavior, it is going to be difficult, if not impossible, to evaluate the actual impact of a vocabulary on care provided, let alone on outcomes. Dr. R. Miller commented that one metric for evaluating vocabularies that had been alluded to, but not mentioned explicitly, is the set of evolutionary forces affecting the vocabulary. In the case of MeSH, a strong evolutionary force has been feedback from large numbers of real users. Various billing issues can provide evolutionary forces that may be less desirable.
Dr. Hole began by presenting summary statistics for both SNOMED International and the preliminary version 3.1 of the Read System. Both have roughly 132,600 terms of which about 100,000 are preferred or hierarchical terms. Despite this overall numerical similarity, the distribution of the types of concepts within the two systems is quite different. For example, Read has roughly 69,000 disorders and findings as opposed to about 32,000 terms in the somewhat comparable categories in SNOMED. SNOMED has substantially more anatomical terms and organisms than Read. The version of Read examined did not contain drugs and chemicals or nursing and allied health terminology. These will be present in the version 3.1 to be released in January 1995.
The methodology used in NLM's comparison of the two vocabularies involved initial lexical matching of identical normalized strings in Read, SNOMED, and the 1994 version of the UMLS Metathesaurus. "Concept" matches, rather than string matches, were counted, e.g., one match was counted if both a SNOMED preferred term and one or more of its synonyms matched to either a Read concept or a Metathesaurus concept. This procedure identified 60,695 SNOMED concepts (56%) that did not match lexically to either the Read Code or the Metathesaurus (more than 20,000 SNOMED procedures are part of the 1994 Metathesaurus) and 85,566 Read concepts (81%) that did not match either SNOMED or the Metathesaurus. Random samples of 300 of the non-matching SNOMED concepts and the non-matching Read concepts were then manually reviewed to determine whether they were actually present as WHOLE concepts in the other system. 13% of lexically unique SNOMED concepts were found in Read. 16% of lexically unique Read concepts were found in SNOMED. Although both Read and SNOMED allow the combination of concepts for coding (SNOMED is multi-axial, and Read allows qualifiers to be combined with concepts according to specific rules), no attempt was made to determine if combinations of concepts in one system could adequately represent a lexically unique concept from the other. This was obviously a small and preliminary study, but its results indicate that SNOMED International and Read 3.1 may well provide useful complementary coverage. Both might contribute to a standard US health care vocabulary. (The overheads used by Dr. Hole are included in Attachment 5).
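The concept-level matching described above can be sketched as follows: strings are normalized, and a match is counted at most once per concept if any of its names (preferred term or synonyms) matches a normalized string from the other system. The specific normalization rules (lowercasing, punctuation removal, token sorting) and the sample terms are assumptions for illustration, not NLM's actual procedure.

```python
# Minimal sketch of concept-level lexical matching between two
# vocabularies, counting matches per concept rather than per string.
import re

def normalize(term: str) -> str:
    """Lowercase, strip punctuation, and sort tokens so that word-order
    variants (e.g. 'Infarction, myocardial') compare equal."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(tokens))

def concept_matches(concepts_a: dict, strings_b: set) -> set:
    """Return IDs of concepts in A whose preferred term OR any synonym
    matches a normalized string from system B (one match per concept)."""
    normalized_b = {normalize(s) for s in strings_b}
    matched = set()
    for concept_id, names in concepts_a.items():
        if any(normalize(n) in normalized_b for n in names):
            matched.add(concept_id)
    return matched

# Invented example data (identifiers and synonym lists are hypothetical)
snomed = {
    "D-13510": ["Myocardial infarction", "Heart attack", "MI"],
    "T-32000": ["Heart structure", "Cardiac structure"],
}
read_strings = {"myocardial infarction", "Infarction, myocardial"}

print(concept_matches(snomed, read_strings))  # one concept matches
```

Concepts left unmatched by a procedure like this are exactly the ones that required the study's manual review, since lexical matching cannot see whole concepts expressed under entirely different names, let alone combinations of more atomic concepts.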
In the discussion that followed, Dr. Huff and Dr. Korpman both expressed surprise at the outcome of the study, given their perception that SNOMED International could cover such a high percentage of important clinical concepts. Dr. Hole said that the broad comparison of the numbers of terms in different categories in the two systems probably offered the best explanation. Larger studies are needed to elucidate the differences between the two systems. Ms. Humphreys, who had assisted Dr. Hole in the human review of the lexical matches, indicated that Read has more pre-coordinated terms which probably accounts for some of its larger total number of disorders.
Dr. J. Campbell asked about the extent to which cultural differences were responsible for a large part of the unique Read concepts. Dr. Hole said that the Read administrative terms, which definitely reflected cultural differences, were a very small part of the unique terms. Comparison of a much larger sample of terms in different specialties is needed to answer the question definitively, but NLM's impression is that cultural differences probably don't account for the majority of the unique Read concepts.
Dr. Lincoln asked whether there had been studies done in England of the extent to which the Read system accurately captured the concepts in clinical narrative. Dr. O'Neill said that a large study was currently being planned that would compare Read 3 with Read 2, ICD-9, ICD-10, and the UK procedure code. This was not because the National Coding Center thought that ICD or the procedure codes were adequate for clinical concepts, but because it was necessary to provide data to refute claims that this was the case.
Dr. E. Hammond asked if the real issue was that SNOMED and Read version 3.1 could not be mapped adequately or merged. If so, perhaps that is a lesson for the whole effort to develop a useful clinical vocabulary. Maybe it isn't possible to map two different views of the world. Ms. Humphreys said that in the case of SNOMED and Read it looked like the two probably could be mapped reasonably well. Large scale testing ought to help us to determine whether one approach or the other is more useful or whether we need both.
Dr. Chute said that agreement on a basic structure would be needed to move forward. Ms. Humphreys said that we certainly needed an envelope that will accommodate both the multiaxial approach and the pre-coordinated approach because we can't predict which will be most useful in which circumstances. To a certain extent the UMLS Metathesaurus already provides this kind of envelope.
Dr. Hammond said that it was likely that the structures of procedures, billing, etc. were very different in the UK and the US and these areas would have to be examined very carefully.
In response to a question from Dr. Hole, Dr. O'Neill clarified that the purpose of Read version 2 was to summarize or abstract patient data. The purpose of Read version 3.1 is to represent the complete information present in a computer-based patient record. Although version 3 of the Read code does have many pre-coordinated terms, it also has an information model that explicitly defines when qualifiers can be combined with these terms.
The American College of Radiology began work on data interchange standards for images (then defined strictly as radiological images) about 10 years ago. At that time standards were desperately needed to allow use of equipment from different manufacturers in PACS (Picture Archiving and Communications Systems). The initial DICOM standard was developed with excellent leadership from industry engineers. The DICOM standard includes a header with text designed to disambiguate the image that follows it from all other transmitted images. Dr. Bidgood emphasized that it is impossible to interpret images correctly without multiple types of context, including the orientation, the method used to capture the image, etc. Without detailed context information image data are useless and can be dangerous. There is no one view or one hierarchy that can represent the appropriate context for all images. The vocabulary used to describe the elements of the context must support multiple views.
The DICOM standard includes layers for hardware, software, and the information model. Industry standards are adopted for the hardware and software levels. The focus of current work is on improving the information model. Tools are needed to apply the standard efficiently, including tools that are specific to medical applications. The ANSI HISPP MSDS Common Data Types document is used to define certain ubiquitous data types used in the standard. (Reference: ANSI HISPP Common Data Types for Harmonization of Communication Standards in Medical Informatics. Final Draft. November, 1993. Bidgood, W.D. Jr. (Editor). American National Standards Institute. Healthcare Informatics Planning Panel. Message Standards Developers Subcommittee.) The ASTM convention of triplet encoding, i.e., (1) the coding scheme, (2) the code, and (3) optional text, is used in the DICOM standard in the values of different header elements.
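The triplet convention mentioned above can be sketched as a small parser: a coded value names the coding scheme, the code within that scheme, and optional human-readable text. The "^" delimiter, the field names, and the example values are illustrative assumptions, not the actual DICOM wire format.

```python
# Sketch of ASTM-style triplet encoding for coded header values:
# (1) coding scheme, (2) code, (3) optional text.
from typing import NamedTuple, Optional

class CodedEntry(NamedTuple):
    scheme: str          # identifies the vocabulary, e.g. a SNOMED designator
    code: str            # the identifier within that scheme
    text: Optional[str]  # optional human-readable text

def parse_triplet(value: str) -> CodedEntry:
    """Parse 'scheme^code' or 'scheme^code^text' into a CodedEntry."""
    parts = value.split("^")
    if len(parts) == 2:
        return CodedEntry(parts[0], parts[1], None)
    if len(parts) == 3:
        return CodedEntry(parts[0], parts[1], parts[2])
    raise ValueError(f"not a coded triplet: {value!r}")

# Hypothetical coded value for an anatomic-region header element
entry = parse_triplet("SNM^T-32000^Heart structure")
print(entry.scheme, entry.code, entry.text)
```

Because the scheme travels with every value, any vocabulary can be referenced without being baked into the standard itself, which is the design shift described later in the discussion of anatomical term lists.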
The goal in the transfer of any data is that the information sent is identical to the information received. To achieve this the concepts present in the message must be represented as fully as possible. The initial DICOM standard had only 22 concepts for different anatomical locations. Not surprisingly this was quickly found to be inadequate. Each successive effort to define larger universes of allowed anatomical concepts inevitably ran into the same problem. Since the list of allowed terms started out as an integral part of the official standard, the standard had to be re-balloted every time new terms were added. The current, more appropriate approach is to refer to vocabularies that can be used with the standard and not to include actual lists of terms in the standard itself.
Work on the evolution of the DICOM standard is now carried out under the aegis of the ANSI HISPP Message Standards Developers Committee. This provides an umbrella that is seen as more neutral than the ACR and has therefore been helpful in bringing representatives of other diagnostic imaging specialties together to work on a version of DICOM that can handle a range of imaging data. In the version under development, an anatomic region, an anatomic region qualifier, a specific site, and a site qualifier can all be specified. Any vocabulary or coding scheme can be used with the standard, identified by triplet encoding.
The College of American Pathologists (CAP) plans to release the Topography and General Linkage/Qualifiers modules of SNOMED International for the DICOM group to use (without charge) in the development of a SNOMED Microglossary of anatomical terms needed for image data. Dr. Bidgood commented that if the CAP goes forward with this plan they are to be commended, given the level of effort and resources they have expended in the development of SNOMED. The DICOM group will start with SNOMED topography and modifiers and then conduct a large multi-specialty review. Subsets of the terms will be prepared for different specialties to match to their existing glossaries. Matching across specialties and mapping of multiaxial to combined forms will follow. The CAP will receive the input from the DICOM project and will exercise ultimate version control.
Although multiple encodings will be possible, DICOM will include default rules. Dr. Bidgood described a range from no constraint on input to default constraints to dynamic negotiation of encoding levels. The third level is not really possible today, although DICOM will have to include a robust specification for conformance claims in the vocabulary area. Dr. Bidgood concluded by saying that preconceived notions are the enemy of progress, and of the cooperation that is needed to move vocabulary standards forward.
Dr. Lowe asked if the plan was to include the anatomical and qualifier subsets of SNOMED International in the DICOM header. Dr. Bidgood said that there would be an indirect reference to them in the standard, but they would not be included in entirety. Dr. Lowe expressed concern about how this level of textual information related to images would be captured. Dr. Bidgood indicated that the more robust encoding would at first be optional, but would probably become mandatory in conformance claims. A broad standard that covers all imaging is highly desirable. The problem of data capture, through structured data entry or other means, is obviously difficult. Capturing the data in a controlled vocabulary will enable automated links to the literature and other knowledge sources that can provide information immediately, while the practitioner is cogitating about a problem.
Dr. Lindberg commented that the radiologists deserved credit for advancing data standards in a very practical way. At meetings of the Radiological Society of North America, vendors who exhibit are required to demonstrate that their equipment supports current standards on the exhibit floor. Attendees at the meeting can bring their own data and see how the machines on exhibit handle them. While scholarly studies are also needed, this real-world approach has merit.
The CPMC MED is a semantic network of medical terminology that includes classes of terms, subclasses, and individual concepts. The Intermed dictionary is a stripped down version of the MED, excluding names that are solely of local interest to CPMC. The Intermed dictionary effort allows a fresh start and an opportunity to eliminate some of the "ugliness" in the MED, while also addressing the issue of meeting the needs of multiple institutions with a single dictionary. In addition to Columbia, the Intermed institutions are Harvard, Stanford, and Utah. There are a number of other collaborating institutions testing use of the MED. The CPMC MED and the Intermed dictionary currently have links to the UMLS and to SNOMED.
The initial Intermed dictionary focuses on the narrow field of urine chemistry as a vehicle for working out dictionary structure, procedures for updating, etc. Dr. Cimino briefly outlined the sections of SNOMED International of particular interest for urine chemistry concepts: P3 (Laboratory Procedures and Services), particularly P3-02 (Specimen Collection), topographic terms, and analytes. A comparison of SNOMED terms for urine chemistry with those in Intermed (chiefly derived from the CPMC MED) showed that 36 concepts appeared in both, 62 in SNOMED only, and 31 in Intermed only. The 31 "Intermed only" concepts are in fact representable in SNOMED by coordinating terms from different axes. Thus some SNOMED lab test terms are precoordinated and some are not. Multiple encodings are possible, because the precoordinated terms could also be represented by combining items from different axes.
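The comparison method amounts to simple set operations over the two concept lists. A minimal sketch follows; the term lists here are hypothetical stand-ins, not the actual study data:

```python
# Hypothetical term lists illustrating the kind of set comparison
# used to contrast two vocabularies' coverage of a domain.
snomed = {"urine glucose", "urine sodium", "urine protein"}
intermed = {"urine glucose", "urine protein", "urine osmolality"}

in_both = snomed & intermed          # concepts shared by both
snomed_only = snomed - intermed      # covered only by SNOMED
intermed_only = intermed - snomed    # covered only by Intermed

print(len(in_both), len(snomed_only), len(intermed_only))  # 2 1 1
```

The real study reported the analogous three counts (36 in both, 62 SNOMED only, 31 Intermed only) for urine chemistry.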
Urine chemistry terms are split among multiple hierarchies or classes in SNOMED. This occurs because SNOMED is a strict hierarchy in which each concept appears only once. Each urine chemistry concept appears in one reasonable place, but may not appear in some of the places you would expect to find it. There are some ambiguous connections in SNOMED: two distinct lab test terms may both be listed as related terms of the same preferred term and therefore share the same code.
In Read, laboratory terms are found in the sections on Samples, Analytes, and Laboratory Test Observations, which include both test names and actual findings. A comparison of Read terms for urine chemistry with those in Intermed showed that 48 concepts appeared in both, 29 in Read only, and 19 in Intermed only. There was some redundancy in terms found in the preliminary version of Read 3.1 supplied by NLM for use in the comparison. Read classifies the terms in multiple locations, although the classification was incomplete in the version used.
Dr. Cimino commented that the MED gets its chemical names from the UMLS. He has found almost all he needed at the time he looked for them. The few not found have shown up in the next edition, sometimes due to his input and sometimes not.
Dr. Cimino outlined the strategy for expansion of the Intermed dictionary. Intermed will prefer precoordinated terms, but, where possible, will have an underlying semantic description. Concepts will be mapped to the UMLS and SNOMED and potentially to Read. Users may map local concepts to Intermed concepts. If users submit additions, they must be accompanied by formal semantic descriptions in Intermed format. The urine chemistry part of Intermed is now available on the Internet.
Dr. K. Hammond asked if the Intermed project was going to look at EUCLIDES/LOINC as a potential source of information. Dr. Cimino said that the Intermed structure had already been influenced by EUCLIDES/LOINC and that he would be working with Dr. Huff to incorporate more information from LOINC.
Dr. Hersh asked, regarding the issue of precoordinated vs. atomic concepts, why Intermed could not have the atomic view mapped to the precoordinated view and then hide it from users who had no need of it. Dr. Cimino said the mapping of the atomic concepts to the precoordinated concepts was inherent in the Intermed semantic model.
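Dr. Cimino's point, that the mapping from precoordinated terms to their atomic components is inherent in the semantic model, can be sketched as follows; the terms and structure here are hypothetical:

```python
# Hypothetical sketch: each precoordinated concept carries an underlying
# semantic description built from atomic pieces. Users who work with the
# precoordinated view never need to see the atoms, but the mapping exists.
ATOMIC_DEFINITION = {
    "urine sodium measurement": {
        "procedure": "measurement",
        "analyte": "sodium",
        "specimen": "urine",
    },
}

def atoms_of(precoordinated_term):
    """Return the atomic semantic description behind a precoordinated term."""
    return ATOMIC_DEFINITION[precoordinated_term]

print(atoms_of("urine sodium measurement")["specimen"])  # urine
```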
Dr. Lindberg asked whether the normal ranges were included in the semantic definition of the test, which provoked a lively discussion. Dr. Cimino said they were a part of the definition in the original MED, and if the range was changed by a lab for any reason a new concept was created. The ranges are not part of the definition for new concepts added to the CPMC MED, nor are they part of Intermed since they may be institution specific. Dr. Lindberg commented that normal ranges are a function of test methodology, rather than institution. Dr. McDonald said that in HL7 messages related to lab tests, the ranges are sent separately, not as part of the name. In many cases the methodology changes at will, with the same lab using different methodologies for the same test from one day to the next. Dr. Kohane asked what happens when two methodologies used for the same test give different units in the results. Dr. Cimino said in that case there would have to be two test concepts in the dictionary.
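The rule Dr. Cimino stated in answer to Dr. Kohane, that the same test reported in different units requires two test concepts, might be sketched like this (names and identifiers hypothetical):

```python
# Hedged sketch: a dictionary that creates a distinct concept whenever
# the (test name, units) pair is new, so two methodologies reporting the
# same test in different units yield two concepts.
tests = {}

def register_test(name, units):
    """Return the concept id for a (name, units) pair, minting one if new."""
    key = (name, units)
    if key not in tests:
        tests[key] = f"C{len(tests) + 1:04d}"   # hypothetical concept id
    return tests[key]

a = register_test("serum calcium", "mg/dL")
b = register_test("serum calcium", "mmol/L")
print(a != b)   # True: different units, two concepts
```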
Mr. Tuttle asked whether differences in Intermed, SNOMED, and Read were principally due to word use. Dr. Cimino said they were not; in general the three used similar names for the tests they had in common. Dr. Hole mentioned that Read appeared to have a range of specimens, including such things as conjunctival swab, that were not present in SNOMED. Dr. Cimino could not comment, because he had focused his investigation strictly on the urine chemistry area. Dr. McDonald said that the discussion would be more interesting if more people could get copies of Read to evaluate. Dr. O'Neill said evaluation copies were available to anyone who wanted them free of charge. Anyone interested should write to Dr. Payne or to him. (Addresses in the list of attendees in Attachment 2).
The meeting was adjourned at about 5:15, to be reconvened the following morning at 8:30 a.m.
Following an empirical approach, the LOINC group collected
existing lab test names from a number of sources, such as
MetPath, LDS Hospital, Regenstrief, the Department of Veterans
Affairs, and the ASTM E3113 terms. After analyzing these
sources, the LOINC effort came up with the general form of a
laboratory test result name:
NOT in the name but transmitted elsewhere in the HL7 message are:
the instrument used, details of specimen collection, test
priority, and the volume of the sample. If any of these elements
are included in the result name, it changes the underlying
information model. Dr. Huff showed a number of examples of test
names in the current draft LOINC format. The microbiology
examples illustrated that LOINC accommodates both of the two
common approaches to specifying microbiology test results in
existing systems.
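The division of labor described above, keeping attributes like instrument, priority, and sample volume out of the result name and in the HL7 message, can be sketched as follows; the field names are illustrative, not the actual LOINC axes:

```python
# Two results share a name but differ in message-level attributes.
# Keeping instrument/priority/volume out of the name leaves results
# with the same name directly comparable and aggregatable; folding
# those fields into the name would change the information model.
result_a = {"name": ("glucose", "serum"),
            "msg": {"priority": "stat", "sample_volume_ml": 2.0}}
result_b = {"name": ("glucose", "serum"),
            "msg": {"priority": "routine", "sample_volume_ml": 1.0}}

print(result_a["name"] == result_b["name"])   # True: same test
print(result_a["msg"] == result_b["msg"])     # False: message details differ
```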
Dr. Huff reviewed the trade-offs associated with aggregate vs.
atomic names. Both are needed. People use aggregate names. It
is important to focus on names that are actually used, however,
not to generate the universe of all possible aggregate names.
Atomic names are more parsimonious, more expressive, and support
more flexible information retrieval and aggregation.
Dr. Huff then described the Euclides OpenLab Coding Scheme, which
has influenced LOINC development. Euclides is a European system
developed under the direction of Georges DeMoor of Belgium.
It covers the complete domain of the clinical laboratory using a
multi-axial approach with 39 canonical axes. HL7 represents some
of these axes in different segments of messages. Euclides
contains about 8,200 analyte names, including drugs, cells,
micro-organisms, etc. and about 420 "function tests" which are
more complex procedures. Dr. Huff showed examples of Euclides
analytes, function tests, and procedures. An informal evaluation
that mapped lab terms from several American sites to Euclides
found that Euclides could represent nearly 100% of the terms.
Dr. Huff concluded by saying that the future development of LOINC
involved making a draft widely available for use, finding a
permanent home for the system, and adding content to it.
In the discussion that followed, Dr. J. Cimino asked if the LOINC
group had thought about using other source vocabularies for
particular sections of the name. Dr. Huff said yes, they hoped
to point to other systems for parts of the name. Ms. Moholt
asked about the underlying structure of EUCLIDES. Dr. Huff said
that it had a principled hierarchical structure used by its
maintainers, but that it was distributed as a linear list. Mr.
Tuttle asked Dr. Huff if he had a feel for the rate of change in
vocabulary in the laboratory test area. Dr. Huff said 10-20
terms per month were added to support the member facilities in
Intermountain Healthcare. Dr. McDonald pointed out that Arden
Forrey, who directed ASTM vocabulary efforts in this area,
exchanged terms regularly with Dr. DeMoor so their coverage was
naturally similar. Dr. R. Miller asked whether there would be
"emeritus terms" for retired tests. Dr. Huff said that in the
HELP system the terms and codes for tests were kept forever, but
those that were not currently used were flagged as inactive. He
thought that a similar approach would be needed in any standard
vocabulary.
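The never-delete policy Dr. Huff described for the HELP system can be sketched in a few lines; all codes and names here are hypothetical:

```python
# Sketch of an "emeritus terms" policy: codes and names are kept
# forever, and retired tests are merely flagged inactive so that
# historical data can still be decoded.
class TermRegistry:
    def __init__(self):
        self.terms = {}          # code -> {"name": ..., "active": ...}

    def add(self, code, name):
        self.terms[code] = {"name": name, "active": True}

    def retire(self, code):
        # keep the code and name; just mark the term inactive
        self.terms[code]["active"] = False

    def lookup(self, code):
        # old records remain resolvable after retirement
        return self.terms[code]["name"]

reg = TermRegistry()
reg.add("T100", "obsolete glucose method")   # hypothetical test
reg.retire("T100")
print(reg.lookup("T100"))   # obsolete glucose method
```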
The effort to select a basis for the VA's lexicon involved many
people including Dr. Michael Lincoln of the Salt Lake City VA
hospital and a Problem List Expert Panel representing a wide
spectrum of VA practitioners. A number of candidates were looked
at, including ICD-9-CM, the Read Code (version 2), and the UMLS
Metathesaurus. (SNOMED International was not yet released.) The
UMLS Metathesaurus was selected for a number of reasons: its
ability to engulf and encompass other systems; the potential for
linking across systems; specific elements of its coverage
including COSTAR, nursing vocabularies, and the promise of more
CPT (although this last has not yet materialized); the potential
value of semantic types and relationships; the Metathesaurus
structure which seemed promising for management of vocabularies
in a distributed, decentralized system; the ability to leverage
NLM's investment; and the likelihood of continuing support.
The decision was made for the VA to develop its own local
lexicon that imports terms from the UMLS or other systems as
needed. This ensures that the day-to-day needs of operational
systems can be met, including frequent updates for new drugs.
Most of the UMLS Metathesaurus (version 1.3) was imported as the
basis of the lexicon. Another 2,000 terms that the VA facilities
needed to function were added. These included billing codes, the
Omaha Visiting Nurses Association terms (which will be added to
the 1995 Metathesaurus), social work terminology, and the ICD-9 E
and V codes. What started out as a resource for the problem list
is now considered the vocabulary support for the complete patient
record. It therefore must be a stable, maintainable system.
The VA clinical lexicon has been in use only since June of 1994.
It includes most of the features of the UMLS Metathesaurus. It
also allows associating a billing code with a more specific term
and adding local usage synonyms. It allows users to add concepts
but flags them for subsequent review by Dr. Lincoln's group in
Salt Lake. It occupies about 150 megabytes in the VA's file
structure in MUMPS. It is now used solely for the problem list,
but future applications will include order entry, order checking,
reminders, the VA's National Drug file, procedure recording, and
supporting point of care information and knowledge retrieval
services.
Dr. Hammond outlined the VA's needs for expanded UMLS coverage:
links to CPT procedures; laboratory procedures such as those in
LOINC/Euclides; more dental terminology; more signs, symptoms,
and findings perhaps from Read and QMR; more terms related to
health maintenance, health status, home care, etc.; Title 38
disability codes; reasons for cancelling orders; abbreviations
and acronyms; qualifiers. The Clinical Lexicon interface seems
to work well. Users like the multi-term fragment look-up
capability and the ability to select a particular view of the
information. The mapping to ICD-9 is helpful for billing and
doesn't compromise the underlying representation of clinical
reality.
The negatives associated with the VA's use of the UMLS
Metathesaurus include the need for expanded coverage outlined
above, the labor-intensive procedures required to update the
lexicon when new editions of the Metathesaurus are issued
(complicated in the VA's case because they were using the
discontinued unit record format), the potential impact of newly
announced changes in SUI semantics, and lack of regular
communication with NLM regarding plans for the UMLS, although
steps are being taken to improve this.
The VA's wish list includes better contact with other UMLS
implementors; easier updates, perhaps through the use of
something like the draft ETIF (Electronic Terminology Interchange
Format) standard that relies on SGML (Standard Generalized Markup
Language) encoding; more frequent and smaller updates; more
physical exam and finding terminology which would help to address
the skepticism among some VA clinicians regarding the UMLS; and
particularly better liaison or strategic consultation with the
UMLS developers to ensure better support for real world needs.
Closer liaison should also help the VA to exploit the UMLS to
assist with linkages to knowledge sources and with aggregation of
patient data. Dr. Hammond asked Dr. Lincoln if he had anything
to add. Dr. Lincoln underscored the importance of more history
and physical concepts and more rapid turnaround of smaller
updates. Dr. Hammond said that it was likely that strategic
consultation with NLM could reveal practical ways to "ease the
pain" of keeping a local system in sync with an evolving national
product.
During the follow-up questions and comments, Dr. Hersh said that
it was difficult for client-server university systems to interact
with the VA system architecture and asked if there were efforts
to move to a client-server approach. Dr. Kolodner responded that
the VA was moving in this direction and a number of sites would
be testing a client-server interface to the VA system in 1995.
There is also a program to define interface standards for
interaction with the VA system.
Dr. Payne asked how much of the UMLS Metathesaurus was
incorporated in the VA lexicon. Dr. Hammond responded that most of
it was taken. VA system users can limit displays to the
particular source vocabularies they are interested in, however.
Dr. Payne asked if the VA had looked at Read (which had been
referenced on some of Dr. Hammond's slides). Dr. Hammond said
they had looked at version 2 and had decided not to base their
system on it. Dr. Lindberg asked what progress the VA had made
on identifying and naming the problems needed on the VA problem
list. Dr. Hammond and Dr. Lincoln both said they were very happy
with the UMLS coverage of problems. In their initial review of
the 1,000 most commonly seen problems in the VA they found 89% in
the UMLS Metathesaurus. This is probably due to the inclusion in
the Metathesaurus of frequently seen problems from COSTAR sites.
They are providing data to NLM on the problems not found.
Dr. McDonald asked if the VA lexicon would be generally
available. Dr. Hammond said that it would be. Mr. Tuttle said
that the VA appeared to have accomplished an enormous amount in a
relatively short time. Dr. Hammond said that the work to select
a basis for the problem list began in 1991, the UMLS
Metathesaurus was selected in June of 1992, and work on the
lexicon began in December 1992. The lexicon is being shipped to
sites with version 1.3 of the Metathesaurus. They would like to
upgrade to 1.4 since it contains more content and more
connections between ICD-9-CM and MeSH. Dr. Corn asked if the VA
problem list was semi-permanently attached to the patient and
moved with him as he went from facility to facility. Dr. Hammond
responded that this was the goal, but not the current reality.
It is difficult to achieve because all the VA sites don't operate
from a common clinical database.
The FDA is interested in a broad range of clinical data, but its
need is a regulatory need, not a health care, billing, or
outcomes research need. The FDA's responsibility is to ensure
the safety and effectiveness of a range of products. To perform
its regulatory function, the FDA receives and analyzes both pre-
market and post-market drug, biologics, and device data.
Pre-market data are submitted by industry when applying for
approval of a drug or device. Each submission includes masses of
data, 7-10 years worth of information collected at multiple sites
including patient data for 200 to 2,000 individuals. Some of the
pre-market data is submitted in machine-readable form, but not in
a standard form. Drug companies use their own vocabularies,
usually based on a combination of ICD, COSTART, etc., in the pre-
market data. Pre-market data are collected in a very structured,
controlled data environment.
The post-market data are reports of adverse effects of drugs,
biologics, or devices either submitted by manufacturers or
directly by those providing care. The FDA received about 125,000
adverse drug reports and about 75,000 adverse device reports last
year. The post-market adverse reports are coded by the FDA using
COSTART, an adverse reaction terminology originally developed by
the FDA in the 1960s. The post-market data are more random in
character.
The FDA needs to be able to search these data in ways that are
likely to reveal any underlying patterns in the independent
reports.
As part of a broad-based effort to streamline the new drug and
device applications process and to improve its regulatory
effectiveness, FDA has a strategic initiative to establish
standards for automated submission of both pre and post-market
data. Implementation of such standards will support more
effective automated assistance to the reviewers, who must
validate the large amounts of clinical information the FDA
receives and identify patterns that may be indicative of safety
and effectiveness problems, and more consistent labelling of FDA-
approved products. In this environment the FDA's requirements
for a vocabulary include: comprehensive coverage of the signs,
symptoms, procedures, diagnoses, etc. that are relevant to both
pre- and post-market surveillance data in order to minimize loss
of clinical detail when data are encoded; ease of coding and
minimization of subjective choices to improve data consistency;
ability to query, retrieve, and aggregate data for multiple
purposes; availability in the public domain; international
accessibility and usability; and support for seamless access to
large retrospective databases that are primarily encoded using
COSTART or WHOART.
FDA is approaching the standards effort from an Agency-wide
perspective, trying to standardize across programs dealing with
biologics, drugs, blood products, devices, etc. Until this
initiative was launched, the FDA had no mechanism for making
decisions about adopting Agency-wide standards. The goal is
consistent labelling regarding safety and effectiveness for all
FDA-approved products. Four terminology areas have been selected
as the initial focus: safety data, toxicology data, laboratory
data, and demographics. Each area is going through a three phase
process. The first phase includes scoping the problem from the
FDA's perspective and from the perspective of the regulated
industry and looking at existing terminologies applicable to the
area. The FDA does not want to reinvent the wheel. The second
phase involves choosing some existing terminologies as a basis
and developing a plan of attack to achieve a useful standard.
The third phase will be adopting the standard and getting it
used. All four areas are moving into the second phase at varying
rates.
Ms. Veverka provided additional detail about the safety data
area, on which the most progress has been made to date. The
safety data area also has the most international involvement. It
is being pursued under the auspices of the International
Conference on Harmonization, which is an industry-initiated
effort to develop data standards for regulatory submissions by
the pharmaceutical and biotechnology industries. The members
include the EEC Countries, Japan, and the FDA. WHO and Canada
participate with observer status.
Phase one involved an examination of three existing safety
terminologies, COSTART, WHOART, and MEDDRA. COSTART is known to
be deficient. WHOART, a similar terminology developed by the
World Health Organization in the 1960s, is also generally
considered to be inadequate. It is currently used for most
international adverse reports data. Neither FDA (in the case of
COSTART) nor WHO (in the case of WHOART) has invested sufficient
resources in updating its vocabulary. MEDDRA was begun in the
1980s by the Medicines Control Agency of Great Britain, in part
in response to the deficiencies of COSTART and WHOART. It
includes most of COSTART and WHOART, plus many other concepts, in
a flexible, multi-axial, hierarchical data structure. Two more
general systems, ICD-9 and SNOMED International, were looked at
more superficially.
Multiple agendas were being addressed in the initial examination
of these systems, including the need to develop a consensus on
how to proceed and to move the discussion to a technical level
and away from politics and personalities. The results were not
surprising, given the known deficiencies of COSTART and WHOART.
MEDDRA looked the best, based on a preliminary evaluation which
looked at its ability to encode verbatim reports in a restricted
domain and to support flexible retrieval of data for a set of
regulatory inquiries. MEDDRA's architecture is good, but it is
known to be deficient in content. Alpha-testing in the
regulatory environment will help to identify content
deficiencies. Then decisions must be made about which existing
term sets, such as SNOMED International, will be used to populate
the MEDDRA system.
The FDA's next steps include examination of samples of data
submissions to identify areas, beyond safety, toxicology,
laboratory, and demographics, that also need vocabulary
standards. The Agency will also be looking at the requirements
for the system that will hold the standard terminologies
designated for use in regulatory activity. This system should
help regulated industry in the transition of their own systems to
the new standards, should permit linking back to the health care
systems that in the future will generate data received by
regulatory agencies, and should support efficient updating. One
of the big issues will be ensuring that there are appropriate
updating mechanisms. FDA has a broad range of subject expertise
to support its vocabulary standards efforts.
Ms. Veverka's talk provoked numerous questions and comments. Ms.
Humphreys asked what was happening in the device area. Ms.
Veverka said they had a funding stream from industry for the drug
area so they were focusing there first. FDA's device experts
were being consulted about safety terminology specifically
related to devices, however.
Dr. McDonald stated that both the NCPDP and HL7 standards are
broadly used and can accommodate adverse report messages of the
type FDA needs to collect. FDA's regulatory needs really involve
the same patients, the same diseases, drugs, and symptoms that
are of interest in the health care environment. There is
relevant work on vocabularies and message standards going on
under the auspices of ANSI HISPP. There is not enough FDA
presence at these meetings. As a result, FDA is ignoring some
robust existing standards that could be applied or modified
slightly to meet their needs and is looking at others that are
only in the early stages of development.
Dr. Chute thanked Ms. Veverka for presenting the FDA's plans and
applauded the Agency's increased interest in data standards. He
also questioned the distinction between regulatory needs and
needs of clinical epidemiology, for example, since all are based
on patient data. He asked how interested the FDA was in ensuring
that standards for computer-based patient records evolved so that
data from these records could meet the Agency's needs. If this
was of interest, it was important to ensure that the FDA's
vocabulary efforts were coordinated with the efforts to develop a
standard health care vocabulary that would be used in patient
record systems. Ms. Veverka said that the FDA had no intention
of developing another vocabulary. It was trying to get
regulatory consensus on a preferred term and wanted to obtain
these preferred terms from other existing systems, such as those
used in patient records. FDA would not receive the bulk of its
data from patient records for quite a long time and must move
ahead on streamlining data submissions in the near term. FDA has
formally asked professional societies to indicate what they
consider to be the gold standard terminologies for their fields.
[NOTE: To date (January 11, 1995), the FDA has received
relatively few responses from professional societies. Most
indicate that there is no current "gold standard" terminology for
their field.]
Dr. Korpman said that he was speaking from the perspective of a
vendor. He said that whether data were going to the FDA or HCFA
or whomever they were still the same patient data. His company
is already translating data N-ways for various requirements. He
was hoping that this meeting would lead to progress toward a
single vocabulary that could be used for multiple purposes,
instead he was hearing about additional independent vocabulary
development efforts like LOINC, MEDDRA, etc. ICD-10 is going off
in another direction. If we have to code everything in our
system another 10 ways, then we will do it, but this is really
idiotic. Ms. Veverka said that the FDA does not want to reinvent
the wheel and would love to have the medical community come to
some consensus. It doesn't appear that the tower of babel will
disappear too soon, however, and the FDA has got to proceed to
meet its immediate needs. The FDA does not want responsibility
for a terminology it can't maintain. Dr. K. Campbell asked who
should take the responsibility for maintenance and how should it
be funded. Ms. Veverka said that part of their project would be
an assessment of the cost and resources needed.
Several people pointed out that the existing HL7 standard might
meet FDA's need for a messaging standard for adverse event
reports. Mr. Shafarman invited the FDA to attend the next HL7
meeting and bring its requirements for messages to the Ancillary
Data Reporting Technical Committee. The Committee could then
determine the best way to use the HL7 standards to meet the FDA's
message requirements. [NOTE: HL7 is currently looking at a 2.3
proposal to create special messages for reporting clinical trials
data.] Ms. Veverka agreed that the FDA should look at the HL7
standard.
The ANSI HISPP Working Group is preparing a framework for
evaluating clinical vocabularies. Dr. Cohn thanked Drs. Chute
and J. Campbell for their contributions to the current draft and
indicated that it had been informed by discussions of the full
ANSI HISPP Working Group, by the CPRI, and by interactions with
CEN TC 251 WG2. Dr. Cohn said that the Working Group is not
looking for perfection. It is looking for a reasonable starting
point and a strategy for moving ahead. The Working Group hopes
that the framework document can help the process by focusing on
what we need in a clinical vocabulary. It should also serve as
an important communications tool.
Comparable data are essential for communication between health
care providers, for outcomes research, for continuous quality
improvement, for reimbursement and resource allocation, and, as
Ken Hammond said earlier, for decision support. Decision support
can't be implemented without data standards. Some people fear
that the effort to develop clinical data standards is a thinly
disguised effort to increase regulation of health care that will
increase the administrative burden on health care providers. We
need a communications plan to allay these fears and to involve
people in the standards development process. We need standard
vocabulary in many domains. James Campbell has prepared a draft
document that outlines these domains, which is available for
review and comment. Dr. Cohn said that he had hoped that there
were some domains that didn't really need a controlled
vocabulary. He has come to the realization that they all need
standard vocabulary, but he thinks we may be able to identify
priorities among the different domains.
The Working Group has stated the goal as "the evolution of a
unified set of non-conflicting, non-redundant terminologies
suitable for the complete patient record". The expectation is
that these vocabularies will support efficient structured data
entry. The Working Group draft identifies four dimensions or
attributes for evaluation of vocabularies: scope, structural
characteristics, maintenance characteristics, and usability
characteristics.
The desiderata for scope include representation of the full range
of concepts needed for the patient record; inclusion of synonyms,
variants, and related terms; modifiers; representation of time
intervals; natural or customary terminology; and context-free
concepts. Desired structural characteristics include: atomic
terms, with mapping to precoordinated terms; explicit rules for
structure; definitions; terms that are not vague, ambiguous, or
redundant; rules for combining terms; multiple classification and
inheritance; logical relationship linkages, such as is-a and
caused-by; language independent structure; and unique identifiers
with no intrinsic meaning.
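Several of these structural desiderata, meaningless unique identifiers, multiple classification, and explicit relationship links such as is-a and caused-by, can be illustrated in one small data structure; all identifiers and concepts here are hypothetical:

```python
# Hypothetical concept store: ids carry no intrinsic meaning, a concept
# may have several is-a parents (multiple classification), and other
# relationships (caused-by) are explicit links rather than name strings.
concepts = {
    "C001": {"name": "viral pneumonia",
             "is_a": ["C002", "C003"],          # multiple parents
             "caused_by": ["C004"]},
    "C002": {"name": "pneumonia", "is_a": [], "caused_by": []},
    "C003": {"name": "viral disease", "is_a": [], "caused_by": []},
    "C004": {"name": "virus", "is_a": [], "caused_by": []},
}

def parents(cid):
    """Names of all is-a parents of a concept."""
    return [concepts[p]["name"] for p in concepts[cid]["is_a"]]

print(parents("C001"))   # ['pneumonia', 'viral disease']
```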
Dr. Cohn said that maintenance is the critical issue. Where we
start is less important than having supported evolutionary
development to where we want to go. In his view, the clinical
vocabulary should be developed as a national standard -- which
means that no one agency, such as NLM, can do it alone. Since
most people don't care about underlying data standards, attention
must be paid to a lexicon that meets the needs of end users and
developers and translation software that converts locally
preferred terminology to the standard.
Dr. Cohn concluded by saying that the current version of the
framework was a draft, and the Working Group was soliciting
suggestions for improvements. The intent is to come away with a
useful framework for evaluating clinical vocabularies and a 2-
year work plan for both the CPRI codes committee and for ANSI
HISPP.
Dr. McDonald asked whether the current draft was a consensus
document. Dr. Cohn said it was the third pass, and he expected
the final to be the seventh pass. Much more input is needed.
Dr. McDonald said that the initial statement of the goal sounded
like the group was looking for a single monolithic system. Since
this is probably not possible and will be unpopular in some
circles, it would be better to say that multiple systems will be
merged to form the standard. Dr. McDonald also commented that
the 2 year horizon seemed at odds with the magnitude of the task.
Dr. Cohn said that the Working Group members did not think a
solution would be reached in two years. They just wanted to
define the steps that should be taken in the next two years to
make progress toward the ultimate goal. Dr. Korpman (also a
member of the Working Group) said that the Working Group wanted
to look at the whole problem rather than a little piece of it,
develop a framework for evaluating what's out there now, and see
how close you can get to achieving the goal. Dr. McDonald
commented that there was probably something that could be tackled
that fell between "little" and "enormous".
Dr. Oliver inquired about the relationships between terms that
should be in the standard vocabulary, i.e., what belongs in a
Knowledge Base and what belongs in a vocabulary. Dr. Cohn said
that this was an important point. Dr. Chute said that at the end
of the day, say in the year 2094, we should be using a common
knowledge base. Ms. Humphreys said that Dr. Oliver was raising
the issue of whether rapidly changing information, such as the
best treatment for any condition, should be maintained in the
standard vocabulary. Dr. Barnett said there is a class of
information that would be very useful to have in the vocabulary,
e.g., a particular drug belongs to the class of penicillins, a
condition has an effect on the liver, which was probably stable
enough to be maintained. Ms. Humphreys commented that the
Metathesaurus co-occurrence information, which represents an
automated statistical analysis of certain information sources,
was one approach to representing this kind of information without
a huge maintenance burden.
Dr. K. Campbell said that at present only a portion of SNOMED
International was in the Metathesaurus and asked whether NLM
intended to change this. Ms. Humphreys responded that it was
NLM's intention to add all of SNOMED to the Metathesaurus. Since
it was time-consuming to review additions to the Metathesaurus to
ensure correct synonymy, etc., this would not be completed for
the 1995 edition. For the proposed large-scale testing of
vocabularies for the patient record, an interim approach would
have to be taken. One option was to convert the sections of
SNOMED not yet integrated into the Metathesaurus into a format
similar to the Metathesaurus, but without the mapping, so that
testers would not have to deal with multiple data formats.
Mr. Martin asked whether NLM's strategy included allowing
responsible groups to maintain their vocabularies within the
Metathesaurus. Ms. Humphreys said that was definitely the goal.
ECRI was already doing this partially for the Universal Medical
Device Nomenclature. NLM hoped that the National Cancer
Institute could be the test case for the full capability.
Ms. Moholt asked what happens if no one can afford to update
clinical vocabulary, as had been implied in earlier discussions.
Ms. Humphreys agreed that if we carry that argument to its
logical conclusion, we are doomed. If we can make appropriate
software tools available to vocabulary developers within the UMLS
environment, we can reduce the costs. Vocabulary development
will remain a labor-intensive process that needs stable funding,
however. Dr. Barnett commented that it was difficult to
establish the boundaries of a health care vocabulary. For
example, should various kinds of pregnancy counselling be
included? Ms. Humphreys said that, while the boundary problem
was real, version 3.1 of Read appeared to have quite detailed
coverage of pregnancy counselling and similar concepts. Although
SNOMED also has some coverage in this area, this may be one of
the places where the two are complementary.
Dr. R. Miller asked for clarification of what was meant by
providing a migration path for developers. Ms. Humphreys
responded that eventually the standard health care vocabulary
will be a subset of the Metathesaurus, which will continue to
cover concepts and terminology related to other parts of the
broad biomedical and health enterprise. If you incorporate a
UMLS unique identifier for a concept into your system today, the
Metathesaurus will ensure that 10 years from now it still
connects you to that concept in the standard health care
vocabulary.
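The migration path Ms. Humphreys describes rests on concept identifiers that carry no intrinsic meaning and never change, even as preferred names and source coverage do. A minimal sketch, assuming two invented releases (the names and source abbreviations are illustrative, not actual Metathesaurus content):

```python
# Two hypothetical Metathesaurus releases: the source vocabularies
# attached to a concept change over time, but the unique identifier
# (which carries no intrinsic meaning) does not.
RELEASE_1995 = {
    "C0011849": {"name": "Diabetes Mellitus", "sources": ["MSH", "SNMI"]},
}
RELEASE_2005 = {
    "C0011849": {"name": "Diabetes Mellitus", "sources": ["MSH", "SNMI", "RCD"]},
}

def resolve(release, cui):
    """A patient record that stored only the identifier still reaches
    the same concept in any later release."""
    return release[cui]

stored_identifier = "C0011849"  # captured in a patient record in 1995
```

The record keeps only the identifier; everything mutable (preferred name, source vocabularies, mappings) lives in the release and can evolve without breaking old data.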
Ms. Humphreys then briefly repeated the purposes of the meeting:
(1) to identify a set of vocabularies that should undergo large-
scale testing to determine their suitability as a base for an
eventual standard health care vocabulary;
(2) to outline some of the major issues that have to be addressed
in setting up the test;
(3) to designate a small working group to develop procedures for
the test, in particular for collecting and analyzing feedback
from the testers. The working group will include some people
from Cooperative Agreement sites since most of them will have to
deal with whatever procedures are established.
In response to a comment from Mr. Tuttle, Ms. Humphreys said that
if the vocabulary set selected is big enough then the need for
sites to create their own concepts will be diminished. The
assumption is that people will have to add concepts to meet local
needs, however. The test should help us to see how often this
happens, where the gaps are, and what resources will be required
both to fill gaps and to deal with new concepts.
Dr. J. Campbell said that he gathered that the test would focus
on the concept coverage issue and would not look at the set of
relationships among concepts that have high clinical utility.
Ms. Humphreys confirmed that the test as proposed would address
the extent to which a selected set of clinical vocabularies met
the needs of computer-based patient record systems. Work on
relationships was not part of the specific agenda for the test
but could certainly be pursued on a parallel track. Dr. J.
Cimino said that he assumed that one purpose was still to link
the clinical vocabulary to the vocabulary used in knowledge
sources, such as MEDLINE, that are helpful in clinical decision
making. Ms. Humphreys confirmed that this was still a high
priority.
Ms. Humphreys introduced Dr. Milton Corn, Acting Associate
Director for Extramural Programs, NLM, who chaired the rest of
the discussion. Dr. Corn opened by saying that he was grateful
for the practice because he was leaving right after the meeting
to mediate in Bosnia for the U.N. The time has come to test
drive a set of vocabularies. Previous speakers had revealed a
parade of flaws in existing vocabularies, but perfection will
take many years to achieve. In the meantime, people are not
waiting. We have heard from several Federal agencies who have
immediate needs and are going ahead. People in the room today
are not the only interested parties; Fortune 500 companies are
also going ahead. It is not wise for us to assume that the
commercial sector will come up on its own with something that
will meet all needs. It is not wise to wait forever. Some of
the vocabularies we already have are really pretty good. We need
to test them and see what happens. Dr. Corn assured the group
that the test is not a disguised attack from the Federal
government. It is a legitimate attempt to learn something that
will get us closer to producing patient data that can be
exchanged and aggregated.
Dr. Corn said that if attendees would accept the debating
resolution that we will go forward with a large-scale test of
existing vocabularies, he would like recommendations for what
should be included in the set. If some thought the test was a
bad idea, he wanted to hear that, too.
Dr. McDonald said he thought that the diagnoses, morphology, and
organisms axes from SNOMED International should be included. He
did not think it was necessary to take all of each vocabulary
included in the test set. Dr. Cohn disagreed and said that all
of each selected vocabulary should be included.
Dr. Milholland said that she thought the test was a great idea
and should be pursued. She said that the four nursing
vocabularies endorsed by the American Nurses Association should
be included in the test set. Dr. Corn commented that he thought
their inclusion was a given.
Dr. Lincoln strongly supported the inclusion of CPT in the test
set and in the UMLS Metathesaurus as soon as possible. Dr. R.
Miller said that he thought all vocabularies currently in the
Metathesaurus should definitely be included, plus all parts of
SNOMED, and probably the Read system, although he knew much less
about it.
Dr. E. Hammond said we needed rules for determining how the
disparate vocabularies are to be used. He thought we needed to
define what is to be used on multiple levels. We need to know
what is needed for existing message standards, but we also need
to know what is needed for the complete patient record. The more
quickly we can identify areas of terminology that are missing
from existing systems and get working on filling the gaps the
better. Dr. Corn said that identification of gaps should be one
of the fall-outs from the proposed test.
Dr. J. Campbell said that the data on SNOMED International was
certainly compelling. What was needed was some sort of
assurances about its future, including ownership, etc. One
reason the UMLS has been so successful and influential is that it
has been freely available and has ongoing support. He commented
that it is important to know the CAP's intentions regarding
SNOMED and what strategic alliances might be built. SNOMED has
things that the UMLS currently lacks. Ms. Humphreys said that
the issue of what arrangements could and should be made with
vocabulary owners is certainly important. The goal of this
meeting was to focus on what are the best vocabularies to
include. One of the follow-on strategy issues will be how to
make the requisite arrangements with vocabulary developers.
Dr. McDonald reiterated that he thought it was better to take
some, not all, of SNOMED. He thought the drug portion should not
be included because NDC and the WHO drug terminology were better
in this area. The standard vocabulary will have to include CPT
and HCFA's terminology. Dr. K. Campbell said CPT couldn't
represent the level of granularity needed. The standard health
care vocabulary should map to the billing codes.
Dr. Cote, co-editor of SNOMED, said we should get an inventory of
what the large vendors are using now. Maybe we can get vendors
to supply useful data on what concepts are being used now in
their systems and where local sites have to build their own
terminology. He reported that CAP is forming a number of
strategic alliances with other professional societies to improve
SNOMED in specific areas. These include the American Nurses
Association, the American Dental Association, and the American
College of Radiologists. CAP is very open to different
approaches. In fact, the CAP has been trying to transfer
responsibility for all of this to NLM for years. CAP is
committed to keeping the distribution price for SNOMED very low.
The copyright of SNOMED is just so the CAP can keep control over
what is done to it. Dr. Sennhauser, CAP's official
representative to the meeting, said that with the release of
SNOMED International CAP had attained a new plateau and a new
level of recognition. He is chairing a select CAP committee to
decide the future direction of SNOMED. The committee is open to
suggestions. The views presented at the meeting have been
helpful to him.
Dr. Chute said that on the issue of whether to have one
monolithic system or to cut and paste, he wants a non-
overlapping, non-redundant system. The UMLS can provide the
links to any billing or other systems that will not go away. All
the suggestions made previously were reasonable. He wonders if
the best approach is to start with a blank sheet, agree on a
structure, and then add in the pieces we need from the different
existing systems.
Dr. Korpman strongly supported the inclusion of SNOMED. His
company has found it useful for a wide range of purposes and it
is clearly worthy of inclusion in the test. He thought it was
better to include a whole that is logical than a piece of this
and a piece of that. SNOMED is already mapping to other systems.
Ms. Humphreys asked what was meant by the phrase "non-redundant"
coverage. She said that to meet all needs we would have to have
pre-coordinated terms mapped to atomic terms. This was a form of
redundancy. If one system had the atomic terms and others had
useful precoordinated terms, it was surely better to use the
existing pre-coordinated terms rather than forcing people to
create new ones. Dr. Chute said that he agreed that mapping
between atomic and precoordinated terms was needed and did not
constitute the type of redundancy he wished to avoid. This gets
at the need to decide on the structure in which these
relationships can be represented. Dr. Hersh agreed and said we
should agree on the structure at the outset of the test. He
doesn't think it will take long to do this.
Dr. K. Hammond said that the worst kind of redundancy occurred
within a single patient record when the use of different names
separated information about the same problem. The value of the
UMLS is its synonymy and ability to identify these risk areas.
One type of evaluation would be to see if use of the UMLS reduces
undetected synonymy in patient record systems.
Dr. K. Campbell said that people were talking about two kinds of
redundancy: (1) within a coding system, where it was necessary to
represent the connections between the different ways of saying
the same thing and (2) between coding systems, where different
terms might be used for the same concepts. He thought that
people in the room were more concerned about the latter. Ms.
Humphreys thought that the second kind was not a problem at all
as long as the different names from the different systems were
explicitly linked and labelled as synonyms. Dr. J. Cimino said
that synonymy was recognized redundancy and that wasn't a
problem. What is needed is a representational scheme that will
help us recognize, when a precoordinated term is added, that the
corresponding atoms are there so we can link them to the
precoordinated term.
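The representational need Dr. Cimino describes -- knowing, when a precoordinated term is added, that its atoms are present and linked -- can be sketched as follows. The identifiers and decompositions are invented for illustration; they are not drawn from any actual vocabulary.

```python
# Atomic concepts and precoordinated terms, with hypothetical
# identifiers; the decompositions are invented for illustration.
ATOMS = {"A1": "fracture", "A2": "femur", "A3": "closed"}

PRECOORDINATED = {
    "P1": {"name": "closed fracture of femur", "atoms": {"A1", "A2", "A3"}},
    "P2": {"name": "femoral fracture, closed", "atoms": {"A1", "A2", "A3"}},
}

def atoms_present(pre_id):
    """When a precoordinated term is added, verify its atoms exist."""
    return PRECOORDINATED[pre_id]["atoms"] <= ATOMS.keys()

def recognized_redundancy(a, b):
    """Two precoordinated terms with identical atom sets are synonyms:
    redundancy that is explicitly linked rather than undetected."""
    return PRECOORDINATED[a]["atoms"] == PRECOORDINATED[b]["atoms"]
```

In this scheme the two kinds of redundancy discussed above are both visible: within-system synonymy is captured by shared atom sets, and cross-system synonymy by linking terms from different sources to the same atoms.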
Dr. Barnett said we need to focus on the vocabulary that will be
needed to move from patient records to important information
sources like practice guidelines, results of PORT studies, etc.
What data elements are important for these connections and what
vocabulary is needed to fill them?
Dr. Corn asked to review what had been said regarding selection
of vocabularies: SNOMED was strongly recommended, and Read
sounds positive, too. Dr. McDonald said LOINC should be added to
the list. He stated that we will not achieve a single
vocabulary, and we might as well recognize that fact now. SNOMED
doesn't have much in the way of supplies, and its drug section is
not as good as the WHO nomenclature. NDC, CPT, ICD-9-CM, and
ECRI are all heavily used now and are important. Dr. Korpman
said he didn't disagree that more than one would be needed, but
it might be a better strategy to get as much as you can from one
and then add things from other systems.
Dr. Lowe said that existing databases use non-standard
terminology and we want to be able to aggregate them. Having a
source that links concepts from many different systems will
facilitate this. For creating new data, the need is a large
population of concepts at the level of granularity required for
the data at hand. We don't want to have to invent concept names
de novo. There is no problem with having multiple labels or
names for the same concept. We do have to decide on a structure.
Both SNOMED and Read have rich structures. The shortcut route
is to start with SNOMED and then proceed to work out the
structural issues.
Dr. Lincoln agreed that structure was important. He is concerned
about closely related terms. The semantics of relationships need
to be addressed. His experience with Iliad taught him there are
many nuances of meaning.
Dr. Cote said he could support Dr. McDonald's contention that it
was possible to take pieces from different systems. In his
discussions with other countries, the drug codes and the
occupations were always a problem. Other countries would usually
elect to use their own systems for these sections. You can in
fact unplug parts of SNOMED and use the rest. It will work. CAP
has no objection to this for the drug area.
Dr. J. Campbell said that the next logical step might be to
expand the CPRI study to include version 3.1 of Read. This would
provide the additional data needed to determine whether SNOMED
International and the Read system offer complementary advantages.
Ms. Humphreys said it would be good for such a study to be done
while we also get the type of input from operational systems that
the proposed test can provide. The test would offer a reservoir
of concepts and concept names that people can use. While we
collect information on what operational systems really need,
these operational systems will be creating more data that is
likely to be aggregatable in the future.
Dr. Huff said he also wished to emphasize the importance of the
structural issues. If we don't start with an agreed-upon
information model, we will not be able to exchange data. Things
will be too amorphous without a defined context and data
structure.
Dr. Kohane said that we can't wait for semantic purity. We need
a pragmatic approach to moving ahead. Dr. Corn supported this
view saying that a good system that comes too late is still
useless.
Dr. R. Miller commented that there are more structures that can
be imposed on good lexicons than there are good lexicons. He
thought the best approach is to include all the credible
candidates in their entirety, see where the gaps are, and proceed
with work on the structure simultaneously.
Dr. E. Hammond said he thought we were missing a fundamental
first step. It was important to define what we are really trying
to evaluate in the test. We must agree on the use or the uses to
which we will put coding systems. Are we talking about
structured input or are we talking about providers using natural
language that will then be converted into codes? Are we talking
about identifying the best codes for reimbursement or for
statistics or for exchanging data between systems? Ms. Humphreys
said that she was not talking about codes at all. She was
addressing a clinical vocabulary that lets us say what is really
wrong with the patient and what was actually done about it. This
clinical vocabulary should link to various codes when we need
them for various purposes such as billing and statistics. Dr. E.
Hammond said he meant to refer to vocabulary rather than codes,
but his point about the need to decide what is being proposed and
evaluated and why remains an important one. He is afraid that if
we do not reach agreement about the purpose of the test up front
we will have reached no conclusion two years from now.
Dr. Erlbaum commented that in some sense the Metathesaurus IS a
coding scheme. It assigns a unique identifier (with no intrinsic
meaning) to each concept. If you want a single system of
identifiers for all concepts you will be able to use the UMLS
identifiers when the Metathesaurus incorporates all of the
vocabularies of interest, including SNOMED and Read. The current
Metathesaurus structure already accommodates designation of
allowable qualifiers and mapping of atomic concepts to
precoordinated ones.
Dr. Fuller raised the question of when the Internet-based UMLS
Knowledge Server would be available to all UMLS users. This will
be useful for the testing planned. The UMLS also has utility as
it stands as a reference tool. Dr. McCray responded that the
UMLS Knowledge Source Server was in beta-testing now. It is a
client-server system. It will be available to all UMLS
developers by mid-1995.
Dr. K. Hammond supported Dr. R. Miller's point that it was
important to include a broad range of vocabularies in the
evaluation. We should not be too narrow at the outset.
Mr. Tuttle said that if we leave this room without agreeing to
some form of the test proposal we will send a very bad message to
the many people in the country who want forward motion toward a
health care vocabulary. There is no way the test can make the
situation worse. There is a good chance it may make it better.
Let's make all the credible candidates available to testers in a
UMLS-like format and see what happens.
Dr. Corn thanked the group for a very useful discussion and
offered Ms. Humphreys the opportunity to have the last word. Ms.
Humphreys said that NLM would go forward with the test. Several
issues/action items necessary to set up the testing had come out
in the discussion. These include: defining the nature of the
test more clearly; making suitable arrangements with those who
have intellectual property rights for vocabularies that will be
included in the test; putting all the vocabularies to be tested
into a format similar to the Metathesaurus, even though some of
them have not yet been incorporated into the Metathesaurus; and
looking at the current Metathesaurus structure and how it can be
augmented to represent all essential features of a standard
health care vocabulary. The Working Group that will draft the
procedures for the testing will include: Jim Cimino, Simon Cohn,
Chris Chute, Mark Tuttle, Bill Hole, someone from the VA to be
designated by Rob Kolodner, someone from AHCPR, and herself.
Although sympathetic to Dr. E. Hammond's point about defining the
purpose and parameters of the evaluation, Ms. Humphreys said that
there will be value in letting a range of institutions test the
ability of the set of vocabularies to meet their individual
purposes. This will provide a broad view of the extent to which
they encompass the concepts needed in a health care vocabulary.
Ms. Humphreys thanked the participants and said that minutes and
copies of slides used at the meeting would be sent to all of
them.
VA Experience in Developing a Clinical Lexicon
At this point the focus of the meeting shifted slightly to the
clinical vocabulary needs of two Federal agencies and their
current efforts to meet these needs. Dr. Kenric Hammond of the
Department of Veterans Affairs Medical Center, American Lake
(Tacoma), Washington described the VA's project to develop a
lexicon for use in their highly developed, distributed and
decentralized clinical information system. Their need is to
develop a consistent data representation that can handle patients
who move from one VA facility to another and also support the
aggregation of data across VA care sites. The VA came to the
UMLS several years ago, first looking for the basis for a single
problem list per site that would cover all types of problems.
The VA system includes 170 hospitals and many more clinics; 3-4
million veterans are seen each year. The VA therefore has a need
for a clinical lexicon that can meet the needs of many types of
sites. Synonym control is extremely important.
Food and Drug Administration's Requirements, Background
Investigations, and Plans for a Comprehensive Vocabulary for
International Regulatory Activities
In opening her presentation, Mary Jo Veverka, Deputy Commissioner
for Management and Systems, FDA, likened the FDA's current
situation to that of the VA a few years ago. The FDA is early
in the process of identifying and evaluating terminologies that
might serve their needs.
Specialized Vocabularies likely to be useful in Patient-Record
Systems
Peri Schuyler, Head, Medical Subject Headings Section, NLM opened
by saying that although much of the discussion at the meeting had
focused on the more general and comprehensive clinical
vocabularies, more narrowly focused vocabularies could also
contribute to the development of standard health care vocabulary.
Such vocabularies offer enriched content, added depth, and
perspectives of interest to particular groups. A number of the
specialized vocabularies ARE regularly reviewed and maintained.
Ms. Schuyler provided several illustrations, including the
nursing vocabularies endorsed by the American Nurses Association,
the PDQ Cancer Thesaurus, ECRI's Universal Medical Device
Nomenclature System, and the Medical Subject Headings. The UMLS
Metathesaurus already serves as a vehicle for linking these
specialized vocabularies to each other and to more general
clinical vocabularies and for distributing them in a uniform
format.
ANSI HISPP Working Group on Codes and Vocabularies Draft
Framework for Evaluating Clinical Vocabularies
Dr. Simon Cohn is co-chair of the CPRI Codes and Classifications
Committee, chair of the ANSI HISPP Codes and Vocabularies Group,
a member of the CPRI Board, and also, as the Clinical Information
Systems Coordinator for Kaiser Permanente, a co-principal
investigator on one of the Cooperative Agreements. (A copy of
Dr. Cohn's overheads is included in Attachment 8.) Kaiser
Permanente has 6.5 million members and operates 30 hospitals and
many more clinics. Dr. Cohn said that he thought Dr. R. Miller
had come up with a useful expression when he referred to
"emeritus terms". Today many of us have "emeritus" or legacy
systems and are using "emeritus codes" that are not suitable for
current purposes. Kaiser has reached an historic point. It has
recognized that it can support different systems, but it can't
support different data structures or different content standards.
If we just automate what we have, we will have "paved a cow path"
that doesn't lead us anywhere. We have to start on a new course
and use appropriate methods to evaluate our progress and to
improve on our direction.
Discussion: Vocabularies Suitable for Immediate Large Scale
Testing and Evaluation
As a prelude to the discussion, Betsy Humphreys reiterated NLM's
view of the role of the UMLS Metathesaurus in the development of
a standard health care vocabulary. The UMLS Metathesaurus can
serve as: (1) a distribution vehicle, (2) a means for mapping
between concepts within the health care vocabulary and from the
health care vocabulary to other relevant classifications and
vocabularies, including those used in billing and statistical
systems and in knowledge sources, (3) a means for representing
many different useful perspectives on the same concepts, and (4)
a reasonable migration path for developers and users as we move
from the current situation of multiple vocabularies to an
eventual coherent standard U.S. health care vocabulary.
NLM HyperDOC / Clinical Vocabulary Meeting / 21 February 1995